Commander Vee is the result of a two-day sprint to pull over 850 blog posts from an old HTML blog into a shiny new WordPress one. It may not be the easiest, most elegant set of scripts ever. But it did save us 45 hours of manual content entry, so there's that!
Whatever you want (no, seriously). Commander Vee is not a polished plug-and-play script, but that doesn't mean you can't use it to inspire a polished plug-and-play script!
So glad you asked.
In our case, we needed to pull all of these blog posts off of a site with cross-domain security enabled. While it was a big site, it wasn't big big, so we pulled down a local copy of all the posts we needed using
wget -i urls_to_dl.txt
The following assumes that you have a folder on your localhost containing all of the *.html files you wish to scrape.
- Point line 11 of ./crawler.php to the folder containing all the *.html files.
- Open ./crawler.php in your browser and let the script run.
- Remove the trailing comma and manually add wrapper curly braces to json.txt.
- Enjoy your JSONified site... Or keep going!
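If hand-editing json.txt gets tedious, the cleanup step can be scripted. The following is only a rough sketch and not part of Commander Vee: `wrapJsonFragment` is a hypothetical helper, and it assumes the crawler output is a run of comma-separated `"key": value` entries that ends in a stray comma. The sample data is made up.

```javascript
// Hypothetical helper (not part of Commander Vee): turns raw crawler output
// into parseable JSON. Assumes the text is a run of comma-separated
// "key": value entries that ends with one stray comma.
function wrapJsonFragment(raw) {
  // Drop surrounding whitespace and a single trailing comma, if present.
  const body = raw.trim().replace(/,\s*$/, '');
  // Wrap the entries in curly braces so they form one JSON object.
  return '{' + body + '}';
}

// Made-up sample in the assumed shape of json.txt.
const sample = '"post-1": {"title": "Hello"},\n"post-2": {"title": "World"},\n';

// JSON.parse throws if the result is still malformed, which is a handy check.
const parsed = JSON.parse(wrapJsonFragment(sample));
console.log(Object.keys(parsed));
```

If your json.txt has a different shape, the regular expression and the wrapping characters will need adjusting.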
The following assumes that you have files containing the JSON produced by ./crawler.js loaded on your localhost at some address.
- Place this file in the base folder of your WordPress theme.
- Modify the 'Load up the JSON' section (line 11) to point at your JSON files. Note that each post category requires a separate JSON file.
- Modify the calls to importPages() on line 82 to match the code you tweaked in step 2. Additionally, provide the category IDs you wish to associate with the posts from each file.
- Ensure your WordPress install contains categories that match the IDs you provided in step 3.
- Run this file (we used Chrome, but experiment away!).
- Verify that the file ran without PHP errors, and then check out your newly imported posts!
To tweak this file to your specific needs, modify the extract<thing_to_extract>() functions in crawler.js to select the necessary elements.
In order to add fields:
- Add an extract<new_field_name>() method that returns the information you need.
- Add a call to your new function to processData().
- Add a default value for your field to the initial page object (line 211) in the main code block.
- Add any new logic to ./custom-post-importer.php to get your field into WordPress.
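As a rough illustration of the first three steps, here is what adding an author field might look like. Everything in it is an assumption: the `extractAuthor` name, the `.post-author` selector, the shape of the page object, and the simplified `processData()` are stand-ins, since the real functions in crawler.js may take different arguments and do more work. The fake root object only exists so the sketch runs outside a browser.

```javascript
// Hypothetical extractor for a new "author" field.
// The '.post-author' selector is an assumption; use whatever matches your HTML.
function extractAuthor(root) {
  const el = root.querySelector('.post-author');
  // Fall back to an empty string when the element is missing.
  return el ? el.textContent.trim() : '';
}

// Simplified stand-in for processData(); the real one lives in crawler.js.
function processData(root) {
  // The initial page object carries a default value for the new field.
  const page = { author: '' };
  // Call the new extractor alongside the existing ones.
  page.author = extractAuthor(root);
  return page;
}

// Minimal fake document root so the sketch can run without a DOM.
const fakeRoot = {
  querySelector: (selector) =>
    selector === '.post-author' ? { textContent: '  Jane Doe ' } : null,
};

console.log(processData(fakeRoot));
```

The default value matters because posts without the element still end up with a predictable field for the importer to read.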
Follow @paper_leaf on Twitter.
© 2016 Paper Leaf Design
License: GNU General Public License - Version 3
Since this little collection of code saved me from doing a full-time week of copy and paste, I named it after the keyboard shortcut for 'paste'!
Why not command- ... see? cee? sea?
That's why.