After Twittering for a few months, I started to feel uncomfortable about not owning my data, and wanted an automated way to store a copy of each Tweet for posterity. Another installation of WordPress would be perfect as a Twitter backup repository. (Alternatively, you could copy all of your tweets to a dedicated category within your main WP installation, but I chose a separate install, since I wasn't looking for integration with my main blog.)
There were really two problems to solve:
1) Have new Tweets automatically hoovered into the WP backing store.
2) Get all of my older Tweets ported into the system as well.
Here’s the resulting site. It’s not really intended for public viewing – I don’t care if people browse it, but it’s really just a backup system in the form of a WordPress site.
Part 1 is pretty easy; Part 2 was more complicated. Here are recipes for both procedures.
Store new Tweets in a WordPress blog
Obviously, install a fresh copy of WordPress.
Next, install the TwitterTools plugin. Set it to “Create a blog post from each of your tweets” and not much else. By default, it will check for new Tweets every 15 minutes.
Cake: that's all there is to it! Set it and forget it.
Get all of your old Tweets into WordPress
This part is quite a bit more complex, but you should be able to get through it in 15-30 minutes. The challenge here is that Twitter does not offer an export function, and the API only lets you grab the most recent 20 or so entries. There is, however, a web-based tool called TweetDumpr that gets around the API limit, presumably by surfing Twitter’s “Older” links and scraping the content.
Request a dump of your Twitter timeline from TweetDumpr; it will arrive in CSV format. Be warned that if you've been Twittering for a while, TweetDumpr won't give you your entire timeline: it can hoover out more tweets than the official API will give you access to, but it only goes back in time as far as Twitter's "Older" links allow. That left a big gap in my repository. You may be luckier than I was.
Unfortunately, you can’t import a CSV file directly into WordPress. You’re going to need to convert the exported data to XML, then massage and tweak it to match the XML schema of a native WordPress XML export/import file.
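If you'd rather skip Excel, the CSV-to-XML step can also be scripted. Here's a minimal Python sketch, not a tested migration tool. It assumes the CSV has a header row with columns named date and text (hypothetical names; check the header of your own dump) and that the dates are already in YYYY-MM-DD HH:MM:SS form. It emits only the item elements, with "Untitled" as the title, the tweet in content:encoded, and wp:post_date, wp:status and wp:post_type fields; those element choices are my assumption about what the importer needs, so compare them against your own WordPress export. The wp: and content: prefixes only become valid once the items sit inside the header of a real export, which declares those namespaces.

```python
import csv
import io
from xml.sax.saxutils import escape

# Hypothetical column names: check the header row of your own CSV dump.
DATE_COLUMN = "date"
TEXT_COLUMN = "text"


def cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting any ']]>' so it can't end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def tweet_to_item(date: str, text: str) -> str:
    """Build one WordPress-style <item> for a tweet.

    Twitter has no post titles, so every item is titled "Untitled".
    The date is assumed to be in 'YYYY-MM-DD HH:MM:SS' form already.
    """
    return (
        "<item>\n"
        "<title>Untitled</title>\n"
        f"<content:encoded>{cdata(text)}</content:encoded>\n"
        f"<wp:post_date>{escape(date)}</wp:post_date>\n"
        "<wp:status>publish</wp:status>\n"
        "<wp:post_type>post</wp:post_type>\n"
        "</item>\n"
    )


def csv_to_items(csv_text: str) -> str:
    """Turn every CSV row into an <item>, concatenated in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "".join(tweet_to_item(row[DATE_COLUMN], row[TEXT_COLUMN]) for row in reader)


if __name__ == "__main__":
    sample = 'date,text\n2008-05-12 14:23:00,"Hello, world"\n'
    print(csv_to_items(sample))
```

Wrapping the tweet text in CDATA (as WordPress's own exports do) means ampersands and angle brackets in tweets don't need escaping; the only sequence that needs special handling is a literal ]]>, which cdata() splits across two sections.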
Open the CSV file in Excel and pull down File | Save as. Choose the format “Excel 2004 XML Spreadsheet” and save with a different name.
Now you need a valid WordPress XML export file to compare to. The idea is that you're going to open the two files side by side in a text editor and, with a bit of search/replace-fu, make the Excel XML export assume the same shape and form as a WordPress XML file. Go to your WordPress back-end and click Manage | Export to download a sample export file.
Open the two files in a decent programmer's text editor (I like TextMate); any editor that can handle regular expressions will do. The goal is to give the Excel-generated XML file all of the critical elements of a WordPress XML file. Be careful when editing the XML: if you fail to close a container properly, the import will break.
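Since one unclosed container is enough to break the import, it can be worth checking the edited file for well-formedness before handing it to WordPress. Here's a small Python sketch using the standard library's ElementTree parser; the script name check_xml.py in the usage comment is just a placeholder. Because the parser is namespace-aware, it should also complain about wp: or content: prefixes used without the namespace declarations that come from the WordPress export header.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Optional, Tuple, Union


def find_xml_error(xml_data: Union[str, bytes]) -> Optional[Tuple[int, int, str]]:
    """Return None if the XML is well-formed, else (line, column, message)."""
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as err:
        # err.position is a (line, column) pair pointing at the first problem.
        line, column = err.position
        return line, column, str(err)
    return None


if __name__ == "__main__":
    # Usage (placeholder name): python check_xml.py converted.xml
    failed = False
    for path in sys.argv[1:]:
        with open(path, "rb") as f:  # bytes, so the XML declaration's encoding is honored
            problem = find_xml_error(f.read())
        if problem:
            failed = True
            print(f"{path}: line {problem[0]}, column {problem[1]}: {problem[2]}")
        else:
            print(f"{path}: well-formed")
    sys.exit(1 if failed else 0)
```

This only proves the file is well-formed XML; it says nothing about whether WordPress will like the fields inside it.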
Here’s the basic editing recipe – your mileage may vary. Note that we don’t need to re-create ALL fields from the example file – we just need to make sure the data we actually want to use is in the right format. Our resulting file is going to be a lot simpler looking than the WP sample file you downloaded.
- Delete the top section of the XML – everything from the 1st line down to the
- From the WP export file, copy everything from the 1st line down to the first channel line. Paste that into the top of the Excel XML file.
- Delete the Table line from the Excel XML file.
- Go to the bottom of the Excel XML file and delete everything from the Worksheet line to the end of the file. Now paste the last two lines of the WP export file at the end of the Excel file.
- Now for search/replace-fu. In the Excel XML file, perform these replacements. Note that WP has a concept of post titles, while Twitter does not, which is why we cram “Untitled” into each title field. We’re also assuming that none of your tweets start with the string “2008-”.
<Data ss:Type="DateString">2008- (if you have any posts from 2007, repeat this step for 2007).
[replace with nothing]
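The header and footer cut-and-paste steps in the list above can also be automated. Here's a minimal Python sketch, assuming the WordPress export has its channel opening tag on a line of its own and ends with the channel and rss closing tags as its last two non-blank lines (true of the exports I've looked at, but check yours). The function name wrap_with_wp_envelope is made up for this example.

```python
def wrap_with_wp_envelope(wp_export_text: str, items_xml: str) -> str:
    """Put the header and footer of a real WordPress export around item elements.

    Header: every line from the top of the export down to, and including,
    the first line containing '<channel>'.
    Footer: the last two non-blank lines of the export (normally
    '</channel>' and '</rss>').
    """
    lines = wp_export_text.splitlines(keepends=True)
    # Ignore trailing blank lines so "the last two lines" means real content.
    while lines and not lines[-1].strip():
        lines.pop()
    header_end = next((i for i, line in enumerate(lines) if "<channel>" in line), None)
    if header_end is None:
        raise ValueError("no <channel> line found in the WordPress export")
    header = "".join(lines[: header_end + 1])
    footer = "".join(lines[-2:])
    if not items_xml.endswith("\n"):
        items_xml += "\n"
    return header + items_xml + footer


if __name__ == "__main__":
    export = '<?xml version="1.0"?>\n<rss version="2.0">\n<channel>\n<item>old</item>\n</channel>\n</rss>\n'
    print(wrap_with_wp_envelope(export, "<item>new</item>"))
```

Like the manual recipe, this throws away the channel-level elements (blog title, categories and so on) that sit between the channel tag and the first item; the importer seems to manage without them, but keep your sample export around in case yours doesn't.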
Now for the fancy regex. We need to replace the capital T in the middle of the datestamps with a space. We only want to match a T that has a digit on both sides; otherwise we'll match every capital T in your Twitter history. Be sure you're doing a regular expression search for this step (look for a regex checkbox in your search/replace dialog):
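The same digit-T-digit replacement can be sketched in Python. This version uses lookbehind and lookahead so the digits themselves aren't part of the match and only the T is swapped out. In an editor that supports capture groups, the equivalent is searching for (\d)T(\d) and replacing it with $1 $2 (or \1 \2, depending on the editor). Either way, a tweet that happens to contain something like "3T4" would be changed too.

```python
import re

# Match a capital T only when a digit sits on both sides of it, as in the
# timestamp 2008-05-12T14:23:00. Lookarounds keep the digits out of the
# match, so the substitution replaces just the T.
DATE_T = re.compile(r"(?<=\d)T(?=\d)")


def fix_timestamps(xml_text: str) -> str:
    """Replace the T separating date and time with a space."""
    return DATE_T.sub(" ", xml_text)


if __name__ == "__main__":
    print(fix_timestamps("<wp:post_date>2008-05-12T14:23:00</wp:post_date> Talk To Me"))
```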
Your modified file should now be ready to import into WordPress. In WP, go to Manage | Import and select the WordPress import type. Navigate to your modified file and give it a shot.