Get Your Twitter Timeline into WordPress
After Twittering for a few months, I started to feel uncomfortable about not owning my data, and wanted an automated way to store a copy of each Tweet for posterity. Another installation of WordPress would be perfect as a Twitter backup repository. (Alternatively, you could copy all of your tweets to a dedicated category within your main WP installation, but I chose a separate install, since I wasn’t looking for integration with my main blog.)
There were really two problems to solve:
1) Have new Tweets automatically hoovered into the WP backing store.
2) Get all of my older Tweets ported into the system as well.
Here’s the resulting site. It’s not really intended for public viewing – I don’t care if people browse it, but it’s really just a backup system in the form of a WordPress site.
Part 1 is pretty easy; Part 2 was more complicated. Here are recipes for both procedures.
Store new Tweets in a WordPress blog
Obviously, install a fresh copy of WordPress.
Next, install the TwitterTools plugin. Set it to “Create a blog post from each of your tweets” and not much else. By default, it will check for new Tweets every 15 minutes.
Cake – That’s all there is to it! Set it and forget it.
Get all of your old Tweets into WordPress
This part is quite a bit more complex, but you should be able to get through it in 15-30 minutes. The challenge here is that Twitter does not offer an export function, and the API only lets you grab the most recent 20 or so entries. There is, however, a web-based tool called TweetDumpr that gets around the API limit, presumably by surfing Twitter’s “Older” links and scraping the content.
Obtain a dump of your Twitter timeline from TweetDumpr. It will arrive in CSV format. Be aware that TweetDumpr won’t give you your entire timeline if you’ve been Twittering for a while: it can hoover out more tweets than the official API will give you access to, but it only goes back in time as far as Twitter’s “Older” option allows. So I’ve got a big gap in my repository. You may be luckier than I was.
Unfortunately, you can’t import a CSV file directly into WordPress. You’re going to need to convert the exported data to XML, then massage and tweak it to match the XML schema of a native WordPress XML export/import file.
Open the CSV file in Excel and pull down File | Save as. Choose the format “Excel 2004 XML Spreadsheet” and save with a different name.
Now you need a valid WordPress XML export file to compare to. The idea is that you’re going to open the two files side by side in a text editor and, with a bit of search/replace-fu, make the Excel XML export take on the same shape as a WordPress XML file. Go to your WordPress back-end and click Manage | Export to download a sample export file.
Open the two files in a decent programmer’s text editor (I like TextMate) – any editor that can handle regular expressions will do. The goal is to give the Excel-generated XML file all of the critical elements of a WordPress XML file. Be careful when editing the XML — if you fail to close a container properly, the process will break.
Here’s the basic editing recipe – your mileage may vary. Note that we don’t need to re-create ALL fields from the example file – we just need to make sure the data we actually want to use is in the right format. Our resulting file is going to be a lot simpler looking than the WP sample file you downloaded.
- Delete the top section of the XML – everything from the 1st line down to the Worksheet line.
- From the WP export file, copy everything from the 1st line down to the first channel line. Paste that into the top of the Excel XML file.
- Delete the Table line from the Excel XML file.
- Go to the bottom of the Excel XML file and delete everything from the Worksheet line to the end of the file. Now paste the last two lines of the WP export file at the end of the Excel file.
- Now for search/replace-fu. In the Excel XML file, perform these replacements. Note that WP has a concept of post titles, while Twitter does not, which is why we cram “Untitled” into each title field. We’re also assuming that none of your tweets start with the string “2008-”.
- Replace <Row> with <item><title>Untitled</title><wp:status>publish</wp:status>
- Replace </Row> with </item>
- Replace <Data ss:Type="String">2008- with <Data ss:Type="DateString">2008- (if you have any posts from 2007, repeat this step for 2007).
- Replace +00:00</Data></Cell> with +00:00</DateData></Cell>
- Replace <Cell><Data ss:Type="String"> with <content:encoded><![CDATA[
- Replace </Data></Cell> with ]]></content:encoded>
- Replace <Cell><Data ss:Type="DateString"> with <wp:post_date>
- Replace </DateData></Cell> with </wp:post_date>
- Replace +00:00 with nothing (leave the replacement field empty).
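If you’d rather not click through nine search/replace dialogs, the literal replacements above can be sketched as a short Python script. This is only a rough sketch of the same recipe, not a tested converter: the function name is made up for illustration, and it assumes the header and footer edits have already been done by hand. The order of the list matters, because later steps look for markers (DateString, DateData) that earlier steps introduce.

```python
# Ordered literal replacements, copied from the recipe above.
# Order matters: later steps match markers that earlier steps create.
REPLACEMENTS = [
    ("<Row>", "<item><title>Untitled</title><wp:status>publish</wp:status>"),
    ("</Row>", "</item>"),
    ('<Data ss:Type="String">2008-', '<Data ss:Type="DateString">2008-'),
    ("+00:00</Data></Cell>", "+00:00</DateData></Cell>"),
    ('<Cell><Data ss:Type="String">', "<content:encoded><![CDATA["),
    ("</Data></Cell>", "]]></content:encoded>"),
    ('<Cell><Data ss:Type="DateString">', "<wp:post_date>"),
    ("</DateData></Cell>", "</wp:post_date>"),
    ("+00:00", ""),
]


def apply_replacements(text, replacements=REPLACEMENTS):
    """Apply each (old, new) pair in sequence, like repeated Replace All."""
    for old, new in replacements:
        text = text.replace(old, new)
    return text
```

To cover tweets from 2007, you would add a second DateString pair for “2007-” right after the 2008 one.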
Now for the fancy regex. We need to replace the capital T in the middle of the datestamps with a space. We only want to match a T that’s surrounded by a digit on either side – otherwise we’ll match all capital Ts in your Twitter history. Be sure you’re doing a regular expression search for this step (look for a regex checkbox in your search/replace dialog):
- Replace (\d)T(\d) with $1 $2
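The same regex step can be sketched in Python, if your editor lacks regex support. Note that Python’s re module writes backreferences as \1 and \2 rather than $1 and $2; the function name here is just for illustration.

```python
import re


def space_out_timestamps(text):
    """Replace a capital T with a space only when a digit sits on both sides."""
    return re.sub(r"(\d)T(\d)", r"\1 \2", text)
```

Like the editor version, this will also touch any tweet that happens to contain a digit, a capital T, and another digit in a row, so it’s worth skimming the result.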
Your modified file should now be ready to import into WordPress. In WP, go to Manage | Import and select the WordPress import type. Navigate to your modified file and give it a shot.
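Since a single unclosed container will break the import, it can save some frustration to check that the file is well-formed XML before uploading it. Here’s a minimal sketch using Python’s standard library; the function name is hypothetical, and it only tests well-formedness, not whether WordPress will actually accept the contents.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_bytes):
    """Return (True, "") if the XML parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        # The message includes the line and column where parsing failed.
        return False, str(err)
    return True, ""
```

You might call it with the raw contents of your edited file, for example check_well_formed(open("tweets.xml", "rb").read()), where tweets.xml stands in for whatever you named the file. The error message points at the line where the parser gave up, which is usually close to the tag that was left open.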
August 20th, 2008 at 8:02 pm
I host and backup MT, Gallery, Wiki content for my extended family because I want to know we have it as long as we want. But I often wonder should we all just hand it over to Google (Flickr, Blogger, et al) and being a mini data center. My Google Docs collection has grown meaningful enough that I want to back it all up… I guess with offline. And then my 3 years of gmail… imap it all off?
What is the breakdown of your twitter input? Ex. 40% home Mac, 30% work Mac, 10% iPhone, etc…
August 20th, 2008 at 8:03 pm
that would be.. “stop being a mini data center”
August 21st, 2008 at 12:23 am
Jeb, I’m with you. See also Web 2.0 Is Sharecropping.
Before I got the iPhone I’d say my Twitter breakdown was 75% home, 25% work (all via Twitterific). Now I’d say it’s close to an even split, though if really busy at work I’ll shut down Twitterific entirely, like I did today.
You?
September 8th, 2008 at 1:36 pm
i’d been thinking about this too, sort of. i wanted to funnel my twitter, flickr, and blog into one social stream on my site. i wrote a little script for that and you can check it out in action.
http://machine501.com/
September 9th, 2008 at 12:16 am
Nice aggregator, Robert!
October 8th, 2008 at 8:43 pm
Thanks for this post! When I use twitter tools though, it creates the blog post fine – but it populates the header and the body with the same content.
I don’t see a configuration option to remove the header or the body… but I see that your site is much more elegant! How’d you do it? :)
October 9th, 2008 at 12:03 am
Shripiya – Do you mean the title and the body? There is no concept of a title in a Twitter post. So you need to alter your blog template to simply not show the title – just show metadata for the post, as you see at birdhouse.org/tweets.
October 9th, 2008 at 5:19 pm
Well, the issue is that I am importing into my existing blog and it populates the tweet in both the title and the body – ick! So, I guess I need to figure out how to set up a separate template for just that category…?
October 9th, 2008 at 11:16 pm
Ah, an existing blog… I see the problem. I’d ask at the Twitter Tools site, see what they say.