You say you're self-sufficient, but you don't dig your own coal. -Robert Wyatt
 
June 6th, 2010

Building a Bucketlist Site with Django

Half a year ago, I got this crazy idea to build a site where people could log and record all the things they wanted to accomplish before they died. But more than just simple list-making, I wanted to make it easy for people to tell stories about their goals, and to add images and video. I wanted to let people “follow” other people’s lists, to receive email when their friends accomplished their goals, to start discussions about getting the most out of life. I wanted it to be a place where people could get inspired by the goals of others, and to easily make copies of those goals in their own bucketlists.

The result is bucketlist.org.

I had a pre-existing love affair with the Python-based Django framework – there was never a question of what platform to build on. But no matter how good the platform, the devil’s in the details.

May 12th, 2010

Allowing Secure User Input with Django

Building a site that needs to accept formatted user input? There’s no way you’re going to let random users input any old HTML – you’d open the door to all kinds of cross-site-scripting attacks and other nastiness. Nor can you just filter out the tags you consider dangerous – that road is fraught with peril. The only solution is to white-list a small subset of tags and unceremoniously drop the rest.

There are two layers to the problem – how to support formatted text on the front-end, and how to process submitted text on the back-end.

For the front-end, some developers are drawn to the Markdown syntax – a supposedly user-friendly wiki-like syntax that can be re-rendered as safe HTML. But while Markdown may look friendly to developers, it doesn’t look that way to normal users – trust me on this. Even for tech-savvy users, Markdown requires that you place syntax instructions on your site (inelegant). A better solution is to use a rich text editor for the web, like TinyMCE or WYMEditor.

Ever notice that you often see rich text editors in content management systems run by trusted users, but seldom on public-facing web pages? That’s because it’s tricky to do securely, and without giving users enough rope to hang themselves formatting-wise.

With a bit of configuration though, you can deploy public-facing rich textareas securely, allowing only the input of tags you specify. But you can’t stop there – all the user has to do is disable Javascript in the browser to bypass your rich text editor. You must process submitted text on the back-end with the same set of rules in your view logic.
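Here’s a minimal sketch of the back-end half, using only Python’s standard-library html.parser rather than whatever the site actually runs. The tag whitelist is hypothetical. Every attribute is thrown away, text is re-escaped, and the contents of script and style elements are dropped. It does not balance unclosed tags, so treat it as an illustration of the white-list idea rather than a production sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: the post doesn't say which tags it allows.
ALLOWED_TAGS = {"p", "br", "strong", "em", "ul", "ol", "li", "blockquote"}
VOID_TAGS = {"br"}                        # tags that never get a closing tag
DROP_CONTENT_TAGS = {"script", "style"}   # drop the tag *and* its text


class WhitelistSanitizer(HTMLParser):
    """Rebuild markup from whitelisted tags only; everything else is dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            # Attributes are discarded wholesale, so onclick=, style=
            # and javascript: URLs never survive.
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS and not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so a literal "<" can't turn back into markup.
            self.parts.append(escape(data))


def sanitize(html_text):
    """Return html_text reduced to the whitelisted tags, with no attributes."""
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

In a Django form you might call sanitize() from a clean_<fieldname>() method, so the same rules apply whether or not the browser ever ran the rich text editor.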


April 21st, 2010

Reading .rst doc files with sphinx

Quite a few re-usable Django apps and Python modules come with documentation in text files ending with a .rst extension. The formatting in them is odd, but they’re more-or-less readable.

To this day, I haven’t encountered a single package that explained why its docs were formatted this way. I knew there had to be an explanation, but hadn’t gotten around to looking it up, and basically just waded through. Finally went looking for an answer today. Turns out .rst files use a simple markup syntax called reStructuredText, and you can generate nicely formatted HTML (and other documentation formats) from them if you have Python’s Sphinx package installed. For the benefit of future googlers, here’s how to get up and running quickly:

$ pip install sphinx
$ cd docs
$ mkdir out
$ sphinx-build . out

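Now take a look in the “out” directory and you’ll find the same set of docs as a collection of handsomely formatted HTML files. (This assumes the docs directory already contains Sphinx’s conf.py configuration file; sphinx-build stops with an error if it can’t find one.)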

Sphinx goes pretty deep, and I’m looking forward to exploring it for future documentation projects. For now I’m just happy to have an alternative to squinting.

March 30th, 2010

Stuck Between Stations Redux

The little music writing project I run with some friends, Stuck Between Stations, is now officially three years old. Until yesterday, we were still running with the original design, left over from a time when narrow content columns were in vogue (usability studies still say 420px is the ideal content column width for maximum readability). Trouble is, we run a lot of embedded video on the site, and YouTube/Vimeo have increasingly been defaulting to much wider video dimensions since more and more people have high-resolution displays. Web developers started assuming a baseline screen width of 1024 pixels a few years ago.

But simply widening the old design wasn’t really an option, since it all hung off a photographic banner image that came with a WordPress theme, and so couldn’t be altered. Decided to chuck it all and start from scratch. Chose the Titan theme as a starting point and went from there. Dug up shots of old radio dials from Google Images and pulled a new banner together, keeping only the broadcast tower from the original design.

Was able to run a series of search/replace operations in the database to increase the size of all the embedded videos already on the site. Interesting to see how many different aspect ratios we had accrued without even trying. Also interesting to see how many of the videos had been “Removed due to violation of terms of service.” Seems like the big publishers have been digging deep in YouTube’s bowels to find and skewer copyright violations, even if they do provide free publicity.
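The post doesn’t show its search/replace queries. As a rough Python sketch of the same idea (all names and the 600px target are hypothetical), each embed tag’s width can be set to a new value and its height scaled by that tag’s own aspect ratio, which copes with the grab bag of ratios mentioned above. It assumes you read each post body (in WordPress, the post_content column of wp_posts), run it through widen_embeds(), and write it back.

```python
import re

# Opening <object>, <embed> or <iframe> tags, i.e. the video embeds.
EMBED_TAG_RE = re.compile(r"<(?:object|embed|iframe)\b[^>]*>", re.IGNORECASE)
# The lookbehind keeps e.g. data-width="..." from matching.
WIDTH_RE = re.compile(r'(?<![\w-])width="(\d+)"')
HEIGHT_RE = re.compile(r'(?<![\w-])height="(\d+)"')


def resize_tag(tag, new_width):
    """Set a tag's width and scale its height to keep its aspect ratio."""
    w = WIDTH_RE.search(tag)
    h = HEIGHT_RE.search(tag)
    if not (w and h) or int(w.group(1)) == 0:
        return tag  # leave tags without usable dimensions alone
    old_w, old_h = int(w.group(1)), int(h.group(1))
    new_h = round(old_h * new_width / old_w)
    tag = WIDTH_RE.sub(f'width="{new_width}"', tag, count=1)
    return HEIGHT_RE.sub(f'height="{new_h}"', tag, count=1)


def widen_embeds(post_html, new_width=600):
    """Resize every video embed in one post body."""
    return EMBED_TAG_RE.sub(lambda m: resize_tag(m.group(0), new_width), post_html)
```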

Added a bunch of new features while I was working:

Pretty happy with the results, though the banner still feels a bit crude to me. We’re no Flavorwire, but without a few dozen more unpaid writers and some Sand Hill investment, this is about as good as it gets for a while. Would love a plug if you’ve got one to give!


March 16th, 2010

Is Canvas the End of Flash?

Loose notes from SXSW 2010 panel discussion “Is Canvas the End of Flash?” This debate is really heating up as more browsers gain Canvas support and sentiment seems to be rapidly turning against Flash. But how feasible is it to consider the canvas element a real Flash replacement? Five panelists hashed it out, with excellent points on all sides. Very useful session.

March 16th, 2010

Why Your Baby is Ugly – Effective Dashboard Design

Loose notes from SXSW 2010 session Why Your Baby is Ugly – Effective Dashboard Design, with Aaron Hursman of Hitachi Design. Though I’ve only ever worked on one dashboard system, I am interested in data visualization, and this was an excellent crossover session for both dataviz and information design concepts.

March 16th, 2010

Prototyping Web Apps – Nobody Loves a Wireframe

Loose notes from SXSW 2010 session Prototyping Web Apps – Nobody Loves a Wireframe, with Darren Delaye and Michael Leggett of Google. I’m more of a back-end guy than a designer, but I have an increasing interest in design considerations and usability. This became one of the most useful sessions of the conference for me.

March 16th, 2010

RIP Content Management System

Loose notes from SXSW 2010 session RIP Content Management System by Drupal creator Dries Buytaert.

Unfortunately, the “R.I.P.” part of the session title was never addressed, nor were any of Drupal’s core shortcomings or architectural annoyances. This was just a 30-minute infomercial for Drupal.

Would really have preferred to hear Dries talk about plans to address Drupal’s deep architectural problems, like its lack of object orientation, lack of an ORM, lack of MVC, and annoying templating system. Took notes anyway.

March 15th, 2010

Wow, That’s Cool… Fun With HTML5 Video

Loose notes from the SXSW 2010 session Wow, That’s Cool… Fun With HTML5 Video, with Michael Dale of Wikimedia and Christopher Blizzard of Mozilla.

March 15th, 2010

HTML5: Tales from the Development Trenches

Loose notes from the SXSW 2010 session HTML5: Tales from the Development Trenches, in two parts (history lesson and examples). With Bruce Lawson of Opera and Martin Kliehm of namics.

March 14th, 2010

Coding for Pleasure: Developing Killer Spare-Time Apps

Loose notes from the SXSW 2010 session Coding for Pleasure: Developing Killer Spare-Time Apps, hosted by:

Gina Trapani of Lifehacker, now also author of a Google Wave book. Also made BetterGmail and ThinkTank.
Matt Haughey – Fuelly (a public, social miles-per-gallon site). Also creator of MetaFilter (now a 4-employee corporation).
Adam Pash – MixTape.me (playlist/music sharing site). Also Belvedere and Texter.

March 14th, 2010

Server-Side Javascript

Loose notes from SXSW 2010 session Javascript: The Front and the Back of It, on using server-side Javascript to reduce the pain points of the few non-DRY areas left in MVC stacks.


March 13th, 2010

Is WordPress Killing Web Design?

Loose notes from SXSW 2010 session: Is WordPress Killing Web Design?

Good question – I’ve been asking myself this lately. Unfortunately the session quickly devolved into a lot of platitudes and stating of the obvious. Yes, design has been commoditized and is no longer an “elite” activity. Yes, your site is as creative as you make it, it has nothing to do with the CMS you use. All pretty much goes without saying. Took notes for half an hour, then headed to the HTML5 discussion… which was full and not allowing more people in.

March 13th, 2010

Web Fonts: The Time Has Come

Loose notes from SXSW 2010 session Web Fonts: The Time Has Come


November 17th, 2009

What a Traffic Spike Looks Like

This blog has been chugging along at around 300 visitors per day for the past few months (it was much better back before Twitter nearly killed my urge to blog at all). But the recent Drupal or Django article went a little viral, and things have been nutty over the past 48 hours:

Spike

WP-SuperCache has held up admirably, scarcely a performance blip felt on the Birdhouse VPS.

November 11th, 2009

Drupal or Django? A Guide for Decision Makers

Target Audience

There’s a large body of technical information out there about content management systems and frameworks, but not much written specifically for decision-makers. Programmers will always have preferences, but it’s the product managers and supervisors of the world who often make the final decision about which platform to deploy a sophisticated site on. That’s tricky, because web platform decisions are more-or-less final: it’s very, very hard to change out the platform once the wheels are in motion. Meanwhile, the decision will ultimately be based on highly technical factors, while managers are often not highly technical people.

This document aims to lay out what I see as being the pros and cons of two popular web publishing platforms: The PHP-based Drupal content management system (CMS) and the Python-based Django framework. It’s impossible to discuss systems like these in a non-technical way. However, I’ve tried to lay out the main points in straightforward language, with an eye toward helping supervisors make an informed choice.

This document could have covered any of the 600+ systems listed at cmsmatrix.org. We cover only Drupal and Django because those systems are highest on the radar at our organization; it simply would not be possible to cover every system out there. In a sense, this document is as much about choosing between a framework and a content management system as it is about choosing between specific platforms, and the discussion of Drupal and Django below can be seen as a stand-in for that larger discussion.

Disclosure: The author is a Django developer, not a Drupal developer. I’ve tried to provide as even-handed an assessment as possible, though bias may show through. I will update this document with additional information from the Drupal community as it becomes available.


November 8th, 2009

django-treedata: DataSF Contest Winner

Recently I was invited to participate in the California Data Camp and DataSF App Contest hosted by California Watch and spot.us. The unconference would feature lots of discussion about making use of publicly available data sets to improve quality of life. The App Contest challenged developers to choose one of the many data sets available at DataSF.org and build something cool with it in a relatively short period of time.

Long story short — my contest entry, which explores San Francisco’s database of publicly maintained trees and plants, won the competition! Full details, and downloadable source code, available at my Scripts and Utilities site.

Thanks so much to David Cohn of Spot.us and all of the conference organizers and supporters. Thanks also to J-School webmaster Chuck Harris for his contributions to the project. It was a great day, and winning the competition was a total surprise. Now I just need a city to take the source code and run with it.

Spot.us covered the event live throughout the day.

Huffington Post mentioned django-treedata in Sophisticated Tree Hugging: the Pure Joy of Public Data

November 5th, 2009

Birdhouse 960

Blog look different? At first glance, not by much, but I’ve just completed a massive cleanup of the back-end, replacing the old HTML/CSS with the 960 Grid System, starting with the 960bc (blank canvas) WordPress theme. While I was at it, took the opportunity to search/replace out a bunch of old non-semantic code buried in the posts, updated or replaced a lot of plugins, and killed off a few old features that had out-lived their usefulness.

The biggest news: After years of preaching the HTML validation gospel to students, I still hadn’t gotten around to trying to make my own platform validate… but the Foobar Blog finally does! Well, almost. There will always be 3rd party code outside your control that can’t be hammered into shape. The biggest offenders here are embedded Flickr slideshows and WordPress’ own embedded Gallery feature. Ugh. But aside from that, we’re pretty darn close to clean. Everything I can control validates at least.

The old design had accreted slowly over the years, from a patchwork of parts built and gathered. Original intention was to go for a clean break and adopt a modern 3rd-party theme, but the more I searched, the more I felt like I loved the “Cheap Thrills” design that’s evolved here (not available for download, sorry). So I decided to port Cheap Thrills to 960. It wasn’t all roses, since the divs in this theme hug each other so tightly, while 960 assumes margins everywhere. A lot of fiddling with negative margins, and I haven’t solved the equal height divs problem quite yet. Will do soon.

New in this pimplementation:

  • Much wider content area. Goal is to be able to show full-width video and slideshows, plus code samples that don’t fold to the next line or stretch out of the content space.
  • Syntax highlighting for code samples (example)
  • Tag cloud (see sidebar) – I’ve been tagging random articles for a long time but didn’t want to display a cloud until there were enough of them to warrant it. Still haven’t gone through and tagged the entire site history, but the cloud is picking up steam.
  • General cleanup. Cruft removal. So. Much. Cruft.
  • Somewhat wider sidebar – more room for Image from Nowhere and Recent Comments. Some of the old Images from Nowhere look a bit stretched but future ones will be generated larger.
  • Replaced my old handmade RSS-based Twitter integration with Twitter for WordPress. Super clean – much better for DIY theme builders than the usual TwitterTools.
  • The old Democracy plugin for polling appears to have been abandoned. Replaced it with the much cleaner WP-Polls, which also meant manually copying all of the old Poll data into the new system (ugh!). See the Pollster section.
  • Replaced the old contact form in the shacker contacter with the much simpler Contact Form 7.
  • Nips and tucks galore.

Process took way longer than expected of course – everything does – but these things had been gnawing away at me for a long time now. Feels great to have it all done. Haven’t done any cross-browser testing yet – let me know if something doesn’t look right for you.

Can’t say enough good things about 960 Grid. We’ve standardized on it at work, and it really does make life easier. Not without its warts, but much more pleasant than the YUI grid it replaces.

October 20th, 2009

Generating RSS Mashups from Django

I recently got to work on an interesting Django side project: the Bay News Network – a directory of Bay Area bloggers and hyperlocal news sites. The goal of the site was three-fold:

  1. To create a many-to-many directory of local sites that matched our editorial criteria
  2. To let site owners log in and edit their own listings
  3. To both consume and produce RSS feeds from the listed sites

The first two were pretty standard Django approaches – develop data models and editing interfaces using Django forms and re-usable apps like django-profiles and django-registration. The third goal turned out to be more interesting. We not only had to gather RSS feeds from more than 100 external sites several times per day, we needed to re-mix them (e.g. provide an integrated feed representing all blogs that cover Food, or all blogs that cover Oakland).

“Consuming” RSS feeds meant we needed to integrate feeds from the external sites into our own site. At the most basic level, this was pretty straightforward using Mark Pilgrim’s excellent Universal Feed Parser, which turns the real-world’s tag soup of disparate, incompatible RSS formats into a reliable data format you can step through in your code or templates. This worked well enough until I realized that grabbing and parsing external feeds in real-time was just not going to scale, performance-wise. Plus, we still had the RSS mashups to build, and would clearly need to be storing feed entries in our own database in order to sort them by category, etc.
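For illustration only, here’s what turning a feed into storable records looks like using Python’s standard library instead of Universal Feed Parser. This sketch handles well-formed RSS 2.0 and nothing else; coping with Atom, older RSS dialects, and broken XML is exactly what Universal Feed Parser is for. The function name and the dict layout are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_rss2(xml_text):
    """Return a list of {title, link, guid} dicts from a plain RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        entries.append({
            "title": (item.findtext("title") or "").strip(),
            "link": link,
            # <guid> is optional in RSS 2.0; fall back to the link.
            "guid": (item.findtext("guid") or link).strip(),
        })
    return entries
```

Records like these are what you’d save to the database on a schedule, instead of fetching and parsing external feeds on every page view.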

Thus began the hunt for good feed aggregation systems for Django. Most roads pointed to django-planet, planet planet, and FeedJack, which are systems for gathering content from external sites and importing it into a single aggregated site. These were close to what I wanted, but weren’t great on the re-usability side. Since I already had existing models to define the sites, their owners, and their feeds, I didn’t want to rewrite all my models to work with another system’s conception of how things should be laid out. I also didn’t feel like plowing through their source code to chop out and rewrite just the bits I wanted. Eventually realized that I was looking for a few lines of code to work with my system, not a whole external system.

The surprising solution came from the Community section of the official Django project web site. The Django developers keep the code that drives djangoproject.com in Subversion along with the source code to Django itself, and the code that drives that section of the site is really lightweight. So I did a Subversion checkout of the Aggregator app, and found that all I really needed from it was its update_feeds.py script, a thin wrapper around Universal Feed Parser, which I tweaked to talk to my own models.

Two gotchas to be aware of:

  1. The app includes a bundled templatetags directory with a file called aggregator.py. But the name of the app itself is “aggregator.” I was getting strange import errors in various places before I discovered on the django-users mailing list that Django doesn’t like it when an app name matches a templatetag name. Easily fixed by renaming the templatetag.
  2. My first runs of update_feeds.py went fine, but later started erroring out with database integrity errors. The GUID field on the FeedItem model is set to unique=True, which prevents your database from storing any one FeedItem more than once. That’s great, but it was dishing up integrity errors for some reason. I fixed this by changing this line in update_feeds.py:
feed.feeditem_set.get(guid=guid)

to:

FeedItem.objects.get(guid=guid)
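One plausible explanation for those integrity errors (an inference, not something the post confirms): the same item shows up in more than one listed feed with the same GUID. A lookup scoped to one feed misses the row already stored under the other feed, so the script tries to create it again and the unique constraint rejects it. A global lookup finds the existing row. This framework-free sketch mimics both lookups with an in-memory table; every name in it is hypothetical.

```python
from dataclasses import dataclass


class IntegrityError(Exception):
    """Stand-in for the database error raised on a duplicate unique key."""


@dataclass
class FeedItem:
    feed: str
    guid: str
    title: str


class ItemTable:
    """In-memory stand-in for a table whose guid column is unique=True."""

    def __init__(self):
        self.rows = {}  # guid -> FeedItem

    def insert(self, item):
        if item.guid in self.rows:
            raise IntegrityError(f"duplicate guid {item.guid!r}")
        self.rows[item.guid] = item


def update_scoped_to_feed(table, feed, entries):
    """Like feed.feeditem_set.get(guid=guid): only finds this feed's rows."""
    for guid, title in entries:
        row = table.rows.get(guid)
        if row is not None and row.feed == feed:
            row.title = title
        else:
            table.insert(FeedItem(feed, guid, title))  # may collide


def update_global(table, feed, entries):
    """Like FeedItem.objects.get(guid=guid): finds the row whatever its feed."""
    for guid, title in entries:
        row = table.rows.get(guid)
        if row is not None:
            row.title = title
        else:
            table.insert(FeedItem(feed, guid, title))
```

The trade-off of the global lookup is that an item carried by two feeds stays attributed to whichever feed delivered it first.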

Once I was able to get the updater to run consistently without error, I needed to get it running via cron. The trick to running a Python script that talks to the Django ORM from a crontab is that you must supply the full Python paths in the environment to cron – it doesn’t pick them up automatically from the environment of the user that runs the cron job. This worked for me:

# Runs once a day at 15:20; a schedule such as "20 */4 * * *" runs every four hours
PYTHONPATH=/home/bnn/projects:/home/bnn/projects/bnn
DJANGO_SETTINGS_MODULE=bnn.settings
20 15 * * * python /home/bnn/projects/bnn/scripts/update_feeds.py 2>&1
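An alternative (sketched here, not what the post uses) is to have the script set up its own path and settings module, so the crontab entry needs no environment lines. The paths mirror the crontab entry above. django.setup() is the Django 1.7+ call; 2009-era Django had no such step, and scripts simply imported their models once DJANGO_SETTINGS_MODULE was set.

```python
import os
import sys

# Same paths and settings module as the crontab entry.
PROJECT_PATHS = ["/home/bnn/projects", "/home/bnn/projects/bnn"]
SETTINGS_MODULE = "bnn.settings"


def configure_environment():
    """Put the project on sys.path and name the settings module,
    so cron's sparse environment doesn't matter."""
    # Insert in reverse so the final order matches the PYTHONPATH line.
    for path in reversed(PROJECT_PATHS):
        if path not in sys.path:
            sys.path.insert(0, path)
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", SETTINGS_MODULE)


def main():
    configure_environment()
    # Import Django only after the environment is in place.
    import django
    django.setup()  # Django 1.7+; older releases had no setup() step
    # ...import the models and run the feed update here (not shown)...
```

update_feeds.py would then call main() at module level, and the crontab line shrinks to the schedule and the command.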

Producing Feeds

With the harvesting system up and running, and all content coming into the database associated with blogs that were in turn categorized by “beat” and geographical area, outputting aggregated RSS feeds was a simple matter of using Django’s native syndication framework as documented. This went into urls.py:

feeds = {
    'all': AllFeeds,
    'cat': CategoryFeeds,
    'area': BeatFeeds,
}

# Feeds: the pre-1.2 syndication view receives everything after feeds/ as 'url'
url(r'^feeds/(?P<url>.*)/$', 'django.contrib.syndication.views.feed', {'feed_dict': feeds}),

… and I created a file feedgenerator.py to contain the three corresponding classes and their querysets, using Holovaty’s sample code from chicagocrime.org as a starting point.
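The three feed classes aren’t shown. As a framework-free sketch of the remixing step (this is not Django’s syndication framework, and the beats and area fields on a stored item are hypothetical), filtering stored items and emitting RSS 2.0 with ElementTree might look like this:

```python
from dataclasses import dataclass
from datetime import datetime
from email.utils import format_datetime
import xml.etree.ElementTree as ET


@dataclass
class StoredItem:
    title: str
    link: str
    guid: str
    published: datetime   # timezone-aware
    beats: frozenset      # hypothetical: beats of the source blog
    area: str             # hypothetical: geographic area of the source blog


def mashup(items, beat=None, area=None, limit=20):
    """Pick the stored items for one remixed feed, newest first."""
    chosen = [
        i for i in items
        if (beat is None or beat in i.beats) and (area is None or i.area == area)
    ]
    chosen.sort(key=lambda i: i.published, reverse=True)
    return chosen[:limit]


def to_rss(title, link, items):
    """Serialize the selected items as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for i in items:
        entry = ET.SubElement(channel, "item")
        ET.SubElement(entry, "title").text = i.title
        ET.SubElement(entry, "link").text = i.link
        ET.SubElement(entry, "guid").text = i.guid
        ET.SubElement(entry, "pubDate").text = format_datetime(i.published)
    return ET.tostring(rss, encoding="unicode")
```

The point of storing entries locally is visible here: a beat or area feed is just a filter and a sort over rows you already have.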

September 27th, 2009

Python-MySQL Connections and Snow Leopard

Apparently I’m not the only one having trouble getting MySQL and Python to play nice under OS X — last February’s post on getting the two to cooperate under OS X has generated a ton of traffic. Now I’ve upgraded to Snow Leopard and faced a handful of new challenges (but eventually got it working). Rather than scatter my notes, I’ve updated the original post with Snow Leopard instructions.

September 7th, 2009

Populate Mailman Lists from Django Projects

I spent much of the summer building an intranet in Django for Miles’ school. Since the school is a co-op, we need to keep track of a lot of stuff – charges, credits, and obligations, parents, students, teachers, family jobs, committee membership, the board, etc. etc. I’m happy with how the site came out, but unfortunately can’t share it here, since it’s a private site.

One of the goals of the rebuild was to put an end to the laborious manual process of maintaining the school’s multiple overlapping mailing lists. Since all of those relationships, people types, and groups were already stored in the intranet’s database, I figured it should be possible to run various queries and populate Mailman mailing lists from them directly. Due to the messy nature of the real world, the process was a lot trickier than it sounds on paper, but I eventually did get a smoothly working list generation system up and running, talking to our Django system and working with virtually no manual intervention. Members can update their own profiles and find that their mailing list subscription address has changed automatically a few hours later. Administrators can give someone a new family job or board position and that person will find themselves subscribed to the right mailing list for it later that day.

Since there isn’t much published out there on making these two systems (Django and Mailman) play nicely together, I decided to publish the scripts and document the recipe I used to get it all working. Hope someone finds the system useful.
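The published scripts aren’t reproduced here, but the core idea can be sketched: turn query results into one address-per-line file per list, then hand each file to Mailman 2’s sync_members script, whose -f option makes a list’s membership match the file. The row format, file naming, and output directory below are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def build_rosters(rows):
    """rows: (list_name, email) pairs, e.g. from a Django query (hypothetical)."""
    rosters = defaultdict(set)
    for list_name, email in rows:
        # Treat addresses case-insensitively (a simplification).
        email = (email or "").strip().lower()
        if email:  # skip members with no address on file
            rosters[list_name].add(email)
    return {name: sorted(addrs) for name, addrs in rosters.items()}


def write_rosters(rosters, out_dir):
    """Write one address-per-line file per list; return sync commands."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    commands = []
    for name, addrs in sorted(rosters.items()):
        path = out / f"{name}.txt"
        path.write_text("\n".join(addrs) + "\n", encoding="utf-8")
        commands.append(["sync_members", "-f", str(path), name])
    return commands
```

Each command could then be run with subprocess.run() from Mailman’s bin directory on a cron schedule; because sync_members also removes addresses missing from the file, the database stays the single source of truth.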

June 27th, 2009

django-profiles: The Missing Manual

The User model in Django is intentionally basic, defining only the username, first and last name, password and email address. It’s intended more for authentication than for handling user profiles. To create an extended user model you’ll need to define a custom class with a ForeignKey to User, then tell your project which model defines the Profile class. In your settings, use something like:

AUTH_PROFILE_MODULE = 'accounts.UserProfile'

To make it easier to let users create and edit their own Profile data, James Bennett (aka ubernostrum), who is the author of Practical Django Projects and the excellent b-list blog, created the reusable app django-profiles. It’s a companion to django-registration, which provides pluggable functionality to let users register and validate their own accounts on Django-based sites.

May 20th, 2009

Webcasting with Django

The Knight Digital Media Center, which runs on Django, hosts week-long workshops for working journalists who come from around the country to learn multimedia and internet technology skills. We fill many of our lunch and dinner sessions with talks by journalism industry experts and pundits, and webcast their presentations live. After workshops are over, we post the archived video for posterity. There’s more to handling multi-day, multi-part live and archived video with Django and a genuine streaming server than meets the eye, so thought I’d break it down.

An “event” can last any number of days, and can include any number of presentations, each of which may or may not include a webcast. While the event is in progress, you want the ability to advertise a single URL, where all of the live webcasts will happen. But for the archives, which is where the vast majority of viewing happens over the course of time, you want a separate page/URL for each presentation. Presentation pages include details on that speaker, summaries of what was presented, and optional downloads of PowerPoint or Keynote presentations. Our Presentation model is foreign-keyed to a master Event model (or, in our case, the Workshop model).

Because they’re time-based, synchronous events, webcasts are different from typical web pages. There are five possible “states” a webcast page can be in at any given time, all of which require different things to be inserted into the view:

Upcoming: The event is announced but there’s nothing yet to show. Tell user that webcast will be live at posted time (along with schedule).

In progress: The event is occurring. Insert appropriate object code to embed live QuickTime stream.

Concluded: The live webcast has ended, but the archives haven’t yet been prepared and posted (this can take us a few days). Tell user to come back soon.

Archive: The archived video is prepared and available on the streaming server for posterity. Insert appropriate object code to display streamed archive file from QuickTime Streaming Server.

External: We sometimes host events at other locations on campus, in which case UC Berkeley handles the webcasting rather than us. If so, we need to link from our events database to theirs. Insert appropriate message and link.

In Django, we represent these choices with the typical CHOICES construct:

webcast_state = models.CharField(max_length=4, choices=WEBCAST_STATE_CHOICES)

… which ends up looking like this in the Django admin:

webcast_state

Depending on the current state, different content (text or object/embed code) is inserted into the page in real time (using simple conditionals in Django templates). The Django admin thus becomes a handy tool our student helpers can use to make the master workshop page embed the right thing in the right place at the right time without requiring tech skills. Remember, during the course of a workshop week, all video is happening in the master Workshop page – later, streaming video archives will go into separate Presentation pages and be automatically linked to from the parent Workshop page.
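The post doesn’t show WEBCAST_STATE_CHOICES or the template conditionals. Here is a plain-Python sketch of both, mirroring the five states described above; the four-character codes and the message wording are hypothetical, chosen only to fit max_length=4.

```python
# Hypothetical codes: the post only tells us they fit in max_length=4.
WEBCAST_STATE_CHOICES = (
    ("upcm", "Upcoming"),
    ("live", "In progress"),
    ("conc", "Concluded"),
    ("arch", "Archive"),
    ("extn", "External"),
)

# The model field would then read:
#   webcast_state = models.CharField(max_length=4, choices=WEBCAST_STATE_CHOICES)


def webcast_block(state, live_embed="", archive_embed="", external_url=""):
    """Return what the workshop page should show for a given webcast state."""
    if state == "upcm":
        return "This webcast will be live at the posted time; see the schedule."
    if state == "live":
        return live_embed       # object/embed code for the live QuickTime stream
    if state == "conc":
        return "The live webcast has ended. Archived video will be posted soon."
    if state == "arch":
        return archive_embed    # object/embed code for the archived stream
    if state == "extn":
        return f'This event is webcast by UC Berkeley: <a href="{external_url}">watch it here</a>.'
    raise ValueError(f"unknown webcast state: {state!r}")
```

In the real site the same branching lives in the template; keeping the codes short and fixed is what lets a dropdown in the admin drive it.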

Stream Handles

At the J-School, we use QuickTime Streaming Server, in part because it’s free, and in part because all of our workstations and most of our servers are Macs. We’ve contemplated switching to Flash streaming, but the simplicity of keeping everything Mac-native keeps us on QTSS for now.

Embedding a stream from an external QTSS server is not quite as straightforward as embedding a typical QuickTime movie. Video comes from QTSS over the rtsp:// protocol, rather than http://. And there’s the catch: You can’t embed an rtsp stream directly into a web page — instead, you need to embed a fake QuickTime movie (a “reference movie”), which is actually a text file with the .mov extension. That text file simply references the full URL of the rtsp stream coming from QTSS. The contents of a reference movie file might look like this:

<?xml version="1.0"?>
<?quicktime type="application/x-quicktime-media-link"?>
<embed src="rtsp://streamer.domain.edu/events/131.humanity_2.0.mov" />

Here’s where things get interesting as far as Django is concerned. We don’t want to have to create a physical reference movie for every single stream we serve. And yet, at the HTML level, we have to embed something that looks like a reference to a physically external movie file, e.g.:

<object classid="clsid:02BF25D5-8C17-4B23-BC80-D3488ABDDC6B" 
 width="480" height="376" 
  codebase="http://www.apple.com/qtactivex/qtplugin.cab">
 <param name="SRC" value="/presentations/webcast-archive.227.ref.mov">
 <param name="AUTOPLAY" value="true">
 <param name="CONTROLLER" value="true">
 <embed src="/presentations/webcast-archive.227.ref.mov" 
  width="480" height="376" autoplay="true" controller="true" 
  pluginspage="http://www.apple.com/quicktime/download/">
</object>

So how can we make Django think that /presentations/webcast-archive.227.ref.mov is an actual file on the server, which in turn contains the correct reference to the rtsp stream coming from the streaming server? In effect, it’s a “view within a view.”

webcast_setup

Displaying the presentation page is straightforward Django – I won’t get into that here. But here’s how the “view within a view” stuff works. In the object section of the presentation page template there is a reference to:

<param name="SRC" value="/presentations/webcast-archive.{{object.id}}.ref.mov">

which resolves to something like:

<param name="SRC" value="/presentations/webcast-archive.267.ref.mov">

When the browser hits that line, it requests /presentations/webcast-archive.267.ref.mov from the server, which in turn triggers this entry in urls.py:

url(r'^presentations/webcast-archive\.(?P<pres_id>\d+)\.ref\.mov$',
    'workshops.views.presentation_webcast_archive',
    name='workshops_presentation_webcast_archive'),

So after the presentation page has been rendered by Django and sent to the browser, a second (very simple) view, presentation_webcast_archive, is called, which is simply:

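from django.shortcuts import get_object_or_404, render_to_response
from django.template import RequestContext

from workshops.models import Presentation  # assumed location of the model

def presentation_webcast_archive(request, pres_id):
    """
    Generate a virtual QuickTime reference movie on the fly,
    to be embedded in presentation webcast pages.
    """
    # Return a 404 rather than a broken movie if the ID doesn't exist
    pres = get_object_or_404(Presentation, id=pres_id)

    # Same object, different template: the .txt template holds the
    # reference-movie XML pointing at the rtsp:// stream
    return render_to_response('workshops/presentation_webcast_archive.txt',
        {
            'p': pres,
        }, context_instance=RequestContext(request),
    )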

That view spits out the same presentation object to a different template, presentation_webcast_archive.txt, which consists of:

<?xml version="1.0"?>
<?quicktime type="application/x-quicktime-media-link"?>
<embed src="rtsp://domain.edu/events/{{p.webcast_path}}/{{p.webcast_filename}}" />

Where webcast_path and webcast_filename are fields on the model representing the physical location of the QuickTime media on the streaming server (not the web server). After a workshop week is over, staff only need to hint the saved archive files, upload them to a directory and filename on the streaming server, enter those paths in the Django admin, and check the “Has Webcast” box. The rest is automatic.
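To show the “view within a view” idea without Django, here’s a standard-library WSGI sketch that answers the same URL pattern with a generated reference movie. The in-memory PRESENTATIONS table and its values are hypothetical, and sending Content-Type: video/quicktime is an assumption; the Django view above relies on render_to_response’s default content type.

```python
import re

# Hypothetical stand-in for the Presentation table: id -> streaming-server location.
PRESENTATIONS = {
    267: {"webcast_path": "2009-workshop", "webcast_filename": "talk.mov"},
}
STREAM_BASE = "rtsp://domain.edu/events"  # same prefix as the .txt template

# Dots escaped so only the literal ".ref.mov" form matches.
REF_URL_RE = re.compile(r"^/presentations/webcast-archive\.(?P<pres_id>\d+)\.ref\.mov$")

REF_MOVIE = (
    '<?xml version="1.0"?>\n'
    '<?quicktime type="application/x-quicktime-media-link"?>\n'
    '<embed src="{base}/{path}/{filename}" />\n'
)


def app(environ, start_response):
    """WSGI app that fabricates a QuickTime reference movie per presentation."""
    match = REF_URL_RE.match(environ.get("PATH_INFO", ""))
    pres = PRESENTATIONS.get(int(match.group("pres_id"))) if match else None
    if pres is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    body = REF_MOVIE.format(
        base=STREAM_BASE,
        path=pres["webcast_path"],
        filename=pres["webcast_filename"],
    ).encode("utf-8")
    # Assumed MIME type so the QuickTime plug-in treats the response as a movie.
    start_response("200 OK", [
        ("Content-Type", "video/quicktime"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve it; no file ever exists on disk, which is the whole trick.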

In a previous, PHP-based version of this system, we had to prepare an actual reference movie for every archive stream we hosted. By using this “view within a view” technique, Django has let us remove that part of the workflow.

February 21st, 2009

Python-MySQL Connections on Mac OS

Update: This entry has been updated for Snow Leopard.

In all of Mac-dom, there are few experiences more painful than trying to get Python tools to talk to a MySQL database. Installing MySQL itself is easy enough – Sun provides a binary package installer. Python 2.5 comes with Mac OS X. If you enable Apache and PHP, your PHP scripts will talk to your installed MySQL databases just fine, since PHP comes bundled with a MySQL database connector. But try to get up and running with Django, TurboGears, or any other Python package where MySQL database access could be useful (or needed), and you’re in for a world of hurt.

Update: I finally did manage to get Python and MySQL playing nice together, but it took a few more contortions beyond what’s described in the recipes found scattered around the interwebs. I’ve added my solution at the end of this post.

(more…)

January 30th, 2009

Who Owns Your RSS?

In a dispute with far-reaching implications for the widespread practice of automated aggregation of headlines and ledes via RSS, GateHouse Media has, for the most part, won its case against the New York Times, which owns Boston.com, which in turn runs a handful of community web sites. Those community sites were providing added value to their readers in the form of linked headlines pointing to stories at community publications run by GateHouse. The practice of linked headline exchange is healthy for the web, useful for readers, and helpful for resource-starved community publications. However, for reasons that are still not clear (to me), GateHouse felt that the practice amounted to theft, even though GateHouse itself was publishing the RSS feeds to begin with.

Trouble is, RSS feeds don’t come with Terms of Use. Is a publicly available feed meant purely for consumption by an individual, and not by other sites? After all, the web site you’re reading now is publicly available, but that doesn’t mean you’re free to reproduce it elsewhere. The common assumption is that a site wouldn’t publish an RSS feed if it didn’t want that feed to be re-used elsewhere. And that’s the assumption GateHouse is challenging.

Let’s be clear – this is not a scraping case (scraping is the process of writing tools to grab content from web pages automatically when an RSS feed is not available). Boston.com was simply utilizing the content GateHouse provided as a feed. I would agree that scraping is “theft-like” in a way that RSS is not, but that’s not relevant here.
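For readers unfamiliar with the mechanics, the kind of automated headline aggregation at issue here needs nothing more than the publisher's own feed. The sketch below is a hypothetical illustration, not Boston.com's actual code: it assumes a plain RSS 2.0 feed and pulls out each item's title, link, and description (the lede), which an aggregating site would then render as linked headlines. The names `Headline` and `headlines_from_rss` are invented for this example.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class Headline(NamedTuple):
    title: str
    link: str
    lede: str


def headlines_from_rss(feed_xml: str, limit: int = 5) -> List[Headline]:
    """Extract up to `limit` linked headlines from an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)  # the <rss> element
    items: List[Headline] = []
    # RSS 2.0 nests <item> elements inside <rss><channel>.
    for item in root.iterfind("channel/item"):
        if len(items) >= limit:
            break
        # findtext() returns None for a missing child, so fall back to "".
        items.append(Headline(
            title=(item.findtext("title") or "").strip(),
            link=(item.findtext("link") or "").strip(),
            lede=(item.findtext("description") or "").strip(),
        ))
    return items
```

Note that nothing here touches the publisher's web pages; everything comes from the feed the publisher chose to offer.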

In a weird footnote to all of this, GateHouse initially claimed that Boston.com was trying to work around technical measures GateHouse had put in place to prevent copying of its material. Those “technical measures” amounted to JavaScript in GateHouse’s web pages, but Boston.com was of course not scraping the site; it was merely taking advantage of the RSS feeds freely provided by GateHouse. In other words, GateHouse put its “technical measures” in its web pages, not in its feed distribution mechanism, missing the point entirely.

GateHouse seems primarily concerned with the distinction between automated insertion of headlines and ledes (e.g. via RSS embeds) vs. the “human effort” required to quote a few grafs in a story body. Personally, I don’t see how the two are materially different, or how one method would affect GateHouse publications more negatively or positively than the other. If anything, now that GateHouse has gotten its way, they’re sure to receive less traffic.

The result is that Boston.com has been forced to stop using GateHouse RSS feeds to automatically populate community sites with local content. If cases like this hold sway, there will soon be a burden on every site interested in embedding external RSS feeds to find out whether it’s OK with each publisher first.

PlagiarismToday sums up the case:

It was a compromise settlement, as most are, but one can not help but feel that GateHouse just managed to bully one of the largest and most prestigious new[s] organizations in the world.

Also:

The frustrating thing about settlements, such as this one, is that they do not become case law and have no bearing on future cases. If and when this kind of dispute arises again, we will be starting over from square one.

I’m trying to figure out who benefits from this decision… and I honestly can’t. GateHouse loses. Boston.com loses. Community web sites with limited resources lose. And readers lose. Something’s rotten in the state of Denmark.