Sane Password Strength Validation for Django with zxcvbn

While many admins and blog posts tell users that length is by far the most important factor in creating strong passwords and passphrases, the majority of password input fields still impose a set of hidebound rules: eight characters, at least one upper- and one lowercase letter, some digits and punctuation marks, and so on.

Even though it includes dictionary words, a passphrase like:

Sgt. Pepper's Mr. Kite

is far stronger than:


(there’s a world of difference between 22 characters and 9, from a cracking perspective). But many password input fields would reject the first one. No wonder users are confused by the process of creating strong passwords!
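
To put rough numbers on that difference, here is a back-of-the-envelope sketch. It assumes a pure brute-force attacker drawing from the 95 printable ASCII characters, which overstates the strength of anything built from dictionary words. The `brute_force_bits` helper is hypothetical and is no substitute for what zxcvbn actually measures.

```python
import math

PRINTABLE_ASCII = 95  # assumption: attacker tries every printable ASCII character


def brute_force_bits(length, alphabet_size=PRINTABLE_ASCII):
    """Bits of search space for a random string of the given length."""
    return length * math.log(alphabet_size, 2)


passphrase = "Sgt. Pepper's Mr. Kite"
print(len(passphrase))                          # length of the passphrase
print(round(brute_force_bits(len(passphrase))))  # bits for the passphrase
print(round(brute_force_bits(9)))                # bits for a 9-character password
```

Each extra character adds the same number of bits, so the longer string's search space grows exponentially with its length.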


Django Unit Tests Against Unmanaged Databases

A Django project I’m working on defines two databases in its config: The standard/default internal db as well as a remote legacy read-only database belonging to my organization. Models for the read-only db were generated by inspectdb, and naturally have managed = False in their Meta class, which prevents Django from attempting any form of migration on them.

Unfortunately, that also prevents the Django test runner from trying to create a schema mirror of it during test runs. But what if you want to stub out some sample data from the read-only database into a fixture that can be loaded and accessed during unit tests? You’ll need to do the following:

  • Tell Django to create the second test database locally rather than on the remote host
  • Disable any routers you have that route queries for certain models through the remote db
  • Tell Django to override the managed = False attribute in the Meta class during the test run
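
For context on the second step, a router of the kind being disabled might look roughly like the sketch below. The class name `LegacyRouter` and the `legacy` app label are made up for illustration; only the `tmi` alias comes from the settings shown later in this post. The method names follow Django's documented router interface.

```python
class LegacyRouter(object):
    """Hypothetical router: sends queries for one app's models to the remote db."""

    legacy_app_label = 'legacy'  # assumption: app holding the inspectdb models

    def db_for_read(self, model, **hints):
        if model._meta.app_label == self.legacy_app_label:
            return 'tmi'
        return None  # no opinion; Django falls back to 'default'

    def db_for_write(self, model, **hints):
        if model._meta.app_label == self.legacy_app_label:
            return 'tmi'
        return None
```

With a router like this active, test queries would keep heading for the remote alias, which is why the test settings switch routing off.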

Putting that all together turned out to be a bit tricky, but it’s not bad once you understand how and why you need to take these steps. Because you’ll need to override a few settings during test runs only, it makes sense to create a separate settings module (project.test_settings in the example below) to keep everything together:

import sys

from project.local_settings import *
from django.test.runner import DiscoverRunner

class UnManagedModelTestRunner(DiscoverRunner):
    """
    Test runner that automatically makes all unmanaged models in your Django
    project managed for the duration of the test run.
    Many thanks to the Caktus Group.
    """

    def setup_test_environment(self, *args, **kwargs):
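        # django.db.models.loading was removed in Django 1.9; on newer versions
        # use django.apps.apps.get_models() instead.
        from django.db.models.loading import get_models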
        self.unmanaged_models = [m for m in get_models() if not m._meta.managed]
        for m in self.unmanaged_models:
            m._meta.managed = True
        super(UnManagedModelTestRunner, self).setup_test_environment(*args, **kwargs)

    def teardown_test_environment(self, *args, **kwargs):
        super(UnManagedModelTestRunner, self).teardown_test_environment(*args, **kwargs)
        # reset unmanaged models
        for m in self.unmanaged_models:
            m._meta.managed = False

# Since we can't create a test db on the read-only host, and we
# want our test dbs created with postgres rather than the default, override
# some of the global db settings, only to be in effect when "test" is present
# in the command line arguments:

if 'test' in sys.argv or 'test_coverage' in sys.argv:  # Covers regular testing and django-coverage

    DATABASES['default']['ENGINE'] = 'django.db.backends.postgresql_psycopg2'
    DATABASES['default']['HOST'] = ''
    DATABASES['default']['USER'] = 'username'
    DATABASES['default']['PASSWORD'] = 'secret'

    DATABASES['tmi']['ENGINE'] = 'django.db.backends.postgresql_psycopg2'
    DATABASES['tmi']['HOST'] = ''
    DATABASES['tmi']['USER'] = 'username'
    DATABASES['tmi']['PASSWORD'] = 'secret'

# The custom routers we're using to route certain ORM queries
# to the remote host conflict with our overridden db settings.
# Set DATABASE_ROUTERS to an empty list to return to the defaults
# during the test run.
DATABASE_ROUTERS = []

# Set Django's test runner to the custom class defined above
TEST_RUNNER = 'project.test_settings.UnManagedModelTestRunner'

With that in place, you can now run your tests with:

./manage.py test --settings=project.test_settings

… leaving settings untouched during normal site operations. You can now serialize some data from your read-only host and load it as a fixture in your tests:

from django.test import TestCase

class DirappTests(TestCase):

    # Load test data into both dbs:
    fixtures = ['auth_group.json', 'sample_people.json']


    def test_stub_data(self):
        # Guarantees that our sample data is being loaded in the test suite
        person = Foo.objects.get(id=7000533)
        self.assertEqual(person.first_name, "Quillen")

Displaying Django User Messages with Angular.js

Django’s Messages framework is an elegant workhorse, and I’ve never built a Django site that didn’t use it for displaying success/failure/info messages to users after certain actions are taken (like logging in successfully or adding an item to a cart).

But wouldn’t it be cool if you could use that functionality client-side, delivering user messages to be processed as JSON data rather than statically outputting messages to generated HTML? On a recent project, I needed to do this because Varnish caching doesn’t let you mark page fragments as non-cacheable, so statically generated messages were not an option. But there are all sorts of reasons you might want to handle Django Messages client-side.


Here’s how to accomplish the job in a really lightweight way, without the need for a full-blown REST API app like Django Rest Framework or Tastypie, and with Angular.js (which is, IMO, the best of the current crop of JavaScript application frameworks).
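
As a rough idea of the server side, the sketch below turns message objects into a JSON payload that a client-side framework could consume. It is an assumption about the approach, not the code from the full post: `messages_to_json` is a hypothetical helper, and the `namedtuple` stands in for Django's message objects, which expose `level`, `tags` and `message` attributes. In a real view the iterable would come from `django.contrib.messages.get_messages(request)`.

```python
import json
from collections import namedtuple

# Stand-in for Django's message objects (assumption for this sketch)
FakeMessage = namedtuple('FakeMessage', ['level', 'tags', 'message'])


def messages_to_json(messages):
    """Serialize message-like objects to a JSON array of plain dicts."""
    payload = [
        {'level': m.level, 'tags': m.tags, 'text': str(m.message)}
        for m in messages
    ]
    return json.dumps(payload)


sample = [FakeMessage(25, 'success', 'Item added to cart.')]
print(messages_to_json(sample))
```

A view returning this string with a JSON content type would give the client everything it needs to render the messages itself.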


django-allauth: Retrieve First/Last Names from FB, Twitter, Google

Of the several libraries/packages available for setting up social network logins for Django projects, I currently find django-allauth the most complete, with the best docs and the most active development. Doesn’t hurt that the lead dev on the project is super friendly and responsive on StackOverflow!

But not everything about it is intuitive. After wiring up Twitter, Facebook and Google as login providers, I found that first and last names were not being retrieved from the remote services when an account was successfully created. I also, frustratingly, could find only the most oblique references online to how to accomplish this.

There are a couple of ways to go about it – you can either receive and handle the allauth.account.signals.user_signed_up signal that allauth emits on success, or set up allauth.socialaccount.adapter.DefaultSocialAccountAdapter, which is also unfortunately barely documented.

I decided to go the signals route. The key to making this work is in intercepting the sociallogin parameter your signal handler will receive when an account is successfully created. I then installed a breakpoint with import pdb; pdb.set_trace() to inspect the contents of sociallogin. Once I had access to those goodies, I was able to post-populate the corresponding User objects in the database.

This sample code grabs First/Last names from Twitter, Facebook or Google; season to taste:

# When account is created via social, fire django-allauth signal to populate Django User record.
from allauth.account.signals import user_signed_up
from django.dispatch import receiver

@receiver(user_signed_up)
def user_signed_up_(request, user, sociallogin=None, **kwargs):
    """
    When a social account is created successfully and this signal is received,
    django-allauth passes in the sociallogin param, giving access to metadata on the remote account, e.g.:

    sociallogin.account.provider  # e.g. 'twitter'

    See the socialaccount_socialaccount table for more in the 'extra_data' field.
    """

    if sociallogin:
        # Extract first / last names from social nets and store on User record
        if sociallogin.account.provider == 'twitter':
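            name = sociallogin.account.extra_data['name']
            # Guard against empty or single-word names, which would raise IndexError
            parts = name.split()
            user.first_name = parts[0] if parts else ''
            user.last_name = parts[1] if len(parts) > 1 else ''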

        if sociallogin.account.provider == 'facebook':
            user.first_name = sociallogin.account.extra_data['first_name']
            user.last_name = sociallogin.account.extra_data['last_name']

        if sociallogin.account.provider == 'google':
            user.first_name = sociallogin.account.extra_data['given_name']
            user.last_name = sociallogin.account.extra_data['family_name']

        # Persist the names on the User record
        user.save()


Center for Investigative Reporting

A year and a half ago, I left the Berkeley J-School to experience life in a high-energy web development shop with central campus. I learned a ton in that short time – the Agile process, Angular.js, building sites as Single Page Applications, strict separation between back-end and front-end systems, rigorous code review processes, and much more. And I had the opportunity to work with a crew of Java, Ruby, and JavaScript rock stars, from whom I’ve learned so much.

Since my career to date had been as a web tech generalist (i.e. one person wearing all the hats), I found the experience incredibly illuminating. And yet… the project and I had some “creative differences” which ultimately resulted in me leaving the department at the end of May.

I’ve spent the past month working on personal and freelance projects, studying, and job hunting. I longed to work with journalists again, and really missed working with Django, which still feels like the most natural and effective way to build highly customized data-driven web sites I’ve ever encountered. At the same time, I wanted to make sure that my work had some kind of higher purpose – I wanted to be part of something with social and political impact.

Unfortunately, I wasn’t able to find anything on campus that really fit my requirements, and finally made the tough decision to start looking off-campus.

Today, I’m thrilled to say that I believe I’ve found the perfect fit, as a full-time Django developer at the Center for Investigative Reporting in Berkeley.

At The Center for Investigative Reporting (CIR), we believe journalism that moves citizens to action is an essential pillar of democracy. Since 1977, CIR has relentlessly pursued and revealed injustices that otherwise would remain hidden from the public eye. Today, we’re upholding this legacy and looking forward, working at the forefront of journalistic innovation to produce important stories that make a difference and engage you, our audience, across the aisle, coast to coast and worldwide.

CIR recently merged with the Bay Citizen and California Watch, two excellent journalism organizations that have had myriad overlapping projects with the J-School over the years. In fact, walking around the CIR offices today, I’m meeting former J-School students and instructors I haven’t seen in years – kind of a homecoming!

I’ll be enthusiastically starting work in mid-July. Yes, it’s tough to say goodbye to the University, but it really is an ideal evolutionary step for me right now.



Building mod_wsgi with EasyApache for WHM/cPanel

Note: These instructions are for root owners of WHM/cPanel systems, not end users.

If you want to run Django sites on a cPanel server, you’ll probably want to use the mod_wsgi Apache module. There are plenty of instructions out there on compiling mod_wsgi, but if you build it outside of the cPanel system, it will vanish each time you run easy_apache to upgrade your Apache and PHP.

The key is to install this mod_wsgi for cPanel module. But before you go there, you’re going to want a more recent version of Python installed, since RedHat and CentOS still ship with Python 2.4, which will be deprecated by Django soon. However, you can’t overwrite the system-provided Python because yum and Mailman depend on it.

Download Python 2.7 (or whatever the latest is) into /usr/local/src. It’s critical that you build Python with shared libraries enabled, since mod_wsgi will be wanting to use them. So unpack the Python archive and cd into it, then:

./configure --enable-shared
make install
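
One way to confirm that the new interpreter really was built with shared libraries is to ask it directly. This is a small sketch to run with the freshly built binary in /usr/local/bin; `Py_ENABLE_SHARED` is the build variable CPython records on Unix-like systems, and the helper name is made up.

```python
import sysconfig


def built_with_shared_lib():
    """True if this interpreter was configured with --enable-shared."""
    return bool(sysconfig.get_config_var('Py_ENABLE_SHARED'))


print(built_with_shared_lib())
```

If this prints False, mod_wsgi will not find a libpython shared object to link against, and the configure step needs repeating.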

You’ll get a new build of python in /usr/local/bin, without disrupting the native version in /usr/bin. Any user wanting python2.7 to be their default can add this to their .bash_profile:


You’ll also get new libpython shared objects in /usr/local/lib. When you go to build mod_wsgi, easy_apache will need to look for python libs in that location. I found that copying the libs into standard library locations such as /lib and /usr/lib as suggested here didn’t do the trick. What did work was to add a system configuration file pointing to the new libs. Do this:

cd /etc/
echo "/usr/local/lib/" > python27.conf

Now you’re ready to build mod_wsgi through easy_apache. Download custom_opt_mod-mod_wsgi.tar.gz from this ticket at google code and run:

tar -C /var/cpanel/easy/apache/custom_opt_mods -xzf custom_opt_mod-mod_wsgi.tar.gz

That unpacks the module into the right location so that easy_apache will find it and present it as a build option. Run easy_apache as usual (either via script or through WHM) and select the mod_wsgi option. When complete, you’ll find the compiled module alongside all your other modules in /usr/local/apache/modules. The best part is, this will now become part of the default easy_apache build process, so Django sites won’t break when you rebuild Apache and PHP in the future.
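
To check that the module actually serves Python, a throwaway WSGI script is handy. The sketch below is a generic WSGI callable, not a Django configuration; the file location and the Apache directive that points at it are left out because they depend on your setup.

```python
def application(environ, start_response):
    """Minimal WSGI app; mod_wsgi looks for a callable named 'application'."""
    body = b'mod_wsgi is working\n'
    start_response('200 OK', [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body))),
    ])
    return [body]
```

Once this responds in a browser, swapping in the Django project's own WSGI script is the remaining step.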

Many thanks to challgren for creating the module and to Graham Dumpleton for all of his mod_wsgi evangelism and support.


Migrating from Django-Tagging to Taggit

When Bucketlist launched a year ago and I needed a good app to let users create a taxonomy for their life goals, django-tagging was the main contender, and that’s what we went with.

Django-tagging worked pretty well overall, but had one critical bug: Because it only had a tag “name” field but no slug field, users could enter tags with slashes in them. Accessing lists of those tags would then generate a 500 error – a bad user experience, unclean, and I was getting tired of seeing the error reports. Unfortunately, django-tagging hasn’t been updated in quite a while – it’s starting to look like abandon-ware.

At Djangocon 2010, buzz was that Alex Gaynor’s django-taggit was picking up the slack and becoming the go-to tagging library for Django. Unfortunately, Taggit provides no migration strategy to move your existing tag base over. I held off on migration hoping one would appear, then finally decided this week to try it myself. Thought I’d document the process for others in the same boat.


Shorter URLs with Base62 in Django

URL shorteners have become a hot commodity in the age of Twitter, where every byte counts. Shorteners have their uses, but they can also be potentially dangerous, since they mask the true destination of a link from users until it’s too late (shorteners are a malware installer’s wet dream). In addition, they work almost as a second layer of DNS on top of the internet, and a fragile one at that – if a shortening company goes out of business, all the links they handle could potentially break.

On Bucketlist, a Django site that lets users catalog life goals, I’ve been using numerical IDs in URLs. As the number of items stored started to rise, I watched my URLs getting longer. Thinking optimistically about a hypothetical future with tens of millions of records to serve, and inspired by the URL structure of a Django-powered photo-sharing site, I decided to do some trimming now, while the site’s still young. Rather than rely on a shortening service, I decided to switch to a native Base 62 URL schema, with goal page URIs consisting of characters from this set:

BASE62 = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

rather than just the digits 0-9. The compression is significant. Car license plates use just seven characters and no lower-case letters (base 36), and are able to represent tens of millions of cars without exhausting the character space. With base 62, the namespace is far larger. Here are some sample encodings – watch as the number of characters saved increases as the length of the encoded number rises:

Numeric       Base 62
1             b
22            w
333           fx
4444          bjG
55555         o2d
666666        cN0G
7777777       6Dwb
88888888      gaYdK
999999999     bfFTGp
1234567890    bv8h5u

I was able to find several Django-based URL shortening apps, but I didn’t want redirection – I wanted native Base62 URLs. Fortunately, it wasn’t hard to roll up a system from scratch. I started by finding a Python function to do the basic encoding – this one did the trick. I saved that in a utils module in my app’s directory (imported below as bucket.utils).
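
The linked function isn’t reproduced here, so the sketch below is a stand-in with the same call shape used later in this post, `baseconvert(number, fromdigits, todigits)`. `BASE62` is the alphabet shown above; `BASE10` is assumed to be the ten decimal digits.

```python
BASE10 = "0123456789"  # assumption: plain decimal digits
BASE62 = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def baseconvert(number, fromdigits, todigits):
    """Convert a number written in one digit alphabet to another alphabet."""
    # Decode the input string into an integer
    value = 0
    for ch in str(number):
        value = value * len(fromdigits) + fromdigits.index(ch)

    # Zero maps to the first symbol of the target alphabet
    if value == 0:
        return todigits[0]

    # Encode the integer, least significant digit first
    out = []
    while value > 0:
        value, rem = divmod(value, len(todigits))
        out.append(todigits[rem])
    return "".join(reversed(out))


print(baseconvert("333", BASE10, BASE62))
print(baseconvert("1234567890", BASE10, BASE62))
```

With this alphabet the two calls print fx and bv8h5u, matching the table above, and passing the alphabets in the opposite order converts a hash back to its numeric ID.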

Of course we need a new field to store the hashed strings in – I created a 5-character varchar called “urlhash” … but there’s a catch – we’ll come back to this.
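
A quick bit of arithmetic shows how much room a column of that width gives. This is plain Python, nothing Django-specific, and the `capacity` helper is just for illustration:

```python
def capacity(alphabet_size, width):
    """Number of distinct strings of exactly `width` characters."""
    return alphabet_size ** width


print(capacity(10, 5))   # 100000
print(capacity(62, 5))   # 916132832
print(capacity(36, 7))   # 78364164096
```

So five Base 62 characters cover a little over 916 million IDs, against one hundred thousand for five decimal digits.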

The best place to call the function is from the Item model’s save() method. Any time an Item is saved, we grab the record ID, encode it, and store the return value in urlhash. By putting it on the save() method, we know we’ll never end up with an empty urlhash field if the item gets stored in an unpredictable way (site users can either create new items or copy items from other people’s lists into their own, for example, and there may be other ways in the future). We don’t want to have to remember to call the baseconvert() function from everywhere when a single place will do. Keep it DRY!

Generating hashes

So in the module that defines the Item model:

from bucket.utils import BASE10, BASE62, baseconvert


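def save(self, *args, **kwargs):

    # Do a bunch of stuff not relevant here...

    # Initial save so the record gets an ID returned from the db
    super(Item, self).save(*args, **kwargs)

    if not self.urlhash:
        self.urlhash = baseconvert(str(self.id), BASE10, BASE62)
        # Save again so the newly generated urlhash is persisted
        super(Item, self).save()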

Now create a new record in the usual way and verify that it always gets an accompanying urlhash stored. We also need to back-fill all the existing records. Easy enough via python shell:

from bucket.models import Item
from bucket.utils import BASE10, BASE62, baseconvert

items = Item.objects.all()
for i in items:
    i.urlhash = baseconvert(str(i.id), BASE10, BASE62)
    i.save()  # without this the new value never reaches the database
    print(i.urlhash)

Examine your database to make sure all fields have been populated.

About that MySQL snag

About the catch I mentioned earlier: The hashes will have been stored with mixed-case letters (and numbers), and they’re guaranteed to be unique if the IDs you generated them from were. But if you have two records in your table with urlhashes ‘U3b’ and ‘U3B’, and you do a Django query like:

urlhash = 'U3b'
item = Item.objects.get(urlhash__exact=urlhash)

Django complains that it finds two records rather than one. That’s because the default collation for MySQL tables is case-insensitive, even when specifying case-sensitive queries with Django! This issue is described in the Django documentation and there’s nothing Django can do about it – you need to change the collation of the urlhash column to utf8_bin. You can do this easily with a good database GUI, or with a query similar to this:

ALTER TABLE `db_name`.`db_table_name` CHANGE COLUMN `urlhash` `urlhash` VARCHAR(5) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL COMMENT '' AFTER `id`;

or, if you’re creating the column fresh on an existing table:

ALTER TABLE `bucket_item` ADD `urlhash` VARCHAR( 5 ) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL AFTER `id` , ADD INDEX ( `urlhash` )

Season to taste. It’s important to get that index in there for performance reasons, since this will be your primary lookup field from now on.
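
The effect is easy to reproduce without a MySQL server. The sketch below uses SQLite from the Python standard library, with its NOCASE collation standing in for MySQL’s case-insensitive default and BINARY standing in for utf8_bin; it is an analogy, not a MySQL test. The table name `item` and the helper `count_matches` are made up for the example.

```python
import sqlite3


def count_matches(collation, needle="U3b"):
    """Count rows equal to `needle` when the column uses the given collation."""
    conn = sqlite3.connect(":memory:")
    # Collation names cannot be bound as parameters, so interpolate a known value
    conn.execute("CREATE TABLE item (urlhash TEXT COLLATE %s)" % collation)
    conn.executemany(
        "INSERT INTO item (urlhash) VALUES (?)",
        [("U3b",), ("U3B",)],
    )
    row = conn.execute(
        "SELECT COUNT(*) FROM item WHERE urlhash = ?", (needle,)
    ).fetchone()
    conn.close()
    return row[0]


print(count_matches("NOCASE"))  # case-insensitive column
print(count_matches("BINARY"))  # case-sensitive column
```

The case-insensitive column matches both rows for a single hash, which is exactly the situation that makes a get() lookup fail.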

Tweak URL patterns and views

Since the goal is to keep URLs as short as possible, you have two options. You could put a one-character preface on the URL to prevent it from matching other word-like URL strings, like /i/B3j, but I wanted the shortest URLs possible, with no preface, just the hash itself.

Since I have lots of other word-like URLs, and can’t know in advance how many characters the url hashes will be, I simply moved the regex to the very last position in the URLconf, so that it becomes the last pattern matched before handing over to 404.

url(r'^(?P<urlhash>\w+)/$', 'bucket.views.item_view', name="item_view"),

Unfortunately, I quickly discovered that this removed the site’s ability to use Flat Pages, which rely on the same fall-through mechanism, so I switched to the “/i/B3j” technique instead.

url(r'^i/(?P<urlhash>\w+)/$', 'bucket.views.item_view', name="item_view"),
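
The difference between the two patterns can be checked with Python’s `re` module alone. In this sketch the sample paths are made up (`about/` stands in for any word-like page URL), and paths are written without a leading slash, the way Django hands them to URL patterns.

```python
import re

CATCH_ALL = re.compile(r'^(?P<urlhash>\w+)/$')   # no preface
PREFIXED = re.compile(r'^i/(?P<urlhash>\w+)/$')  # one-character preface

for path in ['about/', 'B3j/', 'i/B3j/']:
    # Show which pattern would claim each path
    print(path, bool(CATCH_ALL.match(path)), bool(PREFIXED.match(path)))
```

The catch-all pattern claims every single-segment word-like path, while the prefixed one only matches paths that start with i/, which is why it can sit anywhere in the pattern list.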

Now we need to tweak the view that handles the item details a bit, to query for the urlhash rather than the record ID:

from django.shortcuts import get_object_or_404

def item_view(request,urlhash):        
    item = get_object_or_404(Item,urlhash=urlhash)

It’s important to use get_object_or_404 here rather than objects.get(). That way we can still return 404 if someone types in a word-like URL string that the regex in the URLconf can’t catch due to its open-endedness. Note also that we didn’t specify urlhash__exact=urlhash: case-sensitive lookups are the default in Django queries, and there’s no need to specify the default.

If you’ve been using something like {% url item_view %} in your templates, you’ll obviously need to change all instances of that to {% url item_view item.urlhash %} (you may have to make similar changes in your view code if you’ve been using reverses with HttpResponseRedirect).

Handling the old URLs

Of course we still want to handle all of those old incoming links to the numeric URLs. We just need a variant of the original ID-matching pattern:

url(r'^(?P<item_id>\d+)/$', 'bucket.views.item_view_redirect', name="item_view_numeric"),

which points to a simple view item_view_redirect that does the redirection:

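from django.core.urlresolvers import reverse  # django.urls on Django 1.10+
from django.http import HttpResponseRedirect

def item_view_redirect(request, item_id):
    """Handle old numeric URLs by redirecting to new hashed versions"""
    item = get_object_or_404(Item, id=item_id)
    return HttpResponseRedirect(reverse('item_view', args=[item.urlhash]))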

Bingo – all newly created items get the new, permanently shortened URLs, and all old incoming links are handled transparently.