Loose Notes from Djangocon 2010

It’s been inspiring to watch the growth of the Django developer community, and the increasing traction the platform is getting from high-profile sites. NASA, The Onion, Washington Post, Mozilla, PBS, and many other prominent organizations are discovering the power of deploying on a pure Python framework, rather than on an opinionated CMS written in PHP that gets in your way as much as it helps. I was lucky to attend the first Djangocon at Google headquarters a couple of years ago, and lucky again to be able to attend the conference in Portland, OR this year.

Three solid days of panels ran the gamut from low-level, detail-oriented sessions like tips on working with forms to high-level recommendations from experts on things like scaling to high-traffic situations, automating the deployment process, and what could be done better. As with any conference, 3/4 of the value is in the panels, and the other 1/4 is in the networking – meeting and talking with people working with the same toolchains, exchanging tips and helping one another. I learned a ton this year. There were surprises, too – from everyone getting their own pony in their schwag bag, to the visit from Oregon congressman David Wu, to the realization that I wasn’t the most junior developer in the room, to the discovery that you could get full to the point of bursting at a vegan restaurant.

About that pony: It all started during a discussion on what features should go into the next version of Django, when someone said “I want a pony!” The feature under discussion was delivered, and the person got their pony. That led to the creation of playful sites like djangopony.com and My Little Django. Hilarious at the time, but honestly, I think the meme has played itself out, and may have just jumped the shark with everyone getting their own pony this year. I love the pony, and I love my new Pinky Pie, but I’m ready for the meme to go away now.

While most sessions were highly technical, one of the highlights was the keynote presentation by Eric Florenzano of the Djangodose podcast, “Why Django Sucks (And How We Can Fix It)” (video | slides). The talk generated some controversy, but that’s healthy and good. It was refreshing for its honesty, and it offered actual solution proposals on most points. Django appeals to enterprise in part because it takes a conservative approach toward change, but the project must stay on its toes to remain competitive and forward-thinking.

Newly launched: whydjango.com – to become a collection of case studies explaining why Django is a good fit for organizations and enterprises. I plan to submit case studies for the Graduate School of Journalism and the Knight Digital Media Center soon.

Took copious notes at most of the sessions, but have only edited them lightly – apologies for typos and incomplete sentences. And sorry this is so long! (I didn’t have time to make it shorter). Downloadable slides from many of the talks are available here. And of course I only attended half of the sessions by definition. Full list of sessions here. Want to watch the whole thing? Videos of the sessions are already up!

Sessions

Rewriting addons.mozilla.org in Django

addons.mozilla.org and support.mozilla.org ran on CakePHP for years, but the team was having trouble doing everything they wanted to do, and even more trouble scaling. This talk summarized the transition to Django.

165 million requests/month

24 web servers, each with 16 WSGI processes

Each object has a ton of relations. e.g. the description field goes to a Translations table.

Using a multi-db router

Their Querysets are BIG

They use Jinja2 instead of Django’s native templates.

They run tests with nose

.filter() is slow, especially when you chain querysets – Django clones the data structure and modifies the new one.

DELETE CASCADE is a dangerous thing for us. This is good for normalized dbs, but for a denormalized db, it causes problems.

Django does a SELECT before every UPDATE – an extra query each time. Django should trust that if an object has an ID, it exists.

44,000 lines of PHP ended up as 12,500 lines of Python

Russell Keith-Magee: MySQL is great as long as you don’t need to JOIN anything. But every FK lookup is a join – a source of performance issues (hence denormalization).

Jinja lets you have a bit more logic in your templates, but is still not a full programming language. You don’t lose that separation of logic that you have with Django templates.

Topics of Interest

Talk by James Bennett, Django’s release manager, on the state of the platform as seen by a core committer.

14 people have the commit bit to trunk (very small considering the size of Django). To accept a new committer, every existing committer has to give you a +1 vote – it must be unanimous. James thinks they’ve been too cautious about handing out commit privileges.

Django didn’t always have a formal release process. When it was formalized, it may have been made *too* formal. The release cycle between 1.1 and 1.2 was far longer than the 9-month formal process (3 months of feature requests, 3 months of building, 3 months to polish).

Django Forms

A formset puts a “management form” into your template. If you build a custom form for a formset, it’s easy to damage the management form.

Dynamic forms:

  • An editor may get more or different fields than an author.
  • You might want to provide choices based on system data (picklist that shows your friends)
  • DON’T try to use a “code factory” to solve this problem.
  • Do dynamic forms by overriding __init__ instead. It’s cleaner and more maintainable.
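
A rough sketch of the __init__ approach, written in plain Python rather than real Django forms so that it runs on its own. Field, BaseForm, ArticleForm, the role argument, and the reviewer field are all made-up names for illustration; the speaker’s actual code isn’t in these notes. The idea is that fields are declared once on the class, copied per instance, and then adjusted in __init__ based on who is using the form.

```python
import copy


class Field:
    """Hypothetical stand-in for a form field; not Django's Field class."""

    def __init__(self, label, choices=None):
        self.label = label
        self.choices = choices


class BaseForm:
    """Hypothetical stand-in for a form base class."""

    base_fields = {}

    def __init__(self):
        # Copy the class-level declaration so per-instance changes
        # never leak into other instances of the form.
        self.fields = copy.deepcopy(self.base_fields)


class ArticleForm(BaseForm):
    base_fields = {
        "title": Field("Title"),
        "body": Field("Body"),
    }

    def __init__(self, role, friends=()):
        super().__init__()
        # An editor gets a field that an author never sees.
        if role == "editor":
            self.fields["publish_date"] = Field("Publish date")
        # Choices come from system data handed in at instantiation time.
        self.fields["reviewer"] = Field("Reviewer", choices=list(friends))
```

Because everything happens on the instance, there is one form class to read and test, instead of a factory generating a new class per situation.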

You should NOT be putting a ton of custom save stuff in the view code – all of that should go into the form’s save() method. This makes things more re-usable – you can add more pieces of logic elsewhere without redundant field save calls.

django-uni-form provides one thing Django is really lacking – DIV-based output. That means more accessible forms.

{% load uni_form_tags %}

{{ my_form|as_uni_form }}

Don’t be afraid to roll your own. Isolate logic at any cost. Template logic must be isolated as much as python code. Extend the default form classes that come with Django. Copy them, inherit from them, etc. … so your output is controlled by a single file, not a bunch of files scattered around. Don’t reinvent the wheel.

Building a custom CMS

Talk from the masters at Lincoln Loop on building custom CMSs for large clients

1) Planning and Methodology

– Be critical of requirements
– Be careful of individual pet projects
– Create a prototype as fast as possible
– Populate it with “real data” as soon as you can.
– Load test your prototype to find the bottlenecks
… and iterate until you are done.

Compare the performance you are getting with the performance you’re expecting.

Developing in iterations helps keep project metrics in check (you know, the things that always get abandoned).

Document with Sphinx: Any new dev should be able to bootstrap the project using only the docs.

Repeatable deployments:
– Use pip + fabric or Buildout
– Setting up a new env should be fully automated

Tests and continuous coverage. Smooth communication (ongoing) is critical. Dedicated IRC room. Integration server (WIP). Backlog monitoring (Redmine). Maintain good docs with wikis and sphinx.

Key success factors:

– Experience helps
– Things always take longer than expected
– Migrating old, undocumented business logic is nearly impossible to estimate

Continuously refine estimates. Demo finished features on a fixed schedule.

– Keep your build working
– Keep stakeholders in the loop

2) Dealing with Legacy Data Stores

Two opposing factors:

– The real migration will only ever happen one time
– The data is the business – it has to be handled properly.

Plan ahead:

– How are you going to keep the migration plan in sync with changes in your new app?
– Don’t underestimate the work required to have a workable process
– The modify-migrate-test cycle is SLOW if you take a naive approach

A convenient way to migrate data is important because it will need to be run often.

– Determine the max allowable downtime/read-only window
– This is your target. Every change to the migration script must be below this number

Optimize for ease of running rather than for speed.

– Make your script flexible in terms of resource use.
– Put it in the cloud
– … or another computer

Your new data model will evolve and you will need to run this often.

Pitfalls:

– Other frameworks and dbs use different conventions or even other data types for FK than standard Django
– Mapping to Django User can be problematic
– Any large dataset is inconsistent
– Fix the integrity at the source if possible
– A naive migration strategy will leave you with a process that takes many hours or even days
– Legacy content files might have markup and structure that is not trivial to deal with:
  – HTML tags/snippets
  – entities
  – character encodings
  – various page layouts

Migration Tips:

– Visual inspection of the dataset
– Gather as much legacy app logic and SQL as you can up front. Working backwards through relationships is tedious.
– Check your assumptions

Django-specific:

Use inspectdb!

Features for Migration Toolset

– Pause/clean break
– Resume
– Progress meter
– Logging
– Profiling
– Partial/range
– Graceful error handling – catch exceptions. In production, you want to fail spectacularly – no errors allowed.
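
To make that feature list concrete, here is a minimal sketch of a migration loop that covers resume, partial ranges, progress logging, and a switch between “log and continue” and “fail loudly”. It is only an illustration under my own assumptions: the function name, the JSON checkpoint file, and the fail_fast flag are hypothetical, not something shown in the talk.

```python
import json
import logging
import os

log = logging.getLogger("migration")


def migrate(rows, convert, checkpoint_path, start=0, stop=None, fail_fast=False):
    """Convert rows[start:stop], resuming after the last checkpointed row."""
    # Resume: pick up where an earlier (paused or crashed) run stopped.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            start = max(start, json.load(fh)["next"])
    stop = len(rows) if stop is None else min(stop, len(rows))
    failed = []
    for i in range(start, stop):
        try:
            convert(rows[i])
        except Exception:
            if fail_fast:
                raise  # the real run: stop at the first error
            log.exception("row %d failed", i)  # rehearsal runs: log, move on
            failed.append(i)
        # Checkpoint after every row so the run can be interrupted cleanly.
        with open(checkpoint_path, "w") as fh:
            json.dump({"next": i + 1}, fh)
        if (i + 1) % 1000 == 0:
            log.info("progress: %d/%d", i + 1, stop)
    return failed
```

Writing the checkpoint after every row is the simplest thing that works; on a large dataset you would probably checkpoint per batch instead.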

3) Building It

Pitfalls

It’s easy to create a system that is completely un-Django-ish due to external constraints and influence from the legacy system. Be careful of this path – try to stick to the Django way wherever possible.

There is often a temptation for developers new to Django to structure code or use idioms from other frameworks/languages.

Trying to “clone” a large legacy system might lead you away from commonly accepted Django best practices. Possible consequences:

– Reusable apps won’t plug in well
– Updates to Django trunk won’t go well
– New external devs might be lost/need training
– Integrating existing Django projects is HARD

Study up on best practices – Google it!

Don’t carbon copy the old application. Avoid trying to recreate the legacy workflow. Frameworks don’t match 1 to 1.

UX has evolved. Look for ways to optimize workflow now that we have better tools and paradigms.

Learn from existing OSS projects.

Customize or build your own?

Favor 3rd party components, but don’t be afraid to fork early. Pick out the best parts and use those.

– Use your network of trust to evaluate an app
– Check the code quality metrics
– Test coverage
– Complexity
– Documentation
– Author/number of followers

4) Bad News: Django Isn’t Perfect

Common complaints

– Settings for reusable apps. Different apps have different ways of making settings

– ORM doesn’t support the feature you need

– A nifty view can generate thousands of queries. Study them carefully.

– Forms, ModelForms, and FormSets can be an adventure once past basic use cases

– HTML4, HTML5, and XHTML compliance with Django forms

– auth.User
– username unique
– Replacing or overriding it means you lose many 3rd party apps
– Ready for checkin tickets go to purgatory
– Admin customization can get tricky. The admin doesn’t fit every use case.

Django is well suited for building a CMS, but like any tool, you need to be aware of the tradeoffs in order to maximize productivity and minimize mistakes.

Customizing the admin

The Django admin is one of its most powerful features, and a great selling point. No other framework or CMS has anything quite like it, right out of the box. It’s often possible to deploy with the default Django admin after defining your data models, with little customization and no custom admin views – it’s that good. But there are situations like custom workflows and “outside” situations where it doesn’t quite fit. Most of the time you can get it to fit the org’s workflow model well enough, but some study may be needed.

Django admin doesn’t always fit mental models. For example, apps aren’t organized by context. No real workflow built in. The speaker picked on Satchmo quite a bit.

Missing features: Editors may be used to error recovery (the admin has no undo), WYSIWYG editors, inline help systems, built-in documentation, cross-model search. If you hand the admin to these users, they WILL make the comparison.

Everyone makes mistakes. The admin is unforgiving of this. GMail does a good job with undo, etc.

Poor display for complex models.

A form is a conversation with your users.

Ask to look at your clients’ previous tools – people are used to conventions and will complain if they’re not present.

ModelAdmin pros: Easy for one-off projects. Cons: Requires JavaScript, hard to bundle for redistribution.

Custom templates: Think of admin as a reusable application. Key templates:
– admin/base.html
– admin/index.html
– admin/change_form.html
– admin/change_list.html

For a specific app: admin/my_app/change_form.html
For a specific model: admin/app/model/change_form.html

Extend, don’t override!
– Use {{block.super}} to extend blocks.
– Extend a symlink of the admin templates to avoid recursion (you can’t extend a template from itself)
– Extend the extrahead block in base.html for admin-wide media

Modeladmin/modelform hacking:

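from django.contrib import admin
from django.contrib.auth.admin import UserAdmin
from django.contrib.auth.models import User
class CustomUserAdmin(UserAdmin):  # base class is an assumption; my notes left it out
    pass  # customizations go here

admin.site.unregister(User)
admin.site.register(User, CustomUserAdmin)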

Row-level permissions – stick the User into the queryset when requesting objects.

If you can’t do it in ModelAdmin, you can probably do it in ModelForms. Demo of how to create a widget that overrides USStateField to exclude Guam, etc.

Use a debugger for sanity – ipdb.set_trace()

Final option: Build the views yourself. But the admin wasn’t built to do everything. If you’re trying to bend it so far that you feel like it’s breaking… don’t. If you build custom admin views, you MUST use permissions decorators thoroughly and carefully.

Think about your users, not your database. Match user expectations. The admin provides a lot of hooks – use them!

Domain Specific Frameworks

Talk by Sean O’Connor on frameworks within the framework

At one end of the spectrum you have very low-level stuff – sockets, talking to hardware, etc. These libs have very little opinion about anything. They make you work hard to get your big thing done but they let you be very specific.

At the other end of the spectrum things are very easy to use, are very specific, and are extremely opinionated. A blog’s gonna work the way it’s going to work.

In between the two are frameworks, which are all about compromise and balance. The right point between flexibility and rigidity.

Examples: Celery – a lib for making it easy to handle delayed task execution. “I want to do this and I want to do it later.”

ImageKit: Like sorl – creates thumbnails in a way that won’t drive you insane. Offers management command:
manage.py ik flush myapp

So it’s a wrapper around PIL – it does a lot of heavy lifting you’d otherwise have to write yourself.

Piston: For writing REST APIs.

Common patterns in DSFs:

– Decorators (for views) (you can also decorate a method on a class)
– Registration – Django admin is a good example of this. Register a class and an admin view. Couples two classes together for a use case.
– Providing base classes to be inherited

Explicit is better than implicit, but without being overly verbose.

A metaclass is like a hook: it lets a framework modify what was declared in the class at the moment the class is created.
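
A minimal sketch of that hook, assuming a made-up Field marker and Model base class (neither is Django’s). The metaclass runs once per class statement, collects the declared fields, and stashes them where the framework can find them later. For brevity it ignores fields inherited from parent classes.

```python
class Field:
    """Hypothetical marker for a declared field."""

    def __init__(self, default=None):
        self.default = default


class ModelMeta(type):
    def __new__(mcls, name, bases, namespace):
        # Runs when the class statement executes: pull the declared
        # Field attributes out of the class body...
        fields = {k: v for k, v in namespace.items() if isinstance(v, Field)}
        for key in fields:
            del namespace[key]
        cls = super().__new__(mcls, name, bases, namespace)
        # ...and store them on the class for later use.
        cls._fields = fields
        return cls


class Model(metaclass=ModelMeta):
    def __init__(self, **kwargs):
        for name, field in self._fields.items():
            setattr(self, name, kwargs.get(name, field.default))


class Player(Model):
    name = Field()
    team = Field(default="free agent")
```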

Make integration with other libraries as optional as possible. e.g. Celery can either be used with Django, or independently.

Care and Feeding of Ponies

First round of lightning talks — 5-minute introductions to various projects of interest going on in the Django world. So much meaty stuff to discover and play with here!

ORM tips

Case studies for whydjango.com
– Submit UC Berkeley Lightning Talks

Russell KM – President of the Django Software Foundation

Infrastructure – need devs to refresh djangoproject.com, code.djangoproject.com, and the blog.

Enterprise/Promo site needed. Currently djangopeople.org and djangopackages.org

dsf-volunteers Google Group mailing list

———-

Chris Heisel – autodiscover – heisel.org

for app in settings.INSTALLED_APPS:
try app_path = …

“I’m sorry I can’t hear you over the sound of how awesome I am.”

Is autodiscover a bit too magical? Sometimes my app gets called three times when it doesn’t need to.
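
For reference, the general shape of an autodiscover loop can be sketched with nothing but the standard library. This is my own reconstruction of the idea, not the speaker’s code and not Django’s implementation; the function name and its arguments are assumptions.

```python
import importlib


def autodiscover(app_names, module_name="admin"):
    """Import <app>.<module_name> for each app, skipping apps without one."""
    found = []
    for app in app_names:
        target = "%s.%s" % (app, module_name)
        try:
            importlib.import_module(target)
        except ModuleNotFoundError as exc:
            # Only swallow "this app has no such module". An import error
            # raised from inside the module is a real bug and should surface.
            if exc.name != target:
                raise
            continue
        found.append(target)
    return found
```

Python caches imported modules in sys.modules, so calling this more than once does not re-run a module’s top-level code.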

————

Eric Holscher – readthedocs.org

All built on Sphinx, which is also used by Django and Python.

Offload responsibility for hosting/posting the docs for your project. With a post-commit hook, your changes go live on readthedocs immediately.

———-

Isaac Kelley – Servee — on github and pip

Servee is a use-anywhere WYSIWYG tool for any Django Model. What’s that you say? “Big deal, I know HTML. You dummy” Ah Yes! These tools aren’t for you. They’re for your mom*.

Oh! Wait! They’re for you too. We also have a shiny new out of the way toolbar (think django-debug-toolbar) for some doesn’t-feel-quite-right-in-the-admin functions, like re-arranging menus and photo galleries.

———

Logbook – An alternative logging app for Django

Django logging is not suited for web apps.

Provides a central registry of loggers that makes unit testing a pain.

The same registry also causes issues when libraries want to start logging.

Nobody likes logging because default config is painful, useful config for libraries. Who sets up the logging config?

——-

Peer to Peer university

School of Webcraft is a Mozilla Drumbeat training program.

——-

Opus services platform: Deploy new projects easily.

Auto-deployer deploys a collection of apps into a managed project. Alternative to Google App Engine.

———

django-alfajor – Testing tool that makes it easy to do testing for Ajax as well.

It’s a selenium wrapper in Python.

Modeling Changes

Talk on working with complex model relationships by core developer Malcolm Tredinnick

Source code for talk: http://tinyurl.com/dj-models

Challenges of a school – a person can be an alumnus and a staffer at the same time. Baseball players move between teams, teams move between leagues. What’s stable? Players and coaches belong to teams, umpires belong to leagues, not teams.

Don’t fall in love with ManyToManyField(through…) – has uses, but not always easiest to read. Abstract base classes are useful in these “almost the same” situations.

Learning a new codebase – Justin Lilly

– Exploring code provides hooks to attach future knowledge
– Gives you a vocabulary to operate in
– Be tolerant of fuzziness

In the first pass you have to be willing to say “I don’t know what this method does but I have to trust that it does what it says it does.”

Directory structures will reveal the app’s archetype.

When looking at models, take a coarse look. Gauge what you’re getting into. Next, look at fields on the model. Finally, look at the interfaces provided by the model. At this point, the function’s purpose is irrelevant. Names matter.

Tools that can help.

– django-extensions can provide graphs of your models
– grep, sed, awk can provide insight
– (C|E) Tags (from a function call you can jump back to the function definition)
– Class Browser

urls.py is your public API to users. Django-extensions has a “show_urls” command to list these in a concise way.

Views are your application’s verbs.

– Often one of the least tested components
– Tend to have tons of stuff going on inside
– Typically many more views than anything else.

Exploratory reading is just the first step.

– Lightweight in time and effort
– Provides a framework for success
– Plenty of tools to help you

How to read code

Not so much about “how to read code” as about “How to read other people’s code.”

– Take notes on which functions call which
– Re-factor as you go
– Pair with someone, if you can (like reading the Talmud together, questioning as you go)

For small libraries, work per feature

There is a secret sauce in most libraries and apps. Find out what that is and figure out how it works.

Finding the “interesting” files.

Using “git log --numstat” and awk, we can find files which have received lots of attention. You can even use this tool to find files associated with closed tickets, assuming you mention them in your commit message.

git log --numstat | awk -f -/file.awk | sort -rn | head -n5
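
I didn’t capture the contents of the awk script, so here is a Python sketch of what I assume it does: add up the lines added and deleted per file and list the busiest ones. The function names are mine. The parsing relies on the numstat format of tab-separated added, deleted, and path columns.

```python
import subprocess
from collections import Counter


def churn(numstat_text):
    """Sum added + deleted lines per path from `git log --numstat` output."""
    totals = Counter()
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # commit headers, messages, blank lines
        added, deleted, path = parts
        if added.isdigit() and deleted.isdigit():  # binary files show "-"
            totals[path] += int(added) + int(deleted)
    return totals


def busiest_files(repo=".", top=5):
    """Run git in `repo` and return the `top` most-changed files."""
    out = subprocess.run(
        ["git", "log", "--numstat", "--pretty=format:"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return churn(out).most_common(top)
```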

Find a bug and fix it

– Great way to get involved if you can find the right tickets
– Wins good will from the maintainers
– Introduces you to bits and pieces of the code at a time
– Must be careful not to overextend yourself

To debug code is to understand code

Best to start at a URL and finish at a response. Gives you a thread of logic to follow which is all grouped along a common theme, your URL. Helps you see what’s happening in a given context, including side-effects.

Lifecycle of a request:

Request -> Middleware -> urls.py -> Views -> Interesting bits -> Middleware -> Response (usually to template)
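
The same flow as a toy, in plain Python. This is not Django’s middleware or URL-resolver API; the request is just a dict, and every name (resolve, handle, trace_middleware, hello_view) is invented so you can watch the order in which the pieces run.

```python
def hello_view(request):
    request["trace"].append("view")
    return {"status": 200, "body": "hello"}


# Stand-in for urls.py: path prefix -> view.
URLPATTERNS = [("/hello/", hello_view)]


def resolve(path):
    for prefix, view in URLPATTERNS:
        if path.startswith(prefix):
            return view
    raise LookupError(path)


def handle(request):
    # urls.py picks the view, then the view does the interesting bits.
    view = resolve(request["path"])
    return view(request)


def trace_middleware(get_response):
    def middleware(request):
        request["trace"].append("middleware (request)")
        response = get_response(request)
        request["trace"].append("middleware (response)")
        return response
    return middleware


app = trace_middleware(handle)
```

Dropping a trace like this into a real project (or just setting a breakpoint in a view) is a quick way to see which layers a given URL actually passes through.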

Tests are an important part of understanding code

– Validating assumptions
– Constant feedback
– Allows you to change things without fear of breaking things

And now for the don’ts

– Don’t start solving bugs without getting the vocabulary
– Don’t get hung up on style
– Don’t be afraid to ask for help
– Don’t be afraid to question the status quo

Pony Pwning

Django security talk by security expert Adam Baldwin

When budgets get cut, quality and QA get cut. When something gets delivered, clients rarely care about security.

30% of failures come from incompetence or ignorance

9% of failures are “needle in the haystack” – your tests aren’t going to catch these.

1% of failures are 0-days – things you have no control over

XSS – Everyone’s heard of it, but everyone ignores it. Understand it, take it seriously. Doesn’t Django deal with this for us? Partly, but Django’s encoder only escapes the big 5 characters (&lt;, &gt;, &amp;, double quote, and single quote as typed characters: <, >, &, ", '). The template system can take a context and give inexperienced template editors enough rope to hang themselves. You can turn autoescape off – then any data will be shoved back to the user as-is. Similar with the |safe filter, and mark_safe(). These get used all the time in apps that take user input.

Make sure there are quotes on attribute values in HTML templates – otherwise users could put javascript in an href. Django can’t help you here – you must examine templates worked on by designers to make sure they’re doing this right. Use the “swingset” to test for common pitfalls.
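
A quick way to see why the quotes matter, using the standard library’s html.escape as a stand-in for Django’s escaping (it covers the same five characters). The payload and the helper names are my own, for illustration only.

```python
from html import escape
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect the attribute names of every start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.extend(name for name, _value in attrs)


def attribute_names(markup):
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.names


# None of the five escaped characters appear here, so escaping is a no-op.
payload = "x onmouseover=alert(1)"
escaped = escape(payload, quote=True)

quoted = '<a title="%s">link</a>' % escaped
unquoted = "<a title=%s>link</a>" % escaped
```

Parsing the two results shows the difference: in the quoted version the whole payload stays inside title, while in the unquoted version the space ends the attribute value and onmouseover becomes an attribute of its own.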

– Consider OWASP ESAPI – provides output encoders
– Audit templates
– Audit reusables
– Educate designers

File Uploads

Evil avatars?
– Images can contain PHP
– ImageField does not care.
– ImageField does not check extensions.
– File uploads often are put in unprotected directories

If you have PHP turned on, you could get code execution from an uploaded file. So:

– Check file extensions
– Disable PHP
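
A minimal sketch of the extension check, with a look at the first bytes of the file thrown in. The whitelist, the function name, and the idea of pairing each extension with a file signature are my assumptions, not something from the talk.

```python
import os

# Hypothetical whitelist: allowed extension -> expected leading bytes.
ALLOWED = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".gif": b"GIF8",
    ".jpg": b"\xff\xd8\xff",
}


def is_acceptable_upload(filename, head):
    """Accept only whitelisted extensions whose content starts as expected.

    `head` is the first few bytes of the uploaded file.
    """
    # splitext looks at the LAST extension, so "avatar.png.php" is ".php".
    ext = os.path.splitext(filename)[1].lower()
    magic = ALLOWED.get(ext)
    return magic is not None and head.startswith(magic)
```

This only narrows the door. A file with a valid image header can still carry PHP further in, which is why turning off PHP execution for the upload directory remains the real fix.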

Direct Object Access

“Not found” vs. Forbidden – Don’t tell the user how your application works. This is minor, but you don’t need to leak this info. Instead, log it and return a 404.

There are HTTP verbs other than GET and POST. “/objects/delete/2” should probably use DELETE. Consider using django-piston for REST.

Clickjacking – a malicious site loads your site in an invisible iframe and steals the user’s clicks. Demo’d creating a new user in the Django admin without the user’s knowledge. Setting the X-Frame-Options: DENY header means “Don’t let this page be framed.” This is better than frame breakout code. Use the django-xframeoptions middleware to do this.
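
To show how little is involved, here is a sketch of the same idea as a generic WSGI wrapper. It is not the django-xframeoptions code (I haven’t reproduced that here); deny_framing is a name I made up.

```python
def deny_framing(app):
    """Wrap a WSGI app so every response carries X-Frame-Options: DENY."""

    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Drop any existing value, then add ours exactly once.
            headers = [
                (name, value) for name, value in headers
                if name.lower() != "x-frame-options"
            ]
            headers.append(("X-Frame-Options", "DENY"))
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped
```

Doing it at this layer means every page gets the header, including ones a designer or a third-party app added later.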

mod_security can mess with the django admin, but is great at trapping bad behavior and blocking it – e.g. if there’s a script tag in any user input, it’s toast.

Logging and monitoring are critical, but they’re reactive, not proactive.

Use W3AF http://w3af.sourceforge.net/
skipfish http://code.google.com/p/skipfish/

Big Problems in Django, Mostly Solved

Eric Holscher on the flood of excellent solutions to common problems we’ve seen over the past year.

Most of the things Flickr’s Cal Henderson brought up two years ago have been crossed off the list.

Search: Haystack. Uses Django concepts really well (works a lot like Admin, with autodiscover()). Whoosh is pure python. Solr is better for large-scale deployments (good example of an upgrade path). The usual queryset API is laid right on top of haystack (exclude, order_by, etc.)

Supports faceting (fields defined on search index have an interface), highlighting, More Like This, Easy Customization.

Faceting lets user search by date, author, etc. And you can set it up in an hour.

DOCUMENTATION

Django’s is awesome, yours should be too. Sphinx is becoming the de facto documentation tool.
Uses reStructuredText. Easily generates PDFs. Links between your own and other docs. Themes.

DATABASE MIGRATIONS

All about South – has become the de facto standard. Safe, painless data migration.

DELAYED EXECUTION

Celery – run commands out of process. Make your site feel faster so they don’t need to wait for the whole process to run before returning the page. Replaces need for cron. Good error reporting, can run on multiple machines. Handles errors really well. Magically handle async needs.

REMOTE CONTROL

Fabric – Control your production deployment from the command line, and build it all in python.

DEPLOYMENT

gunicorn – Run nginx in front of it and you can use runserver for deployment. Is this safe?

PACKAGING

pip + virtualenv

Adds community standards to something everyone was doing differently. No more pythonpath hacking.

APIs

TastyPie now competes with Piston – more customizable serialization.

TAGGING

django-tagging was always the de facto standard, now django-taggit is replacing it. Use for new projects.

DEBUGGING IN DEVELOPMENT

debug toolbar!

PROFILES AND REGISTRATION

(obviously)

FILTERING

A. Gaynor’s django-filter — exposes right hand side of filters as nice widgets, like drill-down list views.

FINDING PACKAGES

djangopackages.com is becoming the main resource.

*** UNSOLVED PROBLEMS ***

TEMPLATE TAGS

Five or six different implementations – is any of them the *right* way?

LOGGING

Nothing great – will it be logbook?

MODEL INTROSPECTION

_meta is the quasi-hidden way to do it, but needs to be turned into a real API

CLASS BASED VIEWS

Done so many ways, nothing canonical. Maybe coming in 1.3 or 1.4?

NOTIFICATION EMAILS

Generate emails when events happen – should be a standardized way to do it. There’s django-notification, which is two apps in one and solves 70% of the problem.

DEBUGGING IN PRODUCTION

Pain in the ass. How do you figure out what’s going wrong in a production environment?

OPENID/OAUTH

Too many competing libraries. What’s the best way? Piston does oauth.

Clean Code

Peter Baumgartner on the secrets of keeping your codebase crystal clear, concise, and readable by others through the use of excellent automation tools.

You are running tests, right?

Django’s test runner will fail with some 3rd party apps. Django Nose can help get you around this.

Use coverage to see what parts of your code are not being tested. 100% coverage doesn’t mean a whole lot – you can still fail under any number of conditions.
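
A tiny example of my own showing why 100% coverage proves less than it sounds like. The single test below executes every line of average(), so a coverage report would show the function as fully covered, yet an empty list still blows up.

```python
def average(values):
    return sum(values) / len(values)


def test_average():
    # Touches every line of average(), so line coverage reads 100%.
    assert average([2, 4, 6]) == 4


def crashes_on_empty_input():
    """Return True if the fully 'covered' function still fails on []."""
    try:
        average([])
    except ZeroDivisionError:
        return True
    return False
```

Coverage tells you which lines ran, not which inputs were tried.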

Linting – Keeping your code clean.
Static analysis of your project looking for “suspicious” code such as:

– Unused imports
– Missing imports and methods
– Mismatched function signatures

Linting tools

– Pylint
– PyFlakes
– PyChecker

– pylint provides the most thorough evaluation
– integrates with hudson’s violation plugin
– overall score is easy to track and set minimum expectations
– Checks for things like missing docstrings, good names.

Automation

Keep it simple, stupid.
The Joel test: Can you make a build in just one step?
Good programmers are lazy – they’ll never do it unless it’s easy
If you can’t automate it, you can’t do continuous integration

Automation methods

Buildout
Fabric
Makefile
Bash script

The “yes” trick: Pipe “w”s into pip’s stdin to answer any questions from the pip installer:
yes w | pip install -q -r requirements.txt

Continuous Integration

1) Build
2) Test
3) Report
4) Repeat

Do this with one of:
cron / buildbot / hudson / cruisecontrol / bitten / integrity / ponybuild

“Just use Hudson”

Hudson is:

Mature, simple to set up, lots of plugins, handles many simultaneous projects. Includes a mini http server running on 8080.

Reusable apps

Alex Gaynor on the problems (and solutions) to using re-usable apps with Django. This was an amazing talk, but Gaynor is too fast for me – scrambling to keep up during this one.

Reusable apps have not panned out to be as reusable as promised (by James Bennett :)

Problem is that different systems have different business logic. Do comments have markup? Can all users comment or just some users? Should comments key off a related slug? Does a forum belong to a group? Does it need a slug? Etc etc.

But there are solutions!

Class-based views.
Do less (make a reusable framework so others can write their own business logic around them) (example: Badges)

Alex’ favorite reusable apps:
– django-registration
– django.contrib.auth
– django-taggit

Alex wrote taggit because django-tagging was not very reusable.

[stopped taking notes]

Maintaining and Updating an Aging Django Project

Shawn Rider – PBS TeacherLine
Launched 2000 – Online teacher training, serving 10k teachers per year
In 2006 it was a ColdFusion site. Complete rebuild was needed. Considered Rails, PHP frameworks, but saw JKM’s presentation on Django at OSCon.
PBS has officially adopted Django as their preferred solution for web apps.

– Speed of development
– Code quality
– Modularity of framework (loose coupling)
– Django admin
– Active community
– Python

No slipshod or haphazard programming going on.

“Django is named after a jazz musician. Jazz is dead. Therefore Django won’t last.” (This coming from perl devs on the same team).

Over 4 months, 2 devs built the new TeacherLine site (and it would now take 2 months due to better Django comprehension).

There are whole other worlds of geeks that you don’t know about – with their own languages and touchstones.

Moved from traditional to cloud hosting.

Django is opinionated in a generally good way. A culture of self-criticism is good (it’s what enabled us to break things for Django 1.0 so we could move forward).

django-config: A multi-site deployment solution – has worked well for them.

Mistakes:

– Never override the User model (you’ll have to patch every re-usable app you pull in).
– Make tests right away
– Never underestimate the admin (they built a home-brew admin site of their own, which they didn’t need to do)

Things are getting better:

– Continuous ORM improvements
– Django forms are now awesome
– Enhanced security protections
– Authorization back ends

Upgrading the framework

– To take advantage of framework upgrades, you must schedule the work in your project. Managers are reluctant to approve time spent on things they can’t see. How to sell it:

– It will lower the cost of future development
– It will alleviate a pain point felt by staff processes (point to features, like NewForms or aggregates)

They were using svn and this left them with very little hair on their heads. Moving to git improved things

Things we’d like to improve

– Remote API
– Haystack/SOLR
– Email backends
– DB master/slave and sharding
– Leverage admin better

Ponies we want

– Multi-site config out of the box
– A better way to know when modules are loaded into memory (wsgi loading sometimes misses them)
– More robust event handling – signals are good but we need signals++

The upgrade is not the fun part. The fun part is building the new app that takes advantage of features in the upgraded framework.

Technical Design Panel – Core Developers Q&A

The core committers weather the storm and answer developer questions head-on.

Justin Bronn – GeoDjango
Karen Tracey – Random debugging
Russell KM – Core developer
Brian Rosner – Admin and forms
Jannis Leidel
Gary Wilson

Responses to Eric Florenzano’s talk:

All agree they need more committers. “Domain experts” who have access rights to certain parts of the tree. Maybe an “experimental branch” with 100 committers, then the core devs just pull that down when it’s stable.

When do backwards incompatible changes get introduced? Maybe 2.0. But right now there’s not a lot of asking for things that are important enough to go through that pain. Django 2.0 might be the Python 3 upgrade, and we know everyone’s going to have to go through code changes anyway with that upgrade.

Will contrib get thrown out? Can this be done without everyone using pip? Maybe the docs do need to start recommending pip. Do beginners still gain a lot from having everything bundled with Django? Karen believes yes (and I agree). Contrib could become a massive tree of stuff. There’s always been argument about what should go in there. We can see parts of South going into core after its primitives are in, but Celery is totally different – there’s no reason for it to go in.

What’s coming in 1.3? Very few features, and a lot of bug fixes. Python logging is probably the biggest one, and that’s not even that big. There’s a LOT of stuff that’s been hanging around for a long time.

DVCSs: Nothing prevents any of you from using the DVCS tool of your choice. You can clone Jannis’ up-to-date repo. But it’s not obvious to the public that you can do this.

The single biggest problem facing the core dev team is the bottleneck of tickets. Opening up more core dev positions is the key, but first we need to figure out how to do it smartly.

Best part of Django now? What sets it apart from other frameworks? GIS for sure. Documentation certainly – we have a very literate community that writes quite well (still room for improvement of course).

The team really wants to see a designer move into the core team, so we can refresh the CSS and JavaScript in the admin.

Treehugging

Talk on working with hierarchically structured data by Brian Luft

Where useful:

Hierarchical Categories
Site navigation
ecommerce product catalog
Social network/recommendations
Threaded comments

A table is not a tree – we need a few tricks.

Models in the past to deal with this:

– Adjacency List
Self-referential FK pointing to a row in the same table
Fast writes, slow reads
Fragile update operations
You have to know what level in the tree your item is at when you look it up
Easy to orphan sub-trees
Easy to start with, but maintaining the tree integrity takes a lot of extra work
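
To make the adjacency list trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 rather than the Django ORM, so it runs on its own. The table, the column names, and the path_from_root helper are all made up for illustration; they are not from the talk. Notice that reading a full path costs one query per level of the tree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " parent_id INTEGER REFERENCES category(id))"  # self-referential FK
)
# Writes are cheap: one row, one pointer to the parent.
conn.executemany(
    "INSERT INTO category (id, name, parent_id) VALUES (?, ?, ?)",
    [(1, "Books", None), (2, "Fiction", 1), (3, "Sci-Fi", 2)],
)


def path_from_root(conn, node_id):
    """Hypothetical helper: walk up the tree, one query per level."""
    names = []
    while node_id is not None:
        row = conn.execute(
            "SELECT name, parent_id FROM category WHERE id = ?", (node_id,)
        ).fetchone()
        if row is None:  # dangling parent_id, i.e. an orphaned sub-tree
            break
        names.append(row[0])
        node_id = row[1]
    return names[::-1]


path = path_from_root(conn, 3)
print(" > ".join(path))
```

Deleting the "Fiction" row without touching "Sci-Fi" would leave a dangling parent_id, which is the orphaning problem noted above.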

– Nested Sets
Treat the data as a group of sets
Every time you insert into the table you have to re-calculate all the numbers
Efficient reads
High maintenance cost for write/delete
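
A rough sketch of the nested sets numbering, in plain Python with no database. The number_tree and descendants names are mine, and for simplicity this version renumbers the whole tree on every change; real implementations generally shift only the affected values, but the cost pattern is the same.

```python
def number_tree(children, root):
    """Assign (lft, rgt) bounds to every node with one depth-first walk."""
    bounds = {}
    counter = 1

    def visit(node):
        nonlocal counter
        lft = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        bounds[node] = (lft, counter)
        counter += 1

    visit(root)
    return bounds


def descendants(bounds, node):
    """A node's descendants are the nodes whose bounds sit inside its own."""
    lft, rgt = bounds[node]
    return sorted(n for n, (l, r) in bounds.items() if lft < l and r < rgt)


children = {"Books": ["Fiction", "Cooking"], "Fiction": ["Sci-Fi"]}
bounds = number_tree(children, "Books")
print(bounds)
print(descendants(bounds, "Books"))

# Inserting one node changes the numbers of nodes that were not touched.
children["Fiction"].append("Fantasy")
bounds_after = number_tree(children, "Books")
print(bounds["Cooking"], "->", bounds_after["Cooking"])
```

Reads need no recursion (a single range comparison finds a whole sub-tree), while a single insert changes the bounds of "Cooking", which is nowhere near the new node.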

– Materialized paths
Every node in the tree has a “path” attribute
Use LIKE selects to find the path for a given row
Queries are simple and fast
Effectively denormalizes parent/child FKs
Writes are slow
Requires maintenance when categories change
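
And a sketch of materialized paths, again with sqlite3 so it runs standalone. The "/1/2/" path format and the helper names are assumptions for illustration only. A whole sub-tree comes back from one LIKE query, but moving a branch means rewriting the path of every row under it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (name TEXT, path TEXT UNIQUE)")
conn.executemany(
    "INSERT INTO category (name, path) VALUES (?, ?)",
    [
        ("Books", "/1/"),
        ("Fiction", "/1/2/"),
        ("Sci-Fi", "/1/2/3/"),
        ("Cooking", "/1/4/"),
    ],
)


def subtree(conn, path):
    """Return a node and everything beneath it with one prefix match."""
    rows = conn.execute(
        "SELECT name FROM category WHERE path LIKE ? ORDER BY path",
        (path + "%",),
    )
    return [name for (name,) in rows]


def move_subtree(conn, old_prefix, new_prefix):
    """Re-parent a branch by swapping the prefix on every affected row."""
    conn.execute(
        "UPDATE category SET path = ? || substr(path, ?) WHERE path LIKE ?",
        (new_prefix, len(old_prefix) + 1, old_prefix + "%"),
    )


before = subtree(conn, "/1/2/")
move_subtree(conn, "/1/2/", "/1/4/2/")  # move Fiction under Cooking
after = subtree(conn, "/1/4/")
print(before, after)
```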

Django Apps

– django-mptt
– django-treebeard

These handle all the nitty gritty at the API level. You create your models as normal and they take care of giving you extra attributes and methods.

Treebeard supports all three approaches above
MPTT does nested sets only
Both provide Move Node forms in the admin
MPTT provides a TreeNodeChoiceField for forms/admin
MPTT is being actively maintained (part of the django-cms project)
django-treebeard has slightly more active development

Front end:
Treebeard has get_annotated_list method
MPTT has a few nice template tags/filters

Real world examples:
treebeard: django-page-cms
mptt: django-cms

– django-treemenus is a fork of treebeard, and optimized for menu creation
– Neo4j – another nosql database, designed for graphs
– Suckerfish/Superfish (jquery)

Typewar – A Case Study

Talk by James Tauber of typewar.com — a unique Django use case. The techniques and algorithms they use to keep the site humming.

Design mockups – Eschewed wireframes (they were their own client) and design comps, in favor of direct template mockups

Texture generation:  Dynamically generated backgrounds

Glyph generation:

– Deliver as images
– Generate with PIL
– Hash the filenames
– Per-typeface scaling factor (to make sure all fonts were the same size)

Badges via http://github.com/eldarion/brabeion

One huge table stores the history of the site – site history can be replayed. Stats are stored in other tables. Denormalized because these need to be sorted (performance).

Bayesian average – a weighted avg between what we know about an individual and what we know about an entire group. If you get 20/100 I can say you know 20% of your fonts. But if I’ve only asked you 1 question, I can’t really say you know 100% of your fonts. The number of questions matters. The more you answer, the more I know about you and the less I need to rely on group stats to predict your accuracy.
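
One common way to write this down is to mix a fixed number of "virtual" answers at the group's accuracy into each player's record. The sketch below assumes that form; the talk notes don't give Typewar's actual formula, and the prior_weight of 10 and group mean of 0.5 are placeholders I picked.

```python
def bayesian_average(correct, answered, group_mean, prior_weight=10):
    """Blend one player's accuracy with the group's accuracy.

    prior_weight is how many imaginary answers at the group mean get mixed
    in. The more real answers a player has, the less those matter.
    """
    return (prior_weight * group_mean + correct) / (prior_weight + answered)


GROUP_MEAN = 0.5  # illustrative value, not from the talk

one_question = bayesian_average(correct=1, answered=1, group_mean=GROUP_MEAN)
hundred_questions = bayesian_average(correct=20, answered=100, group_mean=GROUP_MEAN)

print(round(one_question, 3))       # stays close to the group mean
print(round(hundred_questions, 3))  # pulled toward the player's own 20%
```

With these placeholder numbers, a player who is 1 for 1 lands at 6/11 (about 55%) rather than 100%, while a player who is 20 for 100 lands at 25/110 (about 23%), close to their raw 20%.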

Queuing of tasks done with celery. Big challenge is migrating an unauth’d user to an auth user, while keeping their stats so far. Go back and update rows in the db that belong to the same session key.
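
The session-key trick might look something like this. It's a sketch in plain sqlite3, not Typewar's code: the answer table, its columns, and the claim_anonymous_rows helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answer ("
    " id INTEGER PRIMARY KEY,"
    " session_key TEXT NOT NULL,"
    " user_id INTEGER,"  # NULL while the player is anonymous
    " correct INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO answer (session_key, user_id, correct) VALUES (?, ?, ?)",
    [("abc123", None, 1), ("abc123", None, 0), ("zzz999", None, 1)],
)


def claim_anonymous_rows(conn, session_key, user_id):
    """Attach rows recorded under a session to a newly authenticated user."""
    cursor = conn.execute(
        "UPDATE answer SET user_id = ? "
        "WHERE session_key = ? AND user_id IS NULL",
        (user_id, session_key),
    )
    return cursor.rowcount


claimed = claim_anonymous_rows(conn, "abc123", 42)
print(claimed)
```

The "user_id IS NULL" guard means rows already claimed by someone else are left alone if a session key is ever reused.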

DB is on a dedicated 1GB slice. Increased kernel.shmmax to 25% of system memory. Increased shared buffers to 250MB.

– Used django-piston for the iPhone API.

– Rewrote some views to be more class-based to work for both web and API

– Introduced non-twitter authentication

– Glyphs are stored on the iPhone. New glyph packs are downloaded as you go up through levels (10MB app limit).

– iPhone knows answer before sending to server (requires trusted/secure app). Advantage is that they can tell user the answer while it’s communicating with the server in the background – makes UX much smoother.

They now have tons of data about how confusing people find certain typefaces.

25% of users come back at least 10x

Multiple OSS projects have come out of it.

Has been great marketing for eldarion

Why Django Sucks (and how we can fix it)

Eric Florenzano, djangodose.com

This was one of the highlights of the conference. Watch the video here.

Apps are a mess. “I need avatars. I find django-avatars. Now I need to add a custom field to it. Now I have to fork.” This is because apps that provide models are inherently inflexible. Providing abstract base/model classes introduces configuration headaches.

Everyone assumes PositiveIntegerField primary keys, but UUID primary keys have lots of advantages. Especially for the User table. Using Integer fields for PK makes sharding difficult. To achieve this for User means you have to fork everything.
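
A small sketch of why UUIDs help here, using only the standard library. The shard count and the shard_for rule are assumptions of mine for illustration, not something Eric proposed.

```python
import uuid

NUM_SHARDS = 4  # illustrative shard count


def new_user_id():
    """Any app server can mint an ID without asking a central counter."""
    return uuid.uuid4()


def shard_for(user_id):
    """Pick a shard from the ID itself, so no lookup table is needed."""
    return user_id.int % NUM_SHARDS


user_id = new_user_id()
print(user_id, "lives on shard", shard_for(user_id))
```

With an auto-incrementing integer, every shard would need to coordinate on the next value, and the ID itself would hint at how many users exist.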

Field naming:

{{entry.body}} {{ item.description }}

Different field names have emerged for the same “kind” of attribute. To reuse the template code we’re forced to tweak templates.

Will class-based views save the day? You can just subclass, right? There are different ideas on how to actually implement them, and little to no consensus. They trade complexity and configuration for flexibility. Where do you put customized view subclasses? urls.py is getting overloaded. Create a new app? Now we have an extra level of indirection, making code less clear.

Base problem: The app as the sole level of abstraction is too broad.

Generic foreign keys are usually the wrong solution. They’re great for flexibility (e.g. comments) but bad for configuration. These things should usually have concrete FKs.

Eric’s test with a blog app and benchmarking showed memory usage going up and performance going down steadily, per release since 1.0.

MONOLITHIC SETTINGS

Can’t change them per-request in a thread-safe way. Makes multi-tenancy nearly impossible. Causes headaches for deployment. Prevents decomposition of Django projects. Essentially a global, which Django shies away from.

Others get this right – Flask, CherryPy, web.py all have an App object, which has its own settings.
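
To illustrate the general idea (this is not the API of Flask, CherryPy, or web.py, just a plain-Python sketch with made-up names), an app object that carries its own settings lets two differently configured sites live in one process:

```python
class App:
    """Hypothetical app object: settings belong to the instance, not a global."""

    def __init__(self, name, **settings):
        self.name = name
        self.settings = dict(settings)

    def handle(self, path):
        # Each request is answered using this app's own configuration.
        return "[%s] %s" % (self.settings["SITE_NAME"], path)


blog = App("blog", SITE_NAME="Blog", DEBUG=False)
shop = App("shop", SITE_NAME="Shop", DEBUG=True)

print(blog.handle("/"))
print(shop.handle("/"))
```

Because nothing is read from a module-level settings object, changing one app's configuration can't leak into the other.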

COMMUNITY

It’s a tight-knit community, but it’s exclusionary. No django.contrib app has ever been created by a non-core-committer.

Why the heck isn’t Alex Gaynor a core committer?

Good ticket ideas with patches, tests, etc. go nowhere – like truncatechars. So it sits in “design decision needed” forever.

BADTERIES (Bad Batteries)

What’s wrong with auth? is_staff is coupled to admin. first, last name are culturally limited. Integer primary key leaks info about the number of users you have. get_profile() is inelegant and inefficient. Tied to the sessions system – most sites don’t use sessions for anything other than login. Why not use a secure cookie?
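
For reference, the "secure cookie" idea usually means a signed cookie: the value travels with the browser, and the server only checks a signature instead of looking up a session row. A minimal sketch with the standard library follows. The sign/unsign names, the cookie layout, and the placeholder key are my assumptions, and a real version would also want an expiry timestamp.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"  # placeholder; keep the real key out of source control


def _mac(value):
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def sign(value):
    """Append an HMAC so the server can detect tampering later."""
    return value + ":" + _mac(value)


def unsign(cookie):
    """Return the original value, or None if the signature doesn't match."""
    value, _, mac = cookie.rpartition(":")
    if not hmac.compare_digest(mac.encode("utf-8"), _mac(value).encode("utf-8")):
        return None
    return value


cookie = sign("user=42")
print(unsign(cookie))
print(unsign(cookie.replace("42", "43", 1)))  # tampered value is rejected
```

Note that the value is signed, not encrypted, so anything placed in it is readable by the user.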

Webdesign filter – what is this doing in Django? Web designers care about a lot more than lorem ipsum.

Databrowse. Docs say it’s new when it’s not. Doesn’t support pagination. Why is it even in Django? Should be a 3rd party app.

IMPROVEMENTS POSSIBLE

Introduce focused abstraction layers.
Late-binding configuration option
Monitor performance
Rip off flask
Move to a DVCS

Apps: Abstractions should be made with a narrower focus. USE models, don’t expose them. The model itself doesn’t need to be the API. Make it so you can pass extra keywords to create(). Allow the implementation to be swapped out to use a different storage layer. Expose it over the network as a service layer.
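
My reading of "use models, don't expose them", sketched in plain Python. The AvatarService and InMemoryAvatarStore names are invented for this example and don't come from the talk or from any real app; a Django-backed store would implement the same two methods on top of a model.

```python
class InMemoryAvatarStore:
    """Stand-in storage layer; swap in a database- or network-backed one."""

    def __init__(self):
        self._rows = {}

    def save(self, username, **fields):
        self._rows[username] = dict(fields)

    def load(self, username):
        return self._rows.get(username)


class AvatarService:
    """What callers import. They never touch the storage model directly."""

    def __init__(self, store):
        self._store = store

    def create(self, username, url, **extra):
        # Extra keywords pass straight through, so no fork is needed
        # just to carry one more field.
        self._store.save(username, url=url, **extra)

    def url_for(self, username, default="/static/default.png"):
        row = self._store.load(username)
        return row["url"] if row else default


avatars = AvatarService(InMemoryAvatarStore())
avatars.create("alice", "/media/alice.png", border_color="pink")
print(avatars.url_for("alice"))
print(avatars.url_for("nobody"))
```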

Late-binding FKs could be a solution.

Performance? We just need to be aware of it. What are the environmental impacts of new features that get committed? If a new feature results in performance impact, it must be justified.

Memory? Provide mechanisms to shut off unused Django machinery. If internationalization is not needed, let us turn it off!

Flask gets this really right. http://bit.ly/flask

Kill Contrib!

Pip is pretty good now. Users are likely going to use pip on their first sit down anyway (e.g. South). If it doesn’t make sense to split it out, then call it like it really is – a core app. Core apps should be Sessions and Auth (but NOT Admin, split that out – not sure I agree with this).

If Admin were not there we could release updates to the admin without releasing Django! Each app could have different committers. It would foster innovation in the community. Might have to start “blessing” apps officially.

Add More Core Devs

We have releases now – most people don’t run trunk. Trunk can break sometimes and it’s not the end of the world. Django’s bigger problem isn’t quality control, it’s lack of participation. Solution for this is to loosen the reins a bit.

Switch to git/github. Commit bit would be less important. Much easier to do experimental branches. Easier for people to stay up to date on what’s going on. Frankly, marketing. It’s both a technical and a social thing. If you’re using trac and svn, it makes the project look old.

3 thoughts on “Loose Notes from Djangocon 2010”

  1. Christopher Clarke

    Great job summarizing the talks for those of us who could not attend.
    Adds some context to the slides.

  2. Shawn Rider

    Thanks for the great notes on the Djangocon talks. One small correction: Up top you mention NPR uses Django. As far as I know, they do not. (I believe you meant to type PBS, which is a forgivable mistake since many folks confuse NPR and PBS.) I envy the NPR content API, but not their PHP ways. Maybe we can convince them to migrate to Python…

  3. shacker Post author

    Hey Shawn – Thanks for the correction! I did indeed conflate NPR and PBS in my mind. I *do* know the difference :) I’ll fix that now.
