Conditional Aggregates in Django

by Jason Lantz — last modified Sep 25, 2009 03:50 PM
Filed Under:

Django's new aggregation system is awesome and has allowed a lot of functionality in Voter Universe to exist. One area the aggregation engine doesn't handle is conditional counts. For example, the aggregates system allows you to easily get a count of how many people are in household, but what if you want to know how many people in the household are Female? Keep reading to find out how we solved this problem.

There is currently a ticket in the Django issue tracker (#11305) proposing to add conditional counts to Django's aggregation framework.  While implementing the ability to create a sum of donations in the new Donors module of Voter Universe, I found myself having to tackle this problem myself.

I wound up basing my solution on this code linked as a comment to ticket #11305 by user Alex.  The problem I ran into was when trying to use it against a column that is a foreign key.  Take the following example code using the CountIf class:

condition = {
    'donations__status': "'received'",
    'donations__recipient__isnull': True
}
queryset.annotate(
    num_donations = CountIf(
        'donations__amount', 
        condition=condition,
    )
)

The intention of the donations__recipient__isnull condition is to find all donations that don't have a recipient which in Voter Universe are considered donations to the campaign (donations with a recipient are used to track donations to other campaigns).  This is important because I was trying to get a count of all donations for a given donor given to the campaign.  I ran into two problems.

SQL Escaping Values

First, notice the double quote and single quote around received.  It appears the CountIf class isn't properly sql escaping the value, but this simple hack of single quotes contained inside double quotes did the trick.  A better long term solution is to properly escape the values in the CountIf class, but this worked for now.

Foreign Key Conditions

The second problem was bigger and the solution a bit more hackish.  When the query is built, this is the COUNT statement that CountIf produces:

COUNT(CASE (`donors_donation`.`status` = 'received'  
            AND `donors_recipient`.`id` IS NULL)

The problem here is that the table donors_recipient is not joined into the query.  The CountIf implementation is using Django's ORM layer to build a where query clause, but doesn't handle if the where query clause requires a new join to be added to the query.  Thus, while the donors_recipient table exists in the database, it's not a part of this query causing a query error on execution.  After fighting with this for a while, I came up with a modified CountIf statement that can hack around this problem (and by hack, I mean hack!):

from django.db import models

class SQLCountIf(models.sql.aggregates.Aggregate):
    is_ordinal = True
    sql_function = 'COUNT'
    sql_template= '%(function)s(CASE %(condition)s WHEN true THEN 1 ELSE NULL END)'

class CountIf(models.Aggregate):
    name = 'COUNT'

    def add_to_query(self, query, alias, col, source, is_summary):
        sql, params = query.model._default_manager.filter(**self.extra['condition']).query.where.as_sql()
        for search, replace in self.extra.get('sql_replace', {}).items():
            sql = sql.replace(search, replace)
        self.extra['condition'] = sql % tuple(params)
        aggregate = SQLCountIf(col, source=source, is_summary=is_summary, **self.extra)
        query.aggregates[alias] = aggregate

The new part here is the addition of a new optional keyword argument called sql_replace.  This argument contains a dictionary with the search string as the key and the replacement value as the value.  Using this, I can get conditional counts against id values of conditional fields by doing something like this:

condition = {
    'donations__status': "'received'",
    'donations__recipient__isnull': True
}
sql_replace = {'`donors_recipient`.`id`': '`donors_donation`.`recipient_id`'}
queryset.annotate(
    num_donations = CountIf(
        'donations__amount', 
        condition=condition,
        sql_replace=sql_replace
    )
)

As I said, it's not pretty and highly hackish, but it does get the job done.  This allows you to skip the join and go straight off of the foreign key's id.

I'm sure there's a more elegant way to do this.  I will spend more time once I have some to figure out a cleaner model, but for now, if you need to do conditional counts on foreign key fields, you have an interim solution.

Conditional Sums in Django

Posted by Jason Lantz at Sep 25, 2009 04:52 PM
After posting, I realized I also needed to do conditional Sums to sum the amount of all donations for a given user. The Sum requires an additional argument, which I set as sum_column. This is used in the template to get the value to sum for all rows matching the condition. Here's the classes:


class SQLSumIf(models.sql.aggregates.Aggregate):
    is_ordinal = True
    sql_function = 'SUM'
    sql_template= '%(function)s(CASE %(condition)s WHEN true THEN %(sum_column)s ELSE NULL END)'

class SumIf(models.Aggregate):
    name = 'SUM'

    def add_to_query(self, query, alias, col, source, is_summary):
        sql, params = query.model._default_manager.filter(**self.extra['condition']).query.where.as_sql()
        for search, replace in self.extra.get('sql_replace', {}).items():
            sql = sql.replace(search, replace)
        self.extra['condition'] = sql % tuple(params)
        aggregate = SQLSumIf(col, source=source, is_summary=is_summary, **self.extra)
        query.aggregates[alias] = aggregate

And here's an example implementation of the classes (sorry for the app specific code, but I'm in a hurry):

        annotation_name = 'num_donations'
        if not self.recipient_isnull:
            annotation_name = 'num_donations_other'
        annotation = {
            annotation_name: CountIf(
                'donations',
                condition={'donations__status': "'received'", 'donations__recipient__isnull': self.recipient_isnull},
                sql_replace={'`donors_recipient`.`id`': '`donors_donation`.`recipient_id`'}
            )
        }
        queryset = queryset.annotate(**annotation)

Conditional Sums in Django

Posted by Ken at Jan 14, 2010 02:42 PM
Trying to use the ticket's code first as a starting point and keep running into an error:

'CountIf' object has no attribute 'lookup'

When I try to use your code instead, I get this error too. Any ideas on why this is happening?

Conditional Sums in Django

Posted by Jason Lantz at Jan 19, 2010 12:43 PM
Hi Ken,

Apologies for the delayed response... I've been dealing with our first son who just passed a month old over the weekend.

I remember running into the exact same error when playing with this code, but can't remember the cause at the moment. If you could either send me the code (jason at THIS DOMAIN) or post it on a paste site somewhere and let me know, I'll take a look and see if I can help figure it out.

Conditional Sums in Django

Posted by Josh Levinger at Nov 12, 2010 07:40 PM
I'm got the same problem as Ken, a "'CountIf' object has no attribute 'lookup'" when I try and use the provided code from the ticket on Django 1.2. Ideas?