Benchmarking Tornado's sessions

Tornado, being otherwise great, lacks session support, which makes building even slightly complex websites a hurdle. I created a fork which adds session support to Tornado; so far it supports six different storage engines for sessions. I was curious how fast they are and how they affect Tornado's overall performance (requests served per second).

Methodology

I carried out a simple benchmark. All configuration was left in its default state. I used two servers, both from Rackspace and both located in the same datacenter. The first was used to run Apache Bench (ab) to simulate server load, with 300 concurrent requests and 10 000 requests in total.
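
For reference, the ab invocation looked roughly like this (the hostname and the GET parameter name are placeholders):

    # 10 000 requests, 300 at a time
    ab -n 10000 -c 300 "http://the-app-server/?data=benchmark"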

The second server ran the sample Tornado app. Nginx served as a load balancer in front of four Tornado processes (one per core), which is the recommended way of deploying a Tornado app. The server hardware was a 2.2 GHz quad-core AMD Opteron with 2 GB of RAM.
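
The nginx side of such a setup takes only a handful of lines; a rough sketch of what it typically looks like (ports and the upstream name are assumptions):

    # four Tornado processes, one per core
    upstream tornado_frontends {
        server 127.0.0.1:8001;
        server 127.0.0.1:8002;
        server 127.0.0.1:8003;
        server 127.0.0.1:8004;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://tornado_frontends;
        }
    }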

I always ran ab once to prepopulate the storage with data before taking the measured reading; this simulated older, stored sessions. The Tornado app was a simple request handler that stored the value of a GET parameter in the session.
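
A minimal sketch of such a handler, assuming the fork exposes the session as a dict-like self.session (the parameter name is illustrative, and the Application/session-storage settings are omitted because they depend on the chosen engine):

    import tornado.web

    class BenchmarkHandler(tornado.web.RequestHandler):
        def get(self):
            # store the value of the "data" GET parameter in the session
            self.session['data'] = self.get_argument('data', default=None)
            self.write('ok')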

No session performance

To get a baseline for Tornado's performance on the machine, I checked out the source code and installed it. The handler script was slightly altered because, obviously, no sessions were available. Tornado scored 1626 req/s.

File based sessions

I didn't expect much from file based sessions. It's a naive implementation where all sessions are stored in a single file. It's not suitable for production, but it's fine for developing and testing. Due to the poor performance I had to change the ab parameters to 10 concurrent requests and 1000 in total to get any results at all. The first run ended at approximately 160 req/s, the next batch dropped to 32 req/s, and the third didn't even finish. As I said, this way of storing sessions is good for testing purposes at most.
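
Presumably it collapses because every request ends up re-reading and rewriting the entire file, so the cost grows with the number of stored sessions. Roughly this (an illustrative sketch, not the fork's actual code):

    import pickle

    def save_session(path, session_id, data):
        # load *all* sessions, update one, write *all* of them back out
        try:
            with open(path, 'rb') as f:
                sessions = pickle.load(f)
        except (IOError, EOFError):
            sessions = {}
        sessions[session_id] = data
        with open(path, 'wb') as f:
            pickle.dump(sessions, f)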

Directory based sessions

If you want your sessions stored in files, use directory based sessions. This solution takes the traditional approach of one file per session. Of course, with a lot of users there will be a lot of sessions (and a lot of files), but then again, modern filesystems handle large numbers of small files easily. And with hundreds of thousands of users you probably wouldn't be using this solution anyway.
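
A sketch of the one-file-per-session idea (illustrative only; file naming and serialization are assumptions):

    import os
    import pickle

    def save_session(directory, session_id, data):
        # each session lives in its own small file
        with open(os.path.join(directory, session_id), 'wb') as f:
            pickle.dump(data, f)

    def load_session(directory, session_id):
        with open(os.path.join(directory, session_id), 'rb') as f:
            return pickle.load(f)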

Directory based sessions performed reasonably well. The best run was 869 req/s, or 53.4 % of the original performance. However, over time, as the number of session files in the directory grew, throughput fell to 608 req/s. The filesystem was ext3; I suspect ReiserFS or JFS would score better.

MySQL based sessions

Tornado ships with a simple MySQL layer on top of the MySQLdb module, and because MySQL is a popular choice among web developers, I implemented support for MySQL session storage. It's also nice to see how it compares to its NoSQL cousins.
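
The storage presumably boils down to an INSERT/UPDATE per request; a sketch using the tornado.database wrapper that shipped with Tornado at the time (the table, column and connection details are illustrative, not necessarily what the fork uses):

    import tornado.database

    db = tornado.database.Connection(
        host='127.0.0.1:3306', database='benchmark',
        user='tornado', password='secret')

    def save_session(session_id, serialized_data):
        # upsert the serialized session under its id
        db.execute(
            'INSERT INTO sessions (session_id, data) VALUES (%s, %s) '
            'ON DUPLICATE KEY UPDATE data = VALUES(data)',
            session_id, serialized_data)

    def load_session(session_id):
        row = db.get('SELECT data FROM sessions WHERE session_id = %s', session_id)
        return row['data'] if row else None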

I used MySQL server v5.1.37 with the default configuration. The results were a bit unstable: Apache Bench reported 1171, 1216 and 1353 req/s in three consecutive runs. That's 83 % of the baseline, counting the best run. I didn't investigate the root cause of the inconsistent performance. Test runs showed something between 1200 and 1300 req/s, with the mysqld process often consuming the full capacity of one core.

Memcached based sessions

Being a non-persistent key-value store, Memcached has an obvious advantage over MySQL, at least on paper. I used Memcached 1.4.4 built from source. The best result was 1473 req/s (90.6 %), but the other two measured runs clocked in at 1106 and 1202 req/s. Again, I don't know why the big difference occurred. The code uses pylibmc, which is the fastest Python library for talking to memcached.
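
With pylibmc the write path is essentially a single set call; a sketch (key prefix, expiration and serialization are assumptions):

    import pickle
    import pylibmc

    mc = pylibmc.Client(['127.0.0.1'], binary=True)

    def save_session(session_id, data, expire=3600):
        # memcached keeps the serialized session in memory only and may evict it
        mc.set('session:' + session_id, pickle.dumps(data), time=expire)

    def load_session(session_id):
        raw = mc.get('session:' + session_id)
        return pickle.loads(raw) if raw is not None else None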

Redis based sessions

If you want persistence for your session data, you can use Redis instead of Memcached. Redis is a simple, fast key-value store with advanced features.
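
With the redis-py client the operations look much the same as with memcached; a sketch (key naming, expiration and serialization are again assumptions):

    import pickle
    import redis

    r = redis.Redis(host='localhost', port=6379, db=0)

    def save_session(session_id, data, expire=3600):
        key = 'session:' + session_id
        r.set(key, pickle.dumps(data))
        r.expire(key, expire)  # let stale sessions expire on their own

    def load_session(session_id):
        raw = r.get('session:' + session_id)
        return pickle.loads(raw) if raw is not None else None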

I used v1.2.1 built from source. Redis scored very well, with three consistent runs at around 1410 req/s; the best one hit 1418 req/s (87.2 %).

MongoDB based sessions

MongoDB is the last supported storage engine, and it was a shock. I don't know how the 10gen gals and guys do it, but MongoDB is FAST. All measured runs returned over 1500 req/s (1520, 1577 and 1582 req/s), which is a) supersonic and b) stable. That is 97.4 % of the original, no-sessions Tornado performance.

To be honest, MongoDB scored 960 req/s in one of the test runs. That's down to the way it works: it allocates hard disk space by creating zero-filled data files, starting at 2 MB and growing up to 2 GB per file as the database needs more space. This one time the allocation happened during the test run (it was recorded in the mongod output log, so the reason wasn't hard to find), hence the worse performance. However, the space allocation is infrequent, and in the real world it would rarely be a problem.
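
For completeness, the MongoDB storage path amounts to an upsert of one document per session; a sketch using the current pymongo API (the driver available back then spelled some of these calls differently, and the collection and field names are illustrative):

    import pymongo

    sessions = pymongo.MongoClient('localhost', 27017).benchmark.sessions

    def save_session(session_id, data):
        # one document per session, upserted by its id
        sessions.update_one({'_id': session_id},
                            {'$set': {'data': data}},
                            upsert=True)

    def load_session(session_id):
        doc = sessions.find_one({'_id': session_id})
        return doc['data'] if doc else None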

Graph and data

I put the benchmark data, along with the Tornado handler, up on GitHub. The graph shows the worst and best runs for easy comparison.

[Graph: worst and best runs for each session storage engine]

Conclusion

Assuming you want to store sessions alongside your app's other data, I would recommend either Redis or MongoDB, depending on your use case. Redis is fast, easy to set up and work with, and offers persistence, so it wins over Memcached. If you're building something more complex, MongoDB is the way to go: it's fast, fun and addictive, with great support from its authors. For developers seeking the traditional SQL approach, only MySQL is available at the moment; I may add PostgreSQL and SQLite support in the future. Get in touch if you need it, or watch the repo to keep up with the latest changes.