News Feeds are a challenging but solved problem. Facebook, Twitter, and others have built massively scalable news feed architectures, but there are still lots of implementation questions for smaller sites looking to add this increasingly common feature.
There is an excellent post about News Feed architecture on Quora (here), so this post focuses instead on how to implement one. This approach leverages some of the cool built-in features of the persistent in-memoery key-value store Redis (and also, in this particular case, Rails, but it would be easy to translate to Django, Node.js, or any other web framework).
News Feed Architecture
One brief note about architecture: since it's impractical to simply query the activity of 500 friends, there are two general approaches for building scalable news feeds:
- Fan-out-on-read (do these queries ahead of time and cache them)
- Fan-out-on-write (write follower-specific copies of every activity so when a given user asks for a feed you can retrieve it in one, simple query)
This example uses the latter, fan-out-on-write. The biggest bottleneck in this scenario is, unsurprisingly, write speed. If a user has 100,000 followers and they update their status, then you need to copy (or point to) this status update with 100,000 additional writes. Fortunately, Redis has extremely fast write performance. It's read performance isn't too shabby either, so it's also ideal for fast retrieval.
So let's say we have a user that has 100 followers, and this user posts a new status update. The new status update is inserted into our primary datastore (Postgresql, Mongodb, etc.), and now on an after_create hook we want to "push" this update to all 100 followers. (As a side note: to be safe, you probably want to handle these writes asychronously in a messaging or jobs queue like Resque, Delayed Job, or Beanstalkd if you're using Rails. Pushing to 100,000 followers could take a few seconds or longer).
The general approach is this: we will give each user a sorted set in Redis (that will serve as their "feed"), and every time someone they follows posts an update we will push a presentation version of the post into their feed. Then when they log in, to get their feed we simply need to make one (fast) query to Redis to grab the sorted set.
Our first challenge is that Redis only stores strings, so we need to a few simple helper methods to encode our models in a string format (in this case JSON). We also need to helpers to generate unique keys.
There are three helpful group structures in Redis: lists, sets, and sorted sets. While lists have the fastest performance, if you want pagination the easiest way is to use sorted sets, with a timestamp as the score. This way, you can do simple range queries to get older or newer items in the feed.
Here are some simple methods for retrieving a feed in reverse chronological order (standard for a news feed) then decoding it back into Ruby objects. There are also methods for pushing to the feeds, and an important method for trimming the feeds, since the performance of sorted sets is dependence on the length of the set (so you want to prevent them from getting too long).
To get this working in a Rails app all you would need to do is drop in the feed methods to your User model, and drop in the push methods to your Post or Status model. If you're using Rails 3, the easiest way to do this is with app concerns (which I wrote about in a recent blog entry).
Hopefully this was informative! If you have any feedback or suggestions please let me know.
If you enjoyed this article you can follow me on Twitter