This is the twenty-seventh episode of the Stack Overflow podcast, where Joel and Jeff interview Alexis “kn0thing” Ohanian and Steve “spez” Huffman, the founders and co-creators of Reddit.

  • Jeff is at the Professional Developers’ Conference in Los Angeles. He co-presented a session with Phil Haack on ASP.NET MVC, which you can view online if you are so inclined. Also while at PDC, Jeff participated in a brief roundtable meeting with Ray Ozzie, who had some completely unsolicited and very positive things to say about Stack Overflow!
  • If you didn’t know, Joel was the genesis of one of the earliest branded reddits: the Joel on Software reddit.
  • We discuss the Reddit switch from Lisp to Python, and the way Reddit stored raw user passwords in the database.
  • We use a weighting algorithm based on Reddit’s when we calculate “hotness” in Stack Overflow.
  • Reddit, like us, ran for quite a while on a single server. Moving the database to a second server provided solid gains for Reddit, and it is something we just did as well. According to Steve, splitting off the database is easy; making sure you can scale to multiple application web servers is the difficult part, because of shared state caching.
  • Steve is a big fan of HAProxy, which runs as a single software load balancer in front of all ~20 Reddit servers.
  • Reddit’s first big partner for branded content is The Independent, although there is also WeHeartGossip. Joel and I feel these sorts of communities, to be truly successful, need personalities associated with them that are emblematic of the values and goals of that community.
  • We briefly discuss some of the protection mechanisms Reddit has in place to prevent abuse and spammers. Steve and Alexis have much more experience dealing with abusive users than we do. Per Steve, the Reddit mantra is “anything goes”, so they try to do as little as possible to inhibit users. We generally agree that the volume of badness is remarkably small. Most users behave responsibly — and this isn’t just an optimistic opinion, it’s based on actual data. That’s the good news!
  • Behold the power of the Reddit audience: they may just have saved the world by sending a crowbar to the Large Hadron Collider at CERN.
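
The “hotness” weighting mentioned in the list above can be illustrated with a small sketch. This follows the commonly cited form of Reddit’s hot ranking (log-scaled score plus a recency term); it is not Stack Overflow’s actual formula, which isn’t given here. The function name `hot`, the `EPOCH` reference point, and the sample vote counts are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from math import log10

# Arbitrary fixed reference point (an assumption). Moving it shifts every
# score by the same amount, so the relative ordering does not change.
EPOCH = datetime(2005, 12, 8, tzinfo=timezone.utc)


def hot(ups: int, downs: int, posted: datetime) -> float:
    """Rank value in the style of Reddit's hot formula (illustrative only)."""
    score = ups - downs
    # Votes count logarithmically: going from 1 to 10 is worth as much
    # as going from 10 to 100.
    order = log10(max(abs(score), 1))
    sign = (score > 0) - (score < 0)
    # Newer posts get a steadily growing bonus: one point per 45000 seconds.
    seconds = (posted - EPOCH).total_seconds()
    return round(sign * order + seconds / 45000, 7)


# With these constants, a tenfold score advantage offsets 12.5 hours of age.
older = datetime(2008, 10, 27, tzinfo=timezone.utc)
newer = older + timedelta(seconds=45000)
well_voted_old_post = hot(10, 0, older)
fresh_post = hot(1, 0, newer)
```

The design choice worth noting is that age never makes a score decay; newer items simply start higher, so old scores never have to be recomputed.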

If you’d like to submit a question to be answered in our next episode, record an audio file (90 seconds or less) and mail it to podcast@stackoverflow.com. You can record a question using nothing but a telephone and a web browser. We also have a dedicated phone number you can call to leave audio questions at 646-826-3879.

The transcript wiki for this episode is available for public editing.

 

Based on traffic levels last week — we’re at and beyond where we were at launch — I decided it was time to pursue adding a second server.

The second stackoverflow.com server has identical specifications to our first server, that is:

  • Windows Server 2008 x64
  • Dual Quad-Core Xeon E5320 (1.8 GHz)
  • 4 GB RAM
  • 271 GB SAS hard drive

As I’ve mentioned before, one of the most obvious scaling strategies for us is to move the database to its own, private server. We were thinking about upgrading to SQL Server 2008, so this was also a logical time to do that.

As of Sunday night, stackoverflow.com is now a two-server system: web on one server, database on the other. They are connected to each other through a dedicated crossover gigabit ethernet connection.

I have to give massive credit to Brent Ozar here, who not only helped us tune the database, but also contributed a huge chunk of his own time. Brent wrote a blog post about his experience working with the Stack Overflow databases, if you’re curious. Brent works for Quest Software and he is, without a doubt, a database ninja. So if you have any difficult SQL Server problems — or in our case, blazingly obvious newbie problems — maybe you should check out Brent’s SQL Server wiki.

While many queries are faster under SQL Server 2008, and the tooling is dramatically and indisputably better (IntelliSense for queries!), there is one downside for us: SQL Server 2008 is slower at full-text search operations than SQL Server 2005. That’s kind of a bummer because we rely heavily on full-text search. I saw a glimpse of this in my initial testing, when I had both servers up side-by-side with the same data, running the same queries, to see how much faster it would be (new versions are supposed to be faster, right?). My experiments showed that every full-text query I tried was inexplicably slower on 2008, but we thought it had to do with different query plans, and was something we could work around.

Turns out we were wrong. Apparently SQL Server 2008 was the source of the massive slowdown earlier today. A set of full-text queries that ran fine all last week on a single, shared server caused a newly dedicated 8-core, 4 GB server to completely melt down and peg at 100% CPU. Traffic levels were about the same, the database was about the same, and the code hasn’t changed much. On top of that, the database now has a server all to itself, so you’d expect performance to be better, not worse.

We’re not the first people to notice that full-text performance took a step backwards in SQL Server 2008:

I was lucky enough to visit Microsoft during the CTP period and was testing out integrated full text search in 2008. An issue we experienced was that full text can be slow when there is a high number of updates to the index and is caused by blocking on the docidfilter internal table.

That post mentions the DBCC TRACEON (7646, -1) flag. We’ve enabled this flag and restarted SQL Server, but haven’t seen much improvement.

The temporary workaround is much more aggressive caching, including caching to disk. Caching is the bread and butter of computer science, and any opportunity to cache smarter and.. er.. harder.. is a good thing. But all things considered, I’d still prefer it if SQL 2008 was delivering better full-text performance than 2005, not worse.
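
To make the idea of caching to disk as well as memory concrete, here is a minimal sketch of a two-tier cache placed in front of an expensive query. It is not the code running on stackoverflow.com; the class name `TwoTierCache`, the JSON file layout, and the 60-second lifetime are all assumptions for illustration.

```python
import hashlib
import json
import os
import tempfile
import time


class TwoTierCache:
    """Memory cache backed by JSON files on disk (illustrative sketch)."""

    def __init__(self, directory, ttl_seconds=60.0):
        self.directory = directory
        self.ttl = ttl_seconds
        self.memory = {}  # key -> (expires_at, value)
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the key so any query text maps to a safe file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".json")

    def get_or_compute(self, key, compute):
        now = time.time()
        hit = self.memory.get(key)
        if hit and hit[0] > now:
            return hit[1]  # fastest path: still fresh in memory
        path = self._path(key)
        try:
            with open(path, "r", encoding="utf-8") as f:
                expires_at, value = json.load(f)
            if expires_at > now:
                self.memory[key] = (expires_at, value)
                return value  # survived a restart via the disk copy
        except (OSError, ValueError):
            pass  # missing or unreadable file: fall through and recompute
        value = compute()  # the expensive call, e.g. a full-text query
        expires_at = now + self.ttl
        self.memory[key] = (expires_at, value)
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump([expires_at, value], f)
        os.replace(tmp, path)  # swap in the finished file in one step
        return value


# Usage: the second lookup is served from the cache, not the "database".
cache = TwoTierCache(tempfile.mkdtemp(), ttl_seconds=60.0)
titles = cache.get_or_compute("search:haproxy", lambda: ["post 1", "post 2"])
titles_again = cache.get_or_compute("search:haproxy", lambda: ["unused"])
```

Values must be JSON-serializable in this sketch, and a real deployment would also need eviction of stale files and invalidation when the underlying data changes.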


I will be attending the Microsoft Professional Developers’ Conference this year, courtesy of Microsoft.


The main reason I am attending is to help Phil Haack deliver a session about ASP.NET MVC.

ASP.NET MVC: A New Framework for Building Web Applications

Monday, October 27th
3:30 pm — 4:45 pm
Room 153

Phil invited me to talk for 15 minutes about the real world use of ASP.NET MVC in stackoverflow.com, and I was happy to oblige him. Phil and I have been friends for a few years, so our relationship predates any of the business stuff.

If you’re attending PDC08 this year, maybe I’ll see you there — and definitely attend our session!

Update: the session Phil and I presented can now be viewed online. (Thanks Zack!)


We’ve noticed there are a number of users running a script that retrieves their uncompressed user page multiple times per second, producing an absurd amount of network traffic. Fortunately, we do cache the user page if the requests come in anonymously, so the database load was not significant.

However, this behavior is irresponsible and unacceptable, so we will permanently ban any IP we see doing this. We’ve already banned about a dozen IPs for this, and we will continue to do so. If you persist, your account will be permanently deleted. We might even lay down long term IP block bans if necessary.

We would prefer that you use our RSS feeds, or lobby us to improve our RSS feeds, rather than scrape Stack Overflow so aggressively.
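
For readers wondering how a pattern like “multiple requests per second from one address” might be spotted, here is a minimal sliding-window sketch. It is not a description of how Stack Overflow detects or bans anyone; the class name `RequestRateMonitor`, the thresholds, and the sample address are assumptions for illustration.

```python
from collections import defaultdict, deque


class RequestRateMonitor:
    """Flags a client that exceeds max_requests within a sliding window."""

    def __init__(self, max_requests=5, window_seconds=1.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.recent = defaultdict(deque)  # ip -> timestamps of recent requests

    def record(self, ip, now):
        """Record one request at time `now` (seconds); return True if over the limit."""
        timestamps = self.recent[ip]
        timestamps.append(now)
        # Forget requests that have fallen out of the window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        return len(timestamps) > self.max_requests


# Usage: four hits within 0.3 seconds against a limit of three per second.
monitor = RequestRateMonitor(max_requests=3, window_seconds=1.0)
flags = [monitor.record("203.0.113.7", t) for t in (0.0, 0.1, 0.2, 0.3)]
```

Passing the timestamp in explicitly keeps the sketch easy to test; a real server would read the clock itself and would also need to expire idle addresses.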


There will be no podcast this week, because Joel is on a business trip to Korea, apparently at webappscon. I can’t remember his exact words, but they were something along the lines of “I don’t care that the listeners want a new podcast, I am not a monkey who dances for their amusement!” I tried in vain to reason with him, but you’ve heard how he is.


But seriously, one thing Joel and I want to do in future podcasts is have more guests on the show. While we of course loooove talking about all things Stack Overflow and Fog Creek, it’s also nice to open the floor up a bit and broaden our horizons.

We have a few guests tentatively lined up for future shows through the end of the year:

I’d like to open the comments up to suggestions. Who would you like to hear Joel and me talk with on future Stack Overflow podcasts?

And yes, Stack Overflow rules do apply — the guest does have to be at least peripherally “programming related”, in theory anyway. :)
