I wrote a tutorial on importing the Stack Overflow XML files into SQL Server. It covers how to load the data, what the tables and fields mean, and how to query them.
I've also built spwho2.com, a front-end for some of the slicing and dicing I'm doing. I've got reports like:
I'm working on reports about users, answers, badges and comments next.
Update 2009/6/9 - User pages are completely online, although folks with wacko display names are going to have a hard time finding theirs. You can check out Jon Skeet as an example, or go to the home page at http://spwho2.com and click on the first letter of your display name. I'll build a search-box style lookup later. On the individual user stat pages, sometimes the statistics are out of order, and I haven't vetted the statistics yet, so if you see anything odd let me know.
Update 2009/6/11 - On user pages, I've added percentiles so you can see where you rank in relation to other users overall, or for a specific tag. For example, on Jon Skeet's page, you can see that he scored in the 99.89th percentile for Submitted Answers on the .net tag, so you might wonder who beat him. Click on the link for Submitted Answers for .net, and you can see the top 100 submitters for that tag. If you go back to his page, another interesting statistic is that he averaged 0.31 hours to answer .net questions, which might sound pretty good - but he only scored 37.25% there. Click on the link for Avg Hours to Answer .NET questions and you'll see that users are averaging as low as 0.05. The tricky part is that they may have answered far fewer questions, so their average looks artificially low. I still haven't vetted most of these; I'm just having fun building pages out. There are bugs with some tags that have # signs in them, for example.
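To make the percentile caveat concrete, here is a minimal Python sketch. It is not the code behind spwho2.com: the function names, the `min_answers` threshold, and the sample numbers are all assumptions made up for illustration. It ranks a user by average hours to answer (lower is better) and shows how users with only a handful of answers can push a prolific answerer down the rankings.

```python
def percentile_rank(scores, score):
    """Return the percentage of scores strictly below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100.0 * below / len(scores)


def speed_percentile(stats, user, min_answers=1):
    """Percentile for average hours to answer, where lower hours rank higher.

    `stats` maps a user name to (answer_count, avg_hours).
    `min_answers` is a hypothetical threshold: users with fewer answers
    are left out of the comparison (the ranked user is always kept).
    """
    eligible = {
        name: s for name, s in stats.items()
        if s[0] >= min_answers or name == user
    }
    # Negate the averages so that a faster responder gets a higher score.
    speeds = [-avg_hours for _, avg_hours in eligible.values()]
    return percentile_rank(speeds, -stats[user][1])


# Made-up sample data: (answer_count, avg_hours_to_answer)
stats = {
    "prolific": (1000, 0.31),
    "one_hit_a": (1, 0.05),
    "one_hit_b": (2, 0.10),
    "steady": (300, 0.90),
}

everyone = speed_percentile(stats, "prolific")
regulars_only = speed_percentile(stats, "prolific", min_answers=100)
print(everyone, regulars_only)
```

With this sample data, the prolific user ranks lower when compared against everyone than when compared only against users with at least 100 answers, which is the effect described above.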
Update 2009/6/14 - How much would you pay for a web site that does all this? $19.95? $29.95? But wait - there's more! I've added:
- A user & tag search
- On the tag pages, I'm now showing the top 10 related tags for each tag (the tags most often attached to the same questions)
- On the tag pages, I'm showing the 50 oldest unanswered questions if you want to swoop in and find easy questions to answer. Keep in mind that it's first come, first served - the pages are only updated when a new database export comes out, so by the time you visit your favorite tag, some of the questions may have been answered by someone else.
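The two tag-page features above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the site's implementation: the question records and their field names (`tags`, `created`, `answer_count`) are hypothetical, and "unanswered" is taken to mean zero answers.

```python
from collections import Counter


def related_tags(questions, tag, top_n=10):
    """Count which other tags appear on questions that carry `tag`."""
    counts = Counter()
    for q in questions:
        tags = set(q["tags"])
        if tag in tags:
            counts.update(tags - {tag})
    return counts.most_common(top_n)


def oldest_unanswered(questions, tag, limit=50):
    """Return the oldest questions with `tag` that have no answers yet."""
    open_questions = [
        q for q in questions
        if tag in q["tags"] and q["answer_count"] == 0
    ]
    # ISO-style date strings sort chronologically as plain text.
    return sorted(open_questions, key=lambda q: q["created"])[:limit]


# Made-up sample records, not real data from the dump.
questions = [
    {"id": 1, "tags": ["c#", ".net"], "created": "2008-09-01", "answer_count": 3},
    {"id": 2, "tags": [".net", "linq", "c#"], "created": "2008-10-15", "answer_count": 0},
    {"id": 3, "tags": [".net", "c#"], "created": "2008-08-20", "answer_count": 0},
    {"id": 4, "tags": ["python"], "created": "2008-08-05", "answer_count": 0},
]

top_related = related_tags(questions, ".net")
open_ids = [q["id"] for q in oldest_unanswered(questions, ".net")]
print(top_related, open_ids)
```

Because the results come from a static export, a list like `open_ids` is only as fresh as the last dump.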
Update 2009/6/26 - I'd always had a lot of links in the site back to Stack Overflow, but I've added even more after reading Jeff's post about attribution. The user names are now linked back to their Stack Overflow profile, plus there's a footer on every page that links both to Stack Overflow and to the SO blog. I already have every question linked straight back to the Stack Overflow question - I don't want to include the body of the questions (or the answers) on SPWho2 itself, because I'm not after the Google traffic. I haven't included the questioner names next to the questions yet, but I'll do that in the next round of updates when the next database dump comes out.
Update 2009/7/13 - I just updated it with the June data dump. I haven't QA'd the data due to an upcoming vacation, but hey, that's what you're for, ha ha ho ho.
Update 2010/02/13 - I set up a public-facing SQL Server with the Server Fault, Stack Overflow, Super User, and Meta Stack Overflow data dumps. You can connect to it with SQL Server Management Studio 2008 or Toad for SQL Server. Connection info is at Querying the Stack Overflow Data Dump.