10

Below is a list of some 70GB+ (circa 7GB compressed) of datamined hand histories from 6 different poker sites.

Many people wish to perform research on online poker but do not have the time to datamine millions of hands themselves and don't have the budget to pay for hands. I have made these hands available for free so this research can be performed.

The hands have the table name, hand ID and player names changed. Each original value is always changed to the same replacement string, so all statistics will still make sense.

If you use these hands you must reference HandHQ.com in all forum posts, blog posts, publications or other work that makes use of the hands or information extrapolated from them.

I would also like it if you posted details of any research you perform here, so others can see what interesting discoveries are made.

If you want to share these hands with others please point them at this thread. Don't just pass on the direct link to the zips.

http://static.handhq.com/hands/obfuscated/ABS-2009-07-01_2009-07-23_1000NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/ABS-2009-07-01_2009-07-23_100NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/ABS-2009-07-01_2009-07-23_200NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/ABS-2009-07-01_2009-07-23_400NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/ABS-2009-07-01_2009-07-23_50NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/ABS-2009-07-01_2009-07-23_600NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/FTP-2009-07-01_2009-07-23_1000NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/FTP-2009-07-01_2009-07-23_100NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/FTP-2009-07-01_2009-07-23_200NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/FTP-2009-07-01_2009-07-23_25NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/FTP-2009-07-01_2009-07-23_400NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/FTP-2009-07-01_2009-07-23_50NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/FTP-2009-07-01_2009-07-23_600NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/IPN-2009-07-01_2009-07-23_1000NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/IPN-2009-07-01_2009-07-23_100NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/IPN-2009-07-01_2009-07-23_200NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/IPN-2009-07-01_2009-07-23_400NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/IPN-2009-07-01_2009-07-23_50NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/IPN-2009-07-01_2009-07-23_600NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/ONG-2009-07-01_2009-07-23_1000NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/ONG-2009-07-01_2009-07-23_100NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/ONG-2009-07-01_2009-07-23_200NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/ONG-2009-07-01_2009-07-23_400NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/ONG-2009-07-01_2009-07-23_50NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/ONG-2009-07-01_2009-07-23_600NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/PS-2009-07-01_2009-07-23_1000NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/PS-2009-07-01_2009-07-23_100NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/PS-2009-07-01_2009-07-23_200NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/PS-2009-07-01_2009-07-23_25NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/PS-2009-07-01_2009-07-23_400NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/PS-2009-07-01_2009-07-23_50NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/PS-2009-07-01_2009-07-23_600NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/PTY-2009-07-01_2009-07-23_1000NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/PTY-2009-07-01_2009-07-23_100NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/PTY-2009-07-01_2009-07-23_200NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/PTY-2009-07-01_2009-07-23_25NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/PTY-2009-07-01_2009-07-23_400NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/PTY-2009-07-01_2009-07-23_50NLH_OBFU.zip
http://static.handhq.com/hands/obfuscated/PTY-2009-07-01_2009-07-23_600NLH_OBFU.zip
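
For anyone scripting over these archives, the file names follow a regular pattern: a site code, a start date, an end date, and a number before NLH. A minimal Python sketch that pulls those fields out of a URL is below. The names `Archive` and `parse_archive_url` are made up for this example, and reading the number as the stake level (NL50, NL1K and so on) is an assumption based on how the stakes are referred to later in this thread.

```python
import re
from typing import NamedTuple


class Archive(NamedTuple):
    site: str   # site code exactly as it appears in the file name, e.g. "ABS"
    start: str  # first date in the file name (YYYY-MM-DD)
    end: str    # second date in the file name (YYYY-MM-DD)
    stake: int  # number before "NLH"; assumed to be the stake level


# Matches the trailing file name, e.g. ABS-2009-07-01_2009-07-23_1000NLH_OBFU.zip
PATTERN = re.compile(
    r"(?P<site>[A-Z]+)-(?P<start>\d{4}-\d{2}-\d{2})_(?P<end>\d{4}-\d{2}-\d{2})"
    r"_(?P<stake>\d+)NLH_OBFU\.zip$"
)


def parse_archive_url(url: str) -> Archive:
    """Split one archive URL from the list above into its fields."""
    match = PATTERN.search(url)
    if match is None:
        raise ValueError(f"unrecognised archive name: {url}")
    return Archive(match["site"], match["start"], match["end"], int(match["stake"]))


example = parse_archive_url(
    "http://static.handhq.com/hands/obfuscated/PS-2009-07-01_2009-07-23_25NLH_OBFU.zip"
)
```

Grouping the parsed results by `site` or `stake` is then a one-liner, which is handy when only a subset of the 39 archives is needed.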

13 Answers

7

I've worked with David's obfuscated data from HandHQ.com, and have managed to pump out some research which is (hopefully) interesting to academics, poker players and laypeople alike. The biggest challenge I encountered while conducting this research is that PokerTracker did not appear to be equipped to handle multiple datasets of these sizes. However, I'm sure the ingenuity of researchers and advances in software and hardware will ameliorate these challenges in the future.

Here's a link to a preliminary draft of my research:

http://docs.google.com/fileview?id=0B96yNcMkeuFsNDkxMTI3OWQtYmE4MS00ZDdkLWI0NDktZTA0NWZhZDUwMWUz&hl=en

I welcome any and all feedback from anybody interested in this sort of research. This is a preliminary draft, so please check with me if you intend to use or cite it.

Kyle Siler

Cornell University

Department of Sociology

ksiler[at]gmail[dot]com

"The lower proportion of shown down hands offsuit Broadway hands suggests that at higher stakes, players are more likely to fold these hands, which are very conducive to reverse implied odds. As a result, the profitability of these hands increases." I'm confused. Should that say that the profitability of those hands decreases? – Matador Nov 2 2009 at 18:44
Any idea how the site owners feel about this kind of thing? Hopefully the fact that the players and tables are obfuscated prevents this from violating any rules. Thoughts? – SleepyLaBeef Nov 2 2009 at 19:05
A few things: (1) Typo on page 2: 'eand strategic' (2) IMO, the first comma should be removed: 'Some players use poker as their sole, or a partial means of income, others use it as a lucrative avocation, while others play as recreational gamblers.' – Matador Nov 3 2009 at 2:30
@Kyle Any chance of a Cliff's Notes on your findings? – AA Every Time Nov 3 2009 at 2:34
5

Matador: What I meant there was that at higher limits, players are more likely to fold offsuit Broadway hands, and do so in more appropriate situations. As a result, the observed profitability of offsuit Broadway hands as a whole increases, due to players making more and better laydowns. Presumably, lesser skilled players lose value off these hands by being unable or unwilling to fold them at optimal frequencies for the games they're playing (e.g., the archetypal player who can't get away from TPTK or TPGK). The theoretical profitability of hands and implied odds may change depending on skill and table-dynamics equilibria in a given game, but these data are only able to speak to 'empirical' or 'actual' profitability. Down the road, I might do simulations to get at more theoretical issues.

Also, the prose that you quoted is awfully clumsy; why do I always notice these things after putting them online? Thanks for your critical eye and query. There will be a revised paper posted in the coming weeks.

SleepyLaBeef: Since these poker hands are public events open to observers, and anonymity is being provided to all players, this project should be pretty defensible, at least from the standpoint of university IRBs. I know of at least one site that has tentatively started working with academics, but is moving very cautiously due to legal concerns. I'm not a lawyer, so I'm not sure what those concerns are, or if and how they can be assuaged. The internet is a new frontier for law, especially when tied with online poker, so these issues might take some time to iron out.

Erring on the side of caution, in my own paper, I do not reveal the name of the site from which I received anonymous hand histories via HandHQ.com.

Good writing takes work. On the whole it's good writing IMO :) – Matador Nov 2 2009 at 22:51
5

I apologize in advance for the brevity of my explanations and my tongue-tied statistics writing voice, but I have feedback about Kyle Siler’s paper. There are measurement and data structure issues associated with the second regression analysis of big blind win rates (BBWR) on hand type (HT).

The data have a nested structure: each individual has only one BBWR, which is used for every hand. So in the data file for the regression of BBWR on HT, the dependent variable BBWR is assumed to have persons * hands played observations, instead of just as many observations as there are people. More seriously, the relationship between BBWR and HT could vary between people. Less likely, but also serious: players may learn over the course of the data. A multilevel regression analysis (consult your local statistics support people) should address these issues and increase the study’s power.

The big measurement issue is that in poker, most of the time, we lack information about the hands players are holding. The sample of hands we see is not a random sample but a convenience sample of the hands people actually play that actually get shown down. Also see my comment about how the hole cards in hands where players go all in before the river may not have been recorded. The meaning of HT needs to be thought about and included in the paper. I think I can come up with a couple of pages on the topic myself.

A very small measurement issue that most would overlook, but that I would like to point out, is that categorizing hand type does have some consequences. Independent variables in regression analysis are assumed to be measured without error. Are sevens the same as jacks? Is hitting trips with sevens more likely to give a big payoff than hitting trips with jacks?

Another measurement issue is that rake should be calculated for each hand so that a player's actual rake can be calculated. It’s possible that player strategies affect the average rake. Uncalled bets aren’t raked, so tight, aggressive players who make relatively few and large bets should have lower average rakes than loose players.

3

AA - Good idea:

The main point I'm making is that poker is a challenging game, not just due to the architecture of the game itself (e.g., reading betting patterns, calculating pot odds, etc.) but due to the risk structures it presents players with. These incentives often go against how humans handle risk in their lives, particularly with regard to monetary issues.

Empirically, the main findings are:

I. Winning a higher percentage of hands is negatively associated with profitability. This seems counter-intuitive, as you play every hand to win it. However, economic research has shown that humans tend to overweight frequent small wins vis-a-vis infrequent large losses. Relatedly, since humans are generally risk-averse with perceived gains and risk-loving with perceived losses, this also explains how losing/tilty sessions can spiral out of control for some players.

Of note, this trend attenuates (but remains significantly negative) at higher stakes, pointing to the importance of stealing pots to retain an edge at these stakes. As an example, this suggests that one cannot merely set mine and wait for monsters at higher stakes and remain profitable.

II. Different types of hands have different values at different stakes. The result that stood out most to me is that at low stakes (NL50), small pairs (22-77) actually were more profitable than medium pairs (JJ-88). This likely has to do with the skill of the players in these games, as opposed to special dynamics in NL50 games. Small pairs have less ambiguous value than medium pairs, and it is difficult for less skilled players to correctly take advantage of thin value bets under conditions of uncertainty. Put differently, folding twos after the flop without hitting a set is a much easier decision than deciding what to do with medium pairs.

In other words, like a marginal utility curve in economics, it's fairly easy to skim value off of sets and flopped flushes (albeit not quite as easy to maximize value), but being able to optimally adjust one's betting, calling and folding frequencies with 99 on a board with two overcards is both trickier, and a thinner source of value. However, as one moves up stakes, these thin and more uncertain sources of value are your only remaining sources of profit against stronger players.

Further, suited connectors had less value at high stakes (NL1K), but since the anonymized data is restricted to shown down hands, this is likely a result of a greater propensity of players at NL1K to push draws, and players at lower stakes to be given drawing odds to see if their draws come in. The greater value for other types of hands at high stakes may be derived from the quasi-sacrificial lambs of suited connectors (Phil Gordon identifies this as Prahlad Friedman's strategy in his Little Green Book).

III. TAG play is more profitable and prominent at low stakes. As one moves up stakes, LAG play becomes more prominent and profitable. Once again, this is likely primarily a function of the skill of the players involved. Low-stakes LAGs are more likely to be playing for fun; high-stakes LAGs are often aggressive and clever, exploiting the risk-aversion of other players. Changing dynamics of the game may also explain why there appears to be a growing niche for LAG play, as fold equity is probably more prominent with more skilled players.

At higher levels, TAG strategies still can be successful, but occupy a smaller niche of the top winning players and a larger proportion of the top losing players than at lower stakes. This suggests that as one moves up levels, poker becomes less strategic and more tactical (since if played very well, a large variety of strategies at NL1K have been shown to be successful). Further, a grinding player may get to the big game by set-mining and playing patiently, but unless they're really good at it, they're going to have to modify their previous winning (and presumably ingrained) strategies in order to keep moving up limits.

Anyhow, those are the main points I'm inclined to emphasize. Thanks a lot for your interest!

P.S. - Is there any way I can continue to use my name as a login? I couldn't figure out how to log in to edit my old post.

Re logging in. Next time you log in click the Google button. This will associate your account with your Google/Gmail OpenID account and you will always be able to log in as the same Outflopped user. – Mr. Flibble Nov 3 2009 at 15:18
@Kyle. I've merged your accounts and associated with your google openid. Please let me know at outflopped.uservoice.com if there is any problem. – Mr. Flibble Nov 4 2009 at 2:00
Doesn't your work depend on hands shown down, and therefore have a huge bias against hands which make bluff-catchers? You aren't seeing the times players miss draws and fold. Missing draws could be highly unprofitable, balancing the times those hands go to showdown after hitting. – Douglas Zare Mar 12 at 20:26
Douglas Zare is right. This is the big measurement problem I was referring to above. It's like the parable of the blind men and the elephant. In the parable each blind man touches a different part of the elephant and comes to a conclusion about what animal the elephant resembles. When the blind men compare notes they realize that they all came to different conclusions and none of them was right. – mazoula Jun 13 at 7:49
3

I have some notes about the poker files.

First, the poker files do not contain starting chip information. Statistics like M, average table M, and player-is-all-in cannot be computed.

Second, the hand numbers have been masked. The only source of time information I have been able to find in poker hands is the hand number. Presumably higher-numbered hands start after lower-numbered hands. For those who needed time information and couldn't find it, you're welcome :) The hand number could be masked almost as well by subtracting a particular hand number as a reference point. Then the hand with the larger hand number still occurs after the hand with the smaller hand number, and the time information contained in the hand number is retained.

Third, hole cards that are revealed before showdown may not have been recorded. I may have just been unlucky, but while reviewing hand histories I found lots of mysterious big preflop bets (no idea if players were all in, without starting chip info) followed by checking to the river with no hole cards listed, and no cases of hole cards being listed if there wasn't a post-river call. Can anyone confirm whether hole cards are recorded if the players are all in before the river?
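
The reference-point idea for hand numbers can be sketched in a few lines of Python. This is only an illustration of the suggestion above, not how the HandHQ files were actually produced; `mask_hand_ids` and the sample hand numbers are invented for the example.

```python
def mask_hand_ids(hand_ids, reference=None):
    """Subtract a reference hand number from every hand number.

    The masked values keep the original ordering and spacing, so any
    time information carried by the hand number survives.
    If no reference is given, the smallest hand number is used
    (an arbitrary choice made for this sketch).
    """
    ids = list(hand_ids)
    if reference is None:
        reference = min(ids)
    return [hand_id - reference for hand_id in ids]


# Invented hand numbers, in the order they appear in a file.
original = [30512345678, 30512345990, 30512345701]
masked = mask_hand_ids(original)
```

Sorting by the masked values gives the same order as sorting by the originals. The trade-off is that anyone who knows one real hand number can recover the reference point, which is presumably why this counts as "almost as well masked" rather than fully masked.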

2

Hi Kyle

Interesting research, there is much to learn from such large data sets. One remark, regarding the observation that the top 20 winners tend to have marginal win rates: One explanation is indeed that winners tend to move up in stakes, but also there is the multitabling tradeoff. I can win maybe 8 BB/100 at 50NL singletabling, but I can also play nine tables simultaneously with, say, 3 BB/100, obviously generating a huge profit in doing so, even while cutting my winrate. The players who win the most at any given stake are by no means necessarily the most skilled players, and they are not necessarily choosing the strategy that is most profitable if singletabling. The more tables you play, the tighter you'll have to play to keep it up, so certainly the players with the most hands played will be much tighter than average.
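
The arithmetic behind the multitabling trade-off is easy to check. The Python sketch below uses the figures from the paragraph above (8 BB/100 on one table, 3 BB/100 on nine) and assumes every table deals hands at the same pace no matter how many are open, which is a simplification; the function name is invented for the example.

```python
def bb_won_per_table_round(winrate_bb_per_100, tables):
    """BB won in the time it takes each open table to deal 100 hands."""
    return winrate_bb_per_100 * tables


single_table = bb_won_per_table_round(8, 1)  # one table at 8 BB/100
nine_tables = bb_won_per_table_round(3, 9)   # nine tables at 3 BB/100
ratio = nine_tables / single_table
```

Under that assumption the nine-tabler wins 27 BB in the time the single-tabler wins 8, so the lower win rate still produces more than three times the profit per unit of time.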

Regards,

Jens

Jens - Good point! I'd imagine that multi-tabling is a bit more of an issue at lower stakes (are there people who 9-table NL1K profitably?). Players try to find the optimal point between win-rate and volume of tables. It's definitely an explanation I'll add to subsequent drafts of my paper. – Kyle Siler Nov 14 2009 at 15:54
2

@DaWeef -

Thank you for your thoughtful comments. Substituting percentile ranks as opposed to raw scores is an interesting idea. My biggest reservation is that it would make comparing behaviors across stakes difficult (e.g., a 55%ile AF in NL50 might be 40% in NL1K). Still, computing percentile scores within stakes may be a good idea. I'll acknowledge that my cutoffs at PFR 25 and 35 were arbitrary guesstimates (and might be slightly on the high side), but as per the distribution of strategies shown in Tables 1a, 1b and 1c, I think they did a reasonable job of differentiating players in different pools. Further, having slightly high boundaries also gives me the latitude to better differentiate loose players.

Another solution I've considered is running interaction effects between PFR, AF and Hands Raised, as opposed to the user-defined PokerTracker categories. My reservation with this is that it would make the paper less accessible and more esoteric for non-academics, which is the opposite of what I want to (and according to journal editors, need to) achieve in subsequent drafts of the paper.

Thanks again for giving me some good food for thought. I'd be interested in seeing your research, and I'll definitely keep the theoretical and empirical issues you've raised in mind when writing my revisions.

2

I have also been working with the data generously provided by HandHQ. I am mainly working towards building a website for rake(back) comparison. I will post a link when it is finished.

@Kyle:

I have read your research and I really like the general idea behind it. Poker is, in my opinion, a beautiful example of complex game theory. Given that it is also one of the few examples of (confined) complex game theory for which empirical data is abundantly available, I am surprised there is not far more research on the subject.

However... here it comes ... I have some critical remarks:

You are completely right when you mention that:

"As Boyd (1976) observes, someone labeled a 'tight' player in one game, might be a 'loose' player in another game, while implementing the same strategies."

However, when you start rating players (Appendix A), you seem to have forgotten this earlier statement. I cannot find a proper argument for the thresholds you suggest in Appendix A. From personal experience with the data (and from playing poker myself), I know that the average % of hands played for 6-max is approximately 25-26%. This means that, by your definitions, players classified as tight are strongly overrepresented. Also, you do not adjust your definitions for the different stake levels you analyze, which again does not follow from your earlier statement.

With looseness being relative, I would suggest working with percentiles (33rd and 66th) in defining your thresholds.
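
As a rough illustration of the percentile suggestion, here is a Python sketch using only the standard library. The sample percentages are made up, the middle label "medium" is a placeholder rather than a term from the paper, and cutting into three equal groups only approximates the 33rd and 66th percentiles.

```python
from statistics import quantiles


def looseness_thresholds(pct_hands_played):
    """Return the two cut points that split the players into thirds."""
    low, high = quantiles(pct_hands_played, n=3)
    return low, high


def classify(pct, low, high):
    """Label one player relative to the pool they actually play in."""
    if pct < low:
        return "tight"
    if pct < high:
        return "medium"  # placeholder label, not taken from the paper
    return "loose"


# Made-up "% of hands played" values for twelve players at one stake level.
sample = [12, 15, 18, 20, 22, 24, 25, 26, 28, 30, 35, 45]
low, high = looseness_thresholds(sample)
labels = [classify(pct, low, high) for pct in sample]
```

Running `looseness_thresholds` separately on each stake level would give stake-specific cutoffs, so the same raw percentage could count as tight in one game and loose in another.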

For the record, I doubt it will significantly change the outcomes of your study.

Finally, I would suggest adding graphs of distributions for % hands played, % hands raised and aggression.

Hi DaWeef - My answer was too long for the comment box, so see below for my reply. – Kyle Siler Nov 20 2009 at 19:02
2

David B,

Would you mind if I created a torrent from these hand history files and posted it to this thread and a torrent indexing site? This would lighten the load on the handhq servers as well as give more people access to this useful data.

I've no problem with that. Thanks for asking. Please reference this thread and handhq.com in the torrent. – David B Feb 7 2010 at 14:20
1

Huge props to the folks over at HandHQ for making these available. Currently working on a research project involving these hands, which will soon be available.

0

Hi. I made a mistake and imported the hands into my database, and now I have a lot of data with names like bv2v32b343n45345k3j453. How can I erase this?

Best regards,

Danilo

I suggest creating a new question to ask this. If you tell people what type of tracker you are using they will be able to help better. – David B Nov 9 2009 at 14:26
-2

These hand histories are invalid.

It's impossible to have a player named 'QQvWnnVEqw72lCcpTzhUiw' on PokerStars.

Certain trackers shall refuse to parse this and they're not incorrect in refusing to parse this.

Btw, you're artificially inflating the size of the HHs by using such long names, which will considerably slow down parsers.

Using only a-zA-Z0-9 and staying in the 12-characters limit would create valid hand histories and still allow you to have gazillions of different names.

Sadly the hand histories you generated there are invalid.

Could you please recreate them while respecting the site's format?

Thanks a lot for your efforts!

"It's impossible to have a player named 'QQvWnnVEqw72lCcpTzhUiw' on PokerStars." The player names are changed from their real names so that they cannot be identified and the hands cannot be used for real play. I tested them with Poker Tracker and Holdem Manager and they import ok. Please tell me which tracking application has a problem with them and I will look into it. – David B Nov 7 2009 at 15:54
There are technical reasons for giving a long user name. The obfuscation algorithm creates a hash of the username. If the resulting hashed string is shorter there is a higher likelihood that there will be collisions, where two different names give the same result when hashed. I agree that I may have gone overboard with the name length, but as neither PT nor HEM had a problem importing these names I saw no reason to shorten them. – David B Nov 7 2009 at 15:55
I would have done it differently: instead of generating what looks like a 128-bit Base64 encoded hash, I'd have kept all bi-dir mappings [player name]--[obfuscated player name] in a DB and made sure, for each new name found, to generate a unique mapping. There are, what, a few million players in the hands here? It's not complicated to keep track of the mappings and that would have resulted in much smaller file size: with five Base64 characters I could be encoding 1 billion player names without any collision:) Smaller file size, faster download, faster import, certainty of no collision. – AnonymousCoward Nov 8 2009 at 13:11
That was an option I considered. I decided against using a lookup table as I would then have to keep the table around for as long as I wanted to be able to generate hands. If I want to create new hands in 6 months I can do so now without needing the original database of names. It just seemed safer not to be tied down to a lookup table that I would have to keep safe. So what trackers don't parse the hands? – David B Nov 9 2009 at 14:23
I'm not disagreeing that it would be better to have shorter names BTW. It would! But as long as they work, then I don't plan to change it. If it's a big problem anyone can send the hands through their own app to shorten the names. – David B Nov 9 2009 at 14:35
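
The trade-off discussed in these comments can be sketched in Python. Neither half is the actual HandHQ code: the hash function (the first 128 bits of SHA-256), the salt, and the names `hash_name` and `NameTable` are all assumptions made for illustration. The first approach needs no stored state but yields 22-character names; the second yields 5-character names but only works while the table is kept.

```python
import base64
import hashlib
import string

B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def hash_name(name: str, salt: bytes = b"example-salt") -> str:
    """Stateless approach: the same name always gives the same string.

    Uses 128 bits of a salted SHA-256 digest (an assumption for this
    sketch), Base64 encoded with the padding stripped: 22 characters.
    """
    digest = hashlib.sha256(salt + name.encode("utf-8")).digest()[:16]
    return base64.b64encode(digest).decode("ascii").rstrip("=")


class NameTable:
    """Lookup-table approach: short names, but the table must be kept."""

    def __init__(self, width: int = 5):
        self.width = width
        self.forward = {}  # real name -> obfuscated name

    def obfuscate(self, name: str) -> str:
        if name not in self.forward:
            index = len(self.forward)  # next unused sequential number
            if index >= 64 ** self.width:
                raise OverflowError("name space exhausted")
            self.forward[name] = self._encode(index)
        return self.forward[name]

    def _encode(self, index: int) -> str:
        # Write the index as a fixed-width base-64 number.
        chars = []
        for _ in range(self.width):
            index, remainder = divmod(index, 64)
            chars.append(B64_ALPHABET[remainder])
        return "".join(reversed(chars))


table = NameTable()
first = table.obfuscate("alice")   # hypothetical player names
second = table.obfuscate("bob")
capacity = 64 ** 5                 # distinct 5-character names available
```

Five Base64 characters give 64^5 = 1,073,741,824 possible names, which matches the "1 billion" figure in the comment above. Because the table assigns names sequentially, collisions cannot occur, whereas the hash approach relies on the digest being long enough to make them very unlikely.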
