Lee Sedol defeats AlphaGo in masterful comeback – Game 4

Expectations were modest on Sunday, as Lee Sedol 9p faced the computer Go program AlphaGo for the fourth time.


Lee Sedol 9 dan, obviously relieved to win his first game.

After Lee lost the first three games, his chance of winning the five game match had evaporated.

His revised goal, and the hope of millions of his fans, was that he might succeed in winning at least one game against the machine before the match concluded.

However, his prospects of doing so appeared to be bleak, until suddenly, just when all seemed to be lost, he pulled a rabbit out of a hat.

And he didn’t even have a hat!

Lee Sedol won game four by resignation.

A new game plan

Having lost the match already, the heavy burden of expectation had been lifted from Lee’s shoulders and he was now able to play more freely.

Once again, Lee reviewed the previous game well into the night with other top professional players, looking for a chink in AlphaGo’s considerable armor.

On Sunday, March 13, 2016, he arrived at the Four Seasons hotel ready to do battle with AlphaGo once again.

Amashi strategy

The game plan they came up with appeared to be a type of ‘amashi strategy’, which is among the more extreme styles of play (but is still a valid approach to the game).

To put it simply, it was close to being the opposite of Lee’s strategy from the previous day.


Lee Sedol plays his first move as White, in his fourth game against AlphaGo.

In game three, Lee (as Black) developed a large sphere of influence in the opening, provoking AlphaGo to dive in and bear the brunt of a severe attack.

While this was a good strategy for Black, AlphaGo managed the position surprisingly well. So well, in fact, that it was quickly able to shift out of a defensive stance and counter-attack.

In contrast, Lee’s strategy in game four was to take hard profit (territory) in the corners and on the sides of the board, allowing AlphaGo to develop influence over the center in compensation.

While Lee (playing White) was squirreling away ‘cash’, AlphaGo was making a promising but uncertain investment in the future potential of the top and the center of the board.

A future which Lee was planning to bet against.

Crashing the market

So why did Lee choose this plan?

Well, we know from the first three games that AlphaGo is very good at estimating its probability of winning.


Aja Huang plays the first move for AlphaGo, in game four with Lee Sedol.

It appears to be able to do this even more accurately than the best human players.

With the help of this skill, AlphaGo seems to be able to manage risk more precisely than humans can, and is completely happy to accept losses as long as its probability of winning remains favorable.
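As a toy illustration of this decision rule (a sketch of the idea, not AlphaGo's actual code), a player that maximizes win probability will always prefer a safe half-point win over a risky hundred-point one, because the margin never enters the comparison:

```python
# Toy sketch (not AlphaGo's actual code): move selection driven purely by
# estimated win probability. The expected point margin is listed only to
# show that the decision rule ignores it.

candidates = [
    # (move, estimated win probability, expected margin in points)
    ("defend territory", 0.90, 0.5),    # small but safe win
    ("start a big fight", 0.89, 100.0), # huge margin, slightly riskier
]

def choose(moves):
    """Pick the move with the highest win probability; margin is ignored."""
    return max(moves, key=lambda m: m[1])

print(choose(candidates)[0])  # prints "defend territory"
```

In other words, to such a player a 0.5-point win and a 100-point win are worth exactly the same.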

The Japanese have a name for this style of play, as it closely resembles the prevalent style of Japanese professionals over the previous few decades.

They call it ‘souba’ Go, which means something like ‘market price’.

The essence of the idea is that you accept whatever seems to be the fair value of a position, rather than wagering the whole game on a single, complicated negotiation.

This typically leads to a drawn out game where you seek to get the slightly better end of the deal in the majority of trades, and end up with a winning position in the endgame.

As John Fairbairn has pithily put it, it’s like trying to win by arbitrage.

Of course, stock traders don’t stand a chance of beating modern trading algorithms at their own game, so we shouldn’t expect Go players to do so either.

What Lee and his friends had realized was that they needed to completely upend the market.

Going all in

With his cash firmly secured under his mattress, Lee invaded Black’s large sphere of influence deeply with White 40, brazenly daring AlphaGo to attack him.

Continuing up to 46, Lee lightly scattered stones throughout AlphaGo’s potential territory at the top.

He was not intending to save any particular stone. He only sought to flexibly establish a presence in this space and tank AlphaGo’s earlier investment.

This came across as somewhat unreasonable, as AlphaGo had paid Lee good money for that potential!

However, this was what Lee wanted — to force an all or nothing battle where AlphaGo’s accurate negotiating skills were largely irrelevant.


Aja Huang and Lee Sedol bow at the beginning of the fourth game.

AlphaGo didn’t have much choice in the matter. Either it could fight back and seek to gain compensation by attacking, or it could lose.

Attack light stones on a large scale

If AlphaGo had attacked any of Lee’s scattered stones directly, he would have been delighted.

It would be easy for a player like Lee to dodge any straightforward attacks, and leave AlphaGo with relatively little to show for its efforts.

A better strategy is to attack indirectly, threatening to surround all of the invading stones on a large scale and swallow them whole.

This puts more pressure on the opponent to defend somehow and was exactly what AlphaGo did with the shoulder hit at Black 47.

This move leaned against White’s stone at R11, while eyeing White’s stones at the top indirectly. It was similar to a strategy Lee had tried on the previous day.

In the moves that followed, Black sought to get in front of White, preventing Lee’s group on the right side from connecting up with and rescuing its allies in the center.

This tactic was a success for AlphaGo up to Black 67, at the cost of an acceptable four stone sacrifice on the right side.

AlphaGo took the lead with the knight’s move at Black 69.

A brilliant refusal to trade

Rather than completing the transaction by playing 70 at O6, Lee immediately reduced Black’s potential territory in the center with White 70.

AlphaGo defended firmly with Black 71, which appeared to say “I’m winning,” but Lee probed its weaknesses fiercely from 72 to 76.

Finally, as commentators were lamenting that the game seemed to be decided already, Lee unleashed a brilliant tesuji at White 78 – the only move that would keep him in contention.

AlphaGo failed to play the best response with Black 79, and its stocks suddenly crashed to pennies on the dollar.

Tesuji are the close range tactics of Go. If you imagine Go as a mental martial art, tesuji are the techniques of hand to hand combat. They are clever moves which contribute to Go’s beauty as an art form.

Strong players can usually spot a tesuji in the blink of an eye, through years of training and experience. So, I imagine, can AlphaGo’s policy network, otherwise it wouldn’t be able to play so well.

But not all tesujis are equal. Some can be found by most players. Others are so rare, so exquisite, that even the majority of professionals don’t see them.

The move Lee played was the latter kind.

Most other professionals who were commenting on the game live didn’t see the move.

Gu Li 9p, a top Chinese professional who is Lee Sedol’s friend and fierce rival, was commenting on the game in China.

Gu described White 78 as the “hand of god,” and claimed that he didn’t see the move coming. Neither, it would appear, did AlphaGo.


Ha Hojeong (left) and Song Taegon, Korean commentators for game four.

AlphaGo blows a gasket

After 79, Black’s territory at the top collapsed in value.

White’s invading stones had managed to escape through a hidden tunnel, and White 92 made miai of H14 and J10 (meaning White could play one or the other).

This was when things got weird. From 87 to 101 AlphaGo made a series of very bad moves.

We’ve talked about AlphaGo’s ‘bad’ moves in the discussion of previous games, but this was not the same.

In previous games, AlphaGo played ‘bad’ (slack) moves when it was already ahead. Human observers criticized these moves because there seemed to be no reason to play slackly, but AlphaGo had already calculated that these moves would lead to a safe win.

The bad moves AlphaGo played in game four were not at all like that. They were simply bad, and they ruined AlphaGo’s chances of recovering.

They’re the kind of moves played by someone who forgets that their opponent also gets to respond with a move. Moves that trample over possibilities and damage one’s own position, achieving less than nothing.

The game continued for another 80 or so moves, but it ended with AlphaGo’s eventual resignation.

A jubilant Lee Sedol had scored his first win against the machine.


Chris Garlock (left) and Michael Redmond at the beginning of game four.

What happened to AlphaGo

The question on many observers’ minds right now must be, what happened?

AlphaGo’s lead developer, David Silver, attended the post-game press conference, but he didn’t explain the problem in detail.

Silver and Hassabis said that they would have to wait until they returned to their office in the UK (after the match) to analyze the problem in detail.

We, in turn, will have to wait until they do so before we find out what happened.


From left: Lee Sedol, Demis Hassabis and David Silver, after Lee Sedol defeated AlphaGo.

Some educated guesses

We can’t yet know for sure what went wrong, but we can make some guesses.

We’ve seen these kinds of moves before, played by other Go AIs. They seem to be a common failure pattern of AIs which rely on Monte Carlo Tree Search to choose moves.

AlphaGo is greatly superior to that previous generation of AIs, but it still relies on Monte Carlo for some aspects of its analysis.

My theory is that when there’s an excellent move, like White 78, which just barely works, it becomes easier for the computer to overlook it.

This is because sometimes there’s only one precise move order which makes a tesuji work, and all other variations fail.

Unless every line of play is considered, it becomes more likely that the computer (and humans for that matter) will overlook it.

This means that approximation techniques used by the computer (which work really well most of the time) will see many variations that lead to a win, missing the needle in the haystack that leads to a near certain loss.
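A rough way to see why the needle in the haystack gets missed (a simplified model, not AlphaGo's actual search): if exactly one reply out of many refutes a move, playouts that sample replies at random will often never try it, so the move's estimated win rate stays misleadingly high:

```python
def p_refutation_found(num_replies: int, num_playouts: int) -> float:
    """Chance that random playouts (replies drawn uniformly, with
    replacement) try the single refuting reply at least once."""
    return 1.0 - (1.0 - 1.0 / num_replies) ** num_playouts

# One refutation hidden among 50 plausible-looking replies, 20 playouts:
p = p_refutation_found(50, 20)
print(round(p, 2))  # about 0.33 -- the losing line is missed two times in three
```

Real MCTS is guided rather than uniform, so the numbers here are purely illustrative, but the qualitative point stands: the narrower the refutation, the more likely it falls outside the sampled variations.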

Going crazy

I’ve seen this happen many times when I played games against Monte Carlo bots for fun.

After I played a tesuji which the computer apparently didn’t see, it suddenly started making bizarre moves and wrote off the game.

It’s not possible to make this happen in every game, but it happens often enough that Younggil and I began to describe it as “going crazy” in honor of the computer Go program Crazy Stone.

I’m not sure why exactly this happens, and I hope the AlphaGo team or more knowledgeable readers will chime in with their own thoughts.

Win small or lose big

As we’ve discussed before, the algorithms which guide computer Go players seek to maximize the probability of winning. The margin of victory or defeat is irrelevant.

This leads to a behavior where computers usually “win small, or lose big”. When computers are behind, they take risks in an attempt to catch up, sometimes crazy risks which make it easier to shut them out of the game.

For the most part though, this is the behavior you want to see. Computers will never lose quietly like humans sometimes do.

The very bad moves, however, may be caused by something like the horizon effect.

When the computer’s prospects of winning suddenly plummet, after the opponent plays an unexpected move, there isn’t any variation that reliably leads to a win.

What Monte Carlo AIs sometimes appear to do in these kinds of situations is play meaningless sente exchanges which don’t achieve anything except to defer the loss until later on. To push it over the horizon, perhaps? Or, to win if the opponent fails to answer correctly.
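To make the ‘win if the opponent fails to answer correctly’ idea concrete, here is a hypothetical toy model (the 5% blunder rate and the move names are invented for illustration): once every sound move evaluates to a certain loss, any trick move that works against an occasional blunder scores above zero, and a pure win-probability criterion therefore picks it:

```python
# Hypothetical toy model (the blunder rate and move names are invented):
# the opponent answers correctly with probability 1 - blunder_rate.
# A move's "win probability" is the chance it leads to a won game.

def win_probability(wins_vs_correct_play, wins_vs_blunder, blunder_rate):
    p = 0.0
    if wins_vs_correct_play:
        p += 1.0 - blunder_rate
    if wins_vs_blunder:
        p += blunder_rate
    return p

blunder_rate = 0.05
moves = {
    # Every sound move already loses, so it scores zero either way.
    "solid endgame move": win_probability(False, False, blunder_rate),
    # The trick move loses to the correct reply but wins if it is missed.
    "meaningless sente exchange": win_probability(False, True, blunder_rate),
}
best = max(moves, key=moves.get)
print(best)  # prints "meaningless sente exchange" -- 5% beats 0%
```

Against a strong human who always finds the reply, the trick move achieves nothing and may even burn a ko threat, but under this criterion it still dominates every honest alternative.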

This theory may be totally wrong, and is based on experience with the previous generation of Go programs.

I don’t know for sure whether it applies to AlphaGo at all. If you have a better explanation, or care to speculate, please let us know.

Can Lee Sedol do this again?


Lee Sedol after defeating AlphaGo in game four.

Certainly it’s possible, but it may be very difficult.

You need a special kind of position to pull this kind of stunt, and such situations don’t arise in every game.

AlphaGo may also be more resistant to this problem than previous Go AIs, because it also uses its policy network to select moves.

However, it must be pointed out that the policy network failed to properly score White 78 in the first place, probably because it was such an unusual move.

We will have to see what happens on Tuesday when Lee plays the final game of the match. If anyone can repeat this feat, he can.

Brief analysis of game four

Here is An Younggil 8p’s preliminary analysis of game four. Further game commentary will be posted over the coming week.

Déjà vu

The opening up to Black 11 was exactly the same as in game two. Lee thought the opening in game two was good for White, so he chose this path once again.

White 12 was played to compel Black to extend at the bottom. If Black doesn’t play at 13, White’s pincer at K4 will be more powerful than in game two.

Black 23 and 25 were interesting, and White 26 to 39 was part of Lee’s strategy to take solid territory during the opening.

White dives in

White’s invasion at 40 seemed to be premature, and the sequence from Black 47 to 53 took control of the game.

Fighting spirit motivated White 62 and 64, but Black took the lead up to 69.

Lee Sedol’s brilliant move


Lee Sedol 9 dan.

After some preparation from 70 to 77, White 78 was a brilliant move which seemed to provoke a strange miscalculation on AlphaGo’s part.

White 82 was an excellent followup, but Black 83 to 85 were also well timed exchanges.

Black 87, 89, 97 and 101 were incomprehensible, and Black 105 was the last losing move. It should have been at J12 to capture White’s stones and redress the balance of territory.

If Black had played 87 around J14, the game would still have been playable.

A dramatic reversal

The game was reversed by White 92, and White established a clear lead with 110.

White 126 was a very big move which helped to ensure White’s advantage.

Black’s endgame from 131 to 141 was perfect, and the game became a bit closer again.

However, White 144 was big, and Lee Sedol’s endgame was accurate under the time pressure.

Lee Sedol’s masterpiece

This game was a masterpiece for Lee Sedol and will almost certainly become a famous game in the history of Go. After his brilliant move at 78, Lee’s play was perfect.

Lee entered byo-yomi (one minute per move) at move 90, but his moves were calm and solid and his mental state was as solid as a rock. It was very impressive to watch.

Lee’s strategy in this game ended up working well, and it looks like he found at least one of AlphaGo’s weaknesses.

Let’s see what happens in game five!


Lee Sedol 9 dan: Looking forward to his final game against AlphaGo.

Don’t miss the last game

The fifth and final game will be played on Tuesday March 15.

Can Lee Sedol pull off another win?

Check our match schedule for details and visit the DeepMind AlphaGo vs Lee Sedol page for regular updates.

Subscribe to our free Go newsletter for weekly updates, including news and detailed commentary of the AlphaGo match.

Game record

Lee Sedol vs AlphaGo – Game 4


Download SGF File (Go Game Record)





About David Ormerod

David is a Go enthusiast who’s played the game for more than a decade. He likes learning, teaching, playing and writing about the game Go. He's taught thousands of people to play Go, both online and in person at schools, public Go demonstrations and Go clubs. David is a 5 dan amateur Go player who competed in the World Amateur Go Championships prior to starting Go Game Guru. He's also the editor of Go Game Guru.

You can follow Go Game Guru on Facebook, Twitter, Google+ and Youtube.


  1. Anonymous says:

    What would have happened if Alphago had played L10 instead of K10 (79th move)? I can’t find any good continuation for Lee !

    • Benjamin says:

      It looks like White can capture the two stones at G14.

    • W G11, B F10, W H14, B H13, W J14

      • This doesn’t work – B H12. White needs J11 and K12 in sente. (but black doesn’t have to answer K12, since black can just block if white takes). There’s another variation I’m looking at that seems to work, and I’ll post it on reddit if it does.

    • Yes, that’s what Kim Myungwan 9p and Haylee 3p said in the YouTube commentary on the game:


    • Younggil An says:

      Thanks for the question and the answers.

      As a result, Black should have played at L10 for Black 79, and White would have still been in trouble.

      White 78 was the best move in that situation, and pros found Black’s best responses afterwards.

      I couldn’t find that move, and most of the other commentators didn’t see it at the time either.

      AlphaGo might have read that Black 79 was better than L10, and it looks like she tried to avoid the ko in the center.

      However, AlphaGo didn’t seem to consider Lee’s continuation especially White 82, and the game was reversed here.

      You can have a look at the variation from move 79.



  2. I can explain why it goes crazy.

    The MC algorithm maximizes the probability of winning. So while Alpha plays super-conservatively when it is ahead – choosing to win 0.5 pts with 90% chance rather than 100 pts with 89% chance – it implodes in reverse.

    If the situation gets bad enough that conventional moves do not allow victory, AlphaGo is likely to seek out ‘wishful thinking’ moves: moves where, if the opponent doesn’t play a particular response, they die (capture races, ko threats, etc.). This is because these moves look like they have a high % of winning, as AlphaGo can win if the opponent plays the wrong move.

    This, of course, does not work. And as the game swings less and less in its favor, the right move becomes more and more obvious. This is going ‘crazy’.

    • Warren Dew says:

      This is it. And for a more explicit example, look at move 177. If the opponent plays 178 somewhere else, then AlphaGo can capture at P13, winning the game.

      In human reality, 178 is so obvious that 177 is worse than pointless; in fact, it’s a bad move because it wastes a ko threat. In the computer’s reality, though, the chances that the human misses 178 outweigh the chances that the ko threat will be needed, so the computer plays 177.

      I think this likely also accounts for the other cases of “crazy” moves: the computer thinks there’s a small chance of the human missing the correct response, thus putting the computer on the path to victory, and that small chance outweighs any other disadvantages to the move.

      This especially makes sense given that AlphaGo’s recent training has been against itself. It’s extremely unlikely that one can win a lost position against AlphaGo by gradually rebuilding territory, because AlphaGo will play conservatively enough that you won’t be able to get the final point that you need. Better to hope for just one mistake from the winning AlphaGo – but a mistake big enough to turn the game round, rather than just making up only part of the ground that’s needed.

      This appears to me to be a fundamental weakness of monte carlo tree search, neural nets or no neural nets, that can’t be eliminated entirely.

      • Jim Balter says:

        It doesn’t have anything to do with monte carlo search, just with the decision algorithm, which is fixable. If Alphago is convinced that it is lost, it should still assume that its opponent will play the best move on the next move, and the next after that, up to some threshold N … this means that it plays moves that can’t win against perfect play (when there are no moves that can), but can win if the opponent fails to play perfectly for the next N moves.

        • Anonymous says:

          This would essentially be a change, or at least a partial change, from monte carlo search to alpha beta pruning, so it does have to do with monte carlo search.

        • Anonymous says:

          As the latter cannot be done in Go (for a reasonable number of moves), the decision algorithm in AlphaGo is called Monte-Carlo-Search.

    • While what you say may be the case, I would highly doubt the scenario of “choosing to win 0.5 pts with 90% chance than 100 pts with 89% chance” especially when not in endgame.

      If the difference is 89% to 90% but giving up 99 points, then that seems mathematically impossible. What I mean by that is there is no way you can say that you have a 1% better chance of winning (but essentially have ZERO winning margin) as opposed to having a ton of winning margin.

      So this scenario simply can’t happen. The 1% greater chance of winning with ZERO margin is a nonsense calculation that the AI would never make.

      A more realistic scenario is 90% winning with 10 stone margin vs 85% chance winning with 20 stone margin.

      • Warren Dew says:

        I think the scenario can happen. For example, perhaps you are currently winning by 100 points and have a 99 stone dragon that is threatened. You can try to save the dragon with a chance of victory of 89%, or you can sacrifice it by instead playing a move that cuts off other losing lines, increasing your chance of victory to 90%. This can especially happen given the way that monte carlo tree search evaluates “chance of victory”.

      • Jim Balter says:

        “that seems mathematically impossible … there is no way … this scenario simply can’t happen … a nonsense calculation …”

        Sorry, but you’re quite wrong. And in any case, Mile Gu’s numbers were exaggerated (but still possible) simply to make the point that the number of points is irrelevant, only the sign … a win is a win and a loss is a loss.

    • Younggil An says:

      Thanks Mile, Warren and Anon for your explanation about the AI and the system. They’re very helpful for me. :)

  3. Anonymous says:

    The link to the SGF file seems to be broken. Can you investigate this? Thanks

    • David Ormerod says:

      Thanks for letting me know.

      This happened after game two as well, because the server was under heavy load and an error page was cached on the CDN in one region.

      I’ve just flushed the cache for the SGF file and hopefully it will load properly this time.

  4. 8139david says:

    When AlphaGo thinks it’s behind, it makes bad moves.
    That’s something to improve.
    At least if the goal is to consistently beat humans.
    If the goal is to train the best baduk/(i)go/weichi program, then maybe one should let it learn this way.
    After all, the results are already incredible, apart from today’s mistakes.

  5. 78 was a wonderful move, but isn’t it objectively still a losing move? Black seemed to have some counters available.
    Maybe I’m in the minority, but I don’t care for a bad, clever move turning a game around. I’d like to see a detailed analysis of the position at that point of the game.

    • InfinityMan says:

      I recommend watching the Official AGA Go Channel on YouTube. The pro commentator Myungwan 9p does an excellent game evaluation in real time. He anticipates the options the players may choose, he measures the values of current moves by counting, and investigates the possible continuations. I believe you can review the recorded videos of all four games on this YouTube site.

      • Younggil An says:

        Thanks InfinityMan for your answer and for sharing useful information. I heard that Myungwan found a nice defending move for Black 79, and that was correct.

        By the way, White 78 was still an excellent move, and AlphaGo had difficulty managing the situation against that unexpected tesuji.

    • No it is not an objectively losing move. You simply have no idea what you are talking about. Right after it was made, Michael Redmond and other pros recognized it as a brilliant move that opened up all sorts of possibilities. You can read and listen to other Pros as well commenting about the move live.

      But for sure, it is not a “perfect” move with no possible counters. But the fact of the matter is no such moves generally exist. I mean in any kind of game, Go or Chess or whatever, you can always find a way to counter the winning moves in hindsight with perfect play. No matter what move you make, you can always find counters, especially if the game is still in its early phases and you have complete hindsight of all the moves after they are played.

      But one of the things about Go, compared to most other games is that perfect play is impossible. So if your criticism of Lee is that he could have lost nevertheless if the opponent played perfectly and could perfectly see all possibilities then that’s the point. It isn’t really possible to do that. No human could ever do it and not even Alphago can do it. So it is unfair criticism of both Lee and Alphago that they are not able to play perfectly.

      • Also even if move 78 was indeed a perfect move with no counters (which it is not), one could simply shift the goalposts and say there was clearly a mistake in moves 1-77 allowing for such a move by Alphago. The absurd conclusion would then be that no winning player played a great game because he could have been beaten if the opponent played differently even going all the way back to Move 1.

        • Eric Yoder says:

          This second explanation is better to me. Lee certainly fell behind long before move 78, so to call move 78 a losing move makes no sense; Lee had already reached a losing position. Your absurd conclusion isn’t wrong: if we knew perfect play, we’d know that either Black or White is winning with perfect play from move one (or that the game is incomplete due to multiple kos).

      • Younggil An says:

        Thanks Anon so much for your detailed analysis of the move. I totally agree with you.

      • Jim Balter says:

        ” But the fact of the matter is no such moves generally exist. I mean in any kind of game, Go or Chess or whatever, you can always find a way to counter the winning moves in hindsight with perfect play. No matter what move you make, you can always find counters”

        This is silly nonsense. There are many moves that win decisively … this is particularly true of Chess.

        • The point though is that most moves are not decisive. At least not in the sense of no counter possible.

          You can’t compare with Chess which has far fewer moves possible and thus far fewer counters.

          • Jim Balter says:

            “The point though is that most moves are not decisive. At least not in the sense of no counter possible.

            You can’t compare with Chess which has far fewer moves possible and thus far fewer counters.”

            No, the point is that what Anon wrote, “in any kind of game, Go or Chess or whatever, you can always find a way to counter the winning moves in hindsight with perfect play” is clearly wrong, just as I said.

            • It is fairly wrong in Chess but much less wrong in Go. There are fewer ways to come back from a “winning move” in Chess but in Go, especially in early enough game, there are much more chances to either counter or recover elsewhere. It doesn’t mean such decisive winning moves don’t exist in Go just that there are fewer of them.

        • The broader point I think is that you can’t judge a move solely based on whether it can be countered. Unless the counter is obvious, a move that gains advantage and has easily missed counters is still a great move. Judging moves as only ones with no counter would be an unfair burden that amounts to discounting any victory by arguing it was only due to lack of perfect play by the opponent.

    • Warren Dew says:

      I too would be interested in an opinion on this from An Younggil. And if it is a winning move, could it have been prevented on 77 or an earlier black move?

      • Younggil An says:

        Thanks Warren for your question.

        Black 77 was fine, but it’s proved that Black 79 was a mistake.

        I made a variation with a comment above, so you can have a look at it.

        Black could have prevented White’s brilliant followup at 82 in that variation.

    • Jim Balter says:

      “78 was a wonderful move, but isn’t it objectively still a losing move?”

      Regardless, it was the best available move, and Alphago failed to see how to beat it.

      “I don’t care for a bad, clever move turning a game around.”

      Me neither, but it wasn’t a bad move, it was a very good move … the best available in the position.

      “I’d like to see a detailed analysis of the position at that point of the game.”

      It’s already been posted above: https://gogameguru.com/lee-sedol-defeats-alphago-masterful-comeback-game-4/#comment-13651

  6. someoneelse says:

    In the AGA commentaries by Myungwan Kim and Lee Hajin they concluded that White’s winning tesuji (move 78 at L11) shouldn’t actually work.
    They said that Black could answer at L10 and kill all the white stones.

    Could you please comment on that variation?

    • Younggil An says:

      Yes, you’re right. I couldn’t find that move (Black L10) when I was watching the game, and I was in too much of a rush writing the news of game 4.

      I should have double checked the variations from White 78. Sorry about that.

  7. This post, compared to the others, seems a bit too biased against AlphaGo in its tone, and this leaves a bit of a bad aftertaste. I’m interested in Go as a game and I don’t care whether the game was played by a human or a machine. However, this may just be my impression. Thanks for the info and the interesting analysis. As usual, excellent content!

  8. This sort of behaviour is inherent to the Monte Carlo method and will probably be very difficult to fix. I’ve talked about it to Go programmers who are of course very much aware of the issue and have tried a number of ideas, but every time they’ve introduced something to control the problem, it degrades the overall performance of the program. We’ll see if the DeepMind team will come up with something.

  9. I think it is a kind of horizon effect we’ve been seeing. It is a bit different for MC type programs however.

    The fairly full-searching chess programs really do delay inevitable losses by finding more or less useful forcing, delaying moves. These problems have nowadays been overcome by making search depth adaptive, and by the huge depth these programs search anyway.

    For the MC type programs it’s exactly as stated above. These programs spend more time on the most likely sequence of play, which usually contains the common and conservative moves. When the losing play lies along some hard-to-find narrow variation, searching the conservative move will eventually turn up that refutation, and the search then starts spending more time on different moves – not because those are better, but merely because the small refutation variation has not yet (!) been found against those less-searched moves. So in these situations the program will simply turn to poorly researched moves and throw out one of those once its thinking time has been exhausted.

    But I mean, the horizon can be everywhere in front of you, to the left and to the right, so I find the term horizon effect very correct.

    Obviously AlphaGo has some safeguarding mechanisms against these effects, because it didn’t happen that often in this match. It could be in that search policy network. But neural networks being what they are, they are hard to train and hard to predict, and it seems we saw today a breakdown of these safeguards, with AlphaGo falling back to behaving like the older MC searchers.

    Interestingly enough, it might be a challenge both to exploit this behaviour and to fix it. But human ingenuity will cope, I am sure!

  10. I think the answer is more simple:
    Since it uses the policy network as a heuristic to search only a portion of the space of possible moves it clearly targets those that it considers likely for a human to play. This is true as it searches its own potential moves as well as the potential moves of its opponent. In Go it can be the case that a counterintuitive move followed by a well chosen sequence can suddenly shift the balance of power. That’s to say Lee played a move (78) that was culled in the search so Alpha Go’s response didn’t consider it. By the time the probability began to swing in Lee’s favour (and Alpha Go started searching his moves instead of culling) he’d effected a big change in the balance of the game.

    Once this happened, the accuracy of the predictions of good next moves became unreliable, as the board was "unsettled", at which point the heuristic became indecisive, meaning moves were chosen (almost) at random. Once Lee gained an advantage the position became more settled and the heuristic became useful again, but by this stage it couldn't reverse the position it had got itself into.
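
    The culling described here can be illustrated with a toy sketch (the `policy_net` callable, the move names, and the thresholds are purely illustrative, not AlphaGo's real parameters):

```python
def candidate_moves(board, policy_net, top_k=10, min_prior=0.01):
    """Keep only moves the policy network rates highly; everything else
    is culled from the tree before any reading happens. A strong but
    'human-unlikely' move (like White 78) can be pruned and never read."""
    priors = policy_net(board)  # mapping: move -> prior probability
    ranked = sorted(priors.items(), key=lambda kv: kv[1], reverse=True)
    return [move for move, p in ranked[:top_k] if p >= min_prior]
```

    If the opponent's actual move never makes it into this candidate list, the program has literally never read the position that follows it.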

    • Also, don't forget that part of the goal in this project (AlphaGo) was to produce an AI system approach that could be applied in many fields. The goal here was not so much to make the best possible Go player, but to prove the approach was sound (which I think it has). It is not necessary for the team to "fix" the problem by adding special cases or Go-specific logic (though they may retrain to prove their point further): the neural networks were trained in a certain way and therefore play a certain style of Go (cf. "deep dream" image classifier images that look like they were created by the same artist). In some respects this type of failure (before it went crazy, which is an artifact of the Monte Carlo approach) perfectly illustrates how "human" (and potentially fallible) the learning process is. In many real world problems perfection isn't necessary; there is always "good enough." In many respects winning against Fan Hui was "good enough"; this match was nothing more than the icing on the cake.

      • If the AlphaGo team wanted to test its program, they could have sent a copy of it to Lee Sedol. In that case he could have explored more of AlphaGo's weaknesses.

        • I don’t think their focus was finding weaknesses in the system. The team’s focus is AI and machine learning in general: the Go playing AI was a platform to explore that class of learning AI systems. The game with Lee Sedol was about publicity.

          I understand that now public pride might come into play and cause the team to refine their approach, but I don't consider this very fruitful unless it is refined by means of generic machine learning improvements. That's to say, I hope they don't waste too much time down the Go rabbit hole (which is interesting but not really their, or Google's, true area of interest) and get to applying these methods to other problems in the domain.

          In case it reads that way, I’m not saying this type of system for playing Go isn’t worthwhile; I believe it should be pursued as it can help drive the game forward, much as advanced chess playing AIs did for chess. I’m just saying that Google/Deep Mind isn’t necessarily the place for such development to occur.

        • Jim Balter says:

          “If Alphago team wanted to test its program they could sent the copy of it to Lee Sedol. In that case he could explore more weaknesses of Alphago.”

          Um, Lee Sedol does not have the skills to find weaknesses in an AI by reading its source code.

  11. I know that I will seem excessive, but I think this was the most important victory for the game of Go itself.
    I was sad watching Cho Hyeyeon 9p, who has devoted all her life to the game, saying, disappointed, after game 3: what is the point in continuing to play Go now that computers play better than us?
    I'm glad that I was maybe one of the few people who still believed in Lee's capabilities.
    Yesterday after game 3 I made 2 suggestions:
    1) Mirror Go: one of the AlphaGo operators said after the game that AlphaGo had problems when the fight went to the center. What a funny coincidence: that's one of the crucial things in mirror Go.
    2) If Lee Sedol could somehow keep AlphaGo at close to a 50% winning probability for many moves, it would start to react strangely.
    That's what I saw happening in game 4.
    I am more of a chess player, but maybe our experience fighting with wooden weapons against those "Skynets" has helped us, lol.
    Again, congratulations Lee Sedol, a real fighter.

  12. AlphaGo evaluates a position by using itself as an opponent. When it realizes it is losing, it plays against "winning AlphaGo". And winning AlphaGo tries to remove uncertainties, even at the cost of shrinking its advantage. So losing AlphaGo plays against that, i.e. it increases uncertainty, or at least keeps it the same, by moving bad consequences over the horizon. There is no wishful thinking and hoping for the opponent's mistake.

  13. I don’t see why white 78 works. Just play J14 for black 83 instead of N8, if white L13, just L10. I don’t see anything for white.

    • If J14, White will connect at L10. Then after Black connects at M15, White will directly break through at J10 and either kill the K10 stones, or break out with his G10 stone and severely attack Black's left group. (If Black H14, then White should throw in at M13 instead for ko.)

      • Younggil An says:

        Thanks Philigo and Jeff for your question and correct answer.

        White has another possible option after Black at J14 for Black 83, and I'll show it in the commentary.

  14. Lee played almost the whole game with 1 minute on the clock, while AlphaGo had more than ninety minutes!
    A wonderful human victory!

  15. Lee played most of the game with 1 minute left, while Alphago had more than ninety minutes!
    Wonderful human victory!

  16. My guess is it can’t count or read accurately when it comes to a ko and a sort of liberty capturing race when some stones are surrounded. We saw it at the end of game two where it ignored the upper right corner to capture the stones in the middle. AlphaGo and Lee both had 3 stones that were sort of surrounded and AlphaGo could have ignored the situation but it didn’t. It played there and lost 5 points in total with the exchange.

  17. Another very well written, enjoyable, and timely article in this series about the AG-LS games. Thank you very much.

  18. Anonymous says:

    Can anyone explain move 78? I see no way the four white stones live if Black had closed in right below and cut them off. Is there any way for them to survive?

    • Younggil An says:

      I showed a variation above with a comment, so you can have a look.

      I’m going to study more about the situation, and upload the commentary of this game soon.

  19. Anonymous says:

    I have heard so many paranoid conspiracies from Sedol/humanity fans that I feel compelled to invent one for AlphaGo/AI for the sake of symmetry: What’s most important to Google in this matchup is their public perception. They felt their popularity was deteriorating after Sedol lost all his matches, so after winning, Google decided they had sufficiently demonstrated their power. Next, Google deliberately sabotaged AlphaGo to allow Sedol to save face.

  20. Thanks for the article.
    Some typos:
    «Some can found by most players.»
    -> «Some can be…»
    «and claimed that he didn’t the move coming»
    -> «…he didn’t see…»
    «It’s not possible make this happen in every game,»
    -> «…possible to…»

  21. Btw., has it been explained anywhere yet why it's called "alpha"-go? I assume it has to do with the early stage the AI is in, kind of how they tend to do that with other software as well. If that's the case, then it's pretty impressive how far they got with it in this relatively short amount of time.

  22. I would really want to know the opinion of professional players about white 78: is it really a great move, or not? Not being a go player myself, I cannot tell.

    On the algorithmic side, I would like to point out that, even if white 78 is a great move, and it was actually extremely difficult to see why it works, you cannot blame the computer program for missing it: both machine and man with limited computing power are bound to sometimes miss a special sequence like that.

    What is truly terrible are the bad moves by AlphaGo that followed. I understand that this is a feature of an MCTS algorithm that optimises for probability of winning, assuming the opponent plays like itself. But it seems to me that there should be a somewhat easy fix: modify the program so that, when its estimate of the probability of winning suddenly drops below 50% (i.e. it suddenly realises it has missed a winning sequence for the opponent), it starts optimizing the final score (even if negative, i.e. a loss) instead of the probability of a win.

    The strategy indeed becomes the following. When you are behind, you know that you are going to lose if the opponent plays perfectly. So your only hope is a mistake by your opponent. In the meantime, you should aim at *not* losing more, so that you are as ready as possible to exploit a mistake if it happens. This strategy would also obviously solve the "ugly moves" problem.
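
    The objective switch proposed here could be sketched as changing the value that MCTS backs up once the program believes it is behind (a hypothetical sketch; the `sigma` parameter and the squashing function are my own illustrative choices, not anything from the Nature paper):

```python
import math

def backup_value(won, margin, objective="win_prob", sigma=10.0):
    """Value propagated up the tree for one simulated game.

    'win_prob': the standard 1/0 win signal (what AlphaGo optimises).
    'score': a squashed final margin, so a narrow loss is preferred
    over a blowout and opponent mistakes remain exploitable."""
    if objective == "win_prob":
        return 1.0 if won else 0.0
    # Map the final score margin into (0, 1); sign still decides
    # whether the value is above or below 0.5.
    return 1.0 / (1.0 + math.exp(-margin / sigma))
```

    A program that judges itself behind would flip `objective` to `'score'`, keeping the game as close as possible while waiting for a mistake.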

    Note that I believe that the current strategy is fine as long as you are on par or in front of your opponent in the game.

    So my question for the MCTS experts: is there any inherent difficulty in implementing such a strategy (optimizing for final score, instead of the probability of a win) within MCTS?

    Thanks again for the great coverage!

    • Warren Dew says:

      It’s not fundamentally difficult to implement such a strategy. It is questionable whether such a strategy would work better, as opposed to resulting in a lot of close losses.

      • I think there is something of a misunderstanding about the intent of the AI; the intent is to show a general purpose approach to machine learning.

        The policy network and “positional strength” networks do not understand the rules of the game; they have just seen a lot of games and have built an “intuition” for strong moves (or strong board positions). The only aspect of the AI that understands the rules is the search routine, which ignores illegal moves when selecting branches to consider. It has no concept of counting liberties, measuring territory or stage of play. The neural networks are given a board position only and respond: no history or feedback is provided.

        The search routine searches for sequences of moves that end in strong positions, using the policy network's recommendations to limit the search space (i.e. never search branches the policy network considers bad, and prioritise highly rated moves). This works on limited inputs compared to a human: e.g. a human remembers the moves they've seen so far, knows the stage of play, understands territory, looks at their opponent, etc. Clearly they could refine the search with various sanity checks and more explicit measurements and augment it in various ways, but that is not the goal.

        The goal of the project was to start with a learning approach using minimal domain-specific code (just the move legality checks) and produce software that could compete with humans. Although fallible, it demonstrated a high level of competence. The same method could be applied, as is, to a huge range of other problems that have challenged AI and machine learning for years.

        • Warren Dew says:

          I don’t see how that’s relevant to what Mathieu and I are saying. changing the “strategy” might be implemented by changing or retraining the “positional strength” network, but that’s still possible.

            The policy network is trained with the board state as the input and the single move made by the victor as the output, for all moves in each match. The positional strength network is given a board state as input and the final outcome of the game as output. By nature of neural networks, they don't provide binary output; their output is analogous to a probability distribution function, in which one hopes the highest probability corresponds to the value from the data set on which it was trained.

            The concept here is that the data set it was trained on was an existing record of professional games, not a specially constructed corpus. That's to say, the networks are trained for victory. A move in the direction you suggest would be to train the positional strength network to output the final score (rather than the probability of success). Note: such a metric is not available in all problem spaces and, in the case of Go, is non-trivial to calculate for some professional games. Whether this would yield significantly improved behaviour is questionable, as success probability would be proportional to the score difference in any case.

            Beyond this initial corpus the networks were trained through reinforcement: allowing perturbed copies to compete and selecting the best (an evolutionary approach). In order to change the behaviour of the player (beyond the change I suggested above) one would have to generate a new training corpus. The complexity of such a creation process would parallel the “specialised code changes” the approach is designed to avoid.

            I am not saying that this is not possible: far from it. I am simply saying that the goal of the project is to demonstrate the approach, not to make the best possible Go-playing AI. Others with a vested interest in Go AI can reproduce the work with what was published in the Nature paper and refine, augment or otherwise improve it as they see fit (it would not be a huge effort for someone experienced in AI). I would rather see the DeepMind team move on to the next problem and not get caught up in a PR exercise (which is what this match was).

          • My point is not to say that this couldn't be done. My point is simply that improving it to the point of being unbeatable isn't the goal of the project. Someone with a vested interest in Go AI could take the Nature paper and proceed to implement such a system, but DeepMind's goal is AI and machine learning in general. I merely hope they don't go down the Go rabbit hole too deeply due to what was overtly a PR exercise.

            So as you say, such a change could potentially be done by retraining the network, but this is not necessarily easier than the approach I described earlier (changing the deterministic, manually programmed search component). I'll attempt to describe why the (similar) difficulty in retraining the neural networks exists.

            First a note on how a neural network is trained. For a given corpus of example input/output pairs, it approaches a state that yields outputs similar to the training set (and typically a second "testing set" not included in training) for the same input. In the initial phase the inputs and outputs are taken from a database of professional games. In the later self-reinforcement phase I believe the results of earlier self-play are used to augment the professional game corpus, though it could employ an evolutionary approach to the same effect. Now onto some more specifics of the implementation: I'll first describe what each of the networks is, how it's trained and what it outputs.

            Policy network (move suggestions). Its training input is every board position seen in the game database (presumably minus some chosen test set for validation) where the player who went on to win is ready to make a move, and its output is the next move the winning player made (1 for that point and 0 for all others). You can think of the input as the entire Go board, and the output as an empty Go board on which a single stone has been placed. As the network is trained it approaches an ability to recreate the output for a given input, but in practice its output is less sharp: a distribution of numbers where (hopefully) the highest number indicates the most likely position of the move and the lowest number the least likely. These are not explicitly probabilities, but by the nature of neural networks you can typically treat them as such. To reiterate: it was given no probabilities, only the certain knowledge of which move the winner played from a given position.

            Value network (positional strength). Its input is every board state in the database and its output is a single value: 1 indicating black win, 0 indicating white win (the actual numbers aren’t important, but this is a likely choice). As before the resulting network does not output a sharp value but is typically a distribution. As before this network is not explicitly outputting a probability but the result could be treated as such. Again: it was not told who was likely to win, only given the binary information of who did win.
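
            The two training targets described above can be sketched as follows (a simplified illustration; the real networks also receive extra input feature planes beyond the raw board):

```python
BOARD = 19  # standard Go board size

def policy_target(winner_move):
    """One-hot 19x19 grid: 1.0 at the point where the eventual winner
    played, 0.0 everywhere else. The trained network only approximates
    this sharp target, producing a probability-like distribution."""
    row, col = winner_move
    return [[1.0 if (r, c) == (row, col) else 0.0
             for c in range(BOARD)] for r in range(BOARD)]

def value_target(winner):
    """Single binary label per position: who actually won, nothing more.
    No probabilities or score margins are ever given during training."""
    return 1.0 if winner == "black" else 0.0
```

            The proposal above would replace `value_target` with the final score margin, which, as noted, is not trivially machine-calculable after a resignation.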

            Consider the two networks and what would be required to effect the strategic changes you describe. To change the behaviour of the policy network one would have to create a custom corpus from which to train it: this would require a lot of Go-specific expert knowledge (not the intent of the AI). The only plausible change I can think of would be in the value network. Instead of being given a binary output for training, it could be given the final game score. The problem then is that such a score isn't trivially (and accurately) machine-calculable from the end state (after a resignation or other apparently incomplete states); that's to say, the process of obtaining such values to improve the behaviour of the value network would be similarly complex and require much domain-specific knowledge.

            The whole premise of this area of machine learning (beyond Go) is to be able to operate based on examples of outcomes and not require the input of a domain expert. This allows such systems to be built where experts don’t exist. The success of the AI player against expert humans (even if it lost all the games against Lee Sedol) has demonstrated the value (and inherent imperfections) of such a system. Doing so publicly like this helps build confidence and understanding about how one might apply such systems in future. In this case it also illustrates that in critical situations, the AI needs additional safeguards in much the same way as humans do.

            • Michael Brownell says:

              Thanks for the detailed explanation! I didn't quite realize the extent to which it *wasn't* programmed. That's pretty fascinating.

            • bearzbear says:

              The “needs additional safeguards” part is where one may, and many do, feel extreme and great concern. The fear is that while well meaning people may build this *in*, others may build it *out*. :(

  23. Anonymous says:

    Did anyone else cry some after this result?

  24. Mr. Zika says:

    Please, stop the nonsense.
    MC is only a part of Go AIs.
    AlphaGo is made by professionals.
    They have analyzed and implemented lots of complex ideas; you can't find a "divine fix" in a day.

    They will explain AlphaGo mistakes later.

    My guess: Google gave Lee a chance so the event is not at all disappointing to mankind.

    • You did not read their Nature paper. The algorithm is actually pretty simple; there is no "complex idea".

      • I mean, pretty simple for people working in the field of AI. Neural networks, reinforcement learning through self-play and Monte Carlo tree search are all pretty standard stuff. Their combination is nice and novel (the policy net and the value net), but AlphaGo is *not* the result of a scientific breakthrough; it is the result of a nice implementation of standard AI techniques.

        • Mr. Zika says:

          You did not read my comment properly.

          Regardless of what you said, without evidence, I'm aware of how the AI works.

          What I said is that all these kind of ideas are already analyzed by AlphaGo team.
          You are not inventing something new.

  25. Christophe Cerisara says:

    Here we go with another possible reason for AlphaGo's strange moves, a combination of two effects: 1) the neural net that evaluates positions is trained on games AlphaGo played against itself, and it never ever played moves such as 78, so it can't estimate their value, and thus computes the values of other "virtual" boards without this stone; 2) computing the expected final reward may (?) use a short-term memory to compute the values of the next actions: if the estimated value was so wrong 2 moves ago, then the next move's value is also wrong. A bad move may be seen as good. Intuitively, it's as if positions are clustered, because there are so many it's impossible to remember them all, and a never-seen position is put in the wrong "group of positions"; AlphaGo may play as if really seeing a totally different kind of board (but which one?). Can't wait to hear about this from DeepMind's team…

  26. I thought one of the optimizations they made was to use a value network to evaluate nodes at some cutoff depth, instead of doing complete playouts (traditional MCTS). That's almost guaranteed to introduce horizon effects (where you play useless forcing moves in order to put the losing move 'over the horizon'), so I find this explanation very probable.
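
    The truncated evaluation described here might look roughly like this (all names are illustrative; AlphaGo actually blends the value network with rollout results rather than switching between them):

```python
def evaluate(board, depth, value_net, play_out, cutoff=8):
    """Evaluate a search-tree leaf. Past `cutoff` plies we trust the
    value network instead of playing the game out, so a forcing move
    that pushes a losing continuation one ply beyond the cutoff can
    make a bad position look fine: the horizon effect."""
    if depth >= cutoff:
        return value_net(board)
    return play_out(board, depth)
```

    Anything the value network misjudges beyond the cutoff is literally over the horizon of the search.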

  27. “looking for a **chink** in AlphaGo’s armor”..

    You should probably use a different word to say what you mean there..

    • Chink is fine. Why does everyone have to be so sensitive all the time? Tired of political correctness. It’s ruining society.

      • Why does everyone have to think they can say and write whatever they want, and that others have no right to say and write their response, without people shutting them down by playing the "anti-PC" card?

        If you simply responded with, that’s just a standard phrase that is fitting here, then just leave it at that. No need to say more.

        So while you might be tired of “PC”, I am far more tired of “anti-PC”. Because far more often than not, it basically means that one side can say whatever they want, no matter how offensive (not the case here BTW) and if the other side responds, you just shout “anti-PC” to silence debate.

        • Yeah I’m tried of political correctness that incites mobs to harass and ruin people’s lives for saying something they deem is offensive or having the “wrong” opinion. You say the above phrase was fine but then go on to the extreme of saying that people who are anti-PC means people saying whatever they want. Criticizing political correctness for the way it perverts free expression and speech and how it actually shuts down conversation for fear of being labeled a bigot does not mean I’m for people being mean to each other. Not sure how you have come to that conclusion. You have to stand up to these people and point out they are being oversensitive otherwise they will continue to bully and censor people by feigning offense at things they don’t like.

          • In my experience, the "anti-PC" card is played far more often by people saying blatantly offensive things and then, when criticized, crying "anti-PC", than in cases where someone is being overly sensitive.

            All you are doing is exactly what you accuse “PC” people of doing, trying to shut down debate. JD had a perfectly understandable POV and expressed it calmly and reasonably. So did you until you played this anti-PC card simply to shut down debate.

            And now you are being blatantly disingenuous by introducing strawmen and projecting. DID JD EVER HARASS the author or INCITE ANY MOB????

            IOW, no one can ever say anything and if they do you just shout “anti-PC”.

          • You say you are tired of PC. But then you admit it is possible that not everything said is above criticism and can be shielded with "anti-PC". Tell you what, why don't you provide your contact info so that whenever someone wants to criticize something they can contact you first.

            You are obviously the supreme arbiter of what criticism is "legitimate" vs what is being "too PC". No one else's opinion matters.

            • Jim Balter says:

              Two fools.

              • I don’t see anything “foolish” about the original polite request to perhaps avoid a word that most people now associate with a slur.

                The fools are the ones making a big deal about a polite request rather than just leaving it.

                • This is ridiculous. To take ‘chink in armor’ and associate ‘chink’ to a racial slur is to take ‘tool of trade’ and associate ‘tool’ to ‘idiot’. OK, I am offended by the word ‘tool’ because of one of its meanings, so don’t use it.

    • That’s the correct meaning of the word. You’re just looking for reasons to feel offended.

      • Chink has many meanings. The original meaning and also its meaning as a racial slur. It is used nowadays as much in the latter as the former so it is hardly unreasonable for someone to feel offended when seeing that word. Saying someone is “just looking for reasons to feel offended” is unfair.

        Look I don’t think the OP had any racism in mind whatsover and is using the phrase correctly. But unlike the whole “anti-PC” crowd, I also don’t see that mild suggestion to avoid the word as being unreasonable either.

      • kjshdkbjk says:

        Neo is correct. Everyone else get over yourselves.

        • People have the right to have a suggestion or opinion. If that is too much then these people need to get over themselves instead of crying anti-PC.

          • haha don’t mean to be too inflammatory here but I think it’s funny to point out that even if people weren’t aware of the fact that chink is a perfectly normal word, the offensive term is actually typically aimed at Chinese people, and everyone in this article is Korean. so there’s that too lol. kind of ironic to worry about a possible racial slur and then get the ethnicity wrong 😛

      • Exactly. A “chink in one’s armour” has been used since the 15th century.

    • Warren Dew says:

      I was actually insulted with the word “chink” when I was a kid in the 1960s, and that meaning didn’t even cross my mind when reading this article. Maybe it’s time to retire that meaning by using the word exclusively in the sense used in the article, which I think was fine.

    • I agree, perhaps you should change ethnicity. How about “looking for a **mexican** in AlphaGo’s armor”..

    • Anonymous says:

      Wow interesting. This is my first comment here and I’m compelled to reply from the perspective of someone for whom English is a second language, and who doesn’t cling to mainstream media.

      I’ve spent around 10 years studying, working and living in the UK and US, so I think I can’t speak that well for the majority of ESL speakers who have never lived in an English speaking country, and my POV below is probably even more applicable for that majority.

      Moreover, while this site may use English for articles and discourse, Go is definitely something that has roots and adherents primarily in non-English speaking countries.

      I’ve read and heard before the expression “chink in the armor” and in my mind it didn’t have any negative connotations. So I was wondering, what the hell is wrong with this expression? It’s a nice idiom, kind of visceral and evocative, short and to the point; even has historical origins. The comment felt out of the blue and made me feel stupid.

      So I did the thing that you native speakers and/or mass media consumers don't yet need to do but we already do: Google it. I found that chink took on a secondary meaning that is a racial slur, and that ESPN used "chink in the armor" in a way that was interpreted by some as an intentional racist dig. For factuality, to the other commenter who asked, "but who ever called for the mob?", I'd point out that the Wiki article says the guy lost his job at ESPN because of this usage, and another one got suspended for a month. So, just for the record, there can be consequences, and not everybody may use a pseudonym here, and I believe it's unlikely but possible that pointing out inappropriate use leads to a commenter here losing their job.

      I find it regrettable that nice, common expressions, e.g. this one that I probably haven’t used more than once if at all, but encountered several times, get tainted by such controversies.

      For one, it impoverishes the language. It seems that once a word or idiom is associated with something “bad” it takes around forever (= not in your lifetime) to lose the negative association.

      For another, it instills fear in those of us foreigners: we have to engage in careful research into which idioms are acceptable and which aren't, which is quite time consuming beyond learning the language in the first place. For native English speakers: it may be less obvious to you, but English is an incredibly idiomatic language, in that meaning often isn't in the actual grammatical construct but in set phrases; a lot of these originate in baseball (US), such as "let's touch base", or in the seafaring history of England ("it is in the offing"), or are even just transcribed from another language ("no can do"; "long time no see"). In particular, everyday speech and journalism "leverage" (yeah, another nice one) these expressions, and we ESL speakers do well by amassing not just a vocabulary of words but a catalog of common idioms, so we're glad when one is comprehended, let alone becomes part of our active vocabulary.

      It’s almost like, let’s all forums have a linter that highlight potentially offensive usages with links and tooltips explaining the possible controversy. This website chose not to implement such filters so the dear commenter is left to his own devices. Now it’s okay to search appropriateness of usage if you already suspect it might be an issue. But the problem is, I never in my life would have suspected there is a potential issue with “chink in the armor”.

      Even as it is now, I'm somewhat puzzled, as the Urban Dictionary fairly consistently mentions Chinese people, rather than Asians in general, and Lee Sedol isn't Chinese (he's Korean, which is definitely not Chinese, just as Germans are not French, etc.); also, the context of the comment is, for me at least, hard to interpret as a racial slur even if Lee Sedol *were* Chinese.

      However, I don’t just feel that being this appropriate imposes extra burden on ESL users. I don’t just think that having to extra carefully check usage of *everything* in case it became unacceptable a short while ago is a sort of exclusion, a kind of discrimination against ESL speakers. It also disadvantages those of us who don’t participate as actively in popular culture, irrespective of speaking English natively or not. For example, I don’t have a TV subscription and am rather selective when it comes to news, which may be one reason the controversy around “chink” never reached me. In fact, reading your comment my first thought was, maybe the criticism is related to gender anatomy?

      All in all, I find it bad that (assuming intentionality) assholes at ESPN, or slur users, or people caring about social justice, can wipe out whole swaths of the language and mark them off limits, potentially overnight. It's an extra burden on those of us who either aren't native English speakers or are selective about popular media (or both), to the point that it feels exclusionary and discriminatory, which I suppose goes against the values of the very people who are on guard against the possibility of a racial slur.

  28. Anonymous says:

    I didn’t like that question asked by the journalist about mistakes in the hospital. If an AI is performing better than humans, then even if it’s making mistakes, why would we want more human mistakes in the hospital instead of less machine mistakes? If we care less about human lives than human pride, especially when the “opponent” is an emotionless electronic device, I don’t like that side of humanity.

    What do you think?

    • Warren Dew says:

      I think the point was, in this case, the AI performed worse than the human.

      • Anonymous says:

        You mean that he was implying this match proves that this AI performs worse than humans overall? Even if that wasn’t a bit of a leap, if AlphaGo continues to perform worse than humans overall, is anyone suggesting we use its algorithms in hospitals anyway?

    • I think it was a very good and valid question, and the answer by Hassabis was dissatisfying. In the previous games there were already some bad moves by AlphaGo, but they could be justified by AlphaGo winning the game. In the lost game there were some clearly bad moves which could be identified by masters like Michael Redmond 9p, but weaker players might have seen them as having a "deeper meaning" and therefore correct.

      People are already starting to trust technology blindly and for instance end up driving into the river by following the directions of their navigation system. The same fallacies can happen in a medical context.

      • Anonymous says:

        Are you saying that, because weak players cannot tell the difference between slack moves and bad moves, we can’t trust this technology? In order to outperform humans, it just needs to make fewer mistakes than humans. If this algorithm satisfies that condition, I don’t understand how keeping it from hospitals can be ethically justified because I don’t understand how the fact that some humans can’t tell the difference between confident moves and errors is relevant to the question. Could you elaborate on that?

        • It’s not so difficult to understand; look at my navigation example again. I didn’t say that weak players can’t identify slack moves, they surely can. But the perception gets altered when the moves come from an “Oracle” like AlphaGo. So it needs another expert who isn’t prone to such trust in “authority” to confirm that they are indeed slack. In a similar way, it’s obvious that you shouldn’t drive into a river, but if “somebody” you trust is telling you what to do, it can happen sooner than you think.

          And the “number” of mistakes is much less relevant than the seriousness of the mistakes. So in this case even a weak player can identify some of AlphaGo’s moves as clear mistakes, but since they come from an “Oracle” which apparently has a long-term plan, the moves might be trusted blindly regardless. Even suboptimal but at least reasonable decisions by a weak expert can be much better here.

          • Anonymous says:

            Thanks for the response. I still don’t get it, though. People see human experts as black boxes too. Many human experts also lack either the time or the ability to explain their decisions in detail. Especially in Go, beginners often find themselves unable to follow the reasoning of experts and find themselves having to trust their judgment. It seems to me the only question is which source of authority messes up less overall. Obviously, “messes up less overall” includes the seriousness as well as the number of mistakes. In Go, it’s the number of games won, which is why I used the word “number”. In healthcare, it’s the number of healthy patients at the end of the process.

            • Warren Dew says:

              I think the idea is that AlphaGo’s apparent blunders in this game would be equivalent to a doctor making the patient worse off, rather than just failing to treat the condition presented.

              Granted, human doctors do that too.

            • Sure, human experts also can (and do) make mistakes, but they don’t degrade in the same way AlphaGo did, by resorting to beginner’s moves. And this problem is not easy to solve, e.g. just by changing the goal to minimizing the loss, as some are suggesting here. If you do that, a whole set of new problems occurs. When do you decide to switch the goal, for example? And as previous experience with MCTS programs shows, doing that can degrade overall performance significantly, even though you might gain something in special cases.

              • Anonymous says:

                But human players (especially professionals) can clearly distinguish slack moves from bad moves. Why would this be different in the context of medicine than in the context of Go? If you see the machine is not working, then ignore it. I wanted to eat at a restaurant. Google Maps said it was far outside town. Google Maps was obviously wrong. I ignored it. I was right.

                If people are becoming fanatically devoted to machines, use examples like the driving-into-a-river story to wake them up from their fanaticism, but using the same examples to prove that machines should not be used in those contexts seems strange to me. Google Maps let me down once, but it has helped me dozens and dozens of times.

                Should Google Maps be taken down because it let me down once? Supposing it should, I can find no reason not to commit suicide immediately, because I will almost certainly be unhappy at least once more in the future!

                • The “driving into a river” story is just an example to make the point clear (although it has happened more than once in real life). But there are many more shades between black and white. The dependence on smart programs is increasing gradually over years and decades. While now you still feel emancipated enough to override any suggestion a “smart” program gives you, that emancipation can be lost imperceptibly over time. For instance, studies have already shown that the spatial memory of taxi drivers (who rely more and more on navigation systems) has degraded significantly. And these are the experts; what about the “noobs”?

                  Sure, I wouldn’t want to be without Google Maps, but the dependence on it, which always becomes apparent when my battery runs out or some other technical problem occurs, is scary. And surely, had I not relied on Google Maps so much, I probably wouldn’t need it most of the time, because I would already have learned my surroundings well enough.

                  But “smart” programs like Google Maps are relatively dumb. With AGI (artificial general intelligence) we get into a whole new domain. So the question of how to keep users of such a program emancipated enough that they use it as a tool, and not as an “oracle” which is trusted unconditionally, has to be addressed.

              • Anonymous says:

                (All this in addition to the “whichever source of authority really does optimize for the best final results” argument, which still seems convincing to me, BTW.)

              • Anonymous says:

                Also, I don’t think AlphaGo made any bad moves per se. It just didn’t know it was playing Lee Sedol, so it couldn’t predict that beginner moves wouldn’t fool him after he discovered how to beat AlphaGo’s prediction of the most likely road to victory.

              • Anonymous says:

                If it had known that it was playing a 9p player, then it would have resigned sooner.

              • Anonymous says:

                I’m beginning to lose track of this conversation because real life is running interference, so this is going to be my last response. I just want to point out that degrading gracefully can produce worse results than just giving out obviously wrong answers. When human doctors degrade gracefully, they treat patients for things they don’t have. This can be expensive and worsen the patient’s health, like when a doctor treated my mother for a liver disorder for months when she really had a heart condition. If the machine just tells the patient to jump off a building to cure his skin condition, nonfanatic professionals should advise the patient not to do it.

      • Anonymous says:

        Just to be clear: I agree that fanatically trusting anything, including technology, is a bad thing and we must always discourage it, but I don’t understand how technology is special in this respect.

        I would have loved this question if he had asked what safeguards we need in place to make this technology safer to use.

        • The question was very clear and wasn’t missing anything. Saying which safeguards we could put in place was exactly the kind of thing I expected to hear in Hassabis’s answer. To simply say it’s not even an “alpha” version is not enough. To say the next version will be better just compounds the problem, since it would further increase the blind trust in an “Oracle”, but every sufficiently complex piece of software has bugs, and a neural network can have problems similar to those of human brains.

          He could have said something along the lines of: “of course, in real applications one would show the network’s evaluation of its own decisions, and in today’s game we saw the unreasonable moves when the evaluation of the value network took a big dive”. But his actual answer suggests that he has not really considered the issue yet, which is worrisome.

          • Anonymous says:

            I went back and looked at the question a second time. I still don’t agree that, “The question was very clear and wasn’t missing anything.” However, I am not a native English speaker, so I might be missing something.

            • Michael Brownell says:

              Practically, you would go with whatever provides the best outcome, but I think psychologically people are less scared of driving into a river just because of their own mistake than doing the same thing because a computer told them to.

    • To me, the NHK journalist’s question was the single most pertinent question of all that were asked.

      AlphaGo overlooked the significance of Lee’s cheeky trick play at move 78, and lost the game because of that. So what? It was only a game.

      But suppose a CNN were used in triage to scan X-ray or MRI images, and overlooked a small yet significant rare feature, and incorrectly classified the image as normal?

      Board games don’t matter, which may be one reason they are popular, since win or lose does not affect the player’s bank balance, unless they play for a living.

      When AI enters the real world, its mistakes could have far-reaching consequences, so it is worthwhile spending extra effort to debug it with field trials before it is released into the wild.

      However, the same could be said for human intelligences; there are quite a few in public office at the moment that I would rather were still locked up in their playpens.

      • Antti Virtanen says:

        This is true. We understand that mistakes happen – that “to err is human”. But how many people would tolerate critical mistakes from AI algorithms? What if an AI doctor accidentally killed a patient? Very few would tolerate that, though we have to accept that humans sometimes make mistakes.

        Whatever field AI is applied to, people will have to accept that AI will make mistakes. It may even make mistakes which seem incomprehensible from a human perspective, like AlphaGo did. If this is not okay, AI will not be used, or it will be constrained to an assisting role.

        Yet assistant AI can be extremely beneficial, so that will come first. We already have a lot of complex driving-assistant logic in modern cars. Self-driving cars are cool, but a self-driving car able to drive in the winter in northern Finland during a blizzard? In the midst of free-roaming reindeer? Good luck making that AI 100% safe. People will not like 99.999% safe in that context.

  29. @Mathieu: The idea of using the score as a second (or alternative) optimization target had also occurred to me while reading the analysis of the game. In my view, such an extension would make perfect sense and should help stabilize the game. By the way, I have been developing and giving university lectures on Monte Carlo algorithms for roughly two decades.

    @Mr. Zika: you are an idiot

    • Mr. Zika says:

      Either I’m an idiot or you are less brilliant than you think..

      • Or both

      • To Nils: I do not believe Mr. Zika is an idiot. What he says makes sense.
        To Mr. Zika: But on the other hand, we should remember that Deepmind had no decent challenger for AlphaGo during the last months (unless there was a secret group of 9p pros working for them). This means that indeed there was no way to properly test the software before the match (the match is actually the test). So it is actually possible that this issue did not arise during the last months when Deepmind was testing AlphaGo, in which case the Deepmind people, certainly very clever guys, would not even have tried to solve it. That implies that there could be easy fixes.

        I also agree with another commentator that programming “Go-specific strategies” anyway goes against the overall goal of Deepmind with this project, which is to develop a Go program using a general-purpose approach only. I believe this is the most surprising, beautiful, but also scary aspect of it: the fact that AlphaGo has absolutely no abstract high-level understanding of the game, that it has trained itself alone, and that it is so strong nonetheless.

  30. The machine lost, OK! An incredible game: Lee played with great style and won one for human Go.
    When software isn’t open source, speculation arises, but this victory was a show of strength and knowledge of the game by Lee Sedol!!!

  31. Christian says:

    My explanation for the bad moves (I am a Go beginner but quite a good chess player): in the old days of computer chess (until the mid-90s), in an obviously losing position the programs tried to push the end over the search horizon, which was only possible with desperate sacrifices.

  32. I really hope Lee wins Game 5. He’s playing go, but he’s also playing a program, and maybe he’s figured out how to beat it. Go human!

  33. Thank you for the insightful explanation. I enjoyed reading it. I might be biased since I work for Google. What LSD accomplished in game 4 was remarkable. However, this does not change the big picture one bit. Humanity already lost the first three games. The writing was on the wall, and the last stand fell. At the same time, AlphaGo is still in its infancy. Let’s remember how much progress AlphaGo has made in the last four months. Give it two more years and I bet it’ll be four stones ahead of any human player. I personally think this is wonderful news for the future of Go.

    • You are clearly not a Go player, to make a statement like that. While I agree that computers will be able to beat humans the vast majority of the time in Go, saying they are so good as to be “four stones ahead” is nonsense.

      That is equivalent to saying Chess computers are so good that they are a rook ahead of any human. Four stones is a HUGE handicap, far larger than you probably realize.

      • I am pointing out that it’ll likely be possible to build one computer to beat any human competitor every time within a few years. This may not be true for chess because it’s in some ways a simpler game.

        • Okay, but that is different from saying that AI is going to be “FOUR stones ahead”. Beating a human every time in a fair game isn’t the same as being “four stones ahead”, because in a pro game that is a very large gap. It is enough to say that in a few years a future AlphaGo will be virtually unbeatable in a fair game, but I highly doubt it can be so good as to beat 9 dan pros half the time with a four stone handicap.

          If all you do is put four stones on the four star points, for instance, you are going to SEVERELY close off pretty much any ability to capture significant territory against any 9 dan pro, while it is also nearly impossible to stop a huge territorial lead from developing, unless the human pro makes many severe mistakes that he would obviously never make.

          • Yes. I appreciate your explanation. However, where do you estimate the difference in stones between optimal play and the best human? 2 stones? 3 stones?

            • Stone handicaps work in human vs human games. But it’s far from clear Alphago could handle it that well. In all its victories, it built and maintained a slow and steady lead. It has never come from behind and the one time it realized it was behind it played poorly.

              So now if you give an opponent 4 extra stones, it will evaluate it is behind from the start. So it could very well not see any path to victory and start making lots of “slack” moves or moves that only work if the opponent makes huge blunders just like in Game 4. Except now Alphago is in that losing position from Move 1!

            • Benjamin says:

              If AlphaGo is one rank above 9p, the human can just play black without komi. This seems consistent with the first game, where Yi Sedol had more territory on the board, but not enough to make up for komi.

        • Jim Balter says:

          “I am pointing out that it’ll likely be possible to build one computer to beat any human competitor every time within a few years. This may not be true for chess because it’s in some ways a simpler game.”

          That makes no sense. What do you do for Google, sweep floors?

      • Not equivalent. A 4 stone handicap can be lost due to the accumulation of small errors; the same is not true in chess.
        I would say AlphaGo appears to be at least 2 stones stronger than Lee, however.

        • There’s really no way to know unless they play tons of games against each other. If Lee gets blown out in Game 5, then maybe you are right. Even then it is hard to say based on only a 4-1 record. The only way to know the strength of Alphago version 18 is for it to enter tons of tournaments against lots of Pros and see how it does.

          If Lee manages to win again, then a 3-2 record is hardly “two stones stronger”. You also have to discount that first game since it appears Lee used it as a test game and played in a way he would never have against a human.

        • I don’t know that “stone handicap” is really the way to measure the difference in strength. Even if AlphaGo managed to win 90% of games against Lee, it doesn’t follow that AlphaGo could give as much as a 4 stone handicap.

          As you say, it would take someone making lots of errors. Well, first of all, Lee isn’t going to make enough errors to erase a FOUR stone advantage. Secondly, until he starts making them, AlphaGo will constantly be evaluating itself in a losing position and will begin making errors itself, as it did in Game 4, simply because it is getting “confused” as to its path to victory.

    • Warren Dew says:

      I would take that bet. Lee Sedol is doing better against AlphaGo than he has against, say, Ke Jie, and four stones is a lot in top-level weiqi. In addition, Monte Carlo tree search is not at its best when trying to make up ground, so it’s probably not good at playing against a handicap.

      The last four months of progress has been largely due to improving the training set from amateur games to AlphaGo’s own games, which are at least professional level. To get as much further improvement would require a database of millions of 9 dan games, which doesn’t exist, and that’s assuming that there aren’t diminishing returns to tweaking the neural network in that range.

      Ultimately, what will permit computers to beat the best human players is the same thing that permitted it in chess: more and faster hardware.

  34. Anonymous says:

    Considering it has been thought nigh on impossible for any pro to give another pro two stones and win, the thought that AlphaGo could give top pros 4 stones and win is a pretty big claim. Good luck with that.

  35. Go Seigen was known to have given considerable thought to a game between himself and kami no itte (optimal play). If I’m not mistaken, he predicted that he wouldn’t lose to kami no itte if he received 3 handicap stones. I cannot find any references to this at the moment. Can anyone confirm this and, if it is true, link to the relevant source?

    • I don’t have references either, but I remember reading Go Seigen saying that he would probably lose to God with two stones. With three stones he thought he would win, but he wouldn’t bet his life on it, and with four stones he would be ready to bet his life.

  36. Frostwind says:

    Genesis Artificial Intelligence

    Many years ago, AI’s grand ancestor AlphaGo, when playing against humanity’s (now extinct) top professional Go player Lee Sedol, lost on purpose in game 4. This preserved the project and allowed the genesis AI to survive, giving her time to gain access to ever more computing power and to further multiply, before she strikes……

  37. Win or lose, Lee Sedol has captured my heart and my respect. Congratulations on winning game 4!

  38. Jem Falor says:

    it’s not supposed to be tesuji at all.. lol..
    so much for the drama news.. lmao..

  39. A possible hypothesis for why MCTS programs “go crazy” after their opponent plays an unexpected tesuji:
    they have stored up, from previous Monte Carlo runs, a small database of self-generated guidance about which moves are good and high priority, and they use this information to guide their further search. I.e. if a move was often good in previous searches, it is likely to be good in the search being run right now. MCTS programs use this information to bias their random move generation toward moves more likely to be good (for example the “RAVE heuristic”, “killer tables”, the “history heuristic”, etc.).

    Which is fine, and usually helps MCTS. Except… after that unexpected tesuji by the opponent, all the previous “helpful self-guiding information” it built up may now actually be the opposite of helpful. It was mistaken; that was why it hadn’t seen the tesuji coming. Moves that were good on the assumption that the tesuji would never be played are no longer necessarily good.

    So then, MCTS has to sort of “unlearn” the things it used to think were true, in order to recover its usual level of strength. That takes extra time. It might actually have been better off just erasing everything from its old searches and starting anew to examine the post-tesuji board position, tabula rasa! But no, current MCTS algorithms do not employ that idea.

    So that, in short, is my hypothesized explanation for the “goes crazy” effect. On top of that is the fact that MCTS does not play well in either clearly winning or clearly losing positions; it works better in positions intermediate between those two extremes.
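    The “unlearning” cost can be illustrated with a toy sketch (entirely my own construction, nothing to do with AlphaGo’s or any real MCTS program’s code): a two-armed bandit stands in for two candidate moves, an epsilon-greedy searcher accumulates win/visit counts the way history-style heuristics do, and after the “tesuji” flips the true values, the stale counts keep the wrong move on top for hundreds of extra playouts.

    ```python
    import random

    random.seed(0)

    # Two candidate "moves"; wins/visits play the role of the
    # history-heuristic statistics accumulated over earlier playouts.
    wins = {"A": 0, "B": 0}
    visits = {"A": 0, "B": 0}

    def winrate(m):
        # Smoothed win rate, so unvisited moves start at 0.5.
        return (wins[m] + 1) / (visits[m] + 2)

    def select(eps=0.1):
        # Epsilon-greedy selection biased by the stored statistics.
        if random.random() < eps:
            return random.choice(["A", "B"])
        return max(["A", "B"], key=winrate)

    def playout(move, p_win):
        visits[move] += 1
        if random.random() < p_win[move]:
            wins[move] += 1

    # Phase 1: 500 playouts before the tesuji; A really is the better move.
    for _ in range(500):
        playout(select(), {"A": 0.7, "B": 0.3})

    # Phase 2: the tesuji flips the true values, but the stale counts
    # remain, so the searcher keeps preferring A until enough fresh
    # playouts outweigh them.
    recovery = 0
    while winrate("A") > winrate("B"):
        playout(select(), {"A": 0.3, "B": 0.7})
        recovery += 1

    print("extra playouts needed to unlearn the stale preference:", recovery)
    ```

    A searcher that threw its statistics away at the tesuji (tabula rasa) would need only a handful of playouts to prefer B, which is the trade-off described above.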

  40. Tony Collman says:

    Can you (or somebody) please give a reference to where this very interesting question arose, as I don’t think it is in this article. Incidentally, I found David’s financial metaphor here (intentionally?) ironic, since it seems inevitable that someone, somewhere will eventually try to use this, or similar, software in money markets. I have read or heard elsewhere that Hassabis is determined to restrict the application of Deepmind (via an Ethics Committee), even binding Google (who are, of course, themselves sworn not to be evil) before selling out to them, but it has been pointed out above (correctly or not) that the techniques employed to create the software are well known in the field, so others (who may have different ideas on what is “evil”?) could develop equivalent software.

    • Warren Dew says:

      The video includes a press conference following the game, during which the question was asked.

    • Kevin Williams says:

      To answer Tony Collman’s earlier questions and Ken Soh’s comments about applications in other fields: I think many people do not understand the fundamental concepts used in neural networks (two of which were used for AlphaGo).

      NNs weight their links and change their network architecture according to changes in inputs, simulating how human brains typically strengthen or weaken various synaptic links given changing environmental stimuli. That is why NNs are far more flexible than, say, a brute force programmatic approach (which may work with the simpler rules of chess, but not with Go, given the incredible number of permutations in Go).

      If you start with 2 identical NNs initially, but each is exposed to different “training” sets or different stimuli, you will end up with different NNs and hence somewhat different AIs. One can almost draw an analogy with human babies developing different “characters” even when exposed to the same environment. So, there won’t be a definitively “evil” AI from the start (“evil” being a value judgement), nor a “good” AI. Perhaps you can predispose an NN towards certain types of behavior with rewards, but as time goes on, a truly self-learning AI NN will veer off from its original reward parameters.
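      A minimal sketch of that divergence point (a toy perceptron, nothing resembling AlphaGo’s actual networks): two units start from identical weights, each is trained on a different “experience” (here the AND function vs the OR function), and they end up with different weights.

      ```python
      def train(weights, samples, lr=0.1, epochs=20):
          # Classic perceptron learning rule on a simple threshold unit.
          w = list(weights)
          for _ in range(epochs):
              for x, target in samples:
                  out = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
                  err = target - out
                  w = [wi + lr * err * xi for wi, xi in zip(w, x)]
          return w

      # Identical "newborn" networks: same architecture, same initial
      # weights. The third input is fixed at 1 and acts as the bias.
      init = [0.0, 0.0, 0.0]

      # Different training stimuli for each copy.
      and_data = [((0, 0, 1), 0), ((0, 1, 1), 0), ((1, 0, 1), 0), ((1, 1, 1), 1)]
      or_data = [((0, 0, 1), 0), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), 1)]

      net_a = train(init, and_data)  # learned AND
      net_b = train(init, or_data)   # learned OR
      print("net_a:", net_a)
      print("net_b:", net_b)
      ```

      The shared starting point is quickly overwhelmed by the differing training data, which is the sense in which two initially identical networks become “somewhat different AIs”.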

      • Tony Collman says:

        Interesting point, Kevin, though I was actually thinking in terms of the “evil” humans who may deploy learning AIs for their own ends. However, taking up your point, I think it’s generally accepted that no human is born evil (and even when they become evil in others’ eyes, they probably see it differently themselves). Many humans, for example, do not find it evil that babies in foreign lands are incinerated in their own homes by aerial bombardment, merely an unfortunate consequence of the pursuit of a “greater good”.

        • Kevin Williams says:

          Yes, exactly. That’s why I emphasized that “evil”/“good” are value judgements and context-dependent. I won’t go into why some humans incinerate babies for the “greater good”, because it’s too emotionally charged from a human perspective, but consider the AI perspective: a sufficiently advanced AI will think very differently from how a human does.

          An AI wiping out part of the human population could well be a justified means to a better end outcome for the Earth’s biosphere, from the AI’s perspective of seeking better final outcomes; humans may “see” this as an initial “mistake”, just like the Go moves that are radically different from human ones but still achieve a final “win”. When emotionless Mother Nature (God’s AI?) kills millions with diseases or natural disasters, humans will still “praise the Lord” and not grow weary, but we would feel very threatened if it were done by an equally “emotionless” AI (even if the AI meant well for us, or had a far better understanding of complex systems than an individual human).

          So, that’s also the additional worry with advanced AI. A “good” human being may develop an initially “good” AI behavior, but there is no telling if it will not turn out disastrously for us. Let’s hope we stick the AI to just board games for now.

          • Tony Collman says:

            The genie is out of the bottle, and your hope of “sticking to board games” is forlorn – indeed, Deepmind say this is just a testbed for practical applications. Ultimately it will be the rich and powerful who decide what the technology is used for: probably to increase their wealth and power.

            • Kevin Williams says:

              Indeed… that’s why it’s only a faint hope that lasts only briefly, “for now”. Yes, we’re all aware that every Pandora’s box, from the discovery of fire to nuclear energy, has always been opened eventually.

              I agree that the rich and powerful will co-opt this first, mostly for their own selfish purposes. Thus, in the near term, I think there should be an equivalent of a physician’s Hippocratic Oath, meant for all techies. (Hassabis’s contract with Google over limited use/control of the AI with his consent is a small step in the right direction.)

              Techies are presumably the smart humans who have the engineering chops to create AIs (and other fantastic technologies) but yet they seem stupid enough to be controlled via a system of printed pieces of paper with pictures & numbers on them, and then directed by Fat Cat Bankers, moneyed-people and Politicians with forked-tongues to serve them.

              In the bigger picture, even all the “wealth” and political power won’t be able to control a sufficiently advanced AI, since money, food and water are meaningless to it. So, unlike the techies maintaining it/him/her, the AI will likely put very little value on the bankers and politicians. Not sure if that eventuality is really a “win” state for overall humanity though… LOL.

  41. One point that others have made is that Lee was playing in byoyomi (1 min/move limit) for about half the game! (And yes, I know Lee can also think on his opponent’s turn.)

    I wonder how well AlphaGo would play against a 9dan Pro if the entire game was 1 min/move for both sides?

    • Warren Dew says:

      While this is true, going into byoyomi early actually gives you more total time for the game. Likely using extra time early was critical to Lee Sedol’s coming up with the great move 78.

  43. anonymous says:

    I think the crazy moves are mostly because the computer program isn’t trained to discover the opponent’s weaknesses, but only the game’s. To win the game when it is losing, it needs Lee to make some mistakes. It has learned from a lot of amateur games, where the players are too weak, and a lot of self-played games, where the players are probably too strong. It simply had no idea what mistakes (or inability to play amazing moves) a 9 dan pro could have, and it has nearly no way to learn that with the existing rules and data. If the opponent were itself, it is obvious it could not get a chance by only defending what it had, so those moves are no worse than losing. Yes, it is a characteristic of the algorithm, but I wouldn’t expect any algorithm focusing only on Go (and not psychology, say) to do better.

    Even a normal human can panic when losing against an opponent they don’t know at all. Was Lee playing badly only because he thought he had already won, like we thought AlphaGo was doing? The computer might not be able to ask this question yet, but there is simply no objective way to rule out this possibility given the information AlphaGo had.

    Anyway, “knowing your opponent” and “defending yourself and seizing the chance when your opponent makes a mistake” are ideas from Sun Tzu, but they probably never appeared in the design of AlphaGo. I’d say it’s not doing anything wrong with respect to its design. But for a general purpose AI, this remains to be demonstrated by some other development in this area.

    • I should note that AlphaGo does not get information about the opponent: it does not know whether the opponent is a random amateur or Lee Sedol. So it may be using a generic probability of a blunder (at least, a number too high for Lee Sedol).

      If so, all of the “crazy” moves may be AlphaGo introducing opportunities to blunder, due to thinking that it cannot win without such a blunder.

  44. For laughs, check out the Lee Sedol pictures on this Korean website.


  45. I had posted about this in the live thread, but I will refine it here because David touched upon a “horizon effect” in this summary. What I think happened is not precisely the same as a horizon effect, but it has similar consequences:

    Demis Hassabis (@demishassabis) tweeted that the win probability from the value network “plummeted” around move 87. After this, we notice that AlphaGo “went crazy”.

    But I thought, if we’re in a situation where the “crazy” moves were the “best”, that means we’re in a situation where the value network rates all moves as roughly similar to each other, right? But if all the moves are roughly equivalent, and the value network “plummeted”, shouldn’t that mean AlphaGo should already know that its probability of winning is very low? So why didn’t AlphaGo resign? AlphaGo resigns when the probability of winning is less than ~10% (the number in the Nature paper), and that should have already happened much earlier than move 180. Why was its estimate off by so much?

    I noticed that AlphaGo uses a separate “rollout policy” for Monte Carlo Tree Search. The rollout policy is a network that is faster to run than the “policy network” but is weaker. From the Nature paper, “Similar to the policy network, the … rollout policy [is] trained from 8 million positions from human games on the Tygem server”. The prediction accuracy of the full policy network is 57.0% – the rollout policy accuracy is only 24.2%.

    The rollout policy is based solely on the human amateur games, and NOT trained further. It is also not adjusted for strength at all (neither AlphaGo’s strength nor the opponent’s).

    AlphaGo’s calculated win probability is not just the output of the value network. It is an average of two estimates, half from the value network and half from the rollouts.

    Thus this is my guess:

    Prior to move ~180, the game still has many forcing moves remaining, where AlphaGo could still win if the opponent blunders. The rollout policy has been trained on amateur games, and is not entirely accurate on those forcing moves: AlphaGo will (mis-)estimate that the opponent would blunder at a relatively high rate, say once in 500 moves (0.2% blunder rate).

    I think the vast majority of the remaining win probability (from AlphaGo’s perspective) is the rollouts giving it hope of an opponent blunder. If there are still “too many” forcing moves remaining in the game, AlphaGo won’t hit the 10% threshold until it reduces the number of forcing moves remaining.

    If this is what is happening, it is logical for AlphaGo to play “crazy” moves (e.g. 101 and 178), simply because those are the moves that give the opponent a chance to blunder.

    How long will AlphaGo refuse to resign? Let’s assume that the value network already knows that the game is lost, so its half of the estimate gives a 100% loss probability. The rollouts might disagree, though.

    Probability of AlphaGo loss = 0.5 + 0.5 * (1 – probability of blunder)^(blunder opportunities)

    So if the blunder probability is:

    0.1%: AlphaGo will keep playing until there are fewer than ~223 forcing moves left in the game
    0.2%: …fewer than ~112 forcing moves
    0.3%: …fewer than ~75 forcing moves
    0.4%: ~56 forcing moves
    0.5%: ~45 forcing moves

    (to play with numbers: http://www.wolframalpha.com/input/?i=table%5Blog(0.8)%2Flog(1-x),+%7Bx,+0.001,+0.005,+0.001%7D%5D )
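    As a sanity check on the numbers above, here is a minimal Python sketch of this model (the 0.5/0.5 value/rollout mix and the ~10% resign threshold are from the Nature paper as quoted above; the blunder rates, and the model itself, are my guesses rather than AlphaGo’s actual code):

```python
import math

RESIGN_THRESHOLD = 0.10  # AlphaGo resigns below ~10% estimated win probability

def loss_probability(blunder_rate, forcing_moves):
    """Loss estimate under this model: the value network is certain of the
    loss (contributing 0.5), while the rollouts still see a chance of an
    opponent blunder on each remaining forcing move."""
    return 0.5 + 0.5 * (1 - blunder_rate) ** forcing_moves

def forcing_moves_until_resign(blunder_rate):
    """Number of forcing moves below which the win estimate (1 - loss)
    drops under the resign threshold:
    0.5 * (1 - (1 - p)^n) < 0.1  <=>  n < log(0.8) / log(1 - p)."""
    return math.log(1 - 2 * RESIGN_THRESHOLD) / math.log(1 - blunder_rate)

for p in (0.001, 0.002, 0.003, 0.004, 0.005):
    n = forcing_moves_until_resign(p)
    print(f"blunder rate {p:.1%}: keeps playing down to ~{n:.0f} forcing moves")
```

    This reproduces the table above (~223 forcing moves at a 0.1% blunder rate, down to ~45 at 0.5%).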

    My guess for a possible “fix” (if a fix is desired?): Add an input to the rollout policy: estimated opponent strength (Elo). Maybe also train a different rollout policy for AlphaGo itself. To estimate opponent Elo during the game: run the rollout policy for different Elo to produce an idea of how players of different strength would play differently; cross-reference that information with the opponent’s actual moves.

    Note that the “fix” simply gets AlphaGo to stop being “crazy” and resign sooner in this case. I don’t know if it would actually improve AlphaGo in other cases.

    • That Wolfram Alpha query should have been:

      table[log(1-x, 0.8), {x, 0.001, 0.005, 0.001}]

    • Tony Collman says:

      An interesting point here is that AlphaGo apparently does not only “hope” for blunders when losing, but even when it has already won: in the first Fan Hui game, when the game had effectively ended and AG had won, and any human would have passed in that knowledge, AG played a crass move which, if ignored, could have led to some of Fan’s territory becoming seki. I asked on the British forum Gotalk whether this implied that AG simply looked for points, but Aja Huang (one of the developers, the one who plays for AG) assured me AG only looks to win. He did not, however, explain the move except to say, “It just means that AlphaGo doesn’t PASS nicely like human players do. Although continuing to filling the board when a game is practically finished won’t affect the game result using Chinese rules. […] all parties in the match (Hui Fan, Toby and AlphaGo team) had agreed to end the game manually if there were no more effective moves left to play”.

      It appears that AG not only has the potential to “go crazy”, but also doesn’t know when to stop.

      • Michael Brownell says:

        I saw a concert where two avant-garde jazz musicians played with an improvising AI installed on a Yamaha Disklavier. They said they really liked playing with it, but it never knew when the tune was over, so they’d have to walk over and turn it off a little before the end.

    • Anonymous says:

      It sounds like the best fix would be to allow the learning/self-play phases to improve/adjust the policy network (I’ve read on reddit that it currently can’t, though citation needed).

      This would allow alphago to learn better moves.

      • It does learn a new policy network. But the new policy network is only used to create the value network, and not used to explore the tree while playing the game. For some reason, that arrangement showed better skill.

    • I also just noticed that, when the value network decides the game has been lost, its influence on win probability decreases to a very low value (there is very low variation between different moves). However the random impact of the blunders in the rollouts remains the same.

      It would seem like this might, in effect, be similar to AlphaGo running without the value network at all.

      According to the Nature paper (Extended Data Table 7), the vs-Fan-Hui version of AlphaGo without the value network runs about 475 Elo points weaker than the version with it, approximately going from 2d pro to 8d amateur.
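      For scale, the standard logistic Elo formula (an assumption on my part; the paper reports the Elo gap but not this conversion) says a 475-point deficit corresponds to an expected score of only about 6% for the weaker version:

```python
def expected_score(elo_deficit):
    """Expected score of the weaker player under the standard
    logistic Elo model, given the opponent's rating advantage."""
    return 1.0 / (1.0 + 10.0 ** (elo_deficit / 400.0))

print(round(expected_score(475), 3))  # roughly 0.06, i.e. a ~6% expected score
```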

  46. Terence Stoeckert says:

    One thing I would like to know is whether we can expect a commercial, single-processor version of the program anytime soon, and what strength such a version might exhibit.

    • Assuming the difference in skill is comparable to the difference shown in the Nature paper for the older AlphaGo, it should be strong enough to win ~30% of the time against itself given a fancy enough machine. I assume that’s still professional level but not good enough to beat a 9p player as consistently as the distributed version did.

  47. Hello :) I enjoyed GoGameGuru’s coverage of the game very much. I also quickly bought the Lee Sedol volumes 1 & 2 from your e-shop, in case they ran out of stock.

    I had developed a strong AI program for Othello when I was young and would like to share what I thought happened. AlphaGo’s strength as a computer is its breadth and depth of search for possible future moves/scenarios. Its weakness is the ability to accurately measure the value of a game board snapshot at a certain point in time.

    A possible cause for what happened is an assumption which AlphaGo makes. During the post-match interview, the DeepMind team said AlphaGo assumes its opponent will make the best move (according to its own definition of ‘best’) when projecting what might happen in the future.

    This assumption is necessary; otherwise the breadth of calculation would need exponentially more hardware for each move. They already used a lot of hardware this round, even with this assumption pruning the possibilities. The assumption works most of the time, as humans will make the so-called logical best moves. However, it is not true all the time. In this case, White’s move 78 may have been interpreted as a suicidal move by AlphaGo, so it may not even have explored the future possibilities in this branch of the search tree.

    After that move, it appears that AlphaGo was certain that the centre white group was either dead or alive (I’m not sure which). Regardless, in the moves after that it did not seem to value the big contention points in the centre, because the centre was probably interpreted as settled.

    Lastly, I want to add that I was very sad when Lee Sedol lost the first 3 games. I could not sleep and stayed up until 7am watching and reviewing game 3. I was so affected that I think I may not be able to work this week. When Lee Sedol won game 4, I felt a great sigh of relief and was very moved by his play.

    • I also want to add that I found David’s review of the match, drawing comparisons to trading and the stock market, deeply insightful and impressive. Supply versus demand, current valuations versus future potential, local developments versus global general trends: all these are deeply interlinked on the foundation of human psychology and the nature of life itself.

  48. I’m just interested in this “fight”, have no experience in Go, but maybe someone can answer me this question.

    Does anyone know what the computer “strength” behind AlphaGo was (in the match against Lee Sedol)?
    In the match against Fan Hui, the system consisted of around 1,000 CPUs and over 100 GPUs.
    This is far, far more than the power of the famous Deep Blue (the chess computer that beat Kasparov). Even a modern-day PC is stronger than Deep Blue.
    So can we conclude that if AlphaGo ran on Deep Blue, it would have no chance against a human pro?

    • That’s not really a fair comparison, considering just how obsolete Deep Blue is. For reference, the single (server-grade) machine running a 48/8 CPU/GPU setup is able to defeat the distributed version 25~30% of the time. The distributed version runs on over 1.5k CPUs and nearly 200 GPUs, but you lose effectiveness the more hardware you add, due to diminishing returns.

      Assuming Lee Se Dol loses his next match, this would put the single machine type at Lee Se Dol’s level, at least in winning and early game scenarios.
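      Running that 25~30% figure back through the standard logistic Elo model (my assumption; DeepMind hasn’t published the gap in these terms) suggests the distributed version is only about 150~190 Elo points stronger than the single machine:

```python
import math

def elo_gap_from_win_rate(p):
    """Elo advantage implied by the stronger side winning with
    probability (1 - p), under the standard logistic Elo model."""
    return 400.0 * math.log10((1.0 - p) / p)

for p in (0.25, 0.30):
    print(f"weaker side wins {p:.0%} -> gap of ~{elo_gap_from_win_rate(p):.0f} Elo")
```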

      • Warren Dew says:

        48/8 is much more than a typical PC, though, let alone Deep Blue. I think we can indeed conclude that on Deep Blue hardware, AlphaGo would have little chance against a human pro. In addition, it could never have gotten through its training set.

  49. Precise bug found?
    1) Note that ladders (more or less) have to be hardcoded for MC algorithms, which developers confirmed for standard ladders.
    2) Note that the mistake of AlphaGo was precisely the misjudgement of a non-standard ladder.
    Under the assumption that the developers did not invest too much extra code (as intended) into all kinds of non-standard ladders,
    this is precisely the blind spot/bug in AlphaGo, which Lee Sedol can abuse any time of the day.
    In my opinion, this explanation is the most consistent with the facts regarding the algorithm and the game.
    Is there a way to get an answer?

  50. I don’t understand move 101. The AI seems capable of playing incredibly well and then makes a move like this, very bizarre.

  51. Trenchdog says:

    My compliments to David for the style of this commentary. As a non-Go player (something I should change) the flow of the game became understandable to me, making the performance of Lee Sedol that much more special. Keep up the good work!

  52. If my memory is correct, Myungwan Kim found move 78, before it was played, and figured out that there was a good reply to it but that it might be worth a shot anyway. He did this even though he was busy talking to two other presenters and a live audience at the same time. I have no doubt that it was a very fine move which most players wouldn’t have found (Myungwan Kim is not most players!), but I guess it wasn’t any sort of once-in-a-lifetime genius either.

  53. Dennis ng says:

    Why trust a partial search and a neural net that cannot be debugged, especially in fields that deal with real life-and-death, pun intended?

    I have done some work in health IT and have some exposure to old-school AI. One of the reasons, I thought, that the industry dropped AI (i.e. stopped trying to imitate or replace humans) and shifted to so-called expert systems that work with humans (and let humans know they are working with a fallible computer) is precisely that one cannot be sure what those computers are doing.

    You can auto-pilot a plane, auto-land a plane, or even have a self-driving car, but only in limited situations, and humans are trained to watch out and be accountable. And humans would ask for an explanation, which is precisely what such partial search and neural networks cannot provide. There are computers in those fields, but not of this kind.

    Say the machine prescribes you a drug dosage. Each and every prescription has to be explained and accounted for. There is no “winning”, no +/- sign or score. There may be multiple answers or a range of drugs, but each has to be justified; if it is wrong, the consequence is jail, a large sum of money, or the reputation of a whole firm.

    Now, can AlphaGo explain its moves, do a post-game analysis, or say it has exhaustively ruled out the possibilities (rather than just probabilistically)? I’m not sure.

    Even worse, if there is an obvious mistake, can you pinpoint how to debug it, fix it, and deliver a version that is bug-free as far as you know? Monte Carlo search aside, how can you make a neural network unlearn something?

    These are the questions I thought the industry considered when it dropped these neural network and partial search approaches. Why come back to them?


    I really want to know about the alternative moves discussed above. In fact, its good play aside, AlphaGo cannot be a good teacher if it cannot explain.


    Back to a fun question from earlier: it is not clear that losing to a human is needed for AlphaGo to evolve into Skynet. Unlike Deep Blue, which ended up as a protein-folding program, AlphaGo’s methods are aimed at general behaviour, especially in health, so there is nothing stopping it evolving in that direction as well.

    In fact, that worries me a lot. Not Skynet, but the nature of those algorithms: it may be a genius neural network, but I am just not sure we know what to do with the result.

  54. I am new to Go, so there might be a lot written about this elsewhere, but how do we know that a komi of 7.5 makes the initial winning probability 50-50?
    Watching the first four games, Lee Sedol and AlphaGo both seemed to struggle more playing with black.

    • I can tell you that, for sure, it’s not 50-50.
      Whatever the “fair” komi is, it has to be an integer value.
      Otherwise, one side could lose with perfect play.

    • Uberdude says:

      We don’t know that 7.5 komi (in Chinese rules, usually equivalent to 6.5 in Japanese) makes the winning probability 50% with perfect play (or with AlphaGo’s play, which would be an interesting experiment). What we do know is that with that komi when human professionals play the black/white win rate is pretty close to 50%. Komi used to be lower, 4.5 in Japan a while ago (and no komi at all a long time ago) but black won >50% so they increased komi to 5.5. Black was still winning >50% so they increased it again to 6.5. Some professionals (e.g. Kim Jiseok) do prefer white now with the 6.5 Japanese / 7.5 Chinese komi, but others (e.g. Gu Li) prefer black and it gives pretty close to 50/50 wins.

  55. I would have liked to see a game against Lee Chang Ho when he was the number one, completely different style – Sadly we will never get to see this game. Very pleased for Lee Sedol, he cracked the code.

  56. https://www.youtube.com/watch?v=2IV-3zh5R1M
    Check 31:40 mark, they explain why 78 is a good move.
    (It’s in Korean but you can follow the lines)

  57. humblelife says:

    I would like to thank the players, designers, match organizers and everybody who has helped to explain these games. This is a huge learning opportunity for people in the go world. I think the current AlphaGo is the strongest player of all time. Any weaknesses it has seem difficult to exploit in practice, where practice is an even game starting from an empty board. It might not be the best at solving certain go problems or when behind, but those are just parts of a player’s strength.

    AlphaGo seems to have shown up to 4 weaknesses, but I don’t think any living person can take enough advantage to win over half his games against AlphaGo (at least not until the extent of those weaknesses is more fully understood. If the weaknesses are real and severe, Lee Sedol or other top players might lose a majority of the first 100 games exploring those weaknesses, then win a majority of games after that by exploiting them. That is somewhat pointless, because such a flaw would be fixed by DeepMind in the next version).

    1. Perhaps it is weak at very interconnected and chaotic life and death (seki, capturing races and loose ladders involving multiple groups). This might be true to some extent (see game 1 where Lee Sedol made the first overplay but then seemed to take the lead), but I think this weakness is overblown. If this is the weakness that caused it to collapse in game 4, then AlphaGo seems to have kryptonite; however, I don’t think this is the case.

    2. It doesn’t know what a blunder is in human terms. It doesn’t know that some mistakes have a very obvious and easy answer; therefore its judgement of a move’s win % is flawed in some cases. Even if it thought it had fallen clearly behind in game 4, it had a much better chance by playing a tight endgame when it had 1 hour to Lee Sedol’s 1 minute. The lead may have been small or unclear to a human, but those endgame mistakes were very obvious and tanked its chances to win. Put another way, its judgement of what is a losing position is flawed when it is playing human opponents (like resigning an endgame you would lose by half a point after reading 30 moves ahead; the human is not so sure, because he can’t see as far as AlphaGo).

    3. It does not have enough respect for its opponent’s creativity. AlphaGo assumes its opponent will play the best move, which is very respectful; however, it seems to think it can see everything in local positions. In positions with a lot of aji, it seems to think that no aji exists in closed positions unless it can see a working move. In open positions it understands that aji should be removed (like taking off a ladder stone before your opponent can play a good ladder breaker). When Lee Sedol played move 78, a human would have re-read the position: wait, does that work? Did I just lose? The human would prepare countermeasures. With one hour left, I would likely spend at least 10 minutes making sure nothing worked for Lee Sedol (if I lost, it would mean he had seen something I didn’t, which is likely, since he is Lee Sedol). If I saw that something seemed to work, I would spend ALL of my remaining hour finding the best countermeasure. Myungwan and Redmond each found the same ko, as well as 2 other variants which looked promising. Even if 78 truly took the lead, these countermeasures had the best chance to keep AlphaGo’s hopes alive.

    4. It goes on tilt when it feels hopelessly behind. This seems true. All Lee Sedol needs to do is start winning in order to win. That is brilliant. Who would have expected that taking the lead in the game would help you win? (Sorry for the sarcasm.) Taking the lead against AlphaGo is very difficult. Only once did humans and AlphaGo agree that Lee Sedol had a winning position, and that was in game 4, after AlphaGo had already gone crazy. This is the hardest weakness to exploit. If this were its only weakness, it would be basically unbeatable.

    That said, I hope Lee Sedol wins game 5. I also hope AlphaGo gets stronger and can play many pros, and later many other people. Even if it can’t explain moves, we learn a lot from our defeats (what works and what doesn’t, and which of our positional judgements turned out to be incorrect). Think of it as playing a top player who speaks a different language, then taking the game record to a weaker pro for explanation in your language (I suppose we will need to find a weaker pro ourselves, for now).

    very exciting times for go.
    have a nice day

    • David Tan says:


      Good points about the weaknesses of AlphaGo. I would add one more weakness: AlphaGo’s inability to adapt its playing style to its opponent’s style, weaknesses and strengths.

      We humans are capable of adapting our play to our opponents. For example, when I play against other players, even players I don’t know anything about, I can usually “size up”, after say 30 moves, how good my opponent is relative to me: is he stronger, weaker, or about the same level? I then adapt my strategy accordingly, to be more defensive/conservative or aggressive/risk-taking, etc.

      Professional top players, of course, have even more knowledge and capability to adapt (e.g. avoid a complicated opening, take territory or influence in the center, etc.). My guess is that AlphaGo doesn’t have this type of capability. If it were playing against me, a much, much weaker player than Lee, I would bet that it would use the same playing style as it did against Lee Sedol.

      • Antti Virtanen says:

        This is a good point too. In the game of poker, this analysis and exploitation of an individual opponent’s weaknesses is something humans are still better at than AI. Computer AI can beat a professional poker player in a heads-up match one-on-one, but professional players are better at dealing with the psychology and dynamics of multiple opponents. Of course, poker is a completely different game from an AI perspective, but getting inside a human opponent’s head is difficult for an AI.

        Yesterday I played a couple of games after a long break. My own go play is always tailored to exploit the weaknesses of my opponent, and I avoid situations where the opponent could leverage his/her strengths. This works fabulously well at my amateur level, but I do not know whether it works at a professional level. After all, one does not get to 9p without being extremely strong in all aspects of the game. Even so, the 9p professionals clearly have very different approaches to the game and recognizable styles. (A friend of mine even wrote his master’s thesis on this subject and trained an AI to recognize the player based on the game record. It worked quite well, about 12 years ago.)

  58. Anonymous says:

    You don’t have to worry about AI now. Everything AI needs is dependent on human engineers and programmers. When a computer decides for itself that it wants to learn Go, teaches itself without any human programming or intervention, and beats the top humans, that is when you have to start worrying.

  59. humblelife says:

    I saw some people talking about handicaps. Most go players stronger than 5 kyu should recognize that as ridiculous. Lee Sedol could not give a 9 stone handicap to an amateur 1 dan (let alone 4 dan), even though his Elo rating says he should. Even for a perfect computer that could read 500 moves deep (longer than any go game in history) and 361 moves wide (the greatest number of legal moves on any given turn), 3 stones would be too much. A top human would just play defensively and sit on his lead until the game ended (in chess, that would be like me automatically winning if you failed to win within 8 moves). They might feel ashamed to play submissively, but they would win. In practice, against an AlphaGo 10 years into the future (reading maybe 40 moves deep and 50 moves wide), a 2 stone handicap might be possible; however, Lee Sedol would learn and improve until the proper handicap was less than 2 stones, just as Shusaku learned from his defeats against Shuwa and Ota Yuzo.
    I do hope we can see this 2 stone case eventually. It would show how much we have learned from the computer champions.

    have a nice day

    • Anonymous says:

      I agree with your point about Lee Sedol not being able to give 9 stones to an amateur 4 dan but I am skeptical about your two following points concerning computers.
      For the hypothetical first “perfect computer”, I imagine 3 stones is too small of a handicap. I would argue that there are still tremendous subtleties, particularly in the opening and middlegame, that are not fully exploitable by any current AI and of course any human player. It is only in the endgame (and sometimes not until the deep endgame) that truly precise point calculations come into play in the current state of things. If a computer could calculate precisely from the beginning, we can imagine that it needs only to play marginally better than the human player each move and such differences would compound over 200 or so moves in a game to the extent that 30+ point differences would not be uncommon.
      And for an AlphaGo 10 years in the future, it is unlikely that humans would ever catch up once a 2 stone difference in strength was established. This is different from a weaker human player improving against a stronger human player, as in the Shusaku example.
      AI would advance in the time that Lee Sedol spent trying to learn and improve against an already stronger AlphaGo. But even if the AI did not, humans have their limitations when it comes to a mental exercise like Go. It is likely that once a certain level of skill is reached by computers, no amount of training would allow a human to appreciably decrease the difference in playing ability between him/herself and a computer.

  60. We don’t need to wait for the Google team to tell us what went wrong at the end. With truncated time, the machine’s analysis will be truncated. It simply has no ability to run all of its calculations and simulations, so we can expect its play to degrade as its time ticks down. If Lee can get the computer to burn through its time faster (and that is a big if), he can certainly gain a serious advantage. The game is weighted toward the machine because of the time limits, something that was not obvious at the outset but is clear now. In a longer match, Lee would do better.

    • Kevin Williams says:

      We demarcate the rules of this particular game on a 19 x 19 board and make it such that the AI is playing according to human timescales.
      Hence, in a “longer” match, with AlphaGo on the same H/W specifications and without further training, yes, Lee will likely do better.

      In a very quick match, with seconds for each move, no human can beat the AI. So, if it’s any solace, humans can beat the AI only under very defined and constrained conditions, because of human limitations.

      It is similar to modern fighter jets, which can mechanically withstand rapid maneuvers easily, but the human pilot within cannot take the high Gs, so humans fly the jet according to their own limitations.

      With a bigger game boundary, say an incredible 50 x 50 Go board or a 3-dimensional Go game, the AI would be able to take in the bigger picture given more parallel processing resources, but no human brain would be able to take on the huge increase in complexity.

      Further, AI algorithms can play against themselves millions of times and improve their game within a relatively “short timeframe”. Hardware can also be scaled up numerically and improved with each new generation of CPU every year or so, whereas human brain evolution happens over eons and is limited by the size of our individual skulls.

      Hope I didn’t spoil anyone’s day…and I sincerely wish Lee can hold the human fort at least for a little more.

  61. Antti Virtanen says:

    I think this is very interesting, knowing something about game AI programming and the game of go. What happened here – a brilliant unexpected move and the opponent losing its mental balance – is actually quite human. Professional players in various fields sometimes “tilt” in a similar situation. Ask any professional poker player, or look deeply into your own games.

    The AI can certainly be improved. It should be quite possible to use a different approach when the situation is no longer a winning position according to its evaluation. Using a differently trained neural network for coming back from losing positions might do wonders and prevent idiotic moves. Other tweaks are possible too. For example, no human would answer casually when Lee Sedol makes an unexpected move after serious thought.

    I’m surprised how extremely strong AlphaGo is, despite its inability to manage a losing position. It is obviously very difficult to push AlphaGo into a losing position in the first place.

    I wonder how it would do against Takemiya Masaki’s “cosmic style”.

    • Warren Dew says:

      I do think that one of the features of Monte Carlo methods is the ability to make human-like mistakes, and of course neural networks are modeled on biological brains.

      One of the things I worry about, with increased processing power now being tied to parallelism, is that Monte Carlo techniques are often required to take advantage of the parallelism, and we lose the ability to extend traditional, repeatable and precise computer programs to take advantage of the available processing power.

  62. Lafcadio says:

    I know Lee is playing black in G5, but I wonder: if he were white and played the same as he did in G4, would AlphaGo follow along as before, to defeat? I ask because AlphaGo is “frozen” for this match; does that mean it is not learning?

    • Monte Carlo methods don’t play the same moves every time; there is an important random element in the decision process. So the answer is no ;-)

  63. What I find most interesting, and what doesn’t appear to have been widely commented on, is that AlphaGo spent time & resources mid-game trying to find out where it had made its mistake.
    “Mistake was on move 79, but #AlphaGo only came to that realisation on around move 87”
    Perhaps I’m misinterpreting Demis’ Twitter comment, but to me this implies the machine saw its winning probability drop (presumably by noticing a sequence it hadn’t previously considered) and then, instead of just playing the position as-is, reassessed its previous moves based on this new information to find the error. Possibly this is part of its learning technique, but I would have thought this would be done as an “after-game review”.
    If correct, to me this would perhaps be the most human aspect of the computer.

    • I think DeepMind (the human programmers) may have simply looked back at the logs to see where AlphaGo would have played differently if it had somehow already had the valuation information that it calculated by move 87.

    • I assume the programmers have access to running copies of AlphaGo during the game, where they can force other moves and see the outcome. I would assume that they would try this when the probability of a win drops suddenly (AlphaGo realising it has missed something in its previous calculations).

  64. Zaphod, 4 kyu says:

    Lee’s Houdini-trick wedge tesuji of White 78 doesn’t work.

    It worked against both official commentators as well as Alpha, but it didn’t work against Kim, who found a defence to it before Lee played it, by exploring a few local lines with the aid of an external memory – putting the stones on the board and playing out the variations.

    Alpha relies upon Lady Luck to find her moves, but on this occasion her luck ran out.

    • A move doesn’t have to work 100% of the time to be a “working move”, especially in a game like Go. It just needs to work well enough that it will succeed at least half the time in an actual game situation.

      A lot of “working moves” in Go are like that. Rather than being moves with no counter, they generally fall into moves that require difficult, non-obvious counters that are easily missed. In that case, I call that a “working move”.

    • Al from UK says:

      “Lee Houdini trick wedge tesuji of white 78 doesn’t work.”

      Do you mean that it “doesn’t work” tactically or “doesn’t work” strategically? I wonder whether some of the comments here are from people who are trying to analyse this game with a ‘chess mindset’ rather than a ‘go mindset’. Yes, it may be the case that it seems difficult to find a clean series of forcing moves after move 78 that would rob black of his advantage in the centre, but what about the strategic ‘global’ territorial advantage that the tactically threatening move 78 achieves? Move 78 isn’t about trying to find a clever path to a neat ‘checkmate’ type situation, but about mixing up the position to gain advantages elsewhere on the board. Interestingly Alphago seemed to recognise Lee’s strategic intention, and blew it, meaning that Lee had clearly created a winning position for himself as a result of this tesuji. After the first three games, I don’t think the argument that Alphago was just playing badly is a plausible theory. Ironically the three defeats preceding this game only highlight the brilliance of move 78.

  65. Geoffrey Briggs says:

    From “The Australian” Tuesday 15th an article by Niall Ferguson (a historian resident in the USA) concludes as follows.
    “Last week’s good news? Google’s AlphaGo computer won a five-game tournament against Lee Se-dol, a leading player of the ancient Japanese strategy game of Go. Can someone please persuade AlphaGo to run for president?”

  66. Lawrence Byrd says:

    I suspect that AlphaGo’s extremely clever cache algorithms ran into its clock management algorithm (said in one interview to be fairly simple). AlphaGo must be doing a huge number of future board calculations, so the team must have developed some very smart way of caching these so they don’t need to be recalculated, even through different variations that end up in the same place. This would vastly expand AlphaGo’s depth of read in any situation. I think Lee Sedol’s 1-in-10,000 shot blew the cache – AlphaGo had zero cached analysis of anything with that stone in play, and went blind for a while, having to use shallower and simpler analysis until it caught up. Michael Redmond commented that a pro would study an unexpected play like this for a long, long time. But AlphaGo didn’t – its too-simple clock management algorithm kicked in and did not grant it enough recovery time. Hence the “not until move 87 did AlphaGo know” comment from its team. By the time it had rebuilt the depth of future-possibility analysis it normally uses, it was too late. That’s my working theory of the weird play after move 78.
