Google teaches its AI to DREAM: Firm hopes technique used by animals (including us) could boost learning speed

  • Google taught its AI to dream in a way similar to animals to improve learning
  • The system replays sequences of a game containing rewarding events
  • Researchers also drew inspiration from how babies learn motor skills
  • It learned how actions affect what it sees rather than just making predictions
  • This means the AI doesn't need to be customized to different games

Google's AI achieved record scores playing Atari and beat a human during a game of Go – but to do so, the system needed constant training over a long period of time.

In order to speed up and simplify the learning process, the team at DeepMind has now taught its AI to dream in a way that is similar to animals.

By replaying sequences of a game containing rewarding events, the system learned the stages 10 times faster than previous algorithms.

To speed up the learning process, the team at DeepMind has now taught its AI to dream in a way that is similar to animals. After the system began 'dreaming' about a game called Labyrinth (pictured), it learned the stages 10 times faster than previous algorithms

THE AI THAT CAN DREAM 

DeepMind's system Unreal was augmented with two additional tasks. 

The first task taught the AI to control pixels on the screen, which emphasizes learning how its actions affect what it will see rather than just making predictions – similar to how a baby develops motor skills.

The second task taught Unreal to focus on visual features in the game that it has seen during previous rounds.

'Just as animals dream about positively- or negatively-rewarding events more frequently, our agents preferentially replay sequences containing rewarding events,' the DeepMind researchers explain.

After testing its agent, DeepMind found that its system, the Unsupervised Reinforcement and Auxiliary Learning agent (Unreal), can also play the game at 87 percent of the performance of expert human players, reports ZDNet.

The researchers said that because they no longer have to spend so much time training Unreal, they can get on with more of their own experiments.

Unreal's dreams of chasing apples through a maze take place in the virtual game Labyrinth, which is similar to the popular video game series Quake.

The machine quickly walks through a winding maze and scores points every time it collects an apple in its path.

Researchers chose this game because it reinforces positive behavior with points and the system only knows part of the maze at a time – allowing them to monitor how fast it picks up the rest.
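In programming terms, this set-up is a classic reinforcement-learning loop: the agent observes a little of the world, acts, and banks a reward whenever it finds an apple. The tiny Python sketch below illustrates the idea with a one-dimensional 'corridor' standing in for the maze – everything in it is illustrative, not DeepMind's code.

```python
# A toy stand-in for Labyrinth: a corridor of cells, some holding
# apples worth +1 reward. The agent 'sees' only the cell it is on
# (partial observability, as in the real game).
corridor = [0, 1, 0, 0, 1, 1, 0, 1]  # 1 = apple, 0 = empty

position, score = 0, 0
while position < len(corridor):
    observation = corridor[position]  # the agent sees only its own cell
    if observation == 1:
        score += 1                    # collecting an apple is rewarded
        corridor[position] = 0
    position += 1                     # the only action here: step forward

print('episode score:', score)        # prints 4 for the corridor above
```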

DeepMind researchers recently published a paper entitled 'Reinforcement Learning with Unsupervised Auxiliary Tasks', which highlights the 'dreaming' technique that improved the learning speed and final performance of the system.

And to do this, the team augmented the system with two main additional tasks – one mimics how babies develop motor skills and the other is similar to how animals dream.

'Just as animals dream about positively- or negatively-rewarding events more frequently, our agents preferentially replay sequences containing rewarding events,' they write in the paper.
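In code, that preferential replay can be as simple as skewing which stored sequences get picked for learning. The Python sketch below is a minimal illustration under that assumption – the buffer, names and 50/50 split are hypothetical, not DeepMind's implementation.

```python
import random
from collections import deque

# Each stored sequence is a list of (observation, action, reward) steps.
replay_buffer = deque(maxlen=10_000)

def sample_for_replay(p_rewarding=0.5):
    """Prefer sequences that contained a reward - the code analogue of
    'dreaming' about rewarding events. Assumes the buffer is non-empty."""
    rewarding = [s for s in replay_buffer if any(r != 0 for (_, _, r) in s)]
    neutral = [s for s in replay_buffer if all(r == 0 for (_, _, r) in s)]
    if rewarding and (not neutral or random.random() < p_rewarding):
        return random.choice(rewarding)
    return random.choice(neutral) if neutral else random.choice(rewarding)
```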

The first task taught the AI to control pixels on the screen, which 'emphasizes learning how your actions affect what you will see rather than just predictions,' DeepMind explains in a blog post.

'This is similar to how a baby might learn to control their hands by moving them and observing the movements.'
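A bare-bones version of that pixel-control signal can be written as the average change an action causes in each small region of the screen; the agent is then trained (not shown here) to predict and maximize those changes. The Python sketch below assumes screens arrive as 2-D lists of brightness values and is illustrative rather than DeepMind's code.

```python
def region_changes(before, after, cell=4):
    """Average absolute pixel change in each cell-by-cell region of the
    screen; each value acts as an auxiliary reward for the last action."""
    h, w = len(before), len(before[0])
    changes = []
    for top in range(0, h, cell):
        row = []
        for left in range(0, w, cell):
            diffs = [abs(after[i][j] - before[i][j])
                     for i in range(top, min(top + cell, h))
                     for j in range(left, min(left + cell, w))]
            row.append(sum(diffs) / len(diffs))
        changes.append(row)
    return changes
```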

Google's AI achieved record scores playing Atari and beat a human during a game of Go – but to do so, the system needed constant training over a long period of time. However, researchers have taught it to 'dream' about a game in order to enhance its performance

ALPHAGO: THE CHALLENGES OF BEATING A HUMAN

Traditional AI methods, which construct a search tree over all possible positions, don't have a chance when it comes to winning at Go.

So DeepMind took a different approach by building a system, AlphaGo, that combines an advanced tree search with deep neural networks.

These neural networks take a description of the Go board as an input and process it through 12 different network layers containing millions of neuron-like connections.

One neural network, the 'policy network', selects the next move to play, while the other - the 'value network' - predicts the winner of the game.
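In outline, that split looks like two functions with different jobs, as in the toy Python sketch below – the one-line bodies are hypothetical stand-ins, not the real 12-layer networks.

```python
def legal_moves(board):
    # placeholder: every empty point counts as a legal move
    return [i for i, stone in enumerate(board) if stone == 0]

def policy_network(board):
    """Suggests the next move: a probability for each legal move."""
    moves = legal_moves(board)
    return {m: 1 / len(moves) for m in moves}  # uniform stand-in

def value_network(board):
    """Predicts the winner: the chance the current player wins."""
    return 0.5  # a trained network would actually evaluate the position

board = [0] * 9                       # a tiny 3x3 'board' for illustration
probs = policy_network(board)         # the tree search explores these moves
best_move = max(probs, key=probs.get)
win_chance = value_network(board)     # used to score positions in the tree
```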

'We trained the neural networks on 30 million moves from games played by human experts, until it could predict the human move 57 percent of the time,' Google said.

The previous record before AlphaGo was 44 percent.

However, Google DeepMind's goal is to beat the best human players, not just mimic them.

To do this, AlphaGo learned to discover new strategies for itself, by playing thousands of games between its neural networks and adjusting the connections using a trial-and-error process known as reinforcement learning.
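Stripped to its bare bones, that trial-and-error loop keeps a change only when it wins, as in the deliberately tiny Python sketch below – a toy stand-in, since real reinforcement learning adjusts millions of connection weights rather than a single number.

```python
import random

skill = 0.0  # stand-in for the network's parameters

def play_game(skill_a, skill_b):
    # toy model: slightly 'stronger' parameters win slightly more often
    return 'a' if random.random() < 0.5 + 0.1 * (skill_a - skill_b) else 'b'

for _ in range(1000):
    candidate = skill + random.uniform(-0.1, 0.1)  # try a small change
    if play_game(candidate, skill) == 'a':          # self-play: new vs old
        skill = candidate                           # keep changes that win
```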

Of course, all of this requires a huge amount of computing power and Google used its Cloud Platform.

To put AlphaGo to the test, the firm held a tournament between AlphaGo and the other strongest Go programs, including Crazy Stone and Zen.

AlphaGo won every game against these programs.

The program then took on reigning three-time European Go champion Fan Hui at Google's London office.

In a closed-door match last October, AlphaGo won by five games to zero.

It was the first time a computer program had ever beaten a professional Go player.

'By learning to change different parts of the screen, our agent learns features of the visual input that are useful for playing the game and getting higher scores.'

The second task taught Unreal to focus on visual features in the game that it has seen during previous rounds – allowing it to find shortcuts.

'The agent is trained to predict the onset of immediate rewards from a short historical context,' DeepMind explains.

'To better deal with the scenario where rewards are rare, we present the agent with past rewarding and non-rewarding histories in equal proportion.'

The second task taught Unreal to focus on visual features in the game that it has seen during previous rounds – allowing it to find shortcuts. 'The agent is trained to predict the onset of immediate rewards from a short historical context'

'By learning on rewarding histories much more frequently, the agent can discover visual features predictive of reward much faster.'
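A minimal version of that balanced sampling is shown in the Python sketch below, assuming the agent's experience is stored as (history, next_reward) pairs – the names are illustrative, not DeepMind's code.

```python
import random

def reward_prediction_batch(histories, batch_size=8):
    """Build a training batch with rewarding and non-rewarding short
    histories in equal proportion, so rare rewards are not drowned out.
    Assumes both kinds of history are present."""
    rewarded = [h for h in histories if h[1] != 0]
    unrewarded = [h for h in histories if h[1] == 0]
    half = batch_size // 2
    return (random.choices(rewarded, k=half) +
            random.choices(unrewarded, k=half))
```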

Using these techniques, researchers found the AI was able to tackle 57 Atari games and 13 Labyrinth levels.

The team explains that their aim isn't just to design a system that can master video games, but to develop one that does not need to be programmed separately to learn different games.

DeepMind says that their 'primary mission' is for the AI to 'learn to solve any complex problem without needing to be taught how'. 

 
