The Power of Reinforcement Learning

How artificial intelligence can be used to reason through some of the most complex puzzles known to man.

Adam Maj
May 28, 2020

Go is an ancient board game that is thought to have originated in China around 4,000 years ago. What’s amazing about it is that it remains one of the most complex board games in existence, and it is still widely played thousands of years after its creation.

In essence, the game involves two players, one playing white pieces (called stones) and the other playing black. The players take turns placing their stones on the intersections of the board, and the game ends when both players pass in succession. The goal of the game is to surround as much territory on the board as possible with stones of your own color.

Go is so complex because a standard 19×19 board has 361 points where a piece could be played, and each point can be empty, black, or white. As a result, the game can be in an immense number of possible states. In fact, there are more possible configurations of the Go board than there are atoms in the observable universe. Crazy, right?
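Here is a quick back-of-the-envelope check of that claim. Each of the 361 points can be in one of three conditions, so 3^361 is an upper bound on the number of board configurations. Many of those configurations are not legal positions, and published counts put the number of legal positions at roughly 2×10^170. The sketch below compares the upper bound with the commonly cited rough estimate of 10^80 atoms in the observable universe. The atom figure is an order-of-magnitude estimate, not an exact count.

```python
import math

POINTS = 19 * 19      # a standard Go board has 19 x 19 = 361 intersections
STATES_PER_POINT = 3  # each point is empty, black, or white

# Upper bound on board configurations. It ignores the rules of Go,
# so it also counts positions that could never legally occur.
upper_bound = STATES_PER_POINT ** POINTS

# Commonly cited rough estimate for atoms in the observable universe.
ATOMS_ESTIMATE = 10 ** 80

print(f"3^361 has {len(str(upper_bound))} digits")
print(f"log10(3^361) is about {POINTS * math.log10(STATES_PER_POINT):.1f}")
print("More configurations than atoms:", upper_bound > ATOMS_ESTIMATE)
```

Python integers have arbitrary precision, so `3 ** 361` is computed exactly. It has 173 digits, which puts it around 10^172. That is about 92 orders of magnitude more than the atom estimate.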

This level of complexity meant that, until very recently, humans remained far stronger at Go than any computer. Many people believed it was a task computers would not master for decades: there were simply too many possible configurations of the Go board for a program to search through them, and few expected computers to reach the level of top human players any time soon.

However, in March 2016, AlphaGo, an artificial intelligence built by Google’s DeepMind, shocked the world by defeating Lee Sedol, one of the strongest Go players in the world, four games to one. This event was a major milestone in the worlds of both Go and AI. Much of the Go community had been confident that no program could come close to a top professional. Yet they watched a program outplay one of the best players in the world with novel moves. This unprecedented event showed the world that AI was no longer a technology to be underestimated.

Now you might be wondering: how were computers and artificial intelligence able to figure out such a complex game? A key part of the answer is a strategy called reinforcement learning.

What is Reinforcement Learning?

Reinforcement learning is a branch of machine learning where a program or robot teaches itself by interacting with the setting it is placed in and learning from the feedback it receives.

In reinforcement learning, we refer to the robot or program that is trying to improve itself as the agent, and to the setting it is trying to improve in as the environment.

The environment can be anything that we want our machine to get good at. In the case of the Go AI, the agent is AlphaGo itself, and the environment is the game of Go. In its most essential form, an environment is just a space where actions can be taken, and each action leads to specific outcomes.

Once our agent is placed in an environment, it begins its learning process by taking actions and seeing what happens. Actions are just what they sound like: moves that allow the agent to interact with its environment. In a simple version of our Go example, the agent would take actions in the Go environment by making moves, at first essentially at random. These moves might seem pretty dumb at the start, but they make more sense over time.

Every time our agent takes an action, it observes the changes that have taken place in the environment. The current situation of the environment (in Go, the arrangement of stones on the board) is known as the state. The agent also receives rewards from the environment, which are signals that tell it how well it is doing. In Go, for example, the reward could be tied to the goal of the game: the AI would be rewarded for surrounding more of the board than its opponent and winning. Rewards are what motivate a reinforcement learning algorithm to improve.
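To make this vocabulary concrete, here is a minimal Python sketch of the agent-environment loop described above. The `CorridorEnv` class and its reward scheme are hypothetical and exist only for illustration. The state is the agent's position in a short corridor, and the two actions are "left" and "right". The environment gives a reward of 1 only when the agent reaches the far end. The agent here simply acts at random, like a brand-new agent that hasn't learned anything yet.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a 1-D corridor of `length` cells.

    The state is the agent's position (0 .. length-1). The agent starts at 0,
    and the episode ends when it reaches the last cell or runs out of steps.
    """

    ACTIONS = (0, 1)  # 0 = move left, 1 = move right

    def __init__(self, length=5, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        """Start a new episode and return the initial state."""
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        """Apply an action and return (next_state, reward, done)."""
        move = 1 if action == 1 else -1
        # Clamp so the agent cannot walk off either end of the corridor.
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        reached_goal = self.position == self.length - 1
        reward = 1.0 if reached_goal else 0.0  # the only reward: reaching the goal
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done


def run_random_episode(env, rng):
    """Act randomly and record every (state, action, reward) the agent sees."""
    state = env.reset()
    history = []
    done = False
    while not done:
        action = rng.choice(env.ACTIONS)  # no learning yet: pure trial and error
        next_state, reward, done = env.step(action)
        history.append((state, action, reward))
        state = next_state
    return history


history = run_random_episode(CorridorEnv(), random.Random(0))
total_reward = sum(reward for _, _, reward in history)
print(f"{len(history)} steps, total reward {total_reward}")
```

The recorded `history` of states, actions, and rewards is the raw experience that a learning algorithm would use to figure out which actions are worth taking.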

After taking a variety of experimental actions, the agent begins to learn which actions to take in various states to maximize its total reward. Over time, it increasingly favors the actions that have led to high rewards in the past, and that is how reinforcement learning algorithms learn.
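One classic way to turn this trial and error into learning is tabular Q-learning. It appears here as a standard illustrative technique, not as the method AlphaGo used: AlphaGo combined neural networks with tree search. The sketch uses a hypothetical five-cell corridor where the only reward is 1 for reaching the last cell. The table `q` stores an estimate of how much future reward each action earns in each state. The hyperparameters (`alpha`, `gamma`, `epsilon`, number of episodes) are arbitrary choices for this example.

```python
import random

# Hypothetical 1-D corridor: states 0..4, the agent starts at 0 and the goal is 4.
N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    move = 1 if action == 1 else -1
    next_state = min(max(state + move, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train_q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[s][a] estimates the total discounted future reward of action a in state s.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy: explore at random sometimes (and when tied),
            # otherwise exploit the action with the highest estimate.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the estimate toward
            # reward + gamma * (best estimate in the next state).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q = train_q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
print("Greedy action in each non-goal state (1 = right):", policy)
```

After training, the greedy policy should be to move right in every non-goal state, because that is the shortest path to the reward. The `epsilon` parameter controls the balance between exploring and exploiting. The agent mostly picks the best-known action but occasionally tries a random one, which is also why it makes mistakes along the way.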

These concepts are the essentials of reinforcement learning. Pretty simple, right? It almost seems like glorified trial and error (although it’s still super effective in a lot of cases).

Why is reinforcement learning useful?

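So at this point, we understand what reinforcement learning is, but what can it be used for? Aren’t there already better and fancier methods out there? What’s the benefit of using reinforcement learning over training one of those fancy neural network things on labeled data?

Actually, reinforcement learning is more practical than other techniques in many cases. The most common way of training neural networks, supervised learning, requires labeled data: humans have to label datasets before the network can learn to identify and analyze the data effectively. This obviously takes a lot of effort. Reinforcement learning does not need labeled examples, because the agent generates its own experience and learns from rewards. (The two ideas are not opposites, though: AlphaGo used neural networks that were trained partly with reinforcement learning.)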

Reinforcement learning is appealing because, once we have defined the environment and its rewards, it requires minimal work from us as humans. We put the machine in the environment and watch as it does the work and learns. So many cool things can be accomplished with this strategy, and many of AI’s most notable accomplishments are due to reinforcement learning (for example, when AlphaGo beat Lee Sedol at Go).

One of the coolest applications of reinforcement learning is when OpenAI trained AI agents to play hide and seek against each other. The agents discovered strategies such as moving boxes to build shelters and using ramps to climb over walls, which shows the true potential of the technology.

A brief note: the downsides of reinforcement learning

Although reinforcement learning has many advantages, it also has a few pitfalls that are important to note. The main disadvantages of reinforcement learning are the following:

  • You have to define precise rewards in your environment, so if you cannot specify a clear end goal for a problem, reinforcement learning might not be the best solution.
  • A reinforcement learning agent usually starts out inaccurate and makes many mistakes while it learns. It is therefore best suited to situations where you can afford those mistakes, such as a simulation.
  • Finally, reinforcement learning algorithms for complex problems often take a long time to train, so if time is of the essence, reinforcement learning might not be the best strategy to use.

That being said, reinforcement learning can be applied in many unique situations to achieve impressive results.

My Reinforcement Learning Project

Because reinforcement learning is one of the most interesting and powerful technologies out there, I wanted to try it out myself and get a fundamental understanding of how it works!

When I started my project, I looked through a lot of existing reinforcement learning projects online, and they all looked well made and interesting. However, almost all of them had one thing in common: they involved creating an agent that operated in a preexisting environment. In other words, few people had created their own environment AND agent to solve a completely new problem. I knew that this was what I wanted to do to really go deep into reinforcement learning!

Creating a Project Modeled After OpenAI Gym

Most reinforcement learning projects make use of preexisting environments from OpenAI Gym, a toolkit made by OpenAI. It provides free environments that anyone can use to test their own reinforcement learning agents (like the one shown below, a small grid world where a taxi has to pick up and drop off passengers).

Here is an example of an OpenAI Gym reinforcement learning environment

I set out to create my own reinforcement learning environment in the same format as an OpenAI Gym environment, but completely from scratch. After that, I created my own agent to operate in the environment!
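For readers curious what "the same format as an OpenAI Gym environment" means in practice, here is a hedged sketch of that format. Classic Gym environments expose `reset()`, which returns the first observation, and `step(action)`, which returns `(observation, reward, done, info)`. Many also provide `render()`. The `GridEnv` class below is hypothetical and does not import Gym at all. It only imitates this interface on a tiny grid where the agent must reach the bottom-right corner, and its step penalty is an arbitrary choice for the example.

```python
class GridEnv:
    """Hypothetical environment written from scratch in the classic Gym style.

    reset() returns an observation; step(action) returns
    (observation, reward, done, info), mirroring the older Gym contract.
    """

    # Discrete actions, mapped to (row change, column change).
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

    def __init__(self, size=4):
        self.size = size
        self.n_actions = len(self.MOVES)
        self.goal = (size - 1, size - 1)
        self.reset()

    def reset(self):
        """Put the agent back in the top-left corner and return the observation."""
        self.agent = (0, 0)
        return self.agent

    def step(self, action):
        dr, dc = self.MOVES[action]
        # Clamp the move so the agent stays on the grid.
        row = min(max(self.agent[0] + dr, 0), self.size - 1)
        col = min(max(self.agent[1] + dc, 0), self.size - 1)
        self.agent = (row, col)
        done = self.agent == self.goal
        reward = 1.0 if done else -0.01  # small step penalty favors short paths
        return self.agent, reward, done, {}

    def render(self):
        """Print the grid as text: A = agent, G = goal, . = empty cell."""
        for r in range(self.size):
            row = ""
            for c in range(self.size):
                if (r, c) == self.agent:
                    row += "A"
                elif (r, c) == self.goal:
                    row += "G"
                else:
                    row += "."
            print(row)


env = GridEnv()
obs = env.reset()
for action in (1, 1, 1, 3, 3, 3):  # walk down three times, then right three times
    obs, reward, done, info = env.step(action)
env.render()
print("done:", done)
```

Newer releases of Gym (0.26 and later) and its successor Gymnasium changed this contract. In those versions, `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`. The 4-tuple shown here matches the older API that was current in 2020.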

My Reinforcement Learning Environment: TruckWorld

I decided to call my reinforcement learning environment TruckWorld. It involves a grid with different pickup and drop-off locations for cargo. In this environment, the agents are individual trucks, which have to go to the intended pickup locations, load up their cargo, and then drive to their drop-off locations and unload the cargo. The truck starts off knowing nothing about the environment and has to figure out everything it has to do through trial and error.
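The author’s TruckWorld code is not reproduced in this post, so what follows is only a hypothetical sketch of how an environment like it could be structured in the Gym style. Several details are assumptions: the grid size, the single pickup and drop-off location, the six actions, and the reward values. The rewards (-1 per step, -10 for loading or unloading in the wrong place, +20 for a delivery) borrow the convention of Gym’s Taxi environment. The observation combines the truck’s position with whether it is carrying cargo, since the right action depends on both.

```python
class TruckWorldSketch:
    """Hypothetical TruckWorld-style environment (not the author's code).

    Grid size, locations, and reward values are illustrative assumptions.
    """

    UP, DOWN, LEFT, RIGHT, LOAD, UNLOAD = range(6)
    MOVES = {UP: (-1, 0), DOWN: (1, 0), LEFT: (0, -1), RIGHT: (0, 1)}

    def __init__(self, size=5, pickup=(0, 4), dropoff=(4, 0)):
        self.size, self.pickup, self.dropoff = size, pickup, dropoff
        self.reset()

    def reset(self):
        self.truck = (0, 0)
        self.loaded = False
        return self._obs()

    def _obs(self):
        # The state the agent sees: truck position plus whether it carries cargo.
        return (self.truck[0], self.truck[1], self.loaded)

    def step(self, action):
        reward, done = -1.0, False  # every step costs a little, so faster is better
        if action in self.MOVES:
            dr, dc = self.MOVES[action]
            self.truck = (min(max(self.truck[0] + dr, 0), self.size - 1),
                          min(max(self.truck[1] + dc, 0), self.size - 1))
        elif action == self.LOAD:
            if self.truck == self.pickup and not self.loaded:
                self.loaded = True
            else:
                reward = -10.0  # penalize loading in the wrong place
        elif action == self.UNLOAD:
            if self.truck == self.dropoff and self.loaded:
                self.loaded = False
                reward, done = 20.0, True  # cargo delivered: episode ends
            else:
                reward = -10.0  # penalize unloading in the wrong place
        return self._obs(), reward, done, {}


# A scripted (not learned) route, just to exercise the environment.
T = TruckWorldSketch
env = T()
env.reset()
route = [T.RIGHT] * 4 + [T.LOAD] + [T.DOWN] * 4 + [T.LEFT] * 4 + [T.UNLOAD]
total = 0.0
for action in route:
    obs, reward, done, info = env.step(action)
    total += reward
print(obs, total, done)  # (4, 0, False) 7.0 True
```

The scripted route only checks that a correct delivery earns a positive total reward (13 steps at -1, plus +20). A learning agent, such as a Q-learning agent, would have to discover a route like this on its own through trial and error.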

I ended up creating a truck agent that learned from scratch how to complete all of these tasks in the environment I created. Building every aspect of the environment and training my reinforcement learning agent was a great learning experience!

Check Out My Project & Article

My Article: If you want to learn more about reinforcement learning and get a better intuition for how it works, check out my article on the fundamentals of reinforcement learning: How machines can teach themselves without human intervention

My Project: Click here if you want to check out my actual reinforcement learning environment and agent and see how they work!

Key Takeaways

  • In its most essential form, reinforcement learning is simple and is analogous to how humans learn.
  • Reinforcement learning involves an agent that takes actions in an environment. This environment can be anything that the agent needs to get good at.
  • The agent interprets which actions are good and bad based on the rewards it receives from the environment and the state of the environment.
  • Reinforcement learning is very powerful and is useful for tasks where a machine has to achieve a specific goal.

Wait… don’t click away yet!

I’m Adam, a 17-year-old passionate about technologies like artificial intelligence/machine learning, blockchain, quantum computing, and much more.

If you enjoyed learning about this topic or are interested in modern technologies and artificial intelligence, make sure to: