How machines can teach themselves without human intervention

Adam Maj
6 min read · Jan 3, 2020


When babies are born, they know almost nothing. Especially human babies. In fact, human infants are one of the dumbest types of infants in the animal kingdom. They have almost no agency on their own and they rely entirely on their parents, or whatever adults are around them. So what makes these mindless blobs of cuteness turn into the smartest creatures on the planet? Let’s take a look.

When infants first come into the world, their minds are an open book. Since they don’t understand things like language yet, there’s really only one way that they can learn: observation.

Babies observe the environment around them and then take actions based on this environment. Some of these actions are decidedly dumb. But the great thing about babies is that they learn from these bad actions. They start to realize that certain actions are not rewarding them, and they stop doing these things.

For example, babies are often mesmerized by fire. This initial interest might motivate them to put their fingers in a fire to satisfy their curiosity. However, soon after doing so, they will feel an intense burning pain and will immediately pull their fingers back.

After such an experience, babies realize (after a lot of crying and a quick run to the hospital) not to put their fingers in the fire anymore. Why? Because they took an action, in this case putting their finger in the fire, and they were met with a negative reward from their environment, in this case a burnt finger. Thus, the babies know that if they take this action again in the future, they will likely be met with the same negative reward, so they stop taking it.

Similarly, if a baby finds a cookie in his house and takes a bite, he will be rewarded by the great taste he feels. Since our baby took the action of eating a cookie and was positively rewarded by the good taste of the cookie, the baby will be motivated to further take this action in the future.

Now, of course, this is quite a simplification of the actual process that's going on inside babies' heads, but it is fundamentally the same process. Through this simple line of thinking, babies are able to interpret the immense amount of information the world has to offer in a relatively short period of time (after only a few years, they are already far smarter than almost any other animal on the planet).

So basically, babies are able to start from knowing close to nothing and can teach themselves to be geniuses through the process of taking actions and assessing rewards. Wouldn’t it be cool if machines could do the same thing? It sounds almost too perfect for them. Machines start out knowing absolutely nothing, but we would like them to learn from their environment on their own.

In fact, there is already a strategy in machine learning that uses this exact methodology to train machines. It’s called reinforcement learning.

What is reinforcement learning?

Reinforcement learning basically uses the exact same thought process that our baby used above to allow machines to learn from their environment. Let’s take a look at the specifics of the process.

Let’s say that we have some environment. In the case of the baby, that environment was the entire world, or more immediately, the specific area of the world that the baby is in at a given time.

In reinforcement learning, the environment can be anything that we want to make our machine good at. For example, the environment could be some kind of game (virtual or real) or some task that we want our machine to get good at. In its most essential form, the environment is just some kind of space where actions can be taken and will result in specific outcomes.
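To make the idea of "a space where actions can be taken and will result in specific outcomes" concrete, here is a minimal sketch of an environment in Python. The `Corridor` class, its `reset` and `step` methods, and the reward values are all hypothetical names and numbers made up for this illustration; they loosely mirror the shape that many reinforcement learning toolkits use, but they are not taken from any particular library.

```python
class Corridor:
    """A hypothetical toy environment: a hallway of cells 0..length-1.

    The agent starts in cell 0, and the goal is the last cell.
    """

    ACTIONS = ("left", "right")

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Put the agent back at the start and return the starting state."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply one action and return (new_state, reward, done)."""
        if action == "right":
            self.position = min(self.position + 1, self.length - 1)
        elif action == "left":
            self.position = max(self.position - 1, 0)
        else:
            raise ValueError(f"unknown action: {action!r}")
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # assumed reward: 1 only at the goal
        return self.position, reward, done


env = Corridor(length=3)
state = env.reset()
print(env.step("right"))  # (1, 0.0, False)
print(env.step("right"))  # (2, 1.0, True)
```

Every action produces a new state and a reward, which is all an environment really needs to do.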

We can use reinforcement learning if we want our machine to learn from the environment it's in (just like the babies learned from their own environment). We’ll call our machine the agent from here on out. Our agent is simply taking actions in its environment and seeing what happens. These actions might start out seeming pretty dumb, but they will start to make more sense over time.

Every time our agent takes an action in its environment, it observes and interprets the changes that have taken place. The situation the environment is in at a given moment, as seen by the agent, is known as the state. The agent also interprets the rewards that it receives from the environment, just like our baby. The agent associates rewards with the actions it has taken and decides whether to take an action in the future based on the rewards it has received for taking that action in the past.
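As a rough sketch of how an agent might "associate rewards with the actions it has taken", the Python snippet below revisits the baby example. The action names, the reward numbers, and the `learn` function are invented for illustration. The agent keeps a running average of the reward it has seen for each action, and it mostly picks the action that currently looks best while occasionally trying something at random (a simple strategy often called epsilon-greedy).

```python
import random

# Hypothetical average rewards, invented for this illustration.
REWARDS = {"touch_fire": -1.0, "eat_cookie": 1.0}


def learn(episodes=200, epsilon=0.1, seed=0):
    """Estimate the value of each action by trial and error."""
    rng = random.Random(seed)
    actions = list(REWARDS)
    estimates = {a: 0.0 for a in actions}  # what the agent currently believes
    counts = {a: 0 for a in actions}       # how often each action was tried

    for _ in range(episodes):
        # Explore occasionally; otherwise pick the action that looks best so far.
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=estimates.get)

        # The environment responds with a slightly noisy reward.
        reward = REWARDS[action] + rng.gauss(0, 0.1)

        # Nudge the estimate toward the observed reward (running average).
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


estimates, counts = learn()
print(estimates)
print(counts)
```

After a few painful tries, the estimate for `touch_fire` turns negative and the agent ends up choosing `eat_cookie` most of the time.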

These concepts are the essentials of reinforcement learning. Pretty simple, right? It almost seems like glorified trial and error (although it's still super effective in a lot of cases).
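To see the four pieces (environment, state, action, reward) working together, here is a small sketch using tabular Q-learning, one common reinforcement learning algorithm that this article does not otherwise cover. The corridor world, the constants, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all assumptions chosen for illustration, not a recommended setup.

```python
import random

N_CELLS = 5         # hypothetical corridor: cells 0..4, the goal is cell 4
ACTIONS = (-1, +1)  # move one cell left or one cell right


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_CELLS - 1)
    done = next_state == N_CELLS - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values q[(state, action)] by trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}

    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit the best-known action, breaking ties at random.
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])

            next_state, reward, done = step(state, action)

            # Q-learning update: move the old value toward
            # reward + discounted value of the best next action.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state

    return q


q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_CELLS - 1)]
print(policy)
```

With these settings, the learned policy should prefer moving right (`+1`) in every cell, even though the agent was never told where the goal is.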

Why is reinforcement learning useful?

So at this point, we understand what reinforcement learning is, but why would anyone ever want to use it for their machines? Aren’t there already better and fancier methods out there? What’s the benefit of using reinforcement learning over one of those fancy neural network things?

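Actually, reinforcement learning is more useful than other techniques in many cases. This is because many other approaches, such as supervised learning, require a lot of human intervention: datasets usually have to be labeled by humans before a model (often a neural network) can learn to identify and analyze data effectively. This obviously takes a lot of effort. (Reinforcement learning and neural networks are not mutually exclusive, by the way; many modern systems combine the two.)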

Reinforcement learning is ideal because it allows us to do minimal work as humans. All we have to do is put the machines in an environment, define the rewards, and watch as they do all the work and learn. So many cool things can be accomplished by this strategy, and many of the most notable accomplishments that AI has achieved are due to reinforcement learning (for example, when Google DeepMind’s famous AlphaGo beat the world champion in Go).

One of the coolest applications of reinforcement learning is OpenAI’s experiment in which AI agents played hide and seek against each other. The results show the true potential of the technology (check out the video below!).

Key Takeaways

  • In its most essential form, reinforcement learning is simple and is analogous to how humans learn.
  • Reinforcement learning involves an agent that takes actions in an environment. This environment can be anything that the agent needs to get good at.
  • The agent interprets which actions are good and bad based on the rewards it receives from the environment and the state of the environment.
  • Reinforcement learning is very powerful and is useful for tasks where a machine has to achieve a specific goal.

Wait… don’t click away yet!

I’m Adam, a 17-year-old passionate about technologies like artificial intelligence/machine learning, blockchain, quantum computing, and much more.

If you enjoyed learning about this topic or are interested in modern technologies and artificial intelligence, make sure to: