Life is not a reinforcement learning problem

14 April 2016

Reinforcement learning (RL) is an effective way to program robots and intelligent agents to discover their environment and adapt to it. In RL, the agent receives rewards based on the action it took, the state of the world, and the resulting outcome, and it learns which of its actions lead to a higher reward.

Who defines the reward? The programmer. For the robot, the programmer is God almighty. The programmer is the dictator. The programmer giveth the carrot and the programmer giveth the stick.
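To make the programmer's role concrete, here is a minimal sketch of tabular Q-learning, one common RL method. The five-cell corridor, the `reward` function, and every constant are made up for this illustration. Note that `reward` is the only place where "good" and "bad" are defined, and the agent never gets to question it.

```python
import random

GOAL = 4                      # rightmost cell of a 5-cell corridor (assumed toy world)
ACTIONS = ("left", "right")


def reward(state, action, next_state):
    """The programmer's decree: carrot at the goal, stick at the left wall."""
    if next_state == GOAL:
        return 1.0            # carrot
    if state == 0 and action == "left":
        return -1.0           # stick: the agent bumped into the wall
    return 0.0


def step(state, action):
    """Move one cell; the left wall keeps the agent at cell 0."""
    return max(0, state - 1) if action == "left" else state + 1


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)          # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt = step(state, action)
            r = reward(state, action, nxt)
            # the goal is terminal, so it has no future value
            future = 0.0 if nxt == GOAL else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (r + gamma * future - q[(state, action)])
            state = nxt
            if state == GOAL:
                break
    return q


def greedy_policy(q):
    """For each cell, the action the agent currently values most."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

After training, the agent should prefer "right" in every cell. It does so not because right is good in any deeper sense, but because that is where the programmer put the carrot.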

When we live our human lives, we too adjust to the perceived rewards for our actions. We are very good at it: optimizing for grades at school, social influence among our peers, love and relationships, money earned, personal growth, mastery of a hobby. We are great at chasing the reward, maybe even better than a military drone chasing its target. I’m not actually sure about the drone’s capabilities, but I know for a fact that humans are really good at playing the well-described reinforcement learning games of life.

But who defines what is rewarding and what is misery? Is it our dopamine receptors? Is it Hollywood and The American Dream™? Is it sex? Is it religion? Is it the internet?

It is up to each of us to decide how to define our own reward. If we don’t do it for ourselves, and we very rarely do, someone else will define it for us. But then, even if we are amazingly good at getting the reward, we will still live our lives as robots programmed by whoever defined the reward system.

To me, this doesn’t mean living. It merely means executing for as long as our biological machinery works. It is passive existence, even if we are prolific contributors to society.

But what does it mean to define your own reward? To be honest, I don’t think it is mathematically possible to define this well. The reason is that you always need to base your definition on some prior assumptions. But how do we make our rewards our own if we always have to base them on something? There are infinitely many valid choices. We can’t decide…

Here is my opinion. To make your rewards your own, you need to at least understand why you are choosing them. So let’s keep asking about our actions, with questions such as “Why am I brushing my teeth?” and “Why don’t I punch people?”, and we will get a chance to unmask the proverbial programmer who defined our reward system.

Each of us might come up with a different answer, but each answer is at least one step closer to the unattainable answer to “Where does the reward come from?”.

As we dig in with the “Why” questions, we won’t be able to answer the ultimate “Why”, but we can at least reveal our presumptions. At least they can be our own.
