22 July 2018

Ten years ago, the iPhone 3G had just been announced, only the second iPhone ever. A lot has changed in tech since then, to the point where it is hard even to imagine how we lived before. Ten years into the future… who knows where tech will be? I don’t know for sure, but that doesn’t stop me from making educated guesses.

It is hard to know exactly where technology will be on its trajectory ten years from now. It is easier to guess which technologies will be developed at some point before then.

I am going to state each of my verifiable predictions in italics and mark it with a lollipop emoji, like this:

🍭 In 10 years the year will be 2028.

At a high level, I expect a lot of things to stay very similar to our current lifestyle. Houses, for example: it takes a lot of time to build housing, and houses last a long time, so why not keep using the existing ones? Inequality around the world and within countries will remain similar, and globalization will continue. Hardware, materials and sensors will keep improving as well.

I’ll leave the discussion on the topics above for a later day.

Today, let’s talk about the future of deep learning.

Where can deep learning go in ten years? Let’s first see how it got here. Deep learning started picking up just five years ago, merely half of the period that I’m predicting.

In that period we’ve seen techniques such as variational autoencoders and generative adversarial networks, which everyone calls GANs. We’ve seen computer vision get to the point where you can create fake videos of the president saying whatever you want them to say.

🍭 We will think of deep learning as a novel type of programming, and not call it deep learning anymore. Instead, we might use other phrases such as “differentiable programming”, “natural programming”, etc.

Deep learning has terrific potential but is still a toddler, and it requires a lot of supervision and tuning by an engineer. That process of tweaking acts as a way of “programming” without writing code. When a deep learning engineer tries multiple models and picks the most promising one as the base for the next set of iterations, that’s programming. It is not coding, as that decision is not explicitly embedded in the code, but it is programming nonetheless, as the software ends up being different because of it.

All that deep learning, or any machine learning, does is allow us to write code that models data in a way that would have been intractable if we were limited to primitives such as for-loops and if-statements, or to object-oriented or functional programming.

Some people have started calling deep learning by a new name: “differentiable programming”.

I like this a little bit, as it reminds us that it is programming after all. But it is very limiting. The name assumes that gradient-based optimization techniques are used to come up with the neural network program; the “differentiable” part comes from using gradients. However, this is not the only way to arrive at the neural network, and in many cases, there is no need to use a neural network at all.
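To make the “gradient” part concrete, here is a minimal sketch of a program that is found by following gradients rather than written by hand. The data, learning rate, and step count are made-up assumptions for illustration; a real deep learning framework would compute the gradients automatically instead of spelling them out.

```python
# Toy "differentiable program": find w and b in y = w * x + b from data
# by repeatedly stepping against the gradient of the mean squared error.
# The data, learning rate, and step count are illustrative assumptions.

def fit_line(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Hand-derived gradients of the mean squared error w.r.t. w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # produced by the "hidden program" y = 2x + 1
w, b = fit_line(xs, ys)
print(w, b)
```

Nobody typed the numbers 2 and 1 into `fit_line`; they come out of the optimization. That is the sense in which the tuning, not the code, does the programming.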

I think a more apt phrase would be “Natural programming” - it doesn’t matter what exact technique you used to come up with the final machine learned program, whether it was deep learning or simple decision trees. What matters is that the final model represents the input data’s relationships.

And the input data is often coming from the real world, hence the “natural” qualifier. I expect a lot more use cases to center around humans and how we interact with nature and communities, and “natural” should come naturally, the way it is for “natural language processing”.

As much as I like “natural programming” as a phrase, only time will tell which buzzword we will use.

🍭 Three new paradigms will emerge for deep learning, on the significance level of GANs.

A generative adversarial network (GAN) is trained by having two deep learning models try to outsmart each other, with both improving in the process. It’s kind of like how counterfeit money evolves to be very close to the real money, while the real money keeps getting harder to counterfeit. GANs have a ton of cool applications, including, but not limited to, fill-in-the-cat.
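For the curious, the counterfeiting game is usually written as a minimax objective. The notation below is the standard textbook formulation rather than anything specific to this post: G is the generator (the counterfeiter) turning random noise z into fake samples, and D is the discriminator (the bank) outputting the probability that a sample is real.

```latex
\min_{G} \max_{D} \;
\mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
+ \mathbb{E}_{z \sim p_{z}}\left[\log\left(1 - D(G(z))\right)\right]
```

D pushes the expression up by telling real from fake, while G pushes it down by making fakes that D scores as real.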

It is hard to know exactly how these would work, as only through sweat and tears and raw compute power will these be proven to work. But here are some wild guesses about what might happen.

🍭 The Newton prediction. “Physical Laws” - Deep learning will be able to extract the physical or dynamics laws of a system just by observations.

This can be as simple as extracting the equation for the speed of an apple falling from a tree: it accelerates at the gravitational acceleration g = 9.8 m/s^2, reduced slightly by air drag proportional to its velocity.

Learning algorithms can already detect statistical correlations in the data, which help them predict where an apple will be if there are a lot of examples of falling apples in the data. I’m not talking about statistical relationships. I am talking about being able to explicitly create a model of the acceleration an object would feel. This model would be used internally within the algorithm to make predictions about the intermediate steps, and would allow models to train with a lot less data.
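Here is a rough sketch of what “extracting the law” could look like in the easiest possible case. It cheats in an important way: I hand the program the form of the law, a = g - k * v, and it only recovers the constants with ordinary least squares. The prediction is about algorithms that find the form on their own. The drag coefficient `K_TRUE` and the sampling interval `DT` are made-up values, and the “observations” are simulated.

```python
import math

G_TRUE = 9.8   # m/s^2, the value quoted in the text
K_TRUE = 0.1   # 1/s, hypothetical drag coefficient chosen for illustration
DT = 0.01      # s, hypothetical sampling interval

def observed_velocity(t):
    # Closed-form solution of dv/dt = g - k * v with v(0) = 0.
    return (G_TRUE / K_TRUE) * (1.0 - math.exp(-K_TRUE * t))

def recover_law(velocities, dt):
    # Estimate acceleration with central differences, then fit
    # a = g - k * v by least squares (intercept = g, slope = -k).
    vs = velocities[1:-1]
    accs = [(velocities[i + 1] - velocities[i - 1]) / (2 * dt)
            for i in range(1, len(velocities) - 1)]
    n = len(vs)
    mean_v = sum(vs) / n
    mean_a = sum(accs) / n
    cov = sum((v - mean_v) * (a - mean_a) for v, a in zip(vs, accs))
    var = sum((v - mean_v) ** 2 for v in vs)
    slope = cov / var
    intercept = mean_a - slope * mean_v
    return intercept, -slope

# Two seconds of simulated velocity readings of the falling apple.
velocities = [observed_velocity(i * DT) for i in range(201)]
g_hat, k_hat = recover_law(velocities, DT)
print(f"estimated g = {g_hat:.3f} m/s^2, estimated k = {k_hat:.3f} 1/s")
```

The output is two readable constants in an explicit equation, not a black box of weights, and that is the kind of model I mean.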

Right now, if you want to program a robot to juggle balls (photo below), you need to know the laws of physics and incorporate them in the code as a physical model. Learning directly from observations could take a very long time, but with the ability to extract logical rules about the physics of balls, it would require a lot less training from scratch. Such technology could be used to determine rules about animal behavior in the wild or users’ behavior on a social network. If crossed with GANs, this could provide new and improved ways to play all kinds of adversarial games, from Hold ’Em poker to leading an actual war.

🍭 Human level language and emotion models.

Deep learning will be able to incorporate things like body language and intonation into its models.

The software will understand when we are depressed or elated. This will allow for a lot of applications, such as a Tamagotchi 2.0 which observes your emotional state and adjusts accordingly to help you. It could also mean that a personalized health app could determine which of your actions caused you to gain weight or to get sick.

When computers “understand” us, we will increasingly anthropomorphize our electronics and the services we use. Kind of like this lil robot dude on the left, with the red cup hat (Pintsize from Questionable Content)

... or C-3PO from Star Wars

These algorithms, coupled with realistic and fast speech generation, will be able to pass the Turing test. Surveillance devices such as Google Home, Alexa, or closed-circuit cameras will be able to utilize these for even deeper tracking of humans.

🍭 Einstein from fifth graders. Massively parallel optimization and Deep merging

Today, deep learning parallelizes the work it does by using GPUs. A GPU allows it to run through the pixels of each image much faster. But optimization is still a sequential process and requires you to go through each photo in the dataset in sequence. If you have a giant dataset, you’re stuck going through each example. It is still workable to a degree, but there can be datasets that require a cluster of hundreds or even thousands of computers just to store. Getting to the massively parallel state would require merging the results of two or more independently trained networks into a single network of roughly the same size.

Such merging wouldn’t be combining the two networks in an ensemble and calling it a day. It would be close to the result you would obtain if you trained them separately. If we think of each parallel training as getting to the knowledge of a fifth grader, then with their combined intelligence they could invent general relativity. After all, is Einstein really smarter than a fifth grader?

To achieve this, there will need to be a way to merge the fifth-grader networks into a single network of roughly the same size. After all, a fifth grader’s brain is close in size to Einstein’s.
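To show what “same size” means, here is a toy sketch that merges two models by simply averaging their parameters. This is a stand-in technique I am swapping in for illustration, not the merging method the prediction is about. The data (a noisy line y = 3x + 2), the shard sizes, and the helper names are all hypothetical.

```python
import random

def fit_line(points):
    # Closed-form least squares for y = w * x + b.
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    w = (sum((x - mean_x) * (y - mean_y) for x, y in points)
         / sum((x - mean_x) ** 2 for x, _ in points))
    return w, mean_y - w * mean_x

def merge(models):
    # Naive merge: average each parameter, so the result has the same
    # number of parameters as any single input model.
    n = len(models)
    return tuple(sum(params) / n for params in zip(*models))

rng = random.Random(0)  # fixed seed so the run is repeatable

def make_shard(size=50):
    # Hypothetical data: y = 3x + 2 plus a little Gaussian noise.
    return [(i * 0.1, 3 * (i * 0.1) + 2 + rng.gauss(0, 0.1))
            for i in range(size)]

shards = [make_shard(), make_shard()]
models = [fit_line(shard) for shard in shards]  # could run on separate machines
merged_w, merged_b = merge(models)
print(models, (merged_w, merged_b))
```

For a straight line this works, and it happens to give the same predictions as an ensemble that averages the two models’ outputs. For deep networks, averaging weights like this generally does not work out of the box, which is exactly why I think a new paradigm is needed here.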

Such merging can only work well for certain neural network architectures. I can’t tell you now how these architectures will look but I might be able to do so in ten years.

Such merging might also require different specializations among the fifth grader networks: one might be good at physical intuition, another might be good at mathematics, another might be good at imagining thought experiments, and yet another might be good at coordinating between them.

Such a process might be called something like decentralized collaborative learning. Or fifth grader learning 😂👌.

Conclusion

I expect deep learning to advance in ways which make it better at things that humans are good at such as modeling nature, modeling humans and working well in communities. It will also have a different name ten years from now, and I am rooting for ‘natural programming’.

Do you buy my arguments? Do you have alternative predictions? I’d love to hear them at [email protected] or on Twitter at @themitak.

Hope you enjoyed the 🍭s
