Are you working hard on your Machine Learning code but ignoring your Gradient Descent algorithm? Your training times are likely suffering!
While machine learning is used to solve some amazing problems, it’s fascinating to learn how these systems narrow in on their solutions. For us humans, you might think of it in terms of any everyday task, say throwing a ball into a basketball ring.
You throw the ball and it might sail over the ring, missing it. Try again, and this time there’s not enough power, so it falls short. So on and so forth. Eventually you throw the ball and score a point. What you just did was learn to complete a task (getting the ball in the ring) by trying and adjusting your next attempt accordingly.
You got closer and closer to completing the task as you narrowed in on exactly what you had to do. In the machine world, this is described as a “learning algorithm”. There are many learning algorithms for many different tasks.
Gradient Descent is one of the most important of these, and getting your head around it is well worth the effort.
Essentially, gradient descent is a mathematical way of describing what any logical person might do.
You attempt the task. Then you measure how close you came to achieving it. Finally, you try again, but this time with your settings nudged slightly towards the target. Then you repeat.
So if you threw the basketball and it missed to the left, you’d change your throw to be more towards the right. Easy!
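The loop described above can be sketched in a few lines of code. This is a minimal, illustrative example (not production training code): we minimise the simple one-variable error function f(x) = (x - 3)², where the current guess is our “throw”, the gradient tells us which way we missed, and the learning rate controls how much we adjust each attempt.

```python
def gradient(x):
    # Derivative of the error f(x) = (x - 3)**2 with respect to x.
    # Its sign tells us which direction we missed in.
    return 2 * (x - 3)

x = 0.0              # initial guess (our first "throw")
learning_rate = 0.1  # how much we adjust after each attempt

for step in range(100):
    # Try, measure the miss via the gradient, adjust towards the target.
    x -= learning_rate * gradient(x)

print(round(x, 4))  # → 3.0, the value that minimises the error
```

Note how the update always moves against the gradient: if we overshoot to one side, the gradient changes sign and the next adjustment pulls us back the other way, just like correcting a throw.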
However, just like many other things in Machine Learning, it gets much more complicated! An excellent source of information on this and the many other forms of Gradient Descent is Sebastian Ruder’s Overview post.
In it he goes through all the most popular versions and not only explains how they work, but why they are good and bad. His hugely detailed piece culminates in a fantastic GIF of all the various versions competing in a race.
Here you can see the various versions starting off and working through their tests. Eventually they all find their way to the star, which represents the lowest error rate, but at different speeds.
The speed at which each version reaches the star translates directly into network training time. So, as you can imagine, everyone wants the quickest way to the lowest error rate in order to reduce training time.
If you’re just beginning with Machine Learning, Deep Learning or Neural Nets, it’s an excellent read. Even if you’ve been doing Machine Learning for a while, it’s still a great resource for knowing exactly which version to use in various cases. You might even learn of a faster version you’ve been neglecting!