Artificial Intelligence: Deep learning
Every day, new articles about artificial intelligence’s accomplishments appear. However, many people do not know how or why artificial intelligence works.
Currently, the most successful form of artificial intelligence is machine learning, in which the computer learns from examples of correct answers. Unlike a student at school, who sees a question, solves it, and gets an answer, a machine learning program is given a vast number of questions together with their answers and infers a set of rules for solving the problem.
Machine learning itself has many subfields, and the most prominent is deep learning, in which the computer uses a single mathematical function to represent all of the rules. The computer program that learns those rules is called the model. Despite the enormous amount of computation behind deep learning, the underlying math is relatively clean, and a basic understanding of derivatives is enough to follow it.
For example, suppose we wanted to tell whether an image shows a dog or a cat. We could turn each image into a series of pixel brightness values, which we will refer to as X. If the images were 64x64 grayscale images, each would have 4096 pixel values. We multiply each pixel value by its own number, called a weight (W), and add the products together to get our result. (That’s W⋅X for the vector people.) If the result is greater than 0, the model predicts a dog; otherwise, it predicts a cat. By comparing the model’s predictions with the actual answers, we come up with a cost function, which measures how far off the model is. The higher the model’s error, the higher the cost, and the worse the model is doing.
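The weighted sum, the dog-or-cat rule, and a cost function can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the random stand-in "images", the targets of +1 for dog and -1 for cat, and the choice of mean squared error as the cost are all made up for this example, since the text does not commit to a particular cost.

```python
import random

NUM_PIXELS = 64 * 64  # a 64x64 grayscale image has 4096 brightness values


def score(weights, pixels):
    """Weighted sum W . X: multiply each pixel by its weight and add up."""
    return sum(w * x for w, x in zip(weights, pixels))


def predict(weights, pixels):
    """Return 'dog' if the score is greater than 0, otherwise 'cat'."""
    return "dog" if score(weights, pixels) > 0 else "cat"


def cost(weights, images, targets):
    """Assumed cost: mean squared difference between score and target.

    Targets are +1 for a dog and -1 for a cat (an illustrative choice).
    """
    total = sum((score(weights, x) - t) ** 2 for x, t in zip(images, targets))
    return total / len(images)


rng = random.Random(0)
# Made-up data: random pixel values with random targets, not real photos.
images = [[rng.random() for _ in range(NUM_PIXELS)] for _ in range(10)]
targets = [rng.choice([1, -1]) for _ in range(10)]
# Untrained model: small random weights, one per pixel.
weights = [rng.uniform(-0.01, 0.01) for _ in range(NUM_PIXELS)]

print("prediction for first image:", predict(weights, images[0]))
print("cost with random weights:", cost(weights, images, targets))
```

With random weights the predictions are no better than guessing, which is why the weights have to be learned.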
One of the biggest challenges in deep learning is finding weights that work well, and the process of finding them is called training. Deep learning models begin with random weights, and training gradually adjusts those weights until the model performs its task well.
The training process is similar to walking down a mountain at night. You do not know how high up you are, how far down you must go, or which way leads down. The simplest strategy is to always take the steepest route downhill, and that is the basis of how deep learning training works.
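In deep learning, the cost plays the role of height on the mountain. After getting an initial result, the model takes derivatives of the cost with respect to each weight to find the steepest downhill direction, and then moves all of its weights a small step in that direction.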
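A minimal sketch of this downhill walk, usually called gradient descent, might look like the following. Everything specific here is an assumption made for illustration: the two-pixel "images", the targets of +1 for dog and -1 for cat, the mean squared error cost, the step size (`learning_rate`), and the fixed number of steps.

```python
def score(weights, pixels):
    """Weighted sum W . X of one image."""
    return sum(w * x for w, x in zip(weights, pixels))


def cost(weights, images, targets):
    """Assumed cost: mean squared difference between score and target."""
    total = sum((score(weights, x) - t) ** 2 for x, t in zip(images, targets))
    return total / len(images)


def gradient(weights, images, targets):
    """Derivative of the cost with respect to each weight."""
    grad = [0.0] * len(weights)
    for x, t in zip(images, targets):
        error = score(weights, x) - t
        for i, xi in enumerate(x):
            # d/dw_i of (score - t)^2 is 2 * (score - t) * x_i
            grad[i] += 2 * error * xi / len(images)
    return grad


def train(weights, images, targets, learning_rate=0.1, steps=200):
    """Repeatedly take a small step in the steepest downhill direction."""
    for _ in range(steps):
        grad = gradient(weights, images, targets)
        # The gradient points uphill, so subtract it to move downhill.
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights


# Tiny made-up data set: 2-pixel "images", target +1 for dog, -1 for cat.
images = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 0.8]]
targets = [1.0, 1.0, -1.0, -1.0]

start = [0.5, 0.5]
trained = train(start, images, targets)

print("cost before training:", cost(start, images, targets))
print("cost after training: ", cost(trained, images, targets))
```

The step size matters: too large and the walk overshoots the valley, too small and training takes a very long time. The value 0.1 was simply picked to work for this toy data.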
After many such weight changes, the model settles on weights where the cost no longer decreases and stops training. Of course, always choosing the steepest direction is not guaranteed to work well: the model can get stuck in a hole somewhere on the mountainside (a local minimum) instead of reaching the bottom. There are many other methods for training a better model, but most of them require complicated mathematical formulas to describe properly.
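Getting stuck in a hole is easy to reproduce on a one-dimensional "mountain". The sketch below uses a made-up height function, x^4 - 3x^2 + x, chosen only because it has one shallow valley and one deep valley. The function names, step size, and starting points are likewise illustrative assumptions.

```python
def height(x):
    """A made-up landscape with a shallow valley and a deeper one."""
    return x**4 - 3 * x**2 + x


def slope(x):
    """Derivative of height: positive means the ground rises to the right."""
    return 4 * x**3 - 6 * x + 1


def descend(x, learning_rate=0.01, steps=1000):
    """Always step in the downhill direction, starting from x."""
    for _ in range(steps):
        x -= learning_rate * slope(x)
    return x


from_right = descend(2.0)   # starts on the right-hand side of the landscape
from_left = descend(-2.0)   # starts on the left-hand side

print("started at  2.0, ended at", from_right, "height", height(from_right))
print("started at -2.0, ended at", from_left, "height", height(from_left))
```

Both walks follow exactly the same rule, yet the one that starts on the right stops in the shallower valley, because from there every direction leads uphill. Which valley the walk ends in depends only on where it starts, which is one reason the random starting weights matter.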