Let’s Learn: Neural Nets #1
A step-by-step chronicle of me learning about neural nets.
As I’m writing this it’s early January 2022, and I’m finally getting around to doing the thing I’ve been meaning to do for 2 years.
That’s right — I’m going to learn about neural nets! I’ve been putting this off for ages, mainly because whenever I’ve started I’ve been put off by something or other. Mainly the maths. So much maths.
Nevertheless, I’ve decided that this year will be the year I learn about neural nets. Knowing me, it will probably take me at least a year to wrap my head around it… or will it?
I’m going to try using the Feynman technique of learning, which apparently is the best way to learn anything.
The Feynman Technique
In a nutshell, the approach boils down to the following steps:
Choose a concept you want to learn about
Quite simple really — choose a topic you want to learn about, and write down everything you know about the subject.
Explain it
Using your list of points, try to explain the topic to a layman. Use simple language, and do not use jargon.
Identify gaps in your explanation
… ideas that you couldn’t quite explain in simple language, definitions that got muddled, and points that you forgot entirely are all signs that you don’t fully grasp the topic at hand.
This is a simple way of identifying the boundaries and holes in your understanding.
Plug those gaps
Focus on closing the gaps in your knowledge; research the material, experiment if you can, break down definitions into simpler terms — anything to help your understanding.
As you learn more about the topic, update your original list.
Organise and simplify
Organise your (now updated) set of notes into a story that you can tell from start to finish, avoiding jargon.
Read the narrative out loud (or explain it to someone again) as a sense-check. If it’s confusing at any point, take a step back and plug the gap.
Lather, rinse, and repeat until you have a story that you can tell to anyone who will listen.
Neural Networks
Here it is — my running understanding of neural nets.
A quick note on formatting: I’ll keep my original thoughts in normal text, and capture updates and comments in italics.
Neural networks
A type of model designed to imitate the human brain. There are many types of neural networks, including:
- Feed-forward networks, where information flows through the model in one (forward) direction.
- Convolutional neural networks (CNN)
- Recurrent neural networks (RNN)
- Long short-term memory networks (LSTM)
Nodes and layers
Just like human brains are formed of vast numbers of connected neurons, neural networks are formed of connected nodes.
The nodes are organised into various layers.
The input layer is the first layer that the data is fed into.
The output layer is the final layer of the network and contains the almost-final outcome of the model.
The layers in between the input and output layer are called hidden layers.
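To make the layer idea a bit more concrete for myself, here’s a tiny sketch in Python. The layer sizes are completely made up; I just want to picture nodes grouped into layers, with every node connected to every node in the next layer (which I believe is how the simple feed-forward case works).

```python
# A hypothetical network shape: 4 input nodes -> 3 hidden nodes -> 1 output node.
layer_sizes = [4, 3, 1]

# In a simple feed-forward network, every node in one layer connects to
# every node in the next layer, so the number of connections between
# consecutive layers is just the product of their sizes.
for in_size, out_size in zip(layer_sizes, layer_sizes[1:]):
    print(f"{in_size} nodes -> {out_size} nodes: {in_size * out_size} connections")
```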
Weights
Each node in the network has something called a weight. Each weight is a number and indicates how “excited” (or “activated”) the node becomes when exposed to the data.
For instance, a large positive weight might mean a positive response and a negative weight might mean a negative response.
Biases
Like weights, each node in the network has a bias which works together with the weight.
That’s as much as I know about biases.
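Putting weights and biases together, my rough guess at what a single node does is “multiply each input by a weight, add them up, then add the bias”. A sketch in plain Python with made-up numbers; note that I’ve written one weight per input here, which may or may not match how real networks store them, so treat this as a guess rather than gospel.

```python
# Hypothetical single node: weighted sum of the inputs, plus a bias.
inputs  = [0.5, -1.2, 3.0]   # made-up input values
weights = [0.8,  0.1, -0.4]  # made-up weights, one per input
bias    = 0.25               # made-up bias, shifts the result up or down

# The node's raw output, before any activation function gets involved.
raw_output = sum(w * x for w, x in zip(weights, inputs)) + bias
print(raw_output)  # -0.67
```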
Activation function
The output layer produces the “almost-final” model results. These results flow into an activation function which produces the “final” model results.
An activation function is an R → R mapping.
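Two activation functions I’ve seen named elsewhere are ReLU and the sigmoid; neither appears in my notes above, so this is me filling a gap. Both take a single real number and return a single real number, which fits the R → R description.

```python
import math

def relu(x):
    # Clamps negative values to zero, leaves positive values alone.
    return max(0.0, x)

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

print(relu(-0.67), relu(0.77))        # 0.0 0.77
print(sigmoid(-0.67), sigmoid(0.77))  # roughly 0.34 and 0.68
```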
Loss function
Similar to other machine learning models, a neural network has a loss function.
This is a function which computes how accurate the model is. Large loss values indicate poor “accuracy” (and vice versa), so we try to minimise the output of the loss function.
If we think of model error as a “cost”, then we can think of a loss function as a cost function and the idea of minimising loss is more intuitive.
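One loss function I’ve come across is mean squared error, which seems to be a common “cost” for regression-style problems. It isn’t mentioned in my notes above, so the following is a best guess with made-up numbers.

```python
# Mean squared error: the average of the squared differences between
# predictions and targets. Smaller is better.
predictions = [0.9, 0.2, 0.6]
targets     = [1.0, 0.0, 1.0]

mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
print(mse)  # (0.01 + 0.04 + 0.16) / 3 = 0.07
```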
Optimisers
Through some sort of mathematical and computer science magic, optimisers work out the set of weights and biases which generates the lowest loss function output.
I’ve heard of the Adam optimiser, which is apparently highly thought of.
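I haven’t actually used an optimiser yet, but from skimming the Pytorch docs the general shape of using Adam seems to be something like the sketch below. The model, learning rate and data are all made up; this is how I think it fits together rather than something I’ve run.

```python
import torch
from torch import nn, optim

# Hypothetical, tiny model: one linear layer mapping 3 inputs to 1 output.
model = nn.Linear(3, 1)
loss_fn = nn.MSELoss()
optimiser = optim.Adam(model.parameters(), lr=0.01)

# Made-up data: 4 examples with 3 features each, and 4 targets.
x = torch.randn(4, 3)
y = torch.randn(4, 1)

for step in range(100):
    optimiser.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # how wrong is the model right now?
    loss.backward()              # compute gradients of the loss w.r.t. the weights and biases
    optimiser.step()             # nudge the weights and biases to reduce the loss
```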
Data scaling
Data that is fed into the model needs to be numeric (as is common to most machine learning models).
Apparently, neural networks work better when the data is scaled to lie in the range [0,1] or [-1,1].
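The simplest way I know of to get data into the [0, 1] range is min-max scaling. A sketch with made-up numbers; I believe libraries like scikit-learn have helpers for this, but the idea seems to be roughly this:

```python
# Min-max scaling: squash values into [0, 1] by subtracting the minimum
# and dividing by the range.
values = [3.0, 7.5, 12.0, 4.5]

lo, hi = min(values), max(values)
scaled = [(v - lo) / (hi - lo) for v in values]
print(scaled)  # [0.0, 0.5, 1.0, 0.1666...]
```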
Tensorflow, Keras and Pytorch
Tools that are widely used to build neural networks.
Tensorflow is a tool built by Google and can be used either directly or through the Keras interface. Apparently the syntax of Tensorflow is difficult to wrap your head around, especially since it changed significantly from one release to another.
Built by Facebook, Pytorch is another deep-learning tool. Apparently its syntax and approach are much easier to learn than Tensorflow’s, but it is not as well suited to large-scale productionisation.
I’m not really concerned about productionising my experimental models, so I’m leaning towards the easier syntax of Pytorch at the moment.
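To get a feel for the syntax difference, here’s the same tiny made-up model sketched in both Keras and Pytorch. The layer sizes mean nothing; I just wanted to see the two styles side by side.

```python
# Keras (the high-level interface to Tensorflow):
from tensorflow import keras

keras_model = keras.Sequential([
    keras.layers.Dense(3, activation="relu", input_shape=(4,)),
    keras.layers.Dense(1),
])

# Pytorch:
from torch import nn

torch_model = nn.Sequential(
    nn.Linear(4, 3),
    nn.ReLU(),
    nn.Linear(3, 1),
)
```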
Other terms I’ve heard of
… but have no idea about:
- Initialisation
- Learning rate (similar to the learning rate in GBMs?)
- Batch size
- Epoch
- Decay