Neural Networks
Unit Structure
There is a set of input nodes, $x_1, \dots, x_n$, corresponding weights $w_1, \dots, w_n$, a bias, $b$, and an output node. The output will be
$$a = \sigma\left(\sum_{i=1}^{n} w_i x_i + b\right) = \sigma(w \cdot x + b)$$
where $\sigma$ is the sigmoid (logistic) function.
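A minimal NumPy sketch of this computation (the particular numbers are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def unit_output(x, w, b):
    """Output of a single unit: sigma(w . x + b)."""
    return sigmoid(np.dot(w, x) + b)

# Three inputs, three weights, one bias (values chosen arbitrarily)
x = np.array([1.0, 0.5, -2.0])
w = np.array([0.4, -0.6, 0.1])
b = 0.2
print(unit_output(x, w, b))
```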
Hidden Layers
For problems where a linear separator is insufficient, adding neurons between the input layer and the output layer, each performing a logistic regression, can produce more complex behavior. These hidden units have their own biases and combine at the output layer with their own weights.
To label the weights and inputs, we say the input is a column vector:
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
and the weights are in a matrix:
$$W = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & & \ddots & \vdots \\ w_{m1} & w_{m2} & \cdots & w_{mn} \end{bmatrix}$$
where row $i$ holds the weights of hidden unit $i$, so the hidden-layer activations are $h = \sigma(Wx + b)$.
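As a rough sketch of how these shapes fit together (using NumPy; the layer sizes and random weights here are just for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W, b, v, c):
    """One hidden layer: h = sigma(W @ x + b), output = sigma(v . h + c)."""
    h = sigmoid(W @ x + b)        # hidden activations, one per hidden unit
    return sigmoid(np.dot(v, h) + c)

# Two inputs, three hidden units, one output
x = np.array([0.5, -1.0])
W = np.random.randn(3, 2)         # row i: weights of hidden unit i
b = np.random.randn(3)            # hidden-unit biases
v = np.random.randn(3)            # output-layer weights, one per hidden unit
c = 0.0                           # output-layer bias
print(forward(x, W, b, v, c))
```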
Loss Function
Given the output layer is defined by
$$\hat{y} = \sigma(v \cdot h + c),$$
with its own weights $v$ and bias $c$,
the loss function will look similar to the binary cross-entropy function, but more complicated, since $\hat{y}$ now depends on all of the weights and biases in the network:
$$L = -\sum_{\text{samples}} \left[\, y \log \hat{y} + (1 - y)\log(1 - \hat{y}) \,\right]$$
We can then use gradient descent, as
$$w \leftarrow w - \eta\,\frac{\partial L}{\partial w}, \qquad b \leftarrow b - \eta\,\frac{\partial L}{\partial b}$$
for each weight $w$ and bias $b$, where $\eta$ is the learning rate.
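A minimal NumPy sketch of one gradient-descent step, assuming the single sigmoid unit from earlier (for that special case the gradients work out to $(\hat{y} - y)\,x$ and $(\hat{y} - y)$; the data point and learning rate are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(y, y_hat):
    """Binary cross-entropy: -(y log y_hat + (1 - y) log(1 - y_hat))."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# One gradient-descent step on a single sigmoid unit
x, y = np.array([1.0, -0.5]), 1.0      # one training sample
w, b = np.zeros(2), 0.0                # initial parameters
eta = 0.1                              # learning rate

y_hat = sigmoid(np.dot(w, x) + b)
print(bce_loss(y, y_hat))              # loss before the update
w = w - eta * (y_hat - y) * x          # w <- w - eta * dL/dw
b = b - eta * (y_hat - y)              # b <- b - eta * dL/db
print(bce_loss(y, sigmoid(np.dot(w, x) + b)))  # loss after: slightly smaller
```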
Optimizing the Network
Activation Functions
A sigmoid function is good, but far from $z = 0$ the gradient is quite small. Instead, another function we can use is a rectified linear unit (ReLU), which is 0 for $z < 0$, then a linear function ($\mathrm{ReLU}(z) = z$) for $z \geq 0$.
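A quick NumPy sketch of the ReLU:

```python
import numpy as np

def relu(z):
    """Rectified linear unit: 0 for z < 0, z itself for z >= 0."""
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))   # [0.  0.  0.  1.5]
```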
Softmax
If the output layer is meant to be a probability distribution, the nodes should sum to 1. To achieve this, we can use a softmax processing step:
$$\operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$$
This converts all the outputs into values in the range $(0, 1)$ such that all the values sum to 1.
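A small NumPy sketch of the softmax step (subtracting the max before exponentiating is a common numerical-stability trick, not part of the definition):

```python
import numpy as np

def softmax(z):
    """Exponentiate and normalize so the outputs sum to 1."""
    e = np.exp(z - np.max(z))   # shift by max for numerical stability
    return e / np.sum(e)

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p, p.sum())               # values in (0, 1) summing to 1
```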
Cross-Entropy Loss
For a probability distribution, a good loss function would be the cross-entropy loss function:
$$L = -\sum_i y_i \log \hat{y}_i$$
where $y$ is the true distribution (e.g. a one-hot label) and $\hat{y}$ is the network's output.
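A short NumPy sketch, assuming a one-hot true label $y$ and a softmax output $\hat{y}$ (the numbers are made up):

```python
import numpy as np

def cross_entropy(y, y_hat):
    """L = -sum_i y_i * log(y_hat_i), with y the true distribution."""
    return -np.sum(y * np.log(y_hat))

y     = np.array([0.0, 1.0, 0.0])      # one-hot true label
y_hat = np.array([0.1, 0.7, 0.2])      # softmax output of the network
print(cross_entropy(y, y_hat))         # -log(0.7), roughly 0.357
```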