
Neural Networks mimic the biological neuron model

Generalized formula for perceptrons:
ŷ = Σᵢ xᵢ*wᵢ + b  (sum the weighted inputs, then add the bias)

Input Layer : the first layer, which receives the raw input data

Output Layer : final estimate of the output
- There can be multiple neurons in the output layer (e.g., one per class)

Hidden Layer : the layers between the input and output layers
- Deep Network : a network with 2 or more hidden layers
- Hidden layers can be difficult to interpret, since they are not directly exposed to the inputs or outputs
z = x*w + b : basic formula for each perceptron
- w : the weight, i.e. how much strength to give the incoming input
- b : the bias, an offset value; x*w has to overcome it before the neuron produces a meaningful effect
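As a quick illustration, here is a minimal NumPy sketch of the z = x*w + b computation for a single perceptron; the input, weight, and bias values below are made up for the example.

```python
import numpy as np

def perceptron(x, w, b):
    """Compute z = x.w + b for one neuron (before any activation)."""
    return np.dot(x, w) + b

# Made-up values: 3 inputs, hand-picked weights and bias
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.8, 0.2, -0.5])
b = 0.1
print(perceptron(x, w, b))  # 0.4 - 0.2 - 1.0 + 0.1 = -0.7
```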
Activation Function (a = f(z)) : sets boundaries on the output value of the neuron
Step Function : useful for classification
- a "hard" function: the output jumps straight from 0 to 1, so small changes in z aren't reflected

Sigmoid Function : a moderate, smooth form of the step function
- f(z) = 1 / (1 + e^(-z)), producing outputs between 0 and 1
- more sensitive to small changes in z


ReLU (Rectified Linear Unit) : f(z) = max(0, z)
- outputs 0 for negative z and z itself for positive z
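For comparison, a minimal NumPy sketch of the three activation functions above:

```python
import numpy as np

def step(z):
    """Hard 0/1 output: small changes in z around 0 are not reflected."""
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    """Smooth output between 0 and 1, sensitive to small changes in z."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Passes positive z through unchanged, clips negative z to 0."""
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.1, 0.0, 0.1, 2.0])  # made-up sample inputs
print(step(z))     # [0. 0. 1. 1. 1.]
print(sigmoid(z))  # values between 0 and 1
print(relu(z))     # [0.  0.  0.  0.1 2. ]
```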



Activation functions for multiclass classification (Non-exclusive)
- Sigmoid Function on each output neuron : each class gets an independent probability between 0 and 1, so one sample can be assigned several classes at once

Activation functions for multiclass classification (Exclusive)
- Softmax Function : outputs a probability distribution over K classes (the probabilities sum to 1); the target class chosen is the one with the highest probability
- f(zᵢ) = e^(zᵢ) / Σⱼ e^(zⱼ)
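A minimal softmax sketch (the class scores below are made up):

```python
import numpy as np

def softmax(z):
    """Turn raw class scores into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # made-up scores for 3 classes
probs = softmax(scores)
print(probs, probs.sum())  # probabilities sum to 1
print(np.argmax(probs))    # predicted class: the highest probability (0)
```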


Notations
- ŷ : the model's estimate of the label (its prediction)
- y : the true value of the label
- a : the neuron's activation, i.e. its output after the activation function
Cost Function
- must be an average over all training samples so it outputs a single scalar value
- used to keep track of loss/cost during training and monitor network performance
Quadratic cost function
- C = (1/2n) Σₓ ||y(x) - a^L(x)||²
- a^L is the prediction at layer L (the output layer)
- Why do we square it? Squaring keeps every error term positive and punishes large errors more heavily than small ones

Generalization of cost function
- C(W, B, Sʳ, Eʳ)
- W is our neural network's weights, B is our neural network's biases, Sʳ is the input of a single training sample, and Eʳ is the desired output of that training sample
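A sketch of the quadratic cost as a plain mean squared error (the labels and predictions are made up; some texts include an extra 1/2 factor):

```python
import numpy as np

def quadratic_cost(y, a):
    """Mean squared error: squaring keeps each term positive
    and punishes large errors more than small ones."""
    return np.mean((y - a) ** 2)

y = np.array([1.0, 0.0, 1.0])  # true labels (made up)
a = np.array([0.9, 0.2, 0.6])  # network outputs a^L (made up)
print(quadratic_cost(y, a))    # (0.01 + 0.04 + 0.16) / 3 = 0.07
```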

Gradient Descent : find the weight values w that minimize the cost
- Learning Rate : the step size, i.e. how far to move in the direction of the negative gradient on each update

Gradient : the generalization of the derivative to N-dimensional vectors, i.e. the vector of partial derivatives of the cost with respect to each weight
∇C(w₁, w₂, ..., wₙ)
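A minimal gradient-descent sketch on a one-dimensional toy cost C(w) = (w - 3)², whose gradient is 2(w - 3); the cost, starting point, learning rate, and step count are all made up for illustration:

```python
def grad(w):
    """Gradient of the toy cost C(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)

w = 0.0    # made-up initial weight
lr = 0.1   # learning rate: how far to move each step
for _ in range(100):
    w -= lr * grad(w)  # step in the direction of the negative gradient
print(w)   # converges toward the minimum at w = 3
```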
Cross Entropy Loss Function : for classification problems
- binary classification : C = -(y*log(p) + (1-y)*log(1-p))
- multiclass classification : C = -Σ_c y_c * log(p_c)
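Both forms as NumPy sketches (the labels and predicted probabilities are made up):

```python
import numpy as np

def binary_cross_entropy(y, p):
    """-(y*log(p) + (1-y)*log(1-p)), averaged over samples."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def categorical_cross_entropy(y, p):
    """-sum_c y_c*log(p_c), averaged over samples; y is one-hot."""
    return -np.mean(np.sum(y * np.log(p), axis=1))

print(binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.9, 0.2])))
print(categorical_cross_entropy(np.array([[0.0, 1.0, 0.0]]),
                                np.array([[0.1, 0.8, 0.1]])))
```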
Forward pass : use the input x to set the activation a for the input layer, then for each following layer compute z = w*a + b and a = f(z), and repeat until the output layer
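A minimal forward-pass sketch through a tiny 2-3-1 network with sigmoid activations; the shapes, weights, and input are all made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)                # reproducible made-up weights
x  = np.array([0.5, -1.0])       # made-up input
W1 = np.random.randn(3, 2); b1 = np.zeros(3)  # hidden layer (3 neurons)
W2 = np.random.randn(1, 3); b2 = np.zeros(1)  # output layer (1 neuron)

a = x                            # the input x sets the first activation
for W, b in [(W1, b1), (W2, b2)]:
    z = W @ a + b                # z = w*a + b for this layer
    a = sigmoid(z)               # activation feeds the next layer
print(a)                         # final output ŷ
```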
