WHAT ARE LOSS FUNCTIONS?
• A loss function measures how well a neural network model performs a certain task, which in most cases is regression or classification.
• 3 Key Types of Loss Functions in Neural Networks:
• Mean Squared Error Loss Function
• Cross-Entropy Loss Function
• Mean Absolute Percentage Error

MEAN SQUARED ERROR LOSS FUNCTION
• The mean squared error (MSE) loss is the average of the squared differences between the entries of the prediction vector ŷ and the ground-truth vector y:
MSE = (1/n) Σ_i (ŷ_i − y_i)²

CROSS-ENTROPY LOSS FUNCTION
• Regression is only one of the two areas where feedforward networks enjoy great popularity. The other area is classification.
• In classification tasks, we deal with predictions of probabilities, which means the output of the neural network must lie in the range between zero and one.
• A loss function that measures the error between a predicted probability and the label representing the actual class is called the cross-entropy loss function.

Mean Absolute Percentage Error (MAPE)
• The mean absolute percentage error, also known as mean absolute percentage deviation (MAPD), usually expresses accuracy as a percentage. We define it with the following equation:
MAPE = (100% / n) Σ_t |y_t − ŷ_t| / |y_t|

What is Backpropagation?
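Before continuing, the three loss functions defined above (MSE, cross-entropy, MAPE) can be sketched directly in pure Python. The function names and sample values here are illustrative assumptions, not part of the original text; the cross-entropy version shown is the binary case, with labels in {0, 1} and predicted probabilities strictly between 0 and 1.

```python
import math

def mse(y_true, y_pred):
    # Average of the squared differences between targets and predictions.
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_true, y_pred):
    # Binary cross-entropy: y_true holds 0/1 labels, y_pred holds
    # predicted probabilities in (0, 1).
    return -sum(yt * math.log(yp) + (1 - yt) * math.log(1 - yp)
                for yt, yp in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    # Mean absolute percentage error, expressed as a percentage.
    return 100.0 / len(y_true) * sum(abs((yt - yp) / yt)
                                     for yt, yp in zip(y_true, y_pred))

print(mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]))
print(cross_entropy([1, 0], [0.9, 0.2]))
print(mape([100.0, 200.0], [110.0, 180.0]))  # 10.0: each prediction is off by 10%
```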
• Backpropagation is the essence of neural network training.
• It is the method of fine-tuning the weights of a neural network based on the error rate obtained in the previous epoch (i.e., iteration).
• Proper tuning of the weights reduces error rates and makes the model reliable by increasing its generalization.
• Backpropagation in a neural network is short for "backward propagation of errors."
• It is a standard method of training artificial neural networks.
• This method calculates the gradient of a loss function with respect to all the weights in the network.

Steps
1. Inputs X arrive through the preconnected path.
2. The input is modeled using real weights W. The weights are usually selected randomly.
3. Calculate the output of every neuron, from the input layer through the hidden layers to the output layer.
4. Calculate the error in the outputs: Error = Actual Output − Desired Output.
5. Travel back from the output layer to the hidden layers and adjust the weights so that the error decreases.
Keep repeating the process until the desired output is achieved.

• The most prominent advantages of backpropagation are:
• Backpropagation is fast, simple, and easy to program.
• It has no parameters to tune apart from the number of inputs.
• It is a flexible method, as it does not require prior knowledge about the network.
• It is a standard method that generally works well.
• It does not need any special mention of the features of the function to be learned.

Introduction to Gradient Descent
• Gradient descent is an optimization algorithm used when training a machine learning model. It is based on a convex function and tweaks its parameters iteratively to minimize a given function to its local minimum.
• Gradient descent is an optimization algorithm for finding a local minimum of a differentiable function. In machine learning, it is simply used to find the values of a function's parameters (coefficients) that minimize a cost function as far as possible.
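The five backpropagation steps listed earlier can be sketched for the smallest possible network: a single neuron with one weight and a squared-error loss. The input, target, starting weight, and learning rate below are illustrative assumptions, not values from the text.

```python
# Tiny backpropagation example: one neuron computing output = w * x,
# trained with the squared-error loss E = (output - target) ** 2.
x, target = 2.0, 8.0   # step 1: input and its desired output (assumed data)
w = 0.5                # step 2: weight, normally initialized at random
lr = 0.05              # learning rate

for epoch in range(100):
    output = w * x               # step 3: forward pass through the network
    error = output - target      # step 4: actual output minus desired output
    grad = 2 * error * x         # dE/dw via the chain rule
    w -= lr * grad               # step 5: adjust the weight to reduce the error

print(round(w, 3))  # w converges toward 4.0, since 4.0 * 2.0 == 8.0
```

With more layers, step 5 repeats layer by layer from the output back to the input, reusing each layer's gradient to compute the one before it; that chain-rule reuse is what the name "backward propagation of errors" refers to.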
• "A gradient measures how much the output of a function changes if you change the inputs a little bit." —Lex Fridman (MIT)
• A gradient simply measures the change in all weights with regard to the change in error.

Gradient Descent
• Gradient descent is one of the most commonly used iterative optimization algorithms in machine learning for training machine learning and deep learning models. It helps in finding the local minimum of a function.
• The best way to describe the local minimum or local maximum of a function using gradient descent is as follows:
• If we move towards the negative gradient, or away from the gradient of the function at the current point, we will reach the local minimum of that function.
• If we move towards the positive gradient, or towards the gradient of the function at the current point, we will reach the local maximum of that function.
• To achieve this goal, it performs two steps iteratively:
• Calculate the first-order derivative of the function to compute the gradient or slope of that function.
• Move away from the direction of the gradient by alpha times the slope from the current point, where alpha is defined as the learning rate. It is a tuning parameter in the optimization process which helps to decide the length of the steps.

Cost Function and Learning Rate
• Cost Function:
• The cost function is defined as the measurement of the difference, or error, between actual values and expected values at the current position, expressed as a single real number. It helps to increase and improve machine learning efficiency by providing feedback to the model so that it can minimize the error.
• Learning Rate:
• It is defined as the step size taken to reach the minimum or lowest point.