Deep Learning Cheat Sheet – Hacker Noon

**Camron Godbout**

Deep Learning and NLP enthusiast

Dec 6, 2016 · 5 min read

Deep Learning Cheat Sheet

**Deep Learning can be overwhelming when new to the subject. Here are some cheats and tips to get you through it.**

What’re Those?!?!

In this article we will go over common concepts found in Deep Learning to help you get started on this amazing subject.

Visualization of gradient. The red arrows are the gradient of the blue plotted function.

Gradient ∇ (Nabla)

The gradient is the partial derivative of a function that takes in multiple vectors and outputs a single value (i.e. our cost functions in Neural Networks). The gradient tells us which direction to go on the graph to increase our output if we increase our variable input. We use the gradient and go in the opposite direction since we want to decrease our loss.

Back Propagation

Also known as back prop, this is the process of back-tracking errors through the weights of the network after forward propagating inputs through the network. It works by applying the chain rule in calculus.

Sigmoid σ

A function used to activate weights in our network in the interval of [0, 1]. Graphed out, this function looks like an “S”, which is where it gets its name; the s is sigma in Greek. Also known as the logistic function.

Formula for ReLU from Geoffrey Hinton

Rectified Linear Units or ReLU

The sigmoid function has an interval of [0, 1], while the ReLU has a range of [0, infinity]. This means that the sigmoid is better for logistic regression and the ReLU is better at representing positive numbers. The ReLU does not suffer from the vanishing gradient problem.
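A minimal sketch of the sigmoid and ReLU activations in plain Python (the function names and sample values here are mine, not from the article):

```python
import math

def sigmoid(x):
    # Logistic function: squashes any real input into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # Rectified Linear Unit: 0 for negative inputs, identity for positive
    return max(0.0, x)

for v in (-2.0, 0.0, 2.0):
    print(f"sigmoid({v:+.1f}) = {sigmoid(v):.4f}   relu({v:+.1f}) = {relu(v)}")
```

Note how sigmoid saturates toward 0 and 1 at the extremes; its near-zero slope there is exactly the vanishing gradient problem that the ReLU avoids.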

Tanh

Tanh is a function used to initialize the weights of your network in the interval [-1, 1]. Assuming your data is normalized, we will have stronger gradients: since the data is centered around 0, the derivatives are higher. To see this, calculate the derivative of the tanh function and notice that its values are in the range [0, 1]. The range of the tanh function is [-1, 1], while that of the sigmoid function is [0, 1]. This also avoids bias in the gradients.

LSTM/GRU

Typically found in Recurrent Neural Networks, but expanding to use in other networks, these are little “memory units” that keep state between inputs for training and help solve the vanishing gradient problem, where after around 7 time steps an RNN loses context of the prior input.

Softmax

Softmax is a function usually used at the end of a Neural Network for classification. This function performs a multinomial logistic regression and is generally used for multi-class classification. Usually paired with cross entropy as the loss function.

L1 and L2 Regularization

These regularization methods prevent overfitting by imposing a penalty on the coefficients. L1 can yield sparse models while L2 cannot. Regularization is used to specify model complexity. This is important because it allows your model to generalize better and not overfit to the training data.
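A softmax sketch in plain Python (subtracting the max logit before exponentiating is a standard numerical-stability trick; it is my addition, not something the article mentions):

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability; the result is unchanged
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # the largest logit gets the largest probability
print(sum(probs))  # sums to 1 (up to float rounding)
```

Because the outputs are non-negative and sum to 1, they can be read as class probabilities, which is why softmax pairs naturally with cross entropy.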

Drop out [1]

“It prevents overfitting and provides a way of approximately combining exponentially many different neural network architectures efficiently” (Hinton). This method randomly picks visible and hidden units to drop from the network. This is usually determined by picking a layer dropout percentage.

Batch Normalization [1]

When networks have many deep layers, there becomes an issue of internal covariate shift. The shift is “the change in the distribution of network activations due to the change in network parameters during training” (Szegedy). If we can reduce internal covariate shift we can train faster and better. Batch Normalization solves this problem by normalizing each batch into the network by both mean and variance.

Objective Functions

Also known as loss function, cost function or optimization score function. The goal of a network is to minimize the loss to maximize the accuracy of the network. A more technical explanation [3]: the loss/cost/optimization/objective function is the function that is computed on the prediction of your network. Your network will output a prediction, y_hat, which we compare to the desired output y. This function is then used in back propagation to give us our gradient, allowing our network to be optimized. Examples of these functions are f1/f score, categorical cross entropy, mean squared error, mean absolute error, hinge loss… etc.
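A sketch of “inverted” dropout on one layer's activations (scaling the survivors by 1/keep_prob so the expected activation stays unchanged is the common implementation detail, not something the article spells out):

```python
import random

def dropout(activations, drop_prob=0.5):
    # Zero each unit with probability drop_prob; scale survivors
    # by 1 / keep_prob so the expected activation stays the same
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if random.random() < keep_prob else 0.0
            for a in activations]

random.seed(0)  # seeded only to make this example reproducible
print(dropout([0.2, 0.9, 0.4, 0.7], drop_prob=0.5))
```

At test time dropout is turned off entirely; the inverted scaling above is what lets the same weights be used unchanged for inference.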

F1/F Score

A measure of how accurate a model is, using precision and recall, following the formula:

F1 = 2 * (Precision * Recall) / (Precision + Recall)

Precision: of every positive prediction, which ones are actually positive?

Precision = True Positives / (True Positives + False Positives)

Recall: of everything that actually is positive, what fraction did we predict as positive?

Recall = True Positives / (True Positives + False Negatives)

Cross Entropy

Used to calculate how far off your label prediction is. Sometimes denoted by CE. Cross entropy is a loss function related to the concept of entropy from thermodynamics. This is used in multi-class classification to find the error in the prediction.

Learning Rate [2]

The learning rate is the magnitude at which you adjust the weights of the network during optimization after back propagation. The learning rate is a hyper parameter that will be different for a variety of problems. This should be cross validated on.
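For a single example, cross entropy reduces to the negative log of the probability the network assigned to the true class. A quick sketch (this formulation and the sample numbers are mine, not the article's):

```python
import math

def cross_entropy(probs, true_class):
    # CE = -log(predicted probability of the true class)
    return -math.log(probs[true_class])

probs = [0.7, 0.2, 0.1]               # e.g. a softmax output for one example
print(cross_entropy(probs, 0))        # low loss: confident and correct
print(cross_entropy(probs, 2))        # high loss: the true class got only 0.1
```

The loss explodes as the predicted probability of the true class approaches zero, which is what pushes the network hard away from confident wrong answers.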

```
# ---- Vanilla Gradient Descent ----
# W is your matrix of weights
# dW is the gradient of W
# Adjust the weights of our layer by multiplying our gradient
# times our learning_rate (which is a scalar value)
W -= learning_rate * dW
```

This is a living document; if there is anything you think should be added, let me know and I will add and credit you.

To take a look at different types of models and sample code, check out TensorFlow in a Nutshell—Part Three: All the Models. As always, check out my other articles at camron.xyz.

[1] Suggested by: InflationAaron
[2] Suggested by: Juan De Dios Santos
[3] Suggested by: Brandon Fryslie

https://hackernoon.com/deep-learning-cheat-sheet-25421411e460
