An Introduction to Gradient Descent and Linear Regression

Gradient descent is one of those “greatest hits” algorithms that can offer a new perspective for solving problems. Unfortunately, it’s rarely taught in undergraduate computer science programs. In this post I’ll give an introduction to the gradient descent algorithm, and walk through an example that demonstrates how gradient descent can be used to solve machine learning problems such as linear regression.

At a high level, gradient descent is an algorithm that minimizes functions. Given a function defined by a set of parameters, gradient descent starts with an initial set of parameter values and iteratively moves toward a set of parameter values that minimize the function. This iterative minimization is achieved using calculus, taking steps in the negative direction of the function gradient.

It’s sometimes difficult to see how this mathematical explanation translates into a practical setting, so it’s helpful to look at an example. The canonical example when explaining gradient descent is linear regression.

Simply stated, the goal of linear regression is to fit a line to a set of points. Consider the following data.


Let’s suppose we want to model the above set of points with a line. To do this we’ll use the standard y = mx + b line equation, where m is the line’s slope and b is the line’s y-intercept. To find the best line for our data, we need to find the best set of slope m and y-intercept b values.

A standard approach to solving this type of problem is to define an error function (also sometimes called a cost function) that measures how “good” a given line is. This function will take in a (m,b) pair and return an error value based on how well the line fits our data. To compute this error for a given line, we’ll iterate through each (x,y) point in our data set and sum the square distances between each point’s y value and the candidate line’s y value (computed at mx + b). It’s conventional to square this distance to ensure that it is positive and to make our error function differentiable.
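Written out (averaging over the N points, as the code below does), the error function is:

$$E(m, b) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - (m x_i + b) \right)^2$$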

line will look like:

PYTHON

# y = mx + b
# m is slope, b is y-intercept
def computeErrorForLineGivenPoints(b, m, points):
    totalError = 0
    for i in range(0, len(points)):
        totalError += (points[i].y - (m * points[i].x + b)) ** 2
    return totalError / float(len(points))
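For example, assuming each element of points exposes x and y attributes (the post doesn’t show how points is constructed, so the namedtuple and the small data set below are hypothetical, purely for illustration):

PYTHON

from collections import namedtuple

# Hypothetical point type and data set; the post's actual data is not reproduced here.
Point = namedtuple("Point", ["x", "y"])
points = [Point(1.0, 1.4), Point(2.0, 1.9), Point(3.0, 3.2), Point(4.0, 4.1)]

# Error of the candidate line y = 1*x + 0 (b = 0, m = 1).
print(computeErrorForLineGivenPoints(0, 1, points))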


Lines that fit our data better (where better is defined by our error function) will result in lower error values. If we minimize this function, we will get the best line for our data. Since our error function consists of two parameters (m and b), we can visualize it as a two-dimensional surface. This is what it looks like for our data set:
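A minimal sketch of how such an error surface can be plotted with matplotlib (this code is not from the original post; it reuses the hypothetical points defined above):

PYTHON

import numpy as np
import matplotlib.pyplot as plt

# Evaluate the error over a grid of (m, b) values; the ranges are arbitrary.
ms = np.linspace(-2.0, 4.0, 100)
bs = np.linspace(-6.0, 6.0, 100)
M, B = np.meshgrid(ms, bs)
E = np.array([[computeErrorForLineGivenPoints(b, m, points) for m in ms] for b in bs])

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(M, B, E, cmap="viridis")
ax.set_xlabel("m")
ax.set_ylabel("b")
ax.set_zlabel("error")
plt.show()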

Each point in this two-dimensional space represents a line. The height of the function at each point is the error value for that line. You can see that some lines yield smaller error values than others (i.e., fit our data better). When we run gradient descent search, we will start from some location on this surface and move downhill to find the line with the lowest error.

To run gradient descent on this error function, we first need to compute its gradient. The gradient will act like a compass and always point us downhill. To compute it, we will need to differentiate our error function. Since our function is defined by two parameters (m and b), we will need to compute a partial derivative for each. These derivatives work out to be:
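Differentiating the error function above with respect to each parameter (these expressions also match the update code below) gives:

$$\frac{\partial E}{\partial m} = \frac{2}{N} \sum_{i=1}^{N} -x_i \left( y_i - (m x_i + b) \right)$$

$$\frac{\partial E}{\partial b} = \frac{2}{N} \sum_{i=1}^{N} -\left( y_i - (m x_i + b) \right)$$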


We now have all the tools needed to run gradient descent. We can initialize our search to start at any pair of m and b values (i.e., any line) and let the gradient descent algorithm march downhill on our error function towards the best line. Each iteration will update m and b to a line that yields slightly lower error than the previous iteration. The direction to move in for each iteration is calculated using the two partial derivatives from above and looks like this:

PYTHON

# Perform a single gradient descent step, returning the updated [b, m].
# (The signature line was lost in extraction and is reconstructed from the body.)
def stepGradient(b_current, m_current, points, learningRate):
    b_gradient = 0
    m_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        b_gradient += -(2/N) * (points[i].y - ((m_current * points[i].x) + b_current))
        m_gradient += -(2/N) * points[i].x * (points[i].y - ((m_current * points[i].x) + b_current))
    new_b = b_current - (learningRate * b_gradient)
    new_m = m_current - (learningRate * m_gradient)
    return [new_b, new_m]

The learningRate variable controls how large of a step we take downhill during each iteration. If we take too large of a step, we may step over the minimum. However, if we take small steps, it will require many iterations to arrive at the minimum.
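Putting the pieces together, a minimal driver sketch (not from the original post; it reuses the hypothetical points from above, with the starting values and iteration count described below):

PYTHON

b, m = 0.0, -1.0       # starting line: b = 0, m = -1, as in the example below
learningRate = 0.0005  # the step size used in the post's example
for i in range(2000):  # the post runs 2000 iterations
    b, m = stepGradient(b, m, points, learningRate)

print(b, m, computeErrorForLineGivenPoints(b, m, points))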

Below are some snapshots of gradient descent running for 2000 iterations for our example problem. We start out at the point m = -1, b = 0. At each iteration, m and b are updated to values that yield slightly lower error than the previous iteration. The left plot displays the current location of the gradient descent search (blue dot) and the path taken to get there (black line). The right plot displays the corresponding line for the current search location. Eventually we ended up with a pretty accurate fit.


We can also observe how the error changes as we move toward the minimum. A good way to ensure that gradient descent is working correctly is to make sure that the error decreases for each iteration. Below is a plot of error values for the first 100 iterations of the above gradient search.
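This check is easy to automate. A small sketch (again using the hypothetical setup from above) that records the error at each of the first 100 iterations and verifies it never increases:

PYTHON

b, m = 0.0, -1.0
errors = []
for i in range(100):
    b, m = stepGradient(b, m, points, 0.0005)
    errors.append(computeErrorForLineGivenPoints(b, m, points))

# With a suitable learning rate, the error should decrease monotonically.
assert all(later <= earlier for earlier, later in zip(errors, errors[1:]))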

We’ve now seen how gradient descent can be applied to solve a linear regression problem. While the model in our example was a line, the concept of minimizing a cost function to tune parameters also applies to regression problems that use higher-order polynomials and other problems found around the machine learning world.

While we’ve only scratched the surface of gradient descent here, there are several additional concepts that are good to be aware of but that we weren’t able to discuss. A few of these include:

- Convexity – In our linear regression problem, there was only one minimum. Our error surface was convex. Regardless of where we started, we would eventually arrive at the absolute minimum. In general, this need not be the case. It’s possible to have a problem with local minima that a gradient search can get stuck in. There are several approaches to mitigate this (e.g., stochastic gradient search).

- Performance – We used vanilla gradient descent with a learning rate of 0.0005 in the above example, and ran it for 2000 iterations. There are approaches, such as line search, that can reduce the number of iterations required. For the above example, line search reduces the number of iterations to arrive at a reasonable solution from several thousand to around 50.

- Convergence – We didn’t talk about how to determine when the search finds a solution. This is typically done by looking for small changes in error iteration-to-iteration (e.g., where the gradient is near zero); a minimal sketch of such a stopping check follows this list.
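A minimal sketch of such a stopping check (using the hypothetical setup from above; the tolerance and iteration cap are made up for illustration):

PYTHON

b, m = 0.0, -1.0
previousError = computeErrorForLineGivenPoints(b, m, points)
for i in range(10000):  # safety cap on the number of iterations
    b, m = stepGradient(b, m, points, 0.0005)
    error = computeErrorForLineGivenPoints(b, m, points)
    if abs(previousError - error) < 1e-9:  # error has effectively stopped changing
        break
    previousError = error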

For more information about gradient descent, linear regression, and other machine learning topics, I would strongly recommend Andrew Ng’s machine learning course on Coursera.

Example Code

Example code for the problem described above can be found here.

Edit: I chose to use the linear regression example above for simplicity. We used gradient descent to iteratively estimate m and b; however, we could have also solved for them directly. My intention was to illustrate how gradient descent can be used to iteratively estimate/tune parameters, as this is required for many different problems in machine learning.

