



Regression

Qinghua Yang
Department of Communication Studies, Texas Christian University, Fort Worth, TX, USA

Regression is a statistical tool to estimate the relationship(s) between a dependent variable (y, or outcome variable) and one or more independent variables (x, or predicting variables; Fox 2008). More specifically, regression analysis helps in understanding how the variation in a dependent variable relates to variation in the independent variables while other confounding variable(s) are held constant. Regression analysis is widely used to make predictions and to estimate the conditional expectation of the dependent variable given the independent variables, where its use overlaps with the field of machine learning. Figure 1 shows how crime rate is related to residents' poverty level and can be used to predict the crime rate of a specific community. We know from this regression that there is a positive linear relationship between the crime rate (y axis) and residents' poverty level (x axis). Given the poverty index of a specific community, we are able to predict the crime rate in that area.

Linear Regression

The estimation target of regression is a function that predicts the dependent variable based upon values of the independent variables, which is called the regression function. For a simple linear regression, the function can be represented as y_i = a + bx_i + e_i. The function for multiple linear regression is y_i = b_0 + b_1x_1 + b_2x_2 + ... + b_kx_k + e_i, where k is the number of independent variables. Regression estimation using ordinary least squares (OLS) selects the line with the lowest total sum of squared residuals. The proportion of the total variation (SST) that is explained by the regression (SSR) is known as the coefficient of determination, often referred to as R^2, a value ranging between 0 and 1 with a higher value indicating a better regression model (Keith 2015).

Regression, Figure 1: Linear regression of crime rate (y axis: crime) on residents' poverty level (x axis: poverty_sqrt)
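
To make the OLS estimation concrete, here is a minimal sketch in Python (not part of the original chapter; NumPy is assumed, and the numbers are invented, loosely echoing the poverty-and-crime example). It estimates a and b for y_i = a + bx_i + e_i and computes R^2 as SSR/SST:

    import numpy as np

    # Invented illustrative data: poverty index (x) and crime rate (y).
    x = np.array([0.2, 0.5, 0.9, 1.1, 1.4, 1.8, 2.1, 2.5])
    y = np.array([3.1, 4.0, 5.2, 5.9, 6.8, 8.1, 8.7, 10.2])

    # OLS picks the line with the lowest total sum of squared residuals;
    # for one predictor the closed-form estimates are:
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()

    # Coefficient of determination: the proportion of the total
    # variation (SST) explained by the regression (SSR).
    y_hat = a + b * x
    sst = np.sum((y - y.mean()) ** 2)
    ssr = np.sum((y_hat - y.mean()) ** 2)
    r_squared = ssr / sst

    print(f"y-hat = {a:.2f} + {b:.2f}x, R^2 = {r_squared:.3f}")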

Nonlinear Regression

In the real world, there are many more nonlinear functions than linear ones. For example, the relationship between x and y may be fitted by a quadratic function, as shown in Figure 2. There are in general two ways to deal with nonlinear models. First, nonlinear models can be approximated with linear functions. Both nonlinear functions in Figure 2 can be approximated by two linear functions split according to the slope: the first linear regression function runs from the beginning of the semester to the final exam, and the second function runs from the final to the end of the semester. Similarly, cubic, quartic, and more complicated regressions can also be approximated with a sequence of linear functions. However, analyzing nonlinear models in this way can produce large residuals and leave considerable variance unexplained. The second way, which is considered better in this respect, is to include nonlinear terms in the regression function, as in ŷ = a + b_1x + b_2x^2. As the graph of a quadratic function is a parabola, the parabola opens downward if b_2 < 0 and upward if b_2 > 0. Instead of having x^2 in the model, the nonlinearity can also be represented in many other ways, such as √x, ln(x), sin(x), cos(x), and so on. Which nonlinear model to choose, however, should be based both on theory or former research and on the R^2.

Regression, Figure 2: Nonlinear regression models (top panel: Anxiety; bottom panel: Confidence in the Subject; x axis in both panels: semester begins, mid-term, final, semester ends)
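
As a sketch of the second approach (invented data; NumPy assumed), the quadratic model ŷ = a + b_1x + b_2x^2 can be fitted by adding a squared column to the design matrix; the model remains linear in its coefficients, so OLS still applies:

    import numpy as np

    # Invented data following a downward-opening parabola plus noise.
    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 50)
    y = 2 + 3 * x - 0.4 * x**2 + rng.normal(0, 1, x.size)

    # Design matrix with an intercept, x, and the nonlinear term x^2.
    X = np.column_stack([np.ones_like(x), x, x**2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    a, b1, b2 = coef

    # Here b2 < 0, so the fitted parabola opens downward.
    print(f"y-hat = {a:.2f} + {b1:.2f}x + {b2:.2f}x^2")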

Logistic Regression

When the outcome variable is dichotomous (e.g., yes/no, success/failure, survived/died, accept/reject), logistic regression is applied to predict the outcome variable. In logistic regression, we predict the odds or log-odds (logit) that a certain condition will or will not happen. Odds range from 0 to infinity and are the ratio of the chance of an event (p) to the chance of the event not happening, that is, p/(1 - p). Log-odds (logits) are transformed odds, ln[p/(1 - p)], and range from negative to positive infinity. The relationship predicting the probability of y from x follows an S-shaped curve, as shown in Figure 3; a curve of this shape is called a "logistic curve" and is defined as p(y_i) = exp(b_0 + b_1x_i + e_i) / [1 + exp(b_0 + b_1x_i + e_i)]. In this logistic regression, the value predicted by the equation is a log-odds, or logit. This means that when we run a logistic regression and obtain coefficients, the values the equation produces are logits. Odds are computed as exp(logit), and probability is computed as exp(logit) / [1 + exp(logit)]. Another model used to predict binary outcomes is the probit model; the difference between the logistic and probit models lies in the assumption about the distribution of errors: while the logit model assumes a standard logistic distribution of errors, the probit model assumes a normal distribution of errors (Chumney & Simpson 2006). Despite the difference in assumption, the predictive results of these two models are very similar. When the outcome variable has multiple categories, multinomial logistic regression or ordered logistic regression should be implemented, depending on whether the dependent variable is nominal or ordinal.

Regression, Figure 3: Logistic regression model (y axis: pass, from 0.00 to 1.00; x axis: X, from 0 to 10)
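
The logit-odds-probability conversions described above take only a few lines of Python; in the sketch below, the coefficients b_0 and b_1 are invented for illustration rather than taken from the figure:

    import math

    b0, b1 = -4.0, 0.8   # hypothetical fitted logistic coefficients
    x = 6.0

    # The fitted equation produces a logit (log-odds).
    logit = b0 + b1 * x
    # Odds = exp(logit), ranging from 0 to infinity.
    odds = math.exp(logit)
    # Probability = exp(logit) / (1 + exp(logit)), the S-shaped curve.
    p = odds / (1 + odds)

    print(f"logit = {logit:.2f}, odds = {odds:.2f}, p = {p:.3f}")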

Regression in Big Data

Due to the advanced technologies increasingly used in data collection and the vast amount of user-generated data, the amount of data will continue to increase at a rapid pace, along with a growing accumulation of scholarly works. This explosion of knowledge makes big data one of the new research frontiers, with an extensive number of application areas affected by big data, such as public health, social science, finance, geography, and so on. The high volume and complex structure of big data bring statisticians both opportunities and challenges. Generally speaking, big data is a collection of large-scale and complex data sets that are difficult to process and analyze using traditional data analytic tools. Inspired by the advent of machine learning and other disciplines, statistical learning has emerged as a new subfield in statistics, including supervised and unsupervised statistical learning (James, Witten, Hastie, & Tibshirani, 2013). Supervised statistical learning refers to a set of approaches for estimating a function f based on observed data points, in order to understand the relationship between Y and X = (X_1, X_2, ..., X_P), which can be represented as Y = f(X) + e. Since the two main purposes of the estimation are to make predictions and inferences, which regression modeling is widely used for, many classical statistical learning methods use regression models, such as linear, nonlinear, and logistic regression, with the selection of the specific regression model based on the research question and data structure. In contrast, in unsupervised statistical learning, there is no response variable to predict for every observation that can supervise our analysis (James et al. 2013).

Additionally, more methods have been developed recently, such as the Bayesian approach and Markov chain Monte Carlo (MCMC). The Bayesian approach, distinct from the frequentist approach, treats model parameters as random and models them via distributions. MCMC refers to statistical sampling investigations that involve generating sample data to obtain empirical sampling distributions, based on constructing a Markov chain that has the desired distribution (Bandalos & Leite 2013).
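
To illustrate the MCMC idea in miniature, the sketch below (invented data; a random-walk Metropolis sampler with a flat prior, one simple MCMC variant) treats a regression slope as random and draws from its posterior distribution:

    import numpy as np

    # Invented data: y = 2x + noise, with a known error SD of 1.
    rng = np.random.default_rng(1)
    x = rng.uniform(0, 5, 40)
    y = 2.0 * x + rng.normal(0, 1, 40)

    def log_likelihood(b, sigma=1.0):
        resid = y - b * x
        return -0.5 * np.sum((resid / sigma) ** 2)

    # Random-walk Metropolis: the constructed Markov chain has the
    # posterior of the slope b as its desired stationary distribution.
    b_current, samples = 0.0, []
    for _ in range(5000):
        b_proposal = b_current + rng.normal(0, 0.1)
        # With a flat prior, accept with probability min(1, likelihood ratio).
        if np.log(rng.uniform()) < log_likelihood(b_proposal) - log_likelihood(b_current):
            b_current = b_proposal
        samples.append(b_current)

    posterior = np.array(samples[1000:])   # discard burn-in draws
    print(f"posterior mean of slope: {posterior.mean():.2f}")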
Cross-References

▶ Data Mining Algorithms
▶ Machine Learning
▶ Statistical Analysis
▶ Statistics

Further Readings

Bandalos, D. L., & Leite, W. (2013). Use of Monte Carlo studies in structural equation modeling research. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (pp. 625-666). Charlotte, NC: Information Age Publishing.
Chumney, E. C., & Simpson, K. N. (2006). Methods and designs for outcomes research. Bethesda, MD: ASHP.
Fox, J. (2008). Applied regression analysis and generalized linear models. Thousand Oaks, CA: Sage.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 6). New York, NY: Springer.
Keith, T. Z. (2015). Multiple regression and beyond: An introduction to multiple regression and structural equation modeling. New York, NY: Routledge.
