Mean is the average of a collection of values.

Standard deviation measures how far the data points typically lie from the mean. A low standard
deviation indicates that the data are clustered close to the mean, while a high value indicates that
the data are spread out over a wider range of values, far from the mean.
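A minimal sketch using Python's standard `statistics` module. The two data sets are hypothetical. Both have a mean of 10, but their standard deviations differ sharply. The helper name `describe` is illustrative only.

```python
import statistics

def describe(values):
    """Return the mean, population standard deviation and sample standard deviation."""
    mean = statistics.fmean(values)
    pop_sd = statistics.pstdev(values)    # divides the squared deviations by n
    sample_sd = statistics.stdev(values)  # divides by n - 1 (Bessel's correction)
    return mean, pop_sd, sample_sd

tight = [9, 10, 10, 11]   # hypothetical values clustered near the mean
wide = [1, 5, 15, 19]     # hypothetical values spread far from the mean
print(describe(tight))
print(describe(wide))
```

The population form (`pstdev`) applies when the data are the whole population. The sample form (`stdev`) applies when the data are a sample drawn from a larger population.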

t-test - A statistical method for comparing the means of two groups (or a sample mean with an assumed
value) when the data are approximately normally distributed.
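As an illustration, here is a sketch of Welch's two-sample t statistic. It is one common variant that does not assume equal group variances, and it is computed here with the standard library only. The function name `welch_t` is hypothetical. Turning the statistic into a p-value requires the t distribution, which the standard library lacks. Libraries such as SciPy provide this, for example through `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    m1, m2 = statistics.fmean(a), statistics.fmean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    n1, n2 = len(a), len(b)
    se1, se2 = v1 / n1, v2 / n2              # squared standard errors of each mean
    t = (m1 - m2) / math.sqrt(se1 + se2)     # difference in means in standard-error units
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

group_a = [5.1, 4.9, 5.6, 5.8, 6.0]  # hypothetical measurements
group_b = [4.2, 4.8, 4.4, 4.6, 4.1]
print(welch_t(group_a, group_b))
```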

Chi-square test - A statistical method used with categorical variables to test whether observed
frequencies differ significantly from expected frequencies, or whether two categorical variables in
the dataset are associated.
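The sketch below computes Pearson's chi-square goodness-of-fit statistic, the sum of (O - E)^2 / E over all categories. The die-roll counts are hypothetical, and the function name `chi_square_statistic` is illustrative. For p-values, SciPy offers `scipy.stats.chisquare` for goodness of fit and `scipy.stats.chi2_contingency` for tests of independence.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 60 rolls of a die; a fair die expects 10 per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
stat = chi_square_statistic(observed, expected)
dof = len(observed) - 1  # degrees of freedom for a goodness-of-fit test
print(stat, dof)
```

The statistic is then compared with a chi-square distribution that has `dof` degrees of freedom. A small value, as here, means the counts are consistent with a fair die.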

Hypothesis testing is a form of statistical inference that uses data from a sample to draw conclusions
about a population parameter or a population probability distribution.

Null Hypothesis - It states that the population parameter is equal to the assumed value.

Alternate Hypothesis - It states that the population parameter is not equal to (or, in a one-sided
test, greater than or less than) the assumed value.
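To make the null and alternate hypotheses concrete, here is a sketch of a two-sided one-sample z-test. It assumes the population standard deviation is known, which is why a z-test is used instead of a t-test: the standard library provides the normal distribution (`statistics.NormalDist`) but not the t distribution. The sample, `mu0`, `sigma` and the 0.05 significance level are all hypothetical choices.

```python
import math
from statistics import NormalDist, fmean

def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test of H0: mean == mu0 against H1: mean != mu0 (sigma known)."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))  # standardized distance from mu0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # probability of a result this extreme under H0
    reject_null = p_value < alpha
    return z, p_value, reject_null

# Hypothetical sample of 16 values with mean 52; H0 says the mean is 50, sigma assumed to be 4.
sample = [48, 56, 50, 54, 49, 55, 51, 53, 52, 52, 47, 57, 50, 54, 51, 53]
print(z_test(sample, mu0=50, sigma=4))
```

Here z = 2.0 and the p-value is about 0.046. At the 5% level, the null hypothesis would therefore be rejected in favour of the alternate hypothesis.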

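Gradient Descent is an iterative optimisation algorithm used to find the values of a function's
parameters that minimize a cost function. At each step it moves the parameters a small amount in the
direction of the negative gradient of the cost.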
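A minimal sketch of batch gradient descent. It fits a straight line y = w*x + b by minimizing mean squared error, which is chosen here as the cost function. The learning rate, step count and data are arbitrary illustrative values. The data follow y = 2x + 1 exactly, so the parameters should approach w = 2, b = 1.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by minimizing mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE cost with respect to w and b
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient, scaled by the learning rate
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
print(gradient_descent(xs, ys))
```

If the learning rate is too large, the updates overshoot and diverge. If it is too small, convergence is slow.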

An outlier is a value in the data set that is extremely distinct from most of the other values.
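How "extremely distinct" is measured varies. One common convention, assumed here, is Tukey's rule: a value is flagged if it lies more than 1.5 interquartile ranges below the first quartile or above the third quartile. The data and the function name `find_outliers` are hypothetical.

```python
import statistics

def find_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

data = [10, 12, 11, 13, 12, 11, 10, 95]  # hypothetical readings with one suspicious value
print(find_outliers(data))
```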

Sampling - It is the process of selecting a group of observations from a population in order to study
their characteristics and draw conclusions about the population.

Sampling Error - The difference between a sample statistic (such as the sample mean) and the true
population parameter. It arises because only a sample, rather than the whole population, is observed.
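The sketch below illustrates both ideas. It draws a simple random sample (without replacement) from a synthetic population and measures the sampling error of the sample mean. The population values (normally distributed heights with mean 170 and SD 10) and the sample size of 100 are hypothetical. The seed is fixed only to make runs repeatable.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible
population = [random.gauss(170, 10) for _ in range(10_000)]  # hypothetical population
pop_mean = statistics.fmean(population)

sample = random.sample(population, k=100)  # simple random sample without replacement
sample_mean = statistics.fmean(sample)

sampling_error = sample_mean - pop_mean    # statistic minus parameter
print(f"population mean {pop_mean:.2f}, sample mean {sample_mean:.2f}, error {sampling_error:+.2f}")
```

Repeating the draw gives a different error each time. Larger samples tend to produce smaller errors.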

Regression Analysis - It is a statistical method to model the relationship between a dependent
(target) variable and one or more independent variables.

Linear Regression - Models a linear relationship between the dependent and independent variables. Its
coefficients are commonly estimated with the OLS (Ordinary Least Squares) method.
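For a single independent variable, OLS has a closed form. The slope is b1 = Sxy / Sxx and the intercept is b0 = mean(y) - b1 * mean(x), where Sxy and Sxx are sums of deviations from the means. The data and the function name `ols_fit` in this sketch are hypothetical.

```python
def ols_fit(xs, ys):
    """Ordinary least squares for simple linear regression y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # co-variation of x and y
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # variation of x
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(ols_fit(xs, ys))  # intercept about 2.2, slope about 0.6
```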

Logistic Regression - It is used to model the relationship between a binary dependent variable and one
or more independent variables. It does this by passing a linear combination of the independent
variables through the logistic (sigmoid) function, which turns it into a probability. It is used when
we must estimate the probability of one of two mutually exclusive outcomes, such as True/False or
Yes/No.
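A minimal sketch of one-feature logistic regression. For illustration it is trained by plain gradient descent on the average log-loss. Libraries typically use other solvers, so this is not how any particular package fits the model. The "hours studied / passed" data, learning rate and step count are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by gradient descent on the average log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the average log-loss: mean of (p - y) * x and mean of (p - y)
        diffs = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(d * x for d, x in zip(diffs, xs)) / n
        b -= lr * sum(diffs) / n
    return w, b

hours = [1, 2, 3, 4, 5, 6]    # hypothetical hours studied
passed = [0, 0, 1, 0, 1, 1]   # hypothetical outcomes: 1 = pass, 0 = fail
w, b = fit_logistic(hours, passed)
for h in (1, 6):
    p = sigmoid(w * h + b)
    print(h, round(p, 3), "pass" if p >= 0.5 else "fail")
```

Thresholding the probability at 0.5 is a common but adjustable choice for turning it into a Yes/No prediction.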

Normal Distribution is a probability distribution that is symmetric about the mean. It is also known
as Gaussian Distribution.
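Python's `statistics.NormalDist` can be used to check two familiar properties: symmetry about the mean, and the roughly 68% of values that fall within one standard deviation of it. The mean of 100 and standard deviation of 15 are hypothetical.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # hypothetical parameters

# Symmetry: the density is the same at equal distances below and above the mean
print(dist.pdf(85), dist.pdf(115))

# Half of the probability lies below the mean
print(dist.cdf(100))

# Roughly 68% of values lie within one standard deviation of the mean
print(dist.cdf(115) - dist.cdf(85))
```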

Big data has 3 major components: volume (size of data), velocity (speed at which data arrives) and
variety (types of data).

Analysis of Variance (ANOVA) is a statistical test used to compare the means (averages) of several
groups. It does so by comparing the variance between the group means with the variance within the
groups.
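The sketch below computes the one-way ANOVA F statistic, which is the between-group mean square divided by the within-group mean square. The three groups are hypothetical. SciPy's `scipy.stats.f_oneway` returns both F and a p-value.

```python
from statistics import fmean

def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own group mean
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

A large F, as here, means the group means differ by much more than the scatter within each group would explain.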
Predictive Analytics

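Standard deviation is used to measure the spread of data around the mean.

RMSE (Root Mean Square Error) measures the typical distance between predicted and actual values. It
is the square root of the average of the squared differences between them.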
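A direct translation of the RMSE definition into code. The function name and the sample values are illustrative.

```python
import math

def rmse(actual, predicted):
    """Root mean square error: square root of the mean of squared prediction errors."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

print(rmse([3, 5, 2, 7], [2, 5, 4, 7]))  # errors 1, 0, -2, 0 give sqrt(5 / 4)
```

Because errors are squared before averaging, RMSE penalizes large misses more heavily than small ones.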

In statistics, the mean percentage error (MPE) is the computed average of percentage errors by
which forecasts of a model differ from actual values of the quantity being forecast.
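A sketch of MPE that assumes the convention (actual - forecast) / actual. Some sources flip the sign. Positive and negative errors cancel, so MPE shows whether a model tends to over- or under-forecast (bias) rather than how large its errors are. The data and function name are hypothetical.

```python
def mean_percentage_error(actual, forecast):
    """MPE in percent: average of (actual - forecast) / actual, times 100."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MPE is undefined when an actual value is zero")
    return 100 * sum((a - f) / a for a, f in zip(actual, forecast)) / len(actual)

actual = [100, 200, 50]
forecast = [90, 210, 50]
print(mean_percentage_error(actual, forecast))  # about 1.67 (%)
```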

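Naïve forecasting is a simple quantitative forecasting method, often used to predict demand, that takes
the most recent actual value (for example, last period's demand) as the forecast for the next period.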
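A minimal sketch of a naïve forecast. The demand history and the function name are hypothetical.

```python
def naive_forecast(history):
    """Naive forecast: next period's value equals the most recent observed value."""
    if not history:
        raise ValueError("need at least one observation")
    return history[-1]

demand = [120, 135, 128, 140]  # hypothetical monthly demand
print(naive_forecast(demand))  # 140
```

Because it needs no parameters, a naïve forecast is also a useful baseline that more complex models should beat.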

Trend (Linear) Regression analysis fits an equation that describes the relationship between two or
more quantitative variables, so that one can be predicted from the other(s). Simple linear regression
models the relationship between two variables, X and Y: X is the independent variable and Y is the
dependent variable. In trend projection, X is the time period.

Holt's model uses two smoothing parameters: one (usually written α) for the overall level smoothing
equation and the other (β) for the trend smoothing equation. The method is also called double
exponential smoothing or trend-enhanced exponential smoothing.
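A minimal sketch of Holt's method, using the usual update equations:

- level = α·y + (1 − α)·(previous level + previous trend)
- trend = β·(level − previous level) + (1 − β)·previous trend
- forecast h steps ahead = level + h·trend

The initialisation assumed here is one simple choice among several in use: the first value sets the level, and the first difference sets the trend. The parameter values in the test are arbitrary.

```python
def holt_forecast(series, alpha, beta, horizon=1):
    """Holt's double exponential smoothing; returns the forecast `horizon` steps ahead."""
    if len(series) < 2:
        raise ValueError("need at least two observations to initialise the trend")
    level, trend = series[0], series[1] - series[0]  # simple initialisation (an assumption)
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)         # level (overall) smoothing
        trend = beta * (level - prev_level) + (1 - beta) * trend  # trend smoothing
    return level + horizon * trend

print(holt_forecast([10, 12, 14, 16, 18], alpha=0.5, beta=0.3))  # about 20 for a perfectly linear series
```

Higher α and β make the model react faster to recent changes. Lower values smooth more heavily.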