
A statistical hypothesis is a mathematical statement about the nature of a population. It is
often stated in terms of a population parameter. There are two sorts of statistical hypotheses:
null hypotheses and alternative hypotheses. Both make a claim about the population value of
the test statistic, but they support opposing viewpoints. Claiming that there is neither a
difference nor a relationship between the two groups under study leads to a null hypothesis,
while claiming that a difference exists produces an alternative hypothesis.

The null hypothesis is denoted H0. In general, researchers aim to reject this hypothesis. It is
a statement that there is no difference, effect, or relationship between the means or
proportions of the samples or populations being compared; to put it another way, the
difference is zero, or null. Because a hypothesis can never be proven outright, it is framed as
the null hypothesis. To test a null hypothesis, researchers draw samples, compute statistics,
and then decide whether the sample data provide strong enough evidence to reject the
hypothesis.

The alternative hypothesis is denoted Ha. If we have sufficiently strong evidence to reject
the null hypothesis, the alternative hypothesis is the one most likely to be true.

Example:

 A researcher thinks that if knee surgery patients go to physical therapy twice a week
(instead of 3 times), their recovery period will be longer. The average recovery time for
knee surgery patients is 8.2 weeks.

The hypothesis statement in this question is that the researcher believes the average recovery
time is more than 8.2 weeks. It can be written in mathematical terms as:

Ha: μ > 8.2

Next, you’ll need to state the null hypothesis: what will be the case if the researcher is
wrong. In the above example, if the researcher is wrong, then the recovery time is less than
or equal to 8.2 weeks. In math, that’s:

H0: μ ≤ 8.2
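
The test of these two hypotheses can be sketched numerically. Below is a minimal sketch in Python, assuming a hypothetical sample of ten recovery times (all data invented for illustration). It computes the one-sample t statistic and compares it to the one-tailed 5% critical value for 9 degrees of freedom (t ≈ 1.833, from a standard t table):

```python
import math
import statistics

# Hypothetical recovery times (weeks) for 10 patients -- invented data
sample = [8.8, 9.1, 8.5, 9.4, 8.9, 9.2, 8.7, 9.0, 9.3, 8.6]

mu0 = 8.2                      # value of the mean under H0
n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)   # sample standard deviation

# One-sample t statistic: t = (x̄ - μ0) / (s / √n)
t = (xbar - mu0) / (s / math.sqrt(n))

# One-tailed critical value for α = 0.05, df = 9 (from a t table)
t_crit = 1.833
reject_h0 = t > t_crit
print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

With this invented sample the statistic is large, so H0 would be rejected in favour of Ha: μ > 8.2; with a different sample the conclusion could differ.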

Statistical Estimation

Estimation is a process in which we obtain the values of unknown population parameters with
the help of sample data. In other words, it is a data analysis framework that combines effect
sizes and confidence intervals to plan an experiment, analyze data, and interpret the results.
Furthermore, the basic purpose of estimation methods is to estimate the size of an effect and
report it along with its confidence interval.

The estimator is a method, formula, or function that tells how to compute an estimate. In
other words, to estimate the value of a population parameter, you can use information from
the sample as an estimator.
Estimation in statistics also helps companies, election officials, healthcare professionals,
scientists, mathematicians, and others determine trends in data. Measuring data using
"point" or "interval" estimation helps the observer view the data and draw conclusions. This
can be done using a sample drawn from the whole population. However, depending on the
type of data being collected, the sampling could skew the results.

Think of an election year. If a sample is taken from one locality of the state that typically
has similar voting preferences, then the data will not represent the population as a whole.
Thus, sample data needs to be gathered across a wide range in order to reveal any trends.

For that reason, this type of estimation falls under inferential statistics. There are two broad
types of statistical inference: estimation and hypothesis testing. Each serves a specific
purpose for the type of study or observation being completed. Hypothesis testing is where an
analyst tests an assumption regarding a population parameter; it is used to assess the validity
of a hypothesis using sample data. Estimation, by contrast, is the data-analysis framework
used to interpret results, and it is considered a complement to hypothesis testing.

The main purpose of estimation in statistics is to measure the behaviour of data within a
population. When a parameter cannot be pinned down exactly, a confidence interval is used
to express a range of plausible values. A confidence interval is a range of values for an
unknown parameter, stated with a percentage level of confidence. There are two types of
estimation used within statistics: point estimates and interval estimates. Examining their
similarities and differences will help clarify their uses.

Point Estimates

When hearing the word point, think of it in terms of geometry: an exact, fixed location in a
coordinate plane. This concept also applies to point estimates. A point estimate is a single
sample value used to estimate a parameter. The parameter could be the population mean,
population standard deviation, or population variance. A point estimate is computed as a
single definite numerical value. A point estimate of the population mean is commonly used
when measuring employee absences, student exam scores, or monthly profit margins.

Depending on the sample size, a point estimate on its own may not be the best way to
analyze data. Pairing it with a confidence interval can therefore improve the usefulness of
point estimation.

Example: 62 is the average (x̅) mark achieved by a sample of 15 students randomly selected
from a class of 150 students, and is taken as an estimate of the mean mark of the entire class.
Since it is a single number, it is a point estimate.
The basic drawback of point estimates is that no information is available regarding their
reliability. In fact, the probability that a single sample statistic exactly equals the population
parameter is very small.
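
The class-marks example can be sketched in a few lines of Python. The 15 marks below are hypothetical values, invented so that the sample mean comes out to 62 as in the text:

```python
import statistics

# Hypothetical marks for a random sample of 15 students (invented data)
marks = [55, 60, 62, 70, 58, 65, 61, 63, 59, 64, 62, 66, 60, 68, 57]

# The sample mean is a single number: a point estimate of the class mean
point_estimate = statistics.mean(marks)
print(point_estimate)  # the sample mean, 62
```

The single number carries no information about its own reliability, which is exactly the drawback noted above.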


Interval Estimates

A confidence interval estimate is a range of values, constructed from sample data, within
which the population parameter is likely to occur at a specified probability. That specified
probability is called the level of confidence.

 Broader, and more likely to capture the true parameter value, than a point estimate
 Used with inferential statistics to develop a confidence interval – the range within
which we believe, with a certain degree of confidence, that the population parameter lies
 Any parameter estimate that is based on a sample statistic has some amount of
sampling error

In statistics, interval estimation uses sample data to calculate an interval of possible values
of an unknown population parameter.

Example: let’s say you wanted to find out the average cigarette use of senior citizens. You
can’t survey every senior citizen on the planet (due to time constraints and finances), so you
take a sample of 1000 senior citizens and find that 10% of them smoke cigarettes. Although
you’ve only taken a sample, you can use that figure to estimate that “about” 10% of the
whole population smokes cigarettes. In reality, it’s unlikely to be exactly 10% (as you only
sampled a small percentage of people), but it’s probably somewhere around there, perhaps
between 5% and 15%. That “somewhere between 5% and 15%” is an interval estimate.
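
A sketch of how such an interval might actually be computed, using the normal approximation for a proportion (the 5%–15% band in the text is informal; a conventional 95% interval for p̂ = 0.10 with n = 1000 turns out to be much narrower):

```python
import math

n = 1000          # sample size
p_hat = 0.10      # sample proportion of smokers

# Standard error of a sample proportion: sqrt(p(1-p)/n)
se = math.sqrt(p_hat * (1 - p_hat) / n)

# 95% confidence interval using the normal approximation (z ≈ 1.96)
z = 1.96
lower, upper = p_hat - z * se, p_hat + z * se
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # roughly (0.081, 0.119)
```

A wider interval (say 5%–15%) would correspond to a higher confidence level or a smaller sample.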

What is simple linear regression?

Simple linear regression is a statistical method for obtaining a formula to predict values of
one variable from another where there is a causal relationship between the two variables.

Simple linear regression is the most commonly used technique for determining how one
variable of interest (the response variable) is affected by changes in another variable (the
explanatory variable). The terms "response" and "explanatory" mean the same thing as
"dependent" and "independent", but the former terminology is preferred because the
"independent" variable may actually be interdependent with many other variables as well.

Simple linear regression is used for three main purposes:

1. To describe the linear dependence of one variable on another

2. To predict values of one variable from values of another, for which more data are available

3. To correct for the linear dependence of one variable on another, in order to clarify other
features of its variability.

Any line fitted through a cloud of data will deviate from each data point to a greater or lesser
degree. The vertical distance between a data point and the fitted line is termed a "residual".
This distance is a measure of prediction error, in the sense that it is the discrepancy between
the actual value of the response variable and the value predicted by the line.

Linear regression determines the best-fit line through a scatterplot of data, such that the sum
of squared residuals is minimized; equivalently, it minimizes the error variance. The fit is
"best" in precisely that sense: the sum of squared errors is as small as possible. That is why it
is also termed "Ordinary Least Squares" regression.

Model assumptions

Using the formula Y = mX + b:

 The linear regression interpretation of the slope coefficient, m, is, "The estimated
change in Y for a 1-unit increase of X."
 The interpretation of the intercept parameter, b, is, "The estimated value of Y when X
equals 0."

Example: the effect of deforestation on land degradation

Y = Degradation

mX = the estimated change in Degradation for a 1-unit increase in Deforestation

b = the estimated value of Degradation when Deforestation equals 0
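
The least-squares fit behind Y = mX + b can be sketched directly from the standard formulas m = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)² and b = ȳ − m·x̄. The paired observations below are invented purely for illustration:

```python
# Simple linear regression (ordinary least squares) from scratch.
# x, y are hypothetical paired observations -- invented data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
m = s_xy / s_xx          # 6 / 10 = 0.6
b = y_bar - m * x_bar    # 4 - 0.6 * 3 = 2.2

# Residuals: actual y minus the value predicted by the fitted line
residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]
sse = sum(r ** 2 for r in residuals)   # sum of squared errors, minimized by OLS
print(f"y = {m}x + {b}, SSE = {sse:.2f}")
```

No other line through these five points gives a smaller sum of squared residuals, which is the sense in which the fit is "best".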

Assumptions of linear regression

If you're thinking simple linear regression may be appropriate for your project, first make
sure it meets the assumptions of linear regression listed below.

 Linear relationship: a statistical term used to describe a straight-line relationship
between two variables

 Normally-distributed scatter: the residuals should be approximately normally
distributed around the fitted line, with a mean of zero. Normal distributions are
symmetrical, but not all symmetrical distributions are normal. Many naturally-occurring
phenomena tend to approximate the normal distribution

 Homoscedasticity: an assumption of equal or similar variances in the different groups
being compared. This is an important assumption of parametric statistical tests because
they are sensitive to such dissimilarities; uneven variances in samples result in biased
and skewed test results

 No uncertainty in predictors: the values of the explanatory variable (x) are assumed to
be measured exactly, without error; all of the random scatter is attributed to the
response variable (y)

 Independent observations: two observations are independent if the occurrence of one
provides no information about the occurrence of the other

 Variables (not components) are used for estimation: the model should be fitted to
directly measured variables rather than to derived components or composite scores
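
Two of the assumptions above (zero-mean residuals and roughly even spread) can be eyeballed in code. This is a rough sketch; the fitted line y = 0.6x + 2.2 and the data are hypothetical:

```python
# Rough residual checks for a hypothetical fitted line y = 0.6x + 2.2.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m, b = 0.6, 2.2

residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]

# Residuals from a least-squares fit should average to (about) zero
mean_resid = sum(residuals) / len(residuals)

# Compare the spread of residuals over the lower and upper halves of x;
# a large imbalance would hint at heteroscedasticity
half = len(residuals) // 2
spread_low = max(residuals[:half]) - min(residuals[:half])
spread_high = max(residuals[half:]) - min(residuals[half:])
print(f"mean residual = {mean_resid:.3f}, spreads: {spread_low:.2f} vs {spread_high:.2f}")
```

In practice a residual plot (residuals against x or against fitted values) is the usual visual check for these assumptions.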
