
Introduction to Statistics

Statistics

Statistics is the science that deals with the collection, organization, summarization/presentation,
analysis, and interpretation of data to assist in making more effective and reasonable
decisions. It is an indispensable tool in almost every field, from scientific research to business
decision-making.

According to Sir R. A. Fisher, "The science of statistics is essentially a branch of applied
mathematics and may be regarded as mathematics applied to observational data."

The American Heritage Dictionary defines statistics as “The mathematics of the collection,
organization and interpretation of numerical data especially the analysis of population
characteristics by inference from sampling.”

Classification of Statistics

Statistics can broadly be classified into two branches:


✔ Descriptive statistics
✔ Inferential statistics

Descriptive Statistics

Descriptive statistics are procedures used to summarize, organize, and present a data set (a set
of observations) in a meaningful way (e.g., tables, graphs, numerical summaries). They give a sense
of the center or location of the data, the variability in the data, and the nature or shape of the
distribution.
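
As a minimal sketch of such numerical summaries, the Python snippet below describes the battery-lifetime sample that appears in the univariate data example of these notes. The helper name `describe` and the choice of summaries are illustrative assumptions, not a prescribed procedure.

```python
import statistics

# Battery lifetimes (hours) from the univariate data example in these notes.
lifetimes = [5.6, 5.1, 6.2, 6.0, 5.8, 6.5, 5.8, 5.5]

def describe(values):
    """Return simple numerical summaries of center and variability."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),      # center
        "median": statistics.median(values),  # center, less sensitive to extreme values
        "range": max(values) - min(values),   # variability
        "stdev": statistics.stdev(values),    # sample standard deviation
    }

summary = describe(lifetimes)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

No probability model is involved here: the summaries only describe the eight observed values.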

**In descriptive statistics we analyze the data without considering the theory of
probability.

Inferential Statistics

This branch of statistics uses sample data to make estimates, decisions, predictions, or other
generalizations about the population.

** In inferential statistics we analyze the data based on the theory of probability.


Some basic terms

Population

A population is the entire collection of individuals, objects, or units that share some common
characteristic.
For example, suppose we are interested in studying the CGPA distribution of the students at
BRAC University in a given semester. Here, the collection of all students enrolled during that
semester constitutes the population.

However, in most cases, analyzing the entire population is impractical, costly, or
time-consuming.

Sample

A small but representative part of the population is called a sample.


For example, suppose we are interested in studying the CGPA distribution of the students at
BRAC University in a given semester. Here, the collection of all students during the semester
constitutes the population. Suppose we select students from the CSE department only. Then the set
of students from the CSE department constitutes the sample.

Parameter

A parameter is a numerical measure that describes a characteristic of a population. It is a fixed,
unknown value that represents the entire population of interest.

For example, the average CGPA of all students in BRAC University (population mean), the
proportion of voters supporting a particular candidate in an election (population proportion), or
the standard deviation of salaries for all employees in a company (population standard deviation).

Statistic

A statistic is a numerical measure that describes a characteristic of a sample. For example, the
average CGPA of a sample of 100 students of BRAC University (sample mean), the proportion
of voters supporting a candidate based on a survey of 500 people (sample proportion), or the
standard deviation of salaries for a randomly selected group of employees (sample standard
deviation).

Estimator

Any function of the sample values (that is, a statistic) that is used to estimate a parameter is
called an estimator. Example: the sample mean and the sample variance are estimators.

Estimate

The numeric value that an estimator takes for a specific sample is called an estimate.

Example: the mean of a specific sample, the variance of a specific sample.
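
The distinction between parameter, estimator, and estimate can be sketched in Python as follows. The population here is simulated and entirely hypothetical (5,000 made-up CGPA values), as are the seed and the function name `sample_mean`; in practice the population values, and hence the parameter, are usually not available.

```python
import random
import statistics

rng = random.Random(201)  # fixed seed so the sketch is reproducible

# Hypothetical population: CGPAs of 5,000 students (made-up values, clipped to 0-4).
population = [round(min(4.0, max(0.0, rng.gauss(3.0, 0.5))), 2) for _ in range(5000)]

# Parameter: computed from the whole population (usually unknown in practice).
population_mean = statistics.mean(population)

def sample_mean(sample):
    """Estimator: a rule (a function of the sample values) for estimating the mean."""
    return statistics.mean(sample)

# Draw a simple random sample of 100 students without replacement.
sample = rng.sample(population, 100)

# Estimate: the numeric value the estimator takes for this particular sample.
estimate = sample_mean(sample)

print(f"parameter (population mean): {population_mean:.3f}")
print(f"estimate  (sample mean):     {estimate:.3f}")
```

Drawing a different sample would give a different estimate, while the estimator (the rule) and the parameter stay the same.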

Variable

A variable is a measurable quantity that can vary within its domain; that is, its value may
change from one object to another.
For example, family size is a variable because it is a measurable quantity that varies from one
family to another.

Types of variables

There are two basic types of variables:

1) Qualitative variable (Categorical variable): A qualitative variable is one for which
numerical measurement is not possible. Each unit can only be classified into one of a group of
categories. For example, marital status (which can only be categorized as unmarried, married,
widowed, divorced, etc.), hair color, religion, home district, etc.
2) Quantitative variable (Numerical variable): A variable is said to be quantitative if its
values are measured inherently on a numerical scale. For example, number of family
members (1, 2, 3, …), age of students, etc.
   i. Discrete variable: A variable is said to be discrete if it takes only
      isolated or countable values (typically whole numbers, not fractions or
      decimals). For example, family size.
   ii. Continuous variable: A variable is said to be continuous if it can take any
      value on some interval. For example, CGPA, height, weight.
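
One practical consequence of this classification is that the two kinds of variables are summarized differently. The sketch below uses invented records for five people (all values are hypothetical) to show that qualitative values can only be counted by category, whereas quantitative values support arithmetic.

```python
from collections import Counter
import statistics

# Hypothetical records for five people (values invented for illustration).
marital_status = ["married", "unmarried", "married", "divorced", "married"]  # qualitative
family_size = [4, 3, 5, 2, 4]                     # quantitative, discrete
height_cm = [162.5, 170.2, 158.0, 175.4, 168.9]   # quantitative, continuous

# Qualitative values can only be classified and counted.
status_counts = Counter(marital_status)

# Quantitative values support arithmetic such as averaging.
mean_family_size = statistics.mean(family_size)
mean_height = statistics.mean(height_cm)

print(status_counts.most_common())
print(mean_family_size, round(mean_height, 2))
```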

Data

Data are the results of making observations on one or more variables.
There are two types of data:
a) Qualitative data: Observed values of a qualitative variable are called qualitative data.
b) Quantitative data: Observed values of a quantitative variable are called quantitative data.
   i. Discrete data
   ii. Continuous data

Data can also be classified into three types:


⮚ Univariate Data: This type of data consists of observations on a single variable. For
example, the following sample of lifetimes (hours) of brand D batteries put to a certain
use is a numerical univariate data set: 5.6 5.1 6.2 6.0 5.8 6.5 5.8 5.5
⮚ Bivariate Data: This type of data consists of observations that are made on each of two
variables. For example, a data set might consist of a (height, weight) pair for each cricket
player on a team, with the first observation as (72, 168), the second as (75, 212), and so
on.
⮚ Multivariate Data: This type of data arises when observations are made on more than one
variable (so bivariate is a special case of multivariate). For example, a research physician
might determine the systolic blood pressure, diastolic blood pressure, and serum
cholesterol level for each patient participating in a study. Each observation would be a
triple of numbers, such as (120, 80, 146).
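
The three kinds of data sets can be represented in Python roughly as follows, using only the values quoted in the examples above. The helper `variables_per_observation` is a hypothetical name introduced for illustration.

```python
# Univariate: one value per observation (battery lifetimes in hours).
univariate = [5.6, 5.1, 6.2, 6.0, 5.8, 6.5, 5.8, 5.5]

# Bivariate: a (height, weight) pair per player; only the two pairs given above.
bivariate = [(72, 168), (75, 212)]

# Multivariate: (systolic, diastolic, cholesterol) per patient; the one triple given above.
multivariate = [(120, 80, 146)]

def variables_per_observation(dataset):
    """Return how many variables are recorded for each observation."""
    first = dataset[0]
    return len(first) if isinstance(first, tuple) else 1

for name, data in [
    ("univariate", univariate),
    ("bivariate", bivariate),
    ("multivariate", multivariate),
]:
    print(name, variables_per_observation(data))
```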

Operations with notations


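Given a data set consisting of n observations on a variable $x$, $x = \{x_1, x_2, x_3, \ldots, x_n\}$:

1. Summation:
   $x_1 + x_2 + x_3 + \ldots + x_n = \sum_{i=1}^{n} x_i$

2. Sum of products (where $y_1, y_2, \ldots, y_n$ are the paired observations on a second variable $y$):
   $x_1 y_1 + x_2 y_2 + x_3 y_3 + \ldots + x_n y_n = \sum_{i=1}^{n} x_i y_i$

3. Product:
   $x_1 \times x_2 \times x_3 \times \ldots \times x_n = \prod_{i=1}^{n} x_i$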
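
The summation and product notation map directly onto Python built-ins. In the sketch below the lists `x` and `y` are made-up observations; `y` in particular is an assumed second variable paired with `x`, included only to illustrate a sum of products.

```python
import math

x = [2, 3, 5, 7]  # hypothetical observations x_1, ..., x_n
y = [1, 4, 6, 8]  # hypothetical paired observations y_1, ..., y_n (an assumption)

total = sum(x)                                          # x_1 + x_2 + ... + x_n
sum_of_products = sum(xi * yi for xi, yi in zip(x, y))  # x_1*y_1 + ... + x_n*y_n
product = math.prod(x)                                  # x_1 * x_2 * ... * x_n

# The same summation written as an explicit loop over the index i = 1, ..., n.
running_total = 0
for i in range(1, len(x) + 1):
    running_total += x[i - 1]  # Python lists are 0-indexed, so x_i is x[i - 1]

print(total, sum_of_products, product, running_total)
```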

Relationship between probability and statistics

In a statistics problem, characteristics of a sample are available to the experimenter, and this
information enables the experimenter to draw conclusions about the population. The relationship
between the two disciplines can be summarized by saying that probability reasons from the
population to the sample (deductive reasoning), whereas inferential statistics reasons from the
sample to the population (inductive reasoning).

***Elements of probability allow us to draw conclusions about characteristics of hypothetical
data (a sample) taken from the population, based on known features of the population. This type
of reasoning is deductive in nature.

***The sample along with inferential statistics allows us to draw conclusions about the
population, with inferential statistics making clear use of elements of probability. This reasoning
is inductive in nature.
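
The two directions of reasoning can be contrasted in a short Python sketch. Everything numeric here is assumed for illustration: a population proportion of 0.6, a sample of size 10, and a binomial model (independent draws with the same success probability).

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent draws) when the population proportion is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability (deductive): the population is known; reason about a hypothetical sample.
known_p = 0.6  # assumed population proportion
prob_7_of_10 = binomial_pmf(7, 10, known_p)

# Inferential statistics (inductive): a sample is observed; reason about the population.
observed = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # hypothetical sample, 1 = success
p_hat = sum(observed) / len(observed)      # point estimate of the unknown proportion

print(f"P(7 successes in 10 | p = {known_p}) = {prob_7_of_10:.4f}")
print(f"estimated proportion from the sample = {p_hat:.2f}")
```

The first calculation starts from a known population feature and yields a statement about samples; the second starts from one sample and yields a statement about the population.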

Nur-E Jannat Hoque
Adjunct Faculty (Statistics), MNS Department
BRAC University

Exercise:

Classify each variable as qualitative or quantitative:

i) Marital status of nurses in a hospital.
ii) Time it takes to run a marathon.
iii) Color of automobiles in a shopping center parking lot.
iv) Age of people living in a personal care home.

Classify each variable as discrete or continuous:
ii) Number of pizzas sold by Pizza express each day.
iii) Lifetime (in hours) of 15 iPod batteries.
iv) Number of students in STA201 section 20.
v) Age of the students of a class.

Do it yourself
What is the scope of statistics in the life sciences and engineering?
