
PROBABILITY AND STATISTICS FOR COMPUTER SCIENCE

CHAPTER 1
INTRODUCTION TO STATISTICS
LaTeX Version

Prepared By: Bonsa Girma

March 11, 2022


Dire Dawa, Ethiopia
CHAPTER 1 PROBABILITY AND STATISTICS

1.1 Introduction

Definition of statistics: Statistics can be defined in two senses:

(a) In plural sense (as Statistical Data): statistics are the raw data themselves, like statistics of births, statistics of deaths, statistics of plants, statistics of students, statistics of imports and exports, etc.

(b) In singular sense (as Statistical Methods): statistics is the subject that deals with the
collection, organization, presentation, analysis and interpretation of data.

Classification: Statistics is classified into two main areas or branches:

(1) Descriptive statistics: Descriptive statistics are concerned with collecting, summarizing and describing the characteristics of data.

– With descriptive statistics we are concerned only with the data collected and make no effort to generalize beyond it, for example to the population.

– Concerned with summary calculations, graphs, charts and tables.

(2) Inferential statistics: In inferential statistics we select a random sample and use the information from it to make generalizations about the population from which the sample was taken. Inferential statistics generalizes from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.
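The contrast between the two branches can be sketched with Python's standard `statistics` module. This is an illustrative sketch only: the marks and the simulated "population" of heights are invented, and the interval uses the common normal approximation (mean ± 1.96 · s/√n), which is one of several possible estimation methods. `describe` only summarizes the data at hand, while `estimate_mean` uses a sample to say something about a larger population.

```python
import math
import random
import statistics

def describe(data):
    """Descriptive statistics: summarize only the data at hand."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
        "min": min(data),
        "max": max(data),
    }

def estimate_mean(sample, z=1.96):
    """Inferential statistics: estimate a population mean from a sample.

    Returns the sample mean and an approximate 95% interval based on the
    normal approximation: mean +/- z * s / sqrt(n).
    """
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m, (m - z * se, m + z * se)

# Hypothetical marks of 10 students: we only describe these 10 values.
marks = [62, 75, 58, 90, 71, 66, 84, 75, 69, 80]
print(describe(marks))

# Hypothetical population of 10,000 heights (cm); in practice it is unknown.
rng = random.Random(1)
population = [rng.gauss(170, 10) for _ in range(10_000)]
sample = rng.sample(population, 100)
estimate, interval = estimate_mean(sample)
print(f"estimated mean height: {estimate:.1f} cm, "
      f"approx. 95% interval ({interval[0]:.1f}, {interval[1]:.1f})")
```

Here the mean and median of the ten marks are both 73; the height estimate depends on which 100 people happen to be drawn.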


Stages in statistical investigation: The common stages or steps in any statistical investigation are:

(1) Collection of data: The process of measuring, gathering and assembling the raw data upon which the statistical investigation is to be based. Data can be collected in a variety of ways; one of the most common methods is a survey. Surveys can also be conducted in different ways; four of the most common are:

• Structured Questionnaire

• Telephone survey

• Mailed questionnaire

• Personal interview

(2) Organization of data: Summarization of data in some meaningful way, e.g. in table form.

(3) Presentation of the data: The process of re-organization, classification, compilation, and summarization of data to present it in a meaningful form.

(4) Analysis of data: The process of extracting relevant information from the summarized data, mainly through the use of elementary mathematical operations.

(5) Inference: The interpretation of the various statistical measures obtained from the analysis of the data, using methods by which conclusions are formed and inferences are made.
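The organization stage, "summarization of data in some meaningful way, e.g. in table form", is often done with a frequency table. The sketch below uses invented survey responses and Python's `collections.Counter`; it is one simple way to build such a table, not the only layout used in practice.

```python
from collections import Counter

def frequency_table(data):
    """Organize raw values into (value, frequency, relative frequency) rows."""
    counts = Counter(data)
    n = len(data)
    return [(value, freq, freq / n) for value, freq in sorted(counts.items())]

# Hypothetical raw survey responses: number of children in 12 households.
responses = [2, 3, 1, 2, 0, 4, 2, 3, 1, 2, 3, 0]

print(f"{'Value':>5} {'Frequency':>9} {'Relative':>8}")
for value, freq, rel in frequency_table(responses):
    print(f"{value:>5} {freq:>9} {rel:>8.2f}")
```

The resulting table is the input for the later stages: presentation (for example as a bar chart) and analysis (for example computing the mean number of children).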

Definitions of some terms

(1) Population: A collection of individuals or items about which we want to draw conclusions. The population represents the target of an investigation, and the objective of the investigation is to draw conclusions about the population; hence we sometimes call it the target population.
Examples:


Population of households in Dire Dawa

Population of items to be inspected

Population of female students, etc.

A population can be finite or infinite.

(2) Census: The collection of information from the whole population, i.e. a complete enumeration of the population.

(3) Sample: A subset of the population from which information is collected. It is selected using some predefined sampling technique in such a way that it represents the population well.

(4) Parameter: A numerical quantity measuring some aspect of a population, i.e. a characteristic or measure obtained from a population.

(5) Statistic: A quantity calculated from data gathered from a sample, i.e. a characteristic or measure obtained from a sample. It is usually used to estimate a population parameter.

(6) Sampling: The process or method of selecting a sample from the population.

(7) Sample size: The number of elements or observations to be included in the sample.

(8) Data (singular: datum): Information about individuals in a population, or the values (measurements or observations) that the variables can assume.

(9) Variable: A characteristic or attribute that can assume different values.

(10) Distribution: The pattern of variation of data.

(11) Random sample: A sample selected from a population in such a way that every individual item has the same chance of being selected.
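Simple random sampling, where every item has the same chance of selection, can be sketched with Python's `random.Random.sample`, which draws distinct items without replacement. The sampling frame of student IDs and the helper names below are hypothetical. The second function repeats the draw many times to check empirically that each item is chosen about equally often.

```python
import random
from collections import Counter

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every item has the same chance of being chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def selection_rates(items, n, trials, seed=0):
    """Repeat the draw many times and report how often each item was chosen."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(rng.sample(items, n))
    return {item: counts[item] / trials for item in items}

# Hypothetical sampling frame: student ID numbers 1 to 500.
frame = list(range(1, 501))
print(simple_random_sample(frame, 20, seed=42))

# With 5 items and samples of size 2, each item should be chosen about 2/5 of the time.
print(selection_rates(["A", "B", "C", "D", "E"], n=2, trials=10_000))
```

Fixing `seed` makes a draw reproducible, which is useful when a sampling procedure must be documented or checked.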


Examples of Parameters and Statistics

Example 1.1: When examining the mean age of all first-year students in a given college, the mean age found would be a parameter. If we took a random sample of 300 first-year students in that college, then the mean age would be a statistic.
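Example 1.1 can be simulated to make the distinction concrete. In this sketch the 2,000 ages are randomly generated placeholders, not real data: the parameter is computed from every student, the statistic from a random sample of 300. The statistic is usually close to, but not exactly equal to, the parameter.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: ages of all 2,000 first-year students in a college.
population_ages = [rng.randint(17, 25) for _ in range(2000)]

# Parameter: computed from every member of the population.
parameter = statistics.mean(population_ages)

# Statistic: computed from a random sample of 300 students only.
sample_ages = rng.sample(population_ages, 300)
statistic = statistics.mean(sample_ages)

print(f"parameter (population mean age): {parameter:.2f}")
print(f"statistic (sample mean age):     {statistic:.2f}")
```

Rerunning with a different seed changes the sample and therefore the statistic, while the parameter of a fixed population stays the same.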
Exercise 1.1: A business is considering purchasing newly produced light bulbs. It will make the purchase if no more than 1.5% of the bulbs are defective. Because testing all 40,000 bulbs would take too much time, the business decides to test a random sample of 400 for defects. It will then use the results of this sample to estimate the percentage of defective bulbs in the population to be purchased.

(a) What is the population?

(b) What is the sample size?

(c) What population parameter is of interest to the business?

(d) What statistic is being used to estimate the parameter?

Applications, Uses and Limitations of Statistics.

Applications of Statistics: In modern times, statistical information plays a very important role in a wide range of fields. Today, statistics is applied in almost all fields of human endeavor.

In Scientific Research: Statistics is used as a tool in scientific research. Statistical formulas and concepts are applied to data obtained from experiments.

In Quality Control: Statistical methods help to check whether a product satisfies a given standard.

For Decision Making: Statistics helps to enhance the power of decision making in
the face of uncertainty by providing sufficient information.

In Agriculture: Experiments are designed and analysed using statistical procedures.

In Public Health and Medicine: Statistical methods are used for computation and
interpretation of birth and death rates.

In Economics: Statistics is used for modeling functional relationships between or among variables.


Uses of Statistics: The main function of statistics is to enlarge our knowledge of complex
phenomena. The following are some uses of statistics:

It presents facts in a definite and precise form.

Data reduction.

Measuring the magnitude of variations in data.

Furnishes a technique of comparison.

Estimating unknown population characteristics.

Formulating and testing hypotheses.

Studying the relationship between two or more variables.

Forecasting future events.

Limitations of Statistics: As a science, statistics has its own limitations. The following are some of them:

Deals with only aggregate of facts and not with individual data items.

Statistical data are only approximately, not mathematically, correct.

Statistics can be easily misused and therefore should be used by experts.

Statistical results are true only on average; i.e. the laws of statistics are not universally true like the laws of physics, chemistry and mathematics.

Statistics are liable to be misused or misinterpreted. This may be due to incomplete information, inadequate and faulty procedures during data collection and sample selection, and mainly ignorance (lack of knowledge).


1.2 Describing Data

We can describe data in three ways:

(1) Based on the variable type

(2) Based on Scales of Measurement

(3) Based on the source

Based on the variable type: Data can be described as

~ Qualitative Variables: Non-numeric variables that cannot be measured numerically. They are described by a quality or characteristic that is essentially non-numeric; individuals are placed into different categories.
Examples: Gender, favorite color, religious affiliation, etc.

~ Quantitative Variables: Numerical variables that can be counted or measured.
Examples: Age, height, weight, length, temperature, etc.
A quantitative variable can be either

(a) Discrete variables or

(b) Continuous variables

Discrete variables: Assume only certain values, and there are usually "gaps" between the values. Discrete variables take values such as 0, 1, 2, 3 and are said to be countable. Some examples are:

• Number of animals in a zoo
• Number of customers waiting for service
• Number of students in a classroom, etc.

Continuous variables: Can assume an infinite number of values between any two specific values. They are obtained by measuring and often include fractions and decimals. Some examples are:

• Weight of a person
• Distance covered by an athlete


Exercise 1.2: Identify each of the following variables as qualitative or quantitative:

(1) Birth place of a person

(2) Handedness of an individual

(3) Marks of students in maths class

Based on Scales of Measurement: Proper knowledge about the nature and type of the data to be dealt with is essential in order to choose and apply the proper statistical methods for their analysis and inference. A measurement scale describes the properties of the values assigned to the data, based on order, distance and a fixed zero.

The four scale types, or scales of measurement, are:

1. Nominal

2. Ordinal

3. Interval

4. Ratio

Nominal level of measurement: Classifies data into mutually exclusive (non-overlapping) categories in which no order or ranking can be imposed on the data. Some examples are:

Sex (Male or Female)

Religion (Orthodox, Protestant, Muslim)

Race (Black or White)

Favorite color (Blue, pink, red, etc.)

Country name

Country code
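For nominal data, the meaningful summaries are counts and the mode (the most frequent category). The sketch below uses invented color responses with `collections.Counter` and `statistics.mode`; it deliberately computes nothing that depends on an order among the categories.

```python
import statistics
from collections import Counter

# Hypothetical nominal data: favorite colors of 8 people.
colors = ["Blue", "Red", "Blue", "Pink", "Red", "Blue", "Green", "Pink"]

counts = Counter(colors)          # frequency of each category
mode = statistics.mode(colors)    # most frequent category

print(counts.most_common())
print("mode:", mode)
# Sorting or averaging these labels would be meaningless: the categories have no order.
# The same holds for numeric labels such as country codes.
```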


Ordinal level of measurement: Data measured at this level can be placed into categories, and these categories can be ordered, or ranked. Precise differences between the ranks do not exist. Some examples are:

Letter grades (A, B, C, D, and F).

Rating scales (Excellent, Very good, Good, Fair, Poor).

Economic status (low, medium, high).
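Because ordinal categories can be ranked, sorting them and taking a median are meaningful, whereas averaging numeric ranks would assume equal gaps that ordinal data do not have. The sketch below encodes the grade order explicitly (the grades are invented) and uses `statistics.median_low`, which always returns an observed rank, so the result maps back to a real grade.

```python
import statistics

# Ordinal scale: letter grades, listed from lowest to highest.
GRADE_ORDER = ["F", "D", "C", "B", "A"]
RANK = {grade: rank for rank, grade in enumerate(GRADE_ORDER)}

def median_grade(grades):
    """Median of ordinal data, using only the order of the categories.

    median_low returns one of the observed ranks, so the result is a real
    grade even when the number of grades is even.
    """
    ranks = [RANK[g] for g in grades]
    return GRADE_ORDER[statistics.median_low(ranks)]

grades = ["B", "A", "C", "B", "F", "A", "C"]  # hypothetical data
print(sorted(grades, key=RANK.get))  # ranking is meaningful
print(median_grade(grades))          # the middle grade in rank order
```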

Interval scales of measurement: Measurement systems that possess the properties of order and of precise differences between units, but not the property of a fixed zero. Interval scales are numerical scales in which intervals have the same interpretation throughout. Some examples are:

Temperature in Fahrenheit scale

IQ

The difference between 30°F and 40°F represents the same temperature difference as the difference between 45°F and 55°F. This is because each 10-degree interval has the same physical meaning in terms of the kinetic energy of molecules, and 0°F does not mean no heat at all.
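The interval property can be checked numerically with the standard conversion C = (F − 32) · 5/9. In this sketch, the two 10°F differences from the text map to the same Celsius difference, but a ratio such as "80°F is twice 40°F" does not survive the conversion, which is why ratios are not meaningful on an interval scale.

```python
import math

def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Equal differences on the Fahrenheit scale stay equal after conversion.
diff_a = fahrenheit_to_celsius(40) - fahrenheit_to_celsius(30)
diff_b = fahrenheit_to_celsius(55) - fahrenheit_to_celsius(45)
print(diff_a, diff_b)                # both are 50/9, about 5.56 degrees Celsius
print(math.isclose(diff_a, diff_b))  # True

# Ratios are not preserved, because 0°F is not a true zero.
print(80 / 40)                                                # 2.0 in Fahrenheit
print(fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40))  # about 6 in Celsius
```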

Ratio scales of measurement: The level of measurement that classifies data that can be ranked, whose differences are meaningful, and that have a true zero. True ratios exist between different values of the measure. All arithmetic and relational operations are applicable. Some examples are:

Weight

Blood pressure.

Length
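On a ratio scale, a statement such as "60 kg is twice 30 kg" stays true in any unit, and zero means none of the quantity. The sketch below checks this for weight using the international pound (1 lb = 0.45359237 kg exactly); the two weights are invented for illustration.

```python
import math

KG_PER_LB = 0.45359237  # exact, by definition of the international pound

def kg_to_lb(kg):
    """Convert a weight from kilograms to pounds."""
    return kg / KG_PER_LB

a_kg, b_kg = 60.0, 30.0  # hypothetical weights
print(a_kg / b_kg)                                                # 2.0
print(math.isclose(kg_to_lb(a_kg) / kg_to_lb(b_kg), a_kg / b_kg))  # True: ratio preserved
print(kg_to_lb(0.0))                                              # 0.0: zero weight is zero in any unit
```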

Based on the source: Data can be classified into two types:

~ Primary Data

~ Secondary Data


Primary Data: are data collected for the first time either through direct observation or by
enquiring individuals. It refers to the data collected either by or under the direct supervision
and instruction of the researcher.

Secondary Data: are data obtained from published or unpublished sources like newspapers, journals, official records, etc.
