Chapter 1

What is Statistics?
At the end of this chapter, the students should be able to:
• Define statistics and understand the process of statistics.
• Define descriptive statistics and inferential statistics.
• Distinguish between descriptive statistics and inferential statistics.
• Differentiate between a population and a sample.
• Define the terms used in statistics.
• Distinguish between qualitative variables, quantitative variables, discrete random variables, and continuous random variables.

1.1 Definition of Statistics

Statistics is a discipline of study dealing with collecting, organizing, presenting, analysing,
and interpreting data in order to draw conclusions and make decisions. Moreover, this
field is also utilized by other fields such as business and industry to help control the
quality of the goods and services that they produce. Social scientists and psychologists use
statistical approaches to study our behaviours. Because of its broad range of applicability,
a course in statistics is required of majors in disciplines such as sociology, physiology,
criminal justice, nursing, exercise science, pharmacy, education, and many others. To
accommodate this diverse group of users, examples and problems in this outline are
chosen from many different sources. Additionally, the process of statistics follows four
steps:

Step 1
Identify the research objective
The researcher must be clear about why the study is being done and determine the
details of the question to be answered. The researcher also needs to set a
target group for the study and focus only on that group.

Step 2
Collect the information needed
Information can be collected from a population or from a sample. However,
data from an entire population are often difficult and expensive to collect because
of the huge amount of data involved. Therefore, a survey is normally based on a
sample of the population.

Step 3
Organize, summarize and analyse the information
This step is called descriptive statistics. The data collected from a population or
sample are organized using either numerical or graphical methods, which provide
an overview of the information collected.

Step 4
Make decisions or draw conclusions
This step is called inferential statistics. The information collected from the sample
is generalized to the population.

1.2 Descriptive and Inferential Statistics

Descriptive Statistics
Descriptive statistics consists of methods for organizing, displaying, and describing data
by using tables, graphs, and summary measures.

Example 1.1
The compilation of batting average, runs batted in, and number of home runs for each
player, as well as earned run average, won/lost percentage, number of saves, etc., for each
pitcher from the official score sheets for major league baseball players is an example of
descriptive statistics. These statistical measures allow us to compare players, determine
whether a player is having an “off year” or “good year”, etc.
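
The kind of summary measure described in Example 1.1 can be sketched in a few lines of Python. This is only an illustration: the player names and the hits and at-bats figures are hypothetical and are not taken from any official score sheet.

```python
import statistics

# Hypothetical hits and at-bats for three players (illustrative values only).
players = {
    "Player A": {"hits": 150, "at_bats": 500},
    "Player B": {"hits": 120, "at_bats": 480},
    "Player C": {"hits": 90, "at_bats": 400},
}

# Batting average = hits / at-bats, a summary measure for each player.
batting_avg = {name: s["hits"] / s["at_bats"] for name, s in players.items()}

for name, avg in batting_avg.items():
    print(f"{name}: {avg:.3f}")

# Summary measures across the group allow players to be compared.
mean_avg = statistics.mean(list(batting_avg.values()))
best = max(batting_avg, key=batting_avg.get)
print(f"mean batting average: {mean_avg:.3f}; highest: {best}")
```

Nothing here generalizes beyond the three players listed; the code only organizes and summarizes the data at hand, which is what makes it descriptive rather than inferential.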

Inferential Statistics
Inferential statistics consists of methods that use sample results to help make decisions or
predictions about a population.

Example 1.2
The techniques of inferential statistics are applied in many industrial processes to control
the quality of the products produced. In industrial settings, the population may consist of
the daily production of toothbrushes, computer chips, bolts, and so forth. The sample will
consist of a random and representative selection of items from the process producing the
toothbrushes, computer chips, bolts, etc. The information contained in the daily sample is
used to construct control charts. The control charts are then used to monitor the quality of
the products.

1.3 Population and Sample

Definition of Population
A population is a complete collection of all elements of the target group (individuals, items
or objects) whose characteristics are being studied. The collection is complete in the sense
that it includes all the subjects to be studied. It is also known as the target population.

Definition of Sample
A sample is a subset of a population. It is a collection of a few elements selected from a
population, i.e., it consists of a portion of the population selected for study.
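
The relationship between a population and a sample can be sketched as follows. The population of 1000 voter ID numbers, the sample size of 50, and the seed are all assumptions made for illustration; the code draws a simple random sample without replacement using Python's standard library.

```python
import random

# Hypothetical population: ID numbers of 1000 registered voters.
population = list(range(1, 1001))

# Fixed seed (an arbitrary choice) so that the sketch is repeatable.
rng = random.Random(42)

# Simple random sample of 50 elements, drawn without replacement.
sample = rng.sample(population, 50)

print(len(population), len(sample))
# Every element of the sample also belongs to the population.
print(set(sample) <= set(population))
```

The final check confirms the defining property stated above: the sample is a subset of the population.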

Example 1.3
The results of polls are widely reported by both the written and the electronic media. The
techniques of inferential statistics are widely utilized by pollsters. Table 1.1 explores
several examples of populations and samples encountered in polls reported by the media.
The methods of inferential statistics are used to make inferences about the populations
based upon the results found in the sample and to give an indication about the reliability
of these inferences. Suppose the results of a poll of 600 registered voters are reported as
follows: forty percent of the voters approve of the president’s economic policies. The
margin of error for the survey is 4%. The survey indicates that an estimated 40% of all
registered voters approve of the economic policies, but it might be as low as 36% or as
high as 44%.

Table 1.1
Population                            Sample
All registered voters                 A telephone survey of 600 registered voters
All owners of handguns                A telephone survey of 1000 handgun owners
Households headed by a single parent  The results from questionnaires sent to 2500 households headed by a single parent
The CEOs of all private companies     The results from surveys sent to 150 CEOs of private companies
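
The 4% margin of error quoted for the poll of 600 registered voters in Example 1.3 can be checked with a short calculation. The sketch below assumes the usual large-sample formula for a proportion and a 95% confidence level (critical value 1.96); the text does not state which method the pollsters used, so both are assumptions.

```python
import math

n = 600        # sample size from Example 1.3
p_hat = 0.40   # sample proportion that approves
z = 1.96       # assumed: critical value for a 95% confidence level

# Assumed formula: margin = z * sqrt(p_hat * (1 - p_hat) / n)
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin

print(f"margin of error: {margin:.3f}")
print(f"interval: {low:.3f} to {high:.3f}")
```

Under these assumptions the margin comes out at about 0.039, which rounds to the 4% reported and gives the range of roughly 36% to 44%.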

Example 1.4
The following are examples of populations:
• The heights of all citizens in Malaysia.
• The monthly incomes of all workers in MMU.
• The tuition fees of all students in a university.

The following are examples of samples:
• The ages of 50 students in the Centre of Diploma Studies at UTHM.
• The gross profit of 100 local companies in 2013.
• The percentage of failures in SPM examination results for 30 schools.

1.4 Variable, Observation and Data Set

Definition of variable
A variable is a characteristic of interest concerning the individual elements of a population
or a sample.

Definition of observation
An observation or measurement is a value of a variable or characteristic for an element.

Definition of data set
A data set (raw data) is a collection of observations (such as measurements, genders, and
survey responses) that have been collected.

Definition of element or member
An element or member is a specific subject or object about which the information is
collected.

Example 1.5
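The following are the observations of the total number of students in each foundation.

Table 1.2
Foundation      Number of students
Engineering     500
IT              200
Management      800
Law             100
Life Sciences   50

In this table, each foundation (for example, Engineering) is an element, the number of
students is the variable, and each recorded value (for example, 100 for Law) is an
observation.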
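
As a minimal sketch, the data set in Table 1.2 can be stored as a mapping from each element to its observation. The variable names below are illustrative choices, not part of the original table.

```python
# Data set from Table 1.2: element (foundation) -> observation (number of students).
students_per_foundation = {
    "Engineering": 500,
    "IT": 200,
    "Management": 800,
    "Law": 100,
    "Life Sciences": 50,
}

elements = list(students_per_foundation)               # the foundations
observations = list(students_per_foundation.values())  # values of the variable

print("variable: number of students")
print("elements:", elements)
print("observations:", observations)
print("number of observations:", len(observations))
print("total students:", sum(observations))
```

The dictionary keys play the role of elements, the quantity being recorded is the variable, and the stored values are the observations that together make up the data set.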

1.5 Types of Variables (Data Type)

Definition of Quantitative Variable
A variable is quantitative when the description of the characteristic of interest
results in a numerical value. Examples include temperatures, prices of textbooks, weights,
and heights. There are two types of quantitative variable: discrete and continuous.

Definition of Discrete Variable
A discrete variable is a variable whose values are countable. Its possible values are
typically whole numbers, with no intermediate values between them.

Definition of Continuous Variable
A continuous variable is a variable whose values are measurable. It can take
any value over a certain interval or range.

Example 1.6
Table 1.3
Discrete variable                                  Possible values for the variable
The number of bulbs in a classroom                 3, 4, 5, or 10
The number of accidents within a week              0, 1, 2, 3, ..., 10
The number of TVs sold in a week                   0, 1, 2, 3, 4 (finitely many values)
The number of customers in a week                  0, 1, 2, ... (infinitely many values)

Continuous variable                                Possible values for the variable
The time taken by a person to get home from work   30.1 min, 30.2 min, 45.01 min, etc.
The temperature in a room                          24.9 °C, 25.02 °C, etc.
The price of a textbook                            RM 45.9, RM 57.9, etc.

Definition of Qualitative Variable
A variable is qualitative when the description of the characteristic of interest
results in a nonnumeric value. A qualitative variable may be classified into two or more
categories.

Example 1.7
Table 1.4 gives several examples of qualitative variables along with a set of categories into
which they may be classified.

Table 1.4
Qualitative variable   Possible categories for the variable
Marital status         Single, married, divorced, separated
Blood type             O, A, B, AB
Gender                 Male, Female
Pain level             None, low, moderate, severe

The possible categories for qualitative variables are often coded for the purpose of
performing computerized statistical analysis. Marital status might be coded as 1, 2, 3, or 4
where 1 represents single, 2 represents married, 3 represents divorced, and 4 represents
separated. The variable gender might be coded as 0 for female and 1 for male. The
categories for any qualitative variable may be coded in a similar fashion. Even though
numerical values are associated with the characteristic of interest after being coded, the
variable is considered a qualitative variable.
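
The marital status coding described above can be sketched as a simple lookup table. The list of survey responses is hypothetical; only the coding scheme itself (1 = single, 2 = married, 3 = divorced, 4 = separated) comes from the text.

```python
from collections import Counter

# Coding scheme from the text: 1 = single, 2 = married, 3 = divorced, 4 = separated.
marital_codes = {"single": 1, "married": 2, "divorced": 3, "separated": 4}
code_to_label = {code: label for label, code in marital_codes.items()}

# Hypothetical survey responses (illustrative only).
responses = ["married", "single", "married", "divorced", "single", "married"]

# Replace each category with its numeric code.
coded = [marital_codes[r] for r in responses]
print(coded)

# Counting how often each category occurs is meaningful for a qualitative variable,
# whereas arithmetic on the codes (for example, their mean) is not.
counts = Counter(coded)
for code, count in sorted(counts.items()):
    print(code_to_label[code], count)
```

The codes are only labels: decoding them recovers the original categories, which is why the variable remains qualitative after coding.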

Scales of Measurement
Data can be measured on four scales: nominal, ordinal, interval, and ratio. Each scale is
explained in this section.
• Nominal scale (Classification) is characterized by data that consist of names, labels,
or categories only. Data on this scale cannot be arranged in an ordering scheme.
Example 1.7

Qualitative variable   Possible nominal-level data values associated with the variable
Blood type             A, B, AB, O → [1, 2, 3, 4]
State of residence     Johor, Melaka, ..., Kedah → [1, 2, ..., n]
Religion               Muslim, Hindu, Buddhist, other → [1, 2, 3, ...]
Gender                 Male, Female → [1, 2]

• Ordinal scale (Ranking): numbers are used to place objects in order, but there is no
information regarding the differences (intervals) between points on the scale.

Example 1.8:
Grading systems [A, B, C, D, E], customer satisfaction [1, 2, 3, 4, 5], level of
education, Likert scale, team/individual standing, and socioeconomic status.

• Interval scale (Equal intervals): precise differences between units of measure
exist, but there is no meaningful zero. If a zero exists, it is an arbitrary point.

Example 1.9:
IQ scores: it makes sense to talk about someone having an IQ 20 points higher
than another person, but an IQ of zero has no meaning. Other examples are the Celsius
and Fahrenheit temperature scales and most psychological measures.

• Ratio scale: a measurement scale that has equal units of measurement and a
rational zero point for the scale (absolute zero).

Example 1.10:
Kelvin temperature scale, income in Ringgit, length, area, or volume, height and
weight.
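
The difference between an interval scale and a ratio scale can be illustrated with the two temperature scales mentioned above. The sketch below uses two arbitrarily chosen temperatures, 20 °C and 10 °C, and a small helper function introduced here for illustration.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius temperature to kelvins."""
    return c + 273.15

# Two arbitrarily chosen temperatures on the Celsius (interval) scale.
a_c, b_c = 20.0, 10.0
# The same temperatures on the Kelvin (ratio) scale.
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences agree on both scales, because both have equal intervals.
print(f"difference: {a_c - b_c:.2f} (Celsius), {a_k - b_k:.2f} (Kelvin)")

# Ratios disagree: the Celsius zero is arbitrary, the Kelvin zero is absolute.
print(f"ratio: {a_c / b_c:.3f} (Celsius), {a_k / b_k:.3f} (Kelvin)")
```

The Celsius ratio of 2 does not mean that one temperature is "twice as hot" as the other; only on the Kelvin scale, which has an absolute zero, is the ratio meaningful.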

1.6 Application in Computer Sciences

Example 1.11:
The following tables indicate the collection of values that a variable took during the
measurement. Identify the element and types of variable in (a), (b), and (c).

[Tables (a), (b), and (c) are not reproduced here.]

(a) The elements in the table are the students, X1 to X8, and the variable is the grade
of each student, X. The measured variable is quantitative and discrete because it
takes only a finite number of numerical values.
(b) The elements in the table are the types of operating system, which are Windows, Linux,
BSD, and MacOS. The variable is the algorithm that is generated in each operating
system. The measured variable is qualitative because it is described by the names of
the algorithms, which are nonnumeric values.
(c) The elements of the table are the trials, and the variable is the run time taken
for each trial. The measured variable is also quantitative and discrete because it
takes only a finite number of numerical values.

Example 1.12:
The KSW computer science aptitude test consists of 25 questions. The score reported is
reflective of the computer science aptitude of the test taker. How would the score likely be
reported for the test? What are the possible values for the scores? Is the variable discrete
or continuous?

Answer:
The score reported would likely be the number or percent of correct answers. The number
correct would be a whole number from 0 to 25 and the percent correct would range from
0 to 100 in steps of size 4. However, if the test evaluator considered the reasoning process
used to arrive at the answers and assigned partial credit for each problem, the scores
could range from 0 to 25 or 0 to 100 percent continuously. That is, the score could be any
real number between 0 and 25 or any real number between 0 and 100 percent. We might
say that for all practical purposes, the variable is discrete. However, theoretically the
variable is continuous.
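
The possible scores in the case without partial credit can be listed directly. The following sketch assumes each of the 25 questions is simply marked right or wrong.

```python
n_questions = 25

# Without partial credit, the number correct is a whole number from 0 to 25.
number_correct = list(range(n_questions + 1))

# The corresponding percent scores move in steps of 100 / 25 = 4.
percent_correct = [100 * k // n_questions for k in number_correct]

print(len(number_correct))
print(percent_correct[:4], "...", percent_correct[-1])
```

There are only 26 possible values on either reporting scale, which is why the variable is discrete when no partial credit is given.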

Example 1.13: Try it yourself!
Which of the following are qualitative variables?
(a) The color of automobiles involved in several severe accidents
(b) The length of time required for rats to move through a maze
(c) The classification of police administrations as city, county, or state
(d) The rating given to a pizza in a taste test as poor, good, or excellent
(e) The number of times subjects in a sociological research study have been married

Example 1.14:
The pain level following surgery for an intestinal blockage was classified as none, low,
moderate, or severe for several patients. Give three different numerical coding schemes
that might be used for the purpose of inclusion of the responses in a computer data file.
Does this coding change the variable to a quantitative variable?

Answer:
The responses none, low, moderate, or severe might be coded as 0, 1, 2, or 3 or 1, 2, 3, or
4 or as 10, 20, 30, or 40. There is no limit to the number of coding schemes that could be
used. Coding the variable does not change it into a quantitative variable. Many times
coding a qualitative variable simplifies the computer analysis performed on the variable.
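
The three coding schemes in the answer to Example 1.14 can be compared side by side. The five patient responses below are hypothetical; the sketch shows that an arithmetic summary of the codes depends on the scheme chosen, while the most frequent category does not.

```python
import statistics

levels = ["none", "low", "moderate", "severe"]

# The three coding schemes given in the answer.
schemes = {
    "0-3": dict(zip(levels, [0, 1, 2, 3])),
    "1-4": dict(zip(levels, [1, 2, 3, 4])),
    "10-40": dict(zip(levels, [10, 20, 30, 40])),
}

# Hypothetical responses from five patients (illustrative only).
responses = ["low", "severe", "none", "low", "moderate"]

results = {}
for name, scheme in schemes.items():
    coded = [scheme[r] for r in responses]
    decode = {code: level for level, code in scheme.items()}
    # Mean of the codes, and the most frequent code translated back to its category.
    results[name] = (statistics.mean(coded), decode[statistics.mode(coded)])
    print(name, coded, results[name])
```

Because the mean of the codes changes with an arbitrary choice of scheme, it carries no information about pain level itself, which is consistent with the variable remaining qualitative after coding.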

Example 1.15:
Indicate the scale of measurement for each of the following variables: racial origin,
monthly phone bills, Fahrenheit and centigrade temperature scales, military ranks, time,
ranking of a personality trait, clinical diagnoses, and calendar numbering of the years.

Answer:
Racial origin: nominal; monthly phone bills: ratio; Fahrenheit and centigrade temperature
scales: interval; military ranks: ordinal; time: ratio; ranking of a personality trait:
ordinal; clinical diagnoses: nominal; calendar numbering of the years: interval.

Example 1.16:
In a sociological study involving 35 low-income households, the number of children per
household was recorded for each household. What is the variable? How many
observations are in the data set?

Answer:
The variable is the number of children per household. The data set contains 35
observations.

Example 1.17: Try it yourself!
A national survey was mailed to 5000 households and one question asked for the number
of handguns per household. Three thousand of the surveys were completed and returned.
What is the variable and how large is the data set?

Example 1.18: Try it yourself!
The number of hours spent per week on paper work was determined for 200 middle level
managers. The minimum was 0 hours and the maximum was 27 hours. What is the
variable? How many observations are in the data set?

COMPUTER SOFTWARE AND STATISTICS


The techniques of descriptive and inferential statistics involve lengthy repetitive
computations as well as the construction of various graphical constructs. These
computations and graphical constructions have been simplified by the development of
statistical computer software. These computer software programs are referred to as
statistical software packages, or simply statistical packages. These statistical packages are
large computer programs which perform the various computations and graphical
constructions discussed in this outline plus many other ones beyond the scope of the
outline. Statistical packages are currently available for use on mainframes, minicomputers,
and microcomputers. Numerous statistical packages are currently available. Six
widely used statistical packages will be utilized in this book: SAS, SPSS, MINITAB, EXCEL,
MatLab, and STATISTIX. The author would like to thank the companies that
produce this software for permission to include output from these packages in Beginning
Statistics. A student of statistics needs to be able to read output from the various packages
as well as to use the packages. I hope to accomplish this by including output from the
various packages and including practice problems involving the software.
