Introduction to Statistics

Introduction, examples and definitions

Introduction

We begin the module with some basic data analysis. Since Statistics involves the collection and interpretation of data, we must first know how to understand, display and summarise large amounts of quantitative information, before undertaking a more sophisticated analysis.

Statistical analysis of quantitative data is important throughout the pure and social sciences. For example, during this module we will consider examples from Biology, Medicine, Agriculture, Economics, Business and Meteorology.

Examples

Survival of cancer patients: A cancer patient wants to know the probability that he will survive for at least 5 years. By collecting data on survival rates of people in a similar situation, it is possible to obtain an empirical estimate of survival rates. We cannot know whether or not the patient will survive, or even know exactly what the probability of survival is. However, we can estimate the proportion of patients who survive from data.
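As a rough illustration of estimating a proportion from data, the sketch below computes the sample proportion of patients who survived at least 5 years. The records are invented for illustration only, and the names `survived_5_years` and `estimate_proportion` are hypothetical, not part of the module.

```python
# Hypothetical follow-up records (invented for illustration only):
# True means the patient survived for at least 5 years.
survived_5_years = [True, False, True, True, False,
                    True, False, True, True, False]


def estimate_proportion(outcomes):
    """Return the sample proportion of True outcomes."""
    if not outcomes:
        raise ValueError("need at least one observation")
    # True counts as 1 and False as 0, so sum() counts the survivors.
    return sum(outcomes) / len(outcomes)


p_hat = estimate_proportion(survived_5_years)
print(f"Estimated 5-year survival proportion: {p_hat:.2f}")
```

The value `p_hat` is only an estimate of the unknown survival probability: a different sample of patients would generally give a different value.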

Car maintenance: When buying a certain type of new car, it would be useful to know how much it is going to cost to run over the first three years from new. Of course, we cannot predict exactly what this will be: it will vary from car to car. However, collecting data from people who bought similar cars will give some idea of the distribution of costs across the population of car buyers, which in turn will provide information about the likely cost of running the car.
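One simple way to get "some idea of the distribution" is to summarise the collected costs numerically. The sketch below uses Python's standard `statistics` module on a small set of made-up three-year running costs (in arbitrary currency units); the figures and the name `costs` are assumptions for illustration, not real data.

```python
import statistics

# Hypothetical three-year running costs reported by ten owners
# (invented values, arbitrary currency units).
costs = [4200, 3900, 5100, 4600, 4300, 4800, 4100, 5600, 4400, 4500]

# A few summaries describing the centre and spread of the costs.
summary = {
    "min": min(costs),
    "median": statistics.median(costs),
    "mean": statistics.mean(costs),
    "max": max(costs),
    "stdev": statistics.stdev(costs),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

The centre (mean or median) indicates a likely cost, while the range and standard deviation indicate how much the cost varies from car to car.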

Definitions

The quantities measured in a study are called random variables, and a particular outcome is called an observation. Several observations are collectively known as data. The collection of all possible outcomes is called the population.

In practice, we cannot usually observe the whole population. Instead we observe a subset of the population, known as a sample. In order to ensure that the sample we take is representative of the whole population, we usually take a random sample in which all members of the population are equally likely to be selected for inclusion in the sample. For example, if we are interested in conducting a survey of the amount of physical exercise undertaken by the general public, surveying people entering and leaving a gymnasium would provide a biased sample of the population, and the results obtained would not generalise to the population at large.

Variables are either qualitative or quantitative. Qualitative variables have non-numeric outcomes, with no natural ordering. For example, gender, disease status, and type of car are all qualitative variables. Quantitative variables have numeric outcomes. For example, survival time, height, age, number of children, and number of faults are all quantitative variables.
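The difference between a random sample and a biased sample, as in the gymnasium example above, can be sketched in code. The population below is entirely hypothetical: 1000 people, of whom the first 100 are assumed to be gym members who exercise 6 hours a week, while everyone else exercises 2 hours. The names `population`, `mean_hours`, `random_sample` and `gym_sample` are illustrative only.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population (invented for illustration): weekly exercise hours.
population = [
    {"id": i, "gym_member": i < 100, "hours": 6.0 if i < 100 else 2.0}
    for i in range(1000)
]


def mean_hours(people):
    """Average weekly exercise hours for a group of people."""
    return sum(p["hours"] for p in people) / len(people)


# Simple random sample: every member of the population is equally
# likely to be selected, and nobody is selected twice.
random_sample = random.sample(population, k=50)

# Biased sample: only people who would be found at the gymnasium.
gym_sample = [p for p in population if p["gym_member"]][:50]

print("population mean:   ", mean_hours(population))
print("random sample mean:", mean_hours(random_sample))
print("gym sample mean:   ", mean_hours(gym_sample))
```

Under these assumptions the gym sample overstates the population mean, because it contains only the people who exercise most, whereas the random sample mean should typically fall much closer to the population value.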

