Notes on Statistics

What is statistics?
Population and Sample
Descriptive and Inferential Statistics
Parameters and Statistics
Statistical data analysis
Variables and organization of the data
Variables
Scales
Organization of the data
Presentation of Data
Describing data by tables and graphs
Qualitative variable
Quantitative variable
Three Popular Data Displays
Stem and Leaf Diagrams
Frequency Histograms
Relative Frequency Histograms
Sample and Population Distributions
Measures of center
The Mean
Sample mean
Population mean
The Median
The Mode
Measures of variation
The Range
The Variance and the Standard Deviation

The Nature of Statistics

What is statistics?

- the methodology for collecting, analyzing, interpreting and drawing conclusions from information
- consists of a body of methods for collecting and analyzing data
- the science of gaining information from numerical and categorical data
- provides methods for:
  o design
  o description
  o inference
- the science of dealing with uncertain phenomena and events

Population and Sample

Population:
- the collection of all individuals or items under consideration in a statistical study
- the set of measurements (or record of some qualitative trait) corresponding to the entire collection of units for which inferences are to be made

Sample:
- the part of the population from which information is collected
- the set of measurements that are actually collected in the course of an investigation

A population may be:
- a finite population, which can be physically listed
- a hypothetical population, which is more abstract and may arise from the phenomenon under consideration

Descriptive and Inferential Statistics

Descriptive statistics:
- consist of methods for organizing and summarizing information
- include the construction of graphs, charts, and tables, and the calculation of various descriptive measures such as averages, measures of variation, and percentiles

Inferential statistics:
- consist of methods for drawing and measuring the reliability of conclusions about a population based on information obtained from a sample of the population
- include methods like point estimation, interval estimation and hypothesis testing, which are all based on probability theory

Parameters and Statistics

Parameter:
- an unknown numerical summary of the population
- e.g. the proportion $p$ of 18-30 year-olds going to movies at least once a month

Statistic:
- a known numerical summary of the sample which can be used to make inference about parameters
- e.g. the proportion $\hat{p}$ of 18-30 year-olds going to movies at least once a month, calculated from the sample of 18-30 year-olds

Statistical data analysis

1. Formulate the research problem
2. Define population and sample
3. Collect the data
4. Do descriptive data analysis
5. Use appropriate statistical methods to solve the research problem
6. Report the results
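To make the distinction between a parameter and a statistic concrete, here is a minimal Python sketch. The population, its 30% share of moviegoers, and the sample size of 200 are all invented for illustration; in a real study the parameter p would be unknown and only the statistic could be computed.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical finite population of 18-30 year-olds:
# True means "goes to movies at least once a month".
# The 30% share is an invented value, not taken from the notes.
population = [True] * 3000 + [False] * 7000

# Parameter p: numerical summary of the whole population
# (normally unknown in practice).
p = sum(population) / len(population)

# Statistic p_hat: numerical summary of a sample drawn from the population.
sample = random.sample(population, 200)
p_hat = sum(sample) / len(sample)

print(f"parameter p     = {p:.3f}")
print(f"statistic p_hat = {p_hat:.3f}")
```

The statistic varies from sample to sample, while the parameter stays fixed; inference is about using the former to learn about the latter.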

Variables and organization of the data

Variables

- characteristics that vary from one person or thing to another
- quantitative or qualitative
- quantitative:
  o discrete
    - has only a countable number of distinct possible values
    - can assume only a finite number of values or as many values as there are integers
  o continuous
    - quantities such as length, weight, or temperature that can be measured arbitrarily accurately

Scales

Qualitative variables:
- nominal scale: no natural ordering
- ordinal scale: the values have a natural order

Quantitative variables:
- interval scale: can compare differences between measurements of the variable meaningfully, but not the ratio of the measurements
- ratio scale: can compare both the differences between measurements of the variable and the ratio of the measurements meaningfully

Organization of the data

- data: values of the variables for one or more people or things
- observation: each individual piece of data
- data set or data matrix: the values of variables recorded for a set of sampling units

Presentation of Data

1. data list
   a. as a list
   b. in set notation
2. data frequency table
   o each distinct value x is listed in the first row
   o the frequency, f, which is the number of times the value x appears in the data set, is listed below it in the second row
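The data frequency table described under Presentation of Data can be sketched in a few lines of Python. The data list below is invented for illustration, and the two-row printout is only one possible layout.

```python
from collections import Counter

# Hypothetical data list; the values are invented for illustration.
data = [2, 3, 3, 1, 2, 3, 5, 2, 3, 1]

# Count how many times each distinct value x appears in the data set.
counts = Counter(data)

# First row: the distinct values x; second row: their frequencies f.
values = sorted(counts)
frequencies = [counts[x] for x in values]

print("x:", *values)
print("f:", *frequencies)
```

The frequencies always add up to the number of observations, which is a quick check that no value was missed.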

Describing data by tables and graphs

Qualitative variable

1. frequency (or count) - number of observations that fall into a particular class (or category)
2. frequency distribution - table listing all classes and their frequencies
3. relative frequency - proportion of the observations that fall into a class: the frequency of the class divided by the total number of observations; multiply the result by 100 to express it as a percentage
4. relative frequency distribution - table listing all classes and their relative frequencies
5. cumulative frequency (cumulative relative frequency) - sum of the frequencies (relative frequencies) of all classes up to the specific class
6. pie chart - a disk divided into pie-shaped pieces proportional to the relative frequencies of the classes; the angle of each piece is the relative frequency (as a proportion) multiplied by 360 degrees
7. vertical bar graph - displays the classes on the horizontal axis and the frequencies (or relative frequencies) of the classes on the vertical axis
8. horizontal bar graph - displays the classes on the vertical axis and the frequencies of the classes on the horizontal axis

Quantitative variable

If a discrete variable can take many different values, or the quantitative variable is continuous, group the data into classes:

1. Find the minimum and the maximum values the variable has in the data set
2. Choose intervals of equal length that cover the range between the minimum and the maximum without overlapping (class intervals, class limits)
3. Count the number of observations in the data that belong to each class interval. The count in each class is the class frequency.
4. Calculate the relative frequency of each class by dividing the class frequency by the total number of observations in the data

class mark
- number in the middle of the class

real class limit
- number in the middle of the upper class limit of one class and the lower class limit of the next class

histogram
- graph that displays the class intervals on the horizontal axis and the class frequencies as the heights of adjacent bars
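The four grouping steps for a quantitative variable can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the measurements and the choice of three classes are invented, and each class is assumed to include its lower limit but not its upper limit, except the last class, which also includes the maximum.

```python
# Hypothetical continuous measurements; invented for illustration.
data = [4.1, 5.6, 7.3, 5.0, 6.2, 9.8, 8.4, 4.9, 6.7, 7.1]
num_classes = 3  # assumed; the notes do not prescribe a number of classes

# Step 1: minimum and maximum values in the data set.
lo, hi = min(data), max(data)

# Step 2: equal-length, non-overlapping intervals covering [lo, hi].
width = (hi - lo) / num_classes
edges = [lo + i * width for i in range(num_classes + 1)]

# Step 3: class frequencies. Each class is [lower, upper);
# the last class also includes the maximum.
freqs = [0] * num_classes
for x in data:
    k = min(int((x - lo) / width), num_classes - 1)
    freqs[k] += 1

# Step 4: relative frequencies, plus the class mark (middle of each class).
rel_freqs = [f / len(data) for f in freqs]
marks = [(edges[i] + edges[i + 1]) / 2 for i in range(num_classes)]

for i in range(num_classes):
    print(f"{edges[i]:.1f} to {edges[i + 1]:.1f}  mark={marks[i]:.2f}  "
          f"f={freqs[i]}  rel={rel_freqs[i]:.2f}")
```

The class frequencies here are exactly the bar heights a frequency histogram would show for the same classes.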

Three Popular Data Displays

Data example:

Frequency Histograms

Sample and Population Distributions

Measures of center

The Mean

- should be used when the variable is quantitative with a symmetric distribution

Sample mean

- $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, where $x_1, \dots, x_n$ are the $n$ measurements in the sample

Population mean

- $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$, where $x_1, \dots, x_N$ are the $N$ measurements in the population

The Median

- should be used when the quantitative variable has a skewed distribution
- a value in a data set such that 50% of the data are on its left and the other 50% on its right
- the sample median $\tilde{x}$ of a set of sample data with an odd number of measurements is the middle measurement when the data are arranged in numerical order
- the sample median $\tilde{x}$ of a set of sample data with an even number of measurements is the mean of the two middle measurements when the data are arranged in numerical order

The Mode

- the sample mode of a set of sample data is the most frequently occurring value
- if the greatest frequency is 1 (i.e. no value occurs more than once), then the variable has no mode
- should be used when calculating a measure of center for a qualitative variable
- on a relative frequency histogram, the highest point of the histogram corresponds to the mode of the data set
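The three measures of center can be computed with Python's standard library, as in the sketch below. The data values are invented, and `sample_mode` is a hypothetical helper written to follow the rule in these notes that a data set has no mode when no value occurs more than once.

```python
import statistics
from collections import Counter

# Hypothetical sample; the values are invented for illustration.
data = [3, 7, 7, 2, 9, 4, 7, 4]

# Sample mean: sum of the measurements divided by their number.
mean = statistics.mean(data)

# Sample median: with an even number of measurements, the mean of the
# two middle values of the sorted data (2, 3, 4, 4, 7, 7, 7, 9).
median = statistics.median(data)


def sample_mode(values):
    """Return the most frequent value, or None if no value repeats.

    If several values share the greatest frequency, the one that
    appears first in the data is returned.
    """
    value, freq = Counter(values).most_common(1)[0]
    return value if freq > 1 else None


mode = sample_mode(data)

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

For a qualitative variable only the mode applies, and `sample_mode` works unchanged on a list of category labels.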

Measures of variation

The Range

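- the number $R$ defined by the formula $R = x_{\max} - x_{\min}$, where $x_{\max}$ and $x_{\min}$ are the largest and smallest measurements in the data set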
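As a quick check of the range formula, the following sketch computes R for a small invented data set.

```python
# Hypothetical sample; the values are invented for illustration.
data = [3, 7, 7, 2, 9, 4, 7, 4]

# Range: largest measurement minus smallest measurement.
sample_range = max(data) - min(data)

print("R =", sample_range)
```

Because it depends only on the two extreme measurements, the range is very sensitive to a single unusually large or small value.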

The Variance and the Standard Deviation

