Statistics

MEMBERS OF GROUP 1:

DETYA INDRIAWAN

DIAH AULIA I

KARINA PRAVITASARI

MASATUL FARHAH

TIARA ARISENDA K

2013

Statistics?

A collection of quantitative data from a sample or

population.

The science that deals with the collection, tabulation,

analysis, interpretation, and presentation of quantitative

data.

Types of statistics

Deductive or descriptive statistics

describe and analyze a complete

data set

Inductive statistics

deal with a limited amount of data

(sample).

Conclusions: probability?

Population

A population is any entire collection of people, animals,

plants or things from which we may collect data.

It is the entire group we are interested in, which we wish

to describe or draw conclusions about.

For each population there are many possible samples.

Sample

A sample is a group of units selected from a larger group

(population).

By studying the sample, we hope to draw valid conclusions about the population.

The sample should be representative of the general population.

The best way is by random sampling.

Parameter

A parameter is a value, usually unknown (and which

therefore has to be estimated), used to represent a

certain population characteristic.

For example, the population mean is a parameter that is

often used to indicate the average value of a quantity.

Inferential Statistics

Statistical Inference makes use of information from a

sample to draw conclusions (inferences) about the

population from which the sample was taken.

Types of data

Variables data

quality characteristics that are measurable values.

measurable and normally continuous;

may take on any value - eg. weight in kg

Attribute data

quality characteristics that are observed to be either

present or absent, conforming or nonconforming.

countable and normally discrete; integer values only - e.g. 0, 1, 5, 25, but not 4.65

Graphical:

Plot or picture of a frequency distribution.

Analytical:

Summarize data by computing measures of central tendency and dispersion.

Sampling Methods

Sampling methods are methods for selecting a

sample from the population:

Simple random sampling - equal chance for each

member of the population to be selected for the sample.

Systematic sampling - the process of selecting every n-th

member of the population arranged in a list.

Stratified sampling - the sample is obtained by dividing the population into subgroups and then randomly selecting from each subgroup.

Cluster sampling - groups are selected rather than individuals.

Incidental or convenience sampling - taking an intact group (e.g. your own fourth grade class of pupils).
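The random, systematic, stratified and cluster methods listed above can be sketched in a few lines of Python. This is only an illustration: the population of 100 numbered units, the two strata, the ten clusters and all function names are hypothetical and are not part of the original slides.

```python
import random


def simple_random_sample(population, k, rng):
    """Every member has an equal chance of being selected."""
    return rng.sample(population, k)


def systematic_sample(population, step, start=0):
    """Take every step-th member of the ordered list, beginning at index start."""
    return population[start::step]


def stratified_sample(strata, k_per_stratum, rng):
    """Randomly select k_per_stratum members from each subgroup."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Select whole groups at random and keep every member of the chosen groups."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


rng = random.Random(0)                    # fixed seed so the sketch is repeatable
population = list(range(1, 101))          # hypothetical population of 100 units
strata = {"low": population[:50], "high": population[50:]}
clusters = {f"class_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10)
stratified = stratified_sample(strata, 5, rng)
clustered = cluster_sample(clusters, 2, rng)

print("simple random:", srs)
print("systematic:   ", systematic)
print("stratified:   ", stratified)
print("cluster:      ", clustered)
```

Convenience sampling is left out because it involves no selection rule: the sample is simply whatever intact group is at hand.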

Frequency Distribution

Consider the following set of data, which are the high temperatures recorded for 30 consecutive days.

We wish to summarize these data by creating a frequency distribution of the temperatures.

Temperatures for 30 Days

50  45  49  50  43  49  50  49  45  49
47  47  44  51  51  44  47  46  50  44
51  49  43  43  49  45  46  45  51  46

Example of Frequency Distribution

Temperatures

Temperature   Tally     Frequency
51            ////      4
50            ////      4
49            //////    6
48                      0
47            ///       3
46            ///       3
45            ////      4
44            ///       3
43            ///       3
                        N = 30
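The tallying step can be reproduced programmatically. The sketch below counts the 30 temperatures with Python's `collections.Counter` and lists every value from the highest to the lowest, including 48, which never occurs. The variable names are illustrative only.

```python
from collections import Counter

# The 30 daily high temperatures from the slide.
temperatures = [
    50, 45, 49, 50, 43, 49, 50, 49, 45, 49,
    47, 47, 44, 51, 51, 44, 47, 46, 50, 44,
    51, 49, 43, 43, 49, 45, 46, 45, 51, 46,
]

counts = Counter(temperatures)

# One row per value from highest to lowest; values that never occur get 0.
frequency_table = [
    (value, counts.get(value, 0))
    for value in range(max(temperatures), min(temperatures) - 1, -1)
]

for value, freq in frequency_table:
    print(f"{value:>3}  {'/' * freq:<6}  {freq}")
print("N =", sum(freq for _, freq in frequency_table))
```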

A cumulative frequency distribution can be created by adding an additional column called "Cumulative Frequency."

The cumulative frequency for a given value is obtained by adding the frequency for that value to the cumulative frequency of the value below it.

For example: the cumulative frequency for 45 is 10, which is the cumulative frequency for 44 (6) plus the frequency for 45 (4).

Finally, notice that the cumulative frequency for the highest value should be the same as the total of the frequency column.

Temperatures

Temperature   Tally     Frequency   Cumulative Frequency
51            ////      4           30
50            ////      4           26
49            //////    6           22
48                      0           16
47            ///       3           16
46            ///       3           13
45            ////      4           10
44            ///       3           6
43            ///       3           3
                        N = 30
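The running-total rule for the cumulative column can be sketched with `itertools.accumulate`. The frequencies are those of the temperature table, listed from the lowest value upward so that each entry adds to the total of the values below it. Variable names are illustrative.

```python
from itertools import accumulate

# Values and frequencies from the temperature table, lowest value first.
values = [43, 44, 45, 46, 47, 48, 49, 50, 51]
frequencies = [3, 3, 4, 3, 3, 0, 6, 4, 4]

# Each cumulative frequency = own frequency + cumulative frequency of the value below.
cumulative = list(accumulate(frequencies))
cum_by_value = dict(zip(values, cumulative))

# Print from the highest value down, matching the layout of the table.
for value in reversed(values):
    print(f"{value:>3}  {cum_by_value[value]:>2}")
```

The last running total (for 51) equals N = 30, which is the check described in the text.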

In some cases it is necessary to group the values of the data to summarize the data properly.

For example, suppose we wish to create a frequency distribution for the IQ scores of 30 pupils.

The IQ scores range from 73 to 139.

To include these scores in a frequency distribution we would need 67 different score values (139 down to 73).

This would not summarize the data very much.

To solve this problem we would group scores together and create a grouped frequency distribution.

If the data have more than 20 score values, we should create a grouped frequency distribution by grouping score values together into class intervals.

Grouped frequency

Consider the high temperatures recorded for 50 days.

The highest temperature is 59 and the lowest temperature is 39.

We would have 21 temperature values (59 down to 39).

This is greater than 20 values, so we should create a grouped frequency distribution.

Temperatures for 50 Days

57  39  52  52  43  50  53  42  58  55
58  50  53  50  49  45  49  51  44  54
49  57  55  59  45  50  45  51  54  58
53  49  52  51  41  52  40  44  49  45
43  47  47  43  51  55  55  46  54  41

Class Interval   Tally          Interval Midpoint   Frequency
57-59            //////         58                  6
54-56            ///////        55                  7
51-53            ///////////    52                  11
48-50            /////////      49                  9
45-47            ///////        46                  7
42-44            //////         43                  6
39-41            ////           40                  4
                                                    N = 50
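The grouping into class intervals can be sketched as follows. The interval width of 3 and the starting point at the lowest temperature (39) are taken from the class intervals shown in the table; the function and variable names are illustrative only.

```python
from collections import Counter

# The 50 daily high temperatures from the slide.
temperatures = [
    57, 39, 52, 52, 43, 50, 53, 42, 58, 55,
    58, 50, 53, 50, 49, 45, 49, 51, 44, 54,
    49, 57, 55, 59, 45, 50, 45, 51, 54, 58,
    53, 49, 52, 51, 41, 52, 40, 44, 49, 45,
    43, 47, 47, 43, 51, 55, 55, 46, 54, 41,
]

WIDTH = 3  # each class interval covers three temperature values
lowest, highest = min(temperatures), max(temperatures)


def interval_start(value):
    """Lower limit of the class interval that contains value."""
    return lowest + ((value - lowest) // WIDTH) * WIDTH


counts = Counter(interval_start(t) for t in temperatures)

grouped_table = []
for start in range(lowest, highest + 1, WIDTH):
    end = start + WIDTH - 1
    midpoint = (start + end) // 2  # a whole number because WIDTH is odd
    grouped_table.append((start, end, midpoint, counts.get(start, 0)))
grouped_table.reverse()  # highest interval first, as in the table

for start, end, midpoint, freq in grouped_table:
    print(f"{start}-{end}  {midpoint}  {freq}")
print("N =", sum(row[3] for row in grouped_table))
```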

Histograms

Constructing a Histogram for Discrete Data

First, determine the frequency and relative frequency of each x value.

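Then mark the possible x values on a horizontal scale.

Above each value, draw a rectangle whose height is the frequency (or relative frequency) of that value.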
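As a rough sketch of these steps, the code below computes the frequency and relative frequency of each value and prints a text histogram. It reuses the 30-day temperature data from the frequency distribution example as an assumption, since the slide does not say which data set the histogram was drawn from. For simplicity the bars are printed sideways, one row per x value, rather than above a horizontal scale.

```python
from collections import Counter

# Assumed data: the 30 daily high temperatures used earlier in the slides.
temperatures = [
    50, 45, 49, 50, 43, 49, 50, 49, 45, 49,
    47, 47, 44, 51, 51, 44, 47, 46, 50, 44,
    51, 49, 43, 43, 49, 45, 46, 45, 51, 46,
]

n = len(temperatures)
counts = Counter(temperatures)

# Every possible x value between the minimum and the maximum gets a position.
x_values = range(min(temperatures), max(temperatures) + 1)
relative_frequency = {x: counts.get(x, 0) / n for x in x_values}

for x in x_values:
    bar = "#" * counts.get(x, 0)  # bar length = frequency of x
    print(f"{x:>3} | {bar:<6} {relative_frequency[x]:.3f}")
```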

Descriptive statistics

Measures of Central Tendency

Describes the center position of the data

Mean, Median, Mode

Measures of Dispersion

Describes the spread of the data

Range, Variance, Standard deviation

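Measures of central tendency: Mean

Arithmetic mean: $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$

where $x_i$ are the observations and N is the number of observations.

For example, if the observations are 0, 2, 5, 9, 12, the mean is (0+2+5+9+12)/5 = 28/5 = 5.6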

Median - mode

Median = the observation in the middle of sorted data

Mode = the most frequently occurring value

100 91 85 84 75 72 72 69 65

Mode = 72

Median = 75

Mean = 79.22
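The three measures of central tendency can be checked with Python's standard `statistics` module. The sketch uses the nine scores shown on the slide, plus the five observations from the arithmetic mean example.

```python
import statistics

scores = [100, 91, 85, 84, 75, 72, 72, 69, 65]

mean = statistics.mean(scores)      # sum of the scores divided by their number
median = statistics.median(scores)  # middle observation of the sorted data
mode = statistics.mode(scores)      # most frequently occurring value

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")

# The smaller example from the arithmetic mean slide.
small_mean = statistics.mean([0, 2, 5, 9, 12])
print("mean of 0, 2, 5, 9, 12 =", small_mean)
```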

Measures of dispersion:

range

The range is calculated by taking the maximum value and

subtracting the minimum value.

2 4 6 8 10 12 14

Range = 14 - 2 = 12

Measures of dispersion:

variance

Calculate the deviation from the mean for every observation.

Square each deviation.

Add them up and divide by the number of observations:

$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$

Measures of dispersion:

standard deviation

The standard deviation is the square root of the variance.

The variance is in squared units, so the standard deviation is in the same units as x.

$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}$
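The three measures of dispersion can be worked through step by step. The sketch below uses the seven values from the range example (2, 4, ..., 14) and follows the recipe literally: take deviations from the mean, square them, average them, then take the square root. It divides by the number of observations N, as the variance slide describes.

```python
import math

data = [2, 4, 6, 8, 10, 12, 14]

# Range: maximum minus minimum.
data_range = max(data) - min(data)

# Variance: mean of the squared deviations from the mean.
mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]
variance = sum(squared_deviations) / len(data)

# Standard deviation: square root of the variance, in the same units as the data.
std_dev = math.sqrt(variance)

print("range =", data_range)
print("variance =", variance)
print("standard deviation =", std_dev)
```

For these data the mean is 8 and the squared deviations sum to 112, so the variance is 112/7 = 16 and the standard deviation is 4.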

Sample

General formula/ungrouped data:

$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}} = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}}$

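Sample

Grouped data:

$s = \sqrt{\frac{n \sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{n(n-1)}}$

where $f_i$ is the frequency of class i, $X_i$ is its midpoint, k is the number of classes, and $n = \sum_{i=1}^{k} f_i$.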
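A short sketch can confirm that the definitional and computational forms of the sample standard deviation agree, and show how the grouped-data form works. The ungrouped data are the nine scores from the median and mode slide; the grouped data are the interval midpoints and frequencies of the 50-day temperature table. Function names are illustrative, and the formulas are assumed to be the usual n - 1 sample formulas.

```python
import math
import statistics


def sample_sd_definitional(xs):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def sample_sd_computational(xs):
    """Same quantity from the sum of X and the sum of X squared."""
    n = len(xs)
    return math.sqrt((n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1)))


def sample_sd_grouped(midpoints, freqs):
    """Grouped-data form: each midpoint is weighted by its class frequency."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    return math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))


scores = [100, 91, 85, 84, 75, 72, 72, 69, 65]
midpoints = [58, 55, 52, 49, 46, 43, 40]
frequencies = [6, 7, 11, 9, 7, 6, 4]

sd_def = sample_sd_definitional(scores)
sd_comp = sample_sd_computational(scores)
sd_lib = statistics.stdev(scores)  # standard-library cross-check
sd_grouped = sample_sd_grouped(midpoints, frequencies)

print(sd_def, sd_comp, sd_lib)
print("grouped:", sd_grouped)
```

The grouped result is an approximation of the standard deviation of the raw temperatures, because every observation in a class is treated as if it sat at the class midpoint.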

curve shape

If σ is small, there is a high probability of getting a value close to the mean.

If σ is large, there is a correspondingly higher probability of getting values further away from the mean.

The normal curve or the normal

frequency distribution or Gaussian

distribution is a hypothetical

distribution that is widely used in

statistical analysis.

The characteristics of the normal

curve make it useful in education and

in the physical and social sciences.

Characteristics of the

Normal Curve

The normal curve is a symmetrical distribution

of data with an equal number of data above and

below the midpoint of the abscissa.

Since the distribution of data is symmetrical the

mean, median, and mode are all at the same

point on the abscissa.

In other words, mean = median = mode.

If we divide the distribution up into standard

deviation units, a known proportion of data lies

within each portion of the curve.

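34.13% of the data lie between the mean and 1σ below the mean (and likewise between the mean and 1σ above it).

Approximately two-thirds (68.26%) of the data lie within 1σ of the mean.

13.59% of the data lie between one and two standard deviations from the mean, on each side.

Finally, almost all of the data (99.74%) are within 3σ of the mean.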
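These proportions can be computed from the standard normal distribution with `statistics.NormalDist` (Python 3.8 or later). The computed values may differ in the last digit from the rounded figures quoted on the slide (for example roughly 68.27% and 99.73%), because the slide figures come from adding rounded table entries.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

between_mean_and_1sd = z.cdf(1) - z.cdf(0)  # one side only
within_1sd = z.cdf(1) - z.cdf(-1)
between_1sd_and_2sd = z.cdf(2) - z.cdf(1)   # one side only
within_3sd = z.cdf(3) - z.cdf(-3)

print(f"mean to 1 sd:   {between_mean_and_1sd:.2%}")
print(f"within 1 sd:    {within_1sd:.2%}")
print(f"1 sd to 2 sd:   {between_1sd_and_2sd:.2%}")
print(f"within 3 sd:    {within_3sd:.2%}")
```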

Standardized normal

value,

Z

When a score is expressed in standard deviation units, it is referred to as a Z-score.

A score that is one standard deviation above the mean has a Z-score of 1.

A score that is one standard deviation below the

mean has a Z-score of -1.

A score that is at the mean would have a Z-score

of 0.

The normal curve with Z-scores along the

abscissa looks exactly like the normal curve with

standard deviation units along the abscissa.

Z-value

Deviation IQ Scores, sometimes called Wechsler IQ scores,

are a standard score with a mean of 100 and a standard

deviation of 15.

What percentage of the general population have deviation

IQs lower than 85?

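z = (85 - 100) / 15 = -1, so an IQ of 85 is equivalent to a z-value of -1.

Since 34.13% of the population lies between the mean and one standard deviation below it, 50% - 34.13% = 15.87% of the population has IQ scores lower than 85.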
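The IQ example can be checked numerically. The sketch converts the score to a z-value with z = (x - mean) / standard deviation and then looks up the area below that z-value on the standard normal curve. The helper name `z_score` is illustrative.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Express a raw score in standard deviation units."""
    return (x - mean) / sd


iq_mean, iq_sd = 100, 15  # deviation IQ scale from the slide

z = z_score(85, iq_mean, iq_sd)
share_below = NormalDist().cdf(z)  # proportion of the curve below z

print("z =", z)
print(f"share of population below IQ 85: {share_below:.2%}")
```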

Frequency Polygon

A frequency polygon is what you may think of as a curve.

A frequency polygon can be created with interval or ratio

data.

Let's create a frequency polygon with the data we used

earlier to create a histogram.

Arrange the values along the abscissa (horizontal axis).

Arrange the lowest data on the left and the highest on the right.

Add one value below the lowest data and one above the highest data.

Create an ordinate (vertical axis).

Arrange the frequency values along the ordinate.

Provide a label for the ordinate (Frequency).

Create the body of the frequency polygon by placing a

dot for each value.

Connect each of the dots to the next dot with a straight

line.

Provide a title for the frequency polygon.
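The construction steps can be sketched without a plotting library by computing the dots and the line segments that join them. The 30-day temperature data are used here as an assumption, since the slide does not identify which data set the earlier histogram was based on. Drawing the axes, labels and title is left to whatever plotting tool is available.

```python
from collections import Counter

# Assumed data: the 30 daily high temperatures used earlier in the slides.
temperatures = [
    50, 45, 49, 50, 43, 49, 50, 49, 45, 49,
    47, 47, 44, 51, 51, 44, 47, 46, 50, 44,
    51, 49, 43, 43, 49, 45, 46, 45, 51, 46,
]

counts = Counter(temperatures)
low, high = min(temperatures), max(temperatures)

# Abscissa: lowest value on the left, highest on the right,
# plus one extra value below the lowest and one above the highest.
x_values = list(range(low - 1, high + 2))

# One dot per value; the two added values have frequency 0,
# so the polygon starts and ends on the horizontal axis.
points = [(x, counts.get(x, 0)) for x in x_values]

# Connect each dot to the next dot with a straight line.
segments = list(zip(points, points[1:]))

for (x1, y1), (x2, y2) in segments:
    print(f"({x1}, {y1}) -> ({x2}, {y2})")
```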

