Methods

Chapter 1 : Data and Statistics

Kui Zhang, Mathematical Sciences

Introduction

Statistics deals largely with principles and procedures for collecting, describing, and drawing conclusions from data.

The purpose of this chapter is to:

Define the components of a data set

Present some tools that are used to describe a data set

Discuss methods of data collection

Definition 1.1 - A data set is a collection of observed values representing one or more characteristics of some objects or units.

Definition 1.2 - A population is a data set representing the entire

entity of interest.

Chapter 1, MA5701 Statistical Methods, Fall 2016

Respondent  AGE  SEX  HAPPY  TVHOURS
1           41   1    2      0
9           31   2    1      0
22          64   2    3      0
36          26   1    2      2

Definition 1.3 - A sample is a data set consisting of a portion of a

population. (Obtained in a way to represent the population)

Where we can obtain the data:

Primary data are collected as part of the study

Secondary data are obtained from other sources

Observational studies can suggest association but not causality

Designed experiments can help establish causality

Data format

Observation(s): a row in the data file

Variable(s): a column in the data file
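The row and column layout can be sketched in Python with the four survey respondents tabulated earlier in this chapter. This is only an illustration: the names `rows` and `columns` are not from the course material, and the values assume each variable was listed in respondent order.

```python
# Each observation is one row; each variable is one column.
# Values are the four respondents from the survey table in this chapter.
rows = [
    {"Respondent": 1, "AGE": 41, "SEX": 1, "HAPPY": 2, "TVHOURS": 0},
    {"Respondent": 9, "AGE": 31, "SEX": 2, "HAPPY": 1, "TVHOURS": 0},
    {"Respondent": 22, "AGE": 64, "SEX": 2, "HAPPY": 3, "TVHOURS": 0},
    {"Respondent": 36, "AGE": 26, "SEX": 1, "HAPPY": 2, "TVHOURS": 2},
]

# Column view: one list of values per variable.
columns = {name: [row[name] for row in rows] for name in rows[0]}

print(len(rows), "observations")
print(len(columns), "variables")
print(columns["AGE"])
```

Reading the same data by row answers questions about one respondent; reading it by column answers questions about one variable.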


Variables - Qualitative (Categorical) and Quantitative

Definition 1.4 - A discrete variable can assume only a countable number of values.

Definition 1.5 - A continuous variable is one that can take any one of an uncountable number of values in an interval.

Definition 1.6 - The ratio scale of measurement uses the concept of a unit of distance or measurement and requires a unique definition of a zero value.

Definition 1.7 - The interval scale of measurement also uses the concept of a unit of distance or measurement and requires a zero point, but the definition of zero is arbitrary.
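One way to see the difference between the two scales is to check whether ratios survive a change of unit. The sketch below uses temperature (Celsius and Fahrenheit, both with an arbitrary zero) and length (meters and feet, with a true zero). These examples are assumptions chosen for illustration and do not come from the slides.

```python
import math


def c_to_f(celsius):
    """Convert Celsius to Fahrenheit; both scales have an arbitrary zero."""
    return celsius * 9 / 5 + 32


def m_to_ft(meters):
    """Convert meters to feet; zero length is the same in both units."""
    return meters / 0.3048


# Interval scale: the ratio of two readings depends on the unit chosen.
ratio_c = 20 / 10
ratio_f = c_to_f(20) / c_to_f(10)  # 68 / 50

# Ratio scale: the ratio of two measurements is the same in any unit.
ratio_m = 20 / 10
ratio_ft = m_to_ft(20) / m_to_ft(10)

print(ratio_c, ratio_f)
print(ratio_m, ratio_ft)
```

Saying "twice as long" is meaningful in any unit, while "twice as hot" depends on which temperature scale is used.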


Variables (Cont.)

Definition 1.8 - The ordinal scale distinguishes between measurements on the basis of the relative amounts of some characteristic they possess.

You can convert a ratio or interval scale to an ordinal scale, but the criteria are not always clear, and the conversion loses information.

Definition 1.9 - The nominal scale identifies observed values by name or classification.

Generally for categorical or qualitative variables

Weakest scale

Can convert ratio, interval, or ordinal scale to nominal scale
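Converting a ratio-scale variable to an ordinal one can be sketched as follows, using the respondent ages tabulated earlier in this chapter. The cut points 35 and 60 and the category names are hypothetical, which is exactly the point about unclear criteria.

```python
# Ages (ratio scale) of the four survey respondents in this chapter.
ages = [41, 31, 64, 26]


def age_group(age):
    """Map an age to an ordered category; cut points 35 and 60 are hypothetical."""
    if age < 35:
        return "young"
    if age < 60:
        return "middle"
    return "older"


groups = [age_group(a) for a in ages]
print(groups)

# Information is lost: ages 31 and 26 land in the same category,
# so their 5-year difference can no longer be recovered.
```

A different choice of cut points would produce different categories from the same ages.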


Variables Example

Obs  Zip  Age  Bed  Bath  Size  Lot    Exter  garage  fp  Price
1    3    21   3    2     951   64904  Other  0       0   30000
3    4    7    1    1     676   54450  Other  2       0   46500
5    1    51   3    1     1186  10857  Other  1       0   51500
7    3    8    3    2     1368  .      Frame  0       0   56990
9    1    51   2    1     1176  6259   Frame  1       1   65500

Distributions

Definition 1.10 - A frequency distribution is a listing of the frequencies of all categories of the observed values of a variable.

Definition 1.11 - A relative frequency distribution consists of the relative frequencies, or proportions (percentages), of observations belonging to each category.

Definition - A cumulative frequency distribution gives, for each class interval, the frequency of observed values less than or equal to the upper limit of that class interval.

Definition - A cumulative percent gives, for each class interval, the relative frequency of observed values less than or equal to the upper limit of that class interval.
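The four distributions defined above can be computed from a list of category counts. The sketch below uses the bedroom counts tabulated in this chapter (n = 69 houses). The variable names are illustrative.

```python
from itertools import accumulate

# Frequencies of the variable bed (number of bedrooms), n = 69 houses.
frequency = {1: 1, 2: 3, 3: 46, 4: 16, 5: 3}
n = sum(frequency.values())

# Relative frequency, as a percentage of all observations.
percent = {k: 100 * f / n for k, f in frequency.items()}

# Cumulative frequency and cumulative percent, in category order.
cum_frequency = dict(zip(frequency, accumulate(frequency.values())))
cum_percent = {k: 100 * c / n for k, c in cum_frequency.items()}

for k in frequency:
    print(k, frequency[k], round(percent[k], 2),
          cum_frequency[k], round(cum_percent[k], 2))
```

The last cumulative frequency always equals n, and the last cumulative percent always equals 100.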


bed  Frequency  Percent  Cumulative Frequency  Cumulative Percent
1    1          1.45     1                     1.45
2    3          4.35     4                     5.80
3    46         66.67    50                    72.46
4    16         23.19    66                    95.65
5    3          4.35     69                    100.00

price         Frequency  Percent  Cumulative Frequency  Cumulative Percent
[  0,  50k)   4          5.80     4                     5.80
[ 50k, 100k)  22         31.88    26                    37.68
[100k, 150k)  23         33.33    49                    71.01
[150k, 200k)  10         14.49    59                    85.51
[200k, 250k)  2          2.90     61                    88.41
[250k, 300k)  1          1.45     62                    89.86
[300k, 350k)  4          5.80     66                    95.65
[350k, 400k)  3          4.35     69                    100.00


Distributions Histogram


Components of a correctly constructed chart or graph

Captioned correctly

Bars with equal width

Sizes of figures properly proportioned

Only relevant information

constructed chart and graph with some adjustments

Some graphical and tabular representations of distributions contain too much detail and are useful for univariate analysis


Graphical Representation


Descriptive Statistics

Exterior = Brick

Exterior = Frame


Definition 1.12 - The mean is the sum of all the observed values divided by the number of values: $\bar{y} = \left(\sum_{i=1}^{n} y_i\right) / n$

Definition 1.13 - The median of a set of observed values is defined to be the middle value when the measurements are arranged from lowest to highest: $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$

Definition 1.18 - The pth percentile is defined to be that value for which at most p% of the measurements are less and at most (100-p)% of the measurements are greater.

Definition - Quartiles are the 25th, 50th, and 75th percentiles (location).

Definition - The range (dispersion) and the midrange (location) are the difference and the mean of the smallest and largest observed values, respectively.

Definition - The mode is the most frequently observed value.
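The location measures above can be computed directly for the five house prices listed in the Variables Example table. This is a minimal sketch. The `percentile` helper is a hypothetical, literal reading of Definition 1.18, and averaging when several values qualify is an assumption; statistical packages generally use interpolation rules that can give slightly different answers.

```python
import statistics

# Prices of the five houses listed in the Variables Example table.
prices = [30000, 46500, 51500, 56990, 65500]
# Bedroom frequencies for all 69 houses (from the bed frequency table).
bed_frequency = {1: 1, 2: 3, 3: 46, 4: 16, 5: 3}

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
price_range = max(prices) - min(prices)
midrange = (max(prices) + min(prices)) / 2
mode_beds = max(bed_frequency, key=bed_frequency.get)  # most frequent value


def percentile(values, p):
    """Value with at most p% of values below it and at most (100-p)% above it.

    If several observed values qualify, their average is returned
    (an assumption; the definition does not say how to break ties).
    """
    n = len(values)
    qualifying = [
        y for y in sorted(set(values))
        if 100 * sum(v < y for v in values) <= p * n
        and 100 * sum(v > y for v in values) <= (100 - p) * n
    ]
    return sum(qualifying) / len(qualifying)


quartiles = [percentile(prices, p) for p in (25, 50, 75)]

print(mean_price, median_price, price_range, midrange, mode_beds)
print(quartiles)
```

With these five prices the 50th percentile coincides with the median, as the definitions require.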

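Definition 1.14 - The sample variance, denoted by $s^2$, is defined by

$$s^2 = \frac{SS}{df} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$

Distance from the mean: absolute distance $\sum_{i=1}^{n} |y_i - \bar{y}|$; new distance = $\sum_{i=1}^{n} (y_i - \bar{y})^2$

Sum of squares: $SS = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2 / n$

$s^2$ = mean square = $SS / df$, where $df = n - 1$

Definition - The standard deviation is defined to be the positive square root of the variance.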

Definition 1.17 The coefficient of variation (CV) is the ratio of the

standard deviation to the mean, expressed in percentage terms.

Definition 1.19 The interquartile range is the length of the interval

between the 25th and 75th percentiles (Dispersion).
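The dispersion measures can be checked numerically. The sketch below computes the sum of squares both from the deviations and from the shortcut formula, then the variance, standard deviation, and CV, for the five house prices listed in the Variables Example table. The variable names are illustrative.

```python
import math
import statistics

# Prices of the five houses listed in the Variables Example table.
prices = [30000, 46500, 51500, 56990, 65500]
n = len(prices)
y_bar = sum(prices) / n

# Sum of squares: deviations from the mean, and the shortcut form.
ss_definition = sum((y - y_bar) ** 2 for y in prices)
ss_shortcut = sum(y ** 2 for y in prices) - sum(prices) ** 2 / n

df = n - 1
variance = ss_definition / df        # s^2 = SS / df (a mean square)
std_dev = math.sqrt(variance)        # positive square root of the variance
cv_percent = 100 * std_dev / y_bar   # coefficient of variation, in percent

print(round(variance), round(std_dev, 1), round(cv_percent, 1))
```

Both forms of SS agree up to floating-point rounding; the shortcut form avoids computing each deviation but can lose precision when the values are large relative to their spread.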


Usefulness of the Mean and Standard Deviation

For roughly bell-shaped distributions (the empirical rule):

Interval (mean +- 1*SD) contains approximately 68% of the observations

Interval (mean +- 2*SD) contains approximately 95% of the observations

Interval (mean +- 3*SD) contains virtually all of the observations
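These percentages can be checked on simulated data. The sketch below draws values from a normal distribution, which is an assumption: the rule is only approximate for real data and can fail badly for skewed distributions. The mean of 50, SD of 10, and sample size are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
# Simulated bell-shaped data; mean 50 and SD 10 are arbitrary choices.
sample = [random.gauss(50, 10) for _ in range(10_000)]

mean = statistics.mean(sample)
sd = statistics.stdev(sample)


def share_within(k):
    """Fraction of observations inside mean +- k*SD."""
    inside = sum(mean - k * sd <= y <= mean + k * sd for y in sample)
    return inside / len(sample)


shares = {k: share_within(k) for k in (1, 2, 3)}
print(shares)
```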

Change of Scale

Linear transformation (from the change of unit)

Non-Linear transformation (squared transformation, log transformation)

What will be changed?

Mean

Variance and Standard Deviation

CV
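For a linear transformation $y' = a + b y$, the mean becomes $a + b\bar{y}$ and the standard deviation becomes $|b| s$, so the CV is unchanged only when $a = 0$. The sketch below checks this on the five listed house prices. The flat amount of 10000 added to each price is a hypothetical shift used only for illustration.

```python
import math
import statistics


def summarize(values):
    """Return (mean, standard deviation, CV in percent)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean, sd, 100 * sd / mean


prices = [30000, 46500, 51500, 56990, 65500]
mean, sd, cv = summarize(prices)

# Change of unit (a = 0, b = 1/1000): dollars to thousands of dollars.
mean_k, sd_k, cv_k = summarize([y / 1000 for y in prices])

# Shift of origin (a = 10000, b = 1): a hypothetical flat amount added.
mean_s, sd_s, cv_s = summarize([y + 10000 for y in prices])

# Log transformation (non-linear): mean of logs is not the log of the mean.
mean_log = statistics.mean([math.log(y) for y in prices])

print(cv, cv_k, cv_s)
print(mean_log, math.log(mean))
```

Under a non-linear transformation no such simple rule applies, so the summary statistics have to be recomputed from the transformed values.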

Box plots show distribution shapes and help detect unusual observations

Schematic Box-and-Whiskers Plot
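The numbers behind a schematic box plot can be sketched as follows. The 1.5 x IQR fences are the common Tukey convention and are an assumption here, since the slides do not state the rule used. The value 4000 is a hypothetical observation added to the five listed house sizes to show how a point gets flagged. `statistics.quantiles` interpolates, so its quartiles can differ slightly from Definition 1.18.

```python
import statistics

# House sizes from the Variables Example table, plus one hypothetical
# large value (4000) added only to show how a point gets flagged.
sizes = [951, 676, 1186, 1368, 1176, 4000]

# Quartiles define the box; the middle one is the median line.
q1, q2, q3 = statistics.quantiles(sizes, n=4)
iqr = q3 - q1

# Fences: observations beyond them are plotted individually as unusual.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
unusual = [y for y in sizes if y < lower_fence or y > upper_fence]

# Whiskers extend to the most extreme observations inside the fences.
inside = [y for y in sizes if lower_fence <= y <= upper_fence]
whiskers = (min(inside), max(inside))

print(q1, q2, q3, iqr)
print(unusual, whiskers)
```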

Contingency Table (Frequency Table from PROC FREQ)

Table of exter by zip

exter   Statistic   zip 1   zip 2   zip 3   zip 4   Total
brick   Frequency       4      10       4      30      48
        Percent      5.80   14.49    5.80   43.48   69.57
        Row Pct      8.33   20.83    8.33   62.50
        Col Pct     66.67   76.92   25.00   88.24
frame   Frequency       1       1       5       1       8
        Percent      1.45    1.45    7.25    1.45   11.59
        Row Pct     12.50   12.50   62.50   12.50
        Col Pct     16.67    7.69   31.25    2.94
other   Frequency       1       2       7       3      13
        Percent      1.45    2.90   10.14    4.35   18.84
        Row Pct      7.69   15.38   53.85   23.08
        Col Pct     16.67   15.38   43.75    8.82
Total   Frequency       6      13      16      34      69
        Percent      8.70   18.84   23.19   49.28  100.00
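The four numbers reported in each cell of the contingency table (frequency, percent, row percent, column percent) can be reproduced from the cell counts. In the sketch below the counts are the ones consistent with the percentages in the exter by zip table (n = 69). Assigning the three rows to brick, frame, and other in label order is an assumption, and the function name `cell_stats` is hypothetical.

```python
# Cell counts of exter by zip (n = 69), consistent with the percentages
# reported in the PROC FREQ table of this chapter.
counts = {
    "brick": {1: 4, 2: 10, 3: 4, 4: 30},
    "frame": {1: 1, 2: 1, 3: 5, 4: 1},
    "other": {1: 1, 2: 2, 3: 7, 4: 3},
}

zips = [1, 2, 3, 4]
row_totals = {r: sum(cells.values()) for r, cells in counts.items()}
col_totals = {z: sum(counts[r][z] for r in counts) for z in zips}
n = sum(row_totals.values())


def cell_stats(exter, zip_code):
    """Return (frequency, percent, row percent, column percent) for one cell."""
    f = counts[exter][zip_code]
    return (
        f,
        round(100 * f / n, 2),                     # share of all houses
        round(100 * f / row_totals[exter], 2),     # share within the row
        round(100 * f / col_totals[zip_code], 2),  # share within the column
    )


print(cell_stats("brick", 4))
```

Row percentages answer "where are the brick houses located", while column percentages answer "what are the houses in zip 4 made of".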

Block chart

Calculate numerical descriptions for each group

Box plots can be used

Statistical Inference

Definition - The population is the set of values of one or more variables for the entire collection of units relevant to a particular study

Population parameters: the mean and variance of the population; they are unknown and must be estimated from samples

Estimates: descriptive measures computed from samples. They reflect the population parameters but differ from one sample to another; how good an estimate is can be measured by its sampling error

Statistics: in this book, a statistic is treated as the same thing as an estimate. In statistical theory, it refers to a function of a sample, which is either a random variable or a random vector
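The idea that estimates vary from sample to sample can be illustrated by simulation. In the sketch below the population is hypothetical (1,000 simulated values) and its mean is known only because we generated it; the sample size of 25 and the 200 repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of 1,000 values; in practice its mean is unknown.
population = [random.gauss(100, 15) for _ in range(1000)]
mu = statistics.mean(population)  # population parameter

# Draw many simple random samples and compute the estimate from each.
estimates = [
    statistics.mean(random.sample(population, 25)) for _ in range(200)
]

# Sampling error of each estimate: how far it lands from the parameter.
errors = [est - mu for est in estimates]

print(round(mu, 2))
print(round(min(estimates), 2), round(max(estimates), 2))
print(round(statistics.mean(errors), 3))
```

Individual estimates miss the parameter by varying amounts, but the errors average out close to zero across repeated samples.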


Data Collection

Goal: make statements about a population based on samples

Random sampling, or a more advanced form of probability sampling, is the appropriate way to collect data. In this book, we assume all samples come from simple random sampling

Definition - Simple random sampling is a sampling scheme in which each possible sample of the specified size has an equal chance of occurring

Random sampling can be difficult to implement in practice

Convenience samples are dangerous (Be careful)

Sample size and power calculation
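Simple random sampling can be sketched with Python's standard library. The population below is just the five Obs labels listed in the Variables Example table, and the sample size of 2 is an arbitrary choice. Enumerating every possible sample makes the "equal chance" in the definition concrete.

```python
import random
from itertools import combinations

# A small population: the Obs labels of the five houses listed in this chapter.
population = [1, 3, 5, 7, 9]
sample_size = 2

# Every possible sample of the specified size (order ignored).
all_samples = list(combinations(population, sample_size))
chance = 1 / len(all_samples)  # each sample is equally likely under SRS

random.seed(2)  # fixed seed so the run is repeatable
# random.sample draws without replacement, each subset equally likely.
drawn = tuple(sorted(random.sample(population, sample_size)))

print(len(all_samples), chance)
print(drawn)
```

With real populations the list of all possible samples is far too large to enumerate, which is why the draw is delegated to a random number generator.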


Chapter Summary

Variable: nominal, ordinal, interval, ratio, discrete, continuous

Table: frequency table, contingency table

Numerical measurement: mean, variance, standard deviation, largest, smallest, median, range, midrange, percentile, quartile

Graphic: histogram, bar chart, pie chart, block chart, scatter plot


Writing Report

Use appropriate tables and figures to summarize the data set you have

Do not copy tables directly from SAS output; edit them first (e.g., add descriptions, use appropriate row and/or column names, and report an appropriate number of significant digits)

Produce appropriate figures (see previous slides)

Discrete variables: report frequency and percentage

Continuous variables: report mean and standard deviation


Summarize the data: the purpose of the study, how the data were collected, sample size, the number of variables, missing data

Detailed description: summary statistics in tables rather than figures in general, univariate rather than multivariate; tables may be organized according to one important group variable

Multivariate analysis: tables or figures
