
1. Initial steps
   1. Read in the data
   2. What variables are in the file?
2. Measures of central tendency
   1. What to use when?
   2. Sum and mean
   3. Median
   4. Mode
   5. Trimmed mean to remove influence of outliers
3. Measures of variability
   1. Range
   2. Quartiles and IQR
   3. Variance and sd
   4. Mean absolute deviation, median absolute deviation
4. Measures of shape
   1. Skewness
   2. Kurtosis
5. Summary of a variable
6. Describing a data frame
   1. Descriptive statistics separately for each group
   2. Summarizing an entire data frame
7. Standard scores (z)

Initial Steps

Read in the data

Use the load() function for .Rdata files, or read.table() / read.csv() for plain-text and CSV files.

> setwd("~/Documents/statistics/probability_and_statistics_with_R/navarro_datasets")

> load("aflsmall.Rdata")

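What variables are in the file?

Two ways: use head() to look at the first few values of an object, or load the lsr package and use its who() function to list every variable in the workspace with its class and size.

> library(lsr)
> who()
   -- Name --      -- Class --   -- Size --
   afl.finalists   factor        400
   afl.margins     numeric       176
   x               integer       1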

What to use when

Measure   Data type
Mean      Ratio, interval
Median    Ordinal, interval, ratio
Mode      Nominal

Sum and mean

> sum(afl.margins)
[1] 6213
> sum(afl.margins[1:5])
[1] 183
> sum(afl.margins[1:5]) / 5      # mean of a subset of the data (first five values)
[1] 36.6
> mean(x = afl.margins)          # x is the argument passed to mean()
[1] 35.30114
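As a quick check on what mean() computes, here is a minimal sketch of the arithmetic mean written out by hand: the sum of the values divided by how many there are. The helper name my_mean is hypothetical, and the vector is the small dataset vector used in the trimmed-mean example.

```r
# Hypothetical helper: arithmetic mean = sum of the values / number of values
my_mean <- function(v) {
  sum(v) / length(v)
}

dataset <- c(-15, 2, 3, 4, 5, 6, 7, 8, 9, 12)
my_mean(dataset)   # 41 / 10 = 4.1
mean(dataset)      # built-in result, for comparison
```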

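Median

Usage: ordinal, interval, and ratio data.

To find the median by hand, first sort the data:

> sort(x = afl.margins)
 [1]  0  0  1  1  1  1  2  2  3  3  3  3  3  3  3  3  4  4  5
[20]  6  7  7  8  8  8  8  8  9  9  9  9  9  9 10 10 10 10 10
...
> median(x = afl.margins)
[1] 30.5

With 176 values (an even number), the median is the average of the 88th and 89th sorted values.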
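The sort-then-take-the-middle idea can be sketched as a small function. my_median is a hypothetical helper for illustration only; in practice median() is the function to use.

```r
# Hypothetical helper: median by sorting, then taking the middle value (odd n)
# or the average of the two middle values (even n)
my_median <- function(v) {
  s <- sort(v)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2
  }
}

dataset <- c(-15, 2, 3, 4, 5, 6, 7, 8, 9, 12)
my_median(dataset)     # even n = 10: (5th + 6th sorted values) / 2 = (5 + 6) / 2 = 5.5
my_median(c(3, 1, 2))  # odd n = 3: the middle sorted value, 2
```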

Mode

Which team has played in the most finals?

> print(afl.finalists)
  [1] Hawthorn    Melbourne   Carlton     Melbourne
  [5] Hawthorn    Carlton     Melbourne   Carlton
...
> table(afl.finalists)
afl.finalists
       Adelaide        Brisbane         Carlton     Collingwood
             26              25              26              28
       Essendon         Fitzroy       Fremantle         Geelong
             32               0               6              39
       Hawthorn       Melbourne North Melbourne   Port Adelaide
             27              28              28              17
       Richmond        St Kilda          Sydney      West Coast
              6              24              26              38
Western Bulldogs
             24
> modeOf(x = afl.finalists)     # modeOf() and maxFreq() come from the lsr package
[1] "Geelong"
> maxFreq(x = afl.finalists)
[1] 39

Geelong has played in the most finals (39).
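Finding the mode by hand amounts to tabulating the values and picking the most frequent one. A minimal base-R sketch, using a small hypothetical factor in place of afl.finalists so that it runs on its own:

```r
# Small hypothetical factor standing in for afl.finalists
teams <- factor(c("Geelong", "Carlton", "Geelong", "Sydney", "Carlton", "Geelong"))

counts <- table(teams)      # frequency of each level
names(which.max(counts))    # most frequent level: "Geelong"
max(counts)                 # its frequency: 3
```

One design note: which.max() returns only the first level when several levels tie for the highest count, so check the full table() output if ties are possible.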

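Trimmed mean to remove influence of outliers

> dataset <- c(-15, 2, 3, 4, 5, 6, 7, 8, 9, 12)
> mean(x = dataset)
[1] 4.1
> median(x = dataset)
[1] 5.5
> mean(x = dataset, trim = 0.1)    # trim 10% from each end: one value on either side
[1] 5.5

Here the 10% trimmed mean equals the median: once the outlier -15 is dropped, it no longer pulls the average down.

> mean(x = afl.margins, trim = 0.05)
[1] 33.75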
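The trimming rule can be written out explicitly: sort the values, drop floor(N * trim) of them from each end, and average the rest. This is a sketch with a hypothetical helper name; as far as I can tell it matches what mean(x, trim = ...) does for trim values below 0.5 (at 0.5 and above, mean() returns the median instead).

```r
# Hypothetical helper: trimmed mean for 0 <= trim < 0.5
my_trimmed_mean <- function(v, trim) {
  s <- sort(v)
  n <- length(s)
  k <- floor(n * trim)           # number of values dropped from each end
  mean(s[(k + 1):(n - k)])       # average of what is left
}

dataset <- c(-15, 2, 3, 4, 5, 6, 7, 8, 9, 12)
my_trimmed_mean(dataset, trim = 0.1)   # drops -15 and 12, averages 2..9: 5.5
```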

Measures of variability

> range(afl.margins)        # smallest and largest values; the range itself is 116 - 0 = 116
[1]   0 116
> quantile(x = afl.margins, probs = c(0.25, 0.75))   # 25th and 75th percentiles
  25%   75%
12.75 50.50
> IQR(x = afl.margins)      # interquartile range: spread of the middle half of the data (50.50 - 12.75)
[1] 37.75
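The range and IQR can also be computed straight from their definitions. A minimal sketch on the small dataset vector from the trimmed-mean example, assuming R's default quantile method (type 7), which is also what IQR() uses by default:

```r
dataset <- c(-15, 2, 3, 4, 5, 6, 7, 8, 9, 12)

r <- range(dataset)                        # c(min, max) = c(-15, 12)
r[2] - r[1]                                # the range as a single number: 27
q <- quantile(dataset, probs = c(0.25, 0.75))
unname(q[2] - q[1])                        # 7.75 - 3.25 = 4.5 (type 7 quantiles)
IQR(dataset)                               # same value from the built-in
```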

> var(afl.margins)     # sample variance (divides by N - 1)
[1] 679.8345
> sd(afl.margins)      # standard deviation: square root of the variance
[1] 26.07364
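To make the N - 1 denominator explicit, here is a sketch that computes the sample variance and standard deviation by hand and compares them with var() and sd(). The variable names are hypothetical.

```r
dataset <- c(-15, 2, 3, 4, 5, 6, 7, 8, 9, 12)

n <- length(dataset)
dev <- dataset - mean(dataset)          # deviations from the mean
sample_var <- sum(dev^2) / (n - 1)      # var() divides by N - 1, not N
sample_sd <- sqrt(sample_var)           # sd() is the square root of var()

c(sample_var, var(dataset))
c(sample_sd, sd(dataset))
```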

> mean(abs(afl.margins - mean(afl.margins)))   # mean absolute deviation
[1] 21.10124
> mad(afl.margins)     # median absolute deviation (scaled by 1.4826 by default)
[1] 28.9107
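Both absolute-deviation measures can be written out directly. A sketch on the small dataset vector; note that mad() multiplies the raw median absolute deviation by a default constant of 1.4826 (so that it estimates the standard deviation for normally distributed data), so its output is not the raw median of the absolute deviations.

```r
dataset <- c(-15, 2, 3, 4, 5, 6, 7, 8, 9, 12)

# Mean absolute deviation: average distance from the mean
mean_abs_dev <- mean(abs(dataset - mean(dataset)))

# Median absolute deviation: median distance from the median (unscaled)
raw_mad <- median(abs(dataset - median(dataset)))

mad(dataset)                 # 1.4826 * raw_mad
mad(dataset, constant = 1)   # raw_mad
```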

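Measures of shape

Skewness (a measure of asymmetry) and kurtosis (a measure of how pointy or flat-topped the distribution is), both from the psych package:

> library(psych)
> skew(x = afl.margins)
[1] 0.7671555
> kurtosi(x = afl.margins)     # note the spelling: kurtosi, not kurtosis!
[1] 0.02962633

The positive skew means the margins have a long right tail (a few very large wins). A kurtosis near 0 means the distribution is about as pointy as a normal distribution.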
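For reference, here is a sketch of one common pair of definitions: skewness as the sum of cubed deviations divided by N times sd cubed, and kurtosis as the sum of fourth-power deviations divided by N times sd to the fourth, minus 3 (so a normal distribution scores about 0). The helper names are hypothetical, and I am assuming these match psych's default (type = 3) for skew() and kurtosi(); compare against psych on your own data before relying on that.

```r
# Hypothetical helpers; sd() is the N - 1 sample standard deviation
skewness_of <- function(v) {
  d <- v - mean(v)
  sum(d^3) / (length(v) * sd(v)^3)
}
kurtosis_of <- function(v) {
  d <- v - mean(v)
  sum(d^4) / (length(v) * sd(v)^4) - 3   # excess kurtosis
}

skewness_of(c(1, 2, 3, 4, 5))    # symmetric data: 0
skewness_of(c(1, 1, 1, 2, 10))   # long right tail: positive
kurtosis_of(c(-1, 1, -1, 1))     # very flat data: negative
```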

Summary of a variable

> summary(object = afl.margins)     # argument is numeric
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.00   12.75   30.50   35.30   50.50  116.00
> summary(object = afl.finalists)   # argument is a factor: counts for each level
       Adelaide        Brisbane         Carlton     Collingwood
             26              25              26              28
       Essendon         Fitzroy       Fremantle         Geelong
             32               0               6              39
       Hawthorn       Melbourne North Melbourne   Port Adelaide
             27              28              28              17
       Richmond        St Kilda          Sydney      West Coast
              6              24              26              38
Western Bulldogs
             24
> f2 <- as.character(afl.finalists)
> summary(object = f2)              # argument is a character vector: only length, class and mode
   Length     Class      Mode
      400 character character

> describe(x = afl.margins)    # psych package: many of these statistics at once
  var   n mean    sd median trimmed   mad min max range skew kurtosis   se
1   1 176 35.3 26.07   30.5   32.82 28.91   0 116   116 0.77     0.03 1.97

For example, how many blowouts were there? A blowout is a game in which the winning margin exceeds 50 points.

> blowouts <- afl.margins > 50
> blowouts
  [1]  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE
 [14] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
...
> summary(object = blowouts)
   Mode   FALSE    TRUE    NA's
logical     132      44       0

There were 44 blowouts out of 176 games.
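Because R treats TRUE as 1 and FALSE as 0, sum() and mean() applied to a logical vector give the count and the proportion of TRUE values directly. For afl.margins, sum(afl.margins > 50) should therefore return the same 44 that summary() reports. A self-contained sketch with a short hypothetical vector of margins:

```r
# Hypothetical winning margins (not the AFL data)
margins <- c(10, 60, 25, 51, 3, 90, 50, 12)

blowout <- margins > 50   # logical vector: TRUE where the margin exceeds 50
sum(blowout)              # number of blowouts: 3 (a margin of exactly 50 does not count)
mean(blowout)             # proportion of games that were blowouts: 3 / 8 = 0.375
```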

Describing a data frame

> load("clinicaltrial.Rdata")
> who(TRUE)      # TRUE expands data frames to list their variables
   -- Name --    -- Class --   -- Size --
   clin.trial    data.frame    18 x 3
    $drug        factor        18
    $therapy     factor        18
    $mood.gain   numeric       18

Descriptive statistics separately for each group

Three functions: by(), describeBy() and aggregate().

describeBy() (from the psych package) has an argument group, which specifies the grouping variable. The following gives statistics broken down by therapy type:

> describeBy(x = clin.trial, group = clin.trial$therapy)
group: no.therapy
          var n mean   sd median trimmed  mad min max range  skew kurtosis   se
drug*       1 9 2.00 0.87    2.0    2.00 1.48 1.0 3.0   2.0  0.00    -1.81 0.29
therapy*    2 9 1.00 0.00    1.0    1.00 0.00 1.0 1.0   0.0   NaN      NaN 0.00
mood.gain   3 9 0.72 0.59    0.5    0.72 0.44 0.1 1.7   1.6  0.51    -1.59 0.20
--------------------------------------------------------------
group: CBT
          var n mean   sd median trimmed  mad min max range  skew kurtosis   se
drug*       1 9 2.00 0.87    2.0    2.00 1.48 1.0 3.0   2.0  0.00    -1.81 0.29
therapy*    2 9 2.00 0.00    2.0    2.00 0.00 2.0 2.0   0.0   NaN      NaN 0.00
mood.gain   3 9 1.04 0.45    1.1    1.04 0.44 0.3 1.8   1.5 -0.03    -1.12 0.15

Variables marked with * are factors that have been converted to numeric codes, so their statistics are not meaningful.

The by() function has an argument FUN, which specifies the function you want to apply separately to each group. With FUN = describe, the output matches describeBy():

> by(data = clin.trial, INDICES = clin.trial$therapy, FUN = describe)
clin.trial$therapy: no.therapy
          var n mean   sd median trimmed  mad min max range  skew kurtosis   se
drug*       1 9 2.00 0.87    2.0    2.00 1.48 1.0 3.0   2.0  0.00    -1.81 0.29
therapy*    2 9 1.00 0.00    1.0    1.00 0.00 1.0 1.0   0.0   NaN      NaN 0.00
mood.gain   3 9 0.72 0.59    0.5    0.72 0.44 0.1 1.7   1.6  0.51    -1.59 0.20
--------------------------------------------------------------
clin.trial$therapy: CBT
          var n mean   sd median trimmed  mad min max range  skew kurtosis   se
drug*       1 9 2.00 0.87    2.0    2.00 1.48 1.0 3.0   2.0  0.00    -1.81 0.29
therapy*    2 9 2.00 0.00    2.0    2.00 0.00 2.0 2.0   0.0   NaN      NaN 0.00
mood.gain   3 9 1.04 0.45    1.1    1.04 0.44 0.3 1.8   1.5 -0.03    -1.12 0.15

> by(data = clin.trial, INDICES = clin.trial$therapy, FUN = summary)
clin.trial$therapy: no.therapy
       drug          therapy     mood.gain
 placebo :3   no.therapy:9   Min.   :0.1000
 anxifree:3   CBT       :0   1st Qu.:0.3000
 joyzepam:3                  Median :0.5000
                             Mean   :0.7222
                             3rd Qu.:1.3000
                             Max.   :1.7000
--------------------------------------------------------------
clin.trial$therapy: CBT
       drug          therapy     mood.gain
 placebo :3   no.therapy:0   Min.   :0.300
 anxifree:3   CBT       :9   1st Qu.:0.800
 joyzepam:3                  Median :1.100
                             Mean   :1.044
                             3rd Qu.:1.300
                             Max.   :1.800

For example, to look at the average mood gain for every combination of drug and therapy:

> aggregate(formula = mood.gain ~ drug + therapy, data = clin.trial, FUN = mean)
      drug    therapy mood.gain
1  placebo no.therapy  0.300000
2 anxifree no.therapy  0.400000
3 joyzepam no.therapy  1.466667
4  placebo        CBT  0.600000
5 anxifree        CBT  1.033333
6 joyzepam        CBT  1.500000

The same call with FUN = sd gives the standard deviation for each combination:

> aggregate(formula = mood.gain ~ drug + therapy, data = clin.trial, FUN = sd)
      drug    therapy mood.gain
1  placebo no.therapy 0.2000000
2 anxifree no.therapy 0.2000000
3 joyzepam no.therapy 0.2081666
4  placebo        CBT 0.3000000
5 anxifree        CBT 0.2081666
6 joyzepam        CBT 0.2645751
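For a single grouping variable, base R's tapply() is another option alongside by(), describeBy() and aggregate(). The sketch below uses a tiny hypothetical data frame with the same column names as clin.trial (the values are made up, not the trial data) so that it runs without the .Rdata file.

```r
# Hypothetical toy data frame with the same layout as clin.trial
toy <- data.frame(
  drug      = factor(c("placebo", "placebo", "joyzepam", "joyzepam")),
  therapy   = factor(c("no.therapy", "CBT", "no.therapy", "CBT")),
  mood.gain = c(0.2, 0.6, 1.4, 1.6)
)

# Mean mood gain per therapy group (one grouping variable)
m <- tapply(toy$mood.gain, toy$therapy, mean)
m

# Mean mood gain per drug x therapy combination, in the style of aggregate() above
agg <- aggregate(mood.gain ~ drug + therapy, data = toy, FUN = mean)
agg
```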

Summarizing an entire data frame

> summary(clin.trial)
       drug          therapy     mood.gain
 placebo :6   no.therapy:9   Min.   :0.1000
 anxifree:6   CBT       :9   1st Qu.:0.4250
 joyzepam:6                  Median :0.8500
                             Mean   :0.8833
                             3rd Qu.:1.3000
                             Max.   :1.8000

> describe(x = clin.trial)    # load the psych package first
          var  n mean   sd median trimmed  mad min max range skew kurtosis   se
drug*       1 18 2.00 0.84   2.00    2.00 1.48 1.0 3.0   2.0 0.00    -1.66 0.20
therapy*    2 18 1.50 0.51   1.50    1.50 0.74 1.0 2.0   1.0 0.00    -2.11 0.12
mood.gain   3 18 0.88 0.53   0.85    0.88 0.67 0.1 1.8   1.7 0.13    -1.44 0.13

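Standard scores (z)

> x <- c(3, 10, 8, 4, 9, 11, 6)
> mean(x)
[1] 7.285714
> sd(x)
[1] 3.039424
> z <- (10 - mean(x)) / sd(x)    # z-score of the value 10
> z
[1] 0.8930265
> pnorm(0.8930625)    # digits of z transposed here; pnorm(z) also gives about 0.814
[1] 0.8140881

Interpretation:

z = 0.893: the score of 10 is 0.89 standard deviations above the mean.

pnorm value: if 10 were a score on a laziness scale, and laziness scores followed a normal distribution with this mean and sd, that individual would be lazier than about 81.4% of people.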
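The same formula standardizes a whole vector at once. A minimal sketch using the x vector above; scale() is base R's built-in for this and returns a one-column matrix, hence the as.numeric().

```r
x <- c(3, 10, 8, 4, 9, 11, 6)

z_all <- (x - mean(x)) / sd(x)   # every score converted to a z-score
z_all[2]                         # the score of 10: about 0.893
pnorm(z_all[2])                  # proportion of a normal distribution below it: about 0.814

as.numeric(scale(x))             # same standardization via scale()
```

Standardized scores always have mean 0 and standard deviation 1, which is a quick sanity check on the computation.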

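If a vector contains missing values (NA), mean() returns NA unless you set na.rm = TRUE:

> partial.data <- c(10, 20, NA, 30)
> mean(x = partial.data)
[1] NA
> mean(x = partial.data, na.rm = TRUE)    # drop the NA before averaging
[1] 20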
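The na.rm argument is not specific to mean(); most of the summary functions used earlier (median(), sd(), var(), range(), quantile(), IQR(), mad()) accept it too. A short sketch, with is.na() used to count the missing values first:

```r
partial.data <- c(10, 20, NA, 30)

sum(is.na(partial.data))             # how many values are missing: 1
median(partial.data, na.rm = TRUE)   # 20
sd(partial.data, na.rm = TRUE)       # sd of 10, 20, 30: 10
range(partial.data, na.rm = TRUE)    # 10 30
```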
