
Data Exploration Mini Project

Investigation into the average age at someone's first marriage, separated by country,
and how scaling the data differently can affect measures of center and spread. Visual
representations include histograms, box plots, and stem-and-leaf plots.

Emily Heubaum
Data Exploration Mini Project 1
Kiker Stats Pd 3

Average Age at the Time of Someone's First Marriage

The values I chose to work with represent the average age of a person when they enter
their first marriage, separated by country. I chose this data because I knew it would be
available, since it helps characterize a country's people. I wanted to look at global data, but
I didn't want to use a morbid variable like child mortality rate.
The population in question is all married people; unmarried people are not part of
this population. The unit used is whole years, which standardizes the number of decimal
places (in this case, none). My data was collected from the Wikipedia entry on average age
at first marriage by country ("Age").
The sample size was 30 countries. All calculations were done using the corresponding R
functions, and the code is presented at the end. The mean age was 29.13 years. The minimum
age was 23, which belonged to China. The median age was 29.5, while the maximum age was 34,
which belonged to both Denmark and Iceland. The first quartile was 27 years, and the third
quartile was 32 years, so the interquartile range (IQR) for this distribution was 5 years. The
overall range of ages was 11 years. The standard deviation of these ages was 3.15 years, so
the variance is 9.91 years squared (the variance is the standard deviation squared, so its
units are squared as well).
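As a quick check on the arithmetic above, the R sketch below rebuilds the IQR, range, and variance from the reported five-number summary and standard deviation. The variable names are illustrative, not taken from the original code.

```r
# Five-number summary reported above (min, Q1, median, Q3, max), in years
five <- c(min = 23, q1 = 27, median = 29.5, q3 = 32, max = 34)
sd_age <- 3.148435  # sample standard deviation reported by sd(ages)

# Interquartile range: Q3 - Q1
iqr_age <- five[["q3"]] - five[["q1"]]

# Overall range: max - min
range_age <- five[["max"]] - five[["min"]]

# Variance is the standard deviation squared, so its units are years squared
var_age <- sd_age^2

print(c(IQR = iqr_age, range = range_age, variance = round(var_age, 2)))
```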
There are no outliers on either end of the distribution when using the 1.5 × IQR method.
The lower cutoff for outliers was 27 - 1.5(5) = 19.5, but the minimum value was 23. Similarly,
the upper cutoff was 32 + 1.5(5) = 39.5, while the maximum value was 34.
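The cutoff arithmetic can be wrapped in a small helper. This is a sketch: `tukey_fences()` is a hypothetical name of my own, not a base R function, and it takes the quartiles reported above as inputs.

```r
# Tukey's 1.5 x IQR rule: values outside [Q1 - k*IQR, Q3 + k*IQR] are flagged
# (tukey_fences is a hypothetical helper, not part of base R)
tukey_fences <- function(q1, q3, k = 1.5) {
  iqr <- q3 - q1
  c(lower = q1 - k * iqr, upper = q3 + k * iqr)
}

fences <- tukey_fences(q1 = 27, q3 = 32)
print(fences)  # lower 19.5, upper 39.5

# Compare the observed extremes (min 23, max 34) with the fences
has_outliers <- 23 < fences[["lower"]] || 34 > fences[["upper"]]
print(has_outliers)  # FALSE
```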
A histogram, a box plot, and a stem plot are shown below.

The next calculations use the same data set, but with 100 added to each value.
Again, all calculations were done in R, and the code is shown at the end. The sample size is
still 30 countries. The new minimum is 123 years, while the new maximum is 134 years. The
new first quartile is 127 years, while the new third quartile is 132 years. The range is still
11 years, and the variance is still 9.91 years squared.
The new mean is 129.133, which is the same as the previous mean with 100 added to it.
The new median is 129.5, which again is the same as the original median with 100 added to it.
The standard deviation for this set, however, is exactly the same as the original standard
deviation.
The IQR is still 5. The lower cutoff for outliers, measured from the first quartile, is 119.5,
which is below our minimum value. The upper cutoff for outliers, measured from the third
quartile, is 139.5, which is above the maximum value, so there are no outliers. The overall
effect is that the data set was shifted up by 100, so the spacing between the cases stayed
exactly the same. Graphs for this new data set are below.
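The claim that a shift moves the center but not the spread holds for any data set, so it can be checked on any vector. This sketch uses a small made-up vector (hypothetical values, not the 30-country data) and base R's `all.equal()` so that tiny floating-point differences do not cause false failures.

```r
# Hypothetical illustrative ages (NOT the 30-country data set)
x <- c(23, 26, 27, 29, 30, 31, 32, 34)
shift <- 100
y <- x + shift

# Measures of center move by exactly the constant...
stopifnot(isTRUE(all.equal(mean(y), mean(x) + shift)))
stopifnot(isTRUE(all.equal(median(y), median(x) + shift)))

# ...while measures of spread do not change at all
stopifnot(isTRUE(all.equal(sd(y), sd(x))))
stopifnot(isTRUE(all.equal(IQR(y), IQR(x))))
stopifnot(isTRUE(all.equal(diff(range(y)), diff(range(x)))))

cat("Adding", shift, "moved the center but left the spread unchanged\n")
```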

These next calculations use the original data set, with each value increased
by 50% (multiplied by 1.5). All calculations were completed using R functions. The sample size
is still 30 countries. The new mean is 43.7, which is the original mean increased by 50%.
Similarly, the median is 44.25, which is the original median increased by 50%. Unlike in the
previous calculations, the standard deviation is not the same as the original. The standard
deviation for this data set is 4.72, which is the original standard deviation increased by 50%.
The new minimum is 34.5 years, and the new maximum is 51 years. The first
quartile is 40.5 years, while the third quartile is 48 years, making the interquartile range
7.5, a 50% increase from the original IQR. The new range is 16.5, a 50% increase from the
original range. The new variance is 22.3, a 125% increase from the original. This is logical
because the variance is the standard deviation squared, so the factor by which the standard
deviation increases (1.5) is squared for the variance (1.5 squared = 2.25).
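The scaling rules in this section can likewise be checked in R. This is a sketch under stated assumptions: the vector `x` is made up for illustration (not the real data), and the second half simply applies the 1.5 factor to the summary values reported in this project.

```r
# Hypothetical illustrative ages (NOT the 30-country data set)
x <- c(23, 26, 27, 29, 30, 31, 32, 34)
k <- 1.5          # "increase by 50%" means multiply by 1.5
z <- x * k

# Center and spread both scale by k
stopifnot(isTRUE(all.equal(mean(z), k * mean(x))))
stopifnot(isTRUE(all.equal(median(z), k * median(x))))
stopifnot(isTRUE(all.equal(sd(z), k * sd(x))))
stopifnot(isTRUE(all.equal(IQR(z), k * IQR(x))))

# Variance scales by k^2 = 2.25, i.e. a 125% increase
stopifnot(isTRUE(all.equal(var(z), k^2 * var(x))))

# The same rules applied to the values reported for the original data
reported <- c(mean = 29.13333, median = 29.5, sd = 3.148435)
print(round(reported * k, 2))    # about 43.7, 44.25, 4.72
print(round(9.912644 * k^2, 2))  # about 22.3
```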

The IQR is 7.5, so the lower cutoff for outliers, measured from the first quartile, is 29.25,
which is far below the minimum for this set. The upper cutoff for outliers is 59.25, well above
our maximum value for this set, so there are no outliers. The histogram, box plot, and stem plot
are shown below.
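One consequence worth noting: the outlier fences transform exactly like the data, so shifting or multiplying by a positive constant can never create or remove an outlier. A sketch using the quartiles reported in this project (the helper `tukey_fences()` is hypothetical, not base R):

```r
# Fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR (hypothetical helper, not base R)
tukey_fences <- function(q1, q3, k = 1.5) {
  iqr <- q3 - q1
  c(lower = q1 - k * iqr, upper = q3 + k * iqr)
}

original <- tukey_fences(27, 32)    # 19.5, 39.5
plus100  <- tukey_fences(127, 132)  # 119.5, 139.5
scaled   <- tukey_fences(40.5, 48)  # 29.25, 59.25

# Fences move with a shift and stretch with a scale, just like the data
stopifnot(isTRUE(all.equal(plus100, original + 100)))
stopifnot(isTRUE(all.equal(scaled, original * 1.5)))
```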

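For these last calculations, we will assume the original data set was normally distributed,
with all calculations done in R. Under this normal model, the percentage of data more than 5
units above the mean is about 5.6%, because 5 units is 5 / 3.15 ≈ 1.59 standard deviations and
P(Z > 1.59) ≈ 0.056. In the actual data, however, 5 units above the mean is 34.13, which is
above the maximum value, so none of the 30 countries falls in that region (0%). The percentage
of data between 3 units below the mean and 2 units above it, so between 26.13 and 31.13, is
56.8%. The top 10% of the data corresponds to values equal to or greater than 33.2 years old.
I calculated this by finding the z-score at the 90th percentile (the lowest value of the top 10%)
using a z-table, which gave z = 1.28. I used the z-score equation to find that value's deviation
from the mean (1.28 × 3.15 ≈ 4.03), and added the deviation to the mean to get the lowest top-10%
value.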
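The z-table steps can also be done directly with base R's `pnorm()` and `qnorm()`, assuming a normal model with the sample mean (29.13333) and SD (3.148435). Note that `pnorm()` defaults to the standard normal (mean 0, SD 1), so the mean and SD must be passed explicitly. This is a sketch; the exact z for the 90th percentile is about 1.2816 rather than the table's 1.28, so the results differ slightly from the table-based values.

```r
# Normal model parameters: sample mean and SD of the original ages
m <- 29.13333
s <- 3.148435

# P(X > m + 5); without mean/sd, pnorm() would assume N(0, 1)
p_above5 <- pnorm(m + 5, mean = m, sd = s, lower.tail = FALSE)

# P(m - 3 < X < m + 2) = P(X < m + 2) - P(X < m - 3)
p_between <- pnorm(m + 2, mean = m, sd = s) - pnorm(m - 3, mean = m, sd = s)

# Lowest value in the top 10%: the 90th percentile of the model
cutoff_top10 <- qnorm(0.90, mean = m, sd = s)

print(round(c(above5 = p_above5, between = p_between, top10 = cutoff_top10), 3))
```

The printed values come out to roughly 0.056, 0.567, and 33.168. The small gaps from 56.8% and 33.16 come from rounding z to two decimals when reading the table; both approaches still give 33.2 to one decimal place.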
In conclusion, adding a constant to data doesn't change the relationships between the
values; it just shifts them up, adding that constant to the measures of center, like the median
and mean. Measures of spread, like the range and standard deviation, don't change.
Multiplying, like increasing by 50%, changes the measures of center by the same multiplying
factor. The difference is that the measures of variation change too, also by the multiplying
factor (and the variance by the square of that factor). Adding a constant shifts the whole data
set up, while multiplying by a constant changes the relationships between the values, but by a
predictable factor.

Source:

"Age at First Marriage." Wikipedia. Wikimedia Foundation, n.d. Web. 22 Sept. 2015.
<https://en.wikipedia.org/wiki/Age_at_first_marriage>.

Original Data:

> # Marriage: data frame of the 30 countries, loaded beforehand (loading step not shown)
> ages <- c(Marriage$Age.First.Marriage)
> # SUMMARY
> fivenum(ages)
[1] 23.0 27.0 29.5 32.0 34.0
> # MEAN
> mean(ages)
[1] 29.13333
> # MEDIAN
> median(ages)
[1] 29.5
> # RANGE
> range(ages)
[1] 23 34
> 34 - 23
[1] 11
> # STANDARD DEVIATION
> sd(ages)
[1] 3.148435
> # VARIANCE
> (sd(ages))^2
[1] 9.912644
> # IQR
> 32.0 - 27.0
[1] 5
> # OUTLIERS
> # lower fence: Q1 - 1.5 IQR
> 27 - (1.5*5)
[1] 19.5
> # upper fence: Q3 + 1.5 IQR
> 32 + (1.5*5)
[1] 39.5
> # HISTOGRAM
> hist(ages, main = "Average Age at First Marriage by Country", xlab = "Average Age")
> # BOXPLOT
> boxplot(ages, main = "Average Age at First Marriage by Country", xlab = "Average Age", horizontal = TRUE)
> # STEMPLOT
> stem(ages, scale = 2)

+100 Data:

> # ADD 100
> plus <- ages + 100
> # SUMMARY
> fivenum(plus)
[1] 123.0 127.0 129.5 132.0 134.0
> # MEAN
> mean(plus)
[1] 129.1333
> # MEDIAN
> median(plus)
[1] 129.5
> # RANGE
> range(plus)
[1] 123 134
> # STANDARD DEVIATION
> sd(plus)
[1] 3.148435
> # VARIANCE
> (sd(plus))^2
[1] 9.912644
> # IQR
> 132 - 127
[1] 5
> # OUTLIERS
> # lower fence: Q1 - 1.5 IQR
> 127 - (1.5*5)
[1] 119.5
> # upper fence: Q3 + 1.5 IQR
> 132.0 + (1.5*5)
[1] 139.5
> # HISTOGRAM
> hist(plus, main = "Average Age at First Marriage by Country + 100", xlab = "Average Age + 100")
> # BOXPLOT
> boxplot(plus, main = "Average Age at First Marriage by Country + 100", xlab = "Average Age + 100", horizontal = TRUE)
> # STEMPLOT
> stem(plus, scale = 2)


50% Increase Data:

> # INCREASE
> times <- ages * 1.5
> # SUMMARY
> fivenum(times)
[1] 34.50 40.50 44.25 48.00 51.00
> # MEAN
> mean(times)
[1] 43.7
> # MEDIAN
> median(times)
[1] 44.25
> # RANGE
> range(times)
[1] 34.5 51.0
> 51 - 34.5
[1] 16.5
> # STANDARD DEVIATION
> sd(times)
[1] 4.722653
> # VARIANCE
> (sd(times))^2
[1] 22.30345
> # IQR
> 48 - 40.5
[1] 7.5
> # OUTLIERS
> # lower fence: Q1 - 1.5 IQR
> 40.5 - (1.5*7.5)
[1] 29.25
> # upper fence: Q3 + 1.5 IQR
> 48 + (1.5*7.5)
[1] 59.25
> # HISTOGRAM
> hist(times, main = "Increased by 50% Average Age at First Marriage", xlab = "Average Age increased by 50%")
> # BOXPLOT
> boxplot(times, main = "Increased by 50% Average Age at First Marriage", xlab = "Average Age increased by 50%", horizontal = TRUE)
> # STEMPLOT
> stem(times, scale = 1)

Normal Distribution:

> # FIVE UNITS ABOVE
> # pnorm() defaults to mean = 0, sd = 1, so the sample mean and sd must be passed in
> round(pnorm(mean(ages) + 5, mean = mean(ages), sd = sd(ages), lower.tail = FALSE), 3)
[1] 0.056
> # observed proportion in the data (no country is above 34.13)
> mean(ages > mean(ages) + 5)
[1] 0
> # BETWEEN 3 BELOW AND 2 ABOVE
> (3/sd(ages))
[1] 0.9528543
> # Lower z-score is -.95, corresponding proportion is (1 - .829) below that value
> (2/sd(ages))
[1] 0.6352362
> # Upper z-score is .64, corresponding proportion is .739 below that value
> # Proportion between 3 below and 2 above the mean
> .739 - (1 - .829)
[1] 0.568
> # UNITS FOR TOP 10%
> # 1.28 = z-score
> # (z-score * SD) = deviation
> # add mean = unit for top 10%
> (1.28*sd(ages)) + mean(ages)
[1] 33.16333
> which(Marriage$Age.First.Marriage > 33.2)
[1] 17 25
