Team 11 - Column 11

Lead: Akshaya (19021141011)

Members: Guntaka Karteek Reddy (19021141043)
Snigdha (19021141051)
Rohit Karna (190211054)
Riya Mathew (190211088)

Qn:

Text report of the document sent yesterday Bar chart, Frequency count not less than  for 9th
column and observation from the word cloud.

Data set 1: Dimining


Dimining is a data set designed to capture people's perception of the present
Covid 19 situation in India. The analysis aims to uncover hidden patterns in
the data set and to draw meaningful conclusions from them.
The questions asked were:
1) Covid 19 present situation
2) Covid 19 effect on school children
3) Covid 19 treatments that you have confronted (our analysis column)
4) Covid 19 social distance
The data set contains 4 questions, for which we received 99 responses. As per the
assignment, we are working on column 11, "Covid 19 treatments that you have confronted".
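The column selection and the number of usable answers can be checked along the following lines. This is only a minimal sketch: `toy` is a hypothetical stand-in for `Dmining` with made-up column names, and its six answers are copied from the output shown later in this report.

```r
# Hypothetical stand-in for Dmining: only two columns, so the
# treatment question sits in column 2 here instead of column 9.
toy <- data.frame(
  id = 1:6,
  treatment = c("", "hygienic lifestyle", "zero",
                "Not many", "None", "precautions "),
  stringsAsFactors = FALSE
)

# Pick the analysis column by position, as Dmining[, 9] does.
responses <- toy[, 2]

# Trim surrounding spaces and drop empty answers before counting.
responses <- trimws(responses)
answered <- responses[nzchar(responses)]

length(responses)  # number of rows, including blank answers
length(answered)   # number of non-empty answers
```

Counting the non-empty answers separately helps explain why the number of documents in the corpus can differ from the number of responses actually given.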
Syntax:

# Packages used below
library(tm)            # Corpus, VectorSource, TermDocumentMatrix
library(wordcloud)     # wordcloud
library(RColorBrewer)  # brewer.pal

# Take the 9th column of Dmining (the treatment question)
dmining = Dmining[, 9]
dmining

# Build a corpus with one document per response
TextDoc <- Corpus(VectorSource(dmining))
TextDoc

# Build a term-document matrix
TextDoc_dtm <- TermDocumentMatrix(TextDoc)
TextDoc_dtm
dtm_m <- as.matrix(TextDoc_dtm)
dtm_m

# Sort by decreasing value of frequency
dtm_v <- sort(rowSums(dtm_m), decreasing = TRUE)
dtm_v
dtm_d <- data.frame(word = names(dtm_v), freq = dtm_v)
dtm_d

# Display the top 5 and the top 15 most frequent words
head(dtm_d, 5)
head(dtm_d, 15)

# Plot the most frequent words
barplot(dtm_d[1:5,]$freq, las = 2, names.arg = dtm_d[1:5,]$word,
        col = "lightgreen", main = "Top 5 most frequent words",
        ylab = "Word frequencies")

# Generate word cloud (words with a frequency of at least 5)
set.seed(1234)
wordcloud(words = dtm_d$word, freq = dtm_d$freq, min.freq = 5,
          max.words = 100, random.order = FALSE, rot.per = 0.40,
          colors = brewer.pal(8, "Dark2"))
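To make clear what the term-document matrix and `rowSums()` compute, the same kind of count can be reproduced in base R. The sketch below is an illustration, not the `tm` implementation: splitting on whitespace after lower-casing is a simplifying assumption, the three-character minimum is chosen because the output of this report contains no shorter terms, and `docs` holds four responses taken from that output.

```r
docs <- c("Plasma, Drugs", "Plasma therapy",
          "Oxygen and plasma treatment", "no treatment yet")

# Lower-case each response and split it on whitespace.
tokens <- strsplit(tolower(docs), "[[:space:]]+")

# Keep terms with at least three characters.
tokens <- lapply(tokens, function(tok) tok[nchar(tok) >= 3])

# Rows are terms, columns are documents, cells are counts.
terms <- sort(unique(unlist(tokens)))
tdm <- vapply(tokens,
              function(tok) tabulate(match(tok, terms), nbins = length(terms)),
              integer(length(terms)))
dimnames(tdm) <- list(Terms = terms, Docs = as.character(seq_along(docs)))
tdm

# Total frequency per term, highest first.
freq <- sort(rowSums(tdm), decreasing = TRUE)
freq
```

Because punctuation is not removed, "plasma," and "plasma" end up as two different terms, which is the same effect that separates "none" from "none." in the frequency table of this report.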

Output:

> detach("package:ggplot2", unload=TRUE)


> dmining=Dmining[,9]
> dmining
[1] ""
[2] "hygienic lifestyle"
[3] "zero"
[4] "Not many"
[5] "i seriously doubt treatment pattern"
[6] "Plasma, Drugs"
[7] "None"
[8] "wear mask and practice social distancing"
[9] "precautions "
[10] "masks, kadha, sanitizing"
[11] "None."
[12] "Plasma therapy"
[13] "None."
[14] "The testing should be swift and results should be quick "
[15] "Making your immunity strong."
[16] "I would not say treatments, but there were preventive measures i
followed"
[17] "none."
[18] "Not yet confronted any."
[19] "Nothing"
[20] "Nothing such. Places are in deficit of testing kits. Immunity
boosting is the only way out."
[21] "remdesivir helps in the easing the symptoms of COVID-19"
[22] "Not planned throughly"
[23] "Coronil, Paracetamol,"
[24] "No such thing from my side."
[25] "Not yet contracted the disease"
[26] "Hydroxychloroquine"
[27] "basic precautions"
[28] "Not very effective"
[29] "Hygenic and safety oriented"
[30] "less Number of quarantine facilities due to which patients have to
shift cities for access to healthcare."
[31] "not enough"
[32] "supplements of Immunity boosters "
[33] "Home quarantine"
[34] "Better to take home remedies for precaution."
[35] "i personally haven't confronted with any covid 19 treatment"
[36] "the treatment is done based on symptoms appeared and no specific
treatment."
[37] "Nil"
[38] "no specific treatment for coronavirus"
[39] "k"
[40] "Steam "
[41] "HCQ is a medicine which is used in the treatment of the Covid-19
patients in most of the hospitals."
[42] "Don't go out without mask ."
[43] "good enough till now but treatments are mostly patient specific
than generic due to no availability of vaccine"
[44] "Remdesivir helps ease symptoms"
[45] "Oxygen and plasma treatment"
[46] "precuations are better "
[47] "Some are effective but does not guarantee cure."
[48] ""
[49] "chaotic"
[50] "Progressive"
[51] "increasing immune system power"
[52] "Better to take home remedies for precaution."
[53] "Drinking warm water, washing hands and wearing mask"
[54] "no as such"
[55] "none"
[56] "Home quarantined for 14 days"
[57] "none"
[58] "Drinking hot water a lot"
[59] "no treatment yet"
[60] "Medicine charges are reasonable but not the bed and other facility
related charges."
[61] "there is challenges due this pandemic"
[62] "I donot remember."
[63] ""
[64] "none"
[65] "Not confronted any treatments directly, but aware that patients are
kept under observations while monitoring and administering medicines to
aid the healing process. "
[66] " Regular Immunity developing treatments."
[67] "I have worn a mask and used hand sanitizers as precaution and
maintained social distance, nothing beyond that. "
[68] "Might work and will soon available to everyone."
[69] "Wearing a mask more often"
[70] "Precautionary methods, sanitation, health suppliments"
[71] "Overburdened"
[72] "None "
[73] "immunity boosters"
[74] "Nothing much just to follow some precautionary steps like wearing a
mask and maintaining physical distance from people when outdoors."
[75] "Not applicable"
[76] "I have not yet confronted yet."
[77] "No specific treatment as such. People with mild symptoms are cured
with help of antibiotics and drinking lots of hot water. At the end of day
prevention is better than cure."
[78] "none"
[79] "-"
[80] "Not at all"
[81] "intake of vitamins and take flu and fever medicines while being in
isolation"
[82] "Home remedies"
[83] "Not any"
[84] "N.A"
[85] "Drinking hot water, Proper sanitation, Drinking Giloy Juice, Social
distancing"
[86] "Effectively carried out at many states but its difficult for the
hot spots. "
[87] "None"
[88] ""
[89] "Staying at home"
[90] "none yet"
[91] "Not till now."
[92] "none"
[93] ""
[94] "medicines"
[95] "Eat healthy food, maintain the hygiene and stay fit"
[96] " Covaxin(vaccine) developed by Bharat Bio tech is at clinical trail
stage"
[97] "Kadhas helps boost immunity"
[98] "no treatment conformed"
[99] "no specific treatment. it is done based on symptoms appeared"
[100] "Maintain Personal Hygiene"
[101] "None"
[102] ""
> TextDoc <- Corpus(VectorSource(dmining))
> TextDoc
<<SimpleCorpus>>
Metadata: corpus specific: 1, document level (indexed): 0
Content: documents: 102
> # Build a term-document matrix
> TextDoc_dtm <- TermDocumentMatrix(TextDoc)
> TextDoc_dtm
<<TermDocumentMatrix (terms: 265, documents: 102)>>
Non-/sparse entries: 436/26594
Sparsity : 98%
Maximal term length: 18
Weighting : term frequency (tf)
> dtm_m <- as.matrix(TextDoc_dtm)
> dtm_m
Docs
Terms 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
22 23 24 25 26 27 28 29 30 31 32 33 34 35
hygienic 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
lifestyle 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
zero 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
many 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
not 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0
1 0 0 1 0 0 1 0 0 1 0 0 0 0
doubt 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
pattern 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
seriously 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
treatment 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1
Docs
Terms 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62 63 64 65 66 67
hygienic 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
lifestyle 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
zero 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
many 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
not 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 1 0 0
doubt 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
pattern 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
seriously 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
treatment 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0
Docs
Terms 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
86 87 88 89 90 91 92 93 94 95 96 97 98 99
hygienic 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
lifestyle 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
zero 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
many 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0
not 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0
doubt 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
pattern 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
seriously 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
treatment 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0
Docs
Terms 100 101 102
hygienic 0 0 0
lifestyle 0 0 0
zero 0 0 0
many 0 0 0
not 0 0 0
doubt 0 0 0
pattern 0 0 0
seriously 0 0 0
treatment 0 0 0
[ reached getOption("max.print") -- omitted 256 rows ]
> # Sort by decreasing value of frequency
> dtm_v <- sort(rowSums(dtm_m),decreasing=TRUE)
> dtm_v
and not the none treatment are
 16  15  14   10         9   7
mask immunity but for home yet
   6        6   6   6    6   5
symptoms specific drinking confronted nothing better
       5        5        5          4       4      4
hot social none. helps due have
  4      3     3     3   3    3
patients remedies take any with wearing
       3        3    3   3    3       3
medicines many distancing precautions plasma should
        3    2          2           2      2      2
testing there such. covid-19 remdesivir from
      2     2     2        2          2    2
such effective quarantine which enough boosters
   2         2          2     2      2        2
precaution. appeared based done treatment. medicine
          2        2     2    2          2        2
used out than till treatments cure.
   2   2    2    2          2     2
some water, while precautionary sanitation, people
   2      2     2             2           2      2
hygiene maintain hygienic lifestyle zero doubt
      2        2        1         1    1     1
pattern seriously drugs plasma, practice wear
      1         1     1       1        1    1
kadha, masks, sanitizing therapy quick results
     1      1          1       1     1       1
swift making strong. your followed measures
    1      1       1    1        1        1
preventive say treatments, were would any.
         1   1           1    1     1    1
boosting deficit kits. only out. places
       1       1     1    1    1      1
way easing planned throughly coronil, paracetamol,
  1      1       1         1        1            1
side. thing contracted disease hydroxychloroquine basic
    1     1          1       1                  1     1
very hygenic oriented safety access cities
   1       1        1      1      1      1
facilities healthcare. less number shift supplements
         1           1    1      1     1           1
covid haven't personally nil coronavirus steam
    1       1          1   1           1     1
hcq hospitals. most don't without availability
  1          1    1     1       1            1
generic good mostly now patient vaccine
      1    1      1   1       1       1
ease oxygen precuations does guarantee chaotic
   1      1           1    1         1       1
progressive immune increasing power system hands
          1      1          1     1      1     1
warm washing days quarantined lot water
   1       1    1           1   1     1
bed charges charges. facility other reasonable
  1       1        1        1     1          1
related challenges pandemic this donot remember.
      1          1        1    1     1         1
administering aid aware directly, healing kept
            1   1     1         1       1    1
monitoring observations process. that under developing
         1            1        1    1     1          1
regular treatments. beyond distance, hand maintained
      1           1      1         1    1          1
precaution sanitizers that. worn available everyone.
         1          1     1    1         1         1
might soon will work more often
    1    1    1    1    1     1
health methods, suppliments overburdened distance follow
     1        1           1            1        1      1
just like maintaining much outdoors. physical
   1    1           1    1         1        1
steps when applicable yet. antibiotics cured
    1    1          1    1           1     1
day end help lots mild prevention
  1   1    1    1    1          1
water. all being fever flu intake
     1   1     1     1   1      1
isolation vitamins n.a giloy juice, proper
        1        1   1     1      1      1
carried difficult effectively its spots. states
      1         1           1   1      1      1
staying now. eat fit food, healthy
      1    1   1   1     1       1
stay bharat bio clinical covaxin(vaccine) developed
   1      1   1        1                1         1
stage tech trail boost kadhas conformed
    1    1     1     1      1         1
personal
       1
> dtm_d <- data.frame(word = names(dtm_v),freq=dtm_v)
> dtm_d
word freq
and and 16
not not 15
the the 14
none none 10
treatment treatment 9
are are 7
mask mask 6
immunity immunity 6
but but 6
for for 6
home home 6
yet yet 5
symptoms symptoms 5
specific specific 5
drinking drinking 5
confronted confronted 4
nothing nothing 4
better better 4
hot hot 4
social social 3
none. none. 3
helps helps 3
due due 3
have have 3
patients patients 3
remedies remedies 3
take take 3
any any 3
with with 3
wearing wearing 3
medicines medicines 3
many many 2
distancing distancing 2
precautions precautions 2
plasma plasma 2
should should 2
testing testing 2
there there 2
such. such. 2
covid-19 covid-19 2
remdesivir remdesivir 2
from from 2
such such 2
effective effective 2
quarantine quarantine 2
which which 2
enough enough 2
boosters boosters 2
precaution. precaution. 2
appeared appeared 2
based based 2
done done 2
treatment. treatment. 2
medicine medicine 2
used used 2
out out 2
than than 2
till till 2
treatments treatments 2
cure. cure. 2
some some 2
water, water, 2
while while 2
precautionary precautionary 2
sanitation, sanitation, 2
people people 2
hygiene hygiene 2
maintain maintain 2
hygienic hygienic 1
lifestyle lifestyle 1
zero zero 1
doubt doubt 1
pattern pattern 1
seriously seriously 1
drugs drugs 1
plasma, plasma, 1
practice practice 1
wear wear 1
kadha, kadha, 1
masks, masks, 1
sanitizing sanitizing 1
therapy therapy 1
quick quick 1
results results 1
swift swift 1
making making 1
strong. strong. 1
your your 1
followed followed 1
measures measures 1
preventive preventive 1
say say 1
treatments, treatments, 1
were were 1
would would 1
any. any. 1
boosting boosting 1
deficit deficit 1
kits. kits. 1
only only 1
out. out. 1
places places 1
way way 1
easing easing 1
planned planned 1
throughly throughly 1
coronil, coronil, 1
paracetamol, paracetamol, 1
side. side. 1
thing thing 1
contracted contracted 1
disease disease 1
hydroxychloroquine hydroxychloroquine 1
basic basic 1
very very 1
hygenic hygenic 1
oriented oriented 1
safety safety 1
access access 1
cities cities 1
facilities facilities 1
healthcare. healthcare. 1
less less 1
number number 1
shift shift 1
supplements supplements 1
covid covid 1
haven't haven't 1
personally personally 1
nil nil 1
coronavirus coronavirus 1
steam steam 1
hcq hcq 1
hospitals. hospitals. 1
most most 1
don't don't 1
without without 1
availability availability 1
generic generic 1
good good 1
mostly mostly 1
now now 1
patient patient 1
vaccine vaccine 1
ease ease 1
oxygen oxygen 1
precuations precuations 1
does does 1
guarantee guarantee 1
chaotic chaotic 1
progressive progressive 1
immune immune 1
increasing increasing 1
power power 1
system system 1
hands hands 1
warm warm 1
washing washing 1
days days 1
quarantined quarantined 1
lot lot 1
water water 1
bed bed 1
charges charges 1
charges. charges. 1
facility facility 1
other other 1
reasonable reasonable 1
related related 1
challenges challenges 1
pandemic pandemic 1
this this 1
donot donot 1
remember. remember. 1
administering administering 1
aid aid 1
aware aware 1
directly, directly, 1
healing healing 1
kept kept 1
monitoring monitoring 1
observations observations 1
process. process. 1
that that 1
under under 1
developing developing 1
regular regular 1
treatments. treatments. 1
beyond beyond 1
distance, distance, 1
hand hand 1
maintained maintained 1
precaution precaution 1
sanitizers sanitizers 1
that. that. 1
worn worn 1
available available 1
everyone. everyone. 1
might might 1
soon soon 1
will will 1
work work 1
more more 1
often often 1
health health 1
methods, methods, 1
suppliments suppliments 1
overburdened overburdened 1
distance distance 1
follow follow 1
just just 1
like like 1
maintaining maintaining 1
much much 1
outdoors. outdoors. 1
physical physical 1
steps steps 1
when when 1
applicable applicable 1
yet. yet. 1
antibiotics antibiotics 1
cured cured 1
day day 1
end end 1
help help 1
lots lots 1
mild mild 1
prevention prevention 1
water. water. 1
all all 1
being being 1
fever fever 1
flu flu 1
intake intake 1
isolation isolation 1
vitamins vitamins 1
n.a n.a 1
giloy giloy 1
juice, juice, 1
proper proper 1
carried carried 1
difficult difficult 1
effectively effectively 1
its its 1
spots. spots. 1
states states 1
staying staying 1
now. now. 1
eat eat 1
fit fit 1
food, food, 1
healthy healthy 1
stay stay 1
bharat bharat 1
bio bio 1
clinical clinical 1
covaxin(vaccine) covaxin(vaccine) 1
developed developed 1
stage stage 1
tech tech 1
trail trail 1
boost boost 1
kadhas kadhas 1
conformed conformed 1
personal personal 1
> # Display the top 5 most frequent words
> head(dtm_d, 5)
word freq
and and 16
not not 15
the the 14
none none 10
treatment treatment 9
> head(dtm_d, 15)
word freq
and and 16
not not 15
the the 14
none none 10
treatment treatment 9
are are 7
mask mask 6
immunity immunity 6
but but 6
for for 6
home home 6
yet yet 5
symptoms symptoms 5
specific specific 5
drinking drinking 5
> # Plot the most frequent words
> barplot(dtm_d[1:5,]$freq, las = 2, names.arg = dtm_d[1:5,]$word,
+ col ="lightgreen", main ="Top 5 most frequent words",
+ ylab = "Word frequencies")
> #generate word cloud
> set.seed(1234)
> wordcloud(words = dtm_d$word, freq = dtm_d$freq, min.freq = 5,
+ max.words=100, random.order=FALSE, rot.per=0.40,
+ colors=brewer.pal(8, "Dark2"))
Bar Chart

Wordcloud
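Report

The bar chart is led by the stop words "and" (16), "not" (15) and "the" (14), followed by "none" (10) and "treatment" (9). Many respondents had therefore not confronted any Covid 19 treatment.

Among the remaining frequent words in the word cloud, "mask", "immunity" and "home" (6 each) and "drinking" (5) stand out. Respondents see boosting immunity as the main response to the symptoms, which they do by staying at home, wearing a mask and drinking hot water.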
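The frequency table above counts "none" and "none." separately and is led by stop words. A possible refinement, which was not part of our analysis, is to strip punctuation and stop words before counting and then keep only words at or above a chosen frequency. The sketch below uses base R only; the short `stop_words` vector and the threshold `min_freq` are illustrative assumptions, and `docs` repeats a few responses from the output.

```r
docs <- c("None.", "none", "None", "Plasma, Drugs", "Plasma therapy",
          "wear mask and practice social distancing",
          "Don't go out without mask .")

# Illustrative stop-word list (an assumption, not the list shipped with tm).
stop_words <- c("and", "the", "not", "out")

# Lower-case, delete punctuation, then split into words.
clean <- gsub("[[:punct:]]", "", tolower(docs))
words <- unlist(strsplit(trimws(clean), "[[:space:]]+"))

# Drop stop words and tokens shorter than three characters.
words <- words[nchar(words) >= 3 & !(words %in% stop_words)]

# Frequency table as a data frame, most frequent first.
freq <- sort(table(words), decreasing = TRUE)
freq_df <- data.frame(word = names(freq), freq = as.integer(freq))

# Keep words that occur at least min_freq times.
min_freq <- 2
top <- freq_df[freq_df$freq >= min_freq, ]
top
```

With punctuation removed, the three spellings of "none" are merged into one term, so the counts passed to `barplot()` or `wordcloud()` would reflect content words rather than connectors.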

Qn2

Submit the text mining descriptive report, and use R functions on Cardata to generate a
team-wise report with syntax.

Data set 2: Cardata


We perform descriptive analytics on the data set Cardata. The analysis aims to
understand the data better and to summarize it in order to identify patterns.
The data set contains 16 columns and 11914 rows.
Here we consider MSRP and compute its mean, median, standard deviation,
skewness and kurtosis. We also report the dimensions and structure of the
Cardata data set.
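The five statistics can also be written out by formula, which shows what each one measures. The sketch below is hedged: it uses the moment-based definitions of skewness and kurtosis, which are one common convention, and this report does not state which package supplied `skewness()` and `kurtosis()`, so other definitions may give slightly different values. The vector `msrp` holds only the first ten MSRP values from the output, not the full column.

```r
# First ten MSRP values from the Cardata output; the full column
# would be Cardata$MSRP.
msrp <- c(46135, 40650, 36350, 29450, 34500,
          31200, 44100, 39300, 36900, 37200)

n <- length(msrp)
dev <- msrp - mean(msrp)

# Central moments (dividing by n).
m2 <- sum(dev^2) / n
m3 <- sum(dev^3) / n
m4 <- sum(dev^4) / n

stats <- c(
  mean     = mean(msrp),
  median   = median(msrp),
  sd       = sd(msrp),      # sample standard deviation (divides by n - 1)
  skewness = m3 / m2^1.5,   # above 0 means a longer right tail
  kurtosis = m4 / m2^2      # about 3 for a normal distribution
)
round(stats, 3)
```

A mean well above the median, together with a large positive skewness, is the usual sign that a few very high prices dominate the upper tail.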

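library(dplyr)    # %>% and count()
library(moments)  # skewness() and kurtosis(); assumed, the package is not named in this report

# Inspect the data set
Cardata
names(Cardata)
nrow(Cardata)
dim(Cardata)
str(Cardata)

# Descriptive statistics for MSRP
mean(Cardata$MSRP)
median(Cardata$MSRP)
sd(Cardata$MSRP)
skewness(Cardata$MSRP)
kurtosis(Cardata$MSRP)

# Frequency distribution of Engine.HP
table(Cardata$Engine.HP)
Cardata %>% count(Engine.HP)
x = Cardata %>% count(Engine.HP)
x$prop = (x$n) / 11914          # proportion of the 11914 rows
x
x$per = ((x$n) / 11914) * 100   # percentage
x$cum = cumsum(x$n)             # cumulative count

# Bar chart; counts is assumed to be the Engine.HP frequency table
counts <- table(Cardata$Engine.HP)
barplot(counts, main = "Distribution of cars", xlab = "Engine.HP")

# Pie chart of Driven_Wheels with sample sizes
mytable <- table(Cardata$Driven_Wheels)
lbls <- paste(names(mytable), "\n", mytable, sep = "")
pie(mytable, labels = lbls, main = "Pie chart of Driven wheels\n(with sample sizes)")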
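The frequency table with proportion, percentage and cumulative count can also be built without the hard-coded row total. The following is a minimal sketch in base R, assuming `hp` as a stand-in for `Cardata$Engine.HP`; it contains the Engine.HP values of the first 17 rows from the output, and the extra column `cum_per` is our own addition for illustration.

```r
# Engine.HP of the first 17 rows in the Cardata output.
hp <- c(335, 300, 300, 230, 230, 230, 300, 300, 230, 230,
        300, 230, 300, 230, 230, 320, 320)

# table() counts each distinct value and leaves out NA by default.
counts <- table(hp)

x <- data.frame(
  Engine.HP = as.integer(names(counts)),
  n = as.integer(counts)
)
x$prop    <- x$n / sum(x$n)   # share of all counted observations
x$per     <- 100 * x$prop     # the same share in percent
x$cum     <- cumsum(x$n)      # running total of counts
x$cum_per <- cumsum(x$per)    # running total in percent
x
```

Dividing by `sum(x$n)` keeps the proportions correct if the number of rows changes, at the cost of excluding missing values from the total.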

Output:

Cardata
Make Model Year
1 BMW 1 Series M 2011
2 BMW 1 Series 2011
3 BMW 1 Series 2011
4 BMW 1 Series 2011
5 BMW 1 Series 2011
6 BMW 1 Series 2012
7 BMW 1 Series 2012
8 BMW 1 Series 2012
9 BMW 1 Series 2012
10 BMW 1 Series 2013
11 BMW 1 Series 2013
12 BMW 1 Series 2013
13 BMW 1 Series 2013
14 BMW 1 Series 2013
15 BMW 1 Series 2013
16 BMW 1 Series 2013
17 BMW 1 Series 2013
18 Audi 100 1992
19 Audi 100 1992
20 Audi 100 1992
21 Audi 100 1992
22 Audi 100 1992
23 Audi 100 1993
24 Audi 100 1993
25 Audi 100 1993
26 Audi 100 1993
27 Audi 100 1993
28 Audi 100 1994
29 Audi 100 1994
30 Audi 100 1994
31 Audi 100 1994
32 Audi 100 1994
33 FIAT 124 Spider 2017
34 FIAT 124 Spider 2017
35 FIAT 124 Spider 2017
36 Mercedes-Benz 190-Class 1991
37 Mercedes-Benz 190-Class 1991
38 Mercedes-Benz 190-Class 1992
39 Mercedes-Benz 190-Class 1992
40 Mercedes-Benz 190-Class 1993
41 Mercedes-Benz 190-Class 1993
42 BMW 2 Series 2016
43 BMW 2 Series 2016
44 BMW 2 Series 2016
45 BMW 2 Series 2016
46 BMW 2 Series 2016
47 BMW 2 Series 2016
48 BMW 2 Series 2016
49 BMW 2 Series 2016
50 BMW 2 Series 2016
51 BMW 2 Series 2017
52 BMW 2 Series 2017
53 BMW 2 Series 2017
54 BMW 2 Series 2017
55 BMW 2 Series 2017
56 BMW 2 Series 2017
57 BMW 2 Series 2017
58 BMW 2 Series 2017
59 Audi 200 1990
60 Audi 200 1990
61 Audi 200 1990
62 Audi 200 1991
Engine.Fuel.Type Engine.HP Engine.Cylinders
1 premium unleaded (required) 335 6
2 premium unleaded (required) 300 6
3 premium unleaded (required) 300 6
4 premium unleaded (required) 230 6
5 premium unleaded (required) 230 6
6 premium unleaded (required) 230 6
7 premium unleaded (required) 300 6
8 premium unleaded (required) 300 6
9 premium unleaded (required) 230 6
10 premium unleaded (required) 230 6
11 premium unleaded (required) 300 6
12 premium unleaded (required) 230 6
13 premium unleaded (required) 300 6
14 premium unleaded (required) 230 6
15 premium unleaded (required) 230 6
16 premium unleaded (required) 320 6
17 premium unleaded (required) 320 6
18 regular unleaded 172 6
19 regular unleaded 172 6
20 regular unleaded 172 6
21 regular unleaded 172 6
22 regular unleaded 172 6
23 regular unleaded 172 6
24 regular unleaded 172 6
25 regular unleaded 172 6
26 regular unleaded 172 6
27 regular unleaded 172 6
28 regular unleaded 172 6
29 regular unleaded 172 6
30 regular unleaded 172 6
31 regular unleaded 172 6
32 regular unleaded 172 6
33 premium unleaded (recommended) 160 4
34 premium unleaded (recommended) 160 4
35 premium unleaded (recommended) 160 4
36 regular unleaded 130 4
37 regular unleaded 158 6
38 regular unleaded 158 6
39 regular unleaded 130 4
40 regular unleaded 130 4
41 regular unleaded 158 6
42 premium unleaded (required) 240 4
43 premium unleaded (required) 240 4
44 premium unleaded (required) 320 6
45 premium unleaded (required) 240 4
46 premium unleaded (required) 240 4
47 premium unleaded (required) 320 6
48 premium unleaded (required) 240 4
49 premium unleaded (required) 320 6
50 premium unleaded (required) 320 6
51 premium unleaded (recommended) 335 6
52 premium unleaded (recommended) 335 6
53 premium unleaded (recommended) 335 6
54 premium unleaded (recommended) 335 6
55 premium unleaded (recommended) 248 4
56 premium unleaded (recommended) 248 4
57 premium unleaded (recommended) 248 4
58 premium unleaded (recommended) 248 4
59 regular unleaded 162 5
60 regular unleaded 162 5
61 regular unleaded 162 5
62 regular unleaded 217 5
Transmission.Type Driven_Wheels Number.of.Doors
1 MANUAL rear wheel drive 2
2 MANUAL rear wheel drive 2
3 MANUAL rear wheel drive 2
4 MANUAL rear wheel drive 2
5 MANUAL rear wheel drive 2
6 MANUAL rear wheel drive 2
7 MANUAL rear wheel drive 2
8 MANUAL rear wheel drive 2
9 MANUAL rear wheel drive 2
10 MANUAL rear wheel drive 2
11 MANUAL rear wheel drive 2
12 MANUAL rear wheel drive 2
13 MANUAL rear wheel drive 2
14 MANUAL rear wheel drive 2
15 MANUAL rear wheel drive 2
16 MANUAL rear wheel drive 2
17 MANUAL rear wheel drive 2
18 MANUAL front wheel drive 4
19 MANUAL front wheel drive 4
20 AUTOMATIC all wheel drive 4
21 MANUAL front wheel drive 4
22 MANUAL all wheel drive 4
23 MANUAL front wheel drive 4
24 AUTOMATIC all wheel drive 4
25 MANUAL front wheel drive 4
26 MANUAL front wheel drive 4
27 MANUAL all wheel drive 4
28 AUTOMATIC front wheel drive 4
29 MANUAL all wheel drive 4
30 MANUAL front wheel drive 4
31 AUTOMATIC front wheel drive 4
32 AUTOMATIC all wheel drive 4
33 MANUAL rear wheel drive 2
34 MANUAL rear wheel drive 2
35 MANUAL rear wheel drive 2
36 MANUAL rear wheel drive 4
37 MANUAL rear wheel drive 4
38 MANUAL rear wheel drive 4
39 MANUAL rear wheel drive 4
40 MANUAL rear wheel drive 4
41 MANUAL rear wheel drive 4
42 AUTOMATIC rear wheel drive 2
43 AUTOMATIC rear wheel drive 2
44 AUTOMATIC rear wheel drive 2
45 AUTOMATIC all wheel drive 2
46 AUTOMATIC all wheel drive 2
47 AUTOMATIC rear wheel drive 2
48 MANUAL rear wheel drive 2
49 AUTOMATIC all wheel drive 2
50 AUTOMATIC rear wheel drive 2
51 AUTOMATIC all wheel drive 2
52 AUTOMATIC rear wheel drive 2
53 AUTOMATIC all wheel drive 2
54 AUTOMATIC rear wheel drive 2
55 AUTOMATIC rear wheel drive 2
56 AUTOMATIC rear wheel drive 2
57 AUTOMATIC all wheel drive 2
58 AUTOMATIC all wheel drive 2
59 AUTOMATIC front wheel drive 4
60 MANUAL all wheel drive 4
61 MANUAL all wheel drive 4
62 MANUAL all wheel drive 4
Market.Category Vehicle.Size Vehicle.Style
1 Factory Tuner,Luxury,High-Performance Compact Coupe
2 Luxury,Performance Compact Convertible
3 Luxury,High-Performance Compact Coupe
4 Luxury,Performance Compact Coupe
5 Luxury Compact Convertible
6 Luxury,Performance Compact Coupe
7 Luxury,Performance Compact Convertible
8 Luxury,High-Performance Compact Coupe
9 Luxury Compact Convertible
10 Luxury Compact Convertible
11 Luxury,High-Performance Compact Coupe
12 Luxury,Performance Compact Coupe
13 Luxury,Performance Compact Convertible
14 Luxury Compact Convertible
15 Luxury,Performance Compact Coupe
16 Luxury,High-Performance Compact Convertible
17 Luxury,High-Performance Compact Coupe
18 Luxury Midsize Sedan
19 Luxury Midsize Sedan
20 Luxury Midsize Wagon
21 Luxury Midsize Sedan
22 Luxury Midsize Sedan
23 Luxury Midsize Sedan
24 Luxury Midsize Wagon
25 Luxury Midsize Sedan
26 Luxury Midsize Sedan
27 Luxury Midsize Sedan
28 Luxury Midsize Wagon
29 Luxury Midsize Sedan
30 Luxury Midsize Sedan
31 Luxury Midsize Sedan
32 Luxury Midsize Wagon
33 Performance Compact Convertible
34 Performance Compact Convertible
35 Performance Compact Convertible
36 Luxury Compact Sedan
37 Luxury Compact Sedan
38 Luxury Compact Sedan
39 Luxury Compact Sedan
40 Luxury Compact Sedan
41 Luxury Compact Sedan
42 Luxury,Performance Compact Coupe
43 Luxury Compact Convertible
44 Factory Tuner,Luxury,High-Performance Compact Convertible
45 Luxury,Performance Compact Coupe
46 Luxury Compact Convertible
47 Factory Tuner,Luxury,High-Performance Compact Coupe
48 Luxury,Performance Compact Coupe
49 Factory Tuner,Luxury,High-Performance Compact Coupe
50 Factory Tuner,Luxury,High-Performance Compact Convertible
51 Factory Tuner,Luxury,High-Performance Compact Coupe
52 Factory Tuner,Luxury,High-Performance Compact Convertible
53 Factory Tuner,Luxury,High-Performance Compact Convertible
54 Factory Tuner,Luxury,High-Performance Compact Coupe
55 Luxury,Performance Compact Convertible
56 Luxury,Performance Compact Coupe
57 Luxury,Performance Compact Coupe
58 Luxury Compact Convertible
59 Luxury Midsize Sedan
60 Luxury Midsize Wagon
61 Luxury Midsize Sedan
62 Luxury,Performance Midsize Sedan
highway.MPG city.mpg Popularity MSRP
1 26 19 3916 46135
2 28 19 3916 40650
3 28 20 3916 36350
4 28 18 3916 29450
5 28 18 3916 34500
6 28 18 3916 31200
7 26 17 3916 44100
8 28 20 3916 39300
9 28 18 3916 36900
10 27 18 3916 37200
11 28 20 3916 39600
12 28 19 3916 31500
13 28 19 3916 44400
14 28 19 3916 37200
15 28 19 3916 31500
16 25 18 3916 48250
17 28 20 3916 43550
18 24 17 3105 2000
19 24 17 3105 2000
20 20 16 3105 2000
21 24 17 3105 2000
22 21 16 3105 2000
23 24 17 3105 2000
24 20 16 3105 2000
25 24 17 3105 2000
26 24 17 3105 2000
27 21 16 3105 2000
28 21 16 3105 2000
29 22 16 3105 2000
30 22 17 3105 2000
31 22 16 3105 2000
32 21 16 3105 2000
33 35 26 819 27495
34 35 26 819 24995
35 35 26 819 28195
36 26 18 617 2000
37 25 17 617 2000
38 25 17 617 2000
39 26 18 617 2000
40 26 18 617 2000
41 25 17 617 2000
42 35 23 3916 32850
43 34 23 3916 38650
44 31 20 3916 48750
45 35 23 3916 34850
46 34 22 3916 40650
47 31 20 3916 44150
48 34 22 3916 32850
49 30 20 3916 46150
50 30 20 3916 50750
51 31 21 3916 46450
52 32 21 3916 49050
53 32 21 3916 51050
54 32 21 3916 44450
55 34 23 3916 38950
56 35 24 3916 33150
57 33 24 3916 35150
58 33 23 3916 40950
59 20 16 3105 2000
60 22 15 3105 2000
61 23 15 3105 2000
62 22 16 3105 2000
[ reached getOption("max.print") -- omitted 11852 rows ]
> names(Cardata)
[1] "Make" "Model" "Year" "Engine.Fuel.Type"
[5] "Engine.HP" "Engine.Cylinders" "Transmission.Type" "Driven_Wheels"
[9] "Number.of.Doors" "Market.Category" "Vehicle.Size" "Vehicle.Style"
[13] "highway.MPG" "city.mpg" "Popularity" "MSRP"
> nrow(Cardata)
[1] 11914
> skewness(Cardata$MSRP)
[1] 11.76902
> kurtosis(Cardata$MSRP)
[1] 268.7673

> dim(Cardata)
[1] 11914 16
> str(Cardata)
'data.frame': 11914 obs. of 16 variables:
$ Make : Factor w/ 48 levels "Acura","Alfa Romeo",..: 6 6 6 6 6 6 6 6 6 6
$ Model : Factor w/ 915 levels "09-Jyaistha",..: 4 3 3 3 3 3 3 3 3 3 ...
$ Year : int 2011 2011 2011 2011 2011 2012 2012 2012 2012 2013 ...
$ Engine.Fuel.Type : Factor w/ 10 levels "diesel","electric",..: 9 9 9 9 9 9 9 9 9 9 .
$ Engine.HP : int 335 300 300 230 230 230 300 300 230 230 ...
$ Engine.Cylinders : int 6 6 6 6 6 6 6 6 6 6 ...
$ Transmission.Type: Factor w/ 5 levels "AUTOMATED_MANUAL",..: 4 4 4 4 4 4 4 4 4 4 ...
$ Driven_Wheels : Factor w/ 4 levels "all wheel drive",..: 4 4 4 4 4 4 4 4 4 4 ...
$ Number.of.Doors : int 2 2 2 2 2 2 2 2 2 2 ...
$ Market.Category : Factor w/ 73 levels "Crossover","Crossover,Diesel",..: 39 68 65 6
64 64 ...
$ Vehicle.Size : Factor w/ 3 levels "Compact","Large",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Vehicle.Style : Factor w/ 16 levels "2dr Hatchback",..: 9 7 9 9 7 9 7 9 7 7 ...
$ highway.MPG : int 26 28 28 28 28 28 26 28 28 27 ...
$ city.mpg : int 19 19 20 18 18 18 17 20 18 18 ...
$ Popularity : int 3916 3916 3916 3916 3916 3916 3916 3916 3916 3916 ...
$ MSRP : int 46135 40650 36350 29450 34500 31200 44100 39300 36900 37200

> table(Cardata$Engine.HP)

55 62 63 66 73 74 78 79 81 82 84 88 90 92 93 94 95 96 97
 2  2 13  7  9 18  8 12 19  5  6  3 21 25 28 10 22  9  1
98 99 100 101 102 103 105 106 107 108 109 110 111 113 114 115 116 118 119
32 13  45  22   1   4  10  31   8  31  32  32   7  14  25  33  30   7  14
120 121 122 123 124 125 126 127 128 130 131 132 133 134 135 136 137 138 140
 67  18  40   1   4  37  15  58  11  85  12  65   4  44  15  15  16 199 143
141 142 143 144 145 146 147 148 150 151 152 153 154 155 156 157 158 159 160
 31   4  43   2  32  12  17  94 232   1  40  23   1 156   2  16  82  21 146
161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179
 23  44   2  37  73  57  11  55  46 351  22  44  59  33  87  30  33  32  19
180 181 182 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199
135  24  87  85 241   9   6  54  14 132   3  21  12  23  56  15  28   2  14
200 201 202 203 204 205 206 207 208 210 211 212 214 215 217 218 219 220 221
456  90  18  50  11  72   6  43  32 320   9  12   3  39  19  12   6 171   5
222 223 224 225 227 228 230 231 232 234 235 236 237 238 239 240 241 242 244
 21   2   7  72  49   8  88  25  16   5  27  21  14   4  16 268  43  20  15
245 248 250 251 252 253 254 255 256 257 259 260 261 263 264 265 266 268 270
 54  29 120   6  73  13   4  56  14   3  17 133 100  12  11  47  34  84  73
271 272 273 274 275 276 278 279 280 281 282 283 284 285 287 288 290 291 292
 10  54  16  11 123  30  84  20 120  39  52  70   8 246  20  43 132  26  67
293 295 296 297 298 300 301 302 303 304 305 306 308 310 311 315 316 317 318
 16  80  17  20   2 192   6  94  39  36  89  62  26 123  15  35   9  40  30
320 321 322 323 325 328 329 330 332 333 335 337 338 340 342 343 345 348 349
 69  18  11   6  93  35  23  57  93  37  54   5   3  44   4   3  21  23   2
350 354 355 359 360 361 362 365 370 372 375 377 380 381 382 383 385 386 389
 38   6 158   2  30   1  10  63  11   1  19   2  18 152   7   3  47   7   3
390 394 395 400 401 402 403 404 410 415 416 420 424 425 426 429 430 435 438
 44   3   7  52   3  15   9  20   6   6   7 133   1  13   4  14  45  11   3
440 442 443 444 445 449 450 451 453 454 455 456 460 464 467 469 470 475 480
  4   6   8   5  36  12  27   1  13  16  30   1  26   4   4   7   4   4   2
483 485 490 493 500 503 505 510 515 518 520 521 523 525 526 530 532 535 536
 15  19   2   1  17   6   4  66   6   3  15   6   9   8   6   3   5   1   4
540 543 545 550 552 553 556 557 560 562 563 565 567 568 570 572 573 577 580
 15   6   5  27  10   1   7   1  24   7  15   6  12   6  16   1   1  20   5
582 583 592 597 600 604 605 610 611 616 617 620 621 622 624 626 631 632 640
  2   6   1   2  11   2   4   4   4   5   2   6  18   1   3   5   7   4   6
641 645 650 651 660 661 662 670 700 707 720 731 750 1001
  3  14  21   3   1   1   4   1   6   6   4   3   2    3
> Cardata%>%count(Engine.HP)
# A tibble: 357 x 2
Engine.HP n
<int> <int>
1 55 2
2 62 2
3 63 13
4 66 7
5 73 9
6 74 18
7 78 8
8 79 12
9 81 19
10 82 5
# ... with 347 more rows

> x=Cardata%>%count(Engine.HP)
> x$prop=(x$n)/11914
> x
# A tibble: 357 x 3
Engine.HP n prop
<int> <int> <dbl>
1 55 2 0.000168
2 62 2 0.000168
3 63 13 0.00109
4 66 7 0.000588
5 73 9 0.000755
6 74 18 0.00151
7 78 8 0.000671
8 79 12 0.00101
9 81 19 0.00159
10 82 5 0.000420
# ... with 347 more rows

> x$per=((x$n)/11914)*100
> x$cum=cumsum(x$n)
> x
# A tibble: 357 x 5
Engine.HP n prop per cum
<int> <int> <dbl> <dbl> <int>
1 55 2 0.000168 0.0168 2
2 62 2 0.000168 0.0168 4
3 63 13 0.00109 0.109 17
4 66 7 0.000588 0.0588 24
5 73 9 0.000755 0.0755 33
6 74 18 0.00151 0.151 51
7 78 8 0.000671 0.0671 59
8 79 12 0.00101 0.101 71
9 81 19 0.00159 0.159 90
10 82 5 0.000420 0.0
Conclusion:

We performed descriptive analytics on the data set Cardata, which contains
16 columns and 11914 rows, and examined its dimensions and structure.
For MSRP we computed the mean, median, standard deviation, skewness and
kurtosis. The skewness of 11.77 and the kurtosis of 268.77 indicate that MSRP
is strongly right-skewed with a heavy tail, so a small number of very
expensive cars pull the distribution to the right.
