Assignment Submitted By-Srishti Bhateja 19021141116: STR (Crew - Data)
Import “crew data.csv” from MS Teams > Files and answer the following questions
1. List the categorical and numeric variables of the data set
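The str() function shows the structure of the data set, including the type of each variable.
Categorical (Factor): Hire.date, Lastname, Firstname, Location, EmpId, Job.code
Numeric: Phne and Salary (int), bonus (num)
Phne is stored as an integer, although it appears to be an identifier rather than a measurement.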
SYNTAX
> str(Crew.data)
'data.frame': 69 obs. of 9 variables:
$ Hire.date: Factor w/ 69 levels "1-Jul-87","1-Mar-90",..: 35 50 3 16 27 36 62 60 24 17 ...
$ Lastname : Factor w/ 69 levels "BEAUMONT","BERGAMASCO",..: 21 35 69 19 41 18 42 64 67 9 ...
$ Firstname: Factor w/ 69 levels "ANITA M.","ANNETTE M.",..: 30 29 24 58 54 26 68 39 59 37 ...
$ Location : Factor w/ 3 levels "CARY","FRANKFURT",..: 1 2 3 1 3 2 3 2 2 3 ...
$ Phne : int 1168 2164 1565 1157 2360 1595 2366 1197 1553 1369 ...
$ EmpId : Factor w/ 69 levels "E00034","E00084",..: 53 36 49 46 31 4 25 29 41 18 ...
$ Job.code : Factor w/ 6 levels "FLTAT1","FLTAT2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Salary : int 21000 22000 22000 23000 24000 25000 25000 26000 27000 28000 ...
$ bonus : num 2100 2200 2200 2300 2400 2500 2500 2600 2700 2800 ...
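The classification can also be checked programmatically instead of being read off the str() listing. The sketch below is only an illustration: `crew_toy` is a hypothetical stand-in built from the first three values that str() prints for four of the columns, not the full Crew.data.

```r
# Toy stand-in for Crew.data, using the first three values printed by str().
# crew_toy is a hypothetical name used only for this illustration.
crew_toy <- data.frame(
  Job.code = factor(c("FLTAT1", "FLTAT1", "FLTAT1")),
  Phne     = c(1168L, 2164L, 1565L),
  Salary   = c(21000L, 22000L, 22000L),
  bonus    = c(2100, 2200, 2200)
)

# is.numeric() is TRUE for both int and num columns, FALSE for factors
is_num <- sapply(crew_toy, is.numeric)

numeric_vars     <- names(crew_toy)[is_num]
categorical_vars <- names(crew_toy)[!is_num]

print(numeric_vars)
print(categorical_vars)
```

Running the same two lines on the real Crew.data should list Phne, Salary and bonus as numeric, in line with the str() output.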
2. Describe the numeric variable using descriptive functions
summary() reports the minimum, quartiles, median and mean, while sd() and var() give the
standard deviation and the variance. Together they describe the centre and spread of bonus.
SYNTAX
> summary(Crew.data$bonus)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2100 3300 4200 5214 7300 11200
> sd(Crew.data$bonus)
[1] 2552.178
> var(Crew.data$bonus)
[1] 6513610
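The two spread measures are linked: the variance is the square of the standard deviation. Squaring the printed sd, 2552.178, gives about 6513612, which agrees with the printed variance of 6513610 to within the rounding of the printed figures. A minimal sketch of the same check, assuming only the first ten bonus values shown by str() (so the numbers differ from the full 69-row column):

```r
# First ten bonus values as printed by str(); not the full column
bonus_head <- c(2100, 2200, 2200, 2300, 2400, 2500, 2500, 2600, 2700, 2800)

s <- sd(bonus_head)    # sample standard deviation
v <- var(bonus_head)   # sample variance

# The variance equals the squared standard deviation
print(all.equal(v, s^2))
```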
3. How many groups are contained in the variable “Job code”?
Here the pipe operator (%>%) passes the data to count(), which lists each job code with its frequency.
There are 6 groups in the variable “Job.code”.
SYNTAX
> Crew.data%>%count(Job.code)
Job.code n
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8
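If only the number of groups is needed, it can be read directly from the factor. The sketch below rebuilds a Job.code column from the counts printed above (the row order will differ from the original file) and uses base R, so it does not depend on dplyr.

```r
# Rebuild a Job.code column from the counts printed by count()
job_levels <- c("FLTAT1", "FLTAT2", "FLTAT3", "PILOT1", "PILOT2", "PILOT3")
job_counts <- c(14, 18, 12, 8, 9, 8)
job_code   <- factor(rep(job_levels, times = job_counts))

n_groups <- nlevels(job_code)   # number of distinct job codes
n_rows   <- length(job_code)    # total number of observations

print(n_groups)
print(n_rows)
```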
1. Enumerate all functions explained in the video for “Job code”
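The functions explained are table(), class(), as.data.frame() and names() in base R, and count(), group_by() and summarise() in dplyr.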
> table(Crew.data$Job.code)
FLTAT1 FLTAT2 FLTAT3 PILOT1 PILOT2 PILOT3
14 18 12 8 9 8
> Emptb=table(Crew.data$Job.code)
> Emptb
FLTAT1 FLTAT2 FLTAT3 PILOT1 PILOT2 PILOT3
14 18 12 8 9 8
> class(Emptb)
[1] "table"
> Empf=as.data.frame(Emptb)
> Empf
Var1 Freq
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8
> names(Empf)=c("Jobcode","count")
> Empf
Jobcode count
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8
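The table, convert and rename steps can be collapsed into one call. This is a sketch, not the method from the video: naming the argument inside table() sets the first column name, and the `responseName` argument of as.data.frame() sets the count column. The Job.code vector is rebuilt from the counts printed above, and `Empf2` is a hypothetical name.

```r
# Rebuild the Job.code values from the frequencies shown above
job_code <- rep(
  c("FLTAT1", "FLTAT2", "FLTAT3", "PILOT1", "PILOT2", "PILOT3"),
  times = c(14, 18, 12, 8, 9, 8)
)

# Frequency table straight to a data frame with the desired column names
Empf2 <- as.data.frame(table(Jobcode = job_code), responseName = "count")

print(Empf2)
```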
If we use dplyr
> Crew.data%>%count(Job.code)
Job.code n
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8
> Crew.data%>%group_by(Job.code)%>%summarise(count=n())
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 6 x 2
Job.code count
<fct> <int>
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8
> Crew.data%>%group_by(Job.code)%>%summarise(mean(Salary))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 6 x 2
Job.code `mean(Salary)`
<fct> <dbl>
1 FLTAT1 25643.
2 FLTAT2 35111.
3 FLTAT3 44250
4 PILOT1 69500
5 PILOT2 80111.
6 PILOT3 99875
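Group means like the one above can also be computed in base R with tapply() or aggregate(). Because the row-level Crew.data is not reproduced here, the sketch uses the built-in mtcars data as a stand-in, with cyl playing the role of Job.code and mpg the role of Salary.

```r
data(mtcars)  # built-in data set, used here only as a stand-in

# Named vector of group means: one entry per level of cyl
mean_mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(mean_mpg_by_cyl)

# Same result as a data frame, with a readable column name
agg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
names(agg)[2] <- "mean_mpg"
print(agg)
```

Giving the summary column an explicit name avoids headers such as `mean(Salary)` in the printed result.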
2. Enumerate all functions explained in the video for “Salary”
> summary(Crew.data$Salary)
Min. 1st Qu. Median Mean 3rd Qu. Max.
21000 33000 42000 52145 73000 112000
> table(Crew.data$Salary)
21000 22000 23000 24000 25000 26000 27000 28000 29000 30000 32000 33000 34000 35000
    1     2     1     1     2     1     1     2     2     1     1     3     4     3
36000 37000 38000 41000 42000 43000 44000 45000 47000 48000 65000 66000 68000 69000
    2     2     3     2     1     1     3     2     2     1     1     1     1     1
71000 72000 73000 75000 76000 77000 78000 81000 82000 83000 86000 92000 93000 94000
    1     2     1     1     1     1     1     1     1     2     1     1     1     1
 95000 100000 105000 108000 112000
     1      1      1      1      1
> Emptb=table(Crew.data$Salary)
> Emptb
21000 22000 23000 24000 25000 26000 27000 28000 29000 30000 32000 33000 34000 35000
    1     2     1     1     2     1     1     2     2     1     1     3     4     3
36000 37000 38000 41000 42000 43000 44000 45000 47000 48000 65000 66000 68000 69000
    2     2     3     2     1     1     3     2     2     1     1     1     1     1
71000 72000 73000 75000 76000 77000 78000 81000 82000 83000 86000 92000 93000 94000
    1     2     1     1     1     1     1     1     1     2     1     1     1     1
 95000 100000 105000 108000 112000
     1      1      1      1      1
> Empf=as.data.frame(Emptb)
> Empf
Var1 Freq
1 21000 1
2 22000 2
3 23000 1
4 24000 1
5 25000 2
6 26000 1
7 27000 1
8 28000 2
9 29000 2
10 30000 1
11 32000 1
12 33000 3
13 34000 4
14 35000 3
15 36000 2
16 37000 2
17 38000 3
18 41000 2
19 42000 1
20 43000 1
21 44000 3
22 45000 2
23 47000 2
24 48000 1
25 65000 1
26 66000 1
27 68000 1
28 69000 1
29 71000 1
30 72000 2
31 73000 1
32 75000 1
33 76000 1
34 77000 1
35 78000 1
36 81000 1
37 82000 1
38 83000 2
39 86000 1
40 92000 1
41 93000 1
42 94000 1
43 95000 1
44 100000 1
45 105000 1
46 108000 1
47 112000 1
> names(Empf)=c("Salary","count")
> Empf
Salary count
1 21000 1
2 22000 2
3 23000 1
4 24000 1
5 25000 2
6 26000 1
7 27000 1
8 28000 2
9 29000 2
10 30000 1
11 32000 1
12 33000 3
13 34000 4
14 35000 3
15 36000 2
16 37000 2
17 38000 3
18 41000 2
19 42000 1
20 43000 1
21 44000 3
22 45000 2
23 47000 2
24 48000 1
25 65000 1
26 66000 1
27 68000 1
28 69000 1
29 71000 1
30 72000 2
31 73000 1
32 75000 1
33 76000 1
34 77000 1
35 78000 1
36 81000 1
37 82000 1
38 83000 2
39 86000 1
40 92000 1
41 93000 1
42 94000 1
43 95000 1
44 100000 1
45 105000 1
46 108000 1
47 112000 1
Using dplyr
> Crew.data%>%count(Salary)
Salary n
1 21000 1
2 22000 2
3 23000 1
4 24000 1
5 25000 2
6 26000 1
7 27000 1
8 28000 2
9 29000 2
10 30000 1
11 32000 1
12 33000 3
13 34000 4
14 35000 3
15 36000 2
16 37000 2
17 38000 3
18 41000 2
19 42000 1
20 43000 1
21 44000 3
22 45000 2
23 47000 2
24 48000 1
25 65000 1
26 66000 1
27 68000 1
28 69000 1
29 71000 1
30 72000 2
31 73000 1
32 75000 1
33 76000 1
34 77000 1
35 78000 1
36 81000 1
37 82000 1
38 83000 2
39 86000 1
40 92000 1
41 93000 1
42 94000 1
43 95000 1
44 100000 1
45 105000 1
46 108000 1
47 112000 1
> Crew.data%>%group_by(Salary)%>%summarise(count=n())
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 47 x 2
Salary count
<int> <int>
1 21000 1
2 22000 2
3 23000 1
4 24000 1
5 25000 2
6 26000 1
7 27000 1
8 28000 2
9 29000 2
10 30000 1
# ... with 37 more rows
> Crew.data%>%group_by(Salary)%>%summarise(mean(Salary))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 47 x 2
Salary `mean(Salary)`
<int> <dbl>
1 21000 21000
2 22000 22000
3 23000 23000
4 24000 24000
5 25000 25000
6 26000 26000
7 27000 27000
8 28000 28000
9 29000 29000
10 30000 30000
# ... with 37 more rows
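Grouping by Salary and then averaging Salary returns each value unchanged, and 47 of the groups hold only one to four rows. For a numeric variable it is often more informative to bin the values first. The sketch below rebuilds the Salary column from the frequency table above and counts rows per band; the band edges are hypothetical and chosen only for illustration.

```r
# Salary values and their frequencies, copied from the table() output above
sal_values <- c(
  21000, 22000, 23000, 24000, 25000, 26000, 27000, 28000, 29000, 30000,
  32000, 33000, 34000, 35000, 36000, 37000, 38000, 41000, 42000, 43000,
  44000, 45000, 47000, 48000, 65000, 66000, 68000, 69000, 71000, 72000,
  73000, 75000, 76000, 77000, 78000, 81000, 82000, 83000, 86000, 92000,
  93000, 94000, 95000, 100000, 105000, 108000, 112000
)
sal_counts <- c(
  1, 2, 1, 1, 2, 1, 1, 2, 2, 1,
  1, 3, 4, 3, 2, 2, 3, 2, 1, 1,
  3, 2, 2, 1, 1, 1, 1, 1, 1, 2,
  1, 1, 1, 1, 1, 1, 1, 2, 1, 1,
  1, 1, 1, 1, 1, 1, 1
)
salary <- rep(sal_values, times = sal_counts)

# Hypothetical band edges; intervals are open on the left, closed on the right
bands <- cut(salary, breaks = c(20000, 40000, 60000, 80000, 120000), dig.lab = 6)
band_counts <- table(bands)

print(band_counts)
print(mean(salary))    # should agree with the mean reported by summary()
print(median(salary))  # should agree with the median reported by summary()
```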
Load the built-in “mtcars” data set in R and answer the following questions
1. Enumerate all functions explained in the video for all categorical and numerical
variables of the data set.
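All 11 variables in this data set are stored as numeric, so each of them is analysed with the
summary function. Note that cyl, vs, am, gear and carb take only a few distinct values, so they
can also be read as categorical codes.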
SYNTAX
> data(mtcars) # loading the built-in dataset
> str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
> summary(mtcars$mpg)
Min. 1st Qu. Median Mean 3rd Qu. Max.
10.40 15.43 19.20 20.09 22.80 33.90
> summary(mtcars$cyl)
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.000 4.000 6.000 6.188 8.000 8.000
> summary(mtcars$disp)
Min. 1st Qu. Median Mean 3rd Qu. Max.
71.1 120.8 196.3 230.7 326.0 472.0
> summary(mtcars$hp)
Min. 1st Qu. Median Mean 3rd Qu. Max.
52.0 96.5 123.0 146.7 180.0 335.0
> summary(mtcars$drat)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.760 3.080 3.695 3.597 3.920 4.930
> summary(mtcars$wt)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.513 2.581 3.325 3.217 3.610 5.424
> summary(mtcars$qsec)
Min. 1st Qu. Median Mean 3rd Qu. Max.
14.50 16.89 17.71 17.85 18.90 22.90
> summary(mtcars$vs)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.0000 0.4375 1.0000 1.0000
> summary(mtcars$am)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.0000 0.4062 1.0000 1.0000
> summary(mtcars$gear)
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.000 3.000 4.000 3.688 4.000 5.000
> summary(mtcars$carb)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 2.000 2.000 2.812 4.000 8.000
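The eleven separate summary() calls can be replaced by a single sapply() over the columns. The sketch below also counts the distinct values per column as a rough way to spot variables that behave like categories, and tabulates one of them. The object names are hypothetical.

```r
data(mtcars)

# One summary per column, collected into a 6 x 11 matrix
all_summaries <- sapply(mtcars, summary)
print(round(all_summaries, 2))

# Few distinct values suggests a coded categorical variable
n_distinct_vals <- sapply(mtcars, function(x) length(unique(x)))
print(n_distinct_vals)

# Frequency table for one such variable
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)
```

For the low-cardinality columns, table() gives a more useful description than the quartiles reported by summary().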