Assignment Submitted By-Srishti Bhateja 19021141116: STR (Crew - Data)

Assignment
Submitted by: Srishti Bhateja
19021141116
Import “crew data.csv” from MS Teams > Files and answer the following questions
1. List the categorical and numeric variables of the data set
The structure function str() is used to identify the categorical and numeric variables.
Categorical (Factor): Hire.date, Lastname, Firstname, Location, EmpId, Job.code
Numeric: Phne (int), Salary (int), bonus (num)
SYNTAX
> str(Crew.data)
'data.frame': 69 obs. of 9 variables:
$ Hire.date: Factor w/ 69 levels "1-Jul-87","1-Mar-90",..: 35 50 3 16 27 36 62 60 24 17 ...
$ Lastname : Factor w/ 69 levels "BEAUMONT","BERGAMASCO",..: 21 35 69 19 41 18 42 64 67 9 ...
$ Firstname: Factor w/ 69 levels "ANITA M.","ANNETTE M.",..: 30 29 24 58 54 26 68 39 59 37 ...
$ Location : Factor w/ 3 levels "CARY","FRANKFURT",..: 1 2 3 1 3 2 3 2 2 3 ...
$ Phne : int 1168 2164 1565 1157 2360 1595 2366 1197 1553 1369 ...
$ EmpId : Factor w/ 69 levels "E00034","E00084",..: 53 36 49 46 31 4 25 29 41 18 ...
$ Job.code : Factor w/ 6 levels "FLTAT1","FLTAT2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Salary : int 21000 22000 22000 23000 24000 25000 25000 26000 27000 28000 ...
$ bonus : num 2100 2200 2200 2300 2400 2500 2500 2600 2700 2800 ...
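The split into categorical and numeric variables can also be obtained programmatically. The sketch below is only an illustration: it writes a small stand-in file (two rows and four columns copied from the str() output above, not the real 69-row file) and the names csv_path and crew_demo are hypothetical. Note that since R 4.0.0 read.csv() keeps text columns as character unless stringsAsFactors = TRUE is given.

```r
# Stand-in for "crew data.csv": two rows and four columns copied from the
# str() output above. The real file has 69 rows and 9 columns.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Location,Job.code,Salary,bonus",
  "CARY,FLTAT1,21000,2100",
  "FRANKFURT,FLTAT1,22000,2200"
), csv_path)

# stringsAsFactors = TRUE turns text columns into factors, as in the output above.
crew_demo <- read.csv(csv_path, stringsAsFactors = TRUE)

# is.numeric() is TRUE for both int and num columns.
is_num <- sapply(crew_demo, is.numeric)
numeric_vars <- names(crew_demo)[is_num]
categorical_vars <- names(crew_demo)[!is_num]

print(numeric_vars)
print(categorical_vars)
```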

2. Describe the numeric variable using descriptive functions
summary() gives the minimum, quartiles, median, mean and maximum of bonus, while sd() and var() give its standard deviation and variance. Together they describe the centre and spread of the variable.
SYNTAX
> summary(Crew.data$bonus)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2100 3300 4200 5214 7300 11200
> sd(Crew.data$bonus)
[1] 2552.178
> var(Crew.data$bonus)
[1] 6513610
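The standard deviation is the square root of the variance, which is why the two values above agree. A minimal sketch of the same three calls, using mpg from the built-in mtcars data as a stand-in because the crew data is not reproduced here:

```r
# Stand-in numeric vector: mpg from the built-in mtcars data (not the crew data).
x <- mtcars$mpg

five_num <- summary(x)   # min, quartiles, median, mean, max
x_var <- var(x)          # sample variance
x_sd <- sd(x)            # sample standard deviation

print(five_num)
print(c(variance = x_var, sd = x_sd, sqrt_of_variance = sqrt(x_var)))
```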
3. How many groups does the variable “Job code” contain?
Here the pipe operator (%>%) is used with count() from dplyr to list the job codes and how many employees fall in each.
There are 6 groups in the variable “Job.code”.
SYNTAX
> Crew.data%>%count(Job.code)
Job.code n
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8
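If only the number of groups is needed, base R can return it directly without dplyr. A small sketch, assuming cyl from the built-in mtcars data as a stand-in for Job.code:

```r
# Stand-in grouping variable: cyl from mtcars, converted to a factor.
job_like <- factor(mtcars$cyl)

n_groups <- nlevels(job_like)                # number of factor levels
n_groups_alt <- length(unique(mtcars$cyl))   # same idea without a factor

print(n_groups)
print(n_groups_alt)
```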
1. Enumerate all functions explained in the video for “Job code”
The functions explained are table(), class(), as.data.frame() and names(), and with dplyr count(), group_by() and summarise().
> table(Crew.data$Job.code)
FLTAT1 FLTAT2 FLTAT3 PILOT1 PILOT2 PILOT3
14 18 12 8 9 8
> Emptb=table(Crew.data$Job.code)
> Emptb
FLTAT1 FLTAT2 FLTAT3 PILOT1 PILOT2 PILOT3
14 18 12 8 9 8
> class(Emptb)
[1] "table"
> Empf=as.data.frame(Emptb)
> Empf
Var1 Freq
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8
> names(Empf)=c("Jobcode","count")
> Empf
Jobcode count
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8
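The table, as.data.frame and names steps above can also be combined into one call. This is a sketch on the built-in mtcars data (cyl standing in for Job.code); naming the argument inside table() sets the first column name and responseName sets the count column name:

```r
# One-step frequency data frame with both column names set up front.
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl), responseName = "count")

print(cyl_counts)
```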
Using dplyr
> Crew.data%>%count(Job.code)
Job.code n
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8
> Crew.data%>%group_by(Job.code)%>%summarise(count=n())
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 6 x 2
Job.code count
<fct> <int>
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8
> Crew.data%>%group_by(Job.code)%>%summarise(mean(Salary))
# A tibble: 6 x 2
Job.code `mean(Salary)`
<fct> <dbl>
1 FLTAT1 25643.
2 FLTAT2 35111.
3 FLTAT3 44250
4 PILOT1 69500
5 PILOT2 80111.
6 PILOT3 99875
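The group mean can also be computed in base R. A sketch using mtcars as a stand-in (mpg in place of Salary, cyl in place of Job.code); tapply() returns a named vector and aggregate() returns a data frame:

```r
# Mean of a numeric variable within each group, two base R ways.
mean_by_group <- tapply(mtcars$mpg, mtcars$cyl, mean)
mean_df <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

print(mean_by_group)
print(mean_df)
```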
2. Enumerate all functions explained in the video for “Salary”
> summary(Crew.data$Salary)
Min. 1st Qu. Median Mean 3rd Qu. Max.
21000 33000 42000 52145 73000 112000
> table(Crew.data$Salary)
21000 22000 23000 24000 25000 26000 27000 28000 29000 30000 32000 33000 34000 35000
1 2 1 1 2 1 1 2 2 1 1 3 4 3
36000 37000 38000 41000 42000 43000 44000 45000 47000 48000 65000 66000 68000 69000
2 2 3 2 1 1 3 2 2 1 1 1 1 1
71000 72000 73000 75000 76000 77000 78000 81000 82000 83000 86000 92000 93000 94000
1 2 1 1 1 1 1 1 1 2 1 1 1 1
95000 100000 105000 108000 112000
1 1 1 1 1
> Emptb=table(Crew.data$Salary)
> Emptb
21000 22000 23000 24000 25000 26000 27000 28000 29000 30000 32000 33000 34000 35000
1 2 1 1 2 1 1 2 2 1 1 3 4 3
36000 37000 38000 41000 42000 43000 44000 45000 47000 48000 65000 66000 68000 69000
2 2 3 2 1 1 3 2 2 1 1 1 1 1
71000 72000 73000 75000 76000 77000 78000 81000 82000 83000 86000 92000 93000 94000
1 2 1 1 1 1 1 1 1 2 1 1 1 1
95000 100000 105000 108000 112000
1 1 1 1 1

> Empf=as.data.frame(Emptb)
> Empf

Var1 Freq
1 21000 1
2 22000 2
3 23000 1
4 24000 1
5 25000 2
6 26000 1
7 27000 1
8 28000 2
9 29000 2
10 30000 1
11 32000 1
12 33000 3
13 34000 4
14 35000 3
15 36000 2
16 37000 2
17 38000 3
18 41000 2
19 42000 1
20 43000 1
21 44000 3
22 45000 2
23 47000 2
24 48000 1
25 65000 1
26 66000 1
27 68000 1
28 69000 1
29 71000 1
30 72000 2
31 73000 1
32 75000 1
33 76000 1
34 77000 1
35 78000 1
36 81000 1
37 82000 1
38 83000 2
39 86000 1
40 92000 1
41 93000 1
42 94000 1
43 95000 1
44 100000 1
45 105000 1
46 108000 1
47 112000 1

> names(Empf)=c("Salary","count")
> Empf

Salary count
1 21000 1
2 22000 2
3 23000 1
4 24000 1
5 25000 2
6 26000 1
7 27000 1
8 28000 2
9 29000 2
10 30000 1
11 32000 1
12 33000 3
13 34000 4
14 35000 3
15 36000 2
16 37000 2
17 38000 3
18 41000 2
19 42000 1
20 43000 1
21 44000 3
22 45000 2
23 47000 2
24 48000 1
25 65000 1
26 66000 1
27 68000 1
28 69000 1
29 71000 1
30 72000 2
31 73000 1
32 75000 1
33 76000 1
34 77000 1
35 78000 1
36 81000 1
37 82000 1
38 83000 2
39 86000 1
40 92000 1
41 93000 1
42 94000 1
43 95000 1
44 100000 1
45 105000 1
46 108000 1
47 112000 1
Using dplyr
> Crew.data%>%count(Salary)
Salary n
1 21000 1
2 22000 2
3 23000 1
4 24000 1
5 25000 2
6 26000 1
7 27000 1
8 28000 2
9 29000 2
10 30000 1
11 32000 1
12 33000 3
13 34000 4
14 35000 3
15 36000 2
16 37000 2
17 38000 3
18 41000 2
19 42000 1
20 43000 1
21 44000 3
22 45000 2
23 47000 2
24 48000 1
25 65000 1
26 66000 1
27 68000 1
28 69000 1
29 71000 1
30 72000 2
31 73000 1
32 75000 1
33 76000 1
34 77000 1
35 78000 1
36 81000 1
37 82000 1
38 83000 2
39 86000 1
40 92000 1
41 93000 1
42 94000 1
43 95000 1
44 100000 1
45 105000 1
46 108000 1
47 112000 1

> Crew.data%>%group_by(Salary)%>%summarise(count=n())

# A tibble: 47 x 2
Salary count
<int> <int>
1 21000 1
2 22000 2
3 23000 1
4 24000 1
5 25000 2
6 26000 1
7 27000 1
8 28000 2
9 29000 2
10 30000 1
# ... with 37 more rows
> Crew.data%>%group_by(Salary)%>%summarise(mean(Salary))
# A tibble: 47 x 2
Salary `mean(Salary)`
<int> <dbl>
1 21000 21000
2 22000 22000
3 23000 23000
4 24000 24000
5 25000 25000
6 26000 26000
7 27000 27000
8 28000 28000
9 29000 29000
10 30000 30000
# ... with 37 more rows
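Because Salary is numeric, counting each distinct value gives 47 groups with mostly 1 or 2 rows each. One possible alternative is to bin the values with cut() before tabulating. The sketch below uses hp from mtcars as a stand-in, and the break points are arbitrary assumptions chosen only for illustration:

```r
# Bin a numeric variable into ranges, then count rows per range.
# Stand-in variable: hp from mtcars; the breaks are illustrative only.
hp_band <- cut(mtcars$hp, breaks = c(0, 100, 200, 400))
band_counts <- table(hp_band)

print(band_counts)
```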

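Load the built-in “mtcars” data in R and answer the following questions
1. Enumerate all functions explained in the video for all categorical and numerical
variables of the data set.
All 11 variables in this dataset are stored as numeric, so we analyse each of them using the summary function. (cyl, vs, am, gear and carb take only a few distinct values and are categorical in meaning, but they are coded as numbers.)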
SYNTAX
> data(mtcars) # loading the built-in dataset
> str(mtcars)
'data.frame':32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
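> summary(mtcars$mpg)
Min. 1st Qu. Median Mean 3rd Qu. Max.
10.40 15.43 19.20 20.09 22.80 33.90
> summary(mtcars$cyl)
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.000 4.000 6.000 6.188 8.000 8.000
> summary(mtcars$disp)
Min. 1st Qu. Median Mean 3rd Qu. Max.
71.1 120.8 196.3 230.7 326.0 472.0
> summary(mtcars$hp)
Min. 1st Qu. Median Mean 3rd Qu. Max.
52.0 96.5 123.0 146.7 180.0 335.0
> summary(mtcars$drat)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.760 3.080 3.695 3.597 3.920 4.930
> summary(mtcars$wt)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.513 2.581 3.325 3.217 3.610 5.424
> summary(mtcars$qsec)
Min. 1st Qu. Median Mean 3rd Qu. Max.
14.50 16.89 17.71 17.85 18.90 22.90
> summary(mtcars$vs)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.0000 0.4375 1.0000 1.0000
> summary(mtcars$am)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.0000 0.4062 1.0000 1.0000
> summary(mtcars$gear)
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.000 3.000 4.000 3.688 4.000 5.000
> summary(mtcars$carb)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 2.000 2.000 2.812 4.000 8.000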
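Several mtcars columns are stored as numbers but take only a few distinct values. A possible way to apply the categorical functions to them is to convert them to factors first. The choice of columns in cat_like below is an assumption, and cars is a hypothetical working copy so that mtcars itself is left unchanged:

```r
data(mtcars)
cars <- mtcars   # working copy; mtcars itself is not modified

# Assumption: these five columns are treated as categorical.
cat_like <- c("cyl", "vs", "am", "gear", "carb")
cars[cat_like] <- lapply(cars[cat_like], factor)

# Frequency table for one categorical column.
cyl_table <- table(cars$cyl)
print(cyl_table)

# summary() on the remaining numeric columns in one call.
num_cols <- setdiff(names(cars), cat_like)
print(summary(cars[num_cols]))
```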
