
BUSINESS ANALYTICS ASSIGNMENT

RIYA MATHEW
19021141088

Import “crew data.csv” from MS Teams > Files and answer the following questions

1. List the categorical and numeric variables of the data set

Categorical: Hire.date, Lastname, Firstname, Location, EmpId, Job.code

Numeric: Phne, Salary, bonus (Phne is stored as an integer but works as an identifier rather than a measurement)

> str(Crew.data)

'data.frame': 69 obs. of 9 variables:
$ Hire.date: Factor w/ 69 levels "1-Jul-87","1-Mar-90",..: 35 50 3 16 27 36 62 60 24 17 ...
$ Lastname : Factor w/ 69 levels "BEAUMONT","BERGAMASCO",..: 21 35 69 19 41 18 42 64 67 9 ...
$ Firstname: Factor w/ 69 levels "ANITA M.","ANNETTE M.",..: 30 29 24 58 54 26 68 39 59 37 ...
$ Location : Factor w/ 3 levels "CARY","FRANKFURT",..: 1 2 3 1 3 2 3 2 2 3 ...
$ Phne : int 1168 2164 1565 1157 2360 1595 2366 1197 1553 1369 ...
$ EmpId : Factor w/ 69 levels "E00034","E00084",..: 53 36 49 46 31 4 25 29 41 18 ...
$ Job.code : Factor w/ 6 levels "FLTAT1","FLTAT2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Salary : int 21000 22000 22000 23000 24000 25000 25000 26000 27000 28000 ...
$ bonus : num 2100 2200 2200 2300 2400 2500 2500 2600 2700 2800 ...
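
The split between categorical and numeric columns can also be checked programmatically rather than by eye. A minimal sketch, assuming a small stand-in data frame: `crew_demo` and its three rows are hypothetical and only shaped like the str() output above, because the CSV itself is not reproduced here.

```r
# Hypothetical stand-in for Crew.data: three rows shaped like the str() output above.
crew_demo <- data.frame(
  Location = factor(c("CARY", "FRANKFURT", "CARY")),
  Job.code = factor(c("FLTAT1", "FLTAT1", "PILOT1")),
  Salary   = c(21000L, 22000L, 65000L),
  bonus    = c(2100, 2200, 6500)
)

# is.numeric() is TRUE for both integer and double columns.
is_num <- sapply(crew_demo, is.numeric)

numeric_vars     <- names(crew_demo)[is_num]
categorical_vars <- names(crew_demo)[!is_num]

print(numeric_vars)
print(categorical_vars)
```

On the full data set the same check would also flag Phne and Salary as numeric, since str() shows both stored as int.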

2. Describe the numeric variable using descriptive functions

> summary(Crew.data$bonus)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   2100    3300    4200    5214    7300   11200

> sd(Crew.data$bonus)

[1] 2552.178

> var(Crew.data$bonus)

[1] 6513610
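
The two dispersion figures above are linked: the variance is the square of the standard deviation (2552.178 squared is approximately 6513610). A short sketch of that relationship, using only the first five bonus values shown in the str() output (`bonus_demo` is an illustrative name, not the full 69-row column):

```r
# First five bonus values from the str() output above (not the full column).
bonus_demo <- c(2100, 2200, 2200, 2300, 2400)

s <- sd(bonus_demo)    # sample standard deviation (n - 1 denominator)
v <- var(bonus_demo)   # sample variance

# var() equals sd() squared, up to floating-point error.
print(all.equal(v, s^2))

# Interquartile range, another descriptive measure of spread.
print(IQR(bonus_demo))
```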

3. How many groups does the variable “Job code” contain?

There are 6 groups in the variable “Job.code”

> Crew.data%>%count(Job.code)
Job.code n
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8

4. Enumerate all functions explained in the video for “Job code”

> table(Crew.data$Job.code)

FLTAT1 FLTAT2 FLTAT3 PILOT1 PILOT2 PILOT3
    14     18     12      8      9      8

> Emptb=table(Crew.data$Job.code)
> Emptb

FLTAT1 FLTAT2 FLTAT3 PILOT1 PILOT2 PILOT3
    14     18     12      8      9      8

> class(Emptb)

[1] "table"

> Empf=as.data.frame(Emptb)
> Empf
Var1 Freq
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8

> names(Empf)=c("Jobcat","count")
> Empf

Jobcat count
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8
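
The table, as.data.frame and names steps above can also be combined into one call. A minimal sketch, assuming a short hypothetical factor `job_demo` in place of Crew.data$Job.code:

```r
# Hypothetical job codes; the real column is Crew.data$Job.code.
job_demo <- factor(c("FLTAT1", "FLTAT2", "FLTAT1", "PILOT1", "FLTAT2", "FLTAT1"))

# Naming the table() argument sets the name of the first column;
# responseName sets the name of the count column.
freq_demo <- as.data.frame(table(Jobcat = job_demo), responseName = "count")

print(freq_demo)
```

This gives the same two columns (Jobcat, count) without the separate renaming step.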

Using dplyr
> Crew.data%>%count(Job.code)

Job.code n
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8

> Crew.data%>%group_by(Job.code)%>%summarise(count=n())

`summarise()` ungrouping output (override with `.groups` argument)


# A tibble: 6 x 2
Job.code count
<fct> <int>
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8

> Crew.data%>%group_by(Job.code)%>%summarise(mean(Salary))

`summarise()` ungrouping output (override with `.groups` argument)


# A tibble: 6 x 2
Job.code `mean(Salary)`
<fct> <dbl>
1 FLTAT1 25643.
2 FLTAT2 35111.
3 FLTAT3 44250
4 PILOT1 69500
5 PILOT2 80111.
6 PILOT3 99875
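
The %>%, count(), group_by() and summarise() calls above need the dplyr package to be loaded first with library(dplyr), a step the transcript does not show. For comparison, a base-R sketch of the same group-wise mean is given below; the rows in `crew_demo` are hypothetical stand-ins, not the real 69 records.

```r
# Hypothetical stand-in rows; the real data has 69 rows and 6 job codes.
crew_demo <- data.frame(
  Job.code = factor(c("FLTAT1", "FLTAT1", "FLTAT2", "FLTAT2", "PILOT1")),
  Salary   = c(21000, 23000, 34000, 36000, 65000)
)

# tapply() returns a named vector of group means.
mean_by_job <- tapply(crew_demo$Salary, crew_demo$Job.code, mean)
print(mean_by_job)

# aggregate() returns a data frame, closer to the summarise() output.
mean_df <- aggregate(Salary ~ Job.code, data = crew_demo, FUN = mean)
names(mean_df)[2] <- "mean_salary"
print(mean_df)
```

Giving the summary column an explicit name (here mean_salary) avoids the back-quoted `mean(Salary)` header seen in the tibble above.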

5. Enumerate all functions explained in the video for “Salary”

> summary(Crew.data$Salary)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  21000   33000   42000   52145   73000  112000

> table(Crew.data$Salary)

  21000  22000  23000  24000  25000  26000  27000  28000  29000  30000  32000  33000  34000  35000
      1      2      1      1      2      1      1      2      2      1      1      3      4      3
  36000  37000  38000  41000  42000  43000  44000  45000  47000  48000  65000  66000  68000  69000
      2      2      3      2      1      1      3      2      2      1      1      1      1      1
  71000  72000  73000  75000  76000  77000  78000  81000  82000  83000  86000  92000  93000  94000
      1      2      1      1      1      1      1      1      1      2      1      1      1      1
  95000 100000 105000 108000 112000
      1      1      1      1      1

> Emptb=table(Crew.data$Salary)
> Emptb

  21000  22000  23000  24000  25000  26000  27000  28000  29000  30000  32000  33000  34000  35000
      1      2      1      1      2      1      1      2      2      1      1      3      4      3
  36000  37000  38000  41000  42000  43000  44000  45000  47000  48000  65000  66000  68000  69000
      2      2      3      2      1      1      3      2      2      1      1      1      1      1
  71000  72000  73000  75000  76000  77000  78000  81000  82000  83000  86000  92000  93000  94000
      1      2      1      1      1      1      1      1      1      2      1      1      1      1
  95000 100000 105000 108000 112000
      1      1      1      1      1

> Empf=as.data.frame(Emptb)
> Empf

Var1 Freq
1 21000 1
2 22000 2
3 23000 1
4 24000 1
5 25000 2
6 26000 1
7 27000 1
8 28000 2
9 29000 2
10 30000 1
11 32000 1
12 33000 3
13 34000 4
14 35000 3
15 36000 2
16 37000 2
17 38000 3
18 41000 2
19 42000 1
20 43000 1
21 44000 3
22 45000 2
23 47000 2
24 48000 1
25 65000 1
26 66000 1
27 68000 1
28 69000 1
29 71000 1
30 72000 2
31 73000 1
32 75000 1
33 76000 1
34 77000 1
35 78000 1
36 81000 1
37 82000 1
38 83000 2
39 86000 1
40 92000 1
41 93000 1
42 94000 1
43 95000 1
44 100000 1
45 105000 1
46 108000 1
47 112000 1

> names(Empf)=c("Salary","count")
> Empf

Salary count
1 21000 1
2 22000 2
3 23000 1
4 24000 1
5 25000 2
6 26000 1
7 27000 1
8 28000 2
9 29000 2
10 30000 1
11 32000 1
12 33000 3
13 34000 4
14 35000 3
15 36000 2
16 37000 2
17 38000 3
18 41000 2
19 42000 1
20 43000 1
21 44000 3
22 45000 2
23 47000 2
24 48000 1
25 65000 1
26 66000 1
27 68000 1
28 69000 1
29 71000 1
30 72000 2
31 73000 1
32 75000 1
33 76000 1
34 77000 1
35 78000 1
36 81000 1
37 82000 1
38 83000 2
39 86000 1
40 92000 1
41 93000 1
42 94000 1
43 95000 1
44 100000 1
45 105000 1
46 108000 1
47 112000 1

Using dplyr
> Crew.data%>%count(Salary)

Salary n
1 21000 1
2 22000 2
3 23000 1
4 24000 1
5 25000 2
6 26000 1
7 27000 1
8 28000 2
9 29000 2
10 30000 1
11 32000 1
12 33000 3
13 34000 4
14 35000 3
15 36000 2
16 37000 2
17 38000 3
18 41000 2
19 42000 1
20 43000 1
21 44000 3
22 45000 2
23 47000 2
24 48000 1
25 65000 1
26 66000 1
27 68000 1
28 69000 1
29 71000 1
30 72000 2
31 73000 1
32 75000 1
33 76000 1
34 77000 1
35 78000 1
36 81000 1
37 82000 1
38 83000 2
39 86000 1
40 92000 1
41 93000 1
42 94000 1
43 95000 1
44 100000 1
45 105000 1
46 108000 1
47 112000 1

> Crew.data%>%group_by(Salary)%>%summarise(count=n())

`summarise()` ungrouping output (override with `.groups` argument)


# A tibble: 47 x 2
Salary count
<int> <int>
1 21000 1
2 22000 2
3 23000 1
4 24000 1
5 25000 2
6 26000 1
7 27000 1
8 28000 2
9 29000 2
10 30000 1
# ... with 37 more rows

> Crew.data%>%group_by(Salary)%>%summarise(mean(Salary))

`summarise()` ungrouping output (override with `.groups` argument)


# A tibble: 47 x 2
Salary `mean(Salary)`
<int> <dbl>
1 21000 21000
2 22000 22000
3 23000 23000
4 24000 24000
5 25000 25000
6 26000 26000
7 27000 27000
8 28000 28000
9 29000 29000
10 30000 30000
# ... with 37 more rows
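
Grouping by Salary itself gives one row per distinct value, so the group mean just repeats the salary. For a continuous variable, a frequency table over salary bands is usually more informative. A minimal sketch, assuming 25,000-wide bands (the band width is an illustrative choice, not part of the assignment) and a few salaries taken from the table above:

```r
# A handful of salaries taken from the frequency table above.
salary_demo <- c(21000, 22000, 33000, 34000, 44000, 65000, 83000, 112000)

# Assumed bands: 25,000-wide intervals, closed on the left, e.g. [25000,50000).
breaks <- seq(0, 125000, by = 25000)
band   <- cut(salary_demo, breaks = breaks, right = FALSE, dig.lab = 6)

# Count how many salaries fall into each band.
band_counts <- table(band)
print(band_counts)
```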

Load the built-in “mtcars” data set in R and answer the following questions


1. Enumerate all functions explained in the video for all categorical and numerical
variables of the data set.

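Note: All 11 variables in the dataset are stored as numeric (num), although cyl, vs, am, gear and carb take only a few distinct values and behave like categories.


> data(mtcars) # loading the built-in dataset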
> str(mtcars)

'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...

> summary(mtcars$mpg)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  10.40   15.43   19.20   20.09   22.80   33.90

> summary(mtcars$cyl)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  4.000   4.000   6.000   6.188   8.000   8.000

> summary(mtcars$disp)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   71.1   120.8   196.3   230.7   326.0   472.0

> summary(mtcars$hp)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   52.0    96.5   123.0   146.7   180.0   335.0

> summary(mtcars$drat)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  2.760   3.080   3.695   3.597   3.920   4.930

> summary(mtcars$wt)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.513   2.581   3.325   3.217   3.610   5.424

> summary(mtcars$qsec)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  14.50   16.89   17.71   17.85   18.90   22.90

> summary(mtcars$vs)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.0000  0.0000  0.4375  1.0000  1.0000

> summary(mtcars$am)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.0000  0.0000  0.4062  1.0000  1.0000

> summary(mtcars$gear)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  3.000   3.000   4.000   3.688   4.000   5.000

> summary(mtcars$carb)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   2.000   2.000   2.812   4.000   8.000
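
The eleven separate summary() calls above can be condensed, and the code-like columns are better described with frequency tables. A minimal sketch using only base R; treating cyl and am as categories is an assumption based on how few distinct values they take, and the names `col_means`, `col_sds`, `cyl_counts` and `am_counts` are illustrative.

```r
data(mtcars)  # built-in data set, no import needed

# Mean and standard deviation for every column, one call each.
col_means <- sapply(mtcars, mean)
col_sds   <- sapply(mtcars, sd)
print(round(col_means, 3))

# cyl and am are stored as numbers but take only a few distinct values,
# so a frequency table is the more natural summary.
cyl_counts <- table(factor(mtcars$cyl))
am_counts  <- table(factor(mtcars$am, labels = c("automatic", "manual")))
print(cyl_counts)
print(am_counts)
```

Calling summary(mtcars) on the whole data frame prints the six-number summary for all columns at once.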
