
NATIONAL INSTITUTE OF FASHION TECHNOLOGY

BUSINESS ANALYTICS AND BIG DATA

MID-TERM ASSIGNMENT

SUBMITTED BY:

Vanshika

P a g e 1 | 16
ACKNOWLEDGEMENT

We would like to express our heartfelt gratitude to all those who contributed to the creation
and completion of this document.
We extend our thanks to Asst. Prof. Akanksha Dayma for her expertise, thoughtful insights,
and dedication, which greatly enriched the content of this document.
Additionally, we acknowledge our college, National Institute of Fashion Technology, Kangra, for
providing the necessary resources and an environment conducive to the development of this
material.

Data Types in R
1. Numeric
2. Complex
3. Logical
4. Character

1. Numeric Data Type

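The numeric data type stores numbers, both whole numbers such as 1234 and decimals such as 3.14.
> NUM1 = 5
> NUM1
[1] 5
> class(NUM1)
[1] "numeric"
> A = 3.14
> class(A)
[1] "numeric"
R is case-sensitive: NUM1 and num1 are different names, so class(num1) fails unless num1 has been defined.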
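A quick sketch, reusing the variable names above, showing that R gives both whole numbers and decimals the class "numeric" and stores them internally as double-precision values:

```r
NUM1 <- 5
A <- 3.14
# Both whole numbers and decimals have class "numeric"
print(class(NUM1))    # "numeric"
print(class(A))       # "numeric"
# typeof() shows the internal storage mode: double precision
print(typeof(NUM1))   # "double"
print(is.numeric(A))  # TRUE
```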

2. Complex Data Type


The complex data type is to store numbers with an imaginary component.

> COMPLEX1=5+3i
> COMPLEX1
[1] 5+3i
> class(COMPLEX1)
[1] "complex"
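A short sketch of separating the two parts of a complex number with the base R functions Re() and Im():

```r
COMPLEX1 <- 5+3i
print(Re(COMPLEX1))    # real part: 5
print(Im(COMPLEX1))    # imaginary part: 3
print(Conj(COMPLEX1))  # complex conjugate: 5-3i
```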

3. Logical Data Type
The logical data type stores logical values like TRUE or FALSE.
> log1=FALSE
> log1
[1] FALSE
> class(log1)
[1] "logical"
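Logical values also come out of comparisons, and in arithmetic TRUE counts as 1 and FALSE as 0 (the same conversion seen in the mixed vectors later in these notes). A minimal sketch:

```r
log1 <- FALSE
check <- 10 > 3         # a comparison returns a logical value
print(check)            # TRUE
print(class(check))     # "logical"
flags <- c(TRUE, FALSE, TRUE)
print(sum(flags))       # TRUE counts as 1 and FALSE as 0, so this is 2
```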

4. Character Data Type

The character data type stores text values, such as names, written inside double
quotation marks.
> A = "pen"
> A
[1] "pen"
> A = " vans "
> A
[1] " vans "
> class(A)
[1] "character"
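The spaces inside the quotes are part of the value, so " vans " is six characters long. A small sketch using base R string functions:

```r
A <- " vans "
print(nchar(A))              # 6: the two spaces are counted
print(trimws(A))             # "vans": leading and trailing spaces removed
print(toupper("pen"))        # "PEN"
print(paste("blue", "pen"))  # "blue pen": paste() joins with a space
```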

TYPES OF OPERATORS:
• Assignment: assigns a value to a variable. It can be written in 3 ways: b = 1, b <- 1 or 1 -> b.
Writing 1 >- b does not assign anything: R reads it as the comparison 1 > -b, and reports an error only if b has not been defined.
• Arithmetic: used on numeric values, e.g. +, -, *, /
• Relational: used to compare values, e.g. <, >, ==, !=
• Logical: AND (&), OR (|)

Assignment Operators
In B <- 1, B is the variable name (possibly quoted) and 1 is the value assigned to B.
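A minimal sketch of the three assignment forms, a quoted variable name, and what 1 >- b actually does once b exists:

```r
b = 1           # assignment with =
b <- 2          # assignment with <- (the usual R style)
3 -> b          # rightward assignment: b is now 3
"B" <- 10       # the variable name may also be quoted
res <- 1 > -b   # NOT an assignment: compares 1 with -3
print(b)        # 3 (unchanged by the comparison)
print(res)      # TRUE
```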

ARITHMETIC (+, -, *, /)
> NUM1=100
> NUM2=25
> NUM1+NUM2
[1] 125
> NUM1-NUM2
[1] 75
> NUM1*NUM2
[1] 2500
> NUM1/NUM2
[1] 4
> NUM2-NUM1
[1] -75
> NUM2/NUM1
[1] 0.25
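Besides +, -, * and /, R's arithmetic operators include ^ (power), %% (remainder) and %/% (integer division). A short sketch with the same two numbers:

```r
NUM1 <- 100
NUM2 <- 25
print(NUM1 %% 30)   # remainder of 100 / 30: 10
print(NUM1 %/% 30)  # whole-number part of 100 / 30: 3
print(NUM2^2)       # 25 squared: 625
```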

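Q. Write a program to calculate simple interest, where principal amount = 10,000, ROI = 10% and Time = 3.
Simple interest = P*R*T/100, with R written as a percentage (10, not 0.1).

> P=10000
> R=10
> T=3
> P*R*T/100
[1] 3000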

Q. Calculate the gross salary of a person, where gross salary = basic + HRA + DA - PF
Basic = 60,000
HRA = 11% of basic (0.11)
DA = 30% of basic (0.30)
PF = 12% of basic (0.12)

Ans.
> B=60000
> H=60000*11/100
> H
[1] 6600
> D=60000*30/100
> P=60000*12/100
> B+H+D-P
[1] 77400
> S=B+H+D-P
> S
[1] 77400
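The same calculation can be wrapped in a small function so other basic salaries can be tried. A sketch; the function name gross_salary and its percentage arguments are hypothetical, with defaults taken from the question above:

```r
# Hypothetical helper: gross = basic + HRA + DA - PF, each given as % of basic
gross_salary <- function(basic, hra_pct = 11, da_pct = 30, pf_pct = 12) {
  hra <- basic * hra_pct / 100
  da  <- basic * da_pct / 100
  pf  <- basic * pf_pct / 100
  basic + hra + da - pf
}
print(gross_salary(60000))  # 77400, matching the answer above
print(gross_salary(50000))  # 64500
```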

Relational operators: <, >, <=, >=, == (equal to) and != (not equal to)

> A=70
> B=130
> A>B
[1] FALSE

> p=20
> q=20
> p!=q
[1] FALSE

> P=20
> Q=20
> P==Q
[1] TRUE
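Relational operators also work element by element on a vector, returning one TRUE or FALSE per element; this is the idea behind the filter() conditions used later with dplyr. A sketch with made-up prices:

```r
prices <- c(15000, 32000, 27000)  # hypothetical example values
above <- prices > 25000
print(above)           # FALSE TRUE TRUE
print(prices[above])   # 32000 27000: keeps only the TRUE positions
print(20 >= 20)        # TRUE
```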

LOGICAL:
AND (&): of the four possible combinations of TRUE and FALSE, 3 give FALSE and only 1 (TRUE & TRUE) gives TRUE.
OR (|): of the four combinations, only 1 (FALSE | FALSE) gives FALSE and the other 3 give TRUE.

In case of AND: &

> LOG1=TRUE
> LOG2=FALSE
> LOG1&LOG2
[1] FALSE
> LOG2&LOG1
[1] FALSE
> LOG2&LOG2
[1] FALSE
> LOG1&LOG1
[1] TRUE

In case of OR: |
> LOG1=TRUE
> LOG2=FALSE
> LOG1|LOG2
[1] TRUE
> LOG2|LOG1
[1] TRUE
> LOG2|LOG2
[1] FALSE
> LOG1|LOG1
[1] TRUE
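Since & and | work element by element, all four combinations can be checked in one go. A minimal truth-table sketch (the vector names x and y are illustrative):

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)
and_result <- x & y   # TRUE only where both are TRUE
or_result  <- x | y   # FALSE only where both are FALSE
print(data.frame(x, y, and_result, or_result))
print(!x)             # NOT flips each value
```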

DATA STRUCTURES
1. Vector: homogeneous, one-dimensional data structure.
2. List: heterogeneous, one-dimensional data structure (each element keeps its own data
type; nothing is converted).
3. Matrix: 2-dimensional homogeneous data structure.
4. Array: multi-dimensional homogeneous data structure made up of matrices, which
means we can stack one matrix over another matrix.
5. Factor: stores categorical data as levels, each coded internally as an integer, which converts categorical data into numerical form.
6. Data frame: like a table or Excel sheet; a 2-dimensional heterogeneous data structure.
Two-dimensional means it has rows and columns, heterogeneous means its columns can hold different data types, and the
columns of the table are called variables.

1. Vector
A vector is a homogeneous, one-dimensional data structure. When values of different types are combined,
R converts them all to a single type, and character has the highest precedence.
Example 1: a numeric vector.
> VEC1<-c(1,2,3)
> VEC1
[1] 1 2 3
> class(VEC1)
[1] "numeric"

Example 2: a character vector.
> VEC=c("varun","Ekagr","bharti")
> VEC
[1] "varun" "Ekagr" "bharti"
> class(VEC)
[1] "character"

Example 3: define a variable SHIVANGI with the values TRUE, FALSE, TRUE.

> SHIVANGI=c(TRUE,FALSE,TRUE)
> class(SHIVANGI)
[1] "logical"

> MIXBAG1<-c(1,TRUE,2,FALSE)
> MIXBAG1
[1] 1 1 2 0
> class(MIXBAG1)
[1] "numeric"
> MIXBAG2<-c(1,"A",2,"B")
> MIXBAG2
[1] "1" "A" "2" "B"
> class(MIXBAG2)
[1] "character"
> MIXBAG3<-c(1,"A",TRUE)
> MIXBAG3
[1] "1" "A" "TRUE"
> class(MIXBAG3)
[1] "character"

HIGHEST PRECEDENCE IS OF CHARACTER, THEN NUMERIC, THEN LOGICAL (CHARACTER > NUMERIC > LOGICAL)


> MIXBAG2[1]
[1] "1"
> MIXBAG2[2]
[1] "A"
> MIXBAG2[1:3]
[1] "1" "A" "2"
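Indexing accepts more than single positions. A short sketch on the same MIXBAG2 vector showing a vector of positions and a negative index, which drops an element instead of selecting it:

```r
MIXBAG2 <- c(1, "A", 2, "B")  # coerced to character, as above
print(MIXBAG2[c(2, 4)])       # "A" "B": pick positions 2 and 4
print(MIXBAG2[-1])            # "A" "2" "B": everything except position 1
print(length(MIXBAG2))        # 4
```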

DEFINE A VARIABLE M1 WITH THE VALUES 10 TO 20, THEN FETCH
THE 8TH VALUE
> M1=c(10:20)
> M1
[1] 10 11 12 13 14 15 16 17 18 19 20
> M1[8]
[1] 17

2. List
A heterogeneous, one-dimensional data structure: each element keeps its own data
type, so no values are converted.
> L1=list(1,"a",T)
> L1
[[1]]
[1] 1

[[2]]
[1] "a"

[[3]]
[1] TRUE

> class(L1[[1]])
[1] "numeric"
> class(L1[[2]])
[1] "character"
> class(L1[[3]])
[1] "logical"

A list of three vectors

> L2=list(c(1,2,3),c("a","b","c"),c(TRUE,FALSE,TRUE))
> L2
[[1]]
[1] 1 2 3

[[2]]
[1] "a" "b" "c"

[[3]]
[1] TRUE FALSE TRUE
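Single brackets return a smaller list, while double brackets return the element itself; elements can also be named and reached with $. A sketch (the names nums, chars and flags are illustrative):

```r
L2 <- list(c(1, 2, 3), c("a", "b", "c"), c(TRUE, FALSE, TRUE))
print(class(L2[2]))    # "list": single brackets keep the list wrapper
print(class(L2[[2]]))  # "character": double brackets extract the vector
print(L2[[2]][3])      # "c": third value of the second element
named <- list(nums = c(1, 2, 3), chars = c("a", "b", "c"), flags = c(TRUE, FALSE, TRUE))
print(named$chars)     # "a" "b" "c"
```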

3. Matrix
A 2-dimensional homogeneous data structure.
> m1=matrix(c(1,2,3,4,5,6))
> m1
[,1]
[1,] 1
[2,] 2
[3,] 3
[4,] 4
[5,] 5
[6,] 6
> m1=matrix(c(1,2,3,4,5,6),nrow=2,ncol=3)
> m1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> m1=matrix(c(1,2,3,4,5,6),nrow=2,ncol=3,byrow=TRUE)
> m1
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
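Once a matrix exists, single cells, whole rows or whole columns can be pulled out with [row, column]. A sketch on the byrow = TRUE matrix above:

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
print(m1[2, 3])  # row 2, column 3: 6
print(m1[1, ])   # whole first row: 1 2 3
print(m1[, 2])   # whole second column: 2 5
print(dim(m1))   # 2 3
print(t(m1))     # transpose: 3 rows x 2 columns
```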

4. Array
A multi-dimensional homogeneous data structure. It is also a composition of matrices,
which means we can stack one matrix over another matrix.
> vec1=(1:6)
> vec2=(7:12)
> vec2
[1] 7 8 9 10 11 12
> array(c(vec1,vec2),dim=c(2,3,2))
, , 1

[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6

, , 2

[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12

In dim=c(2,3,2), the first 2 is the number of rows, 3 is the number of columns and
the last 2 is the number of matrices.

An element is accessed as a1[row, column, matrix]:

> a1=array(c(vec1,vec2),dim=c(2,3,2))
> a1[1,2,2]
[1] 9
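A short sketch of a few more ways to slice the same array; leaving an index empty selects everything along that dimension:

```r
vec1 <- 1:6
vec2 <- 7:12
a1 <- array(c(vec1, vec2), dim = c(2, 3, 2))
print(dim(a1))      # 2 3 2
print(a1[, , 1])    # the whole first matrix
print(a1[2, 3, 1])  # row 2, column 3 of matrix 1: 6
print(a1[1, 2, ])   # row 1, column 2 across both matrices: 3 9
```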

5. Factors
Factors become very important when building machine learning models: many machine
learning models cannot work directly on categorical data and need the data in
numerical form.

Factors are used to convert categorical data into numerical data.

> jam=c("blue","green","yellow")
> jam
[1] "blue" "green" "yellow"
> factor(jam)
[1] blue green yellow
Levels: blue green yellow
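Internally each level is stored as an integer code, which is what makes a factor usable as numerical input. A sketch (the second colour vector is illustrative); note that levels are sorted alphabetically by default:

```r
jam <- c("blue", "green", "yellow")
f <- factor(jam)
print(levels(f))            # "blue" "green" "yellow"
print(as.integer(f))        # 1 2 3: the numeric codes behind the labels
colours <- factor(c("red", "blue", "red"))
print(as.integer(colours))  # 2 1 2, because "blue" sorts before "red"
print(nlevels(colours))     # 2
```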

6. Data Frame
Like a table or Excel sheet
(a 2-dimensional heterogeneous data structure: two-dimensional means it has rows and columns,
heterogeneous means its columns can hold different data types). The columns of the table are called variables.
> data.frame(fruit_name=c("apple","banana","guava"),fruit_cost=c(10,20,30))
fruit_name fruit_cost
1 apple 10
2 banana 20
3 guava 30
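Columns of a data frame are reached with $, and rows can be kept with a logical condition inside [rows, columns]. A short sketch, storing the same table in a variable named fruits (an illustrative name):

```r
fruits <- data.frame(fruit_name = c("apple", "banana", "guava"),
                     fruit_cost = c(10, 20, 30))
print(fruits$fruit_cost)                 # 10 20 30
print(mean(fruits$fruit_cost))           # 20
print(nrow(fruits))                      # 3 rows (observations)
print(fruits[fruits$fruit_cost > 15, ])  # rows for banana and guava only
```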

Inbuilt functions
1. str: structure of the data (type and first values of each column)
2. tail: last six rows
3. head: first six rows
4. table: count of each value
5. min: smallest value
6. max: largest value
7. mean: average value
8. range: smallest and largest value together

1. str(iris)
> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

2. tail(iris)
> tail(iris)
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
145          6.7         3.3          5.7         2.5 virginica
146          6.7         3.0          5.2         2.3 virginica
147          6.3         2.5          5.0         1.9 virginica
148          6.5         3.0          5.2         2.0 virginica
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica

3. head(iris)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

4. table(iris$Species)

> table(iris$Species)

    setosa versicolor  virginica
        50         50         50

min, max, mean and range
> min(iris$Sepal.Length)
[1] 4.3
> max(iris$Sepal.Width)
[1] 4.4
> mean(iris$Petal.Length)
[1] 3.758
> range(iris$Petal.Width)
[1] 0.1 2.5
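These summaries can also be taken per group. A sketch using base R's tapply() to get the mean sepal length of each species, plus summary(), which reports the minimum, quartiles, median, mean and maximum in one call:

```r
data(iris)   # built-in dataset shipped with R
print(dim(iris))   # 150 5
# Mean sepal length per species (setosa, versicolor, virginica)
by_species <- tapply(iris$Sepal.Length, iris$Species, mean)
print(by_species)
print(summary(iris$Sepal.Length))
```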

Decision making
if and if...else

> if(iris$Sepal.Length[1]>4){print("sepal length is greater than 4")}
[1] "sepal length is greater than 4"
> if(iris$Sepal.Length[1]>4){print("sepal length is greater than 4")}else{print("sepal length is not greater than 4")}
[1] "sepal length is greater than 4"
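When there are more than two outcomes, conditions can be chained with else if, and ifelse() applies a condition to a whole vector at once. A sketch; the helper name sepal_band and the cut-off values 5, 6 and 4.8 are arbitrary choices for illustration:

```r
# Hypothetical helper classifying one sepal length into three bands
sepal_band <- function(len) {
  if (len < 5) {
    "short"
  } else if (len < 6) {
    "medium"
  } else {
    "long"
  }
}
print(sepal_band(iris$Sepal.Length[1]))  # first flower is 5.1, so "medium"
# ifelse() is vectorised: one result per element
first5 <- iris$Sepal.Length[1:5]         # 5.1 4.9 4.7 4.6 5.0
print(ifelse(first5 > 4.8, "above 4.8", "4.8 or below"))
```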

Looping statements
for loop
> vec1=1:9
> for(i in vec1){print(i+5)}
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
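A for loop can also build up a result step by step. A sketch that adds up the same values 1 to 9 (sum() gives the same answer in one call):

```r
vec1 <- 1:9
total <- 0
for (i in vec1) {
  total <- total + i  # add each value to the running total
}
print(total)          # 45
print(sum(vec1))      # 45, the built-in shortcut
```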

While loop
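A while loop repeats its body as long as a condition stays TRUE, so something inside the loop must eventually make the condition FALSE. A minimal illustrative sketch that prints the same values 6 to 14 as the for loop above:

```r
i <- 1
results <- c()
while (i <= 9) {
  print(i + 5)
  results <- c(results, i + 5)  # keep the values for checking
  i <- i + 1                    # without this the loop would never end
}
```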

USER-DEFINED FUNCTION

[1] 2000
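A user-defined function is created with function() and returns the value of its last expression. As an illustrative sketch (the name simple_interest is hypothetical), here is the simple-interest formula P*R*T/100, with R in percent, wrapped in a function:

```r
# Hypothetical function: simple interest = P * R * T / 100, with R in percent
simple_interest <- function(principal, rate, time) {
  principal * rate * time / 100
}
print(simple_interest(10000, 10, 3))  # 3000
print(simple_interest(5000, 8, 2))    # 800
```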

Data manipulation
Data manipulation is the process of filtering, aggregating and summarising
data.

> install.packages("dplyr")
library() is the function that loads an installed package so that its functions can be used:
> library(dplyr)

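CSV: comma-separated values. read.csv() reads a CSV file into a data frame; file.choose() lets you pick the file interactively, and header=T means the first row holds the column names.

> laptops<-read.csv(file.choose(),header=T)
> View(laptops)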
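read.csv() does not need an interactive picker: it also takes a file path, or literal CSV text through its text argument. A self-contained sketch using a few invented rows (the column names follow those used later in these notes; the values are made up):

```r
csv_text <- "brand,model,latest_price
acer,model_a,32000
hp,model_b,45000
acer,model_c,61000"
demo <- read.csv(text = csv_text, header = TRUE)
str(demo)  # 3 obs. of 3 variables
# With a real file the call would look like read.csv("laptops.csv"),
# where "laptops.csv" is a hypothetical file name in the working directory.
```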

Pipe Function
The pipe %>% is used to connect different functions: it passes the data frame on its left to the function on its right.

select function
select() keeps only the chosen columns. Here the pipe passes laptops to select():
> laptops%>%select(1,2)->laptops1_2
> laptops%>%select(3:6)->laptops3_6

Filter
filter() keeps only the rows that satisfy a condition.
> laptops %>% filter(brand=="acer")->laptop_acer
> View(laptop_acer)

> laptops %>% filter(discount>10)->laptop_discount
> View(laptop_discount)

> laptops %>% filter(display_size>15)->laptop_display15
> View(laptop_display15)

> laptops %>% filter(brand=="acer"&old_price>25000)->laptop_price
> View(laptop_price)

COMBINING SELECT & FILTER

> laptops %>% select("brand","model","latest_price") %>% filter(brand=="acer")->laptopfilterselect

> laptops %>% select("brand","model","latest_price","ram_gb","ram_type") %>% filter(brand=="acer"&ram_gb=="4 GB GB")->laptop11
> View(laptop11)
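The same select-then-filter chain can be tried without the CSV file on a small data frame of invented rows; the sketch also adds group_by() and summarise() for the summarising step mentioned under data manipulation. It assumes the dplyr package is installed:

```r
library(dplyr)
# Invented example rows; the real laptops data has many more columns
laptops_demo <- data.frame(
  brand        = c("acer", "hp", "acer", "dell"),
  model        = c("model_a", "model_b", "model_c", "model_d"),
  latest_price = c(32000, 45000, 61000, 52000)
)
acer_models <- laptops_demo %>%
  select(brand, model, latest_price) %>%
  filter(brand == "acer" & latest_price > 40000)
print(acer_models)  # only model_c
# Average price per brand
avg_price <- laptops_demo %>%
  group_by(brand) %>%
  summarise(mean_price = mean(latest_price))
print(avg_price)
```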

