Babd Mid-Term

NATIONAL INSTITUTE OF FASHION
TECHNOLOGY
BUSINESS ANALYTICS AND BIG DATA
MID-TERM ASSIGNMENT
SUBMITTED BY:
Vanshika
ACKNOWLEDGEMENT
We would like to express our heartfelt gratitude to all those who contributed to the creation
and completion of this document.
We extend our thanks to Asst. Prof. Akanksha Dayma for her expertise, thoughtful insights,
and dedication, which greatly enriched the content of this document.
Additionally, we acknowledge our college, National Institute of Fashion Technology, Kangra, for
providing the necessary resources and an environment conducive to the development of this
material.
Data types of R
1. Numeric
2. Complex
3. Logical
4. Character
1. Numeric Data Type
The numeric data type stores numbers, both whole numbers and decimals (for example 5 or 3.14).
> NUM1 = 5
> NUM1
[1] 5
> class(NUM1)
[1] "numeric"
> A = 3.14
> class(A)
[1] "numeric"
2. Complex Data Type
The complex data type stores numbers with an imaginary component.
> COMPLEX1 = 5+3i
> COMPLEX1
[1] 5+3i
> class(COMPLEX1)
[1] "complex"
3. Logical Data Type
The logical data type stores logical values like TRUE or FALSE.
> log1 = FALSE
> log1
[1] FALSE
> class(log1)
[1] "logical"
4. Character Data Type
The character data type stores text values, such as a name, written in double
quotation marks.
> A = "pen"
> A
[1] "pen"
> A = " vans "
> A
[1] " vans "
> class(A)
[1] "character"
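The four data types above can also be checked in one step. This is a minimal sketch; the list name values and the element names are illustrative assumptions, not part of the assignment.

```r
# Illustrative values, one per data type
values <- list(num = 3.14, cplx = 5+3i, lgl = FALSE, chr = "pen")

# class() reports the data type of each value
types <- sapply(values, class)
print(types)
```

sapply() applies class() to every element of the list and returns the results as one named character vector.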
TYPES OF OPERATORS:
• Assignment - used to assign a value to a variable. It can be written in 3 ways:
b = 1 or b<-1 or 1->b. Writing 1>-b does not assign anything: R reads it as the
comparison 1 > -b, so it returns TRUE or FALSE if b exists and shows an error if it does not.
• Arithmetic - used on numeric values, e.g. +, -, *, /
• Relational - to show comparison, e.g. <, >, ==, !=
• Logical - AND (&), OR (|)
Assignment Operators
B is a variable name (possibly quoted).
1 is the value to be assigned to B
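The three assignment forms can be compared side by side. This is a minimal sketch; the names b1, b2 and b3 are illustrative assumptions.

```r
# Three equivalent ways to assign the value 1
b1 = 1
b2 <- 1
1 -> b3

# All three variables now hold the same value
print(c(b1, b2, b3))

# 1>-b1 is read as the comparison 1 > -b1, not as an assignment
print(1>-b1)
```

Because b1 is 1, the last line compares 1 with -1 and prints a logical value; b1 itself is left unchanged.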
ARITHMETIC (+, -, *, /)
> NUM1=100
> NUM2=25
> NUM1+NUM2
[1] 125
> NUM1-NUM2
[1] 75
> NUM1*NUM2
[1] 2500
> NUM1/NUM2
[1] 4
> NUM2-NUM1
[1] -75
> NUM2/NUM1
[1] 0.25
Q. Write a program to calculate simple interest where principal amount = 10,000, ROI = 10%, Time = 3

> P=10000
> R=10
> T=3
> P*R*T/100
[1] 3000
Q. Calculate gross salary of a person in which gross salary =
basic+HRA+DA-PF
Basic = 60,000
HRA = 11% of basic (0.11)
DA = 30% of basic (0.3)
PF = 12% of basic (0.12)
Ans.
> B=60000
> H=60000*11/100
> H
[1] 6600
> D=60000*30/100
> P=60000*12/100
> B+H+D-P
[1] 77400
> S=B+H+D-P
> S
[1] 77400
Relational (<, >, ==, != for NOT EQUAL)

> A=70
> B=130
> A>B
[1] FALSE
> p=20
> q=20
> p!=q
[1] FALSE
> P=20
> Q=20
> P==Q
[1] TRUE
LOGICAL:
AND (&): of the four input combinations, 3 give FALSE and 1 gives TRUE
OR (|): of the four input combinations, 1 gives FALSE and 3 give TRUE
In case of AND: &

> LOG1=TRUE
> LOG2=FALSE
> LOG1&LOG2
[1] FALSE
> LOG2&LOG1
[1] FALSE
> LOG2&LOG2
[1] FALSE
> LOG1&LOG1
[1] TRUE
In case of OR: |
> LOG1=TRUE
> LOG2=FALSE
> LOG1|LOG2
[1] TRUE
> LOG2|LOG1
[1] TRUE
> LOG2|LOG2
[1] FALSE
> LOG1|LOG1
[1] TRUE
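The AND and OR results can be collected into one truth table. This is a minimal sketch; the names x, y and truth are illustrative assumptions.

```r
# All four combinations of two logical inputs
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

# & and | work element by element on vectors
truth <- data.frame(x = x, y = y, and = x & y, or = x | y)
print(truth)
```

Counting the TRUE values in each result column shows the pattern described above: AND is TRUE for only one combination, OR for three.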
DATA STRUCTURES
1. Vector: Homogeneous one-dimensional data structure.
2. List: Heterogeneous one-dimensional data structure (each element keeps its own data
type; nothing is converted).
3. Matrix: 2-dimensional homogeneous data structure.
4. Array: Multi-dimensional homogeneous data structure; a composition of matrices, which
means we can stack one matrix over another matrix.
5. Factor: Stores categorical data as integer codes with labels (levels), which changes
categorical data into numerical data.
6. Data frame: Table or Excel sheet; 2-dimensional heterogeneous data structure.
2-dimensional means rows and columns, heterogeneous means different data types, and the
columns of the table are called variables.
1. Vector
It is a homogeneous one-dimensional data structure. When types are mixed, every value is
converted to one type, and character has the highest precedence.
Example1 - where vector is in numeric form.
> VEC1<-c(1,2,3)
> VEC1
[1] 1 2 3
> class(VEC1)
[1] "numeric"
Example2 - where vector is in character form.
> VEC=c("varun","Ekagr","bharti")
> VEC
[1] "varun"  "Ekagr"  "bharti"
> class(VEC)
[1] "character"
Example3 - Define a variable SHIVANGI, put values TRUE, FALSE, TRUE

> SHIVANGI=c(TRUE,FALSE,TRUE)
> class(SHIVANGI)
[1] "logical"
> MIXBAG1<-c(1,TRUE,2,FALSE)
> MIXBAG1
[1] 1 1 2 0
> class(MIXBAG1)
[1] "numeric"
> MIXBAG2<-c(1,"A",2,"B")
> MIXBAG2
[1] "1" "A" "2" "B"
> class(MIXBAG2)
[1] "character"
> MIXBAG3<-c(1,"A",TRUE)
> MIXBAG3
[1] "1"    "A"    "TRUE"
> class(MIXBAG3)
[1] "character"
PRECEDENCE WHEN TYPES ARE MIXED: CHARACTER > NUMERIC > LOGICAL
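The same precedence can be seen with explicit conversions. This is a minimal sketch using base R's as.numeric() and as.character(); the name mixed_classes is an illustrative assumption.

```r
# Explicit conversions follow the same rules as mixing types in c()
print(as.numeric(TRUE))       # logical to numeric
print(as.character(2))        # numeric to character
print(as.character(FALSE))    # logical to character

# Class of a vector for each pair of mixed types
mixed_classes <- c(
  class(c(TRUE, 1)),
  class(c(1, "A")),
  class(c(TRUE, "A"))
)
print(mixed_classes)
```

Whenever a character value is present the whole vector becomes character; a logical value mixed only with numbers becomes numeric.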

> MIXBAG2[1]
[1] "1"
> MIXBAG2[2]
[1] "A"
> MIXBAG2[1:3]
[1] "1" "A" "2"
DEFINE A VARIABLE M1, PUT THE VALUES 10 TO 20, NOW FETCH
THE 8TH VALUE
> M1=c(10:20)
> M1
[1] 10 11 12 13 14 15 16 17 18 19 20
> M1[8]
[1] 17
2. List
Heterogeneous one-dimensional data structure (each element keeps its own data type;
nothing is converted)
> L1=list(1,"a",T)
> L1
[[1]]
[1] 1
[[2]]
[1] "a"
[[3]]
[1] TRUE
> class(L1[[1]])
[1] "numeric"
> class(L1[[2]])
[1] "character"
> class(L1[[3]])
[1] "logical"
A list of three vectors

> L2=list(c(1,2,3),c("a","b","c"),c(TRUE,FALSE,TRUE))
> L2
[[1]]
[1] 1 2 3
[[2]]
[1] "a" "b" "c"
[[3]]
[1] TRUE FALSE TRUE
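List elements can be reached in two ways, and the difference matters. This is a minimal sketch that rebuilds L2 as above; the names second and wrapped are illustrative assumptions.

```r
L2 <- list(c(1, 2, 3), c("a", "b", "c"), c(TRUE, FALSE, TRUE))

# [[ ]] returns the element itself (here a character vector)
second <- L2[[2]]
print(class(second))

# [ ] returns a smaller list that still wraps the element
wrapped <- L2[2]
print(class(wrapped))

# Indexing can be chained to reach a single value
print(L2[[2]][3])
```

Use [[ ]] when the stored vector itself is needed, and [ ] when the result should stay a list.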
3. Matrix
2-dimensional homogenous data structure
> m1=matrix(c(1,2,3,4,5,6))
> m1
     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4
[5,]    5
[6,]    6
> m1=matrix(c(1,2,3,4,5,6),nrow=2,ncol=3)
> m1
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> m1=matrix(c(1,2,3,4,5,6),nrow=2,ncol=3,byrow=TRUE)
> m1
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
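Once a matrix exists, its size and individual values can be read back with dim() and [row, column] indexing. This is a minimal sketch that rebuilds the byrow=TRUE matrix from above.

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

print(dim(m1))    # number of rows, then number of columns
print(m1[2, 3])   # value in row 2, column 3
print(m1[1, ])    # whole first row
print(m1[, 2])    # whole second column
```

Leaving the row or the column position empty selects everything along that dimension.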
4. Array
Multi-dimensional homogeneous data structure. It is also a composition of matrices,
which means we can stack one matrix over another matrix.
> vec1=(1:6)
> vec2=(7:12)
> vec2
[1]  7  8  9 10 11 12
> array(c(vec1,vec2),dim=c(2,3,2))
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
In dim=c(2,3,2):
the first 2 is the number of rows
3 is the number of columns
the last 2 is the number of matrices stacked in the array
An element is accessed as a1[row,column,matrix]
> a1=array(c(vec1,vec2),dim=c(2,3,2))
> a1[1,2,2]
[1] 9
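The third index selects which stacked matrix to read from. This is a minimal sketch, assuming the same 2 x 3 x 2 array filled with the values 1 to 12.

```r
a1 <- array(1:12, dim = c(2, 3, 2))

print(dim(a1))       # rows, columns, number of matrices
print(a1[, , 2])     # the whole second matrix
print(a1[2, 3, 1])   # row 2, column 3 of the first matrix
```

Leaving the row and column positions empty, as in a1[, , 2], returns one complete matrix from the stack.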
5. Factors
Factors become very important when we are building a machine learning model. Many
machine learning models cannot work directly on categorical data; they need
numerical data.
A factor stores each category as an integer code with a label (level), which changes
the categorical data into numerical data.

> jam=c("blue","green","yellow")
> jam
[1] "blue"   "green"  "yellow"
> factor(jam)
[1] blue   green  yellow
Levels: blue green yellow
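The numerical side of a factor becomes visible with levels() and as.integer(). This is a minimal sketch; the repeated "green" value and the name jam_factor are illustrative assumptions added to show that equal categories share one code.

```r
jam <- c("blue", "green", "yellow", "green")
jam_factor <- factor(jam)

print(levels(jam_factor))       # the distinct categories
print(as.integer(jam_factor))   # the integer code stored for each value
```

Each value is stored as the position of its category in the levels, so both "green" entries get the same code.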
6. Data Frame
Table or Excel sheet
(2-dimensional heterogeneous data structure)
(2-dimensional means rows and columns, heterogeneous means different data
types; the columns of the table are called variables)
> data.frame(fruit_name=c("apple","banana","guava"),fruit_cost=c(10,20,30))
  fruit_name fruit_cost
1      apple         10
2     banana         20
3      guava         30
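Storing the data frame in a variable makes its columns and rows accessible. This is a minimal sketch; the name fruits is an illustrative assumption, and stringsAsFactors = FALSE is set so that the text column stays character in every R version.

```r
fruits <- data.frame(
  fruit_name = c("apple", "banana", "guava"),
  fruit_cost = c(10, 20, 30),
  stringsAsFactors = FALSE
)

print(fruits$fruit_cost)          # one column, returned as a vector
print(fruits[2, ])                # the second row
print(class(fruits$fruit_name))   # each column keeps its own data type
print(mean(fruits$fruit_cost))    # a column can be passed to any function
```

The $ operator picks a column (variable) by name, while [row, column] indexing works the same way as for a matrix.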
Inbuilt function
1. str
2. tail
3. head
4. table
5. min
6. max
7. mean
8. range
1. str(iris)
> str(iris)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
2. tail(iris)
> tail(iris)
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
145          6.7         3.3          5.7         2.5 virginica
146          6.7         3.0          5.2         2.3 virginica
147          6.3         2.5          5.0         1.9 virginica
148          6.5         3.0          5.2         2.0 virginica
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica
3. head(iris)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
4. table(iris$Species)
> table(iris$Species)

    setosa versicolor  virginica
        50         50         50
> min(iris$Sepal.Length)
[1] 4.3
> max(iris$Sepal.Width)
[1] 4.4
> mean(iris$Petal.Length)
[1] 3.758
> range(iris$Petal.Width)
[1] 0.1 2.5
Decision making
if...else
> if(iris$Sepal.Length[1]>4){print("sepal length is greater than 4")}
[1] "sepal length is greater than 4"
> if(iris$Sepal.Length[1]>4){print("sepal length is greater than 4")}else{print("sepal length is not greater than 4")}
[1] "sepal length is greater than 4"
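The same decision can be written over several lines and reused for different values. This is a minimal sketch; describe_length and its cutoff argument are hypothetical names, not part of the assignment.

```r
# Hypothetical helper: returns a message depending on the comparison
describe_length <- function(x, cutoff = 4) {
  if (x > cutoff) {
    "greater than cutoff"
  } else {
    "not greater than cutoff"
  }
}

print(describe_length(5.1))
print(describe_length(3.2))

# ifelse() applies the same test to every element of a vector
print(ifelse(c(5.1, 3.2) > 4, "greater", "not greater"))
```

if...else tests a single value, so ifelse() is the usual choice when a whole column such as iris$Sepal.Length has to be checked at once.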
Looping statement
For loop
> vec1=1:9
> for(i in vec1){print(i+5)}
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
While loop
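A while loop repeats its body as long as a condition stays TRUE. This is a minimal sketch, assuming a counter named i and a limit of 5 (both illustrative choices).

```r
# Print the numbers 1 to 5 with a while loop
i <- 1
while (i <= 5) {
  print(i)
  i <- i + 1   # without this update the loop would never end
}
```

Unlike a for loop, a while loop does not step through a vector on its own, so the counter has to be updated inside the body.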
USER-DEFINED FUNCTION
[1] 2000
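A user-defined function is created with function() and stored under a name. This is a minimal sketch, assuming a hypothetical function simple_interest (not taken from the assignment) that uses the simple interest formula P*R*T/100 with the rate given in percent.

```r
# Hypothetical function: simple interest from principal, rate (in percent) and time
simple_interest <- function(principal, rate, time) {
  principal * rate * time / 100
}

# Principal 10,000 at 10% for 3 years
print(simple_interest(10000, 10, 3))
```

The value of the last expression in the body is returned, so no explicit return() is needed here.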
Data manipulation
Data manipulation is the process of filtering, aggregating and summarising
data.
> install.packages("dplyr")
library() is the function that loads an installed package
> library(dplyr)
CSV - comma separated values
> laptops<-read.csv(file.choose(),header=T)
> View(laptops)
Pipe Function
%>% is used to connect different functions: it passes a data frame from one function
to the next.
select function
select() keeps the chosen columns; here the pipe function is used to pass laptops to it.
> laptops%>%select(1,2)->laptops1_2
> laptops%>%select(3:6)->laptops3_6
Filter
> laptops %>% filter(brand=="acer")->laptop_acer
> View(laptop_acer)
> laptops %>% filter(discount>10)->laptop_discount
> View(laptop_discount)
> laptops %>% filter(display_size>15)->laptop_display15
> View(laptop_display15)
> laptops %>% filter(brand=="acer"&old_price>25000)->laptop_price
> View(laptop_price)
COMBINING SELECT & FILTER

> laptops %>% select("brand","model","latest_price") %>% filter(brand=="acer")->laptopfilterselect
> laptops %>% select("brand","model","latest_price","ram_gb","ram_type") %>% filter(brand=="acer"&ram_gb=="4 GB GB")->laptop11
> View(laptop11)
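The select and filter steps can be tried without the CSV file. This is a minimal sketch, assuming dplyr is installed; laptops_demo is a hypothetical stand-in for the laptops data, and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the laptops CSV; values are made up for illustration
laptops_demo <- data.frame(
  brand = c("acer", "acer", "hp"),
  model = c("A1", "A2", "H1"),
  latest_price = c(30000, 45000, 52000),
  stringsAsFactors = FALSE
)

# Keep two columns, then keep only the acer rows
laptops_demo %>%
  select(brand, latest_price) %>%
  filter(brand == "acer") -> acer_prices

print(acer_prices)
```

The order matters: filter() can only test columns that select() has kept, so brand must be among the selected columns.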
