
OMBAIML 301:

Basics of Artificial Intelligence & Machine Learning

Unit 2: R Essential Programming

By: Asst. Prof. Toshi Dave


Introduction To R
▪ R is an interpreted programming language.
▪ Used for:
✓ Statistical inference, such as linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering
✓ Data Analysis
✓ ML Algorithms
✓ Graphical Representation

R Data Types

Data Type    Description                                 Example
Numeric      Represents real numbers (floating-point)    3.14, -45.67
Integer      Represents whole numbers                    42L, -100L (the L suffix marks an integer literal)
Character    Represents text or strings                  "Hello, R", 'Data'
Logical      Represents boolean values                   TRUE, FALSE
Complex      Represents complex numbers                  2 + 3i, -1 - 0.5i
Raw          Represents raw bytes                        as.raw(1:5)
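
The data types above can be checked directly in an R session. The following is a minimal sketch, assuming only base R; the variable names `values` and `types` are chosen for illustration.

```r
# One value of each basic type listed above
values <- list(
  numeric   = 3.14,
  integer   = 42L,          # the L suffix requests an integer
  character = "Hello, R",
  logical   = TRUE,
  complex   = 2 + 3i,
  raw       = as.raw(5)
)

# class() reports the type of each value
types <- sapply(values, class)
print(types)

# Without the L suffix, a whole number is stored as numeric (double)
class(42)    # "numeric"
class(42L)   # "integer"
```

Note that a whole number typed without the `L` suffix is stored as numeric, which is why the two `class()` calls at the end give different answers.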

R Data Structures

Data Structure    Description                                                                                               Example
Vector            A one-dimensional array of elements of the same data type.                                                c(1, 2, 3)
Matrix            A two-dimensional array with rows and columns, containing elements of the same data type.                 matrix(1:6, nrow = 2)
Array             A multi-dimensional extension of matrices, containing elements of the same data type.                     array(1:8, dim = c(2, 2, 2))
List              A flexible container that can hold elements of different data types, including other data structures.    list(name = "John", age = 30)
Factor            A data structure for categorical variables, with predefined levels.                                       factor(c("Male", "Female", "Male"))
Data Frame        A two-dimensional, table-like structure for storing data, where columns can be of different data types.   data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
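
To see how these structures behave, the examples from the table can be created and inspected in one short script. This is a sketch using base R only; the short variable names (`v`, `m`, `a`, `l`, `f`, `df`) are illustrative.

```r
v  <- c(1, 2, 3)                           # vector
m  <- matrix(1:6, nrow = 2)                # 2 x 3 matrix, filled column by column
a  <- array(1:8, dim = c(2, 2, 2))         # 2 x 2 x 2 array
l  <- list(name = "John", age = 30)        # list holding mixed types
f  <- factor(c("Male", "Female", "Male"))  # factor with two levels
df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))

dim(m)      # 2 3
m[2, 3]     # 6 (row 2, column 3)
a[2, 2, 2]  # 8 (last element)
l$name      # "John"
levels(f)   # "Female" "Male" (sorted alphabetically by default)
df$Age      # 25 30
str(df)     # compact overview of columns and their types
```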

Importing Data To R

Exporting Data From R
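
As a minimal sketch of both directions, the example below writes a data frame to a CSV file and reads it back. CSV is an assumption here, since no file format is named above; the data and the names `people`, `csv_path`, and `imported` are illustrative.

```r
# A small data frame to export (illustrative data)
people <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))

# Export: write the data frame to a CSV file in a temporary location
csv_path <- tempfile(fileext = ".csv")
write.csv(people, csv_path, row.names = FALSE)

# Import: read the same file back into a new data frame
imported <- read.csv(csv_path)
print(imported)
```

Setting `row.names = FALSE` keeps R from writing an extra column of row numbers, so the imported data frame has the same two columns as the original.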

Control Structures in R

Conditional Statements: 1) if  2) if-else  3) if-else if-else
Loops: 1) for  2) while  3) repeat
Jumping out of Loops: 1) break  2) next
Functions: 1) return

SUMMARY
Control Structure: if
Description: Executes a block of code if a condition is TRUE.
Example:
    if (condition) {
      # Code to execute
    }

Control Structure: if-else
Description: Executes one block of code if a condition is TRUE and another if it's FALSE.
Example:
    if (condition) {
      # Code for TRUE
    } else {
      # Code for FALSE
    }

Control Structure: if-else if-else
Description: Allows for multiple conditions to be tested in sequence.
Example:
    if (condition1) {
      # Code for condition1
    } else if (condition2) {
      # Code for condition2
    } else {
      # Code if no condition is met
    }

Control Structure: for
Description: Iterates over a sequence of values.
Example:
    for (variable in sequence) {
      # Code to execute
    }
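
The conditional and `for` templates above can be combined into a small runnable example. This is a sketch only: the helper `classify_score`, its thresholds, and the sample scores are hypothetical and chosen purely for illustration.

```r
# Hypothetical helper: classify a score with if / else if / else
classify_score <- function(score) {
  if (score >= 75) {
    "Distinction"
  } else if (score >= 40) {
    "Pass"
  } else {
    "Fail"
  }
}

scores <- c(82, 55, 31)
labels <- character(length(scores))

# for loop: visit each position in the scores vector
for (i in seq_along(scores)) {
  labels[i] <- classify_score(scores[i])
}
print(labels)  # "Distinction" "Pass" "Fail"
```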

Control Structure: while
Description: Repeatedly executes a block of code as long as a condition is TRUE.
Example:
    while (condition) {
      # Code to execute
    }

Control Structure: repeat
Description: Repeatedly executes a block of code until explicitly stopped using break.
Example:
    repeat {
      # Code to execute
      if (condition) {
        break  # Exit the loop when a condition is met
      }
    }

Control Structure: break
Description: Exits a loop prematurely.
Example:
    for (i in 1:10) {
      if (i == 5) {
        break  # Exit loop when i equals 5
      }
    }
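
The following sketch fills the `while`, `repeat`, and `break` templates with concrete conditions so they can be run as written. The variable names and the stopping values are illustrative assumptions.

```r
# while: keep doubling until the value reaches at least 100
x <- 1
steps <- 0
while (x < 100) {
  x <- x * 2
  steps <- steps + 1
}
# x is now 128, reached after 7 doublings

# repeat: runs indefinitely unless break is called
count <- 0
repeat {
  count <- count + 1
  if (count == 3) {
    break  # leave the loop once count reaches 3
  }
}

# break inside a for loop: stop at i == 5, so only 1 to 4 are collected
seen <- c()
for (i in 1:10) {
  if (i == 5) {
    break
  }
  seen <- c(seen, i)
}
print(seen)  # 1 2 3 4
```

A `repeat` loop has no condition of its own, so forgetting the `break` produces an infinite loop.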

Control Structure: next
Description: Skips the current iteration and moves to the next one in a loop.
Example:
    for (i in 1:5) {
      if (i == 3) {
        next  # Skip iteration when i equals 3
      }
      # Code to execute for other iterations
    }

Control Structure: return
Description: Exits a function and returns a value.
Example:
    my_function <- function(x) {
      if (x > 10) {
        return("Greater than 10")
      } else {
        return("Less than or equal to 10")
      }
    }
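
To observe the effect of `next` and `return`, the sketch below records which iterations actually run and then calls `my_function` with two inputs. The vector `kept` and the test inputs are illustrative additions.

```r
# next: skip i == 3 and keep the remaining values
kept <- c()
for (i in 1:5) {
  if (i == 3) {
    next
  }
  kept <- c(kept, i)
}
print(kept)  # 1 2 4 5

# return: exit the function with a value as soon as a branch is chosen
my_function <- function(x) {
  if (x > 10) {
    return("Greater than 10")
  } else {
    return("Less than or equal to 10")
  }
}
my_function(15)  # "Greater than 10"
my_function(10)  # "Less than or equal to 10"
```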

DESCRIPTIVE STATISTICS

Data Exploration

Qualitative & Quantitative Data

Qualitative Data
▪ Represents categories or labels.
▪ Nominal properties (no inherent order).
▪ Discrete values (distinct categories).
▪ Examples: Types of fruit, colors, marital status.
▪ Analyzed using frequency counts, percentages, and visual tools like bar charts and pie charts.

Quantitative Data
▪ Represents numerical measurements.
▪ Ordinal or continuous properties (can be ranked or ordered).
▪ Values can be discrete or continuous.
▪ Examples: Age, height, income, temperature.
▪ Analyzed using measures of central tendency, measures of dispersion, and statistical tests. Visualized using histograms, scatter plots, and box plots.
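
The two kinds of data call for different first steps in R. The sketch below uses made-up values for marital status and age; the variable names are illustrative, and the plotting calls are left commented out so the script also runs in a non-interactive session.

```r
# Illustrative data (made up for this sketch)
marital_status <- factor(c("Single", "Married", "Married", "Single", "Married"))
age <- c(23, 35, 41, 29, 52)

# Qualitative: frequency counts and percentages
counts <- table(marital_status)
percentages <- prop.table(counts) * 100
print(counts)
print(percentages)

# Quantitative: numerical summary (min, quartiles, mean, max)
print(summary(age))

# Typical plots for each kind of data (uncomment in an interactive session)
# barplot(counts)
# hist(age)
# boxplot(age)
```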

MEASURES OF CENTRAL TENDENCY

✓ Mean represents the "average" value and is sensitive to outliers.

✓ Mode represents the most frequently occurring value(s) and is suitable for categorical data.

✓ Median represents the middle value and is robust against outliers.
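
A short example makes the effect of an outlier visible. This is a sketch on made-up data; `stat_mode` is a hypothetical helper, needed because base R has no built-in function for the statistical mode (the base function `mode()` reports how an object is stored instead).

```r
x <- c(2, 3, 3, 5, 7, 100)   # 100 is an outlier

mean(x)     # 20, pulled upward by the outlier
median(x)   # 4, barely affected by the outlier

# Hypothetical helper: return the most frequent value(s) as character strings
stat_mode <- function(v) {
  counts <- table(v)
  names(counts)[counts == max(counts)]
}
stat_mode(x)                        # "3"
stat_mode(c("red", "blue", "red"))  # "red"
```

Because `stat_mode` works on frequency counts, it applies to categorical data as well as numbers, and it returns every value that ties for the highest count.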

Measure of Position

Measure of Position    Description                                               Notation
Quartiles              Divide data into four equal parts.                        Q1 (25th percentile), Q2 (median), Q3 (75th percentile)
Deciles                Divide data into ten equal parts.                         D1 (10th percentile), D2, D3, ..., D9 (90th percentile)
Percentiles            Divide data into 100 equal parts.                         P1 (1st percentile), P5, P10, ..., P90, P95, P99 (99th percentile)
Quantiles              Generalization of quartiles, deciles, and percentiles.    Quintiles, Septiles, Noniles, etc.
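
In R, all of these measures can be obtained from the single base function `quantile()` by choosing the cut points passed as `probs`. The sketch below assumes the values 1 to 100 as sample data; the result names are illustrative.

```r
x <- 1:100

# Quartiles: cut points at 25%, 50%, 75%
quartiles <- quantile(x, probs = c(0.25, 0.50, 0.75))

# Deciles: cut points at 10%, 20%, ..., 90%
deciles <- quantile(x, probs = seq(0.1, 0.9, by = 0.1))

# A single percentile, here the 95th
p95 <- quantile(x, probs = 0.95)

# Quintiles: cut points at 20%, 40%, 60%, 80%
quintiles <- quantile(x, probs = seq(0.2, 0.8, by = 0.2))

print(quartiles)
print(deciles)
```

`quantile()` interpolates between data points by default, and its `type` argument selects among several common definitions, so results may differ slightly from hand calculations or other software.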

Measure of Dispersion

Measure of Dispersion              Description                                                                                          Formula
Range                              The difference between the maximum and minimum values in a dataset.                                  Range = Maximum Value - Minimum Value
Median Absolute Deviation (MAD)    The median of the absolute deviations of data points from the median.                                MAD = Median(|xi - Median|)
Variance                           The average of the squared differences between each data point and the mean (sample form, n - 1).    Variance = Σ(xi - Mean)² / (n - 1)
Standard Deviation                 The square root of the variance; provides a more interpretable measure of dispersion.                Standard Deviation = √(Variance)
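
Each of these measures has a direct counterpart in base R. The sketch below uses a small made-up vector so the results are easy to verify by hand; the variable names are illustrative.

```r
x <- c(2, 4, 4, 5, 7, 9, 11)

# Range: maximum minus minimum
data_range <- max(x) - min(x)   # 9

# Median absolute deviation. R's mad() multiplies by 1.4826 by default,
# so constant = 1 is needed to get the raw value
mad_raw <- mad(x, constant = 1)               # 2
mad_manual <- median(abs(x - median(x)))      # same calculation by hand

# Sample variance (divides by n - 1) and standard deviation
variance <- var(x)   # 10
std_dev <- sd(x)     # square root of the variance

print(c(range = data_range, mad = mad_raw, variance = variance, sd = std_dev))
```

Here the mean is 6 and the squared deviations sum to 60, so dividing by n - 1 = 6 gives a variance of 10.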

THANK YOU
