
BTL ACTIVITY- MARKETING STRATEGY

OF YBI FOUNDATION EDUCATION


PRIVATE LIMITED
AN INTERNSHIP REPORT
Submitted by

CHAUDHARI ROHANKUMAR RAMESHBHAI


200490111006

In partial fulfillment for the award of the degree of

BACHELOR OF ENGINEERING
IN
ELECTRONICS & COMMUNICATION

SITARAMBHAI NARANIJI PATEL INSTITUTE OF


TECHNOLOGY & RESEARCH CENTER, UMRAKH

Gujarat Technological University, Ahmedabad


July – August 2023
SITARAMBHAI NARANIJI PATEL INSTITUTE OF
TECHNOLOGY & RESEARCH CENTER, UMRAKH

CERTIFICATE

This is to certify that the project report submitted along with the project entitled
Python Programming with Data Structure and Algorithm has been carried
out by Chaudhari RohanKumar Rameshbhai under my guidance in partial
fulfilment for the degree of Bachelor of Engineering in Electronics &
Communication, 6th semester of Gujarat Technological University, Ahmedabad
during the academic year 2022-2023.

Prof. Mayank A Patel Dr. Salman R Bombaywala


Internal Guide Head of Department
SITARAMBHAI NARANIJI PATEL INSTITUTE OF
TECHNOLOGY & RESEARCH CENTER, UMRAKH

DECLARATION
We hereby declare that the internship report submitted along with the internship
entitled Python Programming with Data Structures and Algorithms,
submitted in partial fulfilment for the degree of Bachelor of
Engineering in Electronics & Communication Engineering to Gujarat
Technological University, Ahmedabad, is a bonafide record of original project
work carried out by me at YBI FOUNDATION EDUCATION PRIVATE
LIMITED under the supervision of Mr. Anup Yadav (External guide) and
that no part of this report has been directly copied from any student report or
taken from any other source, without providing due reference.

Name of Student Sign of Student

Chaudhari RohanKumar
ACKNOWLEDGEMENT

We wish to express our sincere gratitude to our external guide, Mr. Anup Yadav,
for continuously guiding me at the industry and answering all my doubts with
patience. I am also grateful to the organization for the Python Programming with
Data Structures and Algorithms training it gave me. It is a privilege to have
people like Mr. Anup Yadav around who are always ready to help and guide us.
Also, thanks to our internal guide, Prof. Mayank A Patel, for helping us through
our internship and advising us in completing it.
Last but not least, a huge thanks to the Head of Department, Dr. Salman
Bombaywala, Principal Dr. Piyush Jain, and all the teaching staff of
SNPIT&RC for providing me with all the guidance, infrastructure and skills
required in the professional world.
Thus, in conclusion, we once again thank the staff members of
YBI FOUNDATION EDUCATION PRIVATE LIMITED for their valuable
support in the completion of this internship.
ABSTRACT

Significant improvements have been made in numerous domains as a result of


combining Python with data structures and algorithms. Python's simplicity has
lowered entrance barriers in computer science education, making it easier for
students to understand challenging ideas. Python's rapid prototyping capabilities
are advantageous to sectors like finance, bioinformatics, and machine learning
since they allow for the quick creation and testing of complex algorithms.
In conclusion, Python's symbiotic relationship with data structures and
algorithms supports contemporary software development and problem-solving.
The language is an essential tool for both newcomers and seasoned
professionals due to its simple syntax, extensive libraries, and versatility. As
technology develops, Python remains central to creating novel solutions through
the efficient use of data structures and algorithms.
TABLE OF CONTENTS
Chapter 1: Company Introduction................................................................. 1
1.1 History of Company ................................................................... 1

1.2 Service of Company ................................................................... 2


Chapter 2: Introduction To Python ............................................................. 3

2.1 Python ........................................................................................ 3

2.2 Zen of python............................................................................. 5

2.3 Popularity of python language.................................................. 5

Chapter 3: Introduction To Google Colab .................................................. 6

Chapter 4: Python Libraries ........................................................................ 7

4.1 NumPy ..................................................................................... .7


4.2 Pandas ......................................................................................... 9
4.3 Matplotlib ................................................................................ 11

4.4 Seaborn ..................................................................................... 12

4.5 SkLearn ..................................................................................... 14


Chapter 5: Read Data As Data Frame ..................................................... 15

Chapter 6: Explore Data Frame ............................................................... 16


Chapter 7: Train Test Split ........................................................................ 21

Chapter 8: Simple Linear Regression Models ........................................ 23

Chapter 9: Logistic Regression Model ..................................................... 25

Chapter 10: Final Project........................................................................... 29

Chapter 11: Conclusion .............................................................................. 33

Chapter 12: Reference ................................................................................ 34



CHAPTER 1: COMPANY INTRODUCTION

1.1 History of Company:-


YBI Foundation Education Private Limited is an online education platform that
enables individuals to develop their professional potential in the most engaging
learning environment. Online education is a fundamental disruption to the
traditional model and will have a far-reaching impact.

Fig 1. Company’s Logo


A non-profit edutech firm with headquarters in Delhi, YBI Foundation wants to
help young people advance in the field of new technology. For students,
academics, and practitioners, they provide a blend of online and offline
techniques to introduce new skills, knowledge, and technologies. They support
the idea that students can learn at any time, anyplace.
For students to succeed in data science, business analytics, machine learning,
cloud computing, and big data, the platform offers free online instructor-led
classes. They want to concentrate on innovation, creativity, and technology
approaches while being current with market demands. They make an effort to
assist students in achieving the best standards.

Vision
Building Careers of Tomorrow.

Mission
To provide opportunities to advance your professional journey through rigorous
online programs that offer personalised support, developed in collaboration with
best-in-class faculty and industry professionals.


1.2 Services of Company:-


We assist students in achieving this goal by helping them acquire the most
recent skills and providing them with hands-on projects. Training and
internships are a crucial component of a student's overall development, which is
why AICTE and Universities have made them mandatory for every engineer
and MCA to experience.
1.2.1 Machine Learning and Internship Program

With Microsoft MTA Certification, learn machine learning, an application of


artificial intelligence (AI) that gives computers the capacity to autonomously
learn from experience without being explicitly programmed.

1.2.2 Data Science and Internship Program

Learn Data science and how to use scientific methods, processes, algorithms
and systems to extract knowledge and insights from structured and unstructured
data as one of the hottest professions in the market today, bundled with
Microsoft MTA Certification

1.2.3 Business Analytics

Learn about business analytics and how it actually helps businesses automate
and improve their business processes. Data-driven businesses use their data as a
corporate asset and use it to gain a competitive edge by using the insights to
uncover novel patterns and connections.


CHAPTER 2: INTRODUCTION TO PYTHON


2.1 Python
Python is an extremely popular, interpreted, object-oriented, high-level, general-
purpose programming language with dynamic semantics. Whether they are
aware of it or not, consumers use many Python-powered devices on a regular
basis. Code readability is prioritized in its design philosophy, which makes
heavy use of indentation. Python uses garbage collection and dynamic typing. It
supports a variety of programming paradigms, including functional, object-
oriented, and structured programming. Due to its extensive standard library, it is
frequently referred to as a "batteries included" language.
On February 20, 1991, Guido van Rossum released the first version of Python.
Although you may be familiar with the python as a huge snake, the name of the
programming language comes from Monty Python's Flying Circus, a comedy
sketch show that aired on the BBC. Python was developed by Guido van
Rossum as a replacement for the ABC programming language and was originally
made public in 1991 as Python 0.9.0. In 2000, Python 2.0 was made available.
2008 saw the release of Python 3.0, a significant update that wasn't entirely
backwards compatible with older versions. The final Python 2 release was
Python 2.7.18, which was made available in 2020. One of the most widely used
programming languages is Python.
Python was created by just one person, which is one of its great features. New
programming languages are typically created and released by businesses with a
large number of experts, and because of copyright laws, it is exceedingly
difficult to identify any of the individuals participating in the project. Python is
a unique case.
Of course, Guido van Rossum did not create and evolve each and every part of
Python. Python's rapid global adoption is the product of the tireless efforts of
thousands of programmers, testers, users, and fans, yet it must be acknowledged
that Guido was the one who had the original idea.
Python is maintained by the Python Software Foundation, a non-profit
membership organization and a community devoted to developing, improving,
expanding, and popularizing the Python language and its environment.

There are also a couple of factors that make Python great for learning:
1. It is easy to learn – the time needed to learn Python is shorter than for
many other languages; this means that it is possible to start actual
programming faster.


2. It is easy to use for writing new software – it is often possible to write
code faster when using Python.

3. It is open source and easy to obtain, install and deploy – Python is free,
open and multiplatform; not all languages can boast that.

4. It is extendable – parts of your code can be written in other languages, and
users can add low-level modules to the Python interpreter to customize
their tools.

5. It has a comprehensive standard library.

6. Some languages require you to modify code to run on different platforms,
but Python is a cross-platform language: you can run the same code on
any operating system that has a Python interpreter.

Python is a general-purpose, open-source, high-level programming language that
also provides a number of libraries and frameworks. Python has become popular
because of its simplicity, easy syntax and user-friendly environment.

The usage of Python is as follows:


1. Desktop Application
2. Web Application
3. Data Science
4. Artificial Intelligence
5. Machine Learning
6. Scientific Computing
7. Robotics
8. Internet of things
9. Gaming
10. Mobile Apps
Organization using Python:
1. Google
2. Yahoo
3. YouTube
4. Mozilla
5. Dropbox
6. Microsoft
7. Cisco
8. Spotify


2.2 Zen of Python:


The Python programming language's design is influenced by a set of 19
"guiding principles" for writing computer programs, called "The Zen of
Python". This collection of guidelines was created by Tim Peters and
published on the Python mailing list in 1999. Peters' list left a 20th principle
open "for Guido to fill in", referring to Guido van Rossum, the original
creator of the Python programming language; that position is still open.
The Zen of Python was included in the language's official Python
Enhancement Proposals as PEP 20 and made available for public use.
Additionally, the Python interpreter includes it as an Easter egg; it can be
seen by entering import this.
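
A quick way to see these principles is the interpreter's built-in Easter egg; running the single statement below in a Python shell or a Colab cell prints "The Zen of Python, by Tim Peters" followed by the 19 aphorisms:

# print the Zen of Python
import this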

2.3 Popularity of Python Language:


By looking at how frequently language tutorials are searched on Google, the
PYPL Popularity of Programming Language index is developed.
The more frequently a language tutorial is searched, the more popular it is
thought to be. It is a leading indicator. The raw data comes from Google
Trends.

Fig 2. Popularity of Language


CHAPTER 3: INTRODUCTION TO GOOGLE COLAB

Google Colab, or "Colaboratory", is a cloud-based Jupyter notebook environment
from Google that allows you to write and execute Python in your browser, with:
• Zero configuration required
• Free access to GPUs
• Easy sharing

Creating a Google Colab Notebook

• Sign in with your Google Account.
• Once you have signed in to Colab, you can create a new notebook by clicking
on 'File' → 'New notebook'.
• Google Colab allows you to write Python code as well as text – using
markdown cells – to include rich text and media.
• Below is a code cell. Once the toolbar button indicates CONNECTED, execute
the contents of the cell in any of the following ways:
i. Click the play icon in the left gutter of the cell.
ii. Type Ctrl+Enter to run the cell in place.
iii. Type Shift+Enter to run the cell and move focus to the next cell.
iv. Type Alt+Enter to run the cell and insert a new code cell
immediately below it.
• This is a text cell. You can double-click to edit this cell. Text cells use
markdown syntax.
• To save the project, go to File → Save a copy in Drive, Save a copy as a
GitHub Gist, Save a copy in GitHub, or Save.
• To download the project, go to File → Download .ipynb or Download .py.
• It is easy to share.


CHAPTER 4: PYTHON LIBRARIES


4.1 NumPy
Large, multi-dimensional arrays and matrices are supported by NumPy, a library
for the Python programming language, along with a substantial number of high-
level mathematical operations that may be performed on these arrays. Jim
Hugunin originally developed Numeric, which was NumPy's forerunner, with
assistance from a number of other programmers. Travis Oliphant developed
NumPy in 2005 by heavily altering Numeric to incorporate capabilities of the
rival Numarray. Numerous people have contributed to the open-source program
NumPy. NumPy is a fiscally sponsored project of NumFOCUS.

The equivalent of arrays in Python are lists, although they take a long time to
execute. The goal of NumPy is to offer array objects that are up to 50 times
faster than conventional Python lists. The NumPy array object is referred to as
ndarray, and it has a number of supporting methods that make using ndarray
relatively simple. In data science, where speed and resources are crucial, arrays
are used extensively.

In contrast to lists, NumPy arrays are stored in a single continuous location in


memory, making it very easy for programs to access and manipulate them. In
computer science, this characteristic is known as locality of reference. This is
the primary factor that makes NumPy faster than lists. Additionally, it is
enhanced to function with modern CPU architectures.

There are the following advantages of using NumPy for data analysis.

1. NumPy performs array-oriented computing.


2. It efficiently implements the multidimensional arrays.
3. It performs scientific computations.
4. It is capable of performing Fourier Transform and reshaping the data
stored in multidimensional arrays.
5. NumPy provides the in-built functions for linear algebra and random
number generation.


❖ Example

Convert the following 1-D array with 12 elements into a 3-D array.
The outermost dimension will have 2 arrays that contain 3 arrays, each with 2
elements:

Fig 4.1. Code

Output

Fig 4.2. Output
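
Fig 4.1 and Fig 4.2 are screenshots, so a minimal sketch of the reshape they illustrate is given below; the element values 1 to 12 are an assumption for illustration.

# import library
import numpy as np

# a 1-D array with 12 elements (illustrative values)
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

# reshape into a 3-D array: 2 outer arrays, each holding 3 arrays of 2 elements
newarr = arr.reshape(2, 3, 2)
print(newarr)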


4.2 Pandas
For the purpose of manipulating and analyzing data, the Python programming
language has a software package called pandas. It includes specific data
structures and procedures for working with time series and mathematical tables.
It is free software distributed under the three-clause BSD license. The name is
derived from "panel data", a phrase used in econometrics to refer to data sets
that contain observations for the same persons over a range of time periods. Its
name is also a pun on "Python data analysis". Wes McKinney began building
pandas while working as a researcher at AQR Capital from 2007 to 2010.

Pandas is primarily used for tabular data analysis and related manipulation in
Data Frames. Data can be imported into Pandas from a variety of file types,
including Microsoft Excel, JSON, Parquet, SQL database tables, and comma-
separated values. Data cleaning and wrangling functions, as well as operations
like merging, restructuring, and choosing, are all supported by Pandas. Many of
the R programming language's proven functionality for working with Data
Frames were brought into Python with the introduction of pandas.

With the aid of Pandas, we can examine large data sets and draw conclusions
based on statistical principles. Pandas can organize disorganized data sets,
making them readable and useful. In data science, relevant data is crucial.

Pandas provides you with information about the data, such as:

• Is there a relationship between two or more columns?
• What is the median value?
• What is the maximum value?
• What is the minimum value?

Rows that are irrelevant or contain incorrect data, such as empty or NULL
values, can also be deleted with Pandas. This process is known as data cleaning.


❖ Example

With the help of the Pandas library, display data in table form.

Fig 4.3. Code

Output

Fig 4.4. Output
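
Fig 4.3 and Fig 4.4 are screenshots, so a minimal sketch of this kind of example is given below; the dictionary contents are an assumption for illustration.

# import library
import pandas as pd

# a small dataset defined as a dictionary (illustrative values)
data = {
    "cars": ["BMW", "Volvo", "Ford"],
    "passings": [3, 7, 2]
}

# load the data into a DataFrame and display it in table form
df = pd.DataFrame(data)
print(df)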


4.3 Matplotlib

Python's Matplotlib toolkit provides a complete tool for building static,


animated, and interactive visualizations. Matplotlib makes difficult things
possible and simple things easy.

• Produce plots fit for publication.

• Create zoomable, pannable, and updateable interactive figures.

• Modify the visual appearance and design.

• Export to a variety of file types.

• Integrate with Graphical User Interfaces and Jupyter Lab.

• Employ one of the many third-party packages based on Matplotlib.

For 2D displays of arrays, Matplotlib is a fantastic Python visualization library.


A multi-platform data visualization package called Matplotlib was created to
deal with the larger SciPy stack and is based on NumPy arrays. One of
visualization's biggest advantages is that it gives us visual access to vast
volumes of data in forms that are simple to understand. There are numerous
plots in Matplotlib, including line, bar, scatter, histogram, etc.

❖ Example
Draw a line in a diagram from position (0,0) to position (6,250):

Fig 4.5. Code


Output

Fig 4.6. Output
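
Fig 4.5 and Fig 4.6 are screenshots, so a minimal sketch of the code behind this plot is given below, assuming the standard two-point example.

# import libraries
import numpy as np
import matplotlib.pyplot as plt

# x and y coordinates of the two end points (0, 0) and (6, 250)
xpoints = np.array([0, 6])
ypoints = np.array([0, 250])

# draw the line between the two points and display the figure
plt.plot(xpoints, ypoints)
plt.show()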

4.4 Seaborn
Seaborn is a package that uses Matplotlib as its foundation to plot graphs. It
will be used to visualize random distributions.
Seaborn is a fantastic Python visualization module for plotting statistical
visualizations. It offers lovely default styles and color schemes to enhance the
appeal of statistical charts. It is built on top of Matplotlib and closely
integrated with the Pandas data structures.
With Seaborn, visualization will be at the heart of data exploration and
comprehension. For a better comprehension of the dataset, it offers dataset-
oriented APIs that allow us to switch between various visual representations for
the same variables.


❖ Example

The following example shows a plot drawn with the Seaborn library.

Fig 4.7. Code

Output

Fig 4.8. Output
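
Fig 4.7 and Fig 4.8 are screenshots, so a minimal sketch of a random-distribution plot of this kind is given below; the generated data is an assumption for illustration, and histplot assumes a recent Seaborn version.

# import libraries
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# 1000 random values drawn from a normal distribution (illustrative data)
values = np.random.normal(size=1000)

# plot the distribution with a kernel density estimate overlaid
sns.histplot(values, kde=True)
plt.show()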


4.5 SkLearn

Scikit-learn is mostly written in Python and significantly makes use of the


NumPy module for computations involving arrays and linear algebra. To
further the effectiveness of this library, some basic algorithms are also written
in Cython. Support vector machines, logistic regression, and linear SVMs are
implemented using wrappers written in Cython for LIBSVM and LIBLINEAR.
In certain cases, extending these functions in pure Python might not be
practical.

Scikit-learn plays well with numerous other Python packages, such as SciPy,
Pandas data frames, NumPy for array vectorization, and Matplotlib, Seaborn,
and Plotly for graphing, among others.
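
As a minimal sketch of the fit/predict pattern that Scikit-learn estimators share (the toy data below is an assumption for illustration; full worked examples follow in Chapters 8 and 9):

# import an estimator
from sklearn.linear_model import LinearRegression

# toy features (X) and target (y) - illustrative values only
X = [[1], [2], [3], [4]]
y = [2, 4, 6, 8]

# fit the model and predict for an unseen value
model = LinearRegression()
model.fit(X, y)
print(model.predict([[5]]))   # approximately [10.]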


CHAPTER 5: Read Data as Data Frame

Fig 5.1

Fig 5.2. Step to Read Data as Data Frame
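
Fig 5.1 and Fig 5.2 are screenshots; a minimal sketch of the step they show, reading a CSV file from a URL into a pandas DataFrame (reusing the Titanic dataset URL from Chapter 6), is:

# import library
import pandas as pd

# read a CSV file directly from a URL into a DataFrame
titanic = pd.read_csv('https://github.com/YBIFoundation/Dataset/raw/main/Titanic.csv')

# display the first 5 rows to confirm the data was read
titanic.head()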


CHAPTER 6: Explore Data Frame

# import library
import pandas as pd

# read data
titanic = pd.read_csv('https://github.com/YBIFoundation/Dataset/raw/main/Titanic.csv')

# display first 5 rows
titanic.head()

Fig 6.1. Head

# display last 5 rows


titanic.tail()

Fig 6.2. Tail


# display info
titanic.info()

Fig 6.3. Info

# display summary statistics of numerical columns


titanic.describe()

Fig 6.4. Describe


# display summary statistics of all columns


titanic.describe(include='all')

Fig 6.5. All Columns

# display shape
titanic.shape

(1309, 14)
# display column labels
titanic.columns

Index(['pclass', 'survived', 'name', 'sex', 'age', 'sibsp', 'parch', 'ticket', 'fare',


'cabin', 'embarked', 'boat', 'body', 'home.dest'], dtype='object')


# select a column as a series


titanic['sex']

Fig 6.6. Column as Series

titanic['sex'].shape

(1309,)

# select a column as a dataframe


titanic[['sex']]

Fig 6.7. Column as Data Frame


titanic[['sex']].shape

(1309, 1)

# unique categories in a column


titanic['sex'].unique()

array(['female', 'male'], dtype=object)

# number of unique categories in a column
titanic['sex'].nunique()

2

# category-wise number of observations in a column
titanic['sex'].value_counts()

male 843
female 466
Name: sex, dtype: int64

# count of missing values


titanic.isna().sum()

Fig 6.8. Count of missing value


CHAPTER 7: Train Test Split

# import library
import pandas as pd

# read data
diabetes = pd.read_csv('https://github.com/YBIFoundation/Dataset/raw/main/Diabetes.csv')
# display first 5 rows
diabetes.head()

Fig 7.1. Head

# display columns
diabetes.columns

Index(['pregnancies', 'glucose', 'diastolic', 'triceps', 'insulin', 'bmi','dpf', 'age',


'diabetes'],
dtype='object')

# define target(y)
y = diabetes['diabetes']

# define features(X)

X = diabetes.drop(['diabetes'], axis=1)

X = diabetes[['pregnancies', 'glucose', 'diastolic', 'triceps', 'insulin', 'bmi','dpf',


'age']]


# import train test split function
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, stratify=y, random_state=2529)

X_train.shape,X_test.shape,y_train.shape,y_test.shape

((537, 8), (231, 8), (537,), (231,))

X_train

Fig 7.2. X_Train


CHAPTER 8: Simple Linear Regression Models

# Step 1: import library


import pandas as pd

# Step 2 : import data


salary = pd.read_csv('https://github.com/ybifoundation/Dataset/raw/main/Salary%20

# Step 3 : define target (y) and features (X)

salary.columns

Index(['Experience Years', 'Salary'], dtype='object')


y = salary['Salary']
X = salary[['Experience Years']]

# Step 4 : train test split
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=2529)

# check shape of train and test sample


X_train.shape, X_test.shape, y_train.shape, y_test.shape

((28, 1), (12, 1), (28,), (12,))

# Step 5 : select model
from sklearn.linear_model import LinearRegression

model = LinearRegression()

# Step 6 : train or fit model


model.fit(X_train,y_train)

LinearRegression()


model.intercept_

26596.961311068262

model.coef_

array([9405.61663234])

# Step 7 : predict model


y_pred = model.predict(X_test)
y_pred

array([ 90555.15441095,  59516.61952424, 106544.70268592,  64219.42784041,
        68922.23615658, 123474.81262412,  84911.78443155,  63278.86617718,
        65159.98950364,  61397.74285071,  37883.70126987,  50111.00289191])

# Step 8 : model accuracy
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, mean_squared_error

mean_absolute_error(y_test, y_pred)

4005.9263101681768

mean_absolute_percentage_error(y_test,y_pred)

0.06384602996141632

mean_squared_error(y_test,y_pred)

24141421.671440993


CHAPTER 9: LOGISTIC REGRESSION MODELS

# Step 1: import library


import pandas as pd

# Step 2 : import data
diabetes = pd.read_csv('https://github.com/YBIFoundation/Dataset/raw/main/Diabetes.csv')

diabetes.head()

Fig 9.1. Head

diabetes.info()

Fig 9.2. Info


diabetes.describe()

Fig 9.3. Describe

# Step 3 : define target (y) and features (X)


diabetes.columns

Index(['pregnancies', 'glucose', 'diastolic', 'triceps', 'insulin', 'bmi','dpf', 'age',


'diabetes'],
dtype='object')

y = diabetes['diabetes']

X = diabetes.drop(['diabetes'],axis=1)

# Step 4 : train test split
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=2529)

# check shape of train and test sample


X_train.shape, X_test.shape, y_train.shape, y_test.shape

((537, 8), (231, 8), (537,), (231,))

# Step 5 : select model
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=500)

# Step 6 : train or fit model


model.fit(X_train,y_train)


LogisticRegression(max_iter=500)
model.intercept_
array([-8.13058703])
model.coef_
array([[ 1.01259801e-01, 3.60553853e-02, -2.09736871e-02,
-2.57281495e-03, -2.04295785e-04, 8.24680082e-02,
9.51017756e-01, 2.53493255e-02]])

# Step 7 : predict model


y_pred = model.predict(X_test)
y_pred

Fig 9.4. Output

# Step 8 : model accuracy
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

confusion_matrix(y_test, y_pred)

array([[133,  12],
       [ 41,  45]])

accuracy_score(y_test,y_pred)

0.7705627705627706


print(classification_report(y_test,y_pred))

Fig 9.5. Classification


CHAPTER 10: Final Project


Handwritten Digit Prediction - Classification Analysis
The digits dataset consists of 8x8 pixel images of digits. The images attribute
of the dataset stores 8x8 arrays of grayscale values for each image. We will use
these arrays to visualize the first 4 images. The target attribute of the dataset
stores the digit each image represents.

Import Library

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Import Data

from sklearn.datasets import load_digits

df = load_digits()

_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, df.images, df.target):
    ax.set_axis_off()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title("Training: %i" % label)

Data PreProcessing

df.images.shape

(1797, 8, 8)

df.images[0]

array([[ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.],
       [ 0.,  0., 13., 15., 10., 15.,  5.,  0.],
       [ 0.,  3., 15.,  2.,  0., 11.,  8.,  0.],
       [ 0.,  4., 12.,  0.,  0.,  8.,  8.,  0.],
       [ 0.,  5.,  8.,  0.,  0.,  9.,  8.,  0.],
       [ 0.,  4., 11.,  0.,  1., 12.,  7.,  0.],
       [ 0.,  2., 14.,  5., 10., 12.,  0.,  0.],
       [ 0.,  0.,  6., 13., 10.,  0.,  0.,  0.]])

df.images[0].shape

(8, 8)

len(df.images)

1797


n_samples = len(df.images)
data = df.images.reshape((n_samples,-1))

data[0]

array([ 0., 0., 5., 13., 9., 1., 0., 0., 0., 0., 13., 15., 10.,
15., 5., 0., 0., 3., 15., 2., 0., 11., 8., 0., 0., 4.,
12., 0., 0., 8., 8., 0., 0., 5., 8., 0., 0., 9., 8.,
0., 0., 4., 11., 0., 1., 12., 7., 0., 0., 2., 14., 5.,
10., 12., 0., 0., 0., 0., 6., 13., 10., 0., 0., 0.])

data[0].shape

(64,)

data.shape

(1797, 64)

Scaling Image Data

data.min()
0.0
data.max()
16.0

data[0]

array([ 0., 0., 5., 13., 9., 1., 0., 0., 0., 0., 13., 15., 10.,
15., 5., 0., 0., 3., 15., 2., 0., 11., 8., 0., 0., 4.,
12., 0., 0., 8., 8., 0., 0., 5., 8., 0., 0., 9., 8.,
0., 0., 4., 11., 0., 1., 12., 7., 0., 0., 2., 14., 5.,
10., 12., 0., 0., 0., 0., 6., 13., 10., 0., 0., 0.])


Train Test Split Data

from sklearn.model_selection import train_test_split


x_train, x_test, y_train, y_test = train_test_split(data, df.target,test_size=0.3)
x_train.shape, x_test.shape, y_train.shape, y_test.shape

((1257, 64), (540, 64), (1257,), (540,))


Random Forest Model
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier()
rf.fit(x_train, y_train)
RandomForestClassifier()

Predict Test Data

y_pred = rf.predict(x_test)
y_pred

array([6, 1, 2, 8, 4, 8, 3, 9, 7, 8, 8, 9, 8, 5, 7, 8, 4, 5, 2, 0, 4, 5,
4, 3, 4, 1, 4, 6, 9, 8, 5, 2, 1, 5, 4, 4, 6, 4, 0, 2, 6, 6, 2, 8,
4, 0, 8, 6, 5, 4, 8, 4, 8, 4, 7, 5, 5, 5, 7, 1, 8, 1, 0, 2, 8, 5,
9, 8, 9, 1, 0, 5, 0, 0, 0, 3, 1, 1, 3, 1, 0, 1, 0, 7, 0, 9, 8, 8,
1, 1, 5, 4, 3, 3, 3, 8, 9, 5, 6, 1, 8, 1, 0, 5, 8, 7, 9, 0, 9, 7,
1, 3, 6, 3, 2, 4, 6, 1, 9, 3, 6, 7, 9, 0, 9, 9, 2, 1, 6, 8, 5, 6,
0, 5, 8, 0, 6, 0, 7, 0, 6, 9, 8, 1, 1, 1, 5, 8, 2, 2, 9, 4, 1, 7,
1, 6, 8, 4, 8, 6, 2, 1, 9, 4, 5, 1, 2, 8, 3, 1, 0, 9, 3, 4, 8, 3,
4, 2, 8, 3, 3, 9, 1, 7, 6, 8, 6, 7, 1, 0, 3, 3, 6, 2, 2, 2, 5, 0,
1, 2, 0, 1, 1, 2, 6, 1, 5, 3, 4, 5, 4, 4, 9, 2, 9, 6, 5, 3, 6, 1,
0, 6, 8, 0, 6, 1, 2, 8, 4, 0, 8, 6, 7, 8, 1, 5, 2, 4, 6, 3, 6, 0,
2, 1, 8, 2, 2, 9, 6, 0, 7, 7, 5, 5, 9, 5, 0, 8, 1, 9, 5, 2, 0, 6,
7, 1, 9, 3, 9, 2, 5, 4, 6, 8, 7, 9, 3, 9, 4, 3, 0, 7, 1, 9, 5, 2,
0, 0, 8, 9, 9, 2, 5, 2, 1, 8, 6, 2, 1, 9, 3, 4, 7, 3, 4, 8, 4, 9,
4, 5, 9, 9, 8, 5, 9, 7, 8, 9, 4, 2, 2, 7, 1, 4, 2, 4, 6, 7, 3, 1,
2, 9, 4, 2, 0, 3, 6, 2, 5, 0, 8, 5, 1, 6, 1, 3, 7, 5, 0, 2, 9, 3,
4, 3, 0, 5, 4, 0, 9, 3, 8, 7, 4, 2, 1, 9, 8, 5, 6, 5, 1, 4, 4, 3,
4, 3, 7, 9, 6, 9, 1, 1, 3, 0, 7, 2, 3, 3, 1, 6, 8, 9, 8, 3, 6, 8,
3, 4, 9, 6, 5, 7, 0, 9, 2, 3, 5, 8, 5, 0, 9, 0, 9, 5, 0, 4, 9, 7,
1, 2, 7, 6, 3, 7, 2, 5, 9, 8, 9, 8, 8, 9, 1, 1, 2, 7, 3, 8, 1, 2,
9, 7, 4, 9, 9, 3, 4, 4, 2, 0, 6, 8, 8, 4, 5, 4, 0, 2, 5, 8, 2, 2,
0, 3, 9, 3, 0, 8, 5, 7, 2, 2, 8, 5, 8, 6, 3, 9, 0, 6, 5, 1, 1, 0,
3, 9, 6, 8, 3, 0, 3, 0, 4, 3, 7, 3, 9, 8, 1, 4, 6, 1, 1, 0, 6, 5,
5, 6, 3, 4, 8, 7, 7, 0, 4, 2, 1, 6, 9, 1, 4, 8, 2, 5, 5, 0, 4, 5,
7, 8, 0, 1, 7, 8, 9, 7, 3, 0, 9, 1])


from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

Fig 10.1. Classification


CHAPTER 11: Conclusion


In conclusion, my internship at the YBI Foundation was a valuable learning
experience. While working in this role, I gained extensive knowledge of each
stage of Python programming with data structures and algorithms. I also
improved my technical knowledge by learning how to solve data structure
problems using various Python tools and libraries. In addition, I had the
opportunity to collaborate with a variety of specialists, which improved
my teamwork and communication skills. Overall, this internship has
strengthened my groundwork for a career. The knowledge and expertise I
have gained are greatly appreciated, and I am confident that they will
benefit me in my future undertakings. I am eager to start my next role
in Python with data structures and put my recently acquired skills and
knowledge into action.


CHAPTER 12: REFERENCES


1. https://www.ybifoundation.org
2. Simplilearn, "Introduction to Data Science with Python".
3. Anirudh Rao, "Top 10 Python Libraries", August 2023.
4. Avijeet Biswal, "Data Analytics With Python", July 2023.

