BACHELOR OF ENGINEERING
IN
ELECTRONICS & COMMUNICATION
CERTIFICATE
This is to certify that the project report submitted along with the project entitled
Python Programming with Data Structures and Algorithms has been carried
out by Chaudhari RohanKumar Rameshbhai under my guidance in partial
fulfilment of the degree of Bachelor of Engineering in Electronics &
Communication, 6th semester of Gujarat Technological University, Ahmedabad,
during the academic year 2022-2023.
DECLARATION
We hereby declare that the internship report submitted along with the internship
entitled Python Programming with Data Structures and Algorithms,
submitted in partial fulfilment of the degree of Bachelor of
Engineering in Electronics & Communication Engineering to Gujarat
Technological University, Ahmedabad, is a bonafide record of original project
work carried out by me at YBI FOUNDATION EDUCATION PRIVATE
LIMITED under the supervision of Mr. Anup Yadav (external guide), and
that no part of this report has been directly copied from any student report or
taken from any other source without providing due reference.
Chaudhari RohanKumar
ACKNOWLEDGEMENT
We wish to express our sincere gratitude to our external guide Mr. Anup Yadav
for continuously guiding me at the industry and answering all my doubts with
patience, and to the organization for the training it gave me in Python
programming with data structures and algorithms. I am fortunate to have people
like Mr. Anup Yadav around who are always ready to help and guide us.
Thanks also to our internal guide Prof. Mayank A Patel for helping us through
our internship and advising us in completing it.
Last but not least, a huge thanks to the Head of Department Dr. Salman
Bombaywala, Principal Dr. Piyush Jain, and all the teaching staff of
SNPIT&RC for providing me with all the guidance, infrastructure and skills
required in the professional world.
In conclusion, we once again thank the staff members
of YBI FOUNDATION EDUCATION PRIVATE LIMITED for their valuable
support in the completion of this internship.
ABSTRACT
Vision
Building Careers of Tomorrow.
Mission
To provide opportunities to advance your professional journey through rigorous
online programs that offer personalised support, developed in collaboration with
best-in-class faculty and industry professionals.
Learn data science: how to use scientific methods, processes, algorithms
and systems to extract knowledge and insights from structured and unstructured
data. It is one of the hottest professions in the market today, bundled with
Microsoft MTA Certification.
Learn about business analytics and how it actually helps businesses automate
and improve their business processes. Data-driven businesses use their data as a
corporate asset and use it to gain a competitive edge by using the insights to
uncover novel patterns and connections.
There are also a couple of factors that make Python great for learning:
1. It is easy to learn – the time needed to learn Python is shorter than for
many other languages, which means it is possible to start the actual
programming faster.
2. It is easy to use for writing new software – it is often possible to write
code faster when using Python.
3. It is open source and easy to obtain, install and deploy – Python is free,
open and multiplatform; not all languages can boast that.
4. It is extensible – modules for Python can be written in other languages, and
users can add low-level modules to the Python interpreter to customize their tools.
The equivalent of arrays in Python are lists, although they are slow to
process. The goal of NumPy is to offer array objects that are up to 50 times
faster than conventional Python lists. The NumPy array object is referred to as
ndarray, and it has a number of supporting methods that make using ndarray
relatively simple. In data science, where speed and resources are crucial, arrays
are employed a lot.
NumPy offers the following advantages for data analysis.
❖ Example
Convert the following 1-D array with 12 elements into a 3-D array.
The outermost dimension will have 2 arrays that contain 3 arrays, each with 2
elements:
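The code listing for this example was an image in the original report; a minimal sketch of the reshape described above (the array contents are illustrative):

```python
import numpy as np

# a 1-D array with 12 elements
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

# reshape into 2 blocks, each containing 3 arrays of 2 elements
newarr = arr.reshape(2, 3, 2)
print(newarr)
```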
Output
4.2 Pandas
For the purpose of manipulating and analyzing data, the Python programming
language has a software package called pandas. It includes specific data
structures and procedures for working with time series and mathematical tables.
It is free software distributed under the three-clause BSD license. The name is
derived from "panel data", a phrase used in econometrics to refer to data sets
that contain observations for the same persons throughout a range of time
periods; it is also a pun on "Python data analysis". Wes McKinney began
developing pandas while working as a researcher at AQR Capital from 2007 to
2010.
Pandas is primarily used for tabular data analysis and related manipulation in
Data Frames. Data can be imported into Pandas from a variety of file types,
including Microsoft Excel, JSON, Parquet, SQL database tables, and comma-
separated values. Data cleaning and wrangling functions, as well as operations
like merging, reshaping, and selecting, are all supported by Pandas. Much of
the R programming language's proven functionality for working with Data
Frames was brought into Python with the introduction of pandas.
With the aid of Pandas, we can examine large data sets and draw conclusions
based on statistical principles. Pandas can organize disorganized data sets,
making them readable and useful. In data science, relevant data is crucial.
Pandas provides you with information about the data, such as:
• Is there a relationship between two or more columns?
• What is the maximum value?
• What is the minimum value?
Rows that are irrelevant or contain incorrect data, such as empty or NULL
values, can also be deleted by Pandas. This process is known as data cleaning.
❖ Example
With the help of the Pandas library, display the data in table form.
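The original code listing here was an image; a minimal sketch with hypothetical sample data:

```python
import pandas as pd

# hypothetical sample data to display as a table
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}

# load the dictionary into a DataFrame and display it
df = pd.DataFrame(data)
print(df)
```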
Output
4.3 Matplotlib
❖ Example
Draw a line in a diagram from position (0,0) to position (6,250):
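The listing for this example was an image in the original; a minimal sketch using pyplot:

```python
import numpy as np
import matplotlib.pyplot as plt

# endpoints of the line: (0, 0) and (6, 250)
xpoints = np.array([0, 6])
ypoints = np.array([0, 250])

# draw a straight line between the two points
plt.plot(xpoints, ypoints)
plt.show()
```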
Output
4.4 Seaborn
Seaborn is a package that uses Matplotlib as its foundation to plot graphs. It
will be used here to visualize random distributions.
Python's Seaborn visualization module is fantastic for plotting statistical
visualizations. It offers lovely default styles and color schemes to enhance the
appeal of statistics charts. It is based on the Matplotlib software and tightly
connected with the Pandas data structures.
With Seaborn, visualization will be at the heart of data exploration and
comprehension. For a better comprehension of the dataset, it offers dataset-
oriented APIs that allow us to switch between various visual representations for
the same variables.
❖ Example
Output
4.5 SkLearn
Fig 5.1
# import library
import pandas as pd

# read data
titanic = pd.read_csv('https://github.com/YBIFoundation/Dataset/raw/main/Titanic.csv')

# display first 5 rows
titanic.head()

Fig 6.1: Head

# display info
titanic.info()

# display shape
titanic.shape
(1309, 14)

# display column labels
titanic.columns

# shape of a single column (a Series)
titanic['sex'].shape
(1309,)

# shape of a single-column DataFrame
titanic[['sex']].shape
(1309, 1)

# count values in the 'sex' column
titanic['sex'].value_counts()
male      843
female    466
Name: sex, dtype: int64
# import library
import pandas as pd
from sklearn.model_selection import train_test_split

# read data
diabetes = pd.read_csv('https://github.com/YBIFoundation/Dataset/raw/main/Diabetes.csv')

# display first 5 rows
diabetes.head()

# display columns
diabetes.columns

# define target (y)
y = diabetes['diabetes']

# define features (X)
X = diabetes.drop(['diabetes'], axis=1)

# split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, stratify=y, random_state=2529)

X_train.shape, X_test.shape, y_train.shape, y_test.shape

X_train
salary.columns

# fit a linear regression model
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
LinearRegression()

model.intercept_
26596.961311068262

model.coef_
array([9405.61663234])

# evaluate predictions on the test set
y_pred = model.predict(X_test)

from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, mean_squared_error
mean_absolute_error(y_test, y_pred)
4005.9263101681768

mean_absolute_percentage_error(y_test, y_pred)
0.06384602996141632

mean_squared_error(y_test, y_pred)
24141421.671440993
diabetes.info()

diabetes.describe()

# define target (y) and features (X)
y = diabetes['diabetes']
X = diabetes.drop(['diabetes'], axis=1)

# fit a logistic regression model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=500)
model.fit(X_train, y_train)

model.intercept_
array([-8.13058703])

model.coef_
array([[ 1.01259801e-01,  3.60553853e-02, -2.09736871e-02,
        -2.57281495e-03, -2.04295785e-04,  8.24680082e-02,
         9.51017756e-01,  2.53493255e-02]])

# confusion matrix of test-set predictions
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
y_pred = model.predict(X_test)
confusion_matrix(y_test, y_pred)
array([[133,  12],
       [ 41,  45]])

accuracy_score(y_test, y_pred)
0.7705627705627706

print(classification_report(y_test, y_pred))
Import Library

# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Import Data

# load the scikit-learn digits dataset (the 8x8 image data used below)
from sklearn.datasets import load_digits
df = load_digits()

Data Preprocessing
df.images.shape
df.images[0].shape
(8, 8)
len(df.images)
1797
n_samples = len(df.images)
data = df.images.reshape((n_samples,-1))
data[0]
array([ 0., 0., 5., 13., 9., 1., 0., 0., 0., 0., 13., 15., 10.,
15., 5., 0., 0., 3., 15., 2., 0., 11., 8., 0., 0., 4.,
12., 0., 0., 8., 8., 0., 0., 5., 8., 0., 0., 9., 8.,
0., 0., 4., 11., 0., 1., 12., 7., 0., 0., 2., 14., 5.,
10., 12., 0., 0., 0., 0., 6., 13., 10., 0., 0., 0.])
data[0].shape
(64,)
data.shape
(1797, 64)
data.min()
0.0
data.max()
16.0
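The steps that create the random-forest classifier `rf` and the train/test split used below did not survive extraction; a minimal sketch, assuming scikit-learn's RandomForestClassifier on the flattened digits data (the split parameters are assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# load and flatten the 8x8 digit images into 64-element rows
df = load_digits()
data = df.images.reshape((len(df.images), -1))

# scale pixel values from the 0..16 range down to 0..1
data = data / 16.0

# split into train and test sets (assumed parameters)
x_train, x_test, y_train, y_test = train_test_split(
    data, df.target, test_size=0.3, random_state=2529)

# fit a random-forest classifier on the training set
rf = RandomForestClassifier(random_state=2529)
rf.fit(x_train, y_train)
```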
# predict digit labels for the test set
y_pred = rf.predict(x_test)
y_pred
array([6, 1, 2, 8, 4, 8, 3, 9, 7, 8, 8, 9, 8, 5, 7, 8, 4, 5, 2, 0, 4, 5,
4, 3, 4, 1, 4, 6, 9, 8, 5, 2, 1, 5, 4, 4, 6, 4, 0, 2, 6, 6, 2, 8,
4, 0, 8, 6, 5, 4, 8, 4, 8, 4, 7, 5, 5, 5, 7, 1, 8, 1, 0, 2, 8, 5,
9, 8, 9, 1, 0, 5, 0, 0, 0, 3, 1, 1, 3, 1, 0, 1, 0, 7, 0, 9, 8, 8,
1, 1, 5, 4, 3, 3, 3, 8, 9, 5, 6, 1, 8, 1, 0, 5, 8, 7, 9, 0, 9, 7,
1, 3, 6, 3, 2, 4, 6, 1, 9, 3, 6, 7, 9, 0, 9, 9, 2, 1, 6, 8, 5, 6,
0, 5, 8, 0, 6, 0, 7, 0, 6, 9, 8, 1, 1, 1, 5, 8, 2, 2, 9, 4, 1, 7,
1, 6, 8, 4, 8, 6, 2, 1, 9, 4, 5, 1, 2, 8, 3, 1, 0, 9, 3, 4, 8, 3,
4, 2, 8, 3, 3, 9, 1, 7, 6, 8, 6, 7, 1, 0, 3, 3, 6, 2, 2, 2, 5, 0,
1, 2, 0, 1, 1, 2, 6, 1, 5, 3, 4, 5, 4, 4, 9, 2, 9, 6, 5, 3, 6, 1,
0, 6, 8, 0, 6, 1, 2, 8, 4, 0, 8, 6, 7, 8, 1, 5, 2, 4, 6, 3, 6, 0,
2, 1, 8, 2, 2, 9, 6, 0, 7, 7, 5, 5, 9, 5, 0, 8, 1, 9, 5, 2, 0, 6,
7, 1, 9, 3, 9, 2, 5, 4, 6, 8, 7, 9, 3, 9, 4, 3, 0, 7, 1, 9, 5, 2,
0, 0, 8, 9, 9, 2, 5, 2, 1, 8, 6, 2, 1, 9, 3, 4, 7, 3, 4, 8, 4, 9,
4, 5, 9, 9, 8, 5, 9, 7, 8, 9, 4, 2, 2, 7, 1, 4, 2, 4, 6, 7, 3, 1,
2, 9, 4, 2, 0, 3, 6, 2, 5, 0, 8, 5, 1, 6, 1, 3, 7, 5, 0, 2, 9, 3,
4, 3, 0, 5, 4, 0, 9, 3, 8, 7, 4, 2, 1, 9, 8, 5, 6, 5, 1, 4, 4, 3,
4, 3, 7, 9, 6, 9, 1, 1, 3, 0, 7, 2, 3, 3, 1, 6, 8, 9, 8, 3, 6, 8,
3, 4, 9, 6, 5, 7, 0, 9, 2, 3, 5, 8, 5, 0, 9, 0, 9, 5, 0, 4, 9, 7,
1, 2, 7, 6, 3, 7, 2, 5, 9, 8, 9, 8, 8, 9, 1, 1, 2, 7, 3, 8, 1, 2,
9, 7, 4, 9, 9, 3, 4, 4, 2, 0, 6, 8, 8, 4, 5, 4, 0, 2, 5, 8, 2, 2,
0, 3, 9, 3, 0, 8, 5, 7, 2, 2, 8, 5, 8, 6, 3, 9, 0, 6, 5, 1, 1, 0,
3, 9, 6, 8, 3, 0, 3, 0, 4, 3, 7, 3, 9, 8, 1, 4, 6, 1, 1, 0, 6, 5,
5, 6, 3, 4, 8, 7, 7, 0, 4, 2, 1, 6, 9, 1, 4, 8, 2, 5, 5, 0, 4, 5,
7, 8, 0, 1, 7, 8, 9, 7, 3, 0, 9, 1])
print(classification_report(y_test, y_pred))