(AUTONOMOUS)
INTERNSHIP REPORT
A report submitted in partial fulfilment of the requirements for the Award of
Degree of
BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE & ENGINEERING
(DATA SCIENCE)
By
N SWETHA REDDY
Regd.No.21781A3294
Under supervision of
Mr/Ms Sarvesh Agarwal (Founder & CEO)
(Duration: 11/06/2023 to 22/07/2023)
SRI VENKATESWARA COLLEGE OF ENGINEERING AND TECHNOLOGY
(AUTONOMOUS)
R.V.S.NAGAR, CHITTOOR – 517 127. (A.P)
(Approved by AICTE, New Delhi, Affiliated to
JNTUA, Anantapur)
(Accredited by NBA, New Delhi & NAAC, Bangalore)
(An ISO 9001:2000 Certified Institution)
2021-2022
CERTIFICATE
This is to certify that the “Internship report” submitted by N SWETHA
REDDY (Regd. No.: 21781A3294) is work done by him and submitted
during the 2022-2023 academic year, in partial fulfilment of the
requirements for the award of the Degree of BACHELOR OF
TECHNOLOGY in COMPUTER SCIENCE & ENGINEERING
(DATA SCIENCE), at Internshala Trainings.
Mr.Radhakrishna DR.M.LAVANYA
(DATA SCIENCE)
CERTIFICATE
ACKNOWLEDGEMENT
ABOUT TRAINING
2.6. Functions
2.7. Data Structure
2.8. Lists
2.9. Dictionaries
2.10. Understanding Standard Libraries in Python
4.19. K-means
WEEKLY OVERVIEW OF INTERNSHIP ACTIVITIES
1ST WEEK:
DATE DAY NAME OF THE MODULE/TOPIC COMPLETED
11/06/2023 Sunday Data science overview
12/06/2023 Monday Introduction to Python
13/06/2023 Tuesday Understanding the operators
14/06/2023 Wednesday Variables and data types
15/06/2023 Thursday Conditional statements
16/06/2023 Friday Looping statements
17/06/2023 Saturday Functions
2ND WEEK:
DATE DAY NAME OF THE MODULE/TOPIC COMPLETED
18/06/2023 Sunday Data structure
19/06/2023 Monday Lists and Dictionaries
20/06/2023 Tuesday Understanding standard libraries in Python
21/06/2023 Wednesday Reading a CSV file in Python
22/06/2023 Thursday Data frames and basic operations with data frames
23/06/2023 Friday Indexing data frame
24/06/2023 Saturday Introduction to statistics
3RD WEEK:
DATE DAY NAME OF THE MODULE/TOPIC COMPLETED
25/06/2023 Sunday Measures of central tendency
26/06/2023 Monday Understanding the spread of data
27/06/2023 Tuesday Data distribution
28/06/2023 Wednesday Introduction to probability
29/06/2023 Thursday Probabilities of discrete and continuous variable
30/06/2023 Friday Central limit theorem and Normal distribution
01/07/2023 Saturday Introduction to inferential statistics
4TH WEEK:
DATE DAY NAME OF THE MODULE/TOPIC COMPLETED
02/07/2023 Sunday Understanding the confidence interval and margin of error
03/07/2023 Monday Hypothesis testing
04/07/2023 Tuesday T tests and Chi squared tests
05/07/2023 Wednesday Understanding the concept of correlation
06/07/2023 Thursday Introduction to predictive modelling
07/07/2023 Friday Understanding the types of predictive models
08/07/2023 Saturday Stages of predictive models
5TH WEEK:
DATE DAY NAME OF THE MODULE/TOPIC COMPLETED
09/07/2023 Sunday Hypothesis generation
10/07/2023 Monday Data extraction and Data exploration
11/07/2023 Tuesday Reading the data into Python
12/07/2023 Wednesday Variable identification
13/07/2023 Thursday Univariate analysis for continuous variables
14/07/2023 Friday Univariate analysis for categorical variables
15/07/2023 Saturday Bivariate analysis and treating missing values
6TH WEEK:
DATE DAY NAME OF THE MODULE/TOPIC COMPLETED
16/07/2023 Sunday Treating missing values and how to treat outliers
17/07/2023 Monday Transforming the variables
18/07/2023 Tuesday Basics of model building
19/07/2023 Wednesday Linear Regression
20/07/2023 Thursday Logistic Regression
21/07/2023 Friday Decision Trees and K-Means
22/07/2023 Saturday Final Project
MODULE-1: INTRODUCTION TO DATA SCIENCE
DATA SCIENCE OVERVIEW:
Data science is the study of data, just as the biological
sciences are the study of biology and the physical sciences
are the study of physical reactions. Data is real and has real
properties, and we need to study them if we are going to
work with them. Data science involves data and some
science. It is a process, not an event.
What is statistical modelling?
The statistical modelling process is a way of applying statistical
analysis to datasets in data science. A statistical model
describes a mathematical relationship between random and
non-random variables. Applying statistical models to raw data
can produce intuitive visualizations that help data scientists
identify relationships between variables and make predictions.
Examples of common data sets for statistical analysis include
census data, public health data, and social media data.
Predictive modelling:
Predictive modelling is a form of artificial intelligence
that uses data mining and probability to forecast or
estimate more granular, specific outcomes. For
example, predictive modelling could help identify
customers who are likely to purchase our new One AI
software over the next 90 days.
Machine Learning:
Machine learning is a branch of artificial intelligence
(AI) where computers learn to act and adapt to new
data without being explicitly programmed to do so. The computer
is able to act independently of human interaction.
Forecasting:
Forecasting is a process of predicting or estimating
future events based on past and present data and most
commonly by analysis of trends. "Guessing" doesn't cut
it.
2.6. Functions
2.8. Lists
Lists are used to store multiple items in a single variable. Lists
are one of 4 built-in data types in Python used to store
collections of data; the other 3 are Tuple, Set, and Dictionary,
all with different qualities and usage. Lists are created using
square brackets.
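As a small illustration of the description above, the sketch below creates a list with square brackets and shows the other three built-in collection types next to it. All values and variable names are made up for the example.

```python
# A list is created with square brackets and stores several items in one variable.
fruits = ["apple", "banana", "cherry"]   # hypothetical example values

fruits.append("mango")   # lists are mutable: add an item at the end
first = fruits[0]        # items are accessed by zero-based index
count = len(fruits)      # number of items currently in the list

# The other three built-in collection types, for comparison
point = (3, 4)                          # tuple: ordered and immutable
unique_ids = {101, 102, 102}            # set: unordered, duplicates are dropped
marks = {"maths": 90, "physics": 85}    # dictionary: key-value pairs

print(fruits, first, count)
print(point, unique_ids, marks)
```

Lists keep insertion order and allow duplicates, which is why they are usually the default choice for a simple sequence of items.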
2.9. Dictionaries
Histogram:
3.11. T tests
A t test is a statistical test that is used to compare the
means of two groups. It is often used in hypothesis
testing to determine whether a process or treatment
actually has an effect on the population of interest, or
whether two groups are different from one another.
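To make the comparison of two group means concrete, the sketch below computes a two-sample t statistic using only the standard library. It uses Welch's form of the test (which does not assume equal variances); that choice, the function name `welch_t_statistic`, and the sample values are assumptions made for illustration and are not taken from the training material.

```python
import math
import statistics


def welch_t_statistic(group_a, group_b):
    """Return the t statistic for the difference in means of two samples."""
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    # Sample variances (n - 1 in the denominator)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Standard error of the difference between the two means
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical measurements for a treated group and a control group
treated = [14, 15, 13, 16, 17]
control = [10, 12, 11, 13, 14]

t_value = welch_t_statistic(treated, control)
print("t statistic:", t_value)
```

A large absolute t value suggests the group means differ by more than chance alone would explain. Turning it into a p-value requires the t distribution, which a statistics library such as SciPy (`scipy.stats.ttest_ind`) provides.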
4.6. Data Exploration
First, identify the Predictor (input) and Target (output) variables.
Next, identify the data type and category of
the variables. Example: Suppose we want to predict whether
the students will play cricket or not (refer to the data set below).
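The variable identification step can be sketched in a few lines of Python. The student records, the column names, and the helper `identify_variables` below are hypothetical stand-ins, since the report's data set is not reproduced here. The sketch labels each variable as predictor or target and as continuous or categorical.

```python
# Hypothetical student records; these are illustrative values only.
students = [
    {"gender": "M", "height_cm": 152.0, "weight_kg": 45.5, "plays_cricket": "Y"},
    {"gender": "F", "height_cm": 148.5, "weight_kg": 41.0, "plays_cricket": "N"},
    {"gender": "M", "height_cm": 160.2, "weight_kg": 52.3, "plays_cricket": "Y"},
]
target = "plays_cricket"  # the output variable we want to predict


def identify_variables(records, target_name):
    """Label every column with its role and its variable category."""
    summary = {}
    for name in records[0]:
        values = [row[name] for row in records]
        # Treat columns holding only numbers as continuous, all others as categorical
        is_numeric = all(isinstance(v, (int, float)) for v in values)
        summary[name] = {
            "role": "target" if name == target_name else "predictor",
            "category": "continuous" if is_numeric else "categorical",
        }
    return summary


summary = identify_variables(students, target)
for name, info in summary.items():
    print(name, info)
```

The numeric check is a simplification: a numeric code such as a class number would be reported as continuous here, so the result still needs a manual review.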
4.9. Univariate Analysis for Continuous Variables:
In the case of continuous variables, we need to understand the
central tendency and spread of the variable. These are
measured using various statistical metrics and visualization
methods, as shown below:
Note:
Univariate analysis is also used to highlight missing and
outlier values. In the upcoming part of this series, we will look
at methods to handle missing and outlier values.
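The central tendency and spread measures described above can be computed with Python's standard `statistics` module. The sketch below uses made-up values with one deliberately extreme entry, and flags outliers with the common 1.5 × IQR rule. That rule is an assumption chosen for illustration, not necessarily the method used in the training.

```python
import statistics

# Hypothetical continuous variable; 95 is planted as an extreme value
values = [12, 15, 14, 10, 18, 16, 13, 95]

# Central tendency
mean = statistics.mean(values)
median = statistics.median(values)

# Spread
std_dev = statistics.stdev(values)                # sample standard deviation
q1, q2, q3 = statistics.quantiles(values, n=4)    # quartiles
iqr = q3 - q1                                     # interquartile range

# Flag values lying more than 1.5 * IQR outside the quartiles
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_bound or v > upper_bound]

print("mean:", mean, "median:", median, "std dev:", std_dev)
print("IQR:", iqr, "outliers:", outliers)
```

The mean is pulled upward by the extreme value while the median is not, which is one reason both are reported together.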
4.10. Univariate Analysis for Categorical Variables:
For categorical variables, we use a frequency table to understand the
distribution of each category. We can also read it as the percentage of values
under each category. It can be measured using two metrics,
Count and Count%.
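A frequency table with Count and Count% can be built with `collections.Counter`. The category values below are hypothetical and serve only to show the calculation.

```python
from collections import Counter

# Hypothetical categorical variable, e.g. whether a student plays cricket
responses = ["Y", "N", "Y", "Y", "N", "Y", "Y", "N"]

counts = Counter(responses)
total = len(responses)

# Build the frequency table: Count and Count% for every category
frequency_table = {
    category: {"count": count, "count_pct": 100 * count / total}
    for category, count in counts.items()
}

for category, row in frequency_table.items():
    print(category, row["count"], f"{row['count_pct']:.1f}%")
```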
4.11. Bivariate Analysis:
Bivariate analysis finds out the relationship between two variables.
Here, we look for association and disassociation between variables at a
pre-defined significance level. We can perform bivariate analysis for
any combination of categorical and continuous variables.
Continuous & Continuous:
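For two continuous variables, one common way to quantify a linear association is the Pearson correlation coefficient, which ranges from -1 to +1. The sketch below computes it by hand; the function name `pearson_correlation` and the data are assumptions made for illustration.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired observations of two continuous variables
hours_studied = [1, 2, 3, 4, 5]
test_score = [2, 4, 5, 4, 5]

r = pearson_correlation(hours_studied, test_score)
print("Pearson r:", round(r, 3))
```

A value near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship. A scatter plot of the two variables is usually inspected alongside the coefficient.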
3. Data preprocessing
4. Data visualization
6. Model evaluation
7. Model prediction
EXAMPLE:
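import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Assumes X_set, y_set (feature matrix and labels) and the mesh grids X1, X2 are already defined
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Plot each class in its own colour (red for the first class, green for the second)
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color=ListedColormap(('red', 'green'))(i), label=j)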
Use this dataset to train the model. This file contains all the
client and call details as well as the target variable
“subscribed”. You have to train your model using this file.
2. test.csv:
SOLUTION:
CONCLUSION
In conclusion, I can say that this internship was a
great experience. Thanks to this project, I
acquired deeper knowledge of my
technical skills. I am now able to
build and assess data-based models.
A few factors that point to the future of data science
are:
•Companies' inability to handle data:
Data is regularly collected by businesses
and companies for transactions and through
website interactions. Many companies face a
common challenge in analysing and categorizing
the data that is collected and stored. Companies can
progress a lot with proper and efficient handling
of data, which results in higher productivity.
•Data science is constantly evolving:
Data science is a broad career path that is
still developing and thus promises
abundant opportunities in the future.
TRAINING CERTIFICATE