Sample Report

Nitte Meenakshi Institute of Technology
(AN AUTONOMOUS INSTITUTION AFFILIATED TO VISVESVARAYA TECHNOLOGICAL UNIVERSITY, BELGAUM)
Department of Information Science & Engineering
INTERNSHIP REPORT
Submitted by:
Karthik Senthil
1NT18IS201
A report on
Data Science & Business Analytics Intern - The Sparks Foundation
Submitted in fulfillment for award of Internship for
Bachelor of Engineering
in
Information Science and Engineering
Submitted by
Karthik Senthil
1NT18IS201
CERTIFICATE
This is to certify that the internship entitled Data Science & Business Analytics Intern at
The Sparks Foundation is a bona fide work carried out by Karthik Senthil (1NT18IS201)
in fulfillment of the requirements for the award of the degree of Bachelor of Engineering in
Information Science and Engineering. It is certified that all the corrections/suggestions
indicated for internal assessment have been incorporated in the report submitted to the
departmental library. The report has been approved as it satisfies the academic requirements
in respect of the internship work prescribed for the Bachelor of Engineering degree.
Examination Panel Head of Department
........................................ ........................................
Dr. Mohan S.G.
Professor and Head, Dept. of ISE,
NMIT
CERTIFICATE ISSUED BY COMPANY
ACKNOWLEDGEMENT
The internship opportunity that I had with The Sparks Foundation was a great chance for
learning and professional development. I'm grateful for having a chance to meet so many
wonderful people and professionals through this internship period. The credit for the
successful completion of this internship goes beyond my own work, to those people who
have always been with me throughout. I take this opportunity to express my heartfelt
gratitude to each one of them.
I would like to thank Mr. PRANAV DUBEY (Managing Director) for giving me this
opportunity to work as an intern and also for guiding and supporting me in the completion of
the internship on time.
I would like to convey my heartfelt thanks to Dr. H.C. NAGARAJ, our Principal, and to
Dr. Mohan S.G., HoD, Department of Information Science and Engineering, for giving me a
chance to embark on this opportunity.
I would like to express my deepest gratitude to my friends Mr. SRIRANJAN S and Mr.
KAUSHAL BHAT, for their continuous support and encouragement that enabled me to
complete this internship successfully.
Finally, I would like to thank my beloved parents, friends and dear ones for the continuous
motivation and support.
ABSTRACT
With the boom of data science, it is now possible to put to use data that has been collected
for generations but never analysed to support business decisions. Storefronts have held
purchase data and customer information for decades, but have only now realised the potential
of this data for making smart business decisions. Such decisions help a storefront optimise
its stock and use limited resources efficiently, for example by directing marketing at the
largest possible audience and identifying its core user demographic, while also showing which
departments perform best and which underperform and need to be cut down.
This data also helps in identifying potential customers for the platform based on a few
simple parameters, so that the business can aim its marketing and products at customers who
fit this profile. The foundation also encourages interns to help their peers with their
respective tasks by networking and by creating video guides that explain the working of the
project.
TABLE OF CONTENTS
No. NAME
Title Page
Certificate
Certificate issued by company
Acknowledgement
Abstract
List of Figures and tables
1. INTRODUCTION
1.1 About the company
1.2 About the department
1.3 Problem Statement
1.4 Objectives
1.5 Scope
1.6 Timeline of Activities
2. IMPLEMENTATION
2.1 Beginning stage implementation
2.2 Data Visualization
2.3 Final model and evaluation
3. SNAPSHOTS
3.1 Orange Data Flow
3.2 Distribution of Labels vs Months
3.3 Distribution of count of sales made
3.4 Distribution of count of sub category Products sold
3.5 Sales vs Profits
3.6 State wise sales numbers
3.7 Evaluation results of Neural Network
3.8 Confusion Matrix
4. CONCLUSION
5. REFERENCES
LIST OF FIGURES
Fig no. Description

3.1 Orange Data Flow
3.2 Distribution of Label vs Months
3.3 Distribution of count of sales made
3.4 Distribution of count of sub category Products sold
3.5 Sales vs Profits
3.6 State wise sales numbers
3.7 Evaluation results of Neural Network
3.8 Confusion Matrix
LIST OF TABLES
Table no. Description

1.6 Timeline of activity
Data Science & Business Analytics Intern,
The Sparks Foundation
1. INTRODUCTION
1.1 About the company

The Sparks Foundation offers a platform for students from all over the world to connect and
offer each other guidance across various domains. The company is based in Singapore and
was founded in 2017. Its core focus is the Graduate Rotational Internship Program, which
allows interns to complete tasks in the domain of their choice while also offering them the
chance to guide other interns with the aid of self-recorded YouTube videos.
1.2 About the department

The Data Science & Business Analytics department works with sample datasets from retail
stores, applying appropriate methods to draw inferences about the financial prospects of the
company and to identify areas where it performs well or poorly.
1.3 Problem Statement

Storefronts hold extensive raw data about the sales they make, including customer locations,
time of purchase, category of purchase, type of shipping chosen, and overall sales and
profits.
This data can be used to examine various aspects of the business. It shows which products
sell best, which helps in deciding which product categories can do with more variety because
a market for them exists and, conversely, which categories need reinvigorating or should
ultimately be discontinued to cut losses. It can also identify where the core user base and
demographic are located, so that marketing spend can be directed towards these hubs to
maximise reach and profits. We can also use Machine Learning models such as Neural Networks
on data from an online shopping platform, with metrics such as exit rates, page values and
bounce rates, to predict whether a customer is likely to make a purchase, allowing the
storefront to target customers whose usage patterns fit similar parameters.
Dept. of ISE, NMIT 2021-22 Page 1
1.4 Objectives
● Use visualization tools such as distributions, scatter plots and box plots to make
naked-eye inferences about the relationships between the features.
● Identify and infer the reasons behind certain patterns in the data and apply them to
business decisions that lead to increased profitability.
● Use the data as input to an appropriate model, such as a Neural Network, and train it to
predict which customers are more likely to make a purchase on the platform.
● Explain the observations in a YouTube video so that the inferences are presented in a
simple and understandable way to someone who isn’t proficient at Exploratory Data Analysis.
1.5 Scope
The idea is to take data that most storefronts collect by default and put it to genuine use
by drawing inferences about the decisions or changes the business can make to maximise
profits or cut down on losses.
1.6 Timeline of Activities
Duration           Key Activities/task                                      Status
Week 1, Sep 2021   1. Understand dataset and its features                   Completed
                   2. Decide on Visualisation tools to be used
Week 2, Sep 2021   1. Code the visualization tools on Python and use        Completed
                      widgets on Orange.
                   2. Make the appropriate inferences with respect to
                      the graphs produced.
Week 3, Sep 2021   1. Using the inferences decide which features are to     Completed
                      be used for the ML model.
                   2. Train the model.
                   3. Rate the model using evaluation metrics.
Week 4, Sep 2021   1. Record YouTube video explaining the inferences        Completed
                      made.
2. IMPLEMENTATION
2.1 Beginning stage implementation
The implementation started with requirement analysis. The dataset and its features were
studied thoroughly, as this helps in approaching the visualization work. Python was used to
code the visualizations, while the Orange Widget Tool was selected for the ML model.
Tools and Tech Stack
● Orange Widget Tool
● Jupyter Notebook
● Python, libraries:
○ numpy
○ pandas
○ matplotlib
○ seaborn

2.2 Data Visualization

We use the built-in widgets of the Orange tool and the functions offered by the matplotlib
library in Python to create graphs such as distribution plots, scatter plots and box plots,
and look for patterns from which inferences can be made. For example, a distribution plot of
whether a purchase is made or not against the months of the year (see Fig 3.2) shows that
months like March see significantly lower traffic than months like November or December,
when gift shopping during the festive season drives a large amount of sales.
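As a minimal sketch of the tally behind such a distribution plot, the following Python counts
sessions and purchases per month. The records and the purchases_by_month helper are
hypothetical and are not taken from the internship dataset; they only illustrate the kind of
counting the plot performs.

```python
from collections import Counter

# Hypothetical (month, purchase made?) records, invented for illustration only.
sessions = [
    ("Mar", False), ("Mar", False), ("Mar", True),
    ("Nov", True), ("Nov", True), ("Nov", False), ("Nov", True),
    ("Dec", True), ("Dec", False),
]


def purchases_by_month(records):
    """Return {month: (sessions with a purchase, all sessions)}."""
    totals = Counter()
    purchases = Counter()
    for month, purchased in records:
        totals[month] += 1
        if purchased:
            purchases[month] += 1
    return {month: (purchases[month], totals[month]) for month in totals}


summary = purchases_by_month(sessions)
for month, (bought, total) in summary.items():
    print(f"{month}: {bought} purchases out of {total} sessions")
```

Comparing the totals across months gives the traffic pattern, while the purchase counts show
how much of that traffic ends in a sale.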
Using distributions of the sales split by product category (see Fig 3.3), customer segment,
shipping mode and region, we can see which category of products sells best. It is clearly
Office Supplies, so the store may choose to prioritise bringing in more varieties of those
products, while recognising that it might have to offer incentives such as discounts on
other categories to drive up their sales. We also notice that Standard Class is the most
chosen shipping mode, which might encourage the store to discontinue something as
resource-consuming as “Same Day” shipping, as it has a smaller user base.
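The ranking that a count plot (for example seaborn's countplot, listed in the references)
makes visible can be sketched in plain Python as below. The orders, the column names
"Category" and "Ship Mode", and the share_table helper are assumptions made for
illustration; the real dataset may name and distribute its columns differently.

```python
from collections import Counter

# Hypothetical orders; values are invented and do not come from the real dataset.
orders = [
    {"Category": "Office Supplies", "Ship Mode": "Standard Class"},
    {"Category": "Office Supplies", "Ship Mode": "Standard Class"},
    {"Category": "Office Supplies", "Ship Mode": "Second Class"},
    {"Category": "Furniture", "Ship Mode": "Standard Class"},
    {"Category": "Furniture", "Ship Mode": "Same Day"},
    {"Category": "Technology", "Ship Mode": "Standard Class"},
]


def share_table(rows, column):
    """Return (value, count, share of all rows) tuples, most frequent first."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]


category_shares = share_table(orders, "Category")
ship_mode_shares = share_table(orders, "Ship Mode")

for value, n, share in category_shares:
    print(f"{value}: {n} orders ({share:.0%})")
```

Reading the first row of each table gives the best-selling category and the most chosen
shipping mode, and the last rows point at the options with the smallest user base.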
2.3 Final model and evaluation

Using the Orange widget tool, we provide the dataset as input to the Neural Network model so
as to train it to predict whether a customer will make a purchase or not. The network is
trained with backward propagation: the model compares its outputs against the true class
labels and works backwards from the final layer of neurons to decide how the associated
weights should be adjusted so that it produces the correct label. Once it has been trained,
we use our testing set to evaluate the model and judge its performance using metrics such as
Precision, Recall and F1 Score, the last being the harmonic mean of the other two. These
metrics are founded on the Confusion Matrix (see Fig 3.8), which tabulates how many samples
were labelled correctly and how many were not.
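To make the backward pass concrete, here is a minimal sketch of backward propagation on a
tiny network with one hidden layer, written in plain Python. It is not the implementation
used by the Orange Neural Network widget. The two input features (a scaled exit rate and a
scaled page value), the toy samples, the layer size and the learning rate are all
assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_inputs, n_hidden, seed=0):
    """Small random weights; a fixed seed keeps the sketch reproducible."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(params, x):
    """Return the hidden activations and the predicted purchase probability."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    output = sigmoid(sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"])
    return hidden, output


def mean_loss(params, samples, labels):
    """Mean cross-entropy between predictions and true labels."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(samples, labels):
        _, out = forward(params, x)
        out = min(max(out, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(out) + (1 - y) * math.log(1.0 - out)
    return total / len(samples)


def train(params, samples, labels, epochs=2000, lr=0.5):
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            hidden, out = forward(params, x)
            # Backward pass: start from the error at the output neuron...
            d_out = out - y
            # ...and push it back through the output weights to the hidden layer.
            d_hidden = [d_out * w * h * (1.0 - h) for w, h in zip(params["w2"], hidden)]
            # Gradient-descent updates for every weight and bias.
            for j, h in enumerate(hidden):
                params["w2"][j] -= lr * d_out * h
                params["b1"][j] -= lr * d_hidden[j]
                for i, xi in enumerate(x):
                    params["w1"][j][i] -= lr * d_hidden[j] * xi
            params["b2"] -= lr * d_out
    return params


# Hypothetical sessions: (scaled exit rate, scaled page value) and purchase label.
samples = [(0.9, 0.0), (0.8, 0.1), (0.7, 0.0), (0.1, 0.9), (0.2, 0.8), (0.1, 0.7)]
labels = [0, 0, 0, 1, 1, 1]

params = init_params(n_inputs=2, n_hidden=3)
loss_before = mean_loss(params, samples, labels)
train(params, samples, labels)
loss_after = mean_loss(params, samples, labels)
predictions = [1 if forward(params, x)[1] >= 0.5 else 0 for x in samples]
```

The error at the output is computed first and then propagated to the hidden layer, which is
the sense in which the model works backwards from the final layer. Comparing loss_before
with loss_after shows whether the weight updates moved the predictions towards the labels.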

3. SNAPSHOTS
Figure 3.1: Orange Data Flow
Figure 3.2: Distribution of Labels vs Months

Figure 3.3: Distribution of Count of sales made
Figure 3.4: Distribution of count of sub category Products sold

Figure 3.5: Sales vs Profit
Figure 3.6: State wise sales numbers

Figure 3.7: Evaluation results of Neural Network
Figure 3.8: Confusion Matrix
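The evaluation metrics in Figure 3.7 are derived from the counts in a confusion matrix such
as Figure 3.8. A minimal sketch of that calculation follows; the actual and predicted
labels are invented for illustration and are not the results obtained during the internship.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (true positives, false positives, false negatives, true negatives)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and their harmonic mean (F1); 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = purchase made, 0 = no purchase.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision answers how many predicted purchasers actually bought, while recall answers how
many actual purchasers the model found.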

4. CONCLUSION
In a nutshell, this internship has been an excellent and rewarding experience. I have learnt
a lot from my work at The Sparks Foundation. Despite having completed multiple courses in
the field of Exploratory Data Analysis, this was my first time using it to this extent,
especially in a business-oriented domain such as a shopping website. This allowed me to see
the real-world application of these tools and models to a store and its aim of improving
business. The Orange Widget Tool was quite helpful, as it made it easy to experiment with
different parameters for both the visualizations and the model itself.

5. REFERENCES
● Plots
○ https://seaborn.pydata.org/generated/seaborn.countplot.html
○ https://orangedatamining.com/widget-catalog/visualize/distributions/
○ https://orangedatamining.com/widget-catalog/visualize/scatterplot/
○ https://orangedatamining.com/widget-catalog/visualize/boxplot/
● Correlations https://orangedatamining.com/widget-catalog/data/correlations/
● Neural Network
○ https://orangedatamining.com/widget-catalog/model/neuralnetwork/
○ https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/
○ https://towardsdatascience.com/how-does-back-propagation-in-artificial-neural-networks-work-c7cad873ea7
○ http://neuralnetworksanddeeplearning.com/chap2.html
○ https://www.jeremyjordan.me/neural-networks-training/
● Confusion Matrix
○ https://orangedatamining.com/widget-catalog/evaluate/confusionmatrix/
○ https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62
● Evaluation Metrics https://orangedatamining.com/widget-catalog/evaluate/testandscore/
● Orange Basics https://www.youtube.com/watch?v=HXjnDIgGDuI&list=PLmNPvQr9Tf-ZSDLwOzxpvY-HrE0yv-8Fy
