
Nitte Meenakshi Institute of Technology

(AN AUTONOMOUS INSTITUTION AFFILIATED TO VISVESVARAYA TECHNOLOGICAL UNIVERSITY, BELGAUM)

Department of Information Science & Engineering

INTERNSHIP REPORT

Submitted by:

Karthik Senthil

1NT18IS201

A report on

Data Science & Business Analytics Intern - The Sparks Foundation

Submitted in fulfillment for award of Internship for

Bachelor of Engineering
in
Information Science and Engineering

CERTIFICATE
This is to certify that the internship entitled Data Science & Business Analytics Intern at
The Sparks Foundation is a bona fide work carried out by Karthik Senthil (1NT18IS201)
in fulfilment of the requirements for the award of the degree of Bachelor of Engineering in
Information Science and Engineering. It is certified that all the corrections and suggestions
indicated for internal assessment have been incorporated in the report submitted to the
departmental library. The report has been approved as it satisfies the academic requirements
in respect of the internship work prescribed for the Bachelor of Engineering degree.

Examination Panel Head of Department

........................................ ........................................
Dr. Mohan S.G.
Professor and Head, Dept. of ISE,
NMIT
CERTIFICATE ISSUED BY COMPANY
ACKNOWLEDGEMENT

The internship opportunity that I had with The Sparks Foundation was a great chance for
learning and professional development. I'm grateful for having a chance to meet so many
wonderful people and professionals through this internship period. The credit for the
successful completion of this internship goes beyond my own work, to those people who
have always been with me throughout. I take this opportunity to express my heartfelt
gratitude to each one of them.

I would like to thank Mr. PRANAV DUBEY (Managing Director) for giving me this
opportunity to work as an intern and also for guiding and supporting me in the completion of
the internship on time.

I would like to convey my heartfelt thanks to Dr. H.C. NAGARAJ, our Principal, and to
Dr. Mohan S. G., HoD, Department of Information Science and Engineering, for giving me the
chance to embark on this opportunity.

I would like to express my deepest gratitude to my friends Mr. SRIRANJAN S and Mr.
KAUSHAL BHAT, for their continuous support and encouragement that enabled me to
complete this internship successfully.

Finally, I would like to thank my beloved parents, friends and dear ones for the continuous
motivation and support.
ABSTRACT
With the boom in data science, it is now possible to use data that has been collected for
generations but never put to use in making business decisions. Storefronts have held purchase
data and customer information for decades, but have only now realised the potential of this
data for making smart business decisions. It can help a storefront optimise its stock and use
limited resources efficiently, for example by directing marketing at the largest possible
audience, identifying its core customer demographic, and understanding which departments are
performing best and which are underperforming and need to be cut down.

This data also helps in identifying potential customers for a platform based on a few simple
parameters, so that the business can aim its marketing and products at customers who fit
this profile. The foundation also encourages interns to help their peers with their respective
tasks by networking and by creating video guides that explain how a project works.
TABLE OF CONTENTS
No. NAME
Title Page
Certificate
Certificate issued by company
Acknowledgement
Abstract
List of Figures and tables

1. INTRODUCTION
1.1 About the company
1.2 About the department
1.3 Problem Statement
1.4 Objectives
1.5 Scope
1.6 Timeline of Activities

2. IMPLEMENTATION
2.1 Beginning stage implementation
2.2 Data Visualization
2.3 Final model and evaluation

3. SNAPSHOTS
3.1 Orange Data Flow
3.2 Distribution of Labels vs Months
3.3 Distribution of count of sales made
3.4 Distribution of count of sub category Products sold
3.5 Sales vs Profits
3.6 State wise sales numbers
3.7 Evaluation results of Neural Network
3.8 Confusion Matrix

4. CONCLUSIONS

5. REFERENCES
LIST OF FIGURES

Fig no. Description


3.1 Orange Data Flow
3.2 Distribution of Label vs Months
3.3 Distribution of count of sales made
3.4 Distribution of count of sub category Products sold
3.5 Sales vs Profits
3.6 State wise sales numbers
3.7 Evaluation results of Neural Network
3.8 Confusion Matrix

LIST OF TABLES

Table no. Description


1.6 Timeline of activity
Data Science & Business Analytics Intern,
The Sparks Foundation

1. INTRODUCTION

1.1 About the company


The Sparks Foundation offers a platform for students from all over the world to connect and
guide one another across various domains. The company is based in Singapore and was founded
in 2017. Its core focus is the Graduate Rotational Internship Program, which allows interns to
complete tasks in the domain of their choice while also giving them the chance to help other
interns with the aid of self-recorded YouTube videos.

1.2 About the department


The Data Science & Business Analytics department takes sample datasets from retail stores and
applies appropriate methods to them in order to make inferences about the financial prospects
of a company and to identify the areas in which it performs well or poorly.

1.3 Problem Statement


Storefronts have extensive amounts of raw data about the sales they make, including customer
locations, time of purchase, category of purchase, type of shipping chosen, and overall sales
and profits.

This data can be used to examine various aspects of the business. It shows which products are
selling best, which helps in deciding which product category could do with more variety because
a market exists for it and, conversely, which category needs reinvigorating or should be
discontinued to cut losses. It can also identify where the core user base and demographic are
located, so that marketing spend can be directed towards these hubs to maximise reach and
profits. We can also use Machine Learning models such as Neural Networks on data from an online
shopping platform, with metrics such as exit rates, page values and bounce rates, to predict
whether a customer is likely to make a purchase or not, allowing the storefront to target
customers whose usage patterns fit similar parameters.

Dept. of ISE, NMIT 2021-22

1.4 Objectives
● Use visualization tools such as distribution plots, scatter plots and box plots to make
visual inferences about the relationships between the features.
● Identify and explain the reasons behind certain patterns in the data, and apply them to
business decisions that lead to increased profitability.
● Use the data as input to an appropriate model, such as a Neural Network, and train it to
predict which customers are more likely to make a purchase on the platform.
● Explain the observations in a YouTube video, presenting the inferences in a simple and
understandable way for someone who is not proficient in Exploratory Data Analysis.

1.5 Scope
The idea is to take data that most storefronts collect by default and put it to genuine use, by
making inferences about the decisions or changes a business can make to maximise profits or
cut down on losses.

1.6 Timeline of Activities

Duration            Key Activities/Tasks                                      Status

Week 1, Sep 2021    1. Understand the dataset and its features.               Completed
                    2. Decide on the visualisation tools to be used.

Week 2, Sep 2021    1. Code the visualizations in Python and use              Completed
                       widgets in Orange.
                    2. Make the appropriate inferences from the
                       graphs produced.

Week 3, Sep 2021    1. Using the inferences, decide which features            Completed
                       are to be used for the ML model.
                    2. Train the model.
                    3. Rate the model using evaluation metrics.

Week 4, Sep 2021    1. Record a YouTube video explaining the                  Completed
                       inferences made.

2. IMPLEMENTATION
2.1 Beginning stage implementation
The implementation began with requirement analysis. The dataset and its features were studied
thoroughly, as this informs the approach to visualization. Python was used to code the
visualizations, while the Orange Widget Tool was selected for the ML model.

Tools and Tech Stack

● Orange Widget Tool

● Jupyter Notebook

● Python, libraries:
○ numpy
○ pandas
○ matplotlib
○ seaborn

2.2 Data Visualization


We use the built-in widgets of the Orange tool and the functions offered by the matplotlib
library in Python to create graphs such as distribution plots, scatter plots and box plots, and
look for patterns from which inferences can be made. For example, a distribution plot of
whether a purchase is made or not against the months of the year (see Fig. 3.2) shows that
months like March see significantly lower traffic than months like November or December,
owing to gift shopping during the festive season, which drives a large amount of sales.
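The month-by-month comparison described above amounts to a two-way tally of sessions by month and by whether a purchase was made. The following is a minimal sketch of that tally using only the Python standard library. The `sessions` list and its values are hypothetical and are not taken from the dataset used in the internship.

```python
from collections import defaultdict

# Hypothetical sessions as (month, purchased) pairs; the values are
# invented for illustration and are not taken from the real dataset.
sessions = [
    ("Mar", False), ("Mar", False), ("Mar", True),
    ("Nov", True), ("Nov", False), ("Nov", True), ("Nov", True), ("Nov", False),
    ("Dec", True), ("Dec", False), ("Dec", True), ("Dec", False),
]


def tally_by_month(rows):
    """Count sessions per month, split into purchases and non-purchases."""
    tally = defaultdict(lambda: {"purchase": 0, "no_purchase": 0})
    for month, purchased in rows:
        tally[month]["purchase" if purchased else "no_purchase"] += 1
    return dict(tally)


monthly = tally_by_month(sessions)
for month, counts in monthly.items():
    total = counts["purchase"] + counts["no_purchase"]
    share = counts["purchase"] / total
    print(f"{month}: {total} sessions, {share:.0%} with a purchase")
```

Looking at both the total number of sessions and the share that end in a purchase separates "less traffic" from "fewer conversions", which a single bar height can hide.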

A distribution of sales split by product category (see Fig. 3.3), customer segment, shipping
mode and region shows which category of products sells best: clearly Office Supplies. The
store may then choose to prioritise bringing in more varieties of those products, while
recognising that it might have to offer incentives such as discounts on other categories to
drive up their sales. We also notice that Standard Class is the most frequently chosen shipping
mode, which might encourage the store to discontinue something as resource-consuming as
“Same Day” shipping, as it has a smaller user base.
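A count distribution of this kind can be sketched in a few lines of Python. The example below is illustrative only: the `orders` rows and the column names `Category` and `Ship Mode` are assumptions modelled on the values named in the text, and the helpers `count_by` and `plot_counts` are hypothetical names rather than code from the internship notebook. Counting uses the standard library, and matplotlib is imported only if the plotting helper is called.

```python
from collections import Counter

# Hypothetical sample rows; the column names and values are assumptions
# modelled on the categories named in the text, not the real dataset.
orders = [
    {"Category": "Office Supplies", "Ship Mode": "Standard Class"},
    {"Category": "Office Supplies", "Ship Mode": "Second Class"},
    {"Category": "Furniture", "Ship Mode": "Standard Class"},
    {"Category": "Technology", "Ship Mode": "Same Day"},
    {"Category": "Office Supplies", "Ship Mode": "Standard Class"},
    {"Category": "Furniture", "Ship Mode": "First Class"},
]


def count_by(rows, column):
    """Return the number of rows for each distinct value in `column`."""
    return Counter(row[column] for row in rows)


def plot_counts(counts, title):
    """Draw a bar chart of the counts, most frequent first (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so counting works without it

    labels, values = zip(*counts.most_common())
    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_title(title)
    ax.set_ylabel("Number of orders")
    return fig


category_counts = count_by(orders, "Category")
ship_mode_counts = count_by(orders, "Ship Mode")
top_category = category_counts.most_common(1)[0][0]
top_ship_mode = ship_mode_counts.most_common(1)[0][0]
print("Best-selling category:", top_category)
print("Most chosen shipping mode:", top_ship_mode)
```

In a Jupyter Notebook, calling `plot_counts(category_counts, "Orders by category")` would display the chart. With the data loaded into a pandas DataFrame, `seaborn.countplot`, listed in the references, produces the same kind of plot directly.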

2.3 Final model and evaluation


Using the Orange widget tool, we provide the dataset as input to the Neural Network model and
train it to predict whether a customer will make a purchase or not. The network is trained
with backpropagation: its output is compared against the true class labels, and the resulting
error is propagated backwards from the final layer of neurons to decide how the weights of
each layer should be adjusted so that the predicted label matches the true one. Once the model
has been trained, we use our testing set to evaluate it and judge its performance using metrics
such as Precision, Recall and the F1 Score, which is the harmonic mean of the two. These
metrics are derived from the Confusion Matrix (see Fig. 3.8), which tabulates how many
samples were labelled correctly and how many were not.
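To make the training and evaluation steps concrete, the following is a minimal from-scratch sketch of backpropagation on a tiny network, followed by the confusion-matrix metrics. It does not reproduce what the Orange widget runs internally. The network size, learning rate, the two features (a scaled page value and exit rate) and all data points are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> sigmoid hidden layer -> sigmoid output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


def mean_loss(data, params):
    """Average cross-entropy between predicted probabilities and true labels."""
    total = 0.0
    for x, y in data:
        _, p = forward(x, *params)
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def init_params(n_in, n_hidden, seed=0):
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return w_hidden, [0.0] * n_hidden, w_out, 0.0


def train(data, params, lr=0.5, epochs=500):
    """Per-sample gradient descent; gradients are obtained by backpropagation."""
    w_hidden = [row[:] for row in params[0]]  # copy so `params` is not modified
    b_hidden, w_out, b_out = params[1][:], params[2][:], params[3]
    for _ in range(epochs):
        for x, y in data:
            hidden, p = forward(x, w_hidden, b_hidden, w_out, b_out)
            # Output layer: with sigmoid + cross-entropy the error signal is p - y.
            delta_out = p - y
            # Hidden layer: push the error back through the output weights.
            delta_hidden = [delta_out * w * h * (1 - h) for w, h in zip(w_out, hidden)]
            for j, h in enumerate(hidden):
                w_out[j] -= lr * delta_out * h
                b_hidden[j] -= lr * delta_hidden[j]
                for i, xi in enumerate(x):
                    w_hidden[j][i] -= lr * delta_hidden[j] * xi
            b_out -= lr * delta_out
    return w_hidden, b_hidden, w_out, b_out


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 means 'purchase'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical, pre-scaled features: [page value, exit rate]; label 1 = purchase.
train_set = [([0.9, 0.1], 1), ([0.8, 0.2], 1), ([0.7, 0.1], 1),
             ([0.1, 0.9], 0), ([0.2, 0.8], 0), ([0.1, 0.7], 0)]
test_set = [([0.85, 0.15], 1), ([0.15, 0.85], 0), ([0.6, 0.3], 1), ([0.3, 0.6], 0)]

initial = init_params(n_in=2, n_hidden=3)
loss_before = mean_loss(train_set, initial)
trained = train(train_set, initial)
loss_after = mean_loss(train_set, trained)

y_true = [y for _, y in test_set]
y_pred = [1 if forward(x, *trained)[1] >= 0.5 else 0 for x, _ in test_set]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("training loss:", round(loss_before, 3), "->", round(loss_after, 3))
print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("precision, recall, F1:", precision_recall_f1(tp, fp, fn))
```

Precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 combines them as 2PR / (P + R). For example, a confusion matrix with 8 true positives, 2 false positives and 4 false negatives gives a precision of 0.8, a recall of about 0.67 and an F1 score of about 0.73.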

3. SNAPSHOTS

Figure 3.1: Orange Data Flow

Figure 3.2: Distribution of Labels vs Months

Figure 3.3: Distribution of Count of sales made

Figure 3.4: Distribution of count of sub category Products sold

Figure 3.5: Sales vs Profit

Figure 3.6: State wise sales numbers

Figure 3.7: Evaluation results of Neural Network

Figure 3.8: Confusion Matrix

4. CONCLUSION
In a nutshell, this internship has been an excellent and rewarding experience, and I learnt a
great deal from my work at The Sparks Foundation. Although I had completed multiple courses
in Exploratory Data Analysis, this was my first time applying it to this extent, especially in
a business-oriented domain such as a shopping website. This allowed me to see the real-world
application of these tools and models to a store and its aim of improving business. The Orange
Widget Tool was particularly helpful, as it made it easy to experiment with different
parameters for both the visualizations and the model itself.

5. REFERENCES
● Plots
○ https://seaborn.pydata.org/generated/seaborn.countplot.html
○ https://orangedatamining.com/widget-catalog/visualize/distributions/
○ https://orangedatamining.com/widget-catalog/visualize/scatterplot/
○ https://orangedatamining.com/widget-catalog/visualize/boxplot/
● Correlations https://orangedatamining.com/widget-catalog/data/correlations/
● Neural Network
○ https://orangedatamining.com/widget-catalog/model/neuralnetwork/
○ https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/
○ https://towardsdatascience.com/how-does-back-propagation-in-artificial-neural-networks-work-c7cad873ea7
○ http://neuralnetworksanddeeplearning.com/chap2.html
○ https://www.jeremyjordan.me/neural-networks-training/
● Confusion Matrix
○ https://orangedatamining.com/widget-catalog/evaluate/confusionmatrix/
○ https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62
● Evaluation Metrics https://orangedatamining.com/widget-catalog/evaluate/testandscore/
● Orange Basics
https://www.youtube.com/watch?v=HXjnDIgGDuI&list=PLmNPvQr9Tf-ZSDLwOzxpvY-HrE0yv-8Fy
