INTERNSHIP REPORT
Submitted by:
Karthik Senthil
1NT18IS201
Nitte Meenakshi Institute of Technology
(AN AUTONOMOUS INSTITUTION AFFILIATED TO VISVESVARAYA TECHNOLOGICAL UNIVERSITY, BELGAUM)
A report on
Bachelor of Engineering
in
Information Science and Engineering
CERTIFICATE
This is to certify that the internship entitled Data Science & Business Analytics Intern at
The Sparks Foundation is a bonafide work carried out by Karthik Senthil (1NT18IS201)
in fulfilment of the requirements for the award of the degree of Bachelor of Engineering in Information Science
and Engineering. It is certified that all the corrections/suggestions indicated for internal
assessment have been incorporated in the report deposited in the departmental library. The
report has been approved as it satisfies the academic requirements in respect of the internship
work prescribed for the Bachelor of Engineering degree.
Dr. Mohan S.G.
Professor and Head, Dept. of ISE,
NMIT
CERTIFICATE ISSUED BY COMPANY
ACKNOWLEDGEMENT
The internship opportunity that I had with The Sparks Foundation was a great chance for
learning and professional development. I am grateful for the chance to meet so many
wonderful people and professionals during this internship. The credit for the
successful completion of this internship goes beyond my own work, to those people who
have always been with me throughout. I take this opportunity to express my heartfelt
gratitude to each one of them.
I would like to thank Mr. PRANAV DUBEY (Managing Director) for giving me this
opportunity to work as an intern and also for guiding and supporting me in the completion of
the internship on time.
I would like to convey my heartfelt thanks to Dr. H.C. NAGARAJ, our principal and to
Dr. Mohan S.G., HoD, Department of Information Science and Engineering, for giving me a
chance to embark on this opportunity.
I would like to express my deepest gratitude to my friends Mr. SRIRANJAN S and Mr.
KAUSHAL BHAT, for their continuous support and encouragement that enabled me to
complete this internship successfully.
Finally, I would like to thank my beloved parents, friends and dear ones for the continuous
motivation and support.
ABSTRACT
With the boom of data science, it is now possible to use data that businesses have collected
for generations, but never put to use, to make sound business decisions. Storefronts have held
purchase data and customer information for decades, yet have only recently realised the
potential of this data. Analysing it can help a store optimise its stock, spend limited
resources such as marketing budgets where they reach the largest possible audience, identify
its core customer demographic, and see which departments are performing best and which are
underperforming and may need to be cut back.
This data also helps identify potential customers for the platform from a few simple
parameters, so that the business can aim its marketing and products at customers who fit this
profile. The foundation also encourages peers to network, help one another with their
respective tasks, and create video guides that explain the working of the project to
interested peers.
TABLE OF CONTENTS
Title Page
Certificate
Certificate issued by company
Acknowledgement
Abstract
List of Figures and Tables
1. INTRODUCTION
1.1 About the company
1.2 About the department
1.3 Problem Statement
1.4 Objectives
1.5 Scope
1.6 Timeline of Activities
2. IMPLEMENTATION
2.1 Beginning stage implementation
2.2 Data Visualization
2.3 Final model and evaluation
3. SNAPSHOTS
3.1 Orange Data Flow
3.2 Distribution of Labels vs Months
3.3 Distribution of count of sales made
3.4 Distribution of count of sub category Products sold
3.5 Sales vs Profits
3.6 State wise sales numbers
3.7 Evaluation results of Neural Network
3.8 Confusion Matrix
4. CONCLUSION
5. REFERENCES
LIST OF FIGURES
LIST OF TABLES
1. INTRODUCTION
This data can be used to examine various aspects of the business, such as which products sell
best. This helps decide which product categories could do with more variety, since a market
for them clearly exists, and conversely which categories need reinvigorating or should
ultimately be discontinued to cut losses. The data can also show where the core user base and
demographic are located, so that marketing spend can be directed towards these hubs to
maximise reach and profits. We can also use Machine Learning models such as Neural Networks
on data from an online shopping platform, with metrics such as exit rates, page values and
bounce rates, to predict whether a customer is likely to make a purchase. This allows the
storefront to target customers whose usage patterns fit similar parameters.
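To make the prediction idea concrete, the sketch below trains a one-hidden-layer neural network with backpropagation in plain NumPy. It is only an illustration of the technique, not what the Orange Neural Network widget does internally. The three input columns merely stand in for exit rate, page value and bounce rate; the data is synthetic and the labelling rule (a purchase when page value exceeds exit rate) is a made-up assumption that simply gives the network something to learn. The helper names `train_mlp` and `predict` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, hidden=8, lr=0.5, epochs=3000, seed=0):
    """Train a one-hidden-layer network with full-batch gradient descent (backpropagation)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden))
    b1 = np.zeros((1, hidden))
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros((1, 1))
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, sigmoid output = purchase probability
        h = np.tanh(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        # Backward pass for the mean binary cross-entropy loss
        dz2 = (p - y) / n
        dW2 = h.T @ dz2
        db2 = dz2.sum(axis=0, keepdims=True)
        dz1 = (dz2 @ W2.T) * (1.0 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
        dW1 = X.T @ dz1
        db1 = dz1.sum(axis=0, keepdims=True)
        # Gradient-descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


def predict(params, X, threshold=0.5):
    """Return 1 (purchase) or 0 (no purchase) for each row of X."""
    W1, b1, W2, b2 = params
    p = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)
    return (p >= threshold).astype(int).ravel()


# Hypothetical synthetic sessions: columns stand in for exit rate, page value, bounce rate.
rng = np.random.default_rng(42)
X = rng.random((400, 3))
# Assumed toy labelling rule: purchase when page value exceeds exit rate.
y = (X[:, 1] > X[:, 0]).astype(float).reshape(-1, 1)

params = train_mlp(X, y)
accuracy = (predict(params, X) == y.ravel()).mean()
print(f"training accuracy: {accuracy:.2f}")
```

This reports accuracy on the training data only; a fair assessment would score the model on held-out sessions, for example with the cross-validation offered by Orange's Test and Score widget.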
1.4 Objectives
● To use visualization tools such as distributions, scatter plots and box plots to make
visual inferences about the relationships between the features.
● Identify and infer the reasons behind certain patterns in the data and their application
towards making business decisions that lead to increased profitability.
● Use the data as input to an appropriate model, such as a Neural Network, and train it to
predict which customers are more likely to make a purchase on the platform.
● To explain the observations in a YouTube video, presenting the inferences in a simple and
understandable way so that the information reaches someone who isn’t proficient in
Exploratory Data Analysis.
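The relationships that scatter plots and box plots show by eye can also be checked numerically. A minimal pandas sketch, assuming a sales table with `Category`, `Sales` and `Profit` columns; the rows below are made up for illustration and are not the report's data.

```python
import pandas as pd

# Hypothetical rows; column names are assumptions about the dataset layout.
df = pd.DataFrame({
    "Category": ["Furniture", "Furniture", "Office Supplies",
                 "Office Supplies", "Technology", "Technology"],
    "Sales": [200.0, 450.0, 15.0, 30.0, 900.0, 1200.0],
    "Profit": [20.0, -35.0, 4.0, 9.0, 150.0, 310.0],
})

# Pearson correlation: how linearly Profit moves with Sales (what a scatter plot shows by eye)
sales_profit_corr = df["Sales"].corr(df["Profit"])

# Five-number summary per category: the values a box plot draws
box_stats = df.groupby("Category")["Profit"].describe()[["min", "25%", "50%", "75%", "max"]]

print(round(sales_profit_corr, 3))
print(box_stats)
```

A correlation close to 1 supports what a rising cloud of points suggests, while a low or negative value warns that a visual trend may be driven by a few large orders.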
1.5 Scope
The idea is to take data that most storefronts collect by default and put it to genuine use,
drawing inferences about the decisions or changes a business can make to maximise profits or
cut down on losses.
1.6 Timeline of Activities
Week              Status      Activities
Week 2, Sep 2021  Completed   1. Code the visualization tools in Python and use widgets in Orange.
                              2. Make the appropriate inferences from the graphs produced.
Week 3, Sep 2021  Completed   1. Using the inferences, decide which features to use for the ML model.
                              2. Train the model.
                              3. Rate the model using evaluation metrics.
Week 4, Sep 2021  Completed   1. Record a YouTube video explaining the inferences made.
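Rating the model in week 3 relied on evaluation metrics and a confusion matrix. The arithmetic behind the usual metrics can be sketched in plain Python; the helper names `confusion_matrix` and `metrics` are hypothetical (this is not scikit-learn's or Orange's API), and the labels and predictions are made up, with 1 meaning purchase and 0 meaning no purchase.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = purchase."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 computed from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test-set labels and model predictions
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
print(confusion_matrix(y_true, y_pred))  # (2, 1, 1, 6)
print(metrics(y_true, y_pred))
```

When purchases are a minority of sessions, accuracy alone can look high even for a model that never predicts a purchase, which is why precision and recall are worth reading alongside it.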
2. IMPLEMENTATION
2.1 Beginning stage implementation
Initially, the implementation started with requirement analysis. The dataset and its features
were studied thoroughly, as this helps in approaching the visualization. Python was used to
code the visualizations, while the Orange Widget Tool was selected for the ML model. The tools
used were:
● Jupyter Notebook
● Python, libraries:
○ numpy
○ pandas
○ matplotlib
○ seaborn
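The requirement-analysis step described above, understanding the dataset and its features, usually starts with a few pandas calls in the Jupyter notebook. A minimal sketch; the small inline table and its column names are hypothetical stand-ins for the actual dataset file, which would normally be loaded with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical stand-in for the store dataset (normally: df = pd.read_csv(path))
df = pd.DataFrame({
    "Ship Mode": ["Standard Class", "Second Class", "Standard Class", "Same Day", "Standard Class"],
    "Category": ["Office Supplies", "Furniture", "Office Supplies", "Technology", "Office Supplies"],
    "Sales": [12.5, 300.0, 12.5, 999.0, None],
})

print(df.shape)                          # number of rows and columns
print(df.dtypes)                         # which features are numeric and which are categorical
missing = df.isna().sum()                # missing values per column
duplicates = int(df.duplicated().sum())  # fully repeated rows
print(missing)
print(duplicates)
print(df.describe())                     # summary statistics of the numeric columns
```

Knowing which columns are categorical and which are numeric decides which plots fit each feature: count plots for the former, distributions, scatter plots and box plots for the latter.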
Using distributions of sales split by product category (see fig 3.3), customer segment,
shipping mode and region, we can see which category of products sells best. It is clearly
Office Supplies, so the store may choose to prioritise bringing in more varieties of those
products, while recognising that it may need to offer incentives such as discounts in the other
categories to drive up sales. We also notice that Standard Class is the most frequently chosen
shipping mode, which might encourage the store to discontinue an option as resource-intensive
as “Same Day” shipping, since it has a smaller user base.
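The counts behind these distribution plots can be reproduced with pandas `value_counts`, and seaborn's `countplot` draws them as bars. A sketch over a few made-up orders; the column names `Category` and `Ship Mode` are assumptions about the dataset's layout, and the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical orders; values are illustrative, not the report's data.
orders = pd.DataFrame({
    "Category": ["Office Supplies", "Office Supplies", "Furniture", "Technology",
                 "Office Supplies", "Furniture", "Office Supplies"],
    "Ship Mode": ["Standard Class", "Standard Class", "Second Class", "Same Day",
                  "Standard Class", "First Class", "Second Class"],
})

# Number of sales per category, most frequent first
category_counts = orders["Category"].value_counts()

# Share of orders per shipping mode, as a percentage
ship_share = orders["Ship Mode"].value_counts(normalize=True).mul(100).round(1)

print(category_counts)
print(ship_share)

# Plotting the same counts (needs seaborn and matplotlib installed):
# import seaborn as sns
# sns.countplot(data=orders, x="Category", order=category_counts.index)
```

Expressing shipping modes as percentages makes the comparison between Standard Class and Same Day independent of the total number of orders.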
3. SNAPSHOTS
4. CONCLUSION
In a nutshell, this internship has been an excellent and rewarding experience, and I have
learnt a lot from my work at The Sparks Foundation. Although I had completed multiple courses
in Exploratory Data Analysis, this was my first time applying it to this extent, especially in
a business-oriented domain such as a shopping website. This allowed me to see the real-world
application of these tools and models for a store aiming to improve its business. The Orange
Widget Tool was particularly helpful, as it made it easy to experiment with different
parameters for both the visualizations and the model itself.
5. REFERENCES
● Plots
○ https://seaborn.pydata.org/generated/seaborn.countplot.html
○ https://orangedatamining.com/widget-catalog/visualize/distributions/
○ https://orangedatamining.com/widget-catalog/visualize/scatterplot/
○ https://orangedatamining.com/widget-catalog/visualize/boxplot/
● Correlations
○ https://orangedatamining.com/widget-catalog/data/correlations/
● Neural Network
○ https://orangedatamining.com/widget-catalog/model/neuralnetwork/
○ https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/
○ https://towardsdatascience.com/how-does-back-propagation-in-artificial-neural-networks-work-c7cad873ea7
○ http://neuralnetworksanddeeplearning.com/chap2.html
○ https://www.jeremyjordan.me/neural-networks-training/
● Confusion Matrix
○ https://orangedatamining.com/widget-catalog/evaluate/confusionmatrix/
○ https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62
● Evaluation Metrics
○ https://orangedatamining.com/widget-catalog/evaluate/testandscore/
● Orange Basics
○ https://www.youtube.com/watch?v=HXjnDIgGDuI&list=PLmNPvQr9Tf-ZSDLwOzxpvY-HrE0yv-8Fy