Professional Documents
Culture Documents
1.EXECUTIVE SUMMARY
Sales prediction is the current numerous trend in which all the business companies thrive and
it also aids the organization or concern in determining the future goals for it and its plan and
procedure to achieve it. The data about car sales are derived from various sources .Sales of
cars does not contain any independent variable since various factors such as horse power,
model, width, fuel type, height, price, city-mileage, highway-mileage and manufacturer are
the various features that influence the sales.
In car sales prediction we first implement the methodology of analytic hierarchy process in
order to get varied idea about how well the various criteria in our dataset works and after this
we apply the machine learning algorithms such as Linear regression to get the best clusters
and we process them in t o random forest to get best accurate feature out of it.
Using Jupiter file browser of Anaconda Navigator which is a tool which helps the researcher
to arrive at a verdict when he/she faces the one or more pattern selection problem and the
final resultant derived from all these methods gives the fittest feature which influences the
customer in purchasing the car which indirectly gives the company or the research market a
result in predicting the future sales for cars. The Jupyter notebooks contain Python code.
Using a regression analysis with predicted variables, the asymmetric relationship between
attribute-level performance and overall customer satisfaction could be confirmed.However,
the analysis of time-series data reveals that a positive and significant relationship exists
between changes in customer satisfaction and changes in the performance of the firm.
Therefore, the impact of an increase in customer satisfaction on profits, although obscured in
the short run by many factors, is significantly positive in the long run.
2. INTRODUCTION
Every human being has got his/her business in this world, so the term business comes
with certain tagged words such as sales, profit and loss. A concerns success is determined by
its sales and the performance of its product in the industry is also identified by its sales.
Hence in order to improve the standard of the firm /concern the strategies, the techniques the
methodologies are built. In this process of development forecasting of certain key terms such
as profit, loss, return of interest may pave the way for the successful venture of the concern
yet when we look deep in to the details all these key terms are conveniently related to the
term called sales.
Sales prediction is one of the master trades of business which may open the gateways
for obtaining knowledge about the existing market trends and the ways to conquer the
market, planning is the first step in every activity we perform and hence knowing what lies
ahead in terms of sales hugely aids the concern/organization in this planning process.
Salesprediction can be a massive support in order to perform tracking,cash flow and
purchasing. Sales forecasting provides the business minds an idea of about how much to buy
and how much not to buy ,what are the risk can be taken in terms of revenue ,how to plan
1
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
budget ,market trends, introduction of new products according to the organizations capability
and ability ,what changes may happen if the plan fails are the areas where it helps to develop
our ideas ,let us shift our focus from the generic business term to its detailed chunk of
streams such as education,medical,automobile and lots of other streams.
Once great interest in recent days lies in automobile industry. The first boon to the
automobile industry came only after the industrialization renaissance period. Now the
automobile industry creates, packs, runs, moves the world on its wheels and also allows other
fields to flourish parallel along with them ,yet it also has its own defaults since it is not cost
friendly ,any single default can affect the whole system and it cause a huge failure in the
market for its brand ,there is a huge amount of revenue being spend and huge amount of
chunks being produced and exploited on daily basis hence sales forecasting in this industry
can lead to the organized pattern of selling,buying,producing goods and even the taxes to be
implemented also comes in to role.
Forecasting these sales in automobile industries can be performed with various and
variety of technologies and one among them is Data Analytics which uses the machine
learning technique. That may help in classifying the prediction of an automobile for a say, let
us take it as car; the yearly sales of it if know before then it will provide the manufacturer the
huge boost in designing it, getting spare parts, getting key parts and reducing the waste
products and tracking its revenue model its generation and various other activity. The
classifiers used such as logistic regression and random forecast provideus with accurate
predicted results.However, the analysis of time-series data reveals that a positive and
significant relationship exists between changes in customer satisfaction and changes in the
performance of the firm. Therefore, the impact of an increase in customer satisfaction on
profits, although obscured in the short run by many factors, is significantly positive in the
long run.
2.1.COMPANY OVERVIEW
Konigtronics (OPC) Private Limited in Figure 1.1 was started in August 2016 byMr. Vishesh.
S, B.E. (Telecommunication) to educate the students and the public oninnovative methods of
Research and Development in the field of Engineering andTechnology. Konigtronics (OPC)
Private Limited was incorporated as a corporatecompany by the Ministry of Corporate
2
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
Affairs (MCA) on 9th January 2017. (Pursuant tosub-section (2) of section 7 of the
companies Act, 2013 and Rule 18 the companies(incorporation) Rules, 2014)
Konigtronics Pvt Ltd has extended its services in the field of web and
applicationdevelopment, software engineering and has produced numerous products and
hassuccessfully commercialized it. Konigtronics has conducted many technical workshopson
topics like embedded systems, IoT (Internet of Things), cloud computing, networksimulation
and bio-medical signal processing. Konigtronics has now extended its servicein the field of
real-estate and civil works, career planning and entrepreneurship. Thecompany is located ina
prime location in central Bangalore and as many as 3 dedicatedengineers and several interns
working in it. Konigtronics has till now guided over 400students/professionals in their
projects, research and development activities.
approach in development.
B. Software engineering
• Web design
• Application design
3
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
• Professional training.
• Non-Engineers
3.PROJECT PROFILE
3.2. METHODOLOGY
4
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
new target variable (purchase amount for a new product) in predictive analytics.
Alternatively, the objective for predictive analytics can also be to acquire data that will be
used for predictor variables (rather than the target variable) that are not available in the
current customer databases. A company may have millions of customer records. It is
practically impossible to acquire the data from all of its customers due to cost constraints. For
the purposes of information gathering and business analytics, it suffices to acquire the data
from a representative sample of the customer population.
Data acquisition includes abstracting information from hard copy records, acquiring
information from electronic records, acquiring data from files that reside on the internet or in
other repositories, linking extant data files to other data repositories or to existing primary
survey data. Data acquisition is growing in demand because it allows researchers to take
advantage of a wide variety of existing data sources.
Survey -Survey data is defined as the resultant data that is collected from a sample of
respondents that took a survey. This data is comprehensive information gathered from a
target audience about a particular topic of interest to conduct research on the basis of this
collected data.
There are many methods used to gather survey data for statistical analysis in research.
Various mediums are used to collect feedback and opinions from the desired sample of
individuals. While conducting survey research, researchers prefer multiple sources to gather
data such as online surveys, telephonic surveys, face-to-face surveys, etc. The medium of
5
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
gathering survey data decides the sample of people that are to be reached out to, to reach the
requisite number of survey responses.
Factors of collecting survey data such as how the interviewer will contact the respondent
(online or offline), how the information is communicated to the respondents etc. decide the
effectiveness of gathered data.
There are four main survey data collection methods – Telephonic Surveys, Face-to-face
Surveys, and Online Surveys.
• Online Surveys
Online surveys are the most cost-effective and can reach the maximum number of people in
comparison to the other mediums. The performance of these surveys is much more
widespread than the other data collection methods. In situations where there is more than one
question to be asked to the target sample, certain researchers prefer conducting online surveys
over the traditional face-to-face or telephone surveys.
• Face-to-face Surveys
Gaining information from respondents via face-to-face mediums is much more effective than
the other mediums because respondents usually tend to trust the surveyors and provide honest
and clear feedback about the subject in-hand.
• Telephone Surveys
Telephone surveys require much lesser investment than face-to-face surveys. Depending on
the required reach, telephone surveys cost as much or a little more than online surveys.
Contacting respondents via the telephonic medium requires less effort and manpower than the
face-to-face survey medium.
• Paper Surveys
The other commonly used survey method is paper surveys. These surveys can be used where
laptops, computers, and tablets cannot go and hence they use the age-old method of data
collection; pen and paper. This method helps collect survey data in field research and helps
strengthen the number of responses collected and the validity of these responses.
Reports
6
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
7
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
8
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
Data scientists spend a large amount of their time cleaning datasets and getting them down to
a form with which they can work. In fact, a lot of data scientists argue that the initial steps of
obtaining and cleaning data constitute 80% of the job.
Therefore, if you are just stepping into this field or planning to step into this field, it is
important to be able to deal with messy data, whether that means missing values, inconsistent
formatting, malformed records, or nonsensical outliers.
Steps involved
• Dropping unnecessary columns in a DataFrame
• Changing the index of a DataFrame
• Using .str() methods to clean columns
• Using the DataFrame.applymap() function to clean the entire dataset, element-wise
• Renaming columns to a more recognizable set of labels
• Skipping unnecessary rows in a CSV file
3.2.4 Data preprocessing
When we talk about data, we usually think of some large datasets with huge number of rows and
columns. While that is a likely scenario, it is not always the case — data could be in so many different
forms: Structured Tables, Images, Audio files, Videos etc.
Machines don’t understand free text, image or video data as it is, they understand 1s and 0s.
So it probably won’t be good enough if we put on a slideshow of all our images and expect
our machine learning model to get trained just by that!
9
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
In any Machine Learning process, Data Preprocessing is that step in which the data gets
transformed, or Encoded, to bring it to such a state that now the machine can easily parse it. In
other words, the features of the data can now be easily interpreted by the algorithm.
• Categorical: Features whose values are taken from a defined set of values. For instance,
days in a week {Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday} is a
category because its value is always taken from this set. Another example could be the
Boolean set {True, False}
Because data is often taken from multiple sources which are normally not too reliable and that
too in different formats, more than half our time is consumed in dealing with data quality
issues when working on a machine learning problem. It is simply unrealistic to expect that the
data will be perfect. There may be problems due to human error, limitations of measuring
devices, or flaws in the data collection process. Let’s go over a few of them and methods to
deal with them,
1.Missing values:
It is very much usual to have missing values in your dataset. It may have happened during data
collection, or maybe due to some data validation rule, but regardless missing values must be
taken into consideration.
10
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
2. Inconsistent values :
We know that data can contain inconsistent values. Most probably we have already faced this
issue at some point. For instance, the ‘Address’ field contains the ‘Phone number’. It may be
due to human error or maybe the information was misread while being scanned from a
handwritten form.
• It is therefore always advised to perform data assessment like knowing what the data type
of the features should be and whether it is the same for all the data objects.
3. Duplicate values :
A dataset may include data objects which are duplicates of one another. It may happen when
say the same person submits a form more than once. The term deduplication is often used to
refer to the process of dealing with duplicates.
• In most cases, the duplicates are removed so as to not give that particular data object an
advantage or bias, when running machine learning algorithms.
When you think of data visualization, your first thought probably immediately goes to simple
bar graphs or pie charts. While these may be an integral part of visualizing data and a
common baseline for many data graphics, the right visualization must be paired with the right
set of information. Simple graphs are only the tip of the iceberg. There’s a whole selection of
visualization methods to present data in effective and interesting ways.
Data visualization: involves specific terminology, some of which is derived from statistics.
For example, author Stephen Few defines two types of data, which are used in combination to
support a meaningful analysis or visualization:
11
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
• Categorical: Text labels describing the nature of the data, such as "Name" or "Age". This
term also covers qualitative (non-numerical) data.
• Quantitative: Numerical measures, such as "25" to represent the age in years.
Two primary types of information displays are tables and graphs.
• A table contains quantitative data organized into rows and columns with categorical
labels. It is primarily used to look up specific values. In the example above, the table
might have categorical column labels representing the name (a qualitative variable) and
age (a quantitative variable), with each row of data representing one person (the
sampled experimental unit or category subdivision).
• A graph is primarily used to show relationships among data and portrays values encoded
as visual objects (e.g., lines, bars, or points). Numerical values are displayed within an
area delineated by one or more axes. These axes provide scales (quantitative and
categorical) used to label and assign values to the visual objects. Many graphs are also
referred to as charts.
A bar chart or bar graph is a chart or graph that presents categorical
data with rectangular bars with heights or lengths proportional to the values that they
represent. The bars can be plotted vertically or horizontally. A vertical bar chart is sometimes
called a column chart.
A bar graph shows comparisons among discrete categories. One axis of the chart shows the
specific categories being compared, and the other axis represents a measured value. Some bar
graphs present bars clustered in groups of more than one, showing the values of more than
one measured variable.
12
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
Scatter plot
13
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
A heat map (or heatmap)
14
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
15
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
3.2.6 Correlation
Correlation is used to test relationships between quantitative variables or categorical
variables. In other words, it’s a measure of how things are related. The study of how variables
are correlated is called correlation analysis.
A correlation could be positive, meaning both variables move in the same direction, or
negative, meaning that when one variable’s value increases, the other variables’ values
decrease. Correlation can also be neutral or zero, meaning that the variables are unrelated.
We may also be interested in the correlation between input variables with the output variable
in order provide insight into which variables may or may not be relevant as input for
developing a model.
The structure of the relationship may be known, e.g. it may be linear, or we may have no idea
whether a relationship exists between two variables or what structure it may take. Depending
what is known about the relationship and the distribution of the variables, different
correlation scores can be calculated.
16
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
negative association.
The closer r is to 1 the closer the data points fall to a straight line; thus, the linear association
is stronger. The closer r is to 0, making the linear association weaker.
17
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
But the use of data analyticshelps in better code readability, helps in solving huge data
analyzation.
Our objective was to analyse attributes of automobile data making it more clear and easy
analysis of data, which led to effective time management in analysis of data, comparison of
data to reduce redundancy and get a future prediction of sales.
The basic steps involved in obtaining these results include,
1. Data acquisition,
The first step in analyzation of data is the acquisition of the data.The data acquired for our
purpose is from survey reports.
2. Data inspection,
The acquired data is then inspectedfor inspections help to identify and record hazards for
corrective action.During the inspection of the data variables like import, size, shape, head,
tail, describe, data types, null().any(), null().sum(), are inspected.
IMPORT
Import the class libraries, master data.
SIZE
Determine the size of the whole data present.
18
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
SHAPE
Determine the number of rows and coloumns.
HEAD
Specified number of rows are displayed from top of the data.
TAIL
Specified number of rows are displayed from bottom of the data.
DESCRIBE
Gives the value of count, mean, standard deviation and so on.
19
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
DATA TYPES
Describes the type of data which is present such as int, float, object and so on.
null().any()
Gives all the missing data in boolean data type.
20
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
null(). sum()
Gives the number of missing values.
3. Data Pre-processing
The obtained data contained unstructured, raw, crude, missing and string values which is
not recognizable to the machine.Therefore, the data needed to be pre-processed.Further
this data is converted into categorical data form of integer datatype.
21
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
4. Data visualization
22
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
• It is the graphic representation of data by using visual elements like charts, graphs,
maps etc.
• These tools helps to see and understand trends, outliers and patterns in data
• The goal is to communicate information clearly and efficiently to users.
23
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
Yellow-Satisfied customers
Blue-Unsatisfied customers
• Fuel Type
24
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
Green-DIESEL
Blue-PETROL
5. Correlation
Correlation is a statistical technique that can show whether and how strongly pairs of
variables are related.
Below represented is a heat map of the assignment.
A Heat Map is a representation of data in the form of a map or diagram in which data
values are represented as colours.
25
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
6. Machine learning
26
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
Similarly, the second random splitting is done with 80% testing and 20% training, giving
an accuracy of 86.07% test accuracy.
27
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
To predict the probability of a target variable, logistic regression plays its role.
The nature of target or dependent variable is dichotomous, which means there would be only
two possible classes.
In simple words, the dependent variable is binary in nature having data coded as either 1
(stands for success/yes) or 0 (stands for failure/no).
Types of Logistic Regression
Generally, logistic regression means binary logistic regression having binary target
variables, but there can be two more categories of target variables that can be predicted by it.
28
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
Based on those number of categories, Logistic regression can be divided into following
types −
• Binary or Binomial
In such a kind of classification, a dependent variable will have only two possible types either
1 and 0. For example, these variables may represent success or failure, yes or no, win or loss
etc.
PREDICTION
When the reasons behind a model’s outcomes are as important as the outcomes themselves,
Prediction Explanations can uncover the factors that most contribute to those outcomes.
29
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
PERFORMANCE MEASURES
The next step after implementing a machine learning algorithm is to find out how effective is
the model based on metric and datasets. Different performance metrics are used to evaluate
different Machine Learning Algorithms.
30
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
A confusion matrix is a table that is often used to describe the performance of a classification
model on a set of test data for which the true values are known.
• After the description of the performance, we get the following values, where,
Accuracy=79.74%
Precision=64.37%
Sensitivity=46.66%
Recall=42.66%
F1 score=54.10%
31
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
5.FINDINGS
After analysing the heatmap and figuring out the correlation between different columns/
physiological parameters, Logistic regression needs to be carried out to create a prediction model.
Figure 12 shows the results of logistic regression model. Figure 13 shows the Accuracy score of the
designed model. From this data, precision, f1 score and reliability can be calculated. Figure 14 shows
the R-squared calculation for the linear regression models. [6]
6. CONCLUSION
The evaluation of the product using these huge data is a very difficult and fatigue process.
Using business analytics and business intelligence we can simplify the work and is more
profitable to the industry.
7.LEARNING OUTCOME
32
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
7.1 Process control: Activities involved in ensuring a process is predictable, stable, and
consistently operating at the target level of performance with only normal variation.
For this project methodology we have used anaconda, a statistical tool for process control.
7.2 Anaconda
• Anaconda Navigator
Anaconda Navigator is a desktop graphical user interface (GUI) included in Anaconda
distribution that allows users to launch applications and manage conda packages,
environments and channels without using command-line commands. Navigator can search for
packages on Anaconda Cloud or in a local Anaconda Repository, install them in an
environment, run the packages and update them. It is available
for Windows, macOS and Linux.
The following applications are available by default in Navigator[17]:
33
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
• JupyterLab
• Jupyter Notebook
• QtConsole
• Spyder
• Glue
• Orange
• RStudio
• Visual Studio Code
Jupyter Notebook
It was created to make it easier to show one’s programming work, and to let others join in.
Jupyter Notebook allows you to combine code, comments, multimedia, and visualizations in
an interactive document — called a notebook, naturally — that can be shared, re-used, and
re-worked.
And because Jupyter Notebook runs via a web browser, the notebook itself could be hosted
on your local machine or on a remote server.
Originally developed for data science applications written in Python, R, and Julia, Jupyter
Notebook is useful in all kinds of ways for all kinds of projects:
• Data visualizations. Most people have their first exposure to Jupyter Notebook by way of a
data visualization, a shared notebook that includes a rendering of some data set as a graphic.
Jupyter Notebook lets you author visualizations, but also share them and allow interactive
changes to the shared code and data set.
34
PROJECT ON: Analysis of Trends in Automobile Industry Using Business Intelligence and Business Analytics
• Code sharing. Cloud services like GitHub and Pastebin provide ways to share code, but
they’re largely non-interactive. With a Jupyter Notebook, you can view code, execute it, and
display the results directly in your web browser.
• Live interactions with code. Jupyter Notebook code isn’t static; it can be edited and re-run
incrementally in real time, with feedback provided directly in the browser. Notebooks can
also embed user controls (e.g., sliders or text input fields) that can be used as input sources
for code.
• Documenting code samples. If you have a piece of code and you want to explain line-by-
line how it works, with live feedback all along the way, you could embed it in a Jupyter
Notebook. Best of all, the code will remain fully functional—you can add interactivity along
with the explanation, showing and telling at the same time.
REFERENCES
[1]. Interactions between kidney disease and diabetes- dangerous liaisons- Roberto Pecoits-
Filho, Hugo Abensur, Carolina C.R. Betônico, Alisson Diego Machado, Erika B. Parente,
Márcia Queiroz, João Eduardo Nunes Salles, Silvia Titan and Sergio Vencio- 2016- article
50.
[2]. The Python Standard Library — Python 3.7.1rc2 documentation
https://docs.python.org/3/library/
[3]. Data Warehousing Architecture & PreProcessing- Vishesh S, Manu Srinath, Akshatha C
Kumar, Nandan A.S.- IJARCCE, vol6, iss5, May 2017
[4]. Data Mining and Analytics: A Proactive Model -
http://www.ijarcce.com/upload/2017/february-17/IJARCCE%20117.pdf
[5]. A comparative analysis on linear regression and support vector regression- DOI:
10.1109/GET.2016.7916627- https://ieeexplore.ieee.org/abstract/document/7916627
[6]. Stock market predication using a linear regression- DOI: 10.1109/ICECA.2017.8212716,
https://ieeexplore.ieee.org/abstract/document/8212716 BIOGRAPHY
35