Big Data Project

Big Data Management Project
On
Car Sales Analysis Based on the Application of Big Data
Submitted to-
Dr. Amit Kumar Bhardwaj
Submitted by-
Abhay Thakur (401508004)
Akshay Kushwaha (401508007)
Ankit Sharma (401508008)
ABSTRACT
Big Data plays a vital role in many fields. The term refers to massive data sets with a large,
varied and complex structure that are difficult to store, analyze and visualize for further
processing. Data about car sales are derived from various sources. Car sales do not depend on any
single independent variable: features such as horsepower, model, width, fuel type, height, price,
city mileage, highway mileage and manufacturer all influence sales.
In car sales prediction, we first apply the analytic hierarchy process (AHP) to assess how well
the various criteria in our dataset perform. We then apply machine learning algorithms such as
linear regression and random tree to obtain the best clusters, and process them with a random
forest to identify the most accurate features. This project takes the automobile manufacturing
industry as an example: it analyzes large-scale car sales data with data mining techniques, using
a purpose-written web crawler program for data collection.
The aim is to offer suggestions to the automobile manufacturing industry for the production of
automobiles, so as to reduce the inventory of automobile enterprises and the waste of resources.
In this paper, vehicle sales forecasts are used to plan production of the various models in the
Nissan product portfolio.
The results in this study could help the automobile companies better understand their business,
and the auto companies could use the results for possible strategic decisions. In addition,
legislatures in the impacted states could use the results to prepare for fluctuations in the industry
that would result in profound effects on the states in question.
Chapter-1 Introduction
Big Data refers to enormous amounts of structured or unstructured data on the latest trends,
patterns and associations, which companies can analyze for business gains. Big Data has the
potential to help companies improve their operations and make faster, better-informed decisions.
Companies are inclined to adopt big data for reasons of time, better analytics, the vast amount
of data available, insights and decision-making. Big Data offers several benefits: it provides
instant understanding from different data sources; it improves business performance through
real-time analytics; big data technologies manage huge amounts of data; it provides better
understanding with the help of useful and partially useful data; and it minimizes the level of
risk and supports the right decisions through proper risk analysis. Owing to these benefits, big
data applications have emerged and play important roles in different fields.
Big Data Application in Media and Entertainment
New business models are being launched by companies in the media and entertainment industry.
These models rely on collecting or creating content, analyzing it, and then marketing and
distributing it. As consumer search activity increases, content must be obtainable at any moment,
in any place, in all formats and on a variety of devices. Big data combines customers' data with
observed data and gathers even minute details to create a detailed customer profile. The benefits
of big data in the media and entertainment industry include forecasting what the target audience
wants, optimizing planning, expanding acquisition and retention, suggesting content on demand and
developing new products.
Big Data in Manufacturing
Demand for natural resources such as oil, minerals, petroleum, metal ores and agricultural
products has increased, which has led to an increase in the volume, complexity and velocity of
data that is difficult to handle. In the natural resources industry, big data supports predictive
modeling for decision making and has been used to ingest and combine graphical, text and temporal
data. The benefits of Big Data applications in manufacturing include product quality, supply
scheduling, tracking defects in the manufacturing process, predicting output, improving energy
efficiency, and testing and developing new manufacturing processes. Big Data has helped address
today's manufacturing issues and is expected to continue doing so in the future.
Big Data in Education
The impact of Big Data on education is immense. Today, the internet makes learning easier and
more efficient, and available almost anywhere and at any time around the world. Hence, online
learning is available for almost every course. There are many examples of Big Data usage in the
education industry. Various applications have been built to help teachers and students in almost
all formats on mobile devices. In practice, staff and institutions have to adapt to the new data
management and analysis tools. Adaptive learning uses big-data-driven predictive analysis to
discover what a student is learning. Some problems, such as duplication, may occur in digital
learning systems; Big Data helps control them and brings significant improvements. Privacy and
personal data protection in relation to big data also need attention.
Big Data in Transportation
In recent times, enormous amounts of data from location-based social networks and high-speed
data from telecoms have influenced travel behavior. Big data applications in the transportation
industry are used by governments, the private sector and individuals. Governments use big data
for traffic control, route planning, well-organized transportation systems and forecasting
traffic conditions. The private sector uses big data for income management, industrial
enhancements, logistics and cost benefits. Individuals use big data to plan routes that save fuel
and time while travelling.
Big Data in Banking
Banking is a crucial sector. The security, privacy, management and maintenance of any banking
system are a challenge. Big Data is very beneficial for fraud detection in banking systems: once
implemented, it helps uncover illegal activities that have taken place. It identifies the misuse
of credit and debit cards and supports business precision, customer statistics modification and
public analytics for business.
Big Data applications are helpful in almost every field. They provide detailed information about
the respective field, help industries create new growth opportunities, and help them understand
and optimize business procedures. Big Data applications have been very useful in recent times and
are likely to remain so in the future.
The major producers of domestic automobiles report their sales monthly. Auto sales are an
important indicator for the financial markets, since they are highly correlated with consumer
demand and can reflect changes in the economy closely. There is an old cautionary tale known as
the “boiled frog.” The frog’s body temperature follows its surroundings. If you put the frog
directly into boiling water, it will sense the heat immediately and jump out. However, when you
heat the water slowly, the frog keeps adjusting to the rising temperature. By the time the heat
is too much for the frog to take, it is too late: the frog collapses and dies. The same idea can
be applied to economics. For example, in a hamburger restaurant, if the price of one hamburger
changes from $5.49 to $5.99, few people would notice the change, so it would probably not affect
sales much. However, if the price changes from $5.99 to $6.99, people will notice the change
easily and hamburger sales may decrease as a result. Changes in auto sales follow the same
pattern. When a predictor used to estimate auto sales changes only slightly, auto sales may not
shift in response. However, a big change in one predictor will lead to a change in auto sales,
just as in the “boiled frog” story. A small change in the predictors is like putting a frog in
water and heating it slowly: auto sales adjust gradually and change only a little to follow the
trend in the predictors.
Chapter-2 Literature Review
The automobile industry is defined as the business of producing and selling vehicles (Kung and
Chang, 2004). It consists of a wide range of companies and organizations involved in design,
development, manufacturing, marketing and so on. Moreover, the automobile industry is regarded as
one of the key economic sectors by revenue, because it has encouraged the development of an
extensive road system, supported the growth of suburbs and shopping centers around major cities,
and played a key role in the growth of ancillary industries (Kulkarni and Rao, 2014). In addition,
the large number of people currently employed in the industry has made it a key determinant of
economic growth.
The automotive sector also contributes significantly to several important dimensions of nation
building, such as generating government revenue, creating economic development, and encouraging
people development and innovation. Consequently, the automotive sector affects global economic
activity in a variety of ways and is in turn affected by the global economic situation (NAR,
2013).
Sales forecasting is also an important part of starting a new business (Previts et al., 1994).
Almost all new businesses need loans or start-up capital to purchase what is necessary to get off
the ground, such as office rental, equipment and inventory. In particular, to obtain a loan an
entrepreneur must present a business plan or offer a valuable guarantee or mortgage. As the
business grows, sales forecasts continue to be an important measure of the company's ability
(Doyle, 2000). Wall Street measures the success of a company by how well it meets its quarterly
sales forecasts (Jensen, 2001).
Sales forecasts are the foundation of planning. The forecasts enable an organization to have an
optimum inventory level, to make appropriate purchasing decisions and to maintain efficient
daily operations. All these affect the profits of the organization. Therefore, forecasting is critical
to profitability.
Strategic planning based on reliable forecasts is an essential ingredient of successful business
management within a market-oriented company. This is especially true for the automobile industry,
as it is one of the most important sectors in many countries. Reliable forecasts cannot be based
only on intuitive economic guesses about market development. Mathematical models are
indispensable for the accuracy of the predictions as well as for the efficiency of their
calculation, which is also supported by the growth of powerful computing resources. The
application of time series models to forecasts of new vehicle registrations was originally
established by Lewandowski [1, 2] in the 1970s. Afterwards, a general equilibrium model for the
automobile market covering both new car sales and used car stocks was presented by Berkovec [3].
Here, equilibrium means that demand equals supply for every vehicle type. Later on, Dudenhöffer
and Borscheid [4] published a very important application of time series methods to the German
automobile market. However, the number of efforts undertaken in this field of research remains
quite small to date.
Demand planning involves the process of creating and affecting demand in the future.
Regardless of method chosen (promotion, etc.), forecasting helps assess the impact of each
possible decision upon demand. Demand management integrates all aspects of an organization’s
strengths and weaknesses. It includes not only planning and forecasting but also coordinating all
activities that affect customer demand, e.g. creating, shaping and fulfilling demand.
Demand forecasting requires projecting what will happen to demand in the future. Obviously,
this requires statistical forecasting methods. Unfortunately, there is still a gap between the
statistical and economic techniques offered by forecasting and the judgmental technique most
executives use to forecast demand (Shahabuddin, 1987; Sanders, 1997).
There are many techniques of forecasting, and they vary in complexity, ease of use, and the
amount of data needed. Among the many forecasting techniques, many surveys (Sanders, 1997;
Mahmoud, 1984) have found the judgmental technique to be dominant. However, many studies
(Armstrong, 1986; Dunn and Wright, 1991) have found that the judgmental technique is less
accurate, more biased, and more likely to lead to poor forecasts than other techniques.
Methods based on statistical learning theory [5] are powerful instruments to get insight into
internal relationships within huge empirical datasets. Therefore, they are able to produce reliable
and even highly accurate forecasts. However, Data Mining algorithms have become more and
more complex over the last decades. In this work, the accuracy of the prediction has the same
importance as the explicability of the model. Hence, only classical Data Mining methods [6] are
applied here.
Instead of using a statistical forecasting technique, some companies use intention-to-buy survey
data to forecast sales. Manufacturers of consumer durable products often use purchase intention
to forecast. The US government conducts surveys to forecast spending on durable goods. The survey
results are presumed to help predict future sales. Several studies (Morwitz and Schmittlein, 1992;
Morrison, 1979; Armstrong et al., 2002) found a positive correlation between purchase intention
and purchase behavior. Theory suggests that the best predictor of future behavior is past
behavior. However, some social psychologists believe that a good predictor of what individuals
will do is their intention to perform the behavior (Fitzsimmon and Ajzen, 1975). Others suggest
that intentions can work as predictors only under certain conditions (Armstrong, 1985). The
conditions are:
• That the event is critical in the life of the intender.
• That the intender has the ability to fulfill the plan.
• That conditions which affect the intention do not change.
• That the intender reveals accurate intention.
Despite the lack of reliable data, researchers do try to relate purchase intention to purchase
behavior. Even though these attempts suggest that this relationship is useful, there has been no
research testing the predictive accuracy of intentions and past sales. Lee et al. (1997) found very
little relationship between buying intention and sales.
Due to the use of improper forecasting techniques, most forecasts give inaccurate results.
In addition to the use of inappropriate methods of forecasting, there are other reasons for
forecasting errors:
• Many forecasts rely on historical data without understanding the underlying basis of the
  data. For example, an unexpected jump in sales becomes part of the historical data
  instead of being considered as an outlier that may not happen again.
• Forecasters tend to ignore likely changes that may influence the forecast, e.g. increases in
  population, increases in competition, technological changes, etc. Any or all of these
  factors may affect the organization’s sales and can easily be included.
• Using inappropriate computational methods for the data. Each type of data (e.g. time
  series, cross-sectional data) requires different forecasting techniques. Incorrect
  computational techniques cause errors in forecasts.
• Forecaster bias affects results and should be kept to a minimum. Individual biases as a
  result of personal optimism or pessimism have no place in forecasting. Bias increases
  error in forecasts.
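Two of the error sources listed above, unexplained jumps in the historical data and forecaster
bias, can be checked numerically before a forecast is trusted. The following is a minimal
sketch, not a method taken from the cited studies: the function names, the threshold of 3 and the
sample figures are all illustrative assumptions.

```python
from statistics import mean, median

def mean_forecast_error(actual, forecast):
    """Average of (forecast - actual); a positive value means systematic over-forecasting."""
    return mean(f - a for a, f in zip(actual, forecast))

def flag_outliers(series, threshold=3.0):
    """Return the indices of points lying more than `threshold` median absolute
    deviations (MAD) away from the median. The threshold is an assumed rule of thumb."""
    med = median(series)
    mad = median(abs(x - med) for x in series)
    if mad == 0:
        return []  # no spread at all, so nothing can be flagged
    return [i for i, x in enumerate(series) if abs(x - med) / mad > threshold]

# Made-up monthly unit sales with one unexpected jump, and a made-up forecast.
sales = [100, 104, 98, 102, 180, 101, 99]
forecast = [105, 108, 103, 106, 110, 107, 104]

outliers = flag_outliers(sales)               # indices of suspicious months
bias = mean_forecast_error(sales, forecast)   # sign shows the direction of the bias
print("outlier months:", outliers)
print("mean forecast error:", round(bias, 2))
```

A flagged month is only a candidate for review; whether it is a one-off event or the start of a
new trend still has to be decided by someone who understands the underlying data.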
Forecasting is especially complicated because of the changing economic factors within which any
business operates. These include the Gross National Product (GNP), the employment rate, the
discount rate, the population growth rate, and others. These economic factors have major effects
on the manufacturers of durable goods. Their relationships are complicated by the possibility
that some of these factors have lagged effects on the sales of durable products. In addition,
sales of many products are affected by seasonal fluctuation. All these economic factors are
relevant when forecasting sales of durable goods such as automobiles.
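One simple way to look for the lagged effects mentioned above is to correlate an economic
indicator at time t - k with sales at time t for several values of k. The sketch below assumes a
single indicator and uses made-up numbers in which sales follow the indicator two periods later;
the function names are illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def lagged_correlation(driver, sales, lag):
    """Correlate driver[t - lag] with sales[t]."""
    if lag == 0:
        return pearson(driver, sales)
    return pearson(driver[:-lag], sales[lag:])

# Made-up data: from the third period on, sales equal 10 times the
# indicator value observed two periods earlier.
driver = [1, 3, 2, 5, 4, 6, 5, 8]
sales = [50, 50, 10, 30, 20, 50, 40, 60]

best_lag = max(range(4), key=lambda k: lagged_correlation(driver, sales, k))
print("lag with the strongest correlation:", best_lag)
```

With real data the correlations are rarely this clean, and seasonal patterns should be removed
first so that they are not mistaken for a lagged relationship.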
Forecasts can be further complicated by sales promotions and advertising activities. Therefore,
forecasters must be aware of these activities, although it is difficult to time their occurrences and
to track the numbers of these activities. In addition, forecasters must consider activities that may
cause problems in forecasting, and, if data are not available on these activities, forecasters should
at least be cognizant of the problems and take the possibility of inaccurate results into
consideration. Regardless of the difficulties, appropriate statistical models with relevant variables
make for the best results.
An analysis of the sales of consumer durable goods in Britain found that the demand for cars
increased by 90 percent from 1970 to 1978 while the disposable income rose by 21 percent in the
same period (Pickering, 1978). This analysis indicates that income alone does not determine
automobile demand. One possible explanation is that consumer durables are purchased not with
current income but with savings or financing. Further, consumer spending on durable goods,
especially automobiles, can be postponed to accommodate multiple factors, as the useful life of
the goods can be extended through repair and maintenance. In addition, ownership of more than one
car is also likely.
These uncertainties make demand for durable consumer goods difficult and challenging to
forecast. Fauvel and Samson (1991) state that spending on durable goods is indeed the most
volatile component of total consumer expenditures. In the USA, consumer durable goods purchases
represent a huge market (e.g. $3,769,235 million in 2003), but not much research has been done to
accurately forecast the sales of these products (Fauvel and Samson, 1991). The forecasting work
that has been done relates to new products or intention-to-buy methods. A library and Google
Scholar search on “forecasting demand for durable goods” generated 45 items that mostly dealt
with new products. Data for durable goods are hard to find and may be unreliable. However, because
of their importance to consumption and the economy, forecasts of sales of durable goods such as
automobiles are critical.
The automobile industry plays a critical role in many economies. Demand for automobiles also
determines trends in travel and tourism, roads, and patterns of housing (Abu-Eisheh and
Mannering, 2002). That is, the more people own cars, the greater their ability to travel and,
thus, the higher the demand for more and better roads. The mobility of people also determines
where and when they can locate their houses beyond congested cities, resulting in the expansion
of communities. All of these activities expand economies and create jobs. In turn, the expansion
puts pressure on politicians, urban planners and traffic engineers to be cognizant of trends in
automobile ownership. The demand for automobiles is a critical consumer decision and is
influenced by sociological and economic factors, and automobile ownership affects both developing
and developed countries (Wu, 1965; Abu-Eisheh and Mannering, 2002).
Not much analytical work has been done relating to automobile sales, which account for a large
share of the durable goods market. Studies by Carlson and Umble (1980) and Harris (1986) tried to
forecast demand for automobiles. Carlson and Umble (1980) predicted demand for automobiles from
1979 to 1983 by segmenting automobiles into five classes: sub-compact, compact, intermediate,
standard and luxury. The authors were trying to determine the relationship between the price of
gasoline and other major factors and the sale of cars. They found that the sales of compact cars
grew faster (from 35 to 45 percent) than the sales of other types of cars. They also established
that economic conditions were the major determinant of future automobile sales. The study also
found a relationship between the price of gasoline and the sales of cars. However, the study was
limited to two independent variables and tried to forecast sales during a difficult political
period (the oil embargo). Harris (1986) also studied the relationship of some economic variables
to sales of automobiles and found a significant relationship between demand and some economic
variables.
However, forecasting the overall market for automobiles, in units as well as in dollars, is
important to policy makers. Therefore, a comprehensive model relating automobile sales to as many
relevant economic variables as possible is likely to be the most appropriate model (Carlson and
Umble, 1980). Forecasting automobile sales requires an inclusive analysis that considers personal
and relevant national economic factors to achieve good results. This study uses regression
analysis to forecast automobile sales in the US (Carlson and Umble, 1980; Suits, 1958; Thompson
and Noordewier, 1992).
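To illustrate what a regression relating sales to several economic variables looks like in
practice, the sketch below fits an ordinary least squares model by solving the normal equations
in plain Python. It is not the model used in the cited studies: the two predictors (an income
index and a fuel price), the data and all function names are assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.
    A column of ones is prepended so that b[0] is the intercept."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coeffs, row):
    """Predicted sales for one row of predictor values."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))

# Made-up observations: (income index, fuel price) and unit sales,
# generated exactly as sales = 50 + 3 * income - 4 * fuel.
rows = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 5), (6, 4)]
sales = [49, 52, 51, 50, 45, 52]

coeffs = fit_ols(rows, sales)
print("intercept, income effect, fuel effect:", [round(c, 3) for c in coeffs])
print("predicted sales for income=7, fuel=5:", round(predict(coeffs, (7, 5)), 3))
```

The normal equations are adequate for a handful of well-scaled predictors; with many correlated
economic variables a numerically more stable solver from a statistics library would be the safer
choice.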
Sales forecasting is particularly important because its outcome affects many functions in the
organization (Armstrong et al., 2000). Depending on the quality of the forecast, business
operations may face lost orders, inadequate service and poorly utilized production resources in
the short term, while in the long run the financial and market decisions that managers base on
the forecast may even bring the organization itself into question (Kotsialos et al., 2005). In
fact, most conventional sales forecasting methods use either factors or time series data to
determine the sales prediction (Nguyen and Tran, 2017a; Nguyen et al., 2015; Wang et al., 2015).
Grey system theory, a research method formulated by Deng (1982) to study problems involving
little data, poor information and uncertainty (Liu et al., 2004), is used to forecast sales in
this study. The analysis covers the growth of the net sales of Nissan Motor Company from 2008 to
2013, which is then used to predict the following four years, 2014 to 2017.
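The most common Grey forecasting model is GM(1,1), which accumulates the series, fits the
equation x0[k] + a * z1[k] = b by least squares (z1 being the mean of consecutive accumulated
values) and forecasts from the resulting exponential curve. The sketch below shows this standard
formulation; it is not claimed to be the exact variant used in this study, and the six data
points are made up rather than actual Nissan figures.

```python
from math import exp

def gm11_fit(x0):
    """Estimate the GM(1,1) parameters a (development coefficient) and b (grey input)."""
    x1, total = [], 0.0
    for v in x0:                      # accumulated generating operation (running sum)
        total += v
        x1.append(total)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]  # background values
    y = x0[1:]
    mz, my = sum(z) / len(z), sum(y) / len(y)
    # Least squares fit of y = b - a * z, i.e. a simple regression with slope -a.
    slope = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / sum((zi - mz) ** 2 for zi in z)
    return -slope, my - slope * mz

def gm11_predict(x0, a, b, steps):
    """Fitted values for the observed periods followed by `steps` forecasts.
    Assumes a != 0, which holds for any series that is not perfectly flat."""
    def x1_hat(k):
        return (x0[0] - b / a) * exp(-a * k) + b / a
    out = [x0[0]]
    for k in range(1, len(x0) + steps):
        out.append(x1_hat(k) - x1_hat(k - 1))  # undo the accumulation
    return out

# Made-up series of six annual values growing by 10% per year.
history = [100.0, 110.0, 121.0, 133.1, 146.41, 161.051]
a, b = gm11_fit(history)
values = gm11_predict(history, a, b, steps=4)
print("a =", round(a, 4), "b =", round(b, 3))
print("four-step forecast:", [round(v, 1) for v in values[len(history):]])
```

A negative a indicates a growing series. Because GM(1,1) always produces an exponential curve, it
suits short, smoothly trending series and should be checked against held-out years before use.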
The purpose of this research is to build a Grey model that fits the net sales data and to
examine its feasibility and how it performs in forecasting net sales. Collectively, there is some
empirical evidence indicating that the subjective techniques popular for all types of forecasting
situations are effective. Among the statistical forecasting approaches, exponential smoothing has
recently been gaining in popularity (Nguyen and Tran, 2017b; Tran, 2016; Tran, 2017; Trinh and
Tran, 2017). In recent years, Grey system theory has also been widely applied in various fields
and has demonstrated satisfactory results (Kayacan et al., 2010).
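For comparison with the Grey model, simple exponential smoothing can be written in a few lines.
The sketch below is the basic single-parameter form, initialized with the first observation; the
smoothing factor of 0.5 and the sample values are arbitrary illustrations.

```python
def simple_exponential_smoothing(series, alpha):
    """Return one-step-ahead forecasts.

    Element t is the forecast for period t; the final element is the forecast
    for the first period after the end of the series. The level is initialized
    with the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    forecasts = [level]
    for x in series:
        # New level: weighted average of the latest observation and the old level.
        level = alpha * x + (1 - alpha) * level
        forecasts.append(level)
    return forecasts

smoothed = simple_exponential_smoothing([10, 20, 30], alpha=0.5)
print(smoothed)
```

A larger alpha reacts faster to recent changes but smooths less. This basic form has no trend or
seasonal component, so it lags behind steadily growing series.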
Escalation in fuel prices affects car demand in sales units
Cheng and Tan (2002) mentioned that the sharp rise in oil prices was one of the external factors
that had a significant influence on Malaysian inflation in 1973 and 1974; the substantial price
increases in 1973 were brought about mainly by shortages of food and raw materials arising from
bad weather and by increased aggregate demand.
Besides, the study “Why do car prices differ across European countries?” points out that in the
European car market, income tax, oil prices, wages and the standard of living affect people's
willingness and ability to buy a car. For instance, the fuel price affects the demand for cars in
a country's car market: the higher the price of fuel, the lower the demand for cars. People will
prefer using public transportation to using their own cars, and new car buyers will think harder
before deciding to buy, because a high fuel price increases the cost of driving. The price of
fuel can therefore affect the demand for cars directly; in other words, a high fuel price in a
country lowers people's desire to buy a car.
Increase in income influences car demand and car consumption
J. M. Dargay (2001) studies the effect of income on car ownership, and the results indicate that
rising income leads to higher car ownership, since rising income makes it easier for households
to own cars. Dargay (2007) goes on to examine the effect of prices and income on car travel in the
UK, analysing the factors that determine household car travel and specifically the effects of
household income and the prices of cars and motor fuels. The data show a diffusion process:
motoring has become more prevalent in successive generations. Car travel is more affected by car
purchase costs than by fuel prices, implying that once obtained, cars are used despite rising
variable costs for their use. On the other hand, car ownership is, as expected, more sensitive to
car purchase costs than to fuel prices. Thus, car use responds more rapidly to changes in income
and prices than car ownership.
A study on car demand in European countries also shows that people's income is the main factor
affecting the demand for cars in the market. The main income of people is wages, so people with
high wages have higher purchasing power and a higher demand for luxury goods such as cars, sports
cars and houses. Graham and Glaister (2002) surveyed the response of motorists to fuel price
changes and assessed the orders of magnitude of the relevant income and price effects. Their work
concerns the effect of price on fuel consumption and on motorists’ demand for road travel, and
indicates that the demand for owning cars is heavily dependent on income. Eltony (1993) also uses
household data to quantify the behavioral responses that give rise to negative price elasticities
of demand for gasoline. The results identify three main behavioral responses of households in
Canada to changes in gasoline prices: driving fewer miles, purchasing fewer cars and buying more
efficient vehicles. Wetzel and Hoffer (1982) noted, using a disaggregated model, that factors
such as gasoline prices, styling changes and demographic changes influenced the price elasticity
of demand in each submarket differently. The models suggest that motor fuel price increases have
a significant but temporary impact on consumer demand for the largest American cars. Furthermore,
higher income individuals took delivery of previously ordered cars early in the model year.
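The price elasticity of demand referred to above is the percentage change in quantity demanded
divided by the percentage change in price. A minimal sketch using the midpoint (arc) formula
follows; the function name and the numbers are illustrative and are not taken from the cited
studies.

```python
def arc_elasticity(q0, q1, p0, p1):
    """Midpoint (arc) price elasticity of demand between two observations.

    Percentage changes are measured against the average of the old and new
    values, so the result is the same in both directions of the change.
    """
    pct_quantity = (q1 - q0) / ((q0 + q1) / 2)
    pct_price = (p1 - p0) / ((p0 + p1) / 2)
    return pct_quantity / pct_price

# Made-up example: fuel price rises from 1.00 to 1.20 per litre and
# weekly fuel purchases fall from 100 to 90 litres.
elasticity = arc_elasticity(q0=100, q1=90, p0=1.00, p1=1.20)
print(round(elasticity, 3))
```

A value between -1 and 0, as in this made-up example, describes inelastic demand: consumption
falls when the price rises, but by a smaller percentage than the price change.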
Car Sales Prediction Using Machine Learning Algorithms
“Machine learning models and bankruptcy prediction” is a paper that discusses the improvements
achieved in academia and industry with the aid of machine learning algorithms for predicting
bankruptcy. The paper applies algorithms such as bagging, boosting, random forest and support
vector machines to predict bankruptcy before the event occurs, and carries out an extensive
comparison of their performance with the results of logistic regression and neural networks [9].
The original Altman Z-score variables are used as predictive variables, with the addition of
extra variables such as the operating margin, sales, growth measures related to assets, change in
return on equity, change in price-to-book, and number of employees, based on Carton and Hofer
(2006).
“Explaining machine learning models in sales prediction” is a general manuscript that discusses
recent trends in predictive models and real-time scenarios, in order to gain deep insight into
the interaction between buyers and sellers and the forecasting of sales [5]. “Early churn
prediction with personalized targeting in mobile social games” is a manuscript that explains
customer churn, defined as the act of a customer leaving a product for good. Churn is reduced to
a great extent by mapping features to the customer's interests and pushing notifications to draw
the customer back into the game. The manuscript uses logistic regression as a simple linear
model, decision trees for extracting redundancy from features, random forest for its usefulness
in various situations, Naive Bayes for generating the models, and gradient boosting for its
popularity [4].
Current situation of industrial big data
The arrival of physical information systems and intelligent analysis provides a new line of
thought for achieving production management and industrial transformation and upgrading. Today,
the level of a manufacturing industry is measured not only by its product manufacturing capacity
but also by the innovative value it creates for customers. Professor Lu Wei believes that the
essence of industrial big data is data-driven industrial upgrading, that is, using big data
analysis to stimulate innovation in research and development, services and manufacturing, and so
promote industrial upgrading. Professor Li Jie holds that enterprises must understand the needs
that customers have not yet voiced; what customers have already voiced no longer counts as a
problem [4]. Industrial data analysis is a competitive advantage in industrial development: it
can help Chinese enterprises reduce manufacturing costs, improve product quality and uncover the
hidden needs of users. He talked about the Trinity. In his view, a sound interpretation of
industrial data analysis, building on achievements around the world, will enhance the
competitiveness and product value of goods made in China [5].
In November 2015, the Fifth Plenary Session of the Party's 18th Central Committee proposed
implementing a national data strategy. This was the first time that data was written into a Party
plenum resolution, marking the official elevation of the data strategy to a national strategy.
The Fifth Plenary Session opened a new chapter in the development of big data. In fact, in 2015
“big data” was a frequent topic at the executive meetings of the State Council, and a “big data”
strategy had long been on the horizon. In July, the General Office of the State Council issued
“Several Opinions on Using Big Data to Strengthen Services for and Supervision of Market
Entities,” which proposed improving the level of service to market entities, strengthening and
improving market supervision, promoting the sharing and opening of government and social
information resources, improving the government information integration platform, eliminating
isolated information islands and opening data resources to society. These measures can enhance
the credibility of the government, guide the development of society and serve the public and
enterprises. The scale of China's information consumption market has grown rapidly. Against the
background of improving network capability, the upgrading of household consumption and the
accelerated integration of the four modernizations, new technologies, products, content, services
and business formats continue to stimulate new consumer demand, and data, as an important means
of improving the information consumption experience, will be widely used in industry [6].
Compared with the current development of manufacturing, traditional manufacturing faces a great
impact: major changes are needed in areas such as technology, process design, quality management
and production operations to meet the challenges of industrial big data.
Very few papers provide mathematical or quantitative models, especially for long procurement
lead times where forecasts are unreliable. Forecasts are not reliable information about
future demand, but they are the only source of information available for procuring parts. Because
demand is uncertain, a safety stock margin may be used. This margin is a percentage by which the
order quantity exceeds the forecast. For instance, with a safety stock margin of 10% the
system will order 10% more parts than forecasted. Emergency supplies are used in case of
shortages. This reflects the importance of choosing the best forecasting method for the firm in
order to reduce the safety stock margin. According to Yang and Baolin, forecasting methods can
generally be divided into two categories: qualitative forecasting methods and
quantitative forecasting methods. The latter include traditional statistical methods
such as moving average, exponential smoothing and multiple regression analysis.
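The safety stock arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function name order_quantity and the sample figures are our own assumptions and do not come from any procurement system discussed in the literature.

```python
import math


def order_quantity(forecast_units: float, safety_margin: float) -> int:
    """Return the number of parts to order for a given forecast.

    safety_margin is a fraction (0.10 means 10%); the result is rounded up
    to a whole part.
    """
    if forecast_units < 0 or safety_margin < 0:
        raise ValueError("forecast and margin must be non-negative")
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(forecast_units * (1 + safety_margin), 6))


if __name__ == "__main__":
    # Hypothetical example: a forecast of 1000 parts with a 10% margin.
    print(order_quantity(1000, 0.10))
```

A more accurate forecast allows a smaller margin, which is why the choice of forecasting method directly affects inventory cost.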
In fact, auto sales are affected by many factors, such as the economic situation, state policy and
family income. These factors cause marked fluctuation and non-linear behaviour in the historical
sales data, so some series show no clear trend and fluctuate strongly. To address this
complexity, data mining algorithms have been applied to sales forecasting. J. Scott Armstrong
and Fred Collopy note that studies have been conducted to identify which method provides the
most accurate forecasts for a given class of time series. Conclusions about the accuracy of various
forecasting methods typically require comparisons across many time series. However, it is often
difficult to obtain a large number of series.
Error measures also play an important role in calibrating or refining a model so that it will
forecast accurately for a set of time series. According to Paul Dagum et al., forecasting models
are dominated by uncertainty because salient, observable variables define only a small subset of
relevant variables; unmodeled influences can lead to unexpected consequences in a dynamic
process. In the causal forecasting method, the forecaster constructs a model that relates
cost to the internal or environmental variables believed to cause changes in the observed cost.
Such a model, which attempts to unveil the structure and operation of the process that determines
the requirement, takes the form of one or more equations, usually statistical in nature.
Chapter-3 Problem Formulation

Problem Statement
Nissan Motor Company deals with a product portfolio that consists of three subcategories,
namely cars, light trucks and heavy trucks. The lead time for manufacturing planning of any
subcategory is 12 months. Hence the company needs a forecast of total auto sales in the India
market for the next 12 months on a monthly basis. Some examples of vehicles in each subcategory
are shown in the appendix. Both domestic sales and exports are to be forecasted.
Objectives:
1. Pre-order auto parts so that production processes are streamlined.
2. Make proper investment in production facilities, since setting up new equipment requires
time and needs to be planned at least 6-12 months ahead of the requirement.
3. Synchronize growth strategy with the sales trend across different auto
categories.
4. Efficiently distribute the final products between domestic and export markets.
The forecast period is the next 12 months. The following monthly time series have been
forecasted in this project: car sales, light truck sales, heavy truck sales, car export sales and
light truck export sales.
Data Collection
Data for our analysis has been sourced from the Open Government Data (OGD) portal. This portal
provides single-point access to datasets, documents, services, tools and applications published by
ministries, departments and organizations of the Government of India.
The original data source contains monthly sales figures for the entire United States for 3
subcategories of vehicles. The period is from 1973 to 2019. Ideally we would have wanted
data for Nissan vehicles only, but it is not available. We have assumed that Nissan management
will adjust the forecasts to take into account their market share in each of the vehicle
subcategories.
Forecasting Methodology:
1. Seasonal naive to establish a benchmark: The data exhibits a seasonality of 12 months.
Thus, seasonal naive forecasting predicts each month's sales using the value from 12 months
earlier in the same time series. This is treated as a benchmark, and subsequent
methods aim to improve upon its forecast accuracy.
2. Holt-Winters method, as the data exhibited trend and seasonality: Since all data series
consistently exhibit trend as well as seasonality, we chose to try the Holt-Winters method
first. The training and validation periods differ between time series; they
have been adjusted to achieve the most accurate forecast.
3. MLR with monthly dummy variables: Next we used multiple linear regression
with a dummy variable for each month. Depending on the characteristics of the series,
both additive and multiplicative regression methods have been used.
4. Ensemble combinations to improve results: Finally, we tried combinations of Naive,
Holt-Winters and Multiple Linear Regression to create ensembles that can further
improve the accuracy of the forecasts.
5. MAPE (Mean Absolute Percentage Error) and residual graphs to judge and
select a final model: Once these methods are applied, their validation period residuals
are plotted together to gauge the accuracy of each of them. The final model for a
particular data series is selected taking into account the residuals plot and the MAPE
values.
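The benchmark and the error measure in the methodology above can be sketched as follows. This is a minimal illustration rather than the XLMiner workflow used in the project: the function names seasonal_naive and mape and the sample series are our own assumptions.

```python
from typing import List, Sequence


def seasonal_naive(history: Sequence[float], horizon: int, season: int = 12) -> List[float]:
    """Forecast each future month with the value observed one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = list(history[-season:])
    # Beyond one season ahead, the last observed season simply repeats.
    return [last_season[h % season] for h in range(horizon)]


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


if __name__ == "__main__":
    # Made-up monthly figures: two years of training data, one year of validation.
    train = [100 + 5 * (i % 12) + i for i in range(24)]
    valid = [100 + 5 * (i % 12) + i for i in range(24, 36)]
    benchmark = seasonal_naive(train, horizon=len(valid))
    print(f"Seasonal naive MAPE: {mape(valid, benchmark):.2f}%")
```

Any candidate model can be scored with the same mape function on the same validation months, which makes the comparison against the benchmark direct.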
Forecast Risk:
1. Economic and Geopolitical Shock: This model cannot predict sudden and unforeseen
shocks. Economic shocks include recessions, market crashes, imposition of heavy import
duties, strict environmental protection laws and spikes in oil prices. Geopolitical shocks
include political turmoil, wars, large scale terrorist attacks and natural disasters.
2. Forecasting sales of individual models: These forecasts only provide the total sales in
the US market. Nissan management must adjust these using their market share estimates to
get the actual model-wise sales of Nissan vehicles. There is a risk that uncertainty in the
market share estimates could make model-wise forecasts completely inaccurate.
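The second risk can be made concrete with a small sketch that scales a total-market forecast by a low and a high market share estimate. The function name company_forecast_range and all numbers are hypothetical; the project did not have access to actual market share data.

```python
from typing import List, Sequence, Tuple


def company_forecast_range(
    total_market: Sequence[float], share_low: float, share_high: float
) -> List[Tuple[float, float]]:
    """Scale each monthly total-market forecast by a low and a high share estimate."""
    if not 0 <= share_low <= share_high <= 1:
        raise ValueError("shares must satisfy 0 <= share_low <= share_high <= 1")
    return [(m * share_low, m * share_high) for m in total_market]


if __name__ == "__main__":
    # Hypothetical total-market forecast (units) and an assumed 5%-10% share range.
    for low, high in company_forecast_range([1000.0, 2000.0], 0.05, 0.10):
        print(f"company-level forecast between {low:.0f} and {high:.0f} units")
```

The width of the resulting range shows how directly an uncertain share estimate carries over into the company-level forecast.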
Chapter-4 Data collection and Analysis

The sourced MS Excel file was largely meant as a business summary for managers and
executives rather than for data analytics. From this file, the relevant data was extracted and
reformatted as numeric values. Furthermore, the date and time fields were reformatted to be
compatible with XLMiner. The quality of the data itself was good and we did not encounter any
missing values. We used data from after the 2008 recession for training and validation of the
models. This avoids the large and sudden decline caused by the recession, an outlier
that was making the models inaccurate. The individual training and validation periods for each
time series are mentioned separately below.
We attempted to use the oil price and India GDP as external predictors to account for certain
trends in our model. However, these predictors did not improve the validation
forecasts and were therefore dropped.
Analysis of Domestic Car Sales:

Models: The data was split into 86 months of training and 24 months of validation. Visual
inspection of the data indicated that the series had a strong trend with seasonality that varies in
magnitude. Hence Multiple Linear Regression (MLR) and Holt Winters Multiplicative (HWM)
methods were considered, and a seasonal naive forecast was created to benchmark the
performance of these methods.
Performance: The MAPE for these methods is listed in the table below:
Method    Naïve Seasonal    MLR      HWA
MAPE      6.8%              11.5%    6.168%
a) Different models used in forecasting sales (Split represents break between training and
validation)
b) Residuals of models for forecasting sales
c) Final model for sales with confidence intervals on forecast
The option of creating an ensemble was rejected because both the naive seasonal and the
multiple linear regression forecasts consistently under-predicted the values in the
validation period. The final choice for forecasting was the Holt Winters method, as it gave the
smallest error and its residuals were spread evenly between positive and negative.
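For readers unfamiliar with the Holt Winters method, the sketch below shows the additive variant in plain Python. It is not the XLMiner implementation used in this project: the initialisation scheme, the smoothing parameters alpha, beta and gamma, and the function name are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def holt_winters_additive(
    y: Sequence[float],
    m: int = 12,
    alpha: float = 0.3,
    beta: float = 0.1,
    gamma: float = 0.1,
    horizon: int = 12,
) -> List[float]:
    """Additive Holt-Winters smoothing with a simple two-season initialisation."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    first = sum(y[:m]) / m
    second = sum(y[m:2 * m]) / m
    level = first                      # initial level: mean of the first season
    trend = (second - first) / m       # initial trend: average per-period change
    season = [y[i] - first for i in range(m)]  # initial seasonal offsets
    for t in range(m, len(y)):
        s_prev = season[t % m]
        prev_level, prev_trend = level, trend
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        season[t % m] = gamma * (y[t] - prev_level - prev_trend) + (1 - gamma) * s_prev
    n = len(y)
    # h-step-ahead forecast: level plus h trend steps plus the matching seasonal offset.
    return [level + h * trend + season[(n + h - 1) % m] for h in range(1, horizon + 1)]


if __name__ == "__main__":
    # Made-up quarterly-style series with a repeating pattern and no trend.
    pattern = [10, -5, -10, 5]
    series = [100 + pattern[t % 4] for t in range(12)]
    print(holt_winters_additive(series, m=4, horizon=4))
```

In practice the three smoothing parameters would be tuned on the training period and judged on the validation period, as described for each series in this chapter.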
Analysis of Domestic Light Trucks Sales:

Models: The analysis used 60 months of data (Jan-2009 to Dec-2013) as the training
set and 36 months (Jan-2014 to Dec-2016) as the validation set. Forecasting was done
using multiple methods: Seasonal Naïve, Multiple Linear Regression (MLR), Holt Winters
Additive (HWA), and an ensemble that averages MLR and HWA.
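The ensemble used here is a simple average of the individual forecasts. A minimal sketch, with a hypothetical function name and made-up numbers, is:

```python
from typing import List, Sequence


def ensemble_average(*forecasts: Sequence[float]) -> List[float]:
    """Average several equal-length forecasts month by month."""
    if not forecasts or len({len(f) for f in forecasts}) != 1:
        raise ValueError("provide at least one forecast, all of the same length")
    return [sum(values) / len(values) for values in zip(*forecasts)]


if __name__ == "__main__":
    # Made-up forecasts from two models for the same two months.
    mlr_forecast = [100.0, 200.0]
    hwa_forecast = [120.0, 180.0]
    print(ensemble_average(mlr_forecast, hwa_forecast))
```

Averaging helps most when the component models err in opposite directions, because their errors then partly cancel.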
Performance: The following table gives a summary of the results across various models:
Method    Naïve Seasonal    MLR       HWA       Ensemble of MLR & HWA
MAPE      7.498%            3.809%    7.718%    1.904%
a) Different models used in forecasting sales (Split represents break between training and
validation)

Analysis of Heavy Truck Sales:
Models: The models chosen for forecasting were seasonal naive, Holt Winters Additive (HWA),
MLR and an ensemble. The data clearly showed that heavy truck sales move in cycles, trending
up for a few years and down in others. Hence 15 years of historical data were chosen for
analysis. We took 156 data points for training and 36 points for validation.
Performance: The following table gives the performance of the various models used. HWA
clearly gave the best results and hence was used for forecasting 2017 sales.
Method    Naïve Seasonal    MLR       HWA      Ensemble
MAPE      19.18%            30.79%    8.32%    13.89%
a) Different models used in forecasting sales (Split represents break between training
and validation)

Analysis of Light Truck Export Sales:
Models: The data for this series covers Jan 2009 to December 2016. The Holt Winters
method was applied with 7 years of training and 1 year of validation, and was then
run to produce a forecast for the next 12 months. Finally, a multiplicative multiple linear
regression (MLR) was applied using 11 monthly dummy variables and 2 further predictor
variables, t and t^2. The t^2 variable was necessary to capture the stagnation in export sales
during 2015 and 2016. The ensemble model is simply the average of Holt Winters and MLR.
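The regression described above can be sketched with NumPy. This is an illustration under stated assumptions, not the XLMiner model from the project: we assume that "multiplicative" means regressing the logarithm of sales, that t counts months from the start of the series, and that the twelfth month of each cycle is the baseline with no dummy. All function names are our own.

```python
import numpy as np


def design_matrix(t: np.ndarray) -> np.ndarray:
    """Columns: intercept, t, t^2 and 11 monthly dummies (12th month is the baseline)."""
    t = np.asarray(t, dtype=float)
    month = t.astype(int) % 12
    dummies = (month[:, None] == np.arange(11)[None, :]).astype(float)
    return np.column_stack([np.ones_like(t), t, t ** 2, dummies])


def fit_multiplicative_mlr(sales) -> np.ndarray:
    """Least-squares fit of log(sales) on trend, squared trend and monthly dummies."""
    y = np.log(np.asarray(sales, dtype=float))
    t = np.arange(len(y))
    coef, *_ = np.linalg.lstsq(design_matrix(t), y, rcond=None)
    return coef


def forecast(coef: np.ndarray, n_history: int, horizon: int = 12) -> np.ndarray:
    """Forecast the months following the training data, back on the original scale."""
    t_future = np.arange(n_history, n_history + horizon)
    return np.exp(design_matrix(t_future) @ coef)


if __name__ == "__main__":
    # Synthetic series: growth that flattens out (negative t^2 term) plus seasonality.
    season = np.array([0.10, -0.05, 0.02, 0.0, 0.07, -0.03, 0.04, -0.08, 0.01, 0.06, -0.02, 0.0])
    t = np.arange(96)
    sales = np.exp(5 + 0.01 * t - 0.00005 * t ** 2 + season[t % 12])
    coef = fit_multiplicative_mlr(sales)
    print(np.round(forecast(coef, n_history=96), 1))
```

A negative coefficient on t^2 lets the fitted trend bend downwards, which is how the model can follow a series whose growth stalls.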
Performance: The following table gives the performance of the various models used. We
observe that the ensemble model gives the lowest MAPE and the tightest residuals plot (shown
in the appendix). Thus, we chose the ensemble as the forecasting model.
Method    Naïve Seasonal    MLR      HWA       Ensemble
MAPE      6.77%             6.88%    10.33%    5.32%
a) Different models used in forecasting sales (Split represents break between training and
validation)
Analysis of Car Export Sales:
Models: For this series, data points after December 2008 were used.
The data has an exponential trend and additive seasonality. To capture this, multiple
regression was used with t, t-squared and monthly dummies, and the fit appeared good. The Holt
Winters Additive smoothing method was also used; however, it did not produce good results. An
ensemble of these methods was also tried.
Performance: The following table gives a summary of the results across various models:
Method    Naïve Seasonal    MLR      HWA      Ensemble
MAPE      19.70%            8.38%    15.6%    11.69%
a) Different models used in forecasting sales (Split represents break between training and
validation)

Chapter-5 Result & Discussions
1. Car Sales
Best Model: Holt Winters Additive (MAPE 6.168%)
Holt Winters selected and 12-month forward forecast generated
2. Light Truck Sales
Best Model: Ensemble (MAPE 1.904%)
Ensemble selected and 12-month forward forecast generated
3. Heavy Truck Sales
Best Model: Holt Winters Additive (MAPE 8.32%)
Holt Winters selected and 12-month forward forecast generated
4. Car Export Sales
Best Model: MLR Additive (MAPE 8.38%)
MLR selected and 12-month forward forecast generated
5. Light Truck Export Sales
Best Model: Ensemble (MAPE 5.32%)
Ensemble selected and 12-month forward forecast generated
Accurate and fast forecasting based on limited data and poor information: If the forecast is
in error, the plans derived from it will be in error too. For example, if managers are overly
optimistic about upcoming business, the organization will suffer losses from over-expenditure
when sales do not turn out as expected. Conversely, if the forecast
is too low, the firm may be unable to fulfil customer orders because it is not
prepared to supply what the market demands. This causes the company
to forgo profits and gives its competitors a good chance of taking market share.
Once the sales forecast is prepared, it becomes the key factor in all operational planning
throughout the company. Hence, a good forecast must be the sound basis of budgeting.
Financial planning for working capital requirements, plant utilization and other needs is
based on anticipated sales. The scheduling of all production resources and facilities, such
as deciding labor needs and purchasing raw materials, depends on the sales forecast. The
sales forecast also plays a critical role in sales force planning. It helps
sales executives determine the budget for the department, and it influences sales quotas
and the compensation of salespeople.
Chapter-6 Conclusions
We forecasted the next 12 months of sales for the 3 subcategories with reasonable accuracy, and
reported confidence intervals and performance metrics.
We were not able to create a function to quantify the penalty of under-forecasting, due to
lack of data. However, given average prices in each segment and assuming that all lost sales
go to competitors, a loss function can be created that gives the financial impact of
errors in the forecast.
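One possible shape for such a loss function is sketched below. It was not built or validated in this project: the function name, the assumption that every unit of unmet demand is lost at the segment's average price, and the sample numbers are all hypothetical.

```python
from typing import Sequence


def under_forecast_loss(
    actual: Sequence[float], forecast: Sequence[float], avg_price: float
) -> float:
    """Revenue lost when demand exceeds the forecast.

    Assumes every unit of unmet demand is lost to a competitor and would have
    sold at avg_price; months that were over-forecast contribute nothing.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    lost_units = sum(max(a - f, 0.0) for a, f in zip(actual, forecast))
    return lost_units * avg_price


if __name__ == "__main__":
    # Made-up monthly units and an assumed average price per vehicle.
    print(under_forecast_loss([100, 120, 90], [110, 100, 90], avg_price=20000.0))
```

A fuller version would also price the cost of over-forecasting, such as inventory holding, so that the two kinds of error can be traded off.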
It is recommended that Nissan management take the forecast confidence intervals into account
in any production planning exercise, because the forecasts have uncertainty
associated with them. It is further recommended that the forecast models be retrained with
each new monthly data point as it becomes available. In other words, a rolling forecast should be
followed.
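The rolling forecast recommended here can be sketched as a loop that re-runs the forecaster each time one more month of data arrives. The names rolling_forecasts and seasonal_naive are hypothetical, and the seasonal naive method stands in for whichever model is selected for a given series.

```python
from typing import Callable, List, Sequence, Tuple


def seasonal_naive(history: Sequence[float], horizon: int, season: int = 12) -> List[float]:
    """Stand-in forecaster: repeat the values observed one season earlier."""
    last_season = list(history[-season:])
    return [last_season[h % season] for h in range(horizon)]


def rolling_forecasts(
    series: Sequence[float],
    initial_window: int,
    horizon: int,
    forecaster: Callable[[Sequence[float], int], List[float]],
) -> List[Tuple[int, List[float]]]:
    """Re-forecast every time a new observation is added to the history.

    Returns (number of observations used, forecast) pairs.
    """
    results = []
    for end in range(initial_window, len(series) + 1):
        history = series[:end]
        results.append((end, forecaster(history, horizon)))
    return results


if __name__ == "__main__":
    # Made-up series: 24 months known at first, then two more months arrive.
    data = list(range(1, 27))
    for used, fc in rolling_forecasts(data, initial_window=24, horizon=3, forecaster=seasonal_naive):
        print(used, fc)
```

Each pass uses all data available at that point, so the forecast origin moves forward by one month at a time.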
The data proved helpful in identifying the reasons behind sales forecast inaccuracy across a
wide range of factors, as well as in assessing the impact of changes to existing norms, processes,
etc. The systematic selection of factors is associated with greater internal validity.
It was also observed that considering the full set of factors is important in sales prediction.
Error in sales forecasting is mainly driven by external factors that are not under an individual
firm's control. Because many factors cannot be controlled, any organization
should make provision to accommodate changes in external factors when predicting
sales.
REFERENCES for Literature Review
1. Bar-Yam, Y. When Systems Engineering Fails – Toward Complex Systems Engineering
in International Conference on Systems, Man & Cybernetics 2003 Vol. 2 (IEEE Press,
Piscataway, NJ, 2003): 2021–2028.
2. Bohanec Marko, Kljajić Borštnar Mirjana and Robnik-Šikonja Marko (2016), “Explaining
Machine Learning Models in Sales Prediction”, Expert systems with applications 000
(2016)1-13.
3. Dagum Paul, Galper Adam and Horvitz Eric, “Dynamic Network Models for
Forecasting”. In Section on Medical Informatics, Stanford University School of
Medicine, Stanford, California 94305, 41
4. Lewandowski, R.: Prognose- und Informationssysteme und ihre Anwendungen. de
Gruyter, Berlin – New York (1974)
5. Lewandowski, R.: Prognose- und Informationssysteme und ihre Anwendungen, Band II.
de Gruyter, Berlin – New York (1980)
6. Berkovec, J.: New Car Sales and Used Car Stocks: A Model for the Automobile Market,
The RAND Journal of Economics 2, 195–214 (1985)
7. Dudenhöffer, F., Borscheid, D.: Automobilmarkt-Prognosen: Modelle und Methoden.
In: Ebel, B., Hofer, M.B., Al-Sibai, J. (eds.): Automotive Management – Strategie und
Marketing in der Automobilwirtschaft, pp. 192–202, Springer, Berlin, Heidelberg (2004)
8. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Berlin, Heidelberg,
New York (1995)
9. Witten, I. H., Frank, E.: Data Mining. Morgan Kaufmann Publishers, San Francisco
(2005)
10. Garcia-Ferrar, A., Dell, J.H. and Martin-Arroyon, A.S. (1997), ‘‘Univariate forecasting
comparison: the case of the Spanish automobile industry’’, Journal of Forecasting, Vol.
16 No. 1, pp. 1-17.
11. Harris, E.S. (1986), ‘‘Forecasting automobile output’’, Federal Reserve Bank of New
York, Quarterly Review, pp. 40-42.
12. Juster, F.T. (1966), ‘‘Consumer buying intentions and purchase probability: an
experiment in survey design’’, Journal of the American Statistical Association, Vol. 61
No. 315, p. 658.
13. Lee, M., Elango, B. and Schnaars, S.P. (1997), ‘‘The accuracy of the conference board’s
buying plan index: a comparison of judgmental vs. extrapolation forecasting methods’’,
International Journal of Forecasting, Vol. 13 No. 1, pp. 127-35.
14. McAlinden, S.P., Hill, K. and Swicki, B. (2003), Economic Contribution of Automotive
Industry to the US Economy – An Update, Center for Automotive Research, Ann Arbor,
MI, available at: www.cargroup.org/pdfs/Alliance-Final.pdf (accessed 20 December
2007).
15. Mahmoud, E. (1984), ‘‘Accuracy of forecasting: a survey’’, Journal of Forecasting, Vol.
3 No. 2, pp. 139-51. Morrison, D.G. (1979), ‘‘Purchase intentions and purchase
behavior’’, Journal of Marketing, Vol. 43 No. 2, pp. 65-74.
16. Brühl, B., Hülsmann, M., Borscheid, D., Friedrich, C. M., Reith, D.: A Sales Forecast
Model for the German Automobile Market Based on Time Series Analysis and Data
Mining Methods. In: Perner, P. (ed.): ICDM 2009. lncs, vol. 5633, pp.146–160, Springer,
Berlin (2009)
17. Joyce Berg, Forrest Nelson, and Thomas Rietz, "Prediction market accuracy in the long
run," International Journal of Forecasting, vol. 24, no. 2, pp. 285-300, 2008.
18. Bo Cowgill, Justin Wolfers, and Eric Zitzewitz. (2009, January) Using Prediction
Markets to Track Information Flows: Evidence from Google. [Online].
http://www.bocowgill.com/GooglePredictionMarketPaper.pdf
19. James Duncan and R. Mark Isaac, "Asset markets: How they are affected by tournament
incentives for individuals," American Economic Review, pp. 995-1004, 2000.
20. Alan Hall. (2011) Ford Prediction Market. [Online].
http://media.ford.com/images/10031/FTL_predmarket.pdf
21. Friedrich Hayek, "The use of knowledge in society," American Economic Review, vol.
XXXV, no. 4, pp. 519-530, September 1945.
22. Adams, M.N.: Perspectives on Data Mining. International Journal of Market Research
52(1), 11–19 (2010)
23. Asur, S., Huberman, B.A.: Predicting the Future with Social Media. In: ACM
International Conference on Web Intelligence and Intelligent Agent Technology, vol. 1,
pp. 492–499 (2010)
24. Bakshi, K.: Considerations for Big Data: Architecture and Approaches. In:
Proceedings of the IEEE Aerospace Conference, pp. 1–7 (2012)
25. Cebr: Data equity, Unlocking the value of big data. in: SAS Reports, pp. 1–44 (2012)
26. Cohen, J., Dolan, B., Dunlap, M., Hellerstein, J.M., Welton, C.: MAD Skills: New
Analysis Practices for Big Data. Proceedings of the ACM VLDB Endowment 2(2),
1481–1492 (2009)
27. Cuzzocrea, A., Song, I., Davis, K.C.: Analytics over Large-Scale Multidimensional
Data: The Big Data Revolution! In: Proceedings of the ACM International Workshop on
Data Warehousing and OLAP, pp. 101–104 (2011)
28. Zhou G, Ou X, Zhang X. Development of electric vehicles use in China: A study from
the perspective of life-cycle energy consumption and greenhouse gas emissions. Energy
Policy. 2013; 59(3):875–84.
29. Company BP. BP statistical review of world energy. London England British Petroleum
Company. 2000.
30. Namdeo A, Tiwary A, Dziurla R. Spatial planning of public charging points using multi-
dimensional analysis of early adopters of electric vehicles for a city region.
Technological Forecasting & Social Change. 2013; 89:188–200.
