
openSAP SAP Analytics Cloud – Become an Augmented BI Expert
Week 4, Unit 7: Predict Key Indicators with Smart Predict Regression

Predict Used Car Prices with Smart Predict Regression

PUBLIC
TABLE OF CONTENTS
USE CASE
THE DATA
Create the Training dataset
PREDICTIVE SCENARIO
Settings
Train
Get insights from the Regression model
Overview
Influencer Contributions
Save Predictions
CREATE A STORY TO CONSUME THE PREDICTIONS


USE CASE

Dan is a car dealer. He wants a tool that gives him a quick and reliable price estimate for the used cars he is
selling.
Dan wants to leverage the history of used car sale prices to estimate the prices of the used cars currently on
sale.
Dan creates a regression scenario based on historical sale prices to capture the relationships between the
characteristics of the used cars and their price.
Then, with Smart Predict, Dan generates a price estimate (prediction) for all the used cars currently on sale.
Finally, Dan builds a story in SAP Analytics Cloud (SAC) to display a selection of cars that match customer
demand, together with their estimated prices.

THE DATA
Create the Training dataset

The dataset containing the historical data describes a total of 111,658 sold used cars with the following
36 variables.
| Variable Name | Description |
|---|---|
| OfferID | Unique ID for each car offer |
| shop | Flag whether the car was sold via e-Shop or by a car dealer |
| price | The price at which the car was sold |
| vehicleType | The vehicle type, e.g., limousine or sport |
| gearbox | Flag whether the gearbox is automatic or manual |
| powerPS | The horsepower of the car |
| model | The model of the car |
| kilometer | The mileage of the car in kilometers |
| fuelType | The fuel type of the car |
| notRepairedDamage | Flag whether the car has unrepaired damage or not |
| postalCode | The postal code of the car dealer; empty if the car was sold via e-Shop |
| stylePackagePremiumLine | Flag whether the car design is “Premium Line” or not |
| stylePackageSLine | Flag whether the car design is “Sports Line” or not |
| backupCamera | Flag whether the car has a backup camera or not |
| GPS | Flag whether a GPS system is included or not |
| VoiceControl | Flag whether the car has voice control or not |
| SportSteeringWheel | Flag whether the car has a sport steering wheel or not |
| airCondition | Flag whether the car has air conditioning or not |
| SportSeats | Flag whether the car has sport seats or not |
| adjustableSteeringWheel | Flag whether the car has an adjustable steering wheel or not |
| KeyLessGo | Flag whether the car has keyless go or not |
| RearSeatHeating | Flag whether the car has rear seat heating or not |
| warranty | Flag whether the car is sold with a warranty or not |
| effectPaint2 | Flag whether the car has a specific kind of effect paint or not |
| heatableMirrors | Flag whether the car has heatable mirrors or not |
| stylePackage1 | Flag whether the car has a specific kind of style package or not |
| stylePackage2 | Flag whether the car has a specific kind of style package or not |
| leatherSeats | Flag whether the car has leather seats or not |
| InteriorPremium | Flag whether the car has the premium package for the interior design or not |
| InteriorSport | Flag whether the car has the sports package for the interior design or not |
| peculiarity_drive_assistant_systems | Level of the car's drive assistant systems, from 0 to 5; the higher the number, the more extensive the drive assistant systems |
| effectPaint1 | Flag whether the car has a specific kind of effect paint or not |
| businesspackage | Flag whether the business package is included or not |
| heatableSteeringWheel | Flag whether the car has a heatable steering wheel or not |
| peculiarity_interior | Level of the car's interior design, from 0 to 5; the higher the number, the more exclusive the design |
| Age | The age of the car |

Log on to SAP Analytics Cloud, navigate to the folder My Files\Public\openSAP BI Expert\Week 4 Augmented Analytics\Unit 4.7 Predict Key Indicators with Smart Predict Regression, and open the file used_car_pricing.csv.

The file is downloaded to your local computer. You will now use this file to create a dataset.
Create a new folder under My Files: My Files\Used Car Prices (you can give the folder a different name if you
prefer).
In this folder, create a new dataset named Used Car Prices from the CSV file used_car_pricing.csv on
your local computer.

You have now created the dataset.

You can browse the dataset and explore the values of the different variables. In the right-hand panel, click
Columns; for each column, you can see the value details by clicking the “cube” icon.
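If you want to look at the same kind of value details outside SAP Analytics Cloud, the following Python sketch counts the most frequent values of each column of a CSV file. It is only an illustration: it assumes a comma-separated file with a header row, and the helper name profile_columns and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def profile_columns(csv_text: str, max_values: int = 5) -> dict:
    """Return the most frequent values of each column (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # One counter per column name, taken from the header row.
    counters = {name: Counter() for name in (reader.fieldnames or [])}
    for row in reader:
        for name, counter in counters.items():
            counter[row[name]] += 1
    return {name: counter.most_common(max_values) for name, counter in counters.items()}


# Hypothetical sample rows; the real file has 36 columns.
sample = (
    "OfferID,vehicleType,gearbox\n"
    "1,limousine,manual\n"
    "2,kleinwagen,automatic\n"
    "3,limousine,manual\n"
)
print(profile_columns(sample)["gearbox"])  # [('manual', 2), ('automatic', 1)]
```

To profile the downloaded file itself, you could pass it the content of used_car_pricing.csv, for example `open("used_car_pricing.csv", newline="", encoding="utf-8").read()`; the UTF-8 encoding and comma delimiter are assumptions about the file.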

When you are done, do not forget to save the dataset (note that it was created in Edit mode).

PREDICTIVE SCENARIO
Click Predictive Scenarios in the left-hand side panel.

Then create a new Regression model.

Save the scenario in the folder My Files / Used Car Prices and name the Predictive Scenario Used Car
Pricing. Click on the OK button.

It is now time to set up this Predictive Scenario.

Settings

Complete the Settings panel as follows:


• First, set the Description to Used Car Prices Estimation.
• In the Training Data Source field, specify the dataset Used Car Prices to train the predictive
model.
• In the Target field, set the value to predict: the used car price, which corresponds to the
variable price in the training dataset.
• Exclude OfferID: it is a unique identifier and has no influence on the target. Tip: you can exclude
variables that have no influence on the target from the modeling process. Excluding variables can
speed up the execution, but keeping them does not interfere with the modeling process.
• (Optional) Click Edit Column Details to check the variable metadata used for data processing.
The variable price, the predictive target, must have the Statistical Type Continuous.
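The settings above come down to choosing a target column, excluding identifier columns, and keeping the rest as candidate influencers. The Python sketch below shows that split on a small CSV sample. The helper name split_features_target is hypothetical, and its numeric check is only a rough stand-in for the Continuous statistical type, not how SAP Analytics Cloud determines it.

```python
import csv
import io


def split_features_target(csv_text, target="price", excluded=("OfferID",)):
    """Split rows into influencer columns and a numeric target (hypothetical helper)."""
    features, targets = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row[target]
        try:
            # A Continuous target must at least be numeric.
            targets.append(float(value))
        except ValueError:
            raise ValueError(f"target {target!r} is not numeric: {value!r}") from None
        # Keep every column except the target and the excluded ones (e.g., OfferID).
        features.append({k: v for k, v in row.items() if k != target and k not in excluded})
    return features, targets


sample = "OfferID,price,kilometer,Age\n1,8500,120000,6\n2,12000,40000,2\n"
X, y = split_features_target(sample)
print(y)     # [8500.0, 12000.0]
print(X[0])  # {'kilometer': '120000', 'Age': '6'}
```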

Train
Once the regression model settings are complete, click the Train button at the bottom right.

The status pane at the bottom shows the model creation progress. It shows the status Trained when the
regression model is created.

When a regression model is trained, the data analysis is sensitive to the order of the data, so you might get
slightly different results from those shown here.

Get insights from the Regression model

Overview

The Overview panel displays the essential information:


▪ Two performance indicators:
o The RMSE (Root Mean Square Error) measures the accuracy of the model as the typical size of its
prediction errors; the lower, the better.
o The Prediction Confidence measures the model's ability to achieve the same prediction quality on a
new dataset.
▪ Target statistics on the training dataset.
▪ The contribution of the five variables that influence the target the most.
▪ A comparison of the predicted values with the actual values.
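The RMSE is the square root of the mean squared difference between actual and predicted values, RMSE = sqrt((1/n) * Σ (y_i - ŷ_i)²), so it is expressed in the same unit as the target (here, euros). A minimal Python version with made-up prices is shown below; the Prediction Confidence indicator is specific to Smart Predict and is not reproduced here.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("expected two non-empty sequences of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up actual and predicted prices in euros.
print(round(rmse([8000, 10000, 6000], [7000, 11000, 6000]), 1))  # 816.5
```

Because errors are squared before averaging, a few large misses raise the RMSE more than many small ones.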

Influencer Contributions
This panel displays the influence of the variables on the car price estimate:
• The section Influencer Contributions indicates which car characteristics matter most for the estimated
price.
• The section Grouped Category Influence is interactive: for a given variable, it shows precisely how each
of its values (or categories) influences the estimated price, positively or negatively.
• The section Grouped Category Statistics shows the frequency of these groups of values together with
the car price they are associated with, so you can see which groups of values are linked to which
price levels.
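To build intuition for what a positive or negative category influence means, the sketch below compares the mean price of each category of one variable with the overall mean price, and also reports how often each category occurs. This is a deliberately simplified illustration, not the computation Smart Predict performs for Grouped Category Influence or Grouped Category Statistics; the helper name and the sample rows are hypothetical.

```python
from collections import defaultdict


def category_price_deviation(rows, variable, target="price"):
    """Map each category to (frequency, mean target minus overall mean target)."""
    overall_mean = sum(row[target] for row in rows) / len(rows)
    groups = defaultdict(list)
    for row in rows:
        groups[row[variable]].append(row[target])
    return {
        category: (len(values), sum(values) / len(values) - overall_mean)
        for category, values in groups.items()
    }


rows = [
    {"gearbox": "automatic", "price": 12000},
    {"gearbox": "manual", "price": 6000},
    {"gearbox": "manual", "price": 9000},
]
print(category_price_deviation(rows, "gearbox"))
# {'automatic': (1, 3000.0), 'manual': (2, -1500.0)}
```

A positive deviation means the category is associated with above-average prices. Unlike a trained model, this raw comparison does not account for the other variables.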

Save Predictions
You can see that this predictive model performs well:
• its error is acceptable (RMSE = 2632, which can roughly be read as a typical error of about 2632 euros on
the price estimate) compared to the car prices (average price of about 8500 euros, with a large standard
deviation of about 7000 euros),
• and its prediction confidence is good (close to 100%).
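One way to read RMSE = 2632 against a standard deviation of about 7000: a naive model that always predicts the average price would have an RMSE close to the standard deviation of the price. The short calculation below uses the two figures from this document; treating 7000 as that baseline RMSE is an approximation, so the resulting share of explained variance is only a rough indication.

```python
RMSE = 2632       # RMSE reported in the Overview panel
PRICE_STD = 7000  # approximate standard deviation of the price

# Ratio of the model error to the error of an "always predict the mean" baseline.
error_ratio = RMSE / PRICE_STD
# Approximate share of the price variance explained by the model (R squared).
approx_r_squared = 1 - error_ratio ** 2
print(f"{error_ratio:.3f} {approx_r_squared:.2f}")  # 0.376 0.86
```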

This predictive model is a generated algorithm that applies the data patterns learned during the training
phase to new data, in order to predict the defined target.
You can now apply this regression model to new sets of cars with similar characteristics to get their
estimated prices. In real cases, you apply the model to a new dataset with the same set of variables but
without the target variable. For simplicity, you will apply it here to the same set of cars that was used for
training.

In the top bar, click on the Apply button.

Fill in the dialog as follows:

• The Data Source selector points to the dataset containing the characteristics of the used cars whose
prices you want to estimate (with the same variables as the training dataset). Here, this is Used Car Prices.
• Replicated Columns lists the variables you want to see in the output dataset alongside the
predictions. It should contain at least OfferID, so that you know which car each estimate belongs to.
In this case, select the variables that you are going to use in the story with the predictions, namely
OfferID, kilometer, Age, vehicleType, model, and price.
• In Statistics & Predictions, select Predicted Value (price).
• Output As contains the name of the dataset that will be created to receive all the resulting data.
Choose the folder My Files/Used Car Prices and name the dataset Used Car Estimated Prices.
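Conceptually, each row of the output dataset holds the replicated columns of one input row plus a column with its prediction. The sketch below builds such rows. The prediction column name PredictedPrice is a hypothetical placeholder: the column that Smart Predict actually generates may be named differently.

```python
REPLICATED_COLUMNS = ("OfferID", "kilometer", "Age", "vehicleType", "model", "price")
PREDICTION_COLUMN = "PredictedPrice"  # hypothetical name for the generated column


def build_output_rows(rows, predictions, replicated=REPLICATED_COLUMNS):
    """Keep the replicated columns of each row and append its prediction."""
    if len(rows) != len(predictions):
        raise ValueError("expected exactly one prediction per input row")
    output = []
    for row, prediction in zip(rows, predictions):
        out = {name: row[name] for name in replicated}
        out[PREDICTION_COLUMN] = prediction
        output.append(out)
    return output
```

For example, an input row that also contains gearbox yields an output row without gearbox but with PredictedPrice, which is why OfferID must be replicated to trace each estimate back to its car.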

Click on the Apply button.


The Predictive Models pane at the bottom shows the progress and then the status “Applied”:

Once the model has been applied, go to the folder My Files/Used Car Prices and click the generated
dataset to open it.

CREATE A STORY TO CONSUME THE PREDICTIONS


The next steps show the predicted values in a story.
The story you will build is as follows:

Create a new Canvas story and select the dataset Used Car Estimated Prices that you have just
generated. Select the Classic Design Experience.

Do not forget to save the story as soon as possible, and name it Used Car Price Estimation.
First, create two input controls, based on the dimensions model and vehicleType respectively:

Adjust the vertical size so that all values are visible and rename the input control to Model.
Make sure the options below are unselected.

The input control will look like this:

Now do the same for the input control based on vehicleType, and name it Type.

Create two input controls based on the measures Age and kilometer, following the instructions below:

Adjust the size so that the value slider is displayed properly, hide the unnecessary details, and rename the
titles to Age and Kilometer respectively:

Then under these input controls, insert a table to display the predictions:

Select Insert / Table and, in the Builder, provide the following settings for the table:

The resulting table should look as follows (filtered on model = Best Drive Car1 and vehicleType = kleinwagen):
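The input controls act as filters on the table: the two dimension controls keep the selected members, and the two measure sliders keep a range of values. A Python sketch of that filtering logic follows; the StoryFilters structure, the inclusive range bounds, and the sample rows are hypothetical and do not describe how SAP Analytics Cloud implements input controls.

```python
from dataclasses import dataclass


@dataclass
class StoryFilters:
    models: set               # selected members of model
    vehicle_types: set        # selected members of vehicleType
    age_range: tuple          # (min, max), inclusive
    kilometer_range: tuple    # (min, max), inclusive


def filter_cars(rows, filters):
    """Return the rows that match every input control selection."""
    min_age, max_age = filters.age_range
    min_km, max_km = filters.kilometer_range
    return [
        row for row in rows
        if row["model"] in filters.models
        and row["vehicleType"] in filters.vehicle_types
        and min_age <= row["Age"] <= max_age
        and min_km <= row["kilometer"] <= max_km
    ]


cars = [
    {"OfferID": 1, "model": "Best Drive Car1", "vehicleType": "kleinwagen", "Age": 4, "kilometer": 60000},
    {"OfferID": 2, "model": "Best Drive Car1", "vehicleType": "limousine", "Age": 3, "kilometer": 50000},
    {"OfferID": 3, "model": "Best Drive Car1", "vehicleType": "kleinwagen", "Age": 12, "kilometer": 190000},
]
selection = StoryFilters({"Best Drive Car1"}, {"kleinwagen"}, (0, 10), (0, 150000))
print([row["OfferID"] for row in filter_cars(cars, selection)])  # [1]
```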

You can now make selections in the different input controls and see the corresponding used cars in the
table, together with their generated price estimates (Smart Predict predictions). Well done!

Thanks to Sarah Detzler for the use case idea (see the blog here: https://blogs.sap.com/2020/03/31/hands-on-tutorial-sap-smart-predict-used-car-pricing/).

Coding Samples
Any software coding or code lines/strings (“Code”) provided in this documentation are only examples and are not intended for use in a production system environment. The Code is only intended to better
explain and visualize the syntax and phrasing rules for certain SAP coding. SAP does not warrant the correctness or completeness of the Code provided herein and SAP shall not be liable for errors or
damages caused by use of the Code, except where such damages were caused by SAP with intent or with gross negligence.

www.sap.com/contactsap

© 2022 SAP SE or an SAP affiliate company. All rights reserved.


No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.

The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.
National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable
for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements
accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality
mentioned therein. This document, or any related presentation, and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platform directions and functionality are
all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation
to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are
cautioned not to place undue reliance on these forward-looking statements, and they should not be relied upon in making purchasing decisions.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other
countries. All other product and service names mentioned are the trademarks of their respective companies. See www.sap.com/trademark for additional trademark information and notices.
