
Comparing Models

The table below shows all model scores:


Type      Model  Private Score  Public Score  Private Rank  Private Top %  Public Rank  Public Top %
ML Model  M0.1   5.09           5.16
          M0.2   2.67           2.84
          M3.1   0.91           0.86          1980          35.62%
          M3.2   0.91           0.94
          M3.3   0.85           0.91
          M3.4   0.89           0.94
          M3.5   0.94           0.86
          M3.6   0.87           0.88
          M4.1   4.90           4.99
          M4.2   4.90           5.29
          M5.1   0.84           1.26
          M5.2   0.95           1.11
          M5.3   5.05           4.93
          M7.5   0.75           0.83          865           15%           4347         78%
DL Model  NN M3  1.12           0.99
          NN M4  0.82           1.21          1933          34%
          NN M5  1.90           1.70
          NN M6  3.58           4.30
Explanation of all models
We have tried 18 strategies for ML and 4 for DL; the main ones are named below.

The strategy names we use for ML are:

M0.1, M0.2, M3.1, M3.2, M3.3, M3.4, M3.5, M3.6, M4.1, M4.2, M5.1, M5.2, M5.3, M7.5

The strategy names we use for DL are:

Neural Network M3, Neural Network M4, Neural Network M5, Neural Network M6

For the ML models up to 5.3, the data was split by store and then the model was fit. In 6.1 we tried to fit one model to the entire data without splitting it. In 7.5 we split the data by department instead.

Also, these are the basic columns we use in every ML model we tried:
id, item_id, dept_id, cat_id, store_id, state_id, d, unit_sale, date, day_of_week, month_no, day_of_month

These are the extra columns we added to a few models to check whether we could get better results:
event or not, snap or not, sale price, total sale (unit sale * sale price),
moving average of sale price over 7, 14, 30, 60 and 180 days,
day of month, date,
moving average of unit sale over 7, 14, 30, 60 and 180 days.
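As an illustration, the moving-average features could be built with per-item rolling means. This is a minimal sketch, assuming long-form data with item_id and sale_price columns as listed above; the function and output column names are mine, not taken from the actual code:

```python
import pandas as pd

def add_price_mas(df, windows=(7, 14, 30, 60, 180)):
    """Add per-item moving averages of sale price (column names assumed)."""
    for w in windows:
        # rolling mean computed within each item's own history
        df[f"price_ma_{w}"] = (
            df.groupby("item_id")["sale_price"]
              .transform(lambda s: s.rolling(w, min_periods=1).mean())
        )
    return df
```

The same pattern with the unit_sale column would give the unit-sale moving averages.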

These models had only the basic columns; the extra columns were not used for them:
ML 0.2, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 5.1, 5.2, 6.1, 7.5.
DL Models 3, 4 and 5 use LSTMs on the raw data of sales_train_ev and sales_train_val.
DL Model 6 uses long-form data and Dense layers only, with all the columns including the extra columns.
Comparing Models
Machine Learning Strategies
By default all Machine Learning Strategies were tried on Long Form and was not treated as Time
Series Problem. The decision of not handling the problem as Time Series was taken after careful
considerations.
Strategy 0.1
In this strategy we train on one year of data only.
For the eval data we train on (1941-365) to 1941 and predict 1941 to 1969.
For the val data we train on (1913-365) to 1913 and predict 1913 to 1941.

We use four models to predict the 28 days: the first model predicts week 1, the second predicts weeks 1 and 2, the third predicts weeks 1 to 3, and the fourth predicts weeks 1 to 4. We then pad the shorter predicted lists with zeros to a common length and take the element-wise mean as the final prediction.
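The padding-and-averaging step can be sketched as follows (a minimal illustration; the function name is mine):

```python
import numpy as np

def ensemble_weekly(preds, horizon=28):
    """Pad each model's prediction with trailing zeros to `horizon` days,
    then take the element-wise mean across models."""
    padded = [np.pad(np.asarray(p, dtype=float), (0, horizon - len(p)))
              for p in preds]
    return np.mean(padded, axis=0)
```

Note that days covered only by the later models are averaged against the zero padding, which shrinks them toward zero; this may be one reason the idea scored poorly on submission.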

Though a good concept, it did not do well during submission.


Strategy 0.2
The only difference from strategy 0.1 is that this one fits the entire val data, whereas 0.1 used only one year of it.

Strategy 3.1
This strategy uses only the columns added by the feature_engineer() function, nothing more.
For evaluation data it fits on only one year of data (1576 to 1941) and predicts 1942 to 1969.
For validation it fits on the entire data up to 1913 and predicts 1914 to 1941.
Strategy 3.2
This strategy uses only the columns added by the feature_engineer() function, nothing more.
For evaluation data it fits on one year of data (1576 to 1941) and predicts 1942 to 1969.
For validation it again fits on one year of data, (1913-365) to 1913.
Strategy 3.3
This strategy uses only the columns added by the feature_engineer() function, nothing more.
For evaluation data it fits on three years of data ((1941-3*365) to 1941) and predicts 1942 to 1969.
For validation it again fits on three years of data, (1913-3*365) to 1913.
Strategy 3.4
This strategy uses only the columns added by the feature_engineer() function, nothing more.
For evaluation data it fits on the entire data (0 to 1941) and predicts 1942 to 1969.
For validation it again fits on the entire data, 0 to 1913.
Strategy 3.5
This strategy uses only the columns added by the feature_engineer() function, nothing more.
For evaluation data it fits on two years of data ((1941-2*365) to 1941) and predicts 1942 to 1969.
For validation it fits on the entire data, 0 to 1913.
Strategy 3.6
This strategy uses only the columns added by the feature_engineer() function, nothing more.
For evaluation data it fits on three years of data ((1941-3*365) to 1941) and predicts 1942 to 1969.
For validation it fits on the entire data, 0 to 1913.
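The day-range windows used across Strategies 3.1 to 3.6 can be sketched as simple slices on the long-form frame. This assumes a numeric day column d and an exclusive-start/inclusive-end convention, which is my guess, not confirmed by the code:

```python
import pandas as pd

def window_split(df, fit_from, fit_to, pred_from, pred_to):
    """Slice fit and prediction windows from a long-form frame
    with a numeric day column `d` (name assumed)."""
    fit = df[(df["d"] > fit_from) & (df["d"] <= fit_to)]
    pred = df[(df["d"] > pred_from) & (df["d"] <= pred_to)]
    return fit, pred

# e.g. Strategy 3.6 evaluation split: three years of history, 28-day horizon
# fit, pred = window_split(df, 1941 - 3 * 365, 1941, 1941, 1969)
```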
Strategy 4.1
In this strategy we add all the columns we can. Then we fit on one year of data only, for both the eval and val data.
For eval we train on (1941-365) to 1941 and test on 1941 to 1969.
For val we train on (1913-365) to 1913 and test on 1913 to 1941.
Strategy 4.2
The only change from 4.1 is that the entire val data is fit.
Strategy 5.1
Instead of LGBMRegressor we try XGBRegressor. We fit one year of data for eval and the entire data for val.
We don't add more features than those added by the feature_engineer() function.

Strategy 5.2
Instead of LGBMRegressor we try XGBRegressor. We fit the entire data for both eval and val. We don't add more features than those added by the feature_engineer() function.
Strategy 5.3
Instead of LGBMRegressor we try XGBRegressor. We fit the entire data for both eval and val. We use all the features we can.
Strategy 7.5
We fit an LGBM model on the entire data, separated by dept_id. As this is the best model I got, it will be discussed in detail later.
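A minimal sketch of the split-by-dept_id fit in Strategy 7.5. The LGBM model is replaced here with a trivial mean baseline so the example stays self-contained, and the column names are assumed from the list above:

```python
import pandas as pd

class MeanBaseline:
    """Stand-in for the LGBM regressor: predicts the training-set mean."""
    def fit(self, X, y):
        self.mean_ = float(y.mean())
        return self
    def predict(self, X):
        return [self.mean_] * len(X)

def fit_per_department(df, make_model=MeanBaseline):
    """Fit one model per dept_id, as in Strategy 7.5."""
    models = {}
    for dept, part in df.groupby("dept_id"):
        X = part.drop(columns=["dept_id", "unit_sale"])
        models[dept] = make_model().fit(X, part["unit_sale"])
    return models
```

In the real strategy, make_model would construct an LGBMRegressor instead.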
DL Strategies
Models 3, 4 and 5 use a multi-step LSTM with vector output.
Model 6 uses a Dense-layer-based model on long-form data separated by dept_id.

DL Model 3
We did not transform the data into long form. sales_train_ev.csv and sales_train_val.csv were fed in directly, and the model was LSTM based.

In this training strategy we do not go store by store; we decided to split the products into departments instead, since the same products may behave similarly across stores. This can help LSTM models, as they depend on patterns.

However, the entire data was used for training, for both the eval and val csv. This could be changed to try a different approach using only one or two years of data, as LSTMs are not good at remembering long sequences.

Our data split is as follows. The model takes 1885 days of input and produces 28 days of output. For the eval csv we use days 28 to 1913 as X_train and 1913 to 1941 as y_train, then predict on days 56 to 1941. For the val csv we use days 0 to 1885 as X_train and 1885 to 1913 as y_train, then predict on days 28 to 1913.

In both cases we predict 28 days: 1941 to 1969 for eval and 1913 to 1941 for val.
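The input/target slicing can be sketched as follows, assuming the raw csv is loaded as an (items x days) matrix; the trailing axis is added because Keras-style LSTMs expect (batch, timesteps, features) input. Function and variable names are mine:

```python
import numpy as np

def make_windows(series, x_from, x_to, y_to):
    """Slice a (n_items, n_days) sales matrix into an input window
    and a 28-day target window, adding the feature axis LSTMs expect."""
    X = series[:, x_from:x_to]   # input window
    y = series[:, x_to:y_to]     # 28-day target
    return X[..., np.newaxis], y

# Eval split for DL Model 3: X, y = make_windows(sales, 28, 1913, 1941)
```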

df - can be sales_train_ev or sales_train_validation.
dept_id - any of the 7 departments products belong to: ['HOBBIES_1', 'HOBBIES_2', 'FOODS_1', 'FOODS_2', 'FOODS_3', 'HOUSEHOLD_1', 'HOUSEHOLD_2'].
epoch_no - we keep changing it, because different departments require different models, different target losses and different epoch numbers.
model - we might use different models for different departments.

DL Model 4
We don't transform the data into long form; we take sales_train_ev.csv or sales_train_val.csv, split it by department, and train a model on each split. The model is LSTM based.

For the eval data, X_train is 1548 to 1913, y_train is 1913 to 1941, and X_test is 1520 to 1941.
For the val data, X_train is 1548 to (1913-28), y_train is (1913-28) to 1913, and X_test is (1913-365) to 1913.

DL Model 5

We don't transform the data into long form, and the models are LSTM based.
Eval data
We train model 1 on (xtr_from, xtr_to, ytr_to) = (116, 453, 481),
model 2 on (481, 818, 846),
model 3 on (846, 1183, 1211),
and model 4 on (1211, 1548, 1576).

We then predict on (xte_from, xte_to) = (1604, 1941) with each model and take the mean of the predictions as our final prediction. The same is done for the validation data.

Val data
(xtr_from, xtr_to, ytr_to) = (88, 425, 453), (453, 790, 818), (818, 1155, 1183), (1183, 1520, 1548)

(xte_from, xte_to) = (1576, 1913)
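The staggered-window ensemble above can be sketched like this; train_fn is a hypothetical stand-in for the LSTM training routine, taking an input window and target and returning a predict callable:

```python
import numpy as np

# (xtr_from, xtr_to, ytr_to) triples for the eval data
EVAL_WINDOWS = [(116, 453, 481), (481, 818, 846),
                (846, 1183, 1211), (1211, 1548, 1576)]

def ensemble_predict(series, windows, xte_from, xte_to, train_fn):
    """Train one model per staggered window, predict on the common
    test window, and average the predictions."""
    preds = []
    for xtr_from, xtr_to, ytr_to in windows:
        model = train_fn(series[:, xtr_from:xtr_to],
                         series[:, xtr_to:ytr_to])
        preds.append(model(series[:, xte_from:xte_to]))
    return np.mean(preds, axis=0)
```

Each window covers 337 input days plus a 28-day target, matching the 337-day test window (1604 to 1941).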

DL Model 6
We converted the data into long form as we did in the ML models, then fit Dense-layer-based models, trying different models on different departments.
Models I would like to try as future improvements:
Strategy 1

Training strategy: train on data up to day 1829 and predict days 1829 to 1857. For this prediction we use 4 models, each predicting one week of the 28-day test block. The predicted X_test block then replaces the true labels, so the next round is partly trained on our own predictions.

This is repeated for the next test sets, as described below.

Train on data up to 1857 and predict X_test of 1858 to 1885, again with 4 weekly models; the prediction replaces the true labels for the next round.

Train on data up to 1885 and predict X_test of 1886 to 1913, in the same way.

Train on data up to 1913 and predict X_test of 1914 to 1941, again with 4 weekly models. This prediction is the final prediction for the store.
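The walk-forward loop can be sketched as follows; predict_28 is a hypothetical stand-in for the fit-and-predict step (the four weekly models collapsed into one callable):

```python
import numpy as np

def walk_forward(series, checkpoints, predict_28):
    """Walk-forward scheme: after each 28-day block is predicted,
    the prediction replaces the true values before the next fit."""
    work = series.copy()
    pred = None
    for cut in checkpoints:               # e.g. [1829, 1857, 1885, 1913]
        pred = predict_28(work[:, :cut])  # fit on history up to `cut`, predict 28 days
        work[:, cut:cut + 28] = pred      # overwrite truth with the prediction
    return pred                           # prediction for the final block
```

Later rounds therefore train on partly self-generated labels, which is the intended behaviour of the strategy, not a bug.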
Strategy 2
This method is fairly complex; it did well during training but the submission scores were extremely bad.

Training strategy: train on data up to 1829 and predict X_test of 1829 to 1857. The predicted X_test block replaces the true labels for the next round, so we partly train on our own predictions.

Train on data up to 1857 and predict X_test of 1858 to 1885; again the prediction replaces the true labels.

Train on data up to 1885 and predict X_test of 1886 to 1913; again the prediction replaces the true labels.

Train on data up to 1913 and predict X_test of 1914 to 1941. This prediction is the final prediction for this store.
