
Seco Analytics

A thesis project for Seco Tools.


Independent Project in Sociotechnical Systems Engineering
IT Systems (1DL931), Spring term 2019

June 29, 2020

Authors:
Samuel Dahlback
Gustav Kruse
Albin Åbrink
Lotta Åhag

Supervisors:
Course Supervisor Georgios Fakas
Teaching Assistant Georgios Kalamatianos
Company Supervisor Stefan Östlund
Company Supervisor Mikael Lindholm

Abstract

Forecasting is a powerful tool that can enable companies to save millions in revenue every year, if the forecast is good enough. The problem lies in the "good enough" part. Many companies today use Excel to predict their future sales and trends. While this is a start, it is far from optimal. Seco Analytics aims to solve this issue by forecasting in an informative and easy manner. The web application uses the ARIMA analysis method to accurately calculate the trend for any country and product area selection. It also features external data, allowing the user to compare internal data with relevant external data such as GDP and to calculate the correlation for the countries and product areas selected. This thesis describes the development process of the application Seco Analytics.

Contents

1 Introduction 7

1.1 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 Project description 8

3 Socio-economic aspects 8

3.1 Target group and potential users . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3.2 Why a platform? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.3 Our platform, Seco Analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.4 Financial benefit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.5 Competitive analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

4 Design 14

4.1 Seco Analytics Logo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

4.2 System design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

4.2.1 Sitemap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4.2.2 Design draft . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4.3 The site from a user perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4.3.1 Admin and superuser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

4.3.2 Graphical design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

4.3.3 Incorrect use . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

5 Method 19

5.1 Project plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

5.2 Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

5.2.1 Facebook messenger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

5.2.2 Slack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

5.2.3 Trello . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

5.2.4 Google Drive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

5.2.5 Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

5.3 Team management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

6 Analysis 22

6.1 Course of action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

6.2 Different kinds of machine learning methods . . . . . . . . . . . . . . . . . . . . . . . . . . 23

6.3 Analysis software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

6.3.1 Rapidminer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

6.4 Forecasting models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

6.5 Analysing correlation to external data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

6.6 Selection of data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

7 Database design 29

7.1 SQLite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

7.2 SQLiteStudio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

8 Technical solutions 31

8.1 Development software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

8.1.1 Github . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

8.1.2 Django . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

8.1.3 Spyder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

8.1.4 Atom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

8.2 Programming languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

8.2.1 Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

8.2.2 HTML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

8.2.3 CSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

8.2.4 JavaScript . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

8.3 Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

9 Evaluation 35

9.1 Test plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

9.2 Test procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

9.2.1 Test manuscript . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

9.2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

9.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

10 Discussion 37

10.1 Process over time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

10.2 Lessons learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

10.3 Further development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

10.4 Hand over . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Appendices 43

A Contributions 43

A.1 Albin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

A.2 Gustav . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

A.3 Lotta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

A.4 Samuel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

B SQL database using SQLiteStudios for the analysis 46

C Master thesis description 49

D Design draft 51

E Design 52

F Usability testing 54

F.1 Informed Consent Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

F.2 Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

G Project plan 56

1 Introduction

Forecasting is an essential part of maintaining a balanced product portfolio and a slim production line adjusted to market needs. Knowing where the market is heading is therefore important for maintaining a slim organization and for knowing which products to invest in, in terms of research or production. Predicting market trends can be one of the most difficult tasks for companies [22], but with new digital tools, market movements can be predicted with greater accuracy.[14]

Machine learning is a modern method that allows software applications to predict outcomes more precisely and accurately without being explicitly programmed. The basic structure of machine learning is to build algorithms that receive input data, use statistical analysis to predict an outcome, and update their outputs as new data becomes available. This process of predictive analysis is applied on the platform “Seco Analytics”, and with machine learning the forecast becomes more accurate. The analysis requires searching through market data to detect patterns such as seasonality, trends and noise, and the program is adjusted accordingly.[17]

There are many forecasting tools available on the market today, and the choices for someone looking for an analysis tool are almost infinite. Meanwhile, bad IT systems cause costs of 14.5 billion Swedish kronor (SEK) each year in Sweden alone; in general, 30 minutes a day are wasted per worker.[30] Making an analysis and forecasting website that pleases everyone would be an impossible task. Instead, we recognize this vast variety in order to tailor a solution for a specific kind of need. The Seco Analytics platform is not a solution that fits every company with forecasting problems, but rather focuses on pleasing a certain kind of user at a certain kind of company. The product targets the critical problems and provides an easy solution for the Seco organisation to use. As machine learning is rather new to the Seco organisation, this product will work as a “pilot project” to prove that new technical data solutions can be used to fulfill needs and create value through the value chain.

1.1 Glossary

Table 1: Glossary of frequently used expressions

Seco Analytics: The Seco Analytics site.
Seco Analytics team: The Seco Analytics developers.
Admin page: The admin's own page, where the admin can oversee all user activity and regulate usability.
Registered User: A user who is logged in to their account while using the Seco Analytics webpage.
Admin User: A user who has permission to access the admin controls.
User: Anyone who uses the webpage.
Login Page: The page where the user logs in.
Forecasting Page: The “main” page, where the user can forecast over time-series data.
View and edit Projects Page: The page where the user can view and edit projects.

2 Project description

Seco Tools announced a master project to analyse market trends in relation to internal and external data. The aim is to provide a good basis for more efficient product portfolios and product development by analyzing how such work should be carried out and what type of work should be prioritized in order to be as effective and precise as possible. The full project description is in Appendix C.

The project has since been developed and adapted into a bachelor project: to create a tool for sales prediction using existing APIs for machine learning. The final product is a webpage that interacts with data and presents it in an interactive manner. The goal is to give the market department a better tool to predict its sales outcome, to know where to compete and where to reduce market share, but also to give the production units a better tool to predict their production, to know what to produce and where to produce it.

3 Socio-economic aspects

3.1 Target group and potential users

Unlike other web applications, used occasionally by many people with a wide diversity of knowledge, our users are assumed to have previous knowledge of similar systems and of how to use them properly to get their desired results. The typical users of Seco Analytics are employees at various departments at Seco Tools who work as either sales representatives or production planners, as the system aims to make predictions of future sales figures that are tightly tied to production. The forecasts are tentatively meant to be used by the Product Management department, but also among employees who work for Officially Marketing and Product Lines. The system should have a low barrier of entry, as this will make it more accessible for the company as a whole and easier to incorporate in many parts of the company. However, the users are expected to have a valid foundation for decision making in future investments, as well as being familiar with similar programs for analyzing data.

Seco Tools AB is a company owned by the Sandvik group that specialises in metal cutting tools for milling,

stationary tools, holemaking and tooling systems.[26] Seco Tools is present in 75 countries worldwide and

have their headquarters in Fagersta, Sweden. Seco Tools is a company that is well established on the

market of metal cutting tools, and is looking for strategies to increase their revenue and decrease their

in-house spending.

One of Seco Tools most important processes is to put customer value in focus. Both strategic and tactical

implementation over time are of great value. This is made by measuring the effectiveness in a large variety

of different product portfolios. In order to know where to compete, utilizing complex analysis methods and

strategic decision-making can drastically improve the development possibilities and direct the company in

an advantageous direction. A necessary condition is to have a high quality platform for decision support

regarding trends, key figures and forecasting between different segments and customers.[Östlund, Private

Communication, 2019-05-24]

3.2 Why a platform?

At present, general prediction is centralized to a few key resources. The general process is reported in Figure 1. The market analysis is mostly done using spreadsheets such as Excel or statistical programs, which is a time-consuming process that introduces a great deal of uncertainty. First, the responsible resource must collect sales data from a server. The data is transferred to an Excel sheet where it is processed and calculated, and from there a conclusion can be drawn manually. This is a knowledge-demanding process that requires extensive experience. Predictions must then be conveyed in a correct and informative manner, ensuring that no information is lost. They are distributed through email, meetings and the intranet.[Karjalainen, Private Communication, 2019-05-24]

Figure 1: Forecast, analysis and external correlation process. No machine learning is being used.

The forecast and trends department uses statistical programs to find forecasts for product sales and regions. The exact models used are not known, but they are based on traditional statistics. The system calculates a forecast which, in combination with the stock size per Distribution Center and the supply lead time (i.e. how long it takes to produce a product), determines how large the stocks should be and how much and when to produce.

Seco has a varied order intake. A rolling average can be chosen, but there are considerably more advanced functions: the system tests different models and chooses the model with the least forecast error. Excel is mainly used to find external correlations and to visualize the results for communication purposes. The analyses are mediated through meetings and consensus principles, where all managers agree on whether the results seem reasonable or not.[Östlund, Private Communication, 2019-05-24]
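The idea of testing several models and keeping the one with the least forecast error can be sketched as follows. The rolling-average models and the error measure here are illustrative assumptions, not the system actually used at Seco:

```python
# Illustrative sketch: evaluate rolling averages with different window
# sizes by their mean absolute one-step forecast error on a series,
# then keep the window with the least error. (Assumed models and error
# measure; not Seco's actual forecasting system.)
def rolling_average(history, window):
    return sum(history[-window:]) / window

def forecast_error(series, window):
    """Mean absolute one-step-ahead error over the series."""
    errors = [abs(rolling_average(series[:i], window) - series[i])
              for i in range(window, len(series))]
    return sum(errors) / len(errors)

def best_window(series, candidates=(2, 3, 4)):
    """Pick the candidate window with the least forecast error."""
    return min(candidates, key=lambda w: forecast_error(series, w))
```

On a monthly order series, `best_window` returns the window size whose rolling average tracked the series most closely in the past.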

3.3 Our platform, Seco Analytics

Seco Analytics solves the problems presented in section 3.2. It is easy to use, as the user does not need a deeper understanding of programming or analysis; the user only needs to fill in the desired prediction. The threshold for starting to use the platform is thus lower than for how analyses were previously done within the organization, so there is potential to decentralize certain processes and make the organization less vulnerable and less dependent on a few key people. Analysis and decision-making are considerably faster with the help of the platform. Seco Tools' consultancy branch, the company's business area for sales of services and technical solutions, has the potential to create value with the customer through the use of a forecasting tool, partly by giving recommendations based on how different market segments will develop and by helping customers make decisions based upon forecasts. The platform can also be sold as a packaged product and adapted to Seco Consultancy customers, in order to make forecasts of their market data. An overview of why Seco Analytics is needed by the company is visualized in Figure 2.

Figure 2: How to fill the forecasting need of the Seco Tools organization

3.4 Financial benefit

By utilizing information from forecasts of market trends, one can draw conclusions about what actions should be taken to get better sales outcomes. Such information tells where to compete and where to reduce market share, but also what to focus on in production and where around the world to produce the products. Seco Analytics' forecasts will therefore be of financial benefit and create value for the company, as they improve business-related decisions. They could also reduce personnel costs, as the system provides all relevant forecasts gathered in one place. This increases effectiveness when investigating future trends and reduces errors compared to manual analyses.

Figure 3: The web-based solution and the financially benefiting users

The information gained from forecasts can also be of great value to customers. The service provided can forecast production outcomes for customers and their production trends. By the use of specific customer data, the program can predict how well a specific customer is doing within a specific segment, what kinds of products connected to a specific production cycle are selling more, and how the segment in general is doing. This creates data-driven value for all parts of Seco and for its customers. An overview of the web-based solution and the financially benefiting users is given in Figure 3.

In general, 88 percent of office workers at American industrial companies use spreadsheet programs such as Excel as workarounds for bad internal IT systems. This means that a lot of the value invested in business and production planning software is being wasted. The programs available are not tailored to the companies' specific needs, and the in-house competence for using and maintaining the products is also a great concern.[7] The financial benefit that lies in “just in time” is an important strategic question for many companies in the industrial sector. In order to reduce the value locked up in goods, it is important to produce the products just in time for them to be shipped off to the specific customer. By forecasting when a product will sell, a company can reduce costs on the logistics side by up to 30 percent.[25]

3.5 Competitive analysis

There are plenty of other platforms to use when it comes to predictive analysis. Today, Seco is using Excel. Seco Analytics, however, is different in the kind of data and algorithms being used: the data is fed directly into the platform, which reduces the amount of repetitive labour, and it is easier to use for the adjusted forecasts. Because of this difference, the usability of Seco Analytics is not the same as that of Excel.

There are still other competitors to the Seco Analytics platform, such as Qlik, PowerBI and R. These three analytics tools, together with Excel, are the main focus of the following SWOT analysis, seen in Table 3.5.

Strengths:
- Proof of concept that Seco can use machine learning within the company
- Shows that a platform can be developed within the company
- The product creates value directly
- Tailor-made for Seco's specific needs

Weaknesses:
- No maintaining knowledge; hard to continue developing the platform within the company
- No interactive plot
- Relatively hard to import datasets
- Limited amount of internal data
- Only one machine learning algorithm
- Limited amount of external data
- Only checks external correlation over the same dates

Opportunities:
- Chance to create value with better forecasts in other departments and companies
- Few competitive tools being used in this area
- Emerging need for this kind of product and service

Threats:
- Emerging competitors within the machine learning sector
- Changed customer attitudes and needs towards Seco Consultancy

4 Design

4.1 Seco Analytics Logo

A logo was created for the project, which can be seen in Figure 4, inspired by the logo for Seco Consultancy.

Figure 4: The logo created for Seco Analytics.

4.2 System design

An overview of the system design is visualized in Figure 5. An employee has to be authorized and registered by the admin to log in on the webpage. The employee sends a request on the webpage, which directs the request to the server, which in turn extracts the right data from the database. The result obtained from the server is then returned to the user through the webpage. The admin is the only one who can update the content in the databases.

Figure 5: The system design

4.2.1 Sitemap

The website consists of three pages, on which the user can log in, choose a project to work on, and forecast sales trends for the chosen branch for specific countries and product management areas. Only the administrator can view, edit and create new users and projects. Figure 6 shows an overview of the site map.

Figure 6: Site map of Seco Analytics

As Seco Analytics is an internal tool within the Seco Tools group that contains sensitive data, the program requires the user to log in with a valid username and password. To minimize the risk of unauthorized users accessing the data, users have no ability to create an account or reset the password. The only one who can create or change user settings is the admin, whom the staff must contact personally. Every user must also apply for access to the projects they want to work on. When the admin grants the application, one or more data segments are enabled for the user to analyse. Segments can, for example, be aerospace, automotive and medical.
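The access rules described above can be sketched as a small, framework-free model. The names and structure here are our illustrative assumptions, not the actual implementation (which builds on Django's user management):

```python
# Illustrative sketch of the access rules: only the admin can create
# accounts or grant segments, and a user can only analyse segments the
# admin has granted. (Hypothetical names; the real site uses Django.)
class AccessControl:
    def __init__(self):
        self.segments = {}  # username -> set of granted data segments

    def create_user(self, actor, username):
        if actor != "admin":
            raise PermissionError("only the admin can create users")
        self.segments[username] = set()

    def grant_segment(self, actor, username, segment):
        if actor != "admin":
            raise PermissionError("only the admin can grant segments")
        self.segments[username].add(segment)

    def can_analyse(self, username, segment):
        return segment in self.segments.get(username, set())
```

A user who has not been granted a segment, or who tries to perform an admin action, is simply refused; in the real application these checks are handled by Django's authentication system.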

4.2.2 Design draft

Initially, two types of mockups were presented to Seco: one with a more modern interface, with multiple choices spread over different screens, which hides more information, and one similar to many older analysis programs. Seco found the second alternative more appealing, and we therefore chose it as a base, while visualizing it in a more modern manner. The first mockup also contained a project page as well as a page to edit user information. These were not used in the final version, as they turned out to be unnecessary and ineffective due to security issues. The first mockup is shown in Appendix D.

The design the project aims for is partly derived from the meetings the group had with project managers at Seco. The project plan has therefore changed a lot during the project, since new requests for functions have been added gradually. This means that both the graphic design and the associated functions have been a work in progress until a late stage of the project.

4.3 The site from a user perspective

Users of the page can choose product areas and countries and create a graph that forecasts sales a chosen time period forward. The user should be able to choose any number of product areas and countries at the same time. The user should also be able to compare the graph to other data and see the correlation between the sales data and external data, given by a value of Pearson's correlation coefficient, which is further described in section 6.5.
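As a minimal sketch of the comparison described above, Pearson's correlation coefficient between a sales series and an external series can be computed as follows. The example numbers are made up for illustration and are not Seco data:

```python
# Pearson's correlation coefficient between two equally long series:
# covariance divided by the product of the standard deviations.
def pearson(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

sales = [10.0, 12.0, 13.0, 15.0, 16.0]  # internal sales per month (made up)
gdp   = [1.0, 1.1, 1.2, 1.3, 1.35]      # external indicator per month (made up)
r = pearson(sales, gdp)                 # close to 1 for co-moving series
```

A value near +1 indicates that the sales data and the external indicator move together, near -1 that they move in opposite directions, and near 0 that there is no linear relationship.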

The user interface of Seco Analytics is designed to be user friendly, with all necessary information displayed on one main page so that the user does not lose any information along the way. The main forecasting page therefore has all choices for the forecast available in the left panel, making it easy to change the settings of the forecast, as can be seen in Appendix E.

The panel consists of three labels, seen in Appendix E, where the user chooses external data, customer country and product area to forecast on. When the user moves the pointer over these, a drop-down selection appears. Multiple choices are possible for customer country and product area, and the program always displays one forecast and one correlation with external data. In the drop-downs, the user can choose “toggle all” to mark and unmark all boxes. The user must also choose the desired time interval in months and the unit in which the forecast should be represented. The unit options are orders in Swedish kronor (SEK) or orders in quantity.

The “Submit” button performs the forecast and also resets all previous choices of data. Whether the correlation or the forecast should be shown is chosen in the two clickable tabs above the figure. The user can also choose between returning to the segments site or logging out by clicking the buttons in the upper right corner of the page. When a user closes the webpage, all performed forecasts are discarded and no settings the user has used on the page are saved. Screenshots of all the pages can be seen in Appendix E.

4.3.1 Admin and superuser

The admin of the page must be acquainted with many facets of programming and web development, as the project contains the programming languages SQL, CSS, HTML, JavaScript and Python, and the five languages are used for different purposes. To be able to expand the page, the same qualities are desired, as well as knowledge of data mining, machine learning and different forecasting models. The admin has access to the admin page, where users can be created and removed (see Figure 7). The admin page also gives access to the Models created in Django and allows the admin to change the content of these models. This means that the admin can change the available variables on the site (more information in section 7).

Figure 7: What the admin page looks like

4.3.2 Graphical design

The main purpose is to forecast and analyse data as a proof of concept for Seco Tools, with new functions added over time. Graphical design has therefore been a secondary issue, not prioritized in order to have time for other important parts of the final product. The graphical design is attractive in its simplicity, with few functions and the use of terminology known to employees at the company, although it is not particularly attractive to work with. As the product is supposed to be a proof of concept, the finish of the graphical design has not been in focus, which is a deliberate choice by the group.

The forecast is an image that is saved from the script. The image is not interactive, but the title and the labels for the y- and x-axes are updated depending on which forecast or correlation has been selected. The background colour, lines and forecast shadow have been chosen to highlight and explicate the most important parts of the graph, but also to fit in with Seco Tools' company colours. The external data is plotted in two separate graphs, as the data is measured in different units and ranges. This gives two separate y-axes with a shared x-axis for time. Both plots could have been merged into the same graph if the value for each datapoint had been converted to the same unit, but this was not prioritized.
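The two-graph layout with a shared time axis can be sketched with matplotlib roughly as follows; the function name and example data are ours for illustration, not the project code:

```python
# Illustrative sketch: plot internal sales and an external indicator on
# separate y-axes that share one x-axis for time, as described above.
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

def plot_shared_time_axis(months, sales_sek, external):
    fig, ax_sales = plt.subplots()
    ax_ext = ax_sales.twinx()  # second y-axis sharing the same x-axis
    ax_sales.plot(months, sales_sek, color="tab:blue")
    ax_ext.plot(months, external, color="tab:green")
    ax_sales.set_xlabel("Month")
    ax_sales.set_ylabel("Orders (SEK)")
    ax_ext.set_ylabel("External indicator")
    return fig, ax_sales, ax_ext

# Made-up example data
fig, a1, a2 = plot_shared_time_axis([1, 2, 3], [100, 120, 90], [1.0, 1.1, 1.05])
```

With `twinx()`, the two series keep their own units and ranges while still lining up on the same time axis, which is the compromise the paragraph above describes.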

The site is developed in multiple ways to make it easier to understand and use. All buttons change colour when hovered over, and the pointer switches to a hand pointer. The drop-down selections also get a darker background when hovering over them. One can toggle boxes in the drop-down lists by clicking anywhere on the row; this is a simpler way for the user to pick the intended variables, as one does not need to click directly on the checkbox. The options are highlighted to prevent misclicks.

4.3.3 Incorrect use

When using the analysis tool, there are some requirements for the backend algorithm to work properly. One is that the selected datasets have to match each other: if you select a country and a product area without any matching sale orders, the algorithm cannot compute any data points. To prevent this, we have implemented an error detector that allows us to catch this case and display an error message telling the user to select more data in order to get a good analysis. Another limit of the algorithm is that it requires the number of months to be positive. You cannot scroll to a negative value, but you can type one in by hand. If such a problem arises, the same error-detecting system detects it and sends back the same error message, as in Figure 8.
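A minimal sketch of the two checks described above might look like this; the names and message text are assumed for illustration, not the project's actual error detector:

```python
# Illustrative sketch of the backend checks: the selection must yield
# matching data points, and the forecast horizon in months must be
# positive. (Assumed names; not the project's actual implementation.)
ERROR_MESSAGE = "Please select more data in order to get a good analysis."

def validate_request(datapoints, months):
    """Return an error message, or None if the request can be analysed."""
    if not datapoints:  # no matching sale orders for the selection
        return ERROR_MESSAGE
    if months <= 0:     # a horizon typed in by hand may be negative
        return ERROR_MESSAGE
    return None
```

Both failure modes deliberately return the same message, mirroring how the site shows one error message (Figure 8) regardless of which check failed.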

Figure 8: Displaying the error message.

When the user presses the submit button, the cursor turns into a loading symbol. This is to prevent the user from spamming the button, since the algorithm requires some time to load. It is especially important when predicting on a large amount of data and for many months into the future, since such an operation may take a couple of seconds and could therefore cause confusion if nothing appeared to happen after pressing the submit button.

5 Method

5.1 Project plan

The group decided to divide the project into three major parts: webpage, analysis and thesis. As the project depends on finding relevant forecasts, the time spent on analysing the data had to be planned as a significant part of the project. Our project plan takes this into account by scheduling analysis at an early stage of the project with the same priority as programming the website. The thesis was also planned to develop continuously during the project, documenting the parts being worked on as they were developed. To make sure delays would not occur with any of the requirements, intermediate reconciliations were held with Seco Tools. The group had a two-week buffer at the end of the project in case any delays arose. A complete Gantt chart can be found in Appendix G.

The group agreed on the importance of everyone being familiar with the work in progress, but also of getting results that are as relevant as possible. Therefore, one person worked primarily on analysis and one person on programming the website, while the other two project members were freer to take turns working on the two modules. This way, the group always had someone who was familiar with the work, and new perspectives could come from those who worked on both modules.

5.2 Communication

Given an agile work environment, communication within the group has to be quick and effective, as given goals may change at any time. To achieve good communication within the group, we used multiple platforms, each with a different purpose. The group made it clear early on that we wanted to do most of the work together at the same place and time. This led to most decisions regarding the project being made orally at meetings, and not through written communication such as email, Slack or Facebook Messenger. Communication with Seco Tools was mostly done through Outlook, as this is the email client used by the company, and each member of the group was given a company email address to make communication easier.

5.2.1 Facebook messenger

Facebook Messenger was used as an informal communication channel, where group members could write if they were running late and send other messages of that nature. It was used this way because Messenger is quick and comfortable to use, but there is no easy way to save important communication to go back to at a later date.

5.2.2 Slack

Slack is a communication platform designed for communication within projects. It allows for easy backtracking within channels, and different channels can be created for different parts of the project so that no single channel gets cluttered.[23] Slack was used for communication with the project supervisor and the teaching assistant. Since the group did a majority of the work in the same room at the same table, important communication was done face to face rather than through chat.

5.2.3 Trello

Trello is a web-based list-making application that allows collaborative work boards and shared lists.[27] Trello was used by the group as a tool to distribute tasks and to remember tasks that needed to be done at a later stage of the project. As the project evolved it was used more and more as a sprint tool, where every small task was listed and ticked off as it was completed over the course of a week; new tasks for the following week were then added on Friday afternoons. See Figure 9 for an example of the group's Trello board.

5.2.4 Google Drive

Google Drive is a cloud-based platform for document sharing that allows distributed updating of documents. Drive was used by the group to write shared text documents such as deliverables and this thesis. It was also used for its presentation program, which allows multiple members of the group to work on and access the presentation.

Figure 9: Screenshot of the group's Trello board

5.2.5 Outlook

Outlook is an email and calendar service developed by Microsoft. It was primarily used to communicate with employees at the company and to share files with each other. Outlook also maintained a secure connection when sending sensitive data and offered a fast way to relay information and questions to the company. The calendar feature was used to book meetings, since it included the schedules of most of the relevant people at the company and gave us an easy way to also book a room at their headquarters.

5.3 Team management

The team assigned roles to all members shortly after the project started; the roles are shown in Table 2.

The group quickly realised that the role distribution was not as clear-cut as first hoped, and the role assignment was adjusted according to the day-to-day tasks the group worked with. One example of an alternative distribution was two members working on analysis of the data while two worked on the front end of the application. At other times the whole group worked on the same high-priority problem, to solve pressing issues together quickly. The use of software such as Trello also varied a lot, depending on the mood in the group: instead of using Trello for task distribution and planning, we sometimes used a whiteboard and frequent discussions, as the whole group usually sat and worked together.

Table 2: Overview of each team member's role in the project


Name Role
Samuel Dahlback Front-end developer and Project manager
Gustav Kruse Back-end developer and Communication manager
Albin Åbrink Security manager, Database developer and Report manager
Lotta Åhag Analysis and Visualization manager

6 Analysis

6.1 Course of action

The Seco Analytics lifecycle, as pictured in Figure 10, is the course of action the project had to go through. In general, the analysis began with business understanding: objectives that are interesting in the business model had to be discussed and taken into account. Together with the market department team, a decision was made that the project's main focus should be the aerospace segment. The data was gathered over the time frame 2016-01-01 to 2019-04-02 and contains, for every sale, factors such as date, quantity, income, customer country and product description. The data is then processed and cleaned by deleting return orders and minus posts, and sales that are unusually high and stand out from the rest of the data are removed. The cleaning is important to reduce the number of errors and maximize the share of the data that adds value to the forecast. The cleaned data is then analysed and processed; this step is crucial for knowing which forecasting method to use. The forecasting is made with an algorithm suited to the data set, and the forecast is then visualised with the help of Python and libraries such as pandas and matplotlib.

Figure 10: The workflow of the site development

6.2 Different kinds of machine learning methods

There are three main families of machine learning methods, as seen in Figure 11: supervised, unsupervised and reinforcement learning. Supervised learning is task driven: a specific kind of input gives a specific kind of output, and the typical algorithm types are regression and classification. Unsupervised methods only have inputs and are data driven; clustering algorithms belong to this family. Reinforcement learning uses algorithms that learn to react to an environment directly. In order to proceed with the self-learning analysis, a match had to be made between the dataset and a specific kind of machine learning method.

6.3 Analysis software

To find what is important and interesting in a big data set, the data has to be analyzed using technical features developed especially for big data. As the group did not have much experience in data mining, we looked for software with the right features that could help us with much of the analysis. The group settled on Rapidminer for this task.

Figure 11: Different learning algorithms [18]

6.3.1 Rapidminer

Rapidminer is a data science application, created by the company Rapidminer, that provides an integrated environment for data preparation, machine learning, deep learning and predictive analysis.[16] Rapidminer was used to find possible correlations within our dataset and to find which data mining and machine learning methods gave the best predictions on our dataset. Some variables are more important than others when forecasting sales with machine learning. Figures 12 and 13 show examples of analyses of the data where the most important factors for correlation and prediction are displayed. From the figures, the conclusion can be drawn that quarter, day of the week and sales region have an impact on positive and negative sales; the sales data therefore contains seasonality, trends and noise. This conclusion was then used when choosing a forecasting method. There are many kinds of machine learning algorithms, and Rapidminer suggests the most suitable ones for a given dataset. The methods the program tried are mainly used for regression: Random Forest, Gradient Boosted Trees and Support Vector Machines were analysed.[10]

The correlation differs between methods. Since the team had limited knowledge of how to program the different kinds of regression models, the team decided to focus on a model that was easy enough to program, fulfilled our need for handling seasonality, trends and noise, but was still advanced enough to give a good result. The decision fell on the ARIMA model.

Figure 12: Important factors for prediction using support vector machine

Figure 13: Important factors that support prediction using Random Forest trees

6.4 Forecasting models

When we had developed a decent understanding of which aspects might have an impact on the prediction, we started to search for forecasting methods. One that fitted well was the ARIMA model. ARIMA stands for autoregressive integrated moving average and is a model used for mathematical forecasting on time series data. ARIMA is a generalization of the ARMA model that adds support for non-stationary data. Non-stationary means that the joint probability distribution of the data changes over time, and consequently that the mean and variance may change as well.[6] Conversely, a stationary process does not depend on where it starts or on how many steps along the time axis it has taken: the distribution and probability of all following steps are the same as if no steps had been taken. The simplest example of a stationary process is white noise. No company's sales figures behave as white noise, as they often depend directly on earlier sales and seasonal effects.[9]
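The effect of differencing, the step that turns ARMA into ARIMA, can be illustrated on a small synthetic series; the numbers below are invented for the example and are not Seco data:

```python
import numpy as np
import pandas as pd

# Synthetic monthly sales with an upward trend and a yearly seasonal
# pattern (illustrative values only).
rng = pd.date_range("2016-01-01", periods=40, freq="MS")
trend = np.linspace(100, 180, 40)
season = 10 * np.sin(2 * np.pi * rng.month / 12)
sales = pd.Series(trend + season, index=rng)

# First-order differencing (the "I" in ARIMA) removes the linear trend:
# the differenced series fluctuates around a roughly constant mean.
diff = sales.diff().dropna()

# An additional seasonal difference with lag 12 removes the yearly
# pattern as well, leaving a series much closer to stationary.
seasonal_diff = sales.diff().diff(12).dropna()

print(diff.std(), sales.std())
```

The differenced series has a much smaller standard deviation than the raw series, which is the practical sign that the trend has been removed.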

In the ARIMA model, the AR part captures the effect of prior values on the current value being forecast, the I (integrated) part indicates that the data values have been replaced by the differences between consecutive values, and the MA part means that the regression error is a linear combination of errors that occurred in the past. When there is seasonality in the data, the ARIMA model can be extended further to account for it. The model is then called seasonal ARIMA, generally written SARIMA(p, d, q)x(P, D, Q)s, and takes seasonality, trends and noise into account. Here p and q are the non-seasonal AR and MA orders: p is the number of lagged values included in the model and q the number of lagged error terms, while d is the integration order, i.e. how many times the series is differenced. P, D and Q are the corresponding seasonal AR, differencing and MA orders, and s is the length of the season.[9]

Figure 14: How the ARIMA model was fitted in views.py

The SARIMA model in views.py is used as in Figure 14, through the Python class statsmodels.tsa.statespace.sarimax.SARIMAX, which is represented as in Equation 1. A(t) is the deterministic trend polynomial and y_t the observed time series process. s is the periodicity, also known as the span of the seasonality. θ and φ are parameters of the model.[9]

φ_p(L) φ̃_P(L^s) Δ^d Δ_s^D y_t = A(t) + θ_q(L) θ̃_Q(L^s) ζ_t        (1)

Inspiration for the forecast method and Python code is taken from Towards Data Science: An End-to-End Project on Time Series Analysis and Forecasting with Python.[12] In order to use the ARIMA model as the analysis method, the model must first be given the parameters that yield the best possible forecast performance. To do this, the data is first investigated with SARIMAX, which accounts for seasonality, trend and noise in the data. The parameters giving the lowest AIC value, a measure of how well the model is estimated to predict the future [21], are considered the optimal choice for the ARIMA model. For our model, the parameters in Equation 2 yield the lowest value for order and seasonal order, which is reflected in the final forecasting script.

SARIMAX(1, 1, 1)x(1, 1, 0, 12)        (2)

Before implementing the final forecast model, the method was also validated by calculating the root mean squared error (RMSE) and plotting the data as a one-step ahead forecast.[29] Figure 15 shows an example of the average daily sales for each month of indexable milling and ISO turning to France and Germany. The dark grey part of the forecast is a 95 percent confidence interval. The group considers the forecast convincing enough to use as a model, which is also confirmed by the low RMSE value. The forecast method has several parameters chosen by the developers, for example how values are grouped into months, the date from which the forecast should start, and the number of months to forecast.

Figure 15: One-step ahead Forecast to validate our model.

6.5 Analysing correlation to external data

On the web page, data from Seco Tools is plotted and correlated with external data. The external data is collected from various sites, sorted and converted into separate Excel files that can be read by a general Python function called generatePic in views.py. Seco Analytics initially provides external data for the Airbus share price, the jet fuel price in the U.S., the crude oil price in the EU, the Global GDP Index, the Bitcoin price and the avocado price; the overall idea is visualized in Figure 16. The variety of external data makes it possible to investigate the relevance of, for example, share prices compared to avocado prices. As the data consists of metric variables (on which calculations are meaningful), the correlation can be calculated with the Pearson correlation, which pairwise computes the correlation between the selected Seco sales data and the external data for each month, excluding all null values.[1] The Pearson correlation returns a number between -1 and 1, which tells whether the variables are negatively or positively linearly related.[28] If the correlation is close to 0, there is no linear relation, although a non-linear relation between the variables could still exist. In this project, the function .corr() from pandas.DataFrame is used to automatically calculate the Pearson correlation.

Figure 16: Overview of the idea of external data.
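A minimal example of the correlation step, with made-up numbers standing in for one month-by-month selection of sales and one external series:

```python
import pandas as pd

# Two illustrative monthly series: internal sales and one external
# indicator (the values are invented for the example).
df = pd.DataFrame({
    "sales":    [120, 135, 150, 160, 158, 170],
    "jet_fuel": [1.8, 1.9, 2.1, 2.2, 2.1, 2.4],
})

# Series.corr() computes the Pearson correlation by default,
# excluding null values, as described above.
corr = df["sales"].corr(df["jet_fuel"])
print(round(corr, 2))
```

A value near 1 here means the external series moves almost linearly with the sales series; values near 0 would mean no linear relation.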

6.6 Selection of data

The data provided by Seco is gathered in an Excel file with roughly 600 000 distinct rows. Some parts of this data are sorted out to get a more credible forecast. Table 3 shows which SQL conditions in views.py sort out which data, and why.

Table 3: Data chosen to be excluded from the dataset.

Order_SEK > 0: A lot of orders have been returned and are therefore minus posts, which are not included in the forecasts.

Order_SEK < 10000000: Two orders are over 10 million SEK, which have a negative impact on the reliability of the correlation and forecast. These are sorted out from our data.

Order_Date < '2019-03-01': Only orders that were already placed at the time we got access to the data are selected. Thus, all future orders were sorted out of the data.
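The exclusion rules in Table 3 can equally be expressed as a pandas filter; the rows below are invented examples of the four cases (a return, a normal order, an outlier and a future order):

```python
import pandas as pd

# Illustrative order rows mimicking the structure of the Seco data.
orders = pd.DataFrame({
    "Order_SEK": [-500, 1200, 15_000_000, 800],
    "Order_Date": pd.to_datetime(
        ["2018-05-01", "2018-06-01", "2018-07-01", "2019-04-01"]),
})

# The same three exclusion rules as in Table 3: drop returns (minus
# posts), the outlier orders above 10 MSEK, and future orders.
cleaned = orders[
    (orders["Order_SEK"] > 0)
    & (orders["Order_SEK"] < 10_000_000)
    & (orders["Order_Date"] < "2019-03-01")
]
print(len(cleaned))
```

Only the normal order survives all three conditions, just as only the cleaned rows feed the forecast.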

7 Database design

Using an Excel file with sales data to the aerospace industry, provided by Seco Tools, the group tried to find which parts of the data were interesting, so that the database could be simplified to a scale better suited for the project. The original 18 columns were simplified into the ER diagram in Figure 17, and the relational model in Figure 18 was also produced. The thought behind simplifying the model and database was that, given the large amount of data, it would speed up the calculations and make it easier to avoid errors in the data handling. First a small database containing 10 000 rows was implemented for testing, and then the entire database was implemented as an SQLite database using SQLiteStudio.

Figure 17: Descriptions of the different kinds of attributes, entities and keys of the database created in
Django Models.

Figure 18: Database design of the database created in Django Models.

7.1 SQLite

SQLite is a relational database management system that is popular in application software such as web browsers. It is arguably the most widely used database engine in the world, and is well suited for websites and data analysis. The biggest limitations of SQLite compared to client/server databases are the size of the dataset, which is limited to 140 terabytes, and the lack of concurrency: only one person at a time can write to the database, while the number of readers is unlimited. In addition, there is no way to write to the database without being an admin. As the odds of a few admins changing the database concurrently are low, SQLite works sufficiently well. Reaching 140 terabytes of text data would correspond to roughly 10^13 rows with small amounts of data per row, and adding more data per row would not reduce the number of possible rows so drastically that it would impact this project, as the group was given fewer than 10^6 rows.[24] SQLite was used by the group because it is well suited for the tasks the project required and because the limitations of an SQLite database did not impact the project, nor will they if the project is expanded upon.

7.2 SQLiteStudio

In the program, two different databases were used: a smaller, static one for the web page, and one representing the live, updated database from the company. The smaller database was also used to create the queries against the larger database. This was done through Django Models, where each of the selectable variables was assigned from the models; this smaller database created in Django Models is what is shown in Figures 17 and 18. The attributes saved in the smaller database were used to search through the larger database and extract the complete data. Django Models create an SQLite3 database, meaning a relational database embedded in the end program. This suits the application, as the available attributes are then stored on the site itself and do not have to be fetched from a database located on another server. To convert the data from the Excel file into a database file, SQLiteStudio was used: a program that is easy to use and was a fast solution to our database problems. See the SQLite code in Appendix B.
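As a rough illustration of how attribute data of this shape is stored and queried, the sketch below builds a small in-memory SQLite table with Python's sqlite3 module. The column names are invented for the example and do not match the exact schema in Appendix B:

```python
import sqlite3

# In-memory stand-in for the SQLite file built with SQLiteStudio;
# the columns are illustrative, not the project's exact schema.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE orders (
        order_id     INTEGER PRIMARY KEY,
        order_date   TEXT,
        country      TEXT,
        product_area TEXT,
        order_sek    REAL
    )
""")
con.execute(
    "INSERT INTO orders VALUES (1, '2018-06-01', 'France', 'Milling', 1200.0)")

# The web app's country/product selections translate to parameterized
# queries of this form against the larger database.
rows = con.execute(
    "SELECT order_date, order_sek FROM orders "
    "WHERE country = ? AND product_area = ?",
    ("France", "Milling"),
).fetchall()
print(rows)
```

Passing the selections as `?` parameters rather than string concatenation is also what makes the queries safe from SQL injection, as discussed in Section 8.3.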

8 Technical solutions

8.1 Development software

When writing the project code, different frameworks must be used together for the front end and the backend, and it is easier to write code in programs designed for programming, so-called IDEs (Integrated Development Environments). The group decided to use Atom, which we had used in earlier courses, for the front-end programming (HTML, CSS etc.), and Spyder, a powerful IDE that makes it easy to run and test Python scripts. To share code within the group, we used GitHub in the beginning. However, GitHub caused many problems for our workflow, and when more time was spent on GitHub than on actual coding, the group decided to change its approach and email each other the code instead.

8.1.1 Github

GitHub is an online code sharing platform that enables multiple developers to work on the same code. It provides automatic merging of different versions of the code, reducing the risk of collisions between versions, and it keeps a back catalogue of old versions of committed code, enabling backtracking through the code to find bugs.[8] GitHub was used by the group as a medium to share and update the project code.

8.1.2 Django

Django is a framework used for high-level web page solutions. It is used for its scalability, but also for its security-oriented environment. Django also enables quick and easy web application development, as it encourages principles such as DRY (don't repeat yourself).[3] Django was used mainly for the project's backend solutions and for the login page with its SQLite database. This database enabled easy and secure handling of users and gave a way to "index" the database, as certain key elements from our data could be loaded and used when collecting information from the complete data set. The project was developed in Django 2.1.7, as that was the latest stable release available when the project began.

8.1.3 Spyder

Spyder is an IDE made for scientific coding that is written in Python and supports coding in Python.[5] Spyder is included in the Anaconda environment, a distribution for data science and machine learning that simplifies software maintenance by handling package installation, so the developer does not have to install every package independently. Spyder was used within the Anaconda environment for the scientific computing in the project, such as the computations for the ARIMA model forecasts. It was chosen for its stable environment with static code analysis and debugging capabilities.

8.1.4 Atom

Atom is an IDE developed by GitHub for development in HTML, JavaScript, CSS and other web development languages, with multiple built-in features for sharing code on GitHub.[2] It also supports development in Django, and the group decided to use it because it provides a clean workspace that multiple members of the group were familiar with from earlier projects.

8.2 Programming languages

8.2.1 Python

Python is a high-level programming language with a small core and a large standard library. Python is widely used for web development and scientific computing and is one of the fastest growing programming languages in the world.[15] The project was developed in Python 3.6, as that was the latest version that all of our desired packages supported fully. Almost all of the backend was written in Python, including the Django framework and the analysis methods.

8.2.2 HTML

HTML (HyperText Markup Language) is a markup language used primarily for website development. HTML allows the user to create different elements, such as buttons, hyperlinks and headlines, and give them attributes.[20] In this project, the entire front end is based on HTML, supported by CSS and JavaScript.

8.2.3 CSS

CSS (Cascading Style Sheets) is a style sheet language used to structure websites and give them design attributes. It works on an HTML document with a number of defined elements: the CSS gives them positions on the site and, for example, a specified size and colors.[19] The group used CSS for almost all of the design, including simple effects such as changing the color of certain buttons on hover.

8.2.4 JavaScript

JavaScript is a script-based programming language whose main purpose is to provide websites with scripts. These scripts can be activated by a button or upon loading the page, and can be divided into functions that each perform a certain operation,[13] such as reloading the page, changing the style or text of an HTML element, or saving values from inputs. jQuery is an extension of JavaScript that adds a number of features and improvements. One such feature used in this project is AJAX, which enables us to easily send information from the front end to the backend and get a response back. Here it was used to send selections from the web page to the backend for analysis.

8.3 Security

Early in the project a security assessment was made, and the security specifications of the system were written down:

To ensure the system's security, it shall not be changeable through the web page by normal users. When admins implement an update, all changes should be ACID compliant, to avoid loss of transactions and provide a reliable database. For us, the Atomicity, Consistency and Durability characteristics are the most important, as the database is not supposed to be changed by many different transactions simultaneously, but rather by one admin who updates it.

The system shall not be susceptible to common security breaches, such as SQL injection or brute-force password attacks. To achieve this we will build a well-structured database, use hashed and salted passwords and (if time allows) enforce more complex passwords. The system shall also be ACID compliant. The system should be able to make predictions on a large amount of data with reasonably short computation time. To achieve this we must be able to store what data is important in relation to other data, and be able to sort and store specific data so that all predictions run smoothly.

The group started producing a backend in Django and realized that most of these features would be handled, to some degree, by the built-in security features in Django. The built-in security features that follow with the recommended backend setup are:

• XSS (cross-site scripting) protection: prevents users from inserting malicious client-side scripts into the browsers of other users.

• CSRF (cross-site request forgery) protection: prevents users from making POSTs with another user's credentials without that user's knowledge or consent, by executing a check on every POST argument.

• SQL injection protection: prevents SQL injection by using query parameterization.[4]

Out of these security features, the most important ones for the system are the protection against SQL injection and cross-site scripting. These are more important than cross-site request forgery protection, as the system does not allow users to post to the site. SQL injection protection is important because the data stored in the database is sensitive for the forecasts and the company, and unauthorised users being able to alter the data would undercut the whole system. As the team wrote parameterized SQL, the system was deemed safe enough from SQL injection, but this protection could be developed further in future work on the website. Regarding logins and access to other users' passwords, Django stores hashed and salted passwords in the database to protect against brute-force attacks on it. The algorithm used in this project to hash passwords was PBKDF2, a well-established password hashing algorithm,[11] which the group therefore saw as a suitable password protection solution.
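The core of PBKDF2 hashing can be sketched with Python's standard library. This is a simplified illustration of the scheme, not Django's actual implementation (Django additionally stores the algorithm name and iteration count alongside the salt and digest):

```python
import hashlib
import os

# Hash a password with PBKDF2-HMAC-SHA256 and a random salt.
# The password and iteration count are illustrative values.
password = b"correct horse battery staple"
salt = os.urandom(16)
digest = hashlib.pbkdf2_hmac("sha256", password, salt, 260_000)

# The same password and salt always yield the same digest, so a login
# attempt can be verified without ever storing the plain-text password.
check = hashlib.pbkdf2_hmac("sha256", password, salt, 260_000)
print(digest == check)
```

The high iteration count is what makes brute-force attacks expensive, and the per-user salt prevents precomputed (rainbow table) attacks.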

9 Evaluation

9.1 Test plan

When producing the test plan, focus was on three main parts: simplicity, usability and effectiveness. Simplicity concerned what level of knowledge is needed to understand how to carry out a given task, and which words might be hard to understand; to get accurate results, multiple people with different educational backgrounds were selected for testing. Usability was tested to see whether the test subjects could achieve what they wanted and whether the results presented a satisfactory amount of data. The last thing checked was effectiveness: whether a test subject could perform their task quickly enough for the tool to seem viable.

9.2 Test procedure

Before the tests were carried out, all participating test persons had to read through a consent form containing information on what type of test would be performed and stating that the test person in question could withdraw at any time. All test participants were guaranteed anonymity. The consent form can be seen in Appendix F.1.

9.2.1 Test manuscript

In order to test the product and obtain credible results, a usability test manuscript had to be created. To ensure that the same text was read to all the test persons, even if different members of the team conducted the test, a standard set of task forms was printed, fulfilling our need for objectivity. This means that the only thing that differs between sessions is the test person's individual performance. The manuscript is in Appendix F.2.

9.2.2 Results

Test persons were selected based on previous educational background. Seco Analytics is mainly meant for people with experience in the business, but could also be developed for a broader target group both within and outside Seco. Therefore, a diverse group of test persons was sought. The final group consisted of one person with previous knowledge of the business, three engineering students and one person without any educational background, reflecting the intended target group. During testing, one team member sat beside the test person to give support if needed, and one team member took notes of all interesting reactions or behaviour when interacting with the web page.

The tests proceeded as planned and yielded a lot of information. Table 4 shows a summary of the reported reactions.

9.3 Conclusion

The results gathered from the usability test generated a good number of ways to improve the platform towards a more user-friendly application. Some issues were fixed directly after the usability test took place, such as:

• The user can only choose predictions from 3 to 24 months; any number below 3 gives a forecast for 3 months and any number above 24 gives a forecast for 24 months.

• The grey picture on the forecasting page was changed to an image of a blank graph.

• Some unclear wording was replaced with more explanatory text.

Student 1: Was confused by the positioning of the selection of countries etc., as his eyes were on the graph part of the page. Thought the summary was the selection instead. Some expressions caused confusion, and the first picture presented caused confusion as its text was not related to the site.

Student 2: Thought the overall design was old school and dull. Found it weird that external data had to be assessed for all forecasts. Was slightly confused by the choice of words on the site.

Student 3: Uncertain about the fact that one external comparison is always done. Did not find the time to predict during the tests and found the choice between SEK and quantity odd.

Person with no educational background: Got confused by the format for "number of months"; tried, for example, to fill in 03. Was also unsure about several abbreviations, such as QTY, and in-house terms for products.

Seco worker: Everything is clean, great design and nice pictures. The grey picture on the forecasting page is not satisfying. Noted some linguistic problems and misconceptions. Thinks that the platform should have more correlations and that the avocado price is unnecessary.

Table 4: Reported reactions from test users

Because of time limitations, some things could not be fixed, as they require a larger amount of time to get working. More of the possible improvements are accounted for in the "Hand over" section, 10.4.

10 Discussion

10.1 Process over time

A major threshold in developing Seco Analytics was that the team members had no previous experience programming with Django, which took some time to set up and learn. The work therefore went much faster once the team got into the Django framework, especially as the team always had a clear plan for the next step in the development. Another threshold was the database, as almost all functions in the project depend on its existence. There was uncertainty about what kind of database would fit the project best, so it took some time to arrive at a solution: models built in Django, combined with SQLite, which we had previous experience of using. Otherwise, the development of the web page went smoothly and the whole front end and backend could be finalized in a few weeks. A further threshold was choosing the analysis method; luckily, we found suitable methods quite early in the process that helped us achieve the forecast results we wanted.

10.2 Lessons learned

If we were to do this project again, there are a few things we would try to do differently, although we are very content with the product we have developed. For example, we would try to set up a list of the functions the final product should strive to have at an early stage, even though this was hard in our case, as we had no information about the data we were about to receive. We would also spend more time on a mockup at an earlier stage, or at least settle early on the graphical style we were looking for in buttons and backgrounds. Another lesson learned is to hold dedicated meetings for discussing relevant problems; the most important feature of these meetings should be that everyone is active and not working on other things, so that the task at hand can be solved together.

10.3 Further development

There are plenty of ways to improve the forecasting platform, and as the development progressed, more and more ideas for development were thought of by the team. Given the limited time frame, focus was on making the platform operational.

In the backend, the company needs to decide how to structure the total database into different segments. From there, the "live" database must be linked to the Python part, from which the analysis tool retrieves data.

As for the analysis part, the company needs a deeper understanding of how Python works, and a general idea of the choice of analysis model must be established in order to efficiently get an appropriate machine learning analysis. Given more time, implementations such as the following could have been performed.

• Being able to forecast product rejections, to give better insight into product lifetimes and reduce waste, and being able to do this for different departments within Seco.

• Forecasting the flow of production.

• Interactive figures where you can zoom in and out and see more detailed statistics directly in the graph.

• Not forcing users to run both a forecast and a correlation with external data, but giving the user more flexibility in what is done with the site.

• Adding more market segments to the site, which means adding more databases or restructuring the way the current databases work.

• Being able to add more specific product management areas, to make the analysis of the product more dynamic and representative of reality.

• Making the design of the site more attractive through better-looking buttons, a different background color and other graphical changes.

• Being able to choose and compare different forecasting algorithms for the data, and getting the comparisons in a comprehensible manner. This could be achieved by adding more APIs that perform the calculations, or by writing the calculation scripts from scratch.

• An easier way to implement correlation with external data, for example by gathering all external data in one file and letting the script run over it, without having to adapt the data to the extent needed in the current version of the webpage.
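As a sketch of the last bullet, all external series could be gathered in one table and handed to pandas in a single call (the column names and numbers below are purely illustrative):

```python
import pandas as pd

# Illustrative stand-in for a gathered file; in practice this could be
# pd.read_csv("external_data.csv") with one column per external series.
data = pd.DataFrame({
    "sales": [100, 120, 130, 125, 150, 160],
    "gdp":   [1.0, 1.1, 1.2, 1.2, 1.3, 1.4],
    "steel": [50, 48, 55, 60, 58, 64],
})

# Pearson correlation of sales against every external column at once,
# instead of adapting each external dataset separately.
correlations = data.corr(method="pearson")["sales"].drop("sales")
```

With this layout, adding a new external series is just adding a column to the gathered file; no script changes are needed.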

10.4 Hand over

For the project to be continued, developers must be somewhat acquainted with a few different programming languages and with frameworks within these languages. To be able to continue the work on the visual components of the website, knowledge of classic web-development languages such as HTML, JavaScript and CSS is required; the front-end components of the site are built in these languages in a traditional manner. For the composition of the site, Django was used to create the view structure and most of the backend components.

The backend is built almost exclusively in Python. This includes the Django framework, which is entirely Python-based. The analysis scripts found in the views section are also written in Python and use several Python libraries, for example matplotlib, a plotting library with a MATLAB-like interface, and numpy, which provides many mathematical functions and multidimensional array types. Because these components are written in Python, a good understanding of the language is important for continued development. The database is written in SQL, but it is fairly basic, with only a few tables and no foreign keys, so basic SQL knowledge is enough.

The frontend is written in HTML, CSS and JavaScript. These three languages complement each other with different functions, and if one part of the site is to be changed there may be a need to change code in all three. It is therefore important to have a decent understanding of all of them in order to change the frontend part of the website. The JavaScript also includes some jQuery, mostly used to send information between the front end and the backend. This means that if the view scripts remain the same, no jQuery knowledge is required; but if the view scripts change to require more parameters, or change entirely, the jQuery has to be updated, which requires some knowledge of how to do so.

This project could be developed by a skilled web developer and a skilled data scientist/statistician.

Skills required to continue work on the site are:

• Knowledge of Python, both as a tool for web development through Django and for the analysis scripts.

• Basic knowledge of the web-development languages HTML, JavaScript and CSS.

• Mathematical knowledge of different forecasting methods and the differences between them.

• Basic knowledge of how to plot with matplotlib.

• Basic SQL queries to fetch data from the database.

• Understanding the differences between CSV files and Excel files, as well as how to convert between them. It is also good to know how to extract rows and elements from an Excel file with Python code.

• Required libraries: Django, Django import-export, numpy, pandas, matplotlib, statsmodels, sqlite3, pylab and sqlparse.
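As a small illustration of the file-handling bullet, rows and elements can be extracted with pandas. The file content below is made up; for a real Excel file the analogous call is pd.read_excel, which needs an engine such as openpyxl.

```python
import io
import pandas as pd

# Made-up stand-in for an exported sales file; a real file would be read
# with pd.read_csv("aero.csv", sep=";") or pd.read_excel("aero.xlsx").
raw = io.StringIO(
    "Product_No;Customer_Country;Order_Qty\n"
    "1001;India;25\n"
    "1002;France;40\n"
    "1003;India;10\n"
)
df = pd.read_csv(raw, sep=";")

# Extract all rows matching a condition, and one single element.
india_rows = df[df["Customer_Country"] == "India"]
first_qty = df.loc[0, "Order_Qty"]
```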


Appendices

A Contributions

A.1 Albin

In the project, Albin was preliminarily responsible for security, databases and the report writing. This would have meant quite a large part of the project and a fun challenge. The parts Albin actually did only partially match these preliminary responsibilities, as the group partly abandoned them and started working more dynamically. This meant that Albin's focus shifted from the back end to mainly the front end: designing the visuals of the site and spending time making it somewhat dynamic with respect to screen size and view resolution. This came with some problems, as the goal of the project changed slightly along the way in ways that impacted the design aspects of the website. The group also had a policy that everyone should at least have insight into the work the other members were doing, so Albin kept tabs on everything that was done within the group and tried to contribute wherever he could. In the report, which was Albin's area of responsibility, he wrote much of the miscellaneous material that was not any one member's specific area of responsibility. He also wrote about the parts he had been assigned responsibility for at the beginning of the project: security and the database.

The biggest learning outcome for Albin was the importance of planning in a project of this scale. He also learned the importance of making preliminary mockups and system requirements as ambitious and well thought out as possible before any "real" programming is done. Albin gained a lot of insight into project management and into planning the workflow the project was supposed to follow. He has also realized that he needs more time than other team members to get comfortable with coding frameworks such as Django, and that it takes him time to get going with code he is not completely comfortable in. This meant that Albin personally would have needed more time to get the project started, and as the team worked faster than he did, he reverted to tasks he was more comfortable with from earlier projects. Despite this, Albin learned a lot about the Django framework and gained multiple insights into what goes into the back end of a website. This includes, but is not limited to, the security aspects, the division between different users of the site, and the code and file structuring of its different components. He also further developed his understanding of the front-end languages HTML and CSS and of what can be produced with the two.

The biggest challenges for Albin in this project fall into two categories: collaboration in the group, and programming. Programming in new frameworks such as Django takes a while to get used to, and that was difficult for Albin. The fact that the project was within an area in which Albin had no experience (market predictions) was a challenge as well. The collaboration in the group was a challenge because all members had their own vision of what the project was supposed to be, so the group had to discuss many things to find a path that most of the group was comfortable with. This forced Albin to realize that he could not always get his way and to be more humble, as the results were better than he thought they would be. Since collaboration is the biggest part of a project of this caliber and size, he will take these insights with him into future projects.

A.2 Gustav

At the beginning of the project, Gustav was primarily engaged in the communication with the company and the analysis of the data provided. Since the group decided to start early, before Christmas, a lot of time was invested in the design part and in strategic design questions together with the company. Gustav spent some time at the market department in order to gain some business understanding and to find out how the organisation works with forecasting.

Together with Lotta, Gustav invested a lot of effort in understanding and analysing the market data, mainly using tools like WEKA and RapidMiner to understand how to proceed with the selection of a machine-learning algorithm. The first few weeks were mostly spent reading about data science in general and forecasting in particular, and a lot of effort was put into discussing how best to proceed with the project. Within the group, an agile working strategy made it possible to solve the analytics problems that came up along the way. Together with Lotta, Gustav contributed to the deliverable of the machine-learning algorithm, creating a Python program that could read data from the database and export it to the webpage. At least three different kinds of machine-learning algorithms were tried for forecasting, mainly network and classification methods, and after some discussion the group agreed to proceed with the ARIMA method, since the other algorithms tested by Gustav took a long time to process all the data and had some configuration problems. Gustav also developed the SQLite3 database from which the market data was fetched, using SQLiteStudio. The market data was given to the group as an Excel sheet, and in order to get a fast webpage he built a database with the sales data from the aerospace branch. A lot of effort was put into importing and exporting the data using Django. Gustav programmed the Django code on the analysis page, where the scroll bars list the products and countries from the database. In the thesis, Gustav focused mainly on the usability testing, the socio-technical aspects, the platform, how Seco works today, the analysis and the SWOT analysis. Most of the figures were made by Gustav, and he contributed to other sections and to overall refinement as well. Gustav also made most of the presentations, together with the slides presented at the company and the university.

This is the first time I have worked in a big development project of this size, and overall it was very interesting. It has been a new experience for me to focus on a product developed by a group of individuals. Since I have some experience with the products themselves, regarding how they are made at different production sites and in R&D projects, it has been a blast to understand the market and to forecast market trends. It has been fun to take an idea given to us, mainly with an open agenda, and from there deliver a product that right from the start will generate value for the company and its customers. One of the challenges has been to cooperate with others and try to interpret the client's needs. I think this has been a very valuable experience and a great lesson in listening to the needs of an organisation and of the project team, and in deciding from there how to proceed when problems arise.

A.3 Lotta

Lotta's role in the project was initially to take care of the analysis and visualization of the forecasts, as she has some previous knowledge of statistical machine learning. She would therefore have more background information and be able to familiarize herself with the data effectively to find a suitable analysis. Lotta spent the first few weeks together with Gustav analyzing data using RapidMiner and WEKA. Lotta also found a suitable analysis and forecasting method that could be written in Python and adapted to the type of data available. Since the team chose to work with Python, the chosen forecasting script uses Matplotlib, and Lotta is comfortable working with both, this also became her area of responsibility.

Lotta wrote and adapted the forecast script to the data and linked the script to the SQLite database. She also took great responsibility for the backend and the database, and partly for project management, to drive the project forward. More specifically, Lotta wrote models.py and loaded the database by writing the script csvToDB.py, which reads a CSV file with the data provided. In the end, models.py is not used very much, but it can still be useful for those who want to develop the product in the future.

Lotta wrote a script that generates subplots of Seco's data together with external data and finally calculates the correlation between the two. This includes a script that reads from the Excel files in which the external data is gathered. At the end of the project, Lotta was also responsible for debugging, for renaming variables to more suitable names, and for writing comments and file descriptions. She has been an active participant in how the product should be designed, both in terms of which functions it should have, its graphic appearance and its ease of use. She has also taken a big role in structuring the work, by setting goals and discussing the status of the project against the timetable.

Above all, Lotta has learned how to use Django to set up a webpage. It took longer to learn Django than she expected, especially the models part, but she really enjoyed it later, when the workflow made sense. Lotta has also become a more confident Python programmer and would not hesitate to use Python to solve future programming problems. She learnt a lot about handling data and choosing methods for analysis, which she is most happy about, as it is something she hopes to do more of in the future.

The next time Lotta does something similar, it will take much less time and she will be able to work more effectively. As a full-time project over three months, the experience has of course made her a better team member and better at handling conflicts within the team. For Lotta, the most exciting part of this project has been to try something similar to consultancy work for a company. It has been interesting to see what differs between the university's and the company's perspectives on our project.

A.4 Samuel

Samuel's role was project manager and front-end developer. This meant that Samuel was responsible for tasks such as keeping the time schedule and making sure everyone knew roughly what they needed to do. Furthermore, the project-manager role meant that Samuel had to have an understanding of what everyone was doing, so that the front end and back end would be synchronized. A lot of time was therefore spent studying every field before most of the programming began.

As a front-end developer, Samuel was responsible for all the design and all the front-end programming, which includes HTML, CSS and JavaScript. Regarding the design, everyone made contributions, and Samuel then tried to merge all the elements the group wanted into a mock-up. This mock-up was not a final product, but it matches the final version fairly well; the changes primarily came from the company wanting new functions added to the program.

For the HTML, Samuel did the majority of the work with some help from the rest of the group, and the same goes for the JavaScript, where almost everything was done by Samuel. The jQuery script used to connect the front end and backend was made by Samuel and Lotta together. The CSS was made in collaboration with Albin: Samuel made the basic structure of the site, while Albin was responsible for scaling and positioning.

The biggest lesson Samuel learned during this project was how to structure a work plan for a start-up project. Among other things, this includes how much time it really takes to do certain things, and how important it is to have a design and a prototype before you start coding in earnest. Another great learning experience for Samuel was how to use Django and everything that came with it: the main framework, the model system in Django, and how to connect the front end and back end of a webpage in a much more advanced manner. The possibilities of HTML and JavaScript also opened up during the project and inspired Samuel to continue exploring them in the future. Working with a company and receiving its demands also gave Samuel a great deal of work experience.

One challenge Samuel experienced was figuring out exactly what the company wanted and how to implement it. Another was planning the time for the different parts of the project so that one part did not get too far ahead of the rest. Since the group was dynamic and everyone was able to perform most tasks, this was mostly solved, but there were a few days when everyone basically had to wait for one task to be completed before the rest could proceed.

B SQL database using SQLiteStudios for the analysis

The database was made in a program named "SQLiteStudio". It is named SecoDB and is saved in the same folder as the analysing program (Views). A table named Sales_Data was created with 18 columns using the following command:

CREATE TABLE Sales_Data (
    Product_No BIGINT,
    Product_Desc CHAR (50),
    Product_Line CHAR (50),
    Product_Manage_area CHAR (50),
    Product_Management_Area_2 CHAR (50),
    Grade CHAR (50),
    Sales_Region CHAR (50),
    Company CHAR (50),
    Global_Key_Account CHAR (50),
    Customer_Name CHAR (50),
    City CHAR (50),
    Customer_Group CHAR (50),
    Customer_Country CHAR (50),
    Salesman CHAR (50),
    Order_Date DATE,
    Order_Qty BIGINT,
    Order_SEK BIGINT,
    Order_Lines BIGINT
);

47
Figure 19: Columns in the database, representing the different kinds of information corresponding to different orders

The database is loaded with the following commands:

sqlite> .separator ; (because a CSV file is separated with ;)
sqlite> .cd c:
sqlite> .import aero.csv sales_data (our sales data is saved in aero.csv)
sqlite>

The data is now saved.

To read the data in our Python script we use the following commands:

import sqlite3

conn = sqlite3.connect('SecoDB.db')
cursor = conn.execute("SELECT * from Sales_Data LIMIT 1")  # as regular SQL code
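As a self-contained follow-up sketch of reading the fetched row (using an in-memory database with a reduced two-column table, since SecoDB.db itself is not distributed with this report):

```python
import sqlite3

# In-memory stand-in for SecoDB.db so the sketch runs on its own;
# only two of the real table's 18 columns are created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales_Data (Product_No BIGINT, Order_Qty BIGINT)")
conn.execute("INSERT INTO Sales_Data VALUES (1001, 25)")

cursor = conn.execute("SELECT * FROM Sales_Data LIMIT 1")
row = cursor.fetchone()  # rows come back as plain tuples
conn.close()
```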

C Master thesis description

The group made contact with Seco Tools in December regarding the possibility of doing a project for the company. The project description that was used as a basis for our project was initially meant as a master's project. After some negotiation with Seco Tools, an agreement was made on the scale of the project, and after our course supervisor approved trying the project, the group started working on understanding what actually needed to be done. Figure 20 shows the master's project description.

Figure 20: The description of the Master Project

D Design draft

Figure 21: Login page. Only users who have received a login ID from the admin can access the application

Figure 22: Project page. An overview of all saved projects for this specific user. The user can either choose "View" to return to an old project, or "Edit" to change the settings for the specific project.

Figure 23: The user can apply for a project, which the admin has to approve. Each project is described by information that the user fills in manually.

Figure 24: Main page inside a project. Here one can choose to forecast a trend based on the chosen variables, or view different plots with an associated summary of the predictions.

E Design

F Usability testing

F.1 Informed Consent Form

The following will provide you with information about this usability evaluation to help you decide whether or not you wish to participate. If you agree to participate, please be aware that you are free to withdraw at any point during the evaluation without any penalty.

In this study we will ask you to use the computer provided. All information you provide will remain confidential and will not be associated with your name. If for any reason during this study you do not feel comfortable, you may leave and still receive the agreed-upon compensation; no questions will be asked and your information will be discarded. Your participation in this study will require approximately 15 minutes. When the study is complete you will be free to ask any questions. If you have any further questions concerning this study, please feel free to contact us by phone or email:

Gustav.Kruse_c@secotools.com or (0723505887).

Please indicate with your signature in the space below that you understand your rights and agree to participate in the experiment.

Your participation is solicited, yet strictly voluntary. All information will be kept confidential and your
name will not be associated with any data or findings.

City and Date Signature

Print Name

F.2 Tasks

Task A

You are curious about how ISO-Turning sales are going in India, and want to know how they will develop 3 months ahead. Order a forecast on the webpage with quantity as the y-axis; any external data is OK. To mark that the task is done, tap the table.
Your task:
i: Order a forecast for the number of inserts in India for ISO-Turning.

-To start the task, put the card on the table.

-To mark that the task is done, tap the table.

Task B

You are curious about how sales are going in France for three products of your choice, and want to know how they correlate with GDP. Order a correlation on the webpage. To mark that the task is done, tap the table.
Your task:
i: Order a correlation between the sales of three products of your choice in France and GDP.

-To start the task, put the card on the table.

-To mark that the task is done, tap the table.

Task C

You want to know how sales are going for all products in all countries, and want a forecast for the coming year. Order a forecast on the webpage. To mark that the task is done, tap the table.
Your task:
i: Order a forecast of 12 months for all product areas and countries.

-To start the task, put the card on the table.
-To mark the task is done, tap the table.

G Project plan

Figure 25: Gantt chart containing the project time plan.

