
This final thesis has been carried out at the School of Engineering at Jönköping University within Computer Engineering. The authors are responsible for the presented opinions, conclusions and results.

Examiner: Johannes Schmidt


Supervisor: Garrit Schaap
Scope: 15 hp (first-cycle education)
Date: 2023-05-31

Acknowledgements
We wish to extend our most sincere gratitude for the invaluable guidance and support
provided by our supervisor, Garrit Schaap, throughout this study. Additionally, our
profound appreciation goes to Tobias Wahlström at Knowit, who has contributed
significantly to this research by generously dedicating his time to share his technical
expertise. Furthermore, we would like to extend our heartfelt thanks to all the test
participants who generously dedicated their time to assist in the collection of data in
this study.

Abstract
Serverless functions have emerged as a prominent paradigm in software deployment, providing automated resource scaling and, consequently, demand-based operational expenses. One of the most significant challenges associated with serverless functions is the cold start delay, preventing organisations with latency-critical web applications from adopting serverless technology.
Existing research on the cold start problem primarily focuses on mitigating the delay
by modifying and optimising serverless platform technologies. However, these
solutions have predominantly yielded modest reductions in time delay. Consequently,
the purpose of this study is to establish conditions and circumstances under which the
cold start issue can be addressed through the type of approach presented in this study.
Through a design science research methodology, a software artefact named Adaptive
Serverless Invocation Predictor (ASIP) was developed to mitigate the cold start issue
through monitoring web application user traffic in real time. Based on the user traffic, ASIP pre-initialises serverless functions that are likely to be invoked, in order to avoid cold start occurrences.
ASIP was tested against a realistic workload generated by test participants. Evaluation
of ASIP was performed through analysing the reduction in time delay achieved and
comparing this against existing cold start mitigation strategies. The results indicate
that predicting serverless function invocations based on real-time traffic analysis is a
viable approach, as a tangible reduction in response time was achieved.
In conclusion, the cold start mitigation strategy assessed and presented in this study may not provide a sufficiently significant mitigation effect relative to the required implementation effort and operational expenses. However, the study has generated valuable insights regarding circumstantial factors concerning cold start mitigation.
Consequently, this study provides a proof of concept for a more sophisticated version
of the mitigation strategy developed in this study, with greater potential to provide a
significant delay reduction without requiring substantial computational resources.

Keywords: serverless functions, cold start, provisioned concurrency, adaptive scaling, mitigation, predicting function invocations, monitoring real-time user navigation

Table of contents
Acknowledgements 2
Abstract 3
Table of contents 4
1 Introduction 6
1.1 PROBLEM STATEMENT 6
1.2 PURPOSE AND RESEARCH QUESTIONS 9
1.3 SCOPE AND LIMITATIONS 10
1.4 DISPOSITION 11
2 Method and implementation 13
2.1 DESIGNING THE ADAPTIVE SERVERLESS INVOCATION PREDICTOR 14
2.2 DATA COLLECTION 20
2.2.1 RECORDING USER TRAFFIC 23
2.2.2 GENERATING A NAVIGATION PROBABILITY MATRIX 28
2.2.3 GENERATING A FUNCTION-TRIGGER MAP 30
2.2.4 REPLAYING USER TRAFFIC FOR EVALUATION 30
2.2.5 COLLECTING PERFORMANCE METRICS 32
2.3 DATA ANALYSIS 34
2.4 VALIDITY AND RELIABILITY 37
2.5 CONSIDERATIONS 39
3 Theoretical framework 40
3.1 SERVERLESS COMPUTING 40
3.2 COLD START PHENOMENON 40
3.3 DOHERTY THRESHOLD 40
3.4 PROVISIONED CONCURRENCY 41
3.5 EXISTING LITERATURE ON COLD START MITIGATION STRATEGIES 42
3.6 PROVISIONED CONCURRENCY AS AN ALTERNATIVE APPROACH 42
3.7 LIMITATIONS OF PROVISIONED CONCURRENCY 42
4 Results 43
4.1 PRESENTATION OF COLLECTED DATA 43
4.2 DATA ANALYSIS 45
5 Discussion 49
5.1 RESULT DISCUSSION 49
5.2 METHOD DISCUSSION 54
6 Conclusions and further research 58
6.1 CONCLUSIONS 58
6.1.1 PRACTICAL IMPLICATIONS 58
6.1.2 SCIENTIFIC IMPLICATION 59
6.2 FURTHER RESEARCH 60

7 References 62
8 Appendices 65
APPENDIX A: INFORMATION PROVIDED TO STUDY PARTICIPANTS 66
APPENDIX B: PERFORMANCE METRICS FROM TRAFFIC REPLAYING 67
APPENDIX C: DIFFERENCE IN RESPONSE TIME BETWEEN FUNCTIONS 70
APPENDIX D: ADDITIONAL TRAFFIC REPLAYING TO ENSURE RELIABILITY 75

1 Introduction
This chapter provides background and motivation for the research area which the
study is addressing. Furthermore, the purpose and research questions are presented.
Finally, the scope and limitations are outlined in addition to the disposition for the
report.

The utilisation of serverless functions has emerged as a notable trend in the software
engineering industry. Serverless functions provide developers with the ability to write
code without concern for the configuration of underlying infrastructure. Typically,
serverless functions are endpoints within an Application Programming Interface
(API) that can receive Hypertext Transfer Protocol (HTTP) requests containing data
for processing. The serverless function code is abstracted from the operational
infrastructure and software logic is fractionated into multiple small functions. As a
result, serverless functions are well-suited for backends that support interactive
frontend single-page web applications (Ivan et al., 2019).

While the term serverless functions may imply an absence of a server, this is not entirely accurate. Instead, the serverless term denotes that there is no server
constantly running. A virtual server is allocated only when needed, and when the
demand subsides, the server is deallocated (Ivan et al., 2019). This characteristic
distinguishes serverless computing from traditional, constantly-running server
configurations.

Due to technical resource limitations, serverless functions may not be a viable option
in all scenarios. Serverless functions suffer from a problem commonly referred to as
cold start which describes a delay before the function invocation. The duration of the
delay can span multiple seconds, rendering a serverless architecture unviable for
latency-critical web applications. The delay is caused by the fact that serverless
functions lack continuous access to computational resources. This prompts the need
for a virtual container instance to load and initialise necessary software dependencies
before the serverless function can be invoked (Vahidinia et al., 2022).

This study proposes a novel solution to the cold start problem. The solution relies on
the possibility to predict serverless function invocations based on the current
distribution of users across a web application.

1.1 Problem statement


This section provides a detailed description of the research area subject to exploration
in this study. The section accounts for existing literature within the relevant field and
highlights the knowledge gap addressed by the study.

Software development has traditionally involved business logic as a core part of the
software, where the application has typically been architectured as a monolith.
However, as cloud-based services have increased in popularity, Function-as-a-Service
(FaaS) has emerged as a new paradigm where event-driven business logic is invoked
in an isolated virtual environment (Vahidinia et al., 2022).

Serverless function platforms are offered by multiple cloud service vendors such as
Amazon Web Services (AWS), Microsoft Azure and Google Cloud. Computational
resources are scaled in accordance with demand, allowing developers to concentrate
on software engineering while leaving resource scaling to the cloud provider
(Vahidinia et al., 2022).

Serverless functions are not without disadvantages; perhaps the most prevalent issue
related to serverless functions is the cold start problem. Cold start refers to the time it
takes to initialise a computational resource prior to the invocation of a serverless
function (Li et al., 2021).

Computational resources are allocated and maintained in memory based on the function invocation frequency. If a serverless function does not receive any requests
within a certain time duration, the computational resource instance and associated
memory are deallocated. This causes software with a serverless architecture to exhibit
a cold start delay when a request is sent to a serverless function that has not been
invoked within a certain time duration. This can significantly impact the user
experience of an interactive web application, as the delay affects the response time of
the function (Ivan et al., 2019).

Furthermore, when a serverless function receives multiple requests simultaneously, multiple computational resource instances are needed to distribute the load, as each
instance can only invoke one function at a time. This results in the recurrence of cold
start delays for each instance during initialization, impacting multiple users of a web
application (Ivan et al., 2019).

Within the User Experience (UX) domain, the Doherty threshold, introduced by
Doherty and Thadani (1982), stands as a prominent theory. This principle states that
productivity is dramatically increased when the waiting time between a user’s action
and the subsequent response is less than 400 milliseconds. Maintaining a delay below
the Doherty threshold is beneficial to user attention, and by extension, enhances the
overall user experience of an interactive system.

Pursuant to the Doherty threshold, minimising the occurrence of cold starts can
positively affect the user experience of a web application by decreasing the response
time. This would also result in improved productivity for web applications used for, e.g., administrative tasks. A cold start can typically result in a response time of approximately five seconds; however, delays in response time of up to 40 seconds have also been observed under some circumstances (Lin & Glikson, 2019). Therefore, it is valuable to explore possibilities to reduce the occurrence of cold starts.

A previously presented solution has merged two serverless functions together which
eliminates the cold start for one of the two functions. This solution contains
drawbacks related to efficiency since the merged functions run sequentially, causing
an increase in overall execution time (Lee et al., 2021).

Another approach presented by Vahidinia et al. (2022) is based on a learning algorithm that initiates computational resources based on usage patterns over time.
This implies that future demand for a particular serverless function can be predicted
based on historical data about previous demand. This approach has shown a reduction
in memory consumption by 12.73% and cold start delay by 22.65%.

In conclusion, existing research on reducing cold start occurrences in serverless functions predominantly relies on predictive algorithms analysing historical user
behaviour data (Vahidinia et al., 2022). While these models have shown some
success, they are inherently limited by the intrinsic complexity and unpredictability
of human behaviour.

The prediction of user behaviour is a challenging task due to the dynamic and
non-linear nature of human behaviour. Human actions are influenced by a wide range
of factors, from individual characteristics, preferences, and past experiences to
environmental factors and interactions with others (Ajzen, 1991). These complex
influences make it difficult to accurately predict future actions based on past
behaviour.

Furthermore, the use of historical data to predict future behaviour assumes that past
trends will continue into the future. However, this may not always be the case.
Changes in external circumstances, user needs, or application functionality can lead
to changes in user behaviour that are not reflected in historical data.

Another challenge is the temporal dimension of user behaviour. User behaviour can
vary significantly at different times of day, days of the week, or seasons of the year. A
model that does not take these temporal variations into account may overestimate or
underestimate demand at certain times, leading to unnecessary cold starts or
inefficient use of resources (Ajzen, 1991).

Therefore, this study introduces a novel approach to reducing the frequency of cold start occurrences in connection with interactive web applications. In contrast to previous research regarding cold starts, this study attempts to reduce cold start occurrences through predicting computational resource demand based on the number of users currently present at each page in a web application. Consequently, the primary difference between previous approaches and this study is that it is centred around predicting function invocations based on real-time user traffic, whereas previous approaches base their predictions on historical user traffic data.

1.2 Purpose and research questions


This section presents the purpose and the research questions defining the foundation
of the study. The purpose is defined with regards to the problem statement, and in
combination with the research questions, guides the subsequent choice of methods.

The purpose of this study is to establish under which circumstances it is possible to mitigate serverless cold starts through predicting future function invocations. Previous research has attempted various solutions, but none has completely addressed the issue, with the existing solutions providing only partial reductions in time delay that may not be noticeable to web application users.

An approach that has been researched involves performing application-level optimisation, where only the minimally necessary code is loaded. This solution has resulted in an average reduction in cold start latency of 28.78%, indicating that the cold start problem still persists (Wen et al., 2022).

A journal paper by Jegannathan et al. (2022) attempts to reduce cold starts through a
time series forecasting approach called the Seasonal Auto Regressive Integrated
Moving Average (SARIMA) model. The model is used to predict demand for
serverless functions and scale computational resources accordingly based on
historical data that is expected to repeat through an identifiable pattern.

An existing solution which may have come close to eliminating the cold start issue
comes from a study regarding the possibility to share computational resources. It is
claimed that many serverless functions require the same dependencies which makes it
possible to invoke different serverless functions in the same container. In the cases
where it is possible to apply this solution, the cold start delay may be reduced to 10
milliseconds, effectively eliminating the cold start problem. A drawback to this
approach is the fact that it is limited to contexts where different serverless functions
share the same dependencies, which is not always the case (Li et al., 2021).

Drawing on the problem statement, it is evident that the cold start issue remains a topic of active research. Furthermore, it is evident that none of the above-mentioned previous research has completely addressed the cold start issue. Consequently, the purpose of this study is to solve the issue through a novel approach: predicting serverless function invocations and pre-initialising necessary resources through monitoring real-time user traffic to an interactive serverless web application.

Unlike previous research, the solution presented in this study does not solely focus on
the technical implementation of serverless platform technology, but rather
incorporates real-time user monitoring as a means of mitigating cold starts. The study
aims to provide prescriptive conclusions regarding the selection of cold start
mitigation strategies based on relevant circumstantial factors and hence, the first
research question is:

[RQ1]: Under which circumstances can serverless function invocations be predicted based on real-time user distribution across a web application?

The focus of this study is the development and evaluation of a software artefact
called Adaptive Serverless Invocation Predictor (ASIP) that aims to reduce the cold
start for serverless functions. It is compared to mitigation strategies from existing
research and hence, the study’s second research question is:

[RQ2]: Which cold start mitigation strategy provides the most significant delay
reduction?

1.3 Scope and limitations


This section defines aspects and considerations entailed in the study, while also
describing aspects that have been excluded from the study.

The scope of this study is limited to investigating cold starts in connection with
interactive frontend web applications, as this is a context where cold starts may result
in noticeable delays for users. Furthermore, this study focuses on reducing cold start
occurrences in serverless platforms offered by cloud providers, and does not
investigate open-source serverless frameworks for on-premises operations.

In this research, the extent of function invocations is constrained by budgetary limitations, since large volumes of serverless function invocations incur monetary expenses. In line with this, the ability to experiment with various provisioned concurrency configurations is also restrained, as such explorations directly result in additional expenses. As a result, the provisioned concurrency value for any individual serverless function is never set above one.

Nevertheless, this constraint does not notably hinder the effectiveness of the proposed
software artefact, ASIP, which still achieves its intended purpose. However, it is worth
noting that with an increased allocation of resources for this research, further value
could be added to the study. This could be accomplished through more extensive
experimentation with different provisioned concurrency configurations, thereby

potentially unveiling additional insights into optimising serverless function
invocations. Thus, while the current study effectively achieves its aims within the
defined constraints, future work with augmented resources could further expand on
these findings.

In this study, a matrix describing navigation probability, shown in Figure 1, is constructed using data from Google Analytics. However, this matrix, once established, remains static and does not adapt or update in response to new user navigation data. To continuously update this matrix, a dedicated software algorithm would likely be necessary, one capable of computing the probabilities for function invocations based on a more detailed analysis of user navigation patterns. This potential enhancement, however, extends beyond the scope of the current study and is designated for future exploration.

The proposed software artefact, ASIP, is not intended to be applied to web
applications with simple page hierarchies as illustrated in Figure 1. Instead, ASIP is
intended to be implemented in connection with web applications consisting of
significantly more complex page hierarchies. Such web applications may provide the
possibility to determine that navigation between certain pages is more or less
probable, allowing for more accurate prediction of function invocations.

Overall, this study is subject to limitations in terms of the context, quantity of function
invocations, serverless platforms, and complexity of web applications, which should
be considered when interpreting the results and conclusions of the research.

1.4 Disposition
This section describes the structure of the report and summarises the content in the
respective chapters which follow.
2. Method and implementation.
This chapter describes the scientific methodology and approach used to
conduct the study. Furthermore, the chapter specifies the process through
which data has been collected and analysed to answer the research questions.
3. Theoretical framework.
This chapter presents theories relevant to the study based on existing literature
within the concerned research field. Terms and concepts used throughout the
report are introduced and defined.
4. Results.
The results chapter contains an analysis of data collected during the study.
Additionally, this chapter includes a presentation of the analysed data which
serves to answer the posed research questions.

5. Discussion.
In this chapter, discussions regarding the results of the study are presented.
Reflections are made in relation to previous research within the field.
6. Conclusions and further research.
This chapter presents conclusions drawn from the conducted study in
combination with remarks regarding potential future research in connection to
the study.

2 Method and implementation
This chapter describes the procedures used to develop and evaluate the study’s
proposed software artefact called Adaptive Serverless Invocation Predictor (ASIP)
with the purpose of addressing the research questions. Additionally, this chapter
includes ethical considerations and discussions regarding the validity of the
conducted research.

Since the research gap was identified through a literature search, a deductive research
method was employed. In addition, a quantitative research approach was chosen, as
the research revolves around reducing a measurable time delay.

To generate empirical data and evaluate the effectiveness of the proposed solution,
ASIP was developed using a design science methodology. Since the elected research
method resulted in the acquisition of quantitative data, a quantitative analysis method
was utilised.

2.1 Designing the Adaptive Serverless Invocation Predictor

This section describes how ASIP was designed and implemented. The architectural
context of ASIP is illustrated in Figure 1.

Figure 1
Architectural overview of ASIP

The following list describes the enumerated sequential steps depicted in Figure 1:
1. Users navigate through the web application frontend. Page B currently has
some traffic while Page A has no traffic.

2. The web application frontend stores the number of users per page in a Firebase
Realtime Database.

3. A scheduling mechanism regularly invokes the ASIP function at a predefined interval of 15 seconds.

4. The ASIP function receives data from the Firebase Realtime Database
concerning the number of users currently present at each page.

5. ASIP interprets a matrix containing data indicating the probability of a navigation from page A to page B when a user is currently present on page A.

6. ASIP retrieves a JavaScript Object Notation (JSON) document, which outlines the functions invoked in response to user navigation to each page within the web application.

7. ASIP configures the number of concurrently active instances for each function
based on the navigation probability data from the matrix.

Figure 1 illustrates a trivial web application where one page currently has multiple
users visiting and another page currently has no user traffic. In this case, ASIP
identifies that the serverless functions potentially triggered through the page with no
visitors do not need to be pre-initialised since there is a minimal probability that these
functions will be imminently invoked. As a result of the computation by ASIP, the
concurrency for functions unlikely to be invoked is set to a value of 0, while the
concurrency for functions likely to be invoked is set to 1, as higher concurrency is
necessary only when the same function is invoked multiple times simultaneously.
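
The data structures and decision rule described above can be sketched in code. The following is a minimal Python sketch; all page names, function names, and probability values are hypothetical illustrations, not data from the study:

```python
# Illustrative sketch of ASIP's inputs and concurrency decision (steps 4-7 in
# Figure 1). All page names, function names, and probabilities are hypothetical.

# Step 4: number of users currently present on each page (Realtime Database).
users_per_page = {"pageA": 0, "pageB": 3}

# Step 5: navigation probability matrix -- the probability that a single user
# on the row page navigates to the column page.
navigation_matrix = {
    "pageA": {"pageA": 0.0, "pageB": 0.6},
    "pageB": {"pageA": 0.3, "pageB": 0.0},
}

# Step 6: function-trigger map -- the functions invoked when a user navigates
# to each page.
function_trigger_map = {
    "pageA": ["getProfile"],
    "pageB": ["listProducts", "getCart"],
}

def decide_concurrency(users, matrix, trigger_map, threshold=0.5):
    """Step 7: set provisioned concurrency to 1 for functions behind pages that
    at least one user is likely to navigate to; otherwise set it to 0."""
    concurrency = {fn: 0 for fns in trigger_map.values() for fn in fns}
    for source, count in users.items():
        for target, y in matrix[source].items():
            # Probability that at least one of `count` users navigates there.
            p = 1 - (1 - y) ** count
            if p > threshold:
                for fn in trigger_map[target]:
                    concurrency[fn] = 1
    return concurrency

print(decide_concurrency(users_per_page, navigation_matrix, function_trigger_map))
```

With these illustrative values, provisioned concurrency is enabled only for getProfile, since at least one of the three users on page B is likely to navigate to page A, mirroring the scenario depicted in Figure 1.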

ASIP was constructed as a serverless function to align with the serverless concept, where the absence of a constantly active server is considered a benefit. Through this approach, it was intended that virtually no extra computational resources would be required and, consequently, that no maintenance for an additional server would be incurred. Since the ASIP function was scheduled to run constantly, it was not liable to cold starts in the way a typical serverless function is.

However, after implementing ASIP as a serverless function, it became evident that, in terms of cost efficiency and reliability, a more suitable method would involve hosting ASIP on a separate virtual server instance. Since the ASIP serverless function requires continuous operation, it contradicts the intended purpose and negates the benefits of serverless functions. A more cost-effective alternative would be to deploy ASIP on a virtual machine instance, such as an AWS EC2 instance in this context. This approach would be appropriate given that ASIP does not perform resource-intensive tasks, yet necessitates constant execution.

A scheduling mechanism was configured to regularly invoke the ASIP function at a 15-second interval. This interval was selected after an analysis of existing function-scaling solutions and their loop intervals, with a particular focus on an auto-scaling strategy found in Kubeless, a serverless platform technology tailored for the Kubernetes container orchestration tool. This solution analyses traffic metrics every 15 seconds to predict future demand for serverless functions, similar to the solution presented in this study. Research conducted by Agarwal et al. (2021) demonstrated that Kubeless' scaling solution effectively reduced the occurrence of cold starts, indicating that a 15-second interval may be suitable for a comparable scaling algorithm such as ASIP.

The choice of a 15-second loop interval was deemed essential for ASIP to promptly pre-initialise functions in response to changes in user navigation. The interval was not set to an even shorter duration for two primary reasons: to maintain consistency with the Kubeless strategy, and to account for the time required for resource allocation in response to modifications to a function's provisioned concurrency configuration. A shorter interval might have led to complications, as the provisioned concurrency configuration could be altered too frequently, preventing instance scaling from completing in the allotted time.

A loop interval longer than 15 seconds may have delayed ASIP's response to changes in user navigation, potentially leading to suboptimal resource allocation and increased cold start occurrences. Furthermore, a longer interval might have resulted in less accurate predictions of future demand, as ASIP's algorithm would be invoked less frequently and would therefore account for fewer navigations taking place between invocations.

In this study, ASIP employs a mathematical formula to compute the probability that at
least one user on Page A will navigate to Page B. This formula is necessary because
ASIP must account for an increased navigation probability when there are multiple
users on Page A. This is because the values in the function probability matrix,
depicted in Figure 1, do not account for the possibility of more than one user. The
formula is as follows:
P(X) = 1 − (1 − Y)^X (1)
Here, P(X) denotes the probability that at least one user navigates to Page B when
there are X users on Page A. The variable Y represents the probability of a single user

navigating to Page B. The value of Y is directly obtained from the navigation
probability matrix.
The formula in (1) is based on the complementary probability rule. This rule states
that the probability of an event occurring equals one subtracted by the probability of
the event not occurring (Blitzstein & Hwang, 2019).

In formula (1), (1 − Y) is the probability that a single user does not navigate from page A to page B, and (1 − Y)^X represents the probability that none of the X users on page A will navigate to page B. The complementary probability rule is expressed as P(X^C) = 1 − P(X), where X^C is the complement of X (Blitzstein & Hwang, 2019).

Here, the event that no user navigates is the complement of the event that at least one user navigates, and the rule can therefore be rearranged into P(X) = 1 − (1 − Y)^X, which is the formula depicted in (1). Rearranging the formula in this manner enables the variables to be populated, and the resulting computation can be used as the navigation probability.

The formula in (1) is utilised because it enables the calculation of the probability that
an event occurs at least once during a series of attempts, given a known probability
for the event to occur in a single attempt. Consequently, it can be used by ASIP to
calculate the probability for at least one of any number of users on Page A navigating
to page B. The formula in (1) is demonstrated in a graph with two different values for
𝑌 in Figure 2.

Figure 2
Example of navigation probability in relation to user count
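
As an illustration, formula (1) translates directly into code. The following Python sketch uses an illustrative single-user navigation probability of Y = 0.3, not a value from the study's matrix:

```python
def navigation_probability(y: float, x: int) -> float:
    """P(X) = 1 - (1 - Y)^X: the probability that at least one of x users
    currently on page A navigates to page B, given the single-user
    navigation probability y from the matrix."""
    return 1 - (1 - y) ** x

# The probability rises quickly with the user count, as shown in Figure 2.
for x in (1, 5, 20):
    print(x, round(navigation_probability(0.3, x), 4))
```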

A weakness of computing the navigation probability with formula (1) and the navigation probability matrix is that time is not considered. The time dimension is excluded because neither the data nor the formula accounts for whether a user navigates after one second or after one hour; this difference is not visible in the data retrieved from Google Analytics. Including the time dimension would make it possible to exclude inactive users and thereby provide more accurate predictions. However, ASIP is not reliant on the time dimension, as it is designed to make predictions based on other dimensions.

An integral factor in ASIP's algorithm was the invocation probability threshold, which determines whether provisioned concurrency should be enabled or disabled for a function. If the computed invocation probability exceeded the threshold, provisioned concurrency would be enabled; otherwise, it would be disabled. Ideally, the threshold should have been continuously adjusted based on real-time user traffic analysis to ensure an optimal level. However, this possibility was not explored during the study, and as a result, the threshold was configured at a fixed value. Nevertheless, to provide valuable insights into the impact of different thresholds, benchmarks were conducted for various threshold values, and the results are included in this report.

By continuously adapting the threshold to match the current user traffic patterns, the
system could have optimised the allocation of provisioned concurrency resources.
This dynamic adjustment would have allowed for better utilisation of available
resources during periods of high demand and potentially reduced costs during periods
of low demand.

Future research could explore the implementation of an algorithm that automatically
adjusts the invocation probability threshold based on real-time traffic analysis. This
approach would enable the ASIP system to continuously optimise the utilisation of
provisioned concurrency, leading to enhanced performance and resource allocation.

In conclusion, although the investigation did not incorporate dynamic threshold
adjustment, the performed benchmarks shed light on the influence of different
thresholds. The potential benefits of continuously adapting the threshold to match
real-time user traffic patterns highlight an area for further exploration and
improvement in ASIP's algorithm.
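
The trade-off behind the threshold can be illustrated with a small sketch; the function names and invocation probabilities below are hypothetical. A lower threshold pre-initialises more functions, reducing cold starts at a higher cost, while a higher threshold does the opposite:

```python
# Hypothetical computed invocation probabilities for five serverless functions.
probabilities = {"fnA": 0.9, "fnB": 0.65, "fnC": 0.45, "fnD": 0.2, "fnE": 0.05}

def enabled_functions(probs, threshold):
    """Return the functions whose provisioned concurrency would be enabled,
    i.e. those whose invocation probability exceeds the threshold."""
    return sorted(fn for fn, p in probs.items() if p > threshold)

# Each threshold yields a different set of pre-initialised functions.
for threshold in (0.25, 0.5, 0.75):
    print(threshold, enabled_functions(probabilities, threshold))
```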

In order to utilise the provisioned concurrency feature on AWS, it was necessary to contact AWS support to increase account limits, because an AWS account does not normally have a quota that allows any utilisation of the provisioned concurrency feature. A support ticket therefore had to be created; in this study, the matter was resolved promptly, allowing the researchers to alter provisioned concurrency configurations.

An example of a web application to which ASIP was considered applicable was an e-commerce web application, due to its complex page hierarchy tree. ASIP was believed to induce the desired cold start mitigation based on the hypothesis that a user currently navigating a branch in a page hierarchy tree is unlikely to instantaneously access a page deep within an unrelated branch of the page tree. The basis for this assumption is that navigation between certain pages is improbable due to the design of the web application.
The nomenclature of the software artefact, Adaptive Serverless Invocation Predictor
(ASIP), is derived from its key functionalities and intended purpose. ASIP's primary
task is to forecast future invocations of serverless functions, leveraging real-time
navigation data for this prediction process. Furthermore, ASIP incorporates an
adaptive scaling algorithm which dynamically adapts the provisioned concurrency
configuration for serverless functions in accordance with an invocation probability
calculation, which is based on real-time user traffic analysis. Therefore, the moniker
'ASIP' succinctly encapsulates its combined features of adaptability and prediction in
a serverless computing environment.

2.2 Data collection
This chapter describes the various data collection methods used to generate
quantitative data specifically tailored to address the research questions in a scientific
and systematic manner.
In more precise terms, this chapter outlines the steps taken to test and evaluate ASIP,
as well as to compare it against existing solutions. To assess ASIP with an appropriate
sample size of user traffic, traffic data was collected with the assistance of study
participants and an open-source web application.
To collect the traffic, a proxy server with an access log was configured to register all
HTTP requests from the participants, while a script continuously recorded the
real-time user navigation data. Simultaneously, Google Analytics monitored the
web application to form a basis for the navigation probability matrix.
Following this, the function-trigger map from Figure 1 was manually created to detail
which functions were initiated from specific pages. After completing these initial data
collection procedures, the recorded user traffic was replayed to assess ASIP's
performance under realistic workloads and gather relevant performance metrics. The
methodologies for these data collection processes are explored in greater depth in the
respective sections of this chapter. An overview of relationships between the
components and procedures involved in the data collection is visible in Figure 3.

Figure 3
Relationships between data collection procedures and components

While Figure 3 illustrates the procedures and components required to record and
replay traffic to evaluate ASIP, ASIP itself is not dependent on all of these procedures
when operating normally and not being tested. Consequently, to distinguish and
clarify the difference between the procedures required for testing and the procedures
required for ASIP to operate, ASIP’s dependencies are illustrated in Figure 4. As
such, a comparison between Figure 3 and Figure 4 provides insight into the purpose
of the various data collection methods described throughout this chapter.

Figure 4
An overview of ASIP’s operational prerequisites

The section below outlines which data collection methods have been chosen to answer
the study’s respective research questions.

[RQ1]: Under which circumstances can serverless function invocations be predicted
based on real-time user distribution across a web application?
This research question is answered through a design science research approach,
centred on the development of the ASIP software artefact. This
method choice is grounded in the work of Juhani Iivari (2010). Iivari posits that
design science research in information systems aims to produce practical, innovative,
and effective solutions to real-world problems through design and evaluation of
software artefacts.
By developing a software artefact, this study seeks to contribute to both the theoretical
understanding and practical application of predicting serverless function invocations
based on real-time traffic data. This approach allows for the examination of the
circumstances under which such predictions are feasible and accurate, while also
generating knowledge that can be used within the industry to reduce cold start
occurrences.
Following Iivari's recommendations, the artefact development and evaluation process
was guided by systematic and transparent methods, ensuring that the findings were
both reliable and valid, and ultimately contributing to the broader knowledge base of
design science research in information systems.

[RQ2]: Which cold start mitigation strategy provides the most significant delay
reduction?
To address this research question, a method involving evaluation and comparison of
ASIP to existing cold start mitigation strategies was employed. This approach was
motivated by the need to assess the relative effectiveness of the proposed solution in
comparison to existing practices in the field (Kitchenham et al., 2004).
By directly comparing the performance of the developed artefact with established cold
start mitigation strategies, the study not only identifies the most effective solution but
also highlights the strengths and weaknesses of various approaches. This comparative
evaluation allows for a comprehensive understanding of the potential impact of
adopting the proposed solution in real-world settings, and it informs decision-makers
about the practical implications of implementing various strategies (Kitchenham et al.,
2004). By employing a scientific and systematic evaluation method, the findings of
this study contribute to the advancement of knowledge on cold start mitigation and
provide valuable insights for both researchers and practitioners in the domain of
serverless computing.

2.2.1 Recording user traffic

To evaluate ASIP against a realistic workload, user traffic data was recorded in
advance. This recorded traffic data was not required by ASIP itself, but was necessary
for its evaluation. In preparation for the recording, an existing open-source web application
was converted to a serverless architecture.
User traffic was recorded at various times and locations, allowing test participants to
take part in the study without needing to be present simultaneously. This approach
enabled the acquisition of a larger sample size than what would have been achievable
if all participants were required to be present at the same time and location.
A total of 15 individuals were invited to perform certain tasks in the web application
chosen for the data collection. The application was a pharmacy management system;
therefore, a roleplay scenario was performed in which test participants were assigned
various roles. The web application had three logical roles: cashier, pharmacist, and
assistant pharmacist. To ensure even distribution, each role was allocated to five test
participants.
Some test participants were located remotely while others were physically present
alongside one of the two researchers. Test participants used their own computer to
access the web application while a real-time data collection script was running on the
researcher’s computer.
The varied locations may have influenced the recorded data, as network latency and
bandwidth can differ between networks. However, this variability can also be viewed
as advantageous, since web applications are typically accessed by users from diverse
networks and locations.
Test participants were provided a document written in Swedish to sign before
participating in the study (see Appendix A). The document declares the purpose of the
study as well as the collection and processing of data. Furthermore, the document
underlines the fact that after signing the document, the test participant can at any time
during the test withdraw their consent to the data collection and be excluded from the
study.
A script was written in Python and configured to run continuously throughout the
traffic recording session. The script periodically fetched current data from Google
Analytics Realtime API at an interval of five seconds. This interval was selected with
respect to the 15 second interval at which ASIP was scheduled for invocation. With
real-time data updated every five seconds, ASIP would consequently always have
access to data that had been updated since the preceding invocation as the real-time
data was updated more frequently than ASIP was invoked.
In retrospect, a real-time data update interval shorter than five seconds could have
yielded more reliable results since the replaying of user traffic would have
corresponded more closely to the normal circumstances of the ASIP function. This is
because ASIP usually fetches real-time data directly from Firebase Realtime
Database, which is updated more frequently than every five seconds.
Conversely, a longer interval for real-time data updates would result in ASIP
receiving less frequent updates, which may impact the accuracy and responsiveness of
the scaling algorithm when replaying requests. This could lead to suboptimal resource
allocation and slower response times, diminishing the overall effectiveness of the
scaling solution.
Initially, Google Analytics was intended to serve two purposes in this study: 1) to
provide a foundation for the navigation probability matrix in Figure 1, and 2) to
monitor real-time user traffic via the Google Analytics Realtime feature.

A preliminary user traffic recording test was conducted, with one of the researchers
acting as a test participant. This was done to ensure the data collection setup
functioned as intended and the collected data was successfully obtained.
Upon analysing the data obtained from Google Analytics Realtime during the
preliminary test recording, it was observed that the data did not describe real-time
user quantities; instead, every metric provided by Google Analytics Realtime
described quantities accumulated over the most recent 30 minutes.
A custom solution was developed as a more suitable alternative to Google Analytics
Realtime. The custom solution utilised Google Firebase Realtime Database to track
the number of users currently present on each page of the web application.
Modifications were made to the frontend of the web application, such that whenever a
user navigates to a new page, the user count is decreased by 1 for the current page and
increased by 1 for the destination page. This resulted in significantly more accurate
real-time data since it was updated directly in connection to every user navigation.
Due to the enhanced accuracy provided by the custom solution, it was employed for
all user traffic recording. The components involved in recording user traffic data are
illustrated in Figure 5.
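The counter logic added to the frontend can be sketched in Python, with an in-memory dictionary standing in for the Firebase Realtime Database node; the actual frontend code and Firebase calls are not reproduced here:

```python
from collections import defaultdict
from typing import Optional

# In-memory stand-in for the Firebase Realtime Database node tracking how
# many users are currently on each page (illustrative only; the real
# implementation updates Firebase from the web application frontend).
page_counts = defaultdict(int)

def record_navigation(current_page: Optional[str], destination_page: str) -> None:
    """Decrement the count for the page a user leaves and increment the
    count for the destination page, as the modified frontend does on
    every navigation."""
    if current_page is not None and page_counts[current_page] > 0:
        page_counts[current_page] -= 1
    page_counts[destination_page] += 1

# A user lands on /login and then navigates to /inventory.
record_navigation(None, "/login")
record_navigation("/login", "/inventory")
```

Because the counts change at the moment of each navigation, the data is accurate in real time rather than accumulated over a 30-minute window.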

Figure 5
User traffic recording components

All HTTP requests sent to the web application during user traffic collection were
captured in an HTTP access log. An AWS Elastic Compute Cloud (EC2) instance
hosted an Nginx web server configured as a proxy, which forwarded incoming HTTP
requests to the serverless web application backend API. The purpose of the access log
was to facilitate replaying of all requests at a later time.
To provide closer resemblance to a real use case scenario, each user recording was
divided into two sessions with a short pause in between. This pause can for example

be considered equivalent to a coffee break, commonly occurring within the industry
and resulting in a temporary reduction in frequency of requests sent from a user to a
web application.
Figure 6 provides a visual depiction of the procedure for a single session through a
Unified Modeling Language (UML) sequence diagram. Each session lasted for an
approximate duration of ten minutes.

Figure 6
Sequential process for a user traffic data collection session

As a result of the user traffic recording taking place at different occasions, multiple
access logs and real-time data text files were generated, which were subsequently
merged together as if they were from a single recording session. To later replay the
traffic data, the timestamps in the collected data were adjusted with an offset so that
the first timestamp from each recording became identical.
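The timestamp alignment described above can be sketched as follows; the `(unix_timestamp, logged_request)` entry format is illustrative:

```python
def merge_recordings(recordings):
    """Merge several recorded sessions as if they came from a single
    recording: shift each session so its first timestamp becomes 0,
    making the first timestamps identical, then interleave all entries
    chronologically."""
    merged = []
    for session in recordings:
        if not session:
            continue
        offset = session[0][0]  # first timestamp of this session
        merged.extend((ts - offset, request) for ts, request in session)
    merged.sort(key=lambda entry: entry[0])
    return merged
```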

2.2.2 Generating a navigation probability matrix

Half of the recorded user traffic data was used as the basis for the navigation
probability matrix depicted in Figure 1. The remaining data was later used for replay
to assess ASIP's performance. This separation was essential for reliable results:
since ASIP's cold start mitigation relies on the navigation probability matrix, it had
to be evaluated with traffic data distinct from the data used to construct that matrix.
Consequently, ASIP was tested solely with traffic data from test participants whose
data was not included in the navigation probability matrix.
The data separation rendered the first half of the script-recorded data superfluous, as
it was never replayed in the analysis: that portion was used exclusively for
constructing the matrix, for which Google Analytics managed the data collection in
the background, independently of the data collection script and the manual access
log developed specifically for recording purposes.
Although the separation of recorded traffic had not been originally planned, the
extraneous data collection did not pose any concern. However, the reduction of the
sample size by half did present a challenge in the analysis.
Upon completing half of the traffic recording, a Google Analytics report was
retrieved. This report contained data describing study participants’ path exploration
throughout the web application. More specifically, the data set contained information
regarding the number of visits to each of the web application’s pages in combination
with a list of referring pages from which the visits originated. In a more succinct
manner, the data set described the number of users that had navigated from a specific
page to another (Analytics Help, 2023). An example of the path exploration data set is
visible in Table 1.

Table 1
A trivialised version of a Google Analytics path exploration report

Page referrer    /       /inventory   /login   Sum
/                 0       25           59       84
/inventory        6        0            7       13
/login           44        1            0       45

(Columns denote the destination page location; cell values are page views.)

The data set obtained from Google Analytics was used as the foundation for
constructing the matrix in Figure 1 since it contained the information necessary to
determine the probability of a serverless function invocation in relation to a web page
visit. The interval for this data set was limited to the interval during which the first
half of user traffic recording in this study took place.
In the process of transforming the Google Analytics path exploration report into a
navigation probability matrix, the matrix illustrated in Table 1 was transformed into
the matrix presented in Table 2. Each cell's value was calculated using the subsequent
formula: The probability of navigating from the referring page A to the destination
page B is equal to the number of referrals from page A to page B, divided by the total
number of referrals originating from page A.
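Applied to the downscaled report in Table 1, the formula can be sketched as:

```python
def to_probability_matrix(views):
    """Convert a path exploration report (views per referrer/destination
    pair) into a navigation probability matrix: P(A -> B) equals the
    views from page A to page B divided by the total views originating
    from page A."""
    matrix = {}
    for referrer, destinations in views.items():
        total = sum(destinations.values())
        matrix[referrer] = {
            page: (count / total if total else 0.0)
            for page, count in destinations.items()
        }
    return matrix

# The downscaled report from Table 1.
report = {
    "/":          {"/": 0,  "/inventory": 25, "/login": 59},
    "/inventory": {"/": 6,  "/inventory": 0,  "/login": 7},
    "/login":     {"/": 44, "/inventory": 1,  "/login": 0},
}
matrix = to_probability_matrix(report)
```

Each row of the resulting matrix sums to 1, and the rounded values match Table 2 (for instance, 59 / 84 ≈ 0.7024 for navigating from / to /login).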
Noteworthy is that the values in Table 2 may appear skewed and unrealistic due to the
downscaled nature of the example; the original dataset, which encompasses all pages
and accurate values, is not reproduced in this report.

Table 2
A path exploration report converted to a navigation probability matrix

Page referrer    /        /inventory   /login
/                0.0000   0.2976       0.7024
/inventory       0.4615   0.0000       0.5385
/login           0.9778   0.0222       0.0000

(Columns denote the destination page location; cell values are navigation probabilities.)

While Google Analytics was utilised for data collection in this study due to its
widespread popularity, multiple alternative web analytics tools with equivalent
features are available (Bekavac & Garbin Praničević, 2015). Considering the
popularity of Google Analytics, companies and organisations that can potentially
benefit from the findings of this study may already be using the tool and consequently
possess the required data.

2.2.3 Generating a function-trigger map

As detailed in Figure 1, ASIP requires a function-trigger map, necessitating certain
data collection. As the map should contain information about which serverless
functions are invoked by each respective page in the web application, these
connections had to be established.
To generate the map, a browser was used to visit each page of the web application
individually while monitoring the HTTP requests sent by the browser. This approach
was employed because the serverless function’s identifying name can be derived from
the Uniform Resource Identifier (URI) in the request sent to trigger the function. This
solution was especially convenient since the web application backend was constructed
using an Infrastructure as Code (IaC) solution where the URI and name of each
serverless function were defined together in a single document.
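A minimal sketch of this mapping procedure follows; the route and function names are invented for illustration, as the real IaC definitions are not reproduced here:

```python
# Hypothetical route definitions mirroring the IaC document, in which each
# serverless function's URI and name are defined together (names invented
# for illustration).
ROUTES = {
    "/api/inventory/list": "listInventory",
    "/api/auth/login": "loginUser",
}

def build_function_trigger_map(observed_requests):
    """Map each page to the serverless functions it triggers, derived
    from the URIs of the HTTP requests the browser sent on that page."""
    return {
        page: [ROUTES[uri] for uri in uris if uri in ROUTES]
        for page, uris in observed_requests.items()
    }

# Requests observed while visiting each page individually in a browser.
trigger_map = build_function_trigger_map({
    "/inventory": ["/api/inventory/list"],
    "/login": ["/api/auth/login"],
})
```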

2.2.4 Replaying user traffic for evaluation

Following the completion of the implementation of ASIP along with all of its
dependencies, half of the recorded user traffic, reserved only for replay, was used to
evaluate ASIP.
A critical consideration was to ensure that the recorded real-time user data could be
replayed in synchronisation with the HTTP access log, in order to provide ASIP with
the same data that is typically available to it. Accordingly, ASIP was developed with
modular functionality that allows for seamless switching between the use of real-time
data and pre-recorded data, as required.
To evaluate ASIP, it was essential for ASIP to have the capability to base its
calculations on the pre-recorded user traffic data. In line with established best
practices in the software engineering industry, the logic governing whether ASIP
should utilise real-time data or pre-recorded data was abstracted to a separate script
running on a dedicated server, as depicted in Figure 7. The external server was only
utilised for testing ASIP and correspondingly not required for ASIP to function.
A script was written in Python specifically for replaying requests in this study.
Notably, it was designed to send requests asynchronously, which enabled it to replay
requests at precisely the same frequency at which they were originally sent. This
meant that the script
execution was not obstructed by the delay between each request and belonging
response. The previously collected access log included timestamps which allowed the
script to compute the time interval between each request and its succeeding request.
Based on this information, the script was able to wait for the required duration before
sending the succeeding request.
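The asynchronous replay logic can be sketched as follows; the `send` coroutine stands in for the actual HTTP call, which is not reproduced here:

```python
import asyncio

async def replay_requests(log, send):
    """Replay logged requests at their original frequency: wait the
    recorded gap between consecutive requests, then dispatch each one
    as a task so slow responses never delay subsequent requests."""
    tasks = []
    previous_ts = None
    for ts, request in log:
        if previous_ts is not None:
            await asyncio.sleep(ts - previous_ts)  # recorded inter-request gap
        tasks.append(asyncio.create_task(send(request)))
        previous_ts = ts
    await asyncio.gather(*tasks)  # wait for all outstanding responses
```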
Prior to replaying user traffic, the collected data was modified to simulate a higher
number of simultaneous users, to obtain more reliable results for the study. For
instance, all the requests in the access log were multiplied by 10, resulting in the
simulation of 10 users performing the same actions as a single user from the recorded
sessions. To introduce further variation, unique delays were added to the beginning of
the simulated user sessions. These modifications were made to create a more realistic
load on the system and mimic varying user behaviours, which could help in
evaluating the performance and scalability of the system under different conditions.
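The amplification step can be sketched as follows; the maximum start delay is an illustrative assumption, as the magnitude of the unique delays is not specified:

```python
import random

def amplify_traffic(log, factor=10, max_start_delay_s=30.0):
    """Duplicate a recorded request log `factor` times, offsetting each
    simulated user's session by a unique random start delay to mimic
    varying user behaviour (delay magnitude is an assumption)."""
    amplified = []
    for _ in range(factor):
        delay = random.uniform(0.0, max_start_delay_s)
        amplified.extend((ts + delay, request) for ts, request in log)
    amplified.sort(key=lambda entry: entry[0])
    return amplified
```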
ASIP was designed to always attempt to fetch real-time data from the traffic replayer
script first. If the traffic replayer does not respond with pre-recorded real-time data, it
serves as an indication that traffic is not currently being replayed. In that case ASIP
fetches data directly from Firebase Realtime Database instead.
The external server depicted in Figure 7 was an AWS EC2 instance in the same AWS
region as the web application, in order to minimise potential network latency and
fluctuations skewing the collected data values.

Figure 7
Components involved in user traffic replaying

2.2.5 Collecting performance metrics

During the replaying of user traffic, various metrics were collected to assess the
efficiency of ASIP. This chapter outlines the considered metrics and the methods
employed to obtain them.
AWS, the cloud provider used in this study, offers certain metrics and reports for
serverless function invocations. However, at the time of the study, none of the
available metrics for serverless functions included response times or provisioned
concurrency duration, which were deemed critical for evaluating ASIP's effectiveness.
Consequently, to obtain an average response time for all requests, it was determined
that customising the traffic replayer script illustrated in Figure 7 would be more
suitable as it provided complete control over data collection. With this approach, the

response time for each replayed request was recorded, and the total number of sent
requests was stored to facilitate the calculation of average response time later on.
Timers were initiated for each HTTP request originating from the access log and
halted upon receiving the corresponding response. After a 25-minute duration, the replay
script was manually terminated. The 25-minute replay duration was chosen since each
individual traffic recording consisted of two ten-minute sessions with a five-minute
pause in between. Subsequently, the response times for every individual function were
documented in a file. Furthermore, an average response time was calculated for each
function, as well as an overall average for all functions. This enabled the identification
of potential anomalies by detecting any significant deviations in response time
between different functions.
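The per-function aggregation described above can be sketched as:

```python
from collections import defaultdict
from statistics import mean

def summarise_response_times(samples):
    """Group recorded response times (milliseconds) by the serverless
    function that produced them and compute per-function averages plus
    an overall average across all requests."""
    by_function = defaultdict(list)
    for function_name, response_ms in samples:
        by_function[function_name].append(response_ms)
    summary = {name: mean(times) for name, times in by_function.items()}
    summary["overall"] = mean(ms for _, ms in samples)
    return summary
```

Comparing the per-function averages against the overall average makes significant deviations, and therefore potential anomalies, easy to spot.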
The script was designed to automatically stop the recording after exactly 25 minutes,
ensuring that the same number of requests were sent during each replay. This was
important to acquire representative average response time metrics.
Since ASIP relies on using provisioned concurrency, which warms up a function in
advance, it inevitably leads to additional monetary expenses. Therefore, it was
essential to assess whether the additional cost is justified by the response time
reduction achieved by ASIP.
Since the additional monetary charge for provisioned concurrency is a direct result of
the time duration under which a function has had provisioned concurrency, this time
metric was collected. The duration of active provisioned concurrency was measured
using a combination of multiple AWS services, including S3, CloudTrail, and
Athena. First, an AWS S3 bucket was allocated; subsequently, CloudTrail was
configured to log all AWS account activity to files in the S3 bucket. These logs
included all events triggering changes to the provisioned concurrency configuration
for each function, documenting when ASIP set it to 1 and when the configuration was
removed, effectively resetting it to 0.
The time difference between the log events constituted the duration, in seconds, of
provisioned concurrency for the specific function. The sum of all such durations for
each distinct serverless function, throughout the 25-minute traffic replay timeframe,
represented the total time each function was allotted provisioned concurrency. It was
therefore possible to identify when provisioned concurrency had been allocated and
deallocated for a specific function by analysing these logs.
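Pairing the log events to obtain per-function durations can be sketched as follows; the `(timestamp, function_name, value)` event format is an illustrative simplification of the actual CloudTrail log entries:

```python
def provisioned_concurrency_durations(events):
    """Sum, per function, the seconds between a log event setting
    provisioned concurrency to 1 and the following event resetting it
    to 0. Events are illustrative (timestamp, function_name, value)
    tuples derived from the CloudTrail logs."""
    started_at = {}
    totals = {}
    for ts, function_name, value in sorted(events):
        if value >= 1:
            started_at.setdefault(function_name, ts)  # allocation event
        elif function_name in started_at:
            start = started_at.pop(function_name)     # deallocation event
            totals[function_name] = totals.get(function_name, 0.0) + (ts - start)
    return totals
```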
Finally, Athena was used to define a database table that mirrored the structure of the
event log data stored on the S3 bucket. With this setup, a Structured Query Language
(SQL) query was designed in Athena, with the specific aim of calculating the total
time span for which each function had provisioned concurrency activated during the
traffic replay. Consequently, the query's output was a list of all functions, each
accompanied by a duration (expressed in seconds), indicative of the period during
which provisioned concurrency was active.

2.3 Data analysis


This section outlines the data analysis methods employed in this study to answer the
research questions. Relevant scientific literature on research methods is used to justify
the chosen methodology. Additionally, this section describes the procedures used to
analyse the collected data to provide a comprehensive understanding of the process,
affecting the study’s validity and reliability.
The research methodology employed in this study involved the use of quantitative
analysis methods to generate data, as quantitative methods are well-suited for
measuring performance metrics such as response time (Creswell, 2014). User traffic
replaying was performed under various circumstantial factors, with a focus on
different provisioned concurrency configurations. To collect the data, three different
scenarios were tested:
1. Provisioned concurrency set to 0 for all functions, which represents the default
scenario.

2. Provisioned concurrency dynamically adjusted by ASIP, which demonstrates
the proposed mitigation strategy.

3. Provisioned concurrency set to 1 for all functions, which serves as a control
scenario for comparison.

These scenarios were chosen to enable a comprehensive evaluation of the ASIP
artefact compared to both a default scenario and a control scenario, enhancing the
overall validity of the study (Bryman, 2016).

The quantitative results obtained through replaying user traffic form the basis of the
data analysis in this study. As half of the recorded traffic was used for construction of
the probability matrix, only the remaining data was used for replay and analysis. This
resulted in a smaller sample size; however, separating the data used for matrix
construction from the data used for replay was deemed necessary to ensure reliable
results.
The average response time achieved by ASIP was compared to the default scenario
where the provisioned concurrency for all functions was set to 0. Through calculating
the percentage reduction in response time achieved by ASIP compared to the default
scenario, a direct comparison could be made with existing mitigation strategies for
evaluation.

In addition to measuring the average response time, an integral aspect to consider was the
probability threshold required for ASIP to enable provisioned concurrency for a
function. If a function’s invocation probability is computed to a value above the
threshold, the function is pre-initialised through provisioned concurrency, otherwise,
the provisioned concurrency is removed. In this study, a fixed threshold was used,
since the calculation of an optimal threshold was not explored and was instead
reserved for future work.
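The threshold rule described above can be sketched as:

```python
PROBABILITY_THRESHOLD = 0.40  # the fixed threshold used in the benchmarks

def desired_provisioned_concurrency(invocation_probability: float) -> int:
    """Return 1 (pre-initialise the function via provisioned
    concurrency) when its computed invocation probability exceeds the
    threshold; otherwise return 0 (remove the provisioned concurrency
    configuration)."""
    return 1 if invocation_probability > PROBABILITY_THRESHOLD else 0
```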
When benchmarking the average response time in the three scenarios described above,
the probability threshold was set to 40%. Nothing in particular suggested that this
precise threshold would be more suitable than others. Dynamically configuring the
threshold according to data such as real-time user traffic is an area worth exploring;
due to resource and time limitations, however, it was left for future work.
Since there was no information suggesting which warm-up threshold would be the
most suitable, additional data collection took place. ASIP was tested with three
different thresholds: 20%, 40%, and 60%. This was considered valuable since a
comparison between thresholds could indicate how the effectiveness of ASIP is
affected by the threshold choice. Such data collection would also provide information
useful in striking a balance between monetary expense and efficiency in cold start
reduction. Certain thresholds might also suit certain types of applications with
particular user traffic patterns; however, this could not be accurately assessed, as
only a single web application was used in combination with a single workload for
evaluation.
The response time data collection was performed with a method producing data sets
in which every response time was connected to the specific serverless function that
responded in that time. This was done in order to identify potential anomalies in the
data, such as a specific function not working as intended and skewing the results;
that said, such anomalies could also occur in a realistic scenario, as many
applications contain bugs causing abnormal response times. To be able to view
response times per function, the traffic replaying script was designed to categorise
all recorded response times by function name.
All recorded response times were stored in text files by the script replaying the
requests. Following this, the response times were imported into Microsoft Excel so
that various relevant statistical metrics could be computed and used for evaluation. In
addition to computing the average response time, the standard deviation in response
time was also computed. This was done to identify the impact which ASIP had on the
consistency and predictability of the response time as predictable response times are
typically desirable.

In addition to computing the standard deviation in response time, various response
time percentiles were also computed. Percentiles were computed to assess the effect
of ASIP across the response time distribution, as cold starts may be excluded from
lower percentiles.
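The percentile computation can be sketched as follows; the specific percentiles shown (p50, p95, p99) are illustrative choices, as the thesis does not name which percentiles were used:

```python
from statistics import quantiles

def response_time_percentiles(times_ms):
    """Compute illustrative response time percentiles; high percentiles
    capture cold start outliers that lower percentiles exclude."""
    cut = quantiles(sorted(times_ms), n=100, method="inclusive")
    return {"p50": cut[49], "p95": cut[94], "p99": cut[98]}
```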
According to AWS pricing lists regarding provisioned concurrency at the time of
conducting the study, constantly having a provisioned concurrency of one instance for
a single function resulted in a monthly expense of $11.85 (Amazon Web Services,
2023b). Consequently, the cost associated with using ASIP could be calculated with
the help of the provisioned concurrency duration metric obtained from the traffic
replaying.
The monthly expense of $11.85 is a fixed cost and does not include additional
variable charges such as cost per request and execution duration. These variable
costs did not need to be considered when evaluating ASIP, however, since the
number of requests and the function execution durations are not influenced by ASIP.
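Given the provisioned concurrency duration metric from the traffic replaying, the cost attributable to ASIP could be pro-rated from the fixed monthly price; the pro-rating and the 30-day month below are simplifying assumptions for illustration:

```python
MONTHLY_COST_USD = 11.85             # AWS price cited in the study
SECONDS_PER_MONTH = 30 * 24 * 3600   # simplifying assumption: a 30-day month

def asip_cost_usd(provisioned_seconds_per_function):
    """Pro-rate the fixed monthly provisioned concurrency price by the
    measured seconds each function actually had it enabled."""
    total_seconds = sum(provisioned_seconds_per_function.values())
    return MONTHLY_COST_USD * total_seconds / SECONDS_PER_MONTH
```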
The findings of the data analysis helped determine whether the reduction in overall
response time achieved by ASIP was statistically significant and whether it made
ASIP a promising candidate for adoption within the industry. The threshold for
significance was set at a reduction in time greater than 20%, as previous solutions
have achieved similar reductions through methods that may be more readily adoptable.
Potential limitations that may have influenced the results, such as sample size, data
quality, and external factors were also considered during the data analysis. This was
done to ensure the integrity and validity of the findings.
In conclusion, a quantitative analysis method was used to analyse the data collected
under different circumstantial factors. The findings from the data analysis were used
to evaluate the effectiveness of ASIP in reducing response time and its potential for
adoption within the industry.

2.4 Validity and reliability
This section describes measures and considerations made to ensure the accuracy and
trustworthiness of the study findings. Furthermore, this section details concerns
regarding the generalisability of the results and the procedures to ensure it.
A cloud provider-agnostic serverless framework was utilised when constructing the
serverless backend application included in this study. Through the use of such a
framework, the application can be conveniently migrated between different serverless
platform providers to ensure a general applicability and external validity for the
results.
The choice of cloud provider for conducting the empirical research does not affect the
general validity of the research because the serverless concept is consistent across
different serverless platform providers. While various cloud providers have
incorporated different solutions to mitigate the cold start issue, the issue still remains.
This is a result of the serverless concept where the number of computing instances are
scaled to zero in the absence of traffic.
Obtaining access to an existing web application with real users in order for ASIP to be
tested in a real scenario was considered optimal with regards to reliability. However,
since access to such an application was unattainable, an alternative solution was
employed in this study. Human test participants were invited to use a serverless web
application, generating realistic user traffic data that enabled a robust assessment of
ASIP's performance.
Considerations regarding the sample size of test participants for the user traffic
recording were made to ensure the data was adequately representative. The concept of
saturation, commonly used in qualitative research to determine when a sample is
representative, was considered in this process. Although saturation is not directly
applicable to a quantitative study such as this one, the concept can still assist in
determining a sufficient number of test participants (Mason, 2010).
While it is important to note that saturation is traditionally linked with qualitative
research, it can provide valuable insights in estimating an appropriate sample size in
the context of this study, as well. Previous research implies that studies with narrower
scopes often require smaller sample sizes to reach saturation compared to studies
aiming to explore a process spanning multiple disciplines (Mason, 2010).
This concept of saturation is therefore pertinent to the validity of this study, given
that it is a bachelor thesis with a defined scope and limitations. This research is not
intended to be generalisable across numerous disciplines; instead, it focuses on a more
specific context: interactive web applications. Hence, the sample size does not need
to be extensive to achieve saturation.
Prior to being replayed, the recorded traffic data was modified to mitigate the
limitations of a modest sample size. This was done by multiplying the traffic data by
a factor of 10 to simulate more users, resulting in an access log of 19 000 requests
replayed over a 25-minute period, an average of approximately 13 requests per second.
This rate was considered appropriate since the replay server was an AWS EC2 instance
with limited capacity. Furthermore, variation in the traffic data was increased by
assigning a unique start delay to each simulated user. Consequently, large quantities
of requests were sent, enabling more representative results and thereby supporting
external validity.
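The amplification step described above can be sketched as follows. The session data
structure, the delay range, and the helper name are assumptions made for illustration;
the study's actual replay tooling is not described at this level of detail.

```python
import random

def amplify_traffic(recorded_users, factor=10, max_start_delay_s=30.0, seed=42):
    """Duplicate each recorded user session `factor` times and give every
    simulated user a unique start delay, increasing variation in the replay.

    `recorded_users` is a list of sessions; each session is a list of
    (offset_seconds, request_path) tuples relative to the session start."""
    rng = random.Random(seed)
    simulated = []
    for session in recorded_users:
        for _ in range(factor):
            delay = rng.uniform(0.0, max_start_delay_s)  # unique per simulated user
            simulated.append([(offset + delay, path) for offset, path in session])
    return simulated

# Example: 2 recorded sessions amplified 10x -> 20 simulated sessions
sessions = [[(0.0, "/login"), (2.5, "/dashboard")], [(0.0, "/login")]]
replay = amplify_traffic(sessions)
print(len(replay))  # 20
```

Note that the relative spacing of requests within a session is preserved; only the
session's starting point is shifted, which matches the intent of adding variation
without distorting individual navigation patterns.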
The user traffic was replayed from a script running on an AWS EC2 instance in the
same region as the serverless backend. This resulted in exceptionally short response
times. However, this was not a major concern, as the primary goal of the replay was
not to determine the actual response time but rather to assess the average reduction in
response time compared to normal circumstances without a mitigation strategy in
place.
There are several potential drawbacks to using an EC2 instance in the same region as
the serverless functions, one of which is that the experimental setup may not
accurately reflect typical conditions for a web application user. Replaying requests
from a personal computer located remotely from the AWS data centre hosting the
serverless web application might provide a more realistic scenario, as it would more
closely resemble a typical user accessing the web application from a geographically
distant location. However, that approach could be subject to temporary network latency
fluctuations and similar factors; an EC2 instance is beneficial because it is less
likely to be affected by them.
To address the issues that arise when all traffic replaying and handling take place
within the same data centre region, one possible solution could be to replay the
requests multiple times from various networks, ensuring consistent results and
minimising the impact of network latency variations. This would provide a more
robust evaluation of the serverless functions' performance under diverse conditions,
which is more representative of real-world usage scenarios.
As performance metrics were collected for three different scenarios to evaluate ASIP,
the collection was repeated five times for each scenario in order to detect potential
anomalies compromising the reliability of the results. The average value from the five
repetitions was then used for presentation in this study.

Montgomery (2017) suggests that at least five repetitions are often needed to get a
good estimate of experimental error when evaluating systems. While this guidance
originates from industrial experiments and process optimisation, the principle can be
applied to other fields as well.
The choice of five tests per scenario was made to achieve saturation in the results,
reducing the likelihood of anomalies and ensuring a more reliable average
performance measure. It is important to note that the selection of the number of tests
or repetitions in any experiment is a balance between statistical validity and practical
feasibility with regards to aspects such as time, cost, and resources. As such, the
number five was deemed a suitable compromise for this study.

2.5 Considerations
This section accounts for considerations relevant to this study, made primarily with
regard to ethical aspects of the data collection methods employed.
Given that an HTTP access log was used to record requests sent by test participants
during data collection, handling potentially sensitive data with discretion was crucial.
The proxy server was configured with a verbose output logging setting to replay
requests with identical headers and properties, occasionally including user passwords.
Therefore, test participants were given predetermined login credentials to avoid
collection of potentially sensitive data.
Due to the nature of the data collected in the study, test participants should, from an
ethical perspective, be informed about the data collection and the purpose of the
collection so that informed consent may be obtained. Furthermore, it was critical that
the data collected was kept secure and confidential to prevent unauthorised access or
misuse.
As this study utilised Google Analytics and Google Firebase Realtime Database for
data collection, careful consideration was made to ensure compliance with the
General Data Protection Regulation (GDPR). Achieving GDPR compliance while
using services from Google may be intricate, since Google is a U.S.-based company
subject to U.S. legislation (Peukert et al., 2022).
Beneficial to this study’s compliance with GDPR is that the collected data contained
no personally identifiable information. The information stored in Firebase Realtime
Database described only quantitative website traffic data.

3 Theoretical framework
This chapter includes definitions of relevant terms and describes technological
concepts relevant to the study. Furthermore, this chapter provides a foundation for the
research area and the research questions.

3.1 Serverless computing


Serverless computing is a concept that involves partitioning applications into multiple
small services, aligning with industry best practices. Furthermore, the serverless
concept abstracts away operational infrastructure maintenance.
Migrating suitable software to a serverless platform may decrease resource costs
significantly. Among cloud providers, the most common pricing model for serverless is
based on function execution time rather than on the resource allocation that
traditionally determines operational expenses. Consequently, the serverless concept
offers a highly scalable solution for software, both in terms of computational
resources and cost, since both are scaled in accordance with demand (Baldini et al.,
2017). The serverless concept is integral to this study, which centres on mitigating
the cold start issue, a phenomenon closely tied to serverless computing.

3.2 Cold start phenomenon


In the context of a serverless cloud computing architecture, the term cold start refers
to a delay that occurs prior to the invocation of a function. This delay stems from the
initialisation of a computational resource required prior to a function invocation. In
addition to the initialisation of a computational resource, code libraries upon which
the function depends also need to be loaded. These two prerequisites for a function
invocation constitute the cold start delay (Eismann et al., 2021). The cold start
phenomenon is the focal point for this study and a novel approach to alleviate the cold
start issue is presented.

3.3 Doherty threshold


The Doherty threshold is a reference point commonly considered within the user
experience field. The Doherty threshold was introduced by Doherty and Thadani
(1982) and suggests that the optimal response time for interactive systems is 400
milliseconds or below. A response time below this threshold is
asserted to result in a significant productivity increase. Furthermore, such a response
time is stated to be beneficial to the user’s engagement and ability to stay attentive.
The background to the Doherty Threshold is rooted in the principles of cognitive
psychology. Human cognitive processes have a certain rhythm, and 400 milliseconds
is known to align well with this rhythm, sustaining the user's attention and facilitating
seamless interaction. A response time that surpasses this limit is believed to disrupt
the user's flow of thought, potentially leading to decreased engagement, and in more
severe cases, user frustration (Doherty & Thadani, 1982).
The Doherty Threshold provides additional insights into managing system latency, a
key factor affecting user satisfaction. Latency, defined as the delay between a user's
action and the system's response, can be effectively minimised by adhering to the
Doherty Threshold, thereby ensuring a fluid, uninterrupted user experience and
contributing to greater user satisfaction and loyalty (Doherty & Thadani, 1982).
In conclusion, the Doherty Threshold, as presented by Doherty and Thadani (1982),
extends beyond being a mere numeric indicator. It serves as a vital reference in the
field of user experience, with implications spanning system design, cognitive
psychology, and user engagement. By integrating this threshold into design and
development processes, practitioners can better accommodate user needs, thereby
enhancing productivity, engagement, and overall user satisfaction.
Consequently, the Doherty threshold relates intricately to the cold start phenomenon
since a cold start typically results in a response time far exceeding the 400 millisecond
threshold. Conversely, a request not affected by a cold start often yields a response
time well below 400 milliseconds (Lloyd et al., 2018).
In this context, the Doherty Threshold has been incorporated into this study to
emphasise the importance of low response times in interactive systems. By
minimising the occurrence and impact of cold starts, system designers can ensure they
meet or even surpass the standards set by the Doherty Threshold, thereby enhancing
user engagement and productivity.

3.4 Provisioned concurrency


Provisioned concurrency describes a number of pre-initialised serverless container
instances allocated to a specific function (Chahal et al., 2021). In this study,
provisioned concurrency is utilised for reducing cold start occurrences since it allows
the concurrency for a specific function to be adjusted through a Command Line
Interface (CLI) or an HTTP API. Provisioned concurrency is designed to reduce cold
starts as it allows for concurrent invocations of the same function to be handled by
pre-initialised containers (Amazon Web Services, 2023a).
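As an illustration of the CLI route mentioned above, the AWS CLI exposes provisioned
concurrency configuration directly. The function name (borrowed from the study's
application) and the alias are placeholders; provisioned concurrency must target a
published version or alias, not $LATEST. This is a usage sketch, not the study's
actual tooling.

```shell
# Allocate two pre-initialised execution environments to a function alias
# (function name and alias are placeholders):
aws lambda put-provisioned-concurrency-config \
    --function-name getSales \
    --qualifier live \
    --provisioned-concurrent-executions 2

# Scale back down by removing the configuration:
aws lambda delete-provisioned-concurrency-config \
    --function-name getSales \
    --qualifier live
```

An artefact such as ASIP would issue equivalent calls programmatically, through the
HTTP API or an SDK, as user traffic fluctuates.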
While the term provisioned concurrency is unique to AWS, equivalent features are
provided in other cloud platforms such as Google Cloud and Microsoft Azure (Google
Cloud, 2023; Microsoft, 2023). Therefore, a software artefact controlling provisioned
concurrency, developed for AWS, could be modified and migrated to other cloud
providers.

3.5 Existing literature on cold start mitigation strategies
Cold start mitigation strategies in existing literature typically revolve around
performing code-optimisation with the purpose of reducing loading times or
modifying the technical implementation of a serverless platform to reduce the cold
start delay (Wen et al., 2022). Modifying serverless platform technology is only possible with
open-source serverless frameworks which can be used as an on-premises alternative to
cloud providers (Lee et al., 2021). These approaches may have limitations, as certain
code dependencies always need to be loaded, and container instances always need to
be initialised due to the nature of serverless computing.

3.6 Provisioned concurrency as an alternative approach


Instead of reducing the cold start delay, provisioned concurrency eliminates the cold
start completely for requests to which it is successfully applied. Depending on various
factors such as cloud provider and programming language, a request exposed to cold
start typically has a response time significantly higher than a normal request. Previous
research has shown that cold start latency can constitute up to 80% of the total
response time (Wen et al., 2022).

3.7 Limitations of provisioned concurrency


It is important to note that provisioned concurrency involves a number of constantly
active computational resources dedicated to the function to which it is applied.
Therefore, utilising provisioned concurrency without an effective scaling policy may
result in significant monetary expenses (Amazon Web Services, 2023a).
In this study, provisioned concurrency is used not to reduce the cold start delay, but to
reduce the number of cold start occurrences as an alternative to the commonly used
cold start mitigation strategies mentioned in the scientific literature.

4 Results
This chapter is designed to encapsulate the findings generated from the data collection
and analysis process. It not only showcases the results but also analyses the results
with respect to the employed methodologies that led to these outcomes.

4.1 Presentation of collected data


The aim of this section is to present the raw data collected during the study in an
unbiased manner, abstaining from interpretations or personal commentary. The data
focuses primarily on performance metrics obtained by replaying traffic data.
After replaying the traffic with and without ASIP active, the performance metrics
were compiled to permit a comparison and evaluation of these scenarios. A control
scenario was incorporated as well, where the concurrency configuration was
uniformly set to a value of one for each function. Table 3 portrays the average
response time and provisioned concurrency duration metrics.

Table 3
Comparison of performance metrics under different scenarios

                                              Scenario
Metric (Average)                              Default      With ASIP    Concurrency Fixed To 1
Response Time (ms)                            120.12       107.86       99.148
Standard Deviation In Response Time (ms)      263.36       225.25       164.35
50th Percentile Response Time (ms)            63.03        65.86        61.89
90th Percentile Response Time (ms)            146.20       149.71       122.47
99th Percentile Response Time (ms)            1 916.16     1 414.64     877.71
Provisioned Concurrency Duration (s)          0            5848.8       10499.70

Table 3 shows the average response time in milliseconds and the average provisioned
concurrency duration in seconds. It is important to point out that the results
tabulated for the ASIP scenario were obtained with the warm-up threshold set to 40%.
The rationale behind the selection of this particular percentage is discussed in the
Data analysis section of the Methods and implementation chapter of this report, which
should be consulted for a full account of the implications of this choice.

Additionally, metrics were recorded at various warm-up threshold levels to facilitate
the evaluation of suitable levels. Table 4 illustrates a performance comparison of
ASIP at different warm-up thresholds.

Table 4
Comparison of performance metrics for different warm-up threshold configurations

                                              Warm-up Threshold
Metric (Average)                              20%          40%          60%
Response Time (ms)                            100.518      107.86       115.268
Standard Deviation In Response Time (ms)      204.69       225.25       230.64
50th Percentile Response Time (ms)            61.83        65.86        61.93
90th Percentile Response Time (ms)            123.64       149.71       127.81
99th Percentile Response Time (ms)            1 314.34     1 414.64     1 476.75
Provisioned Concurrency Duration (s)          8537         5848.8       5060.8

Table 4 offers insights that might aid in determining an optimal warm-up threshold
level. These insights are designed to account for relevant circumstantial factors such
as cost-efficiency balance. The foundation of all metrics in Tables X73 and X49 is the
data extracted from traffic recording, repeated under different scenarios and various
warm-up thresholds. The complete data sets can be found in Appendix B.

In addition to the results presented in this section, results describing response times
categorised by function were also generated. However, these were primarily used to
identify potential anomalies that could undermine the reliability of the findings, and
are therefore placed in Appendix C.

4.2 Data analysis

The following analysis provides a comprehensive interpretation of the data collected
in this study. The findings expand on the analysis method section and elucidate the
correlation among data variables.

One significant observation from Table 3 is the provisioned concurrency duration for
the standard scenario, which is 0. This is due to the serverless platforms' default
behaviour, which does not utilise any provisioned concurrency unless requested by the
customer.

The data in Table 3 reveals a 10% reduction in average response time when using
ASIP compared to the default scenario. Interestingly, this reduction jumps to 16%
when the warm-up threshold is set to 20%, as inferred by comparing the standard
scenario from Table 3 with the 20% threshold level results in Table 4.

However, the average response time may not be the most relevant metric, as cold starts
constitute only a small fraction of all the response times included in the average.
Therefore, various statistical measures were computed for the results.

An examination of Table 3, outlining percentile values for the various scenarios,
reveals no substantial improvement with ASIP in the 50th and 90th percentiles. In
fact, there is a slight increase in these response time measures, possibly due to
overhead activities from continuous function instance provisioning.

However, the 99th percentile response time considerably decreases with ASIP,
indicating a reduction in the cold start frequency. The data in Table 3 shows that ASIP
reduces the 99th percentile response time by 25% when set to a 40% threshold, and
reduces it by 31% when given a 20% threshold, as demonstrated in Table 4.

The reduced 99th percentile response time is a positive indication for ASIP, as it
suggests a reduction in the cold start frequency. This can be deduced from the
percentile response time since it shows the longest single response time after
excluding the 1% of requests which were the most time-consuming. More succinctly,
when the longest cold start durations are excluded, the highest response time is
significantly lower when using ASIP, suggesting that the number of cold starts is
decreased by ASIP. Additionally, ASIP not only reduces the response times but also
decreases the standard deviation, indicating more predictable response times – a trait
highly sought after in interactive systems (Chen et al., 2001).
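The reasoning above, that rare cold starts leave the median untouched but dominate the
tail, can be illustrated with a small nearest-rank percentile calculation. The numbers
below are fabricated for illustration and only loosely mimic the shape of the
collected data.

```python
import math

# Illustrative (fabricated) response times: 990 warm requests around 60 ms
# and 15 cold starts around 2 000 ms.
times = [60.0] * 990 + [2000.0] * 15

def percentile(data, p):
    """Nearest-rank percentile: the smallest value such that at least
    p per cent of the observations are at or below it."""
    ordered = sorted(data)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

print(percentile(times, 50))              # 60.0   -- the median hides cold starts
print(percentile(times, 99))              # 2000.0 -- the tail exposes them
print(round(sum(times) / len(times), 1))  # 89.0   -- the average dilutes them
```

Reducing the number of cold starts therefore shows up most clearly in the 99th
percentile and the standard deviation, exactly the measures in which ASIP improved.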

Further analysis can be performed through comparing the different test results,
presented in Table A1 in Appendix B. The five test results for the standard scenario
demonstrate a substantial variability in average response time, suggesting a potential
reliability issue with the standard scenario data. To ensure that this data was also
reliable, three additional traffic replays were performed for the standard scenario, in
addition to the existing five.

Results from the additional tests for the standard scenario are visible in Table A6 in
Appendix D. These additional results aligned with the previously obtained data in
terms of variation, corroborating the original findings.

Based on the data presented in Table 3 and Table 4, the monetary expense associated
with adopting ASIP can be estimated. Given that the fixed cost for provisioned
concurrency on a single function in AWS was $11.85 per month when the study was
conducted, a monthly cost estimation can be calculated based on the number of
serverless functions in the web application (Amazon Web Services, 2023b).

The control scenario with fixed concurrency in Table 3 represents the combined total
time during which all functions had provisioned concurrency enabled. Through
comparing the provisioned concurrency duration between the fixed control scenario
and ASIP in Table 3, it can be calculated that ASIP causes the duration to be 58% of
what the duration would be with provisioned concurrency constantly enabled for all
functions. Therefore, the monthly cost of implementing ASIP into an existing web
application can be calculated using the following formula:

C = 0.58 · F · 11.85 (2)

Here, F is the number of functions in the web application and C is the monthly
operational expense for ASIP, expressed in U.S. dollars. In formula (2), the value
0.58 is derived from the results for ASIP configured with a 40% warm-up threshold.
For other threshold levels, as depicted in Table 4, the corresponding value can be
calculated by dividing the provisioned concurrency duration for that threshold by the
provisioned concurrency duration for fixed concurrency in Table 3.
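Formula (2) can be expressed as a short worked example. The function name and the
example count of 12 serverless functions are hypothetical; the constants come directly
from the study's figures.

```python
def asip_monthly_cost(functions, duty_cycle=0.58, unit_cost_usd=11.85):
    """Formula (2): C = duty_cycle * F * unit_cost_usd, where duty_cycle is the
    fraction of time provisioned concurrency is enabled (0.58 at the 40%
    warm-up threshold) and 11.85 USD was AWS's fixed monthly provisioned
    concurrency cost per function at the time of the study."""
    return duty_cycle * functions * unit_cost_usd

# A hypothetical web application with 12 serverless functions:
print(round(asip_monthly_cost(12), 2))  # 82.48

# For the 20% threshold, the duty cycle is instead the ratio of the
# durations in Tables 4 and 3 (8537 s / 10499.70 s):
print(round(asip_monthly_cost(12, duty_cycle=8537 / 10499.70), 2))
```

As the second call shows, lowering the warm-up threshold raises the duty cycle and
therefore the estimated cost, in line with the trade-off discussed below.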

However, the cost-benefit analysis of adopting ASIP with a 40% warm-up threshold,
leading to functions having provisioned concurrency enabled 58% of the time, varies
depending on how much value is placed on the 10% reduction in response time.

It is important to consider that the validity of the formula in (2) is contingent upon the
predetermined workload which was replayed and the value of 0.58 may vary
significantly during different times of the day. However, the formula still serves as an
indication for the cost associated with ASIP when there is a certain type of incoming
traffic. During the 25-minute traffic replay, there was constant traffic to most
endpoints with the exception of the five-minute pause in the middle. During a period
of low traffic, the cost for ASIP would plausibly be significantly lower as fewer
functions would become subject to instance provisioning.

Table 4 highlights the difference in performance metrics between different warm-up
threshold levels. It demonstrates that the average response time decreases as the
threshold level decreases. This is because a lower warm-up threshold leads to more
functions being pre-initialised through provisioned concurrency, further reducing the
occurrence of cold starts. However, this increased use of provisioned concurrency
leads to additional monetary expenses. Consequently, this results in a trade-off
situation concerning the reduction in response time and the increased expense, where
users of ASIP must strike a balance between the two based on their unique conditions
and resources.
Through an analysis of the response time categorised by function, available in
Appendix C, an anomaly in a certain function can be identified through several of the
scenarios. The discrepancy is the response time for the function named getSales,
which under several scenarios and threshold levels, has a 90th percentile response
time that is significantly higher than for all other functions.
While it is difficult to draw conclusions regarding why the getSales function shows a
deviating response time only for some of the scenarios and threshold levels, there
could be several factors causing this anomaly. After analysing the interactive
graphical interface of the web application used in the study, it became evident that the
page from which the getSales function was triggered was only accessible through
navigation from a single page. This page had very limited content and functionality,
and during the traffic recording with test subjects, it was observed that most
participants rapidly navigated to the page triggering getSales after briefly visiting this
originating page.
Therefore, in the recorded real-time data, very few users were present simultaneously
on the originating page. This results in the navigation probability matrix having a low
probability for navigation from this page to the page which triggers getSales.
Therefore, it is plausible that the probability for invocation of the getSales function,
according to ASIP’s algorithm, was never high enough to cause the function to be
pre-initialised.
The code for the getSales function was also analysed to identify potentially time
consuming activities performed by the function. Specifically, certain database
operations may vary significantly in delay depending on the database schema and
query. However, no such potential cause could be identified, as the getSales function
only performed data retrieval operations equivalent to those performed by the other
functions included in the data.

Additionally, through analysing the collected results, it could be identified that
getSales was the function least frequently triggered out of all the functions included in
the study. As such, it is natural that the computational resource for the function
container is deallocated in between invocations, explaining the anomaly.
To summarise, analysis of the results suggests that predicting function invocations
based on real-time user traffic is a feasible method for mitigating cold starts.
However, several refinements could be applied to the approach developed in this study
in order to achieve more effective mitigation.

5 Discussion
This chapter contains discussions regarding the results from the study in relation to
existing research on the subject. Additionally, this chapter includes remarks regarding
limitations and implications of the study.

5.1 Result discussion


This section evaluates the results from the analysis in relation to the purpose and
research questions. In more precise terms, this section contains discussions regarding
causality with respect to the results, in combination with establishing connections to
existing research. The first research question posed by this study is:
[RQ1]: Under which circumstances can serverless function invocations be predicted
based on real-time user distribution across a web application?

Foundations for answering this research question can be found in the results
presented in Table 3. The table shows that ASIP causes a reduction in average
response time – indicating that it is able to predict serverless function invocations to a
certain extent. However, successful predictions are contingent upon specific
circumstances, including the method and design of ASIP.

During the development of ASIP, it was evident that a web monitoring tool was
necessary to collect the traffic data required to construct the probability matrix. A
web analytics tool such as Google Analytics or equivalent can provide extensive
information regarding user behaviour.

In this study, only a limited part of the data available from Google Analytics was
used to construct the matrix. However, a more refined approach could incorporate
more detailed data, typically available from web analytics services. It can also be
noted that once the matrix in this study was generated, it was fixed and not updated in
accordance with newer traffic data. Therefore, this is an aspect which could be
improved through continuously adjusting the matrix based on incoming traffic.

Beyond the probability matrix utilised in this research, factoring in seasonal
anomalies could potentially enhance the accuracy of predictions. For example,
e-commerce application traffic may experience spikes during holiday sales or special
promotional events (Ensafi et al., 2022). Predicting such patterns of increased
activity could enrich the traffic-based forecasting implemented by ASIP.

Jegannathan et al. (2022) provide a mitigation strategy that accounts for these
seasonal anomalies. This strategy, grounded in the statistical SARIMA model detailed
in the introductory chapter of this report, operates by identifying recurring traffic
patterns and anticipating their recurrence. That type of approach could be
effective in mitigating cold starts during recurring events such as holiday sales.
Nevertheless, irregular events such as specific promotional campaigns may pose a
greater challenge in pattern identification. In those instances, it could be beneficial to
manually input information into an artefact such as ASIP. This could be achieved, for
instance, by pre-informing the artefact about expected traffic fluctuations on certain
dates. This allows ASIP to base its predictions on a blend of this data and the
typically used real-time data and matrix.

However, manual intervention to inform ASIP about projected seasonal anomalies
necessitates additional effort. Therefore, a careful evaluation of whether the manual
input justifies its potential impact on reducing cold start instances is imperative.

During development, a custom solution based on Firebase Realtime Database was built
and used for monitoring real-time data. This is a minimalistic approach, providing
only real-time user quantities for each page, yet it is an integral component for
ASIP to function. It is possible that other real-time monitoring tools could be
utilised, or developed, potentially providing additional data useful for invocation
prediction.

Another component integral to the ability to predict function invocations was the
function-trigger map. As the probability matrix only contained data regarding
navigation probability between the web application’s respective pages, the pages
needed to be mapped to the functions which they triggered. This was required since a
visit to a certain web page may trigger either none or any number of serverless
functions. Consequently, a mapping between pages and functions needed to be
generated, referred to in this study as a function-trigger map.
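A function-trigger map of the kind described above could, in its simplest form, be a
lookup from page paths to function names. The page paths and all function names
except getSales are invented for this sketch; the study does not publish its map.

```python
# Hypothetical function-trigger map: each frontend page maps to the
# serverless functions a visit may invoke (zero, one, or several).
FUNCTION_TRIGGER_MAP = {
    "/login": [],                                       # static page, no backend call
    "/dashboard": ["getOverview", "getNotifications"],  # invented function names
    "/sales": ["getSales"],                             # function named in the study
}

def functions_for_page(page):
    """Return the functions a visit to `page` may trigger ([] if none or unknown)."""
    return FUNCTION_TRIGGER_MAP.get(page, [])

print(functions_for_page("/sales"))   # ['getSales']
print(functions_for_page("/login"))   # []
```

Keeping this structure as data rather than code is what makes the manual-maintenance
drawback discussed next so apparent: every frontend change must be mirrored here.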

A drawback to the function-trigger map is that it needs to be updated manually in
accordance with modifications made to the web application. Therefore, it may be
desirable to establish an automated method for maintaining and updating the
function-trigger map. A potential solution could involve analysing the code for the
web application frontend to identify which functions are triggered from which pages.

As many web application frontends are based on frontend frameworks such as
Angular, with which the web application used in this study was implemented, such
web applications follow a common code structure (Bhardwaz & Godha, 2023). This
facilitates the development of a solution that generates a function-trigger map by
analysing the frontend code, since such a solution can be integrated with any
application implemented through the framework.

Furthermore, a requirement not foreseen prior to conducting the study was the
warm-up threshold for ASIP's algorithm. Such a threshold is needed to balance the
cost-efficiency aspect. While three different fixed values for the threshold were tested
in this study, this is an area with potential for improvement. The threshold could be
determined dynamically through analysis of user traffic. Furthermore, it would be
possible to configure the warm-up threshold to deviate for certain pages where a cold
start could be more intrusive than on others. It could also be possible to adjust the
warm-up threshold based on the time of day, as traffic during certain hours may be
more or less prioritised, depending on the type of application.
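As a sketch of the dynamic configuration suggested above, a threshold function could combine a per-page override with a time-of-day schedule. All values below are assumptions for illustration; the study itself evaluated fixed thresholds only:

```python
# Hypothetical dynamic warm-up threshold. All values are illustrative
# assumptions; the study itself evaluated fixed thresholds only.
BASE_THRESHOLD = 0.20                 # the fixed 20% threshold from the study
PAGE_OVERRIDES = {"/checkout": 0.10}  # warm up more eagerly where cold starts hurt most
PEAK_HOURS = range(9, 18)             # hours whose traffic is prioritised

def warmup_threshold(page, hour):
    """Probability threshold above which a function is pre-initialised."""
    threshold = PAGE_OVERRIDES.get(page, BASE_THRESHOLD)
    if hour not in PEAK_HOURS:
        threshold += 0.10             # be more conservative off-peak to save cost
    return threshold

print(warmup_threshold("/checkout", 11))  # prints 0.1
```

A lower threshold means more eager pre-initialisation, so the sketch lowers it for pages where a cold start is most intrusive and raises it off-peak to limit cost.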

Conclusively, a number of prescriptive conclusions can be drawn in order to answer
the first research question. It is evident that a cold start mitigation strategy, based on
real-time traffic such as ASIP, requires at least a few different components to function
as intended. These components include: 1) a navigation probability matrix, or
equivalent basis for navigation prediction, 2) real-time user traffic monitoring, 3) a
function-trigger map, 4) an appropriate function warm-up threshold.
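The interplay between these four components can be sketched as a single decision step. The following is a simplified illustration, not the artefact's actual implementation: it compares an expected number of imminent invocations against a threshold, and all probabilities, pages, and function names are made up:

```python
# One decision step of an ASIP-like algorithm, combining 1) a navigation
# probability matrix, 2) real-time per-page user counts, 3) a
# function-trigger map, and 4) a warm-up threshold. All data is made up.
NAV_PROB = {                          # P(next page | current page)
    "/products": {"/checkout": 0.3, "/home": 0.7},
    "/home": {"/products": 0.5, "/home": 0.5},
}
TRIGGER_MAP = {"/checkout": ["createOrder"], "/products": ["listProducts"]}

def functions_to_warm(user_counts, threshold):
    """Pre-initialise functions whose expected imminent invocations reach the threshold."""
    expected = {}
    for page, users in user_counts.items():
        for next_page, prob in NAV_PROB.get(page, {}).items():
            for fn in TRIGGER_MAP.get(next_page, []):
                expected[fn] = expected.get(fn, 0.0) + users * prob
    return {fn for fn, exp in expected.items() if exp >= threshold}

print(functions_to_warm({"/products": 10}, threshold=1.0))  # prints {'createOrder'}
```

In the sketch, ten users on /products with a 0.3 probability of navigating to /checkout yield an expected 3.0 imminent createOrder invocations, which exceeds the threshold of 1.0, so that function would be pre-initialised.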

The second research question posed by this study is:

[RQ2]: Which cold start mitigation strategy provides the most significant delay
reduction?

This research question can be answered through comparing the performance of ASIP,
presented in Table 3, with existing scientific cold start mitigation strategies. While it
is evident from Table 3 that ASIP achieves a certain reduction in average response
time, this may not always be sufficient to justify the associated operational expense,
as a result of the use of provisioned concurrency.

A benefit of ASIP is that it can be adapted to any serverless platform technology
which provides the provisioned concurrency feature, which is also available under
different names at both Google Cloud and Microsoft Azure (Microsoft, 2023; Google
Cloud, 2023). Existing cold start mitigation artefacts typically do not provide this
flexibility since they are specific to a certain serverless platform technology.

Considering the characteristics and observed performance of ASIP, it is plausible that
it would be particularly beneficial in applications with predominantly low traffic that
occasionally experience sudden bursts of demand. Although the tests did not directly
explore this scenario, the behaviour and properties of ASIP suggest that it could
effectively diminish the occurrence of cold starts during unexpected demand surges,
all while maintaining a minimal additional cost. This is attributable to the fact that
provisioned concurrency would largely remain idle during periods of low traffic. This
insight is instrumental as it indicates the optimal conditions under which ASIP could
potentially provide the greatest cost-effectiveness.

Research by Vahidinia et al. (2022) into reducing cold starts has shown a reduction in
memory consumption by 12.73% and in average response time delay by 22.65%, as
described in the problem statement section of this report. That solution is therefore
more effective in reducing the cold start delay, since ASIP has achieved at most a
16% reduction in delay, with a 20% warm-up threshold which causes additional
monetary charges. However, the solution by Vahidinia et al. (2022) is specifically
catered to optimising a serverless platform called Apache OpenWhisk.

While the solution by Vahidinia et al. (2022) may potentially be adapted to similar
open-source serverless technologies, this would require extensive work. There is also
no possibility to adopt the solution when using cloud services since they employ
proprietary technology for their respective serverless platforms, in contrast to
open-source platforms. This is one of ASIP’s advantageous traits, as it can be
migrated to any of the major cloud service providers such as AWS, Azure, and
Google Cloud, and also to any open-source serverless platform providing a feature
equivalent to provisioned concurrency.

Another cold start mitigation approach, by Lin and Glikson (2019), is based on
maintaining a pool of function instances in stand-by, ready to handle incoming
requests. As with most mitigation approaches, these pre-warmed instances need to be
scaled in accordance with demand to avoid excessive costs. This pool-based approach
is claimed to reduce the 99th percentile response time up to 85%. This can be
compared to ASIP, which shows a reduction of the 99th percentile by 31% when set
to a 20% warm-up threshold.

However, as with most existing approaches, the pool-based approach from Lin and
Glikson (2019) is tailored specifically towards a certain serverless platform, in this
case a platform called Knative, which is based on Kubernetes container technology.
Due to this limitation, ASIP may be favourable despite not providing an equally
effective mitigation approach, as it can be adopted in more circumstances.

A paper by Solaiman and Adnan (2020) proposes a container management architecture
called WLEC to mitigate the cold start problem in serverless computing. The authors
propose a queue-based solution to minimise cold start latency. The evaluation results
show that WLEC decreases cold start occurrences by nearly 27%. While the decrease
in cold start occurrences cannot be directly compared against ASIP’s decrease in
average response time, it serves as an indication of the effectiveness of the solution
presented by Solaiman & Adnan.

Another mitigation approach, by Lee et al. (2021), is based on fusing functions that
are chained together in a workflow and typically invoked sequentially. Chaining
functions to form a workflow is a specific use case for serverless functions, well
suited for machine learning activities.

For one of the studied workflows, consisting of nine chained functions, this method
reduced cold start delay by approximately 50%. However, it is important to note that
in the least favourable scenario, the response time of a workflow increased by 14%.

The function fusion approach, while effective in certain scenarios, presents a
significant drawback. As outlined by Lee et al. (2021), it necessitates a deep
understanding and meticulous planning, which can be time-consuming. In contrast,
the ASIP methodology does not require such extensive understanding or planning
since it does not depend on specific architectural patterns.

Furthermore, function fusion introduces additional complexity into the coding
process. To perform the same tasks, developers must write extra code, which,
according to Lagerström et al. (2017), could lead to undesirable consequences in the
long term. Complex code tends to give rise to security vulnerabilities, bugs, and
errors, potentially resulting in higher costs or security risks. In comparison, ASIP
avoids such issues. It does not necessitate any additional coding beyond the
implementation of ASIP itself, which can operate in an isolated environment separate
from the serverless functions.

Li et al. (2021) have proposed another approach, called Pagurus, also described in the
problem statement of this report. The solution is grounded on the principle of
resource sharing among containers. The core principle is to optimise the utilisation of
idle, but already warm, containers, thereby improving the overall efficiency of the
system.

Pagurus relies on the ability to let different serverless functions be executed within
the same virtual container, i.e. the container is allowed to be reused. This is useful
when different serverless functions share the same dependencies, primarily in terms of
external code libraries. In such cases, the cold start is completely eliminated, since a
virtual container that can handle the function invocation is already pre-initialised,
despite the fact that it was previously used for a different function. The functions do
not need to share precisely the same dependencies in order to share containers;
however, the more mutual dependencies, the more effective the cold start
mitigation (Li et al., 2021).

It is stated that under optimal circumstances, Pagurus decreases cold start delays
by 76%; however, the average response time is increased by 0.48%. This increase can
be attributed to the fact that Pagurus relies on scheduling container sharing, which
introduces overhead in some cases.

Conclusively, there is no universal cold start mitigation that can be adapted to any
platform. Therefore, the choice of mitigation strategy is limited by the serverless
platform technology. However, ASIP provides certain flexibility in this regard, as it
can be integrated into any serverless platform that allows for configuration of
provisioned concurrency.

In terms of effectiveness, the Pagurus approach by Li et al. (2021) stands out as the
most promising candidate as it significantly reduces a typical cold start by 76% while
utilising limited resources itself. Therefore, virtually no additional charge is
introduced when using Pagurus.

However, as Pagurus is based on the container reuse policy, with the prerequisite that
multiple serverless functions share dependencies, it requires a vast number of
serverless functions to be deployed in the same environment. This increases the
probability that a function which shares dependencies with another exists. This is the
case for prominent cloud providers such as AWS, as their services are in popular
demand. In fact, several serverless platforms such as AWS Lambda, Azure Functions
and Apache OpenWhisk do utilise the concept of container reuse in order to minimise
cold start durations and resource consumption (Kumari & Sahoo, 2023).

Therefore, it is evident that a combination of cold start strategies is typically
favourable, since this results in the most effective cold start mitigation. As this study
has been performed with the AWS Lambda platform, the results from the study are
influenced by a combination of ASIP and the internal mitigation strategies within the
Lambda platform.

As internal mitigation strategies are part of proprietary technology, it is difficult to
examine how these influence the results. Therefore, it may be valuable to perform
this study with different cloud providers, to elucidate and evaluate potential
differences in the results when their internal mitigation approaches are paired with
ASIP.

5.2 Method discussion


This section critically examines the methodologies utilised in the study, evaluating the
degree to which the research objectives and inquiries were adequately addressed.
Moreover, it dissects the advantages and potential limitations inherent in the chosen
research methods, with an emphasis on considerations of validity and reliability.
The research method selection for this study, namely the deductive research approach,
proved to be a suitable choice given the nature of the research problem. The focus was
on reducing a measurable time delay, which required a systematic and numerical
method. In this regard, the deductive approach provided a robust framework, allowing
for the development and testing of hypotheses derived from existing theories.
The use of the design science methodology to develop and evaluate the Adaptive
Serverless Invocation Predictor (ASIP) software artefact enabled a practical
investigation of the research problem. This approach allowed for empirical data
generation, which was essential for the study's purpose. Furthermore, the use of a
quantitative analysis method was appropriate given the focus on measurable
performance metrics.
However, the application of this methodology may have been limited by its
specificity. Design science research typically requires a high level of expert
knowledge, potentially limiting the reproducibility of the study. Additionally, while
quantitative methods provided clear, numerical data, they may have overlooked more
nuanced or qualitative aspects of the research problem, such as the user experience of
serverless applications.
For example, qualitative methods could be used to explore user experience, which is
not easily quantified but can provide valuable insights into how users interact with
serverless applications and how their experience might be improved. Qualitative data,
such as user feedback or observations of user behaviour, could also provide additional
context that might inform the development of solutions like ASIP (Mackey & Gass,
2015).
While the focus of this study was on reducing measurable time delay, future research
could consider a mixed-methods approach, combining quantitative and qualitative
methods. This could provide a more comprehensive understanding of the research
problem, combining the measurable performance improvements identified in this
study with a qualitative understanding of the user experience.
The data collection methods employed in this study, particularly the use of traffic data
from study participants and an open-source web application, offered a realistic basis
for testing ASIP. The combination of different data collection methods, including a
proxy server, script recording, and Google Analytics, provided a comprehensive set of
data that addressed the research questions. However, the reliance on study participants
for traffic data collection could have introduced bias or variability, potentially
affecting the reliability of the findings.
Furthermore, while entrusting the traffic replaying script with the task of collecting
response time data was effective and yielded the necessary data, it may have been
advantageous to utilise an established solution to guarantee the precision of the
collected data. Nonetheless, given the straightforward nature of data collection related
to response times, the simplicity of the script streamlined the process of verifying its
proper functioning.
The collection of provisioned concurrency duration, however, was not as trivial, as it
required a combination of multiple AWS services and a complex SQL query. This
means that this data may not have the highest accuracy, partly because changes in
provisioned concurrency are not instant and a few seconds during the instance
initialisation phase could therefore not be accounted for.
Since the method for collection of provisioned concurrency duration was developed
specifically for this study, this method has not been tested before and can therefore not
be asserted to provide reliable data. It was therefore important to ensure that the
method was accurate through other means. Here, the control scenario with fixed
concurrency proved instrumental, as the concurrency duration for that scenario could
be precalculated given the number of provisioned function instances (seven in this
case) and that each traffic replay lasted 25 minutes.
Subsequently, the expected concurrency duration was compared against the duration
measured through AWS services and the result was a close match. The anticipated
duration was 10,500.00 seconds and the average measured duration was 10,499.70
seconds, as shown in Table A3 in Appendix B, underscoring the accuracy of the
measurements.
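The precalculation referred to above is straightforward: seven provisioned instances, each held for the full 25-minute replay, give 7 × 25 × 60 = 10,500 seconds of provisioned concurrency:

```python
# Expected provisioned-concurrency duration for the control scenario:
# seven fixed instances, each provisioned for the full 25-minute replay.
instances = 7
replay_minutes = 25
expected_seconds = instances * replay_minutes * 60
print(expected_seconds)  # prints 10500
```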
The decision to separate the data used for matrix construction from the data used for
replay was necessary to ensure reliable results. Although this resulted in a smaller
sample size, the impact on the overall validity of the study was minimised by careful
management of the data.
The analysis of ASIP's performance under different provisioned concurrency
configurations was a strength of the study. By comparing ASIP's performance to both
a default and a control scenario, the study provided a robust evaluation of ASIP.
However, the choice of a fixed probability threshold for ASIP's function
pre-initialization might have limited the study's ability to fully evaluate ASIP's
performance under various realistic conditions.
Furthermore, the study did not explore the possibility of calculating an optimal
threshold or dynamically configuring the threshold according to real-time user traffic.
This represents an area for future improvement and research. While testing ASIP with
different thresholds provided some insights, the study's reliance on a single web
application and a single workload for evaluation may not have fully captured ASIP's
effectiveness under different conditions.
The cost analysis of using ASIP was an essential part of the research, given the
practical implications of adopting ASIP in the industry. Although the study factored in
the cost of provisioned concurrency, it did not include other variable costs, such as
cost per request and execution duration. While these costs are not influenced by ASIP,
their consideration might have provided a more holistic view of the cost implications
of ASIP adoption.
The study's findings, indicating the effectiveness of ASIP in reducing response time,
were based on a significant threshold of 20%. Although this threshold was informed
by previous research, there is a possibility that it could limit the interpretation of
results.
In terms of validity and reliability, the study was designed with these requirements in
mind. The research methodology was structured and data collection methods were
systematic, contributing to the study's reliability. However, as mentioned, factors such
as the choice of a fixed threshold and the reliance on a single web application and
workload for evaluation may have affected the validity of the results. Additionally,
potential external factors like network latency, server conditions, or variability in
participant behaviour could have influenced the results, despite not being under the
control of the study.
In retrospect, there are certain aspects that could have been done differently to
improve the study. For instance, using a more diverse set of web applications and
workloads for evaluation could have provided a broader perspective on ASIP's
performance. Also, exploring the possibility of dynamically configuring the threshold
based on real-time user traffic could have allowed for a more nuanced understanding
of ASIP's effectiveness.
In terms of the study's purpose, the research methods chosen allowed for the
development and evaluation of a software artefact aimed at reducing response time in
serverless applications. The findings indicated that ASIP was effective in achieving
this goal, demonstrating the suitability of the methods employed. However, the
limitations noted suggest that there is room for further refinement of the methodology
to fully address the research questions.
Overall, the research methods used in this study were largely effective in addressing
the research problem and questions. The study demonstrated the potential of ASIP as
a solution for reducing response time in serverless applications. However, the
limitations identified provide valuable insights for future research, indicating areas
where methodological adjustments could enhance the validity and reliability of the
findings.

6 Conclusions and further research
This chapter encompasses discussions regarding the results of the study in relation to
previous research within the same area. Additionally, this chapter accounts for
potential implications of the study.

6.1 Conclusions
The study's focus was on addressing the critical issue of cold starts in serverless
computing, a prevalent issue in the existing cloud-based service environment.
Contrary to the prevalent reliance on historical data to predict user behaviour, the
conducted study introduced a real-time user traffic monitoring approach to anticipate
computational resource demand. This has proved to be an innovative contribution to
the problem statement, paving the way for enhanced efficiency and performance in
serverless computing.

6.1.1 Practical implications

In this section, practical implications with regards to the industry, the public sector
and society are accounted for. Such implications are discussed in relation to the study
as well as the broader research area addressed by the study.
The findings from this study shed light on the significance of mitigating cold starts in
serverless computing and could have far-reaching practical implications in various
sectors. The direct beneficiaries would be companies in the tech industry, especially
those which rely heavily on cloud-based services. The reduction in cold starts would
not only enhance the performance of their serverless applications, but also lead to cost
savings, as serverless pricing models are often based on the execution time of the
functions.
This research could also influence decisions made in the public sector, particularly
within government departments and agencies that utilise cloud computing for their
digital transformation initiatives. By applying the strategies discussed in this study to
reduce cold starts, these entities could improve the efficiency of their digital services,
which would ultimately lead to better public service delivery.
Moreover, the implications of this study extend to societal issues, particularly in
relation to environmental sustainability. In 2015, data centres were responsible for the
consumption of 3% of all electricity generated globally. This consumption is expected
to increase dramatically (Alhindi et al., 2022).
As indicated in the paper by Alhindi et al. (2022), serverless functions can
significantly reduce power consumption in data centres, which are known to account
for a large portion of the world's energy consumption. The mitigation of serverless
cold starts presented in this study could encourage more organisations to adopt
serverless computing technology as a substitute for traditional constantly-running
servers. This would contribute towards the global efforts to combat climate change by
lowering the carbon footprint associated with digital technologies.
Furthermore, the reduction in energy consumption could also lead to cost savings for
data centre operators, which could potentially lower the costs of cloud services for
end-users. In developing regions, where access to digital technologies is often limited
by cost, this could have profound implications by making digital services more
affordable and hence accessible to a wider population (World Bank, 2016).
In conclusion, the practical implications of this study are multi-dimensional, ranging
from enhancing the performance of serverless applications, to contributing to
environmental sustainability and social equity. The strategies discussed in this study
for mitigating serverless cold starts thus hold significant promise for the future of
cloud computing.

6.1.2 Scientific implications

This section contains discussions regarding the potential implications which this study
may engender with respect to the scientific community.
The findings of this study could stimulate further research into refining serverless
architectures. By demonstrating a method to reduce cold starts, this study paves the
way for increased efficiency and performance in serverless computing.
This study also underscores the potential of leveraging real-time user traffic
monitoring to optimise computing resources. This technique could be applied in other
areas of cloud computing and networking research, prompting scientists to explore
innovative ways to harness real-time data for system optimization. Utilising real-time
data is also a novel approach within research conducted specifically to address the
cold start problem. As such, it can potentially encourage further research on cold start
mitigation based on real-time monitoring, as opposed to existing methods centred
around historical data.
The method used in this study to evaluate the software artefact provides a framework
for others to benchmark and assess similar tools and strategies. This could lead to
more standardised evaluation methods in serverless computing research, enabling
more meaningful comparisons between studies and more cumulative scientific
progress.
By quantifying the reduction in operational costs associated with the implementation
of the developed artefact, this study contributes to research on cost efficiency in cloud
services. This could spur further investigations into cost-saving strategies, an area of
significant interest given the growing use of cloud services in both academia and
industry.

6.2 Further research

This section delves into potential avenues for enhancing the scope of this study by
proposing novel areas for exploration, drawing insights from the conducted research.

A promising approach to achieve more effective cold start mitigation involves the
amalgamation of this study's method with pre-existing cold start mitigation
techniques. This possibility was not investigated in this study, presenting an
opportunity for future research.

A more sophisticated approach to generating the matrix in Figure 1 may result in
more accurate function invocation predictions while simultaneously reducing the
manual effort required to implement ASIP. Such research may belong to the machine
learning field which falls outside the scope of this study, consequently the potential
software-aided generation of this matrix is considered future work which can be
performed to augment this study.
An alternative approach might involve replacing the matrix entirely with a more
comprehensive solution encompassing additional dimensions. One such dimension
could be time, equipping the software artefact with the data needed to account for
variances in user navigation frequency. Furthermore, the dataset for ASIP could be
enriched by capturing more granular data on user path exploration throughout their
entire website visit.
Utilisation of detailed path exploration reports, available in tools like Google
Analytics, could facilitate the prediction of function invocations more accurately and
further in advance. These reports could provide insights into the probability of
navigation not just from page A to page B, but also the likelihood of a user
subsequently progressing from page B to page C. This would enable the prediction of
a function invocation triggered from page C while the user is on page A, thereby
facilitating predictions further in advance, providing another interesting dimension for
future research.
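Under a Markov assumption, such two-step predictions follow directly from the one-step probabilities: the probability of reaching page C from page A in two navigations is the corresponding entry of the squared transition matrix. The following sketch uses made-up probabilities:

```python
# Two-step navigation probability under a Markov assumption.
# Page indices: 0 = A, 1 = B, 2 = C. Probabilities are made up.
P = [
    [0.0, 0.8, 0.2],  # from A: mostly to B
    [0.1, 0.0, 0.9],  # from B: mostly to C
    [0.5, 0.5, 0.0],  # from C
]

def two_step(p, i, j):
    """P(page j after two navigations | currently on page i): entry (i, j) of P squared."""
    return sum(p[i][k] * p[k][j] for k in range(len(p)))

# P(A -> ... -> C in two steps) = 0.8 * 0.9 + 0.2 * 0.0 = 0.72
print(two_step(P, 0, 2))
```

A real implementation would estimate these one-step probabilities from path exploration reports rather than hard-coding them, but the two-step computation itself would be identical.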
For the further refinement of ASIP, it might be beneficial to consider the invocation
duration for each serverless function, given its impact on the function's ability to
handle requests over time. This change could lead to a more efficient and predictive
system, pushing the boundaries of what is currently possible in serverless computing.
Conclusively, this section highlights multiple aspects of the proposed software artefact
that can be improved. In combination, these components form a proof of concept for a
refined advancement of ASIP holding significantly greater potential in terms of
effectiveness in reducing cold starts as well as adoption in the industry.

In closing, this study has highlighted the emerging potential of leveraging real-time
data analytics in the field of serverless computing, specifically in enhancing cold start
mitigation techniques. The convergence of these technologies opens up new
possibilities and promising avenues for future research. By pushing the boundaries of
this technological synthesis, the intention is not just to refine existing cold start
mitigation techniques but also to redefine the broader scope of serverless computing.
This study serves as a stepping stone in the broader continuum of technological
development. Looking ahead, it is anticipated that serverless computing, fortified by
the findings of this research, will assume an increasingly integral role in propelling
innovation in the digital domain. While the full potential of these technologies has yet
to be harnessed, the journey of discovery is ongoing, underscoring that the value of
scientific research lies not only in established knowledge but also in the pursuit of
new understanding.

7 References
Agarwal, S., Rodriguez, M. A., & Buyya, R. (2021). A Reinforcement Learning
Approach to Reduce Serverless Function Cold Start Frequency. 2021
IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet
Computing (CCGrid). https://doi.org/10.1109/ccgrid51090.2021.00097
Ajzen, I. (1991). The Theory of Planned Behavior. Organizational Behavior and
Human Decision Processes, 50(2), 179–211.
https://doi.org/10.1016/0749-5978(91)90020-T
Alhindi, A., Djemame, K., & Heravan, F. B. (2022). On the Power Consumption of
Serverless Functions: An Evaluation of OpenFaaS. 2022 IEEE/ACM 15th
International Conference on Utility and Cloud Computing (UCC).
https://doi.org/10.1109/ucc56403.2022.00064
Amazon Web Services. (2023a). Lambda function scaling - AWS Lambda. Retrieved
February 17, 2023, from
https://docs.aws.amazon.com/lambda/latest/dg/lambda-concurrency.html
Amazon Web Services. (2023b). Serverless Computing – AWS Lambda Pricing.
Retrieved May 14, 2023, from https://aws.amazon.com/lambda/pricing/
Analytics Help. (2023). [GA4] Path exploration. Retrieved March 13, 2023, from
https://support.google.com/analytics/answer/9317498?hl=en
Baldini, I., Castro, P., Chang, K., Cheng, P., Fink, S., Ishakian, V., Mitchell, N.,
Muthusamy, V., Rabbah, R., Slominski, A., & Suter, P. (2017). Serverless
Computing: Current Trends and Open Problems. Research Advances in Cloud
Computing, 1–20. https://doi.org/10.1007/978-981-10-5026-8_1
Bekavac, I., & Garbin Praničević, D. (2015). Web analytics tools and web metrics
tools: An overview and comparative analysis. Croatian Operational Research
Review, 6(2), 373–386. https://doi.org/10.17535/crorr.2015.0029
Bhardwaz, S., & Godha, R. (2023). Svelte.js: The most loved framework today. 2023
2nd International Conference for Innovation in Technology (INOCON), (pp.
1-7). https://doi.org/10.1109/INOCON57975.2023.10101104
Blitzstein, J. K., & Hwang, J. (2019). Introduction to probability. CRC Press.
Bryman, A. (2016). Social research methods (5th ed.). Oxford University Press.
Chahal, D., Ramesh, M., Ojha, R., & Singhal, R. (2021). High Performance
Serverless Architecture for Deep Learning Workflows. 2021 IEEE/ACM 21st
International Symposium on Cluster, Cloud and Internet Computing (CCGrid).
https://doi.org/10.1109/ccgrid51090.2021.00096
Chen, X., Mohapatra, P., & Chen, H. (2001). An admission control scheme for
predictable server response time for web accesses. 10th International
Conference on World Wide Web. https://doi.org/10.1145/371920.372156

Creswell, J. W. (2014). Research design: Qualitative, quantitative, and mixed methods
approaches (4th ed.). SAGE Publications.
Doherty, W. J., & Thadani, A. J. (1982). Design and response time in man-computer
conversational transactions. IBM Systems Journal, 21(1), 3-21.
Eismann, S., Scheuner, J., van Eyk, E., Schwinger, M., Grohmann, J., Abad, C. L., &
Iosup, A. (2020). Serverless Applications: Why, When, and How? IEEE
Software, 38(1), 32–. https://doi.org/10.1109/MS.2020.3023302
Ensafi, Y., Amin, S. H., Zhang, G., & Shah, B. (2022). Time-series forecasting of
seasonal items sales using machine learning – A comparative analysis.
International Journal of Information Management Data Insights, 2(1),
100058. https://doi.org/10.1016/j.jjimei.2022.100058
Fellah, A., & Bandi, A. (2021). Microservice-based Architectures: An Evolutionary
Software Development Model. EPiC series in Computing.
https://doi.org/10.29007/1gx5
Google Cloud. (2023). Concurrency. Retrieved May 15, 2023, from
https://cloud.google.com/functions/docs/configuring/concurrency
Ivan, C., Vasile, R., & Dadarlat, V. (2019). Serverless Computing: An Investigation of
Deployment Environments for Web APIs. Computers, 8(2), 50.
https://doi.org/10.3390/computers8020050
Iivari, J. (2010). Twelve theses on design science research in information systems.
Integrated Series on Information Systems (Vol. 28, pp. 43–62).
https://doi.org/10.1007/978-1-4419-5653-8_5
Jegannathan, A. P., Saha, R., & Addya, S. K. (2022). A Time Series Forecasting
Approach to Minimize Cold Start Time in Cloud-Serverless Platform.
ArXiv:2206.15176 [Cs]. https://arxiv.org/abs/2206.15176
Kitchenham, B. A., Budgen, D., & Pearl Brereton, O. (2011). Using mapping studies
as the basis for further research – A participant-observer case study.
Information and Software Technology, 53(6), 638–651.
https://doi.org/10.1016/j.infsof.2010.12.011
Kumari, A., & Sahoo, B. (2023). ACPM: Adaptive container provisioning model to
	mitigate serverless cold-start. Cluster Computing.
	https://doi.org/10.1007/s10586-023-04016-8
Lagerström, R., Baldwin, C. Y., MacCormack, A., Sturtevant, D., & Doolan, L.
(2017). Exploring the Relationship Between Architecture Coupling and
Software Vulnerabilities. Engineering Secure Software and Systems, 53–69.
https://doi.org/10.1007/978-3-319-62105-0_4
Lee, S., Yoon, D., Yeo, S., & Oh, S. (2021). Mitigating Cold Start Problem in
Serverless Computing with Function Fusion. Sensors, 21(24), 8416.
https://doi.org/10.3390/s21248416

Li, Z., Chen, Q., & Guo, M. (2021, August 25). Pagurus: Eliminating Cold Startup in
Serverless Computing with Inter-Action Container Sharing. ArXiv.org.
https://doi.org/10.48550/arXiv.2108.11240
Lin, P.-M., & Glikson, A. (2019). Mitigating Cold Starts in Serverless Platforms: A
Pool-Based Approach. ArXiv.org. https://doi.org/10.48550/arxiv.1903.12221
Lloyd, W., Ramesh, S., Chinthalapati, S., Ly, L., & Pallickara, S. (2018). Serverless
Computing: An Investigation of Factors Influencing Microservice
Performance. 2018 IEEE International Conference on Cloud Engineering
(IC2E). https://doi.org/10.1109/ic2e.2018.00039
Mackey, A., & Gass, S. M. (2015). Second Language Research. Routledge.
https://doi.org/10.4324/9781315750606
Mason, M. (2010). Sample Size and Saturation in PhD Studies Using Qualitative
Interviews. Forum Qualitative Sozialforschung / Forum: Qualitative Social
Research, 11(3). https://doi.org/10.17169/fqs-11.3.1428
Microsoft. (2023). Concurrency in Azure Functions. Retrieved May 15, 2023, from
	https://learn.microsoft.com/en-us/azure/azure-functions/functions-concurrency
Montgomery, D. C. (2017). Design and analysis of experiments. John Wiley & Sons.
Peukert, C., Bechtold, S., Batikas, M., & Kretschmer, T. (2022). Regulatory Spillovers
and Data Governance: Evidence from the GDPR. Marketing Science.
https://doi.org/10.1287/mksc.2021.1339
Solaiman, K., & Adnan, M. A. (2020). WLEC: A Not So Cold Architecture to
Mitigate Cold Start Problem in Serverless Computing. 2020 IEEE
International Conference on Cloud Engineering (IC2E).
https://doi.org/10.1109/ic2e48712.2020.00022

Vahidinia, P., Farahani, B., & Aliee, F. S. (2022). Mitigating Cold Start Problem in
	Serverless Computing: A Reinforcement Learning Approach. IEEE Internet of
	Things Journal, 1–1. https://doi.org/10.1109/jiot.2022.3165127
Wen, J., Chen, Z., Li, D., Chen, J., Liu, Y., Wang, H., Jin, X., & Liu, X. (2022).
LambdaLite: Application-Level Optimization for Cold Start Latency in
Serverless Computing. ArXiv:2207.08175 [Cs].
https://arxiv.org/abs/2207.08175
World Bank. (2016). World Development Report 2016: Digital Dividends.
Washington, DC: World Bank. https://doi.org/10.1596/978-1-4648-0671-1

8 Appendices

Appendix A: Information provided to study participants

Information for test participants


Details regarding the purpose of the study, the scope of the data collection involved,
and other terms.

Purpose & objectives


The study is conducted by William Branth & Gustav Persson as part of a bachelor's
thesis at the School of Engineering, Jönköping University. The subject of the study is
cold starts in serverless functions, which is an active area of research. Upon
completion, the study will be published in the Digitala Vetenskapliga Arkivet (DiVA)
portal.

Confidentiality and anonymity


By participating in this scientific study, you consent to the collection of your name
for the purpose of organising the collected data. Otherwise, participation in the study
is entirely anonymous.

Voluntary participation
Participation in this study is voluntary, and participants have the right to withdraw
their consent and end their participation at any time during the study without any
negative consequences. All personal information will be handled with the utmost
confidentiality and integrity.

Contact information
For further questions about the study, or to report any concerns or problems, please
use the contact details below.

Gustav Persson, pegu20ji@student.ju.se


William Branth, brwi20vx@student.ju.se

Thank you
Thank you for participating in this scientific study. By taking part, you have
contributed to advancing scientific development in the technological field.

Signature
It is hereby certified that
has read and understood the information in this document regarding participation in
the scientific study described.

Appendix B: Performance metrics from traffic replaying
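The statistical measures reported in the tables below (average, standard deviation, and the 50th, 90th, and 99th response-time percentiles) can be reproduced from raw response-time samples. The following sketch is illustrative only and is not the tooling used in the study; it uses a nearest-rank percentile definition, and interpolating definitions yield slightly different values.

```python
import statistics


def response_time_summary(samples_ms):
    """Summarise response-time samples (in ms) with the measures used in Appendix B."""
    ordered = sorted(samples_ms)

    def percentile(p):
        # Nearest-rank percentile: smallest sample covering p% of the data.
        rank = max(1, round(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {
        "average": statistics.fmean(ordered),
        "std_dev": statistics.stdev(ordered),  # sample standard deviation
        "p50": percentile(50),
        "p90": percentile(90),
        "p99": percentile(99),
    }


# Hypothetical samples: mostly warm invocations plus one cold-start spike.
summary = response_time_summary([62.4, 63.1, 64.0, 145.2, 1948.8])
print(summary)
```

Note how a single cold start dominates the 99th percentile while leaving the median almost untouched, which is the pattern visible throughout the tables.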

Table A1
Statistical measures for standard scenario without ASIP
(response-time columns in ms)

Test #    Average   Std. Dev.   50th Pct.   90th Pct.   99th Pct.   Concurrency Duration (s)
1         106.36    271.76      62.35       145.23      1948.84     0
2         101.91    265.80      63.48       145.39      1930.15     0
3         137.52    270.75      63.29       147.25      1907.25     0
4         112.79    255.13      63.51       148.02      1921.27     0
5         142.02    253.36      62.52       145.11      1873.29     0
Average   120.12    263.36      63.03       146.20      1916.16     0

Table A2
Statistical measures for ASIP with 40% warm-up threshold
(response-time columns in ms)

Test #    Average   Std. Dev.   50th Pct.   90th Pct.   99th Pct.   Concurrency Duration (s)
1         113.31    265.81      63.07       149.43      1927.55     5984
2         110.60    181.64      62.24       143.21      533.81      5716
3         102.49    191.47      78.16       160.44      1212.61     6040
4         106.76    200.63      62.44       144.08      1504.57     5972
5         106.15    288.10      63.40       151.38      1894.65     5532
Average   107.862   225.53      65.862      149.708     1414.64     5848.8

Table A3
Statistical measures for control scenario with fixed concurrency
(response-time columns in ms)

Test #    Average   Std. Dev.   50th Pct.   90th Pct.   99th Pct.   Concurrency Duration (s)
1         98.05     171.26      62.56       121.29      1233.84     10500.63
2         98.90     151.21      62.22       127.53      617.58      10504.52
3         98.61     152.63      62.08       120.68      563.11      10497.50
4         97.10     162.11      61.26       116.45      637.94      10506.34
5         103.08    184.54      61.35       126.42      1336.09     10489.50
Average   99.148    164.35      61.894      122.474     877.71      10499.70

Table A4
Statistical measures for ASIP with 20% warm-up threshold
(response-time columns in ms)

Test #    Average   Std. Dev.   50th Pct.   90th Pct.   99th Pct.   Concurrency Duration (s)
1         97.42     186.70      61.90       123.13      1133.04     8015
2         100.21    211.28      61.27       124.65      1351.83     9120
3         101.20    197.63      60.92       117.52      1243.89     8810
4         100.46    214.41      62.57       130.18      1473.29     8445
5         103.30    213.44      62.49       122.73      1369.64     8295
Average   100.518   204.692     61.83       123.642     1314.34     8537

Table A5
Statistical measures for ASIP with 60% warm-up threshold
(response-time columns in ms)

Test #    Average   Std. Dev.   50th Pct.   90th Pct.   99th Pct.   Concurrency Duration (s)
1         125.51    241.99      61.94       134.47      1503.17     4764
2         110.73    236.51      62.44       125.79      1455.42     5116
3         123.42    228.39      61.45       120.91      1443.84     5092
4         114.59    223.29      61.88       131.61      1544.06     5124
5         102.09    223.00      61.97       126.25      1437.25     5208
Average   115.268   230.636     61.936      127.806     1476.75     5060.8

Appendix C: Difference in response time between functions

Figure A1
Response time per function for standard scenario without ASIP

Figure A2
Response time per function with ASIP at 40% warm-up threshold

Figure A3
Response time per function for control scenario with fixed concurrency

Figure A4
Response time per function with ASIP at 20% warm-up threshold

Figure A5
Response time per function with ASIP at 60% warm-up threshold

Appendix D: Additional traffic replaying to ensure reliability

Table A6
Additional tests for standard scenario without ASIP

Test #    Average Response Time (ms)
1         127.08
2         140.10
3         105.04
Average   124.07

