Survey On Autonomic Workload Management: Algorithms, Techniques and Models

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 7, JULY 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG

Basit Raza, Abdul Mateen, M M Awais and Muhammad Sher
Abstract: In Database Management Systems (DBMSs) and Data Warehouses (DWs), workload is an important entity that must be managed properly and responsively. Earlier, DBMS and DW workloads were managed automatically, but with the growing size of data and the increasing number of users, handling the workload automatically has become difficult or impossible. Workload management has become a challenge for the database community and vendors. It is a challenge to identify the queries that create problems and/or resource contention and, further, to know this before the queries execute so that a decision can be taken to suspend or kill the problematic queries. This paper provides the basis and achievements of autonomic computing in workload management. We surveyed the literature on workload management in DBMSs and DWs and categorized it by self-* characteristics. These self-* characteristics include self-inspection, self-optimization, self-configuration, self-organization, self-prediction and self-adaptation. The survey provides comparative analyses and highlights the shortcomings of previous workload management techniques, algorithms and models. Finally, the surveyed literature is categorized on the basis of workload type and autonomic perspective.
Index Terms: Workload, autonomic, self-optimization, self-configuration, self-inspection, self-prediction, self-organization and self-adaptation
1 INTRODUCTION
THE growing complexity of computer systems emphasizes the need for developing autonomic computing (AC) systems. An AC system evolves through five stages of autonomicity: Basic, Managed, Predictive, Adaptive and Autonomic [1]. AC faces conceptual, architectural, middleware and application challenges [2]. Autonomic computing systems have five components: Negotiation, Execution, Observation, Deliberation and Failure Recovery [3]. The promising benefits of AC are also required in DBMSs and DWs. Increases in data size, the need for maximum functionality and the shortage of skilled database administrators (DBAs) have motivated the DBMS and DW industry to develop autonomic DBMSs. In this regard, early research projects focused on the design of table indexes and the optimization of memory. More recent research focuses on the development of intelligent tools such as expert systems, configuration management, performance tuning tools and easy-to-use interfaces. The AC characteristics (self-*) have been presented in [2, 4, 5]:
Self-Optimization - The AC system itself performs the related activities and executes utilities efficiently, taking into account the workload and the required and available resources, in order to improve system performance.
Self-Configuration - An AC system has the ability to configure itself according to the desired goals/objectives. It dynamically adapts its configuration, recognizes modifications in the environment and can reconfigure itself without any human interference.
Self-Healing - The AC system always remains in a consistent state over time. When there is a chance of system failure, the autonomic system is able to recover by using logs and backups.
Self-Protection - The autonomic system must be able to protect itself from unauthorized access. Protection of the system includes security, privacy and data encryption mechanisms.
Self-Inspection - The system is able to take intelligent decisions on the basis of self-awareness. This includes awareness of available resources, current status, environment, limitations and interdependencies with other systems.
Self-Organization - The autonomic system is able to dynamically restructure and reorganize the layout of data, indexes and other system-related data to improve system performance.
Self-Prediction - The AC system can predict information about the system according to the available resources and the environment. This helps decision making about the workload.
Self-Adaptation - The autonomic system adapts to changes dynamically so that the performance of the system can be improved.
The rest of the paper is organized as follows: Section 2 provides an overview of autonomic workload management. Section 3 presents previous work on workload management in DBMSs and DWs w.r.t. self-optimization, self-configuration, self-inspection, self-organization, self-prediction and self-adaptation. Section 4 provides analyses and limitations of existing work. Section 5 provides a categorization w.r.t. workload type and autonomic perspective of workload management in DBMSs and DWs. Finally, Section 6 concludes and provides future directions.
Basit Raza is PhD Computer Science scholar at International Islamic University, Islamabad, Pakistan. Abdul Mateen is PhD Computer Science scholar at International Islamic University, Islamabad, Pakistan. Mian Muhammad Awais is Associate Professor at Department of Computer Science, Lahore University of Management Sciences, Lahore, Pakistan. Muhammad Sher is Professor at Department of Computer Science, International Islamic University, Islamabad, Pakistan.
2011 Journal of Computing Press, NY, USA, ISSN 2151-9617 http://sites.google.com/site/journalofcomputing/
2 AUTONOMIC WORKLOAD MANAGEMENT (AWM)

DBMSs and DWs are used in different organizations and are main sources of information. Organizations need accurate and timely information to make successful decisions. In DBMSs, the workload consists of complex queries, batch files, batch reports and incremental data. It includes insertion, update and deletion queries as well as resource-demanding queries. Workloads are diverse, and the type of workload has a great impact and requires serious attention. The workload can be Online Transaction Processing (OLTP), Decision Support System (DSS) or Online Analytical Processing (OLAP). All resource allocation and de-allocation is performed by considering the type of workload. Earlier, the DBA was responsible for managing the workload and for tuning and configuration, and performed all these activities manually. Over time, the size and complexity of the data have grown. Due to the growth in data size and the changing behaviour of the workload, it is becoming difficult or impossible for the DBA to manage the workload. Normally all the activities are performed by the DBA or in an automatic way, but when a large number of resources is required and few are available, or when resources are managed improperly, the system becomes overloaded and cannot cope, which leads to system failure. In this situation the workload manager is unable to handle the workload, and incoming work is either rejected or handled only when resources become free. Ultimately the system goes down, users lose interest, and time and money are wasted. Research on workload management has been carried out for many years, and different database vendors have developed and incorporated a number of tools for this purpose. These techniques and tools relate to scheduling, multiprogramming level (MPL), resource allocation and the configuration of different workload parameters.
This motivates researchers and database vendors to work on self-managing workload management that can recognize, predict and adapt to change autonomically. Autonomic workload management is expected to manage the workload in an efficient and responsive way. An autonomic workload manager should absorb the workload without affecting other requests, make maximum use of the available resources and solve problems within a few seconds. We believe that such an autonomic workload manager will itself collect the necessary information about the workload (type, intensity, resource demand, etc.) and manage the workload with minimal human intervention. This autonomic technology has a high potential to be incorporated in current DBMSs. The workload management challenges are: estimating the execution time needed to complete a query; identifying and deciding about problem queries; identifying resource-oriented queries and resource contention; and workload characterization, prediction and adaptation.
3 AUTONOMIC WORKLOAD MANAGEMENT IN DBMS AND DW

Autonomic workload management should have self-optimization, self-configuration, self-inspection, self-prediction, self-organization and self-adaptation features. Self-optimization in workload management means that all workload-related tasks are executed efficiently. To achieve efficiency in workload management, the configuration of different components should be performed in a self-managed and appropriate way. Self-inspection supports better decision making by using knowledge of the system's resources, limits, workload intensity, etc. Self-prediction helps to forecast aspects such as resource demand, workload frequency and memory requirements. Self-organization allows the layout of data and indexes to be reorganized and restructured in order to improve performance. Self-adaptation allows the system to adapt to changes in the workload according to the available resources and environment. We discuss workload management techniques, algorithms and models with respect to these six autonomic characteristics.
3.1 Self-Optimization
Self-optimization is the characteristic of modifying a system or some of its components so that it works more efficiently with minimum resources. Research on self-optimization of workload management has been carried out by different researchers and vendors. Optimization is achieved by observing the incoming requests and by proper resource utilization. The following discussion covers some of the literature in this regard. Krompass et al. [6] discuss an adaptive QoS management technique that uses an economic model to proactively handle individual requests of Business Intelligence (BI) and OLTP workloads. They provide a systematic way to arrange requests by dividing them into classes based on cost and time limit. They also proposed a model that calculates the cost of a request by differentiating between underachieving and marginal gains relative to a Service Level Objective (SLO) threshold. The effectiveness of the framework is evaluated by experiments on different workloads. Pang et al. devised the Priority Adaptation Query Resource Scheduling (PAQRS) algorithm [7], which is based on the Priority Memory Management (PMM) algorithm and deals with multi-class query workloads. An architecture for cluster-based web services and its implementation are discussed in [8]. The web service workload is divided into different service classes for each gateway. Resources are allocated to the different services w.r.t. MPLs for each gateway. A feedback control component maintains the service classes in a performance model. This model assumes that all requests have the same size; in a DBMS, however, different requests have different sizes and resource demands. In [9], microeconomic concepts are used to devise a resource allocation framework for a multi-user environment. The main purpose of this framework is to reduce the response time of running queries. Compared with a single-query progress indicator, this framework considers the impact of concurrently running queries on each other. The effectiveness of the framework is demonstrated by implementing it in PostgreSQL.

3.2 Self-Configuration
Self-configuration is the ability of a DBMS to configure and reconfigure itself dynamically according to the given goals/objectives and the changing conditions of the workload. A lot of research has been carried out in this area. We highlight those areas of workload management where configuration is used to manage workload. Menasce et al. [10-12] devised a QoS controller for e-commerce applications that can manage workload. They proposed that QoS requirements can be met by adjusting configuration parameters within a system. These adjustments are made by the QoS controller by considering three performance goals: average response time, throughput and rejection probability. A technique for workload characterization and workload models [10] was developed for the e-commerce environment. The authors introduced a Customer Behavior Model Graph (CBMG) that represents similar navigational patterns for groups of customers who perform the same activities. This technique is evaluated with different experiments. Achieving a QoS level in e-commerce applications through dynamic monitoring and tuning is discussed in [11]. This technique identifies the best configuration parameters by combining a hill climbing technique with an analytical queuing model. The authors evaluated their technique experimentally by comparing QoS levels. In [12], the authors designed controllers that use analytic performance models with combinatorial search techniques. They showed the effectiveness of their technique through simulation and experiments. Weikum et al. [13] identified the basic reasons for performance problems in On Line Transaction Processing (OLTP) workloads through different metrics. Workload management is done by performing system and database configuration. Brown et al. [14] suggested a technique that alters the multiprogramming level (MPL) and memory allocation.
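The combination of hill climbing with an analytic queuing model described for [11] can be illustrated with a minimal sketch. It is not the authors' controller: it assumes a single M/M/1/K queue as the performance model, treats the queue capacity K as the only configuration parameter, and uses a hypothetical QoS score that averages the relative deviations of response time, throughput and rejection probability from their targets. All function names, rates and targets are assumptions chosen for illustration.

```python
def mm1k_metrics(arrival_rate, service_rate, capacity):
    """Rejection probability, throughput and mean response time of an M/M/1/K queue."""
    rho = arrival_rate / service_rate
    k = capacity
    if abs(rho - 1.0) < 1e-12:
        # Special case rho == 1: all K+1 states are equally likely.
        p_reject = 1.0 / (k + 1)
        mean_jobs = k / 2.0
    else:
        p_reject = (1 - rho) * rho ** k / (1 - rho ** (k + 1))
        mean_jobs = rho / (1 - rho) - (k + 1) * rho ** (k + 1) / (1 - rho ** (k + 1))
    throughput = arrival_rate * (1 - p_reject)
    response_time = mean_jobs / throughput  # Little's law
    return p_reject, throughput, response_time


def qos_score(metrics, max_response, min_throughput, max_reject):
    """Hypothetical QoS score: mean relative deviation from the three targets (higher is better)."""
    p_reject, throughput, response_time = metrics
    dev_r = (max_response - response_time) / max_response
    dev_x = (throughput - min_throughput) / min_throughput
    dev_p = (max_reject - p_reject) / max_reject
    return (dev_r + dev_x + dev_p) / 3.0


def hill_climb(score, start, lower, upper):
    """Step to the better neighbour (+/- 1) until no neighbour improves the score."""
    current = start
    while True:
        neighbours = [n for n in (current - 1, current + 1) if lower <= n <= upper]
        if not neighbours:
            return current
        best = max(neighbours, key=score)
        if score(best) <= score(current):
            return current  # local optimum; may be sub-optimal globally
        current = best


def score(k):
    # Assumed workload: 8 requests/s arriving, 10 requests/s service capacity.
    return qos_score(mm1k_metrics(8.0, 10.0, k),
                     max_response=0.4, min_throughput=7.0, max_reject=0.05)


best_k = hill_climb(score, start=1, lower=1, upper=50)
print("chosen queue capacity:", best_k)
```

Because the search only compares adjacent settings, it stops at the first local optimum it reaches, which is the usual limitation of plain hill climbing.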
3.3 Self-Inspection
Self-inspection is the ability to make intelligent decisions based on self-awareness. A number of tools and techniques are used to examine the workload at all times. Research in the area of self-inspection of workload management has been conducted by different researchers. Wasserman et al. [15] presented an analysis of a characterization technique for BI workload. The approach is based on resource-related parameters such as CPU consumption, sequential and random I/O rates and join degree. The sizing technique works by collecting input data from the user. The input data is validated and the resource demand is identified for each workload class. A technique [16] was developed for admission control and scheduling of e-commerce workload to improve response time and stability. The technique works online: it observes the cost of requests, differentiates between types of requests, protects against overloading and uses preferential scheduling. It can be implemented in a proxy and works without any change to the operating system, web and application server or database. The technique is implemented on the TPC-W workload. A technique for query suspension and resumption with minimum overhead is discussed in [17]. The authors proposed introducing asynchronous checkpoints for each cardinality in a query. They proposed an optimized suspension plan that combines dumping the current state to disk with going back to a previous checkpoint. The optimized plan performs its tasks (suspension or resumption) with less overhead by observing the time constraint during suspension. When a query is suspended all its resources are released, and when it is resumed the required resources are acquired again. They also implemented this technique in a tool named PREDATOR and showed that this tool gives better results than others. In [18], the authors proposed a technique for query suspension and resumption to handle workload efficiently. In DB2, the flow of requests is controlled proactively and dynamically by Query Patroller, which streamlines the requests according to the available resources and workload [19, 20]. This strategy helps to execute small and high-priority queries without delay.
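The admission control and preferential scheduling described for [16] can be sketched as follows. This is an illustrative model rather than the cited system: the class name, the cost estimates and the capacity threshold are assumptions. Requests wait in an external queue ordered by estimated cost (shortest job first) and are admitted only while the sum of the estimated costs in execution stays within a capacity limit.

```python
import heapq
import itertools


class AdmissionController:
    """Proxy-style gate: shortest-job-first queue in front of a capacity limit."""

    def __init__(self, capacity):
        self.capacity = capacity          # assumed limit on total estimated cost in execution
        self.running = {}                 # request_id -> estimated cost
        self.running_load = 0.0
        self._queue = []                  # min-heap of (cost, arrival order, request_id)
        self._order = itertools.count()   # tie-breaker keeps equal-cost requests FIFO

    def submit(self, request_id, estimated_cost):
        heapq.heappush(self._queue, (estimated_cost, next(self._order), request_id))
        return self._dispatch()

    def complete(self, request_id):
        self.running_load -= self.running.pop(request_id)
        return self._dispatch()

    def _dispatch(self):
        """Admit the cheapest waiting requests while they fit under the capacity."""
        admitted = []
        while self._queue:
            cost = self._queue[0][0]
            # An idle system always admits the head, so oversized requests are not blocked forever.
            if self.running and self.running_load + cost > self.capacity:
                break
            cost, _, request_id = heapq.heappop(self._queue)
            self.running[request_id] = cost
            self.running_load += cost
            admitted.append(request_id)
        return admitted


ctl = AdmissionController(capacity=10)
first = ctl.submit("report", 8)       # admitted: system is idle
second = ctl.submit("lookup", 1)      # admitted: 8 + 1 fits
third = ctl.submit("join", 5)         # queued: 9 + 5 exceeds the capacity
fourth = ctl.submit("update", 2)      # queued ahead of "join" because it is cheaper
after_lookup = ctl.complete("lookup")
after_report = ctl.complete("report")
```

In this run the cheaper "update" request overtakes the earlier "join" request once capacity frees up, which is the shortest-job-first behaviour; a production scheduler would also need an aging rule to bound the waiting time of expensive requests.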
3.4 Self-Prediction
Self-prediction is the characteristic of systems that monitor themselves at all times. A self-predicting system provides predictions about resource demand and the changing behaviour of the workload. These systems predict the future on the basis of history and mathematical models. A lot of research has been done in this area; the literature on workload prediction is as follows. Chetan Gupta et al. [21, 22] proposed a framework that predicts query execution time from the query execution plan and the system load by applying machine learning techniques. Dayal et al. [23] introduced a new approach to workload management based on predicting the resource usage of queries. They examined different techniques and found Kernel Canonical Correlation Analysis (KCCA) to be the best for correlating incoming query attributes with performance attributes. It predicts the performance of the workload based on these correlations. Martin et al. [24] proposed exploratory and confirmatory models for monitoring and analysis of workload. To develop an autonomic workload management model they applied different machine learning and data mining techniques. Ganapathi et al. [25] developed a model that predicts the effective performance parameters of a query. These parameters include the elapsed time of the query, records used, message bytes, etc. The model is validated by applying statistical machine learning techniques using the HP Neoview database. Said Elnaffar and Martin [26-28] proposed the Psychic-Skeptic Prediction (PSP) framework. PSP predicts shifts of workload from Decision Support System (DSS) to OLTP workload. It has three components: Psychic (offline), Skeptic (online) and a training data model. Thereska et al. [29-31] developed URSA MINOR, a test bed for workload prediction that leads towards self-managing or autonomic systems. The test bed has two main components, observer and stardust, which use a what-if model. It was designed so that it can be easily incorporated in existing as well as new systems. DB Resource Advisor [32-34] was developed to predict the response time and throughput of a workload autonomically. Its authors used what-if models on the developed framework. The framework is validated and incorporated in the database.
3.5 Self-Organization
Self-organization is the characteristic of organizing the given workload so that it can be executed efficiently without any human interaction. Through self-organization an optimal result can be obtained; however, it is performed only when the previous execution was not efficient. Different researchers have worked on resource allocation and de-allocation techniques to manage workload [35-37]. They proposed different techniques for assigning resources to the workload efficiently. The self-organization work performed in workload management is discussed below. Bruno et al. [38] proposed a framework for online tuning that examines the current workload at all times and makes changes to the physical design. When a query is processed, the framework collects information about the query execution plan (QEP), calculates its cost and then selects the best QEP to alter the physical design. It calculates the cost of each QEP with the help of the query optimizer. The framework considers index correlation during the process, which avoids physical design oscillation. Systems fail to perform properly due to design process complexities and workload information. To overcome this problem, the tool Hippodrome [39] was introduced to automate the design and configuration process and reduce human interaction. It performs this task iteratively until it finds the best storage design. A storage technique similar to that of [39] is introduced for Storage Area Networks (SANs) in [40]. The authors presented two algorithms for designing a cost-effective SAN. These algorithms provide snapshots, mirroring, backup and configuration, and were evaluated via experiments on design problems to show their effectiveness.

3.6 Self-Adaptation
Self-adaptation is a way to convert an old system into a target system to achieve maximum efficiency. This property is the capability to adopt all required changes in the system according to the environment. Self-adaptation has been achieved by performing different experiments and developing various frameworks and models, some of which are discussed below. Schroeder et al. [41] proposed using response time as an SLA, which is incorporated in their framework to achieve QoS. They proposed a scheduler framework, the External Queue Management System (EQMS), that limits the multiprogramming level of concurrent requests. EQMS consists of three components, a scheduler, an MPL advisor and a performance monitor, and self-optimizes in an adaptive way. The scheduler is independent of the internals of the DBMS. The framework has a feedback loop that is used to execute more than one query at a time. During the process, information such as the available resources and the number of executing requests is used by the feedback loop. In [42], the authors proposed a framework for workload adaptation that has two components, workload detection and workload control, and four functional components: workload characterization, performance modeling, monitoring and control. The authors demonstrated the effectiveness of their framework via implementation.
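The external-queue idea in [41], where requests are held outside the DBMS and only a limited number run concurrently, can be sketched as below. This is a simplified illustration and not the authors' EQMS: the class and function names are hypothetical, and the advisor rule (lower the MPL when the observed response time misses the target, raise it when there is slack) is an assumed feedback step, not the published algorithm.

```python
from collections import deque


class ExternalQueue:
    """Holds requests outside the DBMS and releases at most `mpl` at a time."""

    def __init__(self, mpl):
        self.mpl = mpl
        self.waiting = deque()
        self.executing = set()

    def submit(self, request_id):
        self.waiting.append(request_id)
        return self._release()

    def finish(self, request_id):
        self.executing.discard(request_id)
        return self._release()

    def set_mpl(self, mpl):
        self.mpl = max(1, mpl)  # never block the system completely
        return self._release()

    def _release(self):
        """Dispatch waiting requests in FIFO order while slots are free."""
        released = []
        while self.waiting and len(self.executing) < self.mpl:
            request_id = self.waiting.popleft()
            self.executing.add(request_id)
            released.append(request_id)
        return released


def advise_mpl(current_mpl, observed_response, target_response, slack=0.8):
    """Hypothetical MPL advisor step driven by the performance monitor's measurement."""
    if observed_response > target_response:
        return max(1, current_mpl - 1)   # target missed: reduce concurrency
    if observed_response < slack * target_response:
        return current_mpl + 1           # comfortable slack: allow more concurrency
    return current_mpl


queue = ExternalQueue(mpl=2)
r1 = queue.submit("q1")
r2 = queue.submit("q2")
r3 = queue.submit("q3")                  # waits: both slots are taken
new_mpl = advise_mpl(queue.mpl, observed_response=0.5, target_response=1.0)
r4 = queue.set_mpl(new_mpl)              # the freed slot releases "q3"
```

Keeping the queue outside the DBMS means the scheduling policy can be changed without touching DBMS internals, which matches the independence property noted above.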
4 ANALYSES AND DISCUSSION
This section presents the analyses, critical discussion, pros and cons and limitations of the techniques and models discussed in Section 3, organized by autonomic characteristic. The analyses are also summarized in tables with 7 columns: tool, purpose, workload type, static or dynamic behavior, technique/model/algorithm, human intervention, and whether the work was evaluated by experiment or implementation.
4.1 Self-Optimization
The framework provided for QoS in workload management [6] is beneficial for OLTP and BI workloads. The framework is scalable, as new workload management concepts can be implemented alongside previously implemented policies. It uses an economic model with two economic cost functions (Opportunity Cost and Marginal Gains). The scheduling policy used for OLTP workload in this framework is enhanced by considering the combined effect of priorities and service level objectives rather than priority alone. PAQRS [7] is used to schedule complex workloads; it reduces the number of missed deadlines and thereby makes efficient use of system resources. It has a bias control mechanism that regulates the distribution of missed deadlines among different query classes. MPL and memory are allocated on the basis of regular and reserve group quotas. In this way PAQRS balances the miss ratio against the target distribution. PAQRS cannot handle transactions and is limited to workloads consisting of mixed queries. Its performance degrades as workload fluctuations increase, so its adaptation mechanism needs to be improved. The Multi-Query SQL Progress Indicator [9] is distinguished from previous work by considering the impact of queries on each other. It has good prediction and adaptation ability. When its information is wrong, it makes corrections through an estimation mechanism.
TABLE 1 ANALYSIS OF WORKLOAD MANAGEMENT W.R.T SELF-OPTIMIZATION

Tool | Purpose | W Type | Static/Dynamic | Technique/Model/Algorithm | Human Intervention | Exp/Imp
QoS management technique [6] | Handle individual query in OLTP and BI workload | BI/OLAP, OLTP | Dynamic | Economic Model | Administrator may set the threshold for expensive queries | Exp
PAQRS [7] | Algorithm reduces missed deadlines, allocates memory and assigns privileges | - | Dynamic | PMM Algorithm | DBA defines missed distribution deadlines, performance objectives | Exp
Web services architecture [8] | Resources are allocated to different services w.r.t MPLs | - | Dynamic | MPLs | - | Exp
Multi-Query SQL Progress Indicator [9] | Provides visualization of running queries and uses it for handling workload | TPC | Dynamic | - | - | Exp, Imp
4.2 Self-Configuration
With the characterization technique [10], the number of dropped sessions is highest when there are very many sessions or the load is at its maximum. Moreover, the technique has no mechanism to manage or recover these dropped sessions. The QoS technique for e-commerce workload [11] can handle dynamic workload and short-term fluctuations. It uses heuristic optimization with a predictive queuing model and provides better results. It uses a reactive rather than a proactive approach. It searches with hill climbing, so when the search gets stuck only a sub-optimal solution is reached. The QoS controller maximizes throughput up to 88% on average. When the control interval level is less than or equal to 11, the QoS controller does not exhibit any performance improvement. However, when the control interval exceeds 11, the performance increases by up to 95%. The M&M algorithm [14] is a very simple, robust and responsive algorithm that works well for different workloads, configurations and memory requirements. It has a mechanism for disk buffer classes that allows it to deal with the interdependence among classes. It uses a feedback mechanism that adjusts the class goals when these goals are violated or exceeded.
TABLE 2 ANALYSIS OF WORKLOAD MANAGEMENT W.R.T SELF-CONFIGURATION

Tool | Purpose | W Type | Static/Dynamic | Technique/Model/Algorithm | Human Intervention | Exp/Imp
Workload Characterization Technique [10] | A characterization technique for E-commerce workload | TPC-W | Dynamic | Queuing model, performance model based approach | - | Exp
QoS Technique for E-commerce [11] | Dynamically monitors and tunes E-Commerce workload to achieve QoS level | TPC-W | Dynamic | Hill climbing technique & Analytical queuing model | - | Exp
Menasce [12] | Predict QoS parameters of workload | TPC-W | Dynamic | Hill climbing, Analytical performance model, Combinatorial search technique | - | Sim, Exp
Weikum et al. [13] | Discussed the reasons of performance problem for OLTP workload | OLTP | Dynamic | - | - | Imp
M&M Algorithm [14] | Handle multi-class workload via multiprogramming level & memory allocation | OLTP | Dynamic | Fragment fencing algorithm | Administrator defines initial MPL limit | Sim
4.3 Self-Inspection
The characterization technique proposed by Wasserman et al. [15] was evaluated only with TPC-H benchmark data, and most parameter values are based on the authors' assumptions. The technique can be improved by using data from other benchmarks and real-world data. The workload is characterized only by considering the resource demand of the user. GATEKEEPER [16] falls in the category of self-inspection as it provides admission control and scheduling of the workload. It shows consistent performance even under heavy workload; throughput is improved by up to 10 percent through reduced thrashing and better memory reference behavior. Due to the use of Shortest Job First (SJF) scheduling, response time is reduced by 14 percent and starvation is prevented; under heavy workload, however, the technique degrades by 15 percent. This technique is limited to e-commerce workload. The Query Suspension & Resumption technique [17] has been validated experimentally for simple and heavy workloads; it meets the suspend-time constraint and thereby reduces overhead. The technique uses a hybrid approach to query suspension in which suspend-time overheads are negligible, which leads to better results. After query resumption, the technique does not reutilize the given query. It allows a whole query to be suspended, whereas previous techniques switch between individual operators. Memory wastage is higher in the previous techniques because of switching points, and they show worse results for unexpected suspends. In the Stop & Restart technique [18], checkpointing is used to improve performance. Because the technique saves only the remaining part of the suspended query, there is less memory wastage and restart is faster. The Merge-Pipeline algorithm used in this technique is more efficient than the Current-Pipeline algorithm. The monitoring overhead of this technique is low; for the TPC-H workload the average overhead is 3%. The focus of the approach is on regenerating all the results rather than generating only the remaining results. Query Patroller [19, 20] limits the flow of long-running queries on the basis of a profile provided by the administrator, to avoid saturation and ensure better resource utilization. Table 3 presents the techniques and models for workload self-inspection.
TABLE 3 ANALYSIS OF WORKLOAD MANAGEMENT W.R.T SELF-INSPECTION

Tool | Purpose | W Type | Static/Dynamic | Technique/Model/Algorithm | Human Intervention | Exp/Imp
Wasserman et al. [15] | Analyze the BI workload characterization technique | BI/OLAP | Dynamic | Singular Value Decomposition and Semi Discrete Decomposition | - | Exp
Gatekeeper [16] | A technique to streamline the E-commerce workload through admission control to improve throughput and response time | TPC-W | Dynamic | Shortest job first algorithm | - | Exp
Query Suspension & Resumption [17] | Database centric approach to suspend and resume queries to manage workload | BI/OLAP | Dynamic | Asynchronous Check-pointing | User can specify the query execution plan | Imp
Stop & Restart Technique [18] | A technique for stop and restart of queries with minimum overhead | DSS, TPC-H | Dynamic | GetNext Model, Optimizer Cost Model, Merge-Pipeline algorithm | - | Imp
Query Patroller [19, 20] | Prioritize the small and high queries, query status and future query trend | OLTP, BI/OLAP | Dynamic | - | Administrator assigns the privileges at user and system level | Imp
4.4 Self-Prediction
The proposed model [21, 22] does not consider the different time spans of the day, i.e. periods in which the workload frequency is low or high. It does not handle sudden changes in workload behavior. The framework [25] was tested on queries with a 10-minute execution time on a 32-node system. The prediction framework predicts 85% of the queries accurately (within a 20% margin). The KCCA model used by the prediction framework does not predict all the queries and cannot perform continuous retraining. There is no discussion of the description and storage of the workload models [24]. There is no proper methodology to monitor the managed element. The proposed methodology does not provide an autonomic maintenance strategy for the models as the system evolves. The PSP framework [26-28] predicts the workload shift through an offline and an online strategy. It works by classifying the workload and handles it without any human involvement. Compared with other dynamic prediction techniques it has less overhead, but it is confined to OLTP to DSS shift detection. The PSP framework is limited to scheduled tasks and cannot manage drastic workload changes. The Skeptic component uses a linear model to verify trends of the shifts. Dayal et al. [23] developed an autonomic model for some specific scenarios but did not provide a general solution. There is no discussion of the description and storage of workload models, and there is no proper methodology to monitor the managed element.
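The accuracy figure quoted for [25] counts a prediction as correct when it falls within a margin of the measured value. A minimal sketch of such a metric follows, assuming the margin is taken relative to the actual value (the exact definition is not given here) and using made-up timings purely for illustration.

```python
def fraction_within_margin(predicted, actual, margin=0.20):
    """Share of predictions whose error is at most `margin` times the actual value."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= margin * abs(a))
    return hits / len(actual)


# Hypothetical elapsed times in seconds, not data from the surveyed work.
predicted_seconds = [95.0, 130.0, 40.0, 610.0]
actual_seconds = [100.0, 100.0, 45.0, 600.0]
accuracy = fraction_within_margin(predicted_seconds, actual_seconds)
print(f"{accuracy:.0%} of predictions fall within the 20% margin")
```

Here three of the four predictions are within 20% of the measured time; only the second one (130 s predicted against 100 s measured) misses.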
TABLE 4 ANALYSIS OF WORKLOAD MANAGEMENT W.R.T SELF-PREDICTION

Tool | Purpose | W Type | Static/Dynamic | Technique/Model/Algorithm | Human Intervention | Exp/Imp
Chetan Gupta et al. [21, 22] | Proposed a predicting model for query execution time in a warehouse | TPC-H | Dynamic | Binary Tree | - | Exp
Dayal et al. [23] | Evaluated the existing algorithms for long running queries and introduces an approach to manage workload through resource usage prediction of queries | OLTP, BI/OLAP | Dynamic | Kernel Canonical Correlation Analysis (KCCA) | - | Exp
Exploratory & Confirmatory Models [24] | Monitors and self manages the workload | DSS, OLTP | - | Exploratory model and Confirmatory model | - | Imp
Ganapathi et al. [25] | Proposed and developed a system to predict the performance metrics (elapsed time, records used, disk I/Os and message bytes for queries) | TPC-DS | Dynamic | Statistical Machine Learning Approach | - | Exp
PSP [26-28] | Predict workload shifts | DSS, OLTP | Static/Dynamic | - | - | Imp
URSA MINOR [29-31] | A cluster-based storage system that permits the changes of data selection to encoding schemes and fault & timing models | OLTP, TPC | Static/Dynamic | Clustering, Fault & Timing Model | An administrator corrects the poorly selected distributions and changes these as the system evolves | Exp
Resource Advisor [32-34] | A modular architecture is presented in which CPU, buffer and storage models are integrated to predict the response time and throughput | OLTP | Dynamic | What-if model | DBA has to provide the information about workload, i.e. whether workload is open/closed loop and associated rate parameters | Exp
Resource Advisor [32-34] is presented with a modular architecture in which CPU, buffer and storage models are integrated to predict response time and throughput by identifying the required key components. The authors take advantage of an end-to-end tracing technique to visualize and understand the performance of the system. Resources are allocated on the basis of continuous monitoring. In contrast to Resource Advisor, current DBMSs lack CPU, buffer and disk models. By using these models, Resource Advisor provides accurate predictions and the best performance results. When the buffer pool is small, Resource Advisor has a high overhead per transaction. Due to continuous monitoring, the CPU overhead is 6.2% for online and 1.2% for offline execution. This overhead can be reduced by using other appropriate techniques. Finally, the tool is evaluated through a prototype implementation in SQL Server; however, it can be incorporated into other DBMSs. Ursa Minor [29-31] uses object-based storage, which exposes more information about the stored data. Ursa Minor provides scalability through a cluster-based technique and dynamic adaptive behavior through online choice. The re-encoding process of Ursa Minor takes some extra system time, but the throughput increases up to three times.
4.5 Self-Organization
The performance model of Hippodrome [39] not only predicts the improvements but can also indicate whether the proposed solution would conflict with the existing workload. It places data according to the usage pattern and, if required, can expand the storage for a large workload. It reassigns stores dynamically by observing device utilization. It performs storage design without changing the application and takes decisions that remain useful in the long term. It improves the design of the storage system by observing system performance and using a feedback loop. The resource allocation model of Hippodrome uses an economic model to trade off quality against cost. The APPIA [40] technique can be enhanced by adding randomized reassignment.
TABLE 5 ANALYSIS OF WORKLOAD MANAGEMENT W.R.T SELF-ORGANIZATION

| Tool | Purpose | W Type | Static/ Dynamic | Technique/ Model/ Algorithm | Human Intervention | Exp/ Imp |
|---|---|---|---|---|---|---|
| Online Physical Design Tuning Algorithm [38] | Algorithm to modify the physical design dynamically with minimum overhead | TPC-H | Dynamic | - | - | Imp |
| Hippodrome [39] | Suggests the best storage design | Postmark | Dynamic | RAID | Administrator initiates the process by providing the workload capacity information | Imp |
| APPIA [40] | Suggests the best storage design for network environment | - | Dynamic | Multi Layer Flow Merge | - | Imp |
4.6 Self-Adaptation
EQMS [41] provides a self-tuning and adaptive response through its scheduler and MPL advisor. This helps it handle changes in system load or workload dynamically, and it provides better results due to its feedback loop. It works well for all types of workload because its core idea is to reduce contention by imposing a limit on the multiprogramming level (MPL). The experiment in Query Scheduler [42] is performed on a stable workload, which is not representative of a dynamic environment where the workload changes rapidly, such as OLTP or OLAP. During the experiment, the total cost of a query is used as a parameter, which may generate error-prone results. It is confined to linear workloads; however, in real environments the workload is often nonlinear.
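The core idea attributed to EQMS [41], limiting contention by capping the MPL, can be illustrated with a small external queue. This is a minimal sketch under simplifying assumptions: the MPL is fixed rather than tuned by an advisor, waiting queries are dispatched in FIFO order rather than by QoS class, and the class and method names (`ExternalQueue`, `submit`, `complete`) are hypothetical and not taken from [41].

```python
from collections import deque


class ExternalQueue:
    """Admit at most `mpl` queries into the DBMS; hold the rest outside it."""

    def __init__(self, mpl):
        if mpl < 1:
            raise ValueError("mpl must be at least 1")
        self.mpl = mpl            # fixed limit (assumption: no MPL advisor here)
        self.running = set()      # queries currently inside the DBMS
        self.waiting = deque()    # queries held in the external queue (FIFO)

    def submit(self, query_id):
        """Dispatch the query if a slot is free, otherwise queue it.

        Returns True if the query was dispatched immediately.
        """
        if len(self.running) < self.mpl:
            self.running.add(query_id)
            return True
        self.waiting.append(query_id)
        return False

    def complete(self, query_id):
        """Mark a query as finished and dispatch the next waiting one, if any.

        Returns the id of the newly dispatched query, or None.
        """
        self.running.remove(query_id)
        if self.waiting:
            next_id = self.waiting.popleft()
            self.running.add(next_id)
            return next_id
        return None


queue = ExternalQueue(mpl=2)
dispatched = [queue.submit(q) for q in ("q1", "q2", "q3")]
next_query = queue.complete("q1")
```

In this sketch the third query waits outside the DBMS until a running query completes. A feedback loop of the kind described above would adjust `mpl` at run time based on observed performance, which the sketch deliberately leaves out.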
TABLE 6 ANALYSIS OF WORKLOAD MANAGEMENT W.R.T SELF-ADAPTATION

| Tool | Purpose | W Type | Static/ Dynamic | Technique/ Model/ Algorithm | Human Intervention | Exp/ Imp |
|---|---|---|---|---|---|---|
| EQMS [41] | Develop framework to achieve QoS in workload management | TPC-C, TPC-W | Dynamic | Queuing Analysis | DBA rules for QoS class assignment, define QoS class & targets and specify tolerable performance penalties | Imp |
| Query Scheduler [42] | Workload adaptation framework, a cost based performance model that improves the performance prediction | OLTP, BI/OLAP | Dynamic | Kalman Filter, Queuing model | - | Prot Imp |
5 RESEARCH DISCUSSION
We have surveyed different workload management techniques, algorithms and models in DBMSs and DWs, which are tested and implemented with different types of workload and fall under one of the autonomic characteristics. Some models produced the best results and have now become part of the DBMS, and some are used as third-party tools to enhance the performance of the system. Some frameworks require human intervention, whereas autonomic technology aims at no human intervention. We have categorized the literature by workload type and by autonomic perspective.
5.1 Workload Type
Table 7 and Fig. 1 show the percentage of work done by different researchers, scientists and vendors, who used different workloads for their experiments. The table shows that much of the work is done on OLTP and OLAP workloads.

TABLE 7: RESEARCH WORK DONE WITH RESPECT TO WORKLOAD TYPE

| SNo | Workload Type | Work Done | Percentage |
|---|---|---|---|
| 1 | OLTP | 10 | 29 |
| 2 | DSS | 3 | 9 |
| 3 | BI/ OLAP | 7 | 20 |
| 4 | TPC-C | 3 | 9 |
| 5 | TPC-DS | 1 | 3 |
| 6 | TPC-H | 3 | 9 |
| 7 | TPC-W | 6 | 17 |
| 8 | TPC-R | 1 | 3 |
| 9 | Postmark | 1 | 3 |

Fig. 1. Research work done with respect to workload type (categories: OLTP, DSS, BI/ OLAP, TPC-C, TPC-DS, TPC-H, TPC-W, TPC-R, Postmark)

5.2 Workload and Autonomic Characteristics
Table 8 and Fig. 2 show the percentage of work done by different researchers, scientists and vendors with respect to each autonomic characteristic.

TABLE 8: RESEARCH WORK DONE WITH RESPECT TO AUTONOMIC CHARACTERISTICS

| SNo | Autonomic Characteristic | Work Done | Percentage |
|---|---|---|---|
| 1 | Self-Optimization | 3 | 12 |
| 2 | Self-Configuration | 5 | 20 |
| 3 | Self-Inspection | 5 | 20 |
| 4 | Self-Organization | 3 | 12 |
| 5 | Self-Prediction | 7 | 28 |
| 6 | Self-Adaptation | 2 | 8 |

Fig. 2. Research work done w.r.t Autonomic characteristics (categories: Self-Optimization, Self-Configuration, Self-Inspection, Self-Organization, Self-Prediction, Self-Adaptation)

6 CONCLUSION AND FUTURE WORK
The survey paper discussed various aspects of autonomic workload management. To observe the current autonomic level of workload management, we have divided the available literature on workload management according to the self-* characteristics. Tables 1-6 summarize autonomic computing in workload management w.r.t. the self-* characteristics on the basis of different parameters. The above analyses show the effectiveness of different workload management techniques, models and algorithms. So far, few advances have been made in workload management in the context of autonomic computing. More effort and improvement are needed on existing as well as new workload management techniques. In future, we plan to develop a framework for autonomic workload management that can handle all tasks proactively and autonomically.

ACKNOWLEDGMENT
We wish to thank the Higher Education Commission (HEC) of Pakistan, which is supporting this research work and its implementation.

REFERENCES
[1] Practical Autonomic Computing: Roadmap to Self Managing Technology, An IBM Journal Paper, January 2006.
[2] Manish Parashar and Salim Hariri, Autonomic Computing: An Overview, Springer-Verlag Berlin Heidelberg, LNCS 3566, pp. 247-259, 2005.
[3] Jana Koehler, Chris Giblin, Dieter Gantenbein, Rainer Hauser, On Autonomic Computing Architectures, IBM Zurich Research Laboratory, Switzerland, http://www.zurich.ibm.com/pdf/ebizz/idd-ac.pdf
[4] S. R. White, J. E. Hanson, I. Whalley, D. M. Chess, J. O. Kephart, An Architectural Approach to Autonomic Computing, Proceedings of the IEEE International Conference on Autonomic Computing (ICAC'04), 2004.
[5] Basit Raza, Abdul Mateen, Tauqeer Hussain, Mian M. Awais, Autonomic Success in Databases Management Systems, In 8th International Conference on Computer and Information Science (ICIS 09), Shanghai, China, June 1-3, pp. 439-444, 2009.
[6] Stefan Krompass, Andreas Scholz, Martina-Cezara Albutiu, Harumi Kuno, Janet Wiener, Umeshwar Dayal, Alfons Kemper, Quality of Service Enabled Management of Database Workload, In Service-Oriented Computing IEEE Computer Society Technical Committee on Data Engineering, 2008.
[7] H. Pang, M. Carey, and M. Livny. Multiclass Query Scheduling in Real-Time Database Systems. IEEE Transactions on Knowledge and Data Engineering, 7(4):533-551, August 1995.
[8] G. Pacifici, M. Spreitzer, A. Tantawi, and A. Youssef. Performance Management for Cluster Based Web Services, IEEE Journal on Selected Areas in Communications, Volume 23 (12), pp. 2333-2343, 2005.
[9] G. Luo, J. F. Naughton, and P. S. Yu. Multi-Query SQL Progress Indicators. In 10th International Conference on Extending Database Technology (EDBT), pages 921-941, 2006.
[10] D. A. Menasce, V. A. F. Almeida, R. Fonseca, and M. A. Mendes. A Methodology for Workload Characterization of E-commerce Sites, Proceedings of the First ACM Conference on Electronic Commerce (EC-99), 1999, USA, pp. 119-128.
[11] D. A. Menasce, D. Barbara, and R. Dodge. Preserving QoS of E-commerce Sites through Self-Tuning: A Performance Model Approach, Proceedings of the 3rd ACM Conference on Electronic Commerce, USA, 2001, pp. 224-234.
[12] D. A. Menasce and M. N. Bennani. On the Use of Performance Models to Design Self-Managing Computer Systems, Proc. of Comp. Measurement Group Conf., 2003, USA, pp. 1-9.
[13] G. Weikum, C. Hasse, A. Monkeberg, and P. Zabback. The COMFORT Automatic Tuning Project. Information Systems, 19(5):381-432, 1994.
[14] K. P. Brown, M. Mehta, M. J. Carey, and M. Livny. Towards Automated Performance Tuning For Complex Workloads, Proc. of 20th VLDB Conference, Santiago, Chile, 1994.
[15] T. J. Wasserman, P. Martin, D. B. Skillicorn, and H. Rizvi. Developing a characterization of business intelligence workloads for sizing new database systems. In Proceedings of the 7th ACM International Workshop on Data Warehousing and OLAP, pages 7-13. ACM Press, 2004.
[16] S. Elnikety, E. Nahum, J. Tracey, and W. Zwaenepoel. A method for transparent admission control and request scheduling in e-commerce web sites. In WWW, 2004.
[17] B. Chandramouli, C. N. Bond, S. Babu, and J. Yang. Query Suspend and Resume. In Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, 2007.
[18] S. Chaudhuri, R. Kaushik, R. Ramamurthy, and A. Pol. Stop-and-Restart Style Execution for Long Running Decision Support Queries. In Proc. of 33rd Intl. Conf. on Very Large Data Bases (VLDB), 2007.
[19] IBM Corporation. DB2 Query Patroller Guide: Installation, Administration, and Usage, 2003.
[20] Sam S. Lightstone, Guy Lohman, Danny Zilio, Toward Autonomic Computing with DB2 Universal Database, SIGMOD Record, Vol. 31(3), 2002.
[21] A. Mehta, C. Gupta, S. Wang, U. Dayal, Automatic Workload Management for Enterprise Data Warehouses, IEEE Data Engineering Bulletin, 31(1): 11-19, 2008.
[22] C. Gupta and A. Mehta. PQR: Predicting Query Execution Times for Autonomous Workload Management. In the Proceedings of Int. Conf. on Autonomic Computing, 2008.
[23] Umeshwar Dayal, Harumi Kuno, Janet L. Wiener, Kevin Wilkinson, Archana Ganapathi, Stefan Krompass, Managing operational business intelligence workloads, ACM SIGOPS Operating Systems Review, Volume 43, Issue 1, pp. 92-98, 2009.
[24] P. Martin, S. Elnaffar, and T. Wasserman. Workload models for autonomic database management systems. In Proc. of Int. Conf. on Autonomic and Autonomous Systems, pp. 10, 2006.
[25] Archana Ganapathi, Harumi A. Kuno, Umeshwar Dayal, Janet L. Wiener, Armando Fox, Michael I. Jordan, and David A. Patterson. Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. International Conference on Data Engineering (ICDE), 2009: 592-603.
[26] S. Elnaffar, "A Methodology for Auto-Recognizing DBMS Workloads," Proceedings of Centre for Advanced Studies Conference (CASCON 02), October 2002.
[27] S. Elnaffar, P. Martin, and R. Horman, Automatically classifying database workloads, 2002. [Online]. Available: citeseer.ist.psu.edu/elnaffar02automatically.html.
[28] S. Elnaffar and P. Martin. An intelligent framework for predicting shifts in the workloads of autonomic database management systems. In the Proceedings of IEEE International Conference on Advances in Intelligent Systems Theory and Applications, 2004.
[29] Abd-El-Malek, M., Courtright II, W. V., Cranor, C., Ganger, G. R., Hendricks, J., Klosterman, A. J., Mesnier, M., Prasad, M., Salmon, B., Sambasivan, R. R., Sinnamohideen, S., Strunk, J. D., Thereska, E., Wachs, M., and Wylie, J. J. 2005. Ursa Minor: versatile cluster-based storage. In Conference on File and Storage Technologies. USENIX Association, 59-72.
[30] Thereska, E., Narayanan, D., Ailamaki, A., and Ganger, G. R. Observer: keeping system models from becoming obsolete. In the Workshop on Hot Topics in Autonomic Computing (HotAC 07), 2007.
[31] Thereska, E., Narayanan, D., and Ganger, G. R. 2005. Towards self-predicting systems: What if you could ask what-if? In 3rd International Workshop on Self-Adaptive and Autonomic Computing Systems.
[32] S. Agrawal, S. Chaudhuri, L. Kollar, A. Marathe, V. Narasayya, and M. Syamala. Database tuning advisor for MS SQL Server 2005. In the Proceedings of 30th VLDB Conference, Aug. 2004.
[33] D. Narayanan, E. Thereska, and A. Ailamaki. Continuous resource monitoring for self-predicting DBMS. In Int. Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS'05), pages 239-248, 2005.
[34] Microsoft SQL Server 2005 Books online, September 2007. http://msdn2.microsoft.com/en-us/library/ms190419.aspx
[35] B. Schroeder, M. Harchol-Balter, A. Iyengar, and E. M. Nahum. Achieving Class-Based QoS for Transactional Workloads. In Proc. of the 22nd Intl. Conf. on Data Engineering (ICDE), page 153, 2006.
[36] M. Mehta and D. J. DeWitt. Dynamic Memory Allocation for Multiple-Query Workload. In the Proceedings of the Nineteenth International Conference on Very Large Data Bases, 1993.
[37] D. L. Davison and G. Graefe. Dynamic Resource Brokering for Multi-User Query Execution. In Proc. of the ACM SIGMOD International Conference on Management of Data, pp. 281-292, 1995.
[38] N. Bruno and S. Chaudhuri. An online approach to physical design tuning. In Proceedings of the 23rd International Conference on Data Engineering, pages 826-835. IEEE Computer Society, 2007.
[39] E. Anderson, M. Hobbs, K. Keeton, S. Spence, M. Uysal, and A. Veitch. Hippodrome: running circles around storage administration. In Conf. on File & Storage Technology (FAST'02), pp. 175-188, 2002.
[40] J. Ward, M. O'Sullivan, T. Shahoumian, and J. Wilkes. Appia: automatic storage area network design. In the Conference on File and Storage Technology (FAST'02), pages 203-217, Jan. 2002.
[41] B. Schroeder, M. Harchol-Balter, A. Iyengar, and E. M. Nahum. Achieving Class-Based QoS for Transactional Workloads. In the Proceedings of the 22nd International Conference on Data Engineering, ICDE 2006, page 153, 2006.
[42] B. Niu, P. Martin, W. Powley, Workload Adaptation in Autonomic DBMSs, An IBM White Paper, 2006.
