
Metrics in Risk Determination for Large-Scale Distributed Systems Maintenance

Maureen Ann Raley
University of Alabama in Huntsville
Telephone: 703-325-3510
Maraley@comcast.net

Advisor: Letha Hughes Etzkorn
Computer Science Department, University of Alabama in Huntsville
Telephone: 256-842-6291
letzkorn@cs.uah.edu
Nationwide Computer System Upgrade
– Upgrade networked computers, distributed throughout
the continental US
– Systems consisted of a mix of mainframe computers
and client/server systems used in a financial
environment
– Lessons-learned survey concentrated on the client
computers or workstations (no mainframes or servers)
• Data entry centers
• Computer centers (in-house software developed and maintained)
• Field offices (operations)
Nationwide Computer System Upgrade

Condition: Firm deadline
Non-compliant systems at deadline would be either:
– Removed from operational use and transferred/disposed, or
– Disconnected from the network and accorded strictly stand-alone status
• No transfer of data other than hardcopy printout
• Prohibited “sneaker-net” transfer by floppy drive, modem, or other electronic means
Nationwide Computer System Upgrade

Purpose of the upgrade
• Standardize the software and hardware throughout the agency
• Modernize the computers and software components
• Dispose of obsolete components
• Introduce a layered operating system with administrative and user privileges
Nationwide Computer System Upgrade

Expected benefits
• More conducive to configuration management controls
• Easier to maintain and upgrade
• Barriers to prevent unauthorized software (games, favorite programs, instant messaging, peer-to-peer networking)
• More secure and less susceptible to external or internal hacking
Nationwide Computer System Upgrade

Software
• Predominantly COTS
– Business automation application suites
– Operating systems
• Network
• Individual workstations
• But also . . . programs developed in-house
– Data entry and data analysis
• And . . . in-house customized COTS-based systems
Nationwide Computer System Upgrade

Hardware
• Commercial-grade desktop computers
– No consistency as to
• Age
• Manufacturer
• RAM, HD, I/O capabilities
Nationwide Computer System Upgrade
System Priorities
• Availability
• Data integrity
• Performance
• Security
Nationwide Computer System Upgrade
Lessons Learned
94 personnel interviewed at 12 different locations
• Data entry clerks
• Inventory control specialists
• Information technology personnel
• Middle and upper management
• Secretaries and administrative assistants.
Nationwide Computer System Upgrade
Lessons Learned Questionnaire
1. What were the biggest obstacles you found in
achieving nationwide compliance?
2. Based on what you have learned during the
nationwide compliance effort, what would you
want to see done differently if we were starting
the effort now?
3. How is training being handled in your
organization? Are there adequate resources? Is
time allocated for training?
Nationwide Computer System Upgrade

The responses note:
1. Changes in management direction
2. Lack of understanding of the side effects of management directives
3. Need for bottom-up input into management decisions
4. Status assessment based on an inaccurate inventory database
5. Understaffing
6. Mismatched arrival of hardware and software
7. Unrealistic compatibility testing
8. Loss of functionality in the replacement software
9. Inadequate training
Nationwide Computer System Upgrade

1. & 2. Changes in management direction and inadequate side-effect analysis caused rework
– Upper management would direct paths to the goals that were not always the most reasonable or efficient.
– Management failed to identify all the side effects of their directives and the implications for the end users.
– Management would also direct action, then recall that directive and direct a different approach. This caused frustration and rework.
Nationwide Computer System Upgrade

3. Lack of bottom-up input from users
• The project office held weekly conference calls on nationwide status and twice-weekly conference calls on issues, as well as proactive, engaged working groups consisting of the national office and field coordinators, but . . .
– Some of these issues were due to management issuing directives without field input or vetting.
– Sometimes field-level management made decisions for their end users without soliciting input.
– Ensuring a bottom-up comment process would have helped to mitigate some of the rework.
Nationwide Computer System Upgrade

1., 2. & 3. A process to vet upper management decisions could help reduce the amount of rework by the end users.
→ Choose a set of end users at diverse locales to “beta test” the management directives.
→ Distribute proposed directives for comment and ensure that both headquarters personnel and field users reviewed these proposals.
→ Direct field management to allot time to end users to ensure a meaningful review.
Nationwide Computer System Upgrade

4. Inaccurate inventory database
– The IS inventory database was queried weekly to provide reports on the HW and SW compliance status.
– Inaccuracies were largely due to the small, mobile nature of the computers and the large number of them to track.
– An effort to correct the IS database was underway in another office, but it did not coincide with the project office milestones.
– As the inventory database became more accurate, a realistic estimate of the work done and yet to be done emerged as the project deadline approached.
– Modern inventory control methods, such as radio frequency ID (RFID) tagging, could be used for more accurate tracking of small, easily movable components.
Nationwide Computer System Upgrade

5. Excessive workload, staffing shortages
• Field coordinators worked the upgrade effort in addition to their normal workload.
– Because of this, some field coordinators were not as dedicated to the upgrade effort as others.
• Additional staff (skilled support) was used at HQ.
→ Management should also augment staffing levels in the field.
Nationwide Computer System Upgrade

6. Mismatched arrival of replacement HW/SW
• Delays were as long as several months, due to vendor supply shortages.
• There was inadequate storage space when new HW platforms arrived first.
• Old HW platforms could not be removed without complete replacements.
• Incompatible combinations, or upgrade violations: new SW on old HW; new HW with old SW.
• Installation of new SW on old, incompatible platforms caused system failures and rework for reformatting and reinstallation of the old SW.
Nationwide Computer System Upgrade

7. Lack of realistic compatibility testing
– Compatibility testing was not independent and had no oversight: the division overseeing the upgrade did the compatibility testing.
– Testing was done on “idealized” machines with the new COTS software suite at HQ: not in the field, not by field end users, not with the in-house field applications, and not under field workload conditions.
– Most of the software was compatible, BUT some critical incompatibilities existed with the field applications not used at HQ.
– Testing should have followed software engineering “best practices”: independent tests, oversight, and operational conditions.
Nationwide Computer System Upgrade

8. Loss of functionality in the new software
• Transition from one office automation software suite (e.g., WordPerfect, Lotus, Oracle, email) to another (MS Word, Excel, Access, Outlook).
• Data-entry transition: line entry to a graphical, mouse-driven system.
• In-house developed replacement programs incurred the "loss of functionality" complaint less often than COTS software.
– The in-house programming team came closer to maintaining the basic functions of the old in-house software.
– "Not all the right features" and "too many unneeded features" issues are common to COTS-based systems (not tailored to a specific task, but instead developed for mass-market sales).
• A more rigorous COTS selection process could mitigate the inadequacies.
Nationwide Computer System Upgrade

9. Inadequate training
– Usually did not address the new skill sets needed.
– Not always tailored for the different skill levels.
– Frequently expected to be self-taught.
– Expected to occur in addition to the normal workload.
Nationwide Computer System Upgrade

9. Inadequate training
• The most successful training: a combination of an introductory overview followed by hands-on classroom training with a knowledgeable and motivated instructor.
• Unsuccessful training:
– Sometimes none at all
– Unmotivated, poorly trained instructors
– Self-study (a CD-ROM at one's desktop) -- OK if computer literate, overwhelming if not
• Information technology and computer specialists adapted with ease.
• Non-computer-oriented people, such as secretaries, administrative assistants, and data entry clerks, had much more trouble.
Nationwide Computer System Upgrade

Benefits
– Uniform nationwide software and hardware
• More effective configuration management
• Easier maintenance and upgrade installation
– Cyclical hardware and software replacements or upgrades are easier to plan and effect
– Layered operating systems (administrative vs. user privileges) inhibit installation of ad hoc end-user software
– End Game Assessment has applications for recovery from hostile information system attacks
Observations
• Not all risks are known or can be planned for in advance
• This project mitigated many of the problems encountered during the transition by:
– Consistent monitoring of the hardware and software components’ compliance status using inventory data
– An “issues” database, addressed weekly
– A “risks” database for issues with a probability of occurrence and negative consequences, addressed every two weeks
Possibilities for Future Research
• Quantification of the loss due to rework and the effect of double or treble responsibilities on the lower-level staff (data not available)
• Further investigation of the consequences of effective and ineffective management decisions
• Development of metrics for risk analysis
Nationwide Computer System Upgrade

Actual Compliance Growth from Inventory Data
[Two line charts of percent compliant vs. week:
Figure 1. Percentage Field Compliance Growth. Figure 2. Percentage IS Compliance Growth.]
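As a minimal sketch of how a percent-compliant-per-week series like Figures 1 and 2 could be derived (the snapshot format below is a hypothetical stand-in for the project's actual inventory database, not its real schema):

```python
# Minimal sketch: weekly percent-compliant from inventory snapshots.
# The input (week -> list of per-device compliance flags) is an assumed
# format for illustration only.

def percent_compliant(snapshots):
    """Return {week: percent of inventoried devices compliant that week}."""
    result = {}
    for week, devices in snapshots.items():
        total = len(devices)
        compliant = sum(1 for flag in devices if flag)
        result[week] = 100.0 * compliant / total if total else 0.0
    return result

# Example: the device count varies week to week as units are added,
# disposed of, and as inventory accuracy improves.
snapshots = {
    1: [True, False, False, False],
    2: [True, True, False, False, False],
    3: [True, True, True, False],
}
print(percent_compliant(snapshots))  # {1: 25.0, 2: 40.0, 3: 75.0}
```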
Metrics Considerations
• The determination of an appropriate set of metrics to analyze risk during the maintenance phase of a distributed system upgrade
• Standard (actual data), as well as normalized, metrics
– Normalizing would deal with the varying number of devices (with this sample data, the total number of units in the system changes as new units are added, old ones are disposed of, and as the inventory accuracy grows). By normalizing the metric suite, we can compare distributions of different sizes.
• An adaptive sizing model that deals not with the total number of system devices (units), but only with the units modified during a certain period
– A period of time, e.g., a week
– A time-independent period, or threshold, determined by a number of devices, e.g., 100. Using this model, instead of dividing by the actual number of devices in the system, as in the standardized model, we would divide by the number of recently modified devices.
– A sliding window might be used as well (see the sketch after this list).
• A history complexity metric for each location (component) to assess the effect of the complexity of the period
– This could help determine whether risk increases during bursty or chaotic periods and during periods of high activity.
• Validation: statistical analysis of the results of the actual compliance growth (from the inventory data) vs. the results of the alternative metrics
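A minimal sketch of the standardized model versus the sliding-window adaptive model described above (illustrative only; the event format and function names are our assumptions, not the authors' implementation):

```python
from collections import deque

# Sketch of the two normalizations. Events are assumed to arrive as
# (device_id, became_compliant) records in time order.

def standard_model(newly_compliant, total_devices):
    """Standardized model: normalize by every device in the system."""
    return newly_compliant / total_devices

def adaptive_model(events, threshold=100):
    """Adaptive sizing model with a sliding window: normalize by the
    last `threshold` modified devices rather than the whole system."""
    window = deque(maxlen=threshold)
    ratios = []
    for device_id, became_compliant in events:
        window.append(became_compliant)
        # Fraction of the most recent modifications that achieved compliance.
        ratios.append(sum(window) / len(window))
    return ratios

# Usage with a toy event stream of 250 modifications:
events = [("d%d" % i, i % 3 != 0) for i in range(250)]
print(adaptive_model(events, threshold=100)[-1])
```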
Table 1. Software Metrics Sets Comparison

#  | Software T&E Panel (STEP) | Software Engineering Institute (SEI) | US Government AMC-P-70-14 | Objective
1  | Cost | --- | Cost Deviations | Track development expenditures
2  | Schedule Progress | Schedule | Schedule Deviations | Track progress vs. schedule; readiness to proceed to the next phase
3  | Computer Resource Utilization | Computer Resource Utilization | Computer Resource Utilization | Track planned vs. actual memory utilization, I/O channels, throughput
4  | Software Engineering Environment | --- | Software Development Tools | Assessment of contractor's development environment (CMM level)
5  | Requirements Traceability | --- | --- | Traces requirements to design, code, and test
6  | Requirements Stability | Software Volatility, Unit Development Progress | Requirements Definition & Stability (optional) | Measures changes in requirements and the effect on development effort
7  | Design Stability | Software Volatility, Design Complexity | Design Structure | Tracks design changes and their effect on development and configuration
8  | Complexity | Software Size, Design Complexity | --- | Provides indication of potential problem areas where test effort should be focused
9  | Breadth of Testing | Testing Progress | Test Coverage | Measures the extent and success of test coverage; provides indication of sufficiency of testing
10 | Depth of Testing | Testing Progress | Test Sufficiency | Measures the degree to which the required functionality has been successfully demonstrated
11 | Fault Profiles | --- | Defect (Fault) Density | Measures the number of faults vs. time; tracks open and closed trouble reports by priority
12 | Reliability | --- | --- | Attempts to predict system downtime by tracking trouble reports using math models
13 | Manpower | Personnel | Development Manpower (optional) | Tracks personnel loading and turnover
14 | Development Progress | Design Progress, Unit Development Progress, Testing Progress | Development Progress (Completeness) | Provides percentage of development completion across all phases and work elements
15 | Incremental Release Content | --- | --- | Tracks schedule and units per release to track functional preservation
16 | Supportability | --- | --- | Indicates how easy/difficult the software will be to maintain
One Possibility - Entropy Metrics
• Essentials of the mathematical theory of information: “A Mathematical Theory of Communication,” Claude Shannon (1948)
– Established fundamental bounds on the performance of communication systems in the presence of noise
– Exerted an enormous influence on many disciplines: communications, biology, mathematics, physics
• Entropy has been defined in terms of the information content of software and used to measure code complexity
– It has also been used effectively as an indicator for reusability
Entropy Metrics
Have been used successfully to:
• Measure SW quality during SW development
– Used directed graphs to model a SW system and measure coupling, cohesion, size, length, and complexity at the module level (Allen, Khoshgoftaar, et al., 1996 - present)
– Measured complexity in object-oriented design (Davis, Etzkorn, Bansiya, Gholston, et al., 1999 - present)
• Measure SW complexity during SW maintenance
– Based on message flow between modules, extended to COTS (Chapin, 1988)
• Measure cost growth during large-scale system development
– Queried experts to develop a cost model validated to 3% accuracy versus the 300% predicted (Martin, Lenz, Glover, et al., 1981)
• Measure SW complexity using process entropy (not code)
– Theorized that the number of times a module was modified adversely affected code complexity; validated a 13%-45% improvement (Hassan & Holt, 2003)
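As a rough illustration of the process-entropy idea (a sketch in the spirit of Hassan & Holt, not their actual tooling), the modification history of a period can be scored by the Shannon entropy of how changes spread across components:

```python
import math

# Sketch of a history-complexity (process entropy) score for one period.
# The input format -- component -> number of modifications in the period --
# is an assumption for illustration, not the authors' model.

def process_entropy(mod_counts):
    """Shannon entropy (bits) of the modification distribution.
    High entropy: changes scattered across many components (chaotic period).
    Low entropy: changes concentrated in a few components."""
    total = sum(mod_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in mod_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy

# A focused week vs. a scattered ("bursty") week:
print(process_entropy({"SC1": 9, "SC2": 1}))                # ~0.47 bits
print(process_entropy({"SC%d" % i: 1 for i in range(10)}))  # ~3.32 bits
```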
Possible Entropy Metrics - Division 1
[Four charts over the Division 1 locations (SC1-SC10, R1-R4): Compliance Growth, Inventory Stability, Devices Isolated to Standalone, and Devices Needing Modification]
Backup Slides
Possible Entropy Metrics - Division 2
[Four charts over the Division 2 locations (CC1-CC3, HQ1-HQ7, NC1-NC3, Aux): Compliance Growth, Inventory Stability, Devices Isolated to Standalone, and Devices Needing Modification]
Shannon’s Equation
C.E. Shannon, in “A Mathematical Theory of Communication” (1948), proposed to measure the amount of uncertainty, or entropy, in a distribution by the following equation:

$$H_n(P) = -\sum_{k=1}^{n} p_k \log_2 p_k$$

where $p_k \geq 0$ for $k = 1, 2, \ldots, n$ and $\sum_{k=1}^{n} p_k = 1$.
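A direct implementation of the equation (a minimal sketch; the function name and input validation are our own choices, not part of the original slide):

```python
import math

def shannon_entropy(p):
    """H_n(P) = -sum(p_k * log2(p_k)) over a probability distribution p."""
    if any(pk < 0 for pk in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be non-negative and sum to 1")
    # By convention, terms with p_k = 0 contribute 0 to the sum.
    return -sum(pk * math.log2(pk) for pk in p if pk > 0)

print(shannon_entropy([0.5, 0.5]))  # 1.0 bit (maximum uncertainty for n = 2)
print(shannon_entropy([1.0, 0.0]))  # 0.0 bits (no uncertainty)
print(shannon_entropy([0.25] * 4))  # 2.0 bits (uniform over 4 outcomes)
```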
