
Quality Assurance for Stable Server Operation

 Masafumi Matsuo  Yuji Uchiyama  Yuichi Kurita

Server products are the core elements of mission-critical systems in society and
business and are required to operate stably at all times. Furthermore, as technology
continues to advance rapidly and the environment surrounding server products
changes, the quality that customers demand is changing. In addition to providing
high functionality and high performance, servers must support virtualization and
energy-saving functions and the migration to Cloud services. In light of these
changes, quality assurance for stable operation of server products is also
evolving through optimized development processes and new evaluation
techniques. This paper introduces server-product development processes and
evaluation techniques based on the policy of “ensuring product quality right from
the start” and quality management in pursuit of quality, cost, and delivery.

1. Introduction
As the backbone of mission-critical systems in society and the corporate
world, servers must provide stable operating quality. Ensuring server quality
is an important requirement for achieving a level of server operation
acceptable to customers.

At Fujitsu, we work to achieve high quality in our server products and provide
stable operation in the customer’s system through an extensive design review,
an uncompromising system evaluation, and validation testing, all based on the
concept of “ensuring product quality right from the start.” We also manage
quality information obtained in all development processes and information
about post-delivery operating quality in the customer’s system and use this
data as a basis for quality management.

In this paper, we begin by introducing development processes that support
quality assurance and evaluation methods that assure stable operation in
recognition of market needs, both of which reflect Fujitsu’s passion for
ensuring product quality right from the start. We then introduce quality
management at Fujitsu from design quality to operating quality based on the
ISO9001 standard and describe the features of the manufacturing process at
Fujitsu IT Products Ltd. (FJIT), a Fujitsu Group manufacturing plant in Japan
focused on monozukuri (innovative manufacturing).

2. Development processes supporting quality assurance
At Fujitsu, quality is built into the server new-product development process
according to a system of internal standards, as shown in Figure 1. The results
of each development phase are reviewed in assessment conferences and review
meetings, and a transition to the next development phase is allowed only if
those results satisfy established standards. This operation system is based on
the concept of ensuring product quality right from the start.
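
The phase-transition rule described above can be illustrated with a minimal
sketch. The phase names follow Figure 1, but the `Criterion` type, the
`next_phase` function, and the example criteria are hypothetical and are not
taken from Fujitsu's internal standards.

```python
from dataclasses import dataclass
from typing import List

# Phase names follow Figure 1; everything else here is an illustrative assumption.
PHASES = [
    "planning",
    "basic design",
    "specific design",
    "design testing",
    "validation testing",
    "commercial production",
]


@dataclass
class Criterion:
    """One assessment item with a measured value and a minimum required value."""
    name: str
    actual: float
    required: float

    def satisfied(self) -> bool:
        return self.actual >= self.required


def next_phase(current: str, results: List[Criterion]) -> str:
    """Return the following phase only if every criterion is met; otherwise stay."""
    if any(not c.satisfied() for c in results):
        return current
    idx = PHASES.index(current)
    # The last phase has no successor, so it maps to itself.
    return PHASES[min(idx + 1, len(PHASES) - 1)]
```

For example, a basic design review in which only 90% of the (hypothetical)
review items are closed against a required 100% would leave the project in the
basic design phase.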

164 FUJITSU Sci. Tech. J., Vol. 47, No. 2, pp. 164–170 (April 2011)
M. Matsuo et al.: Quality Assurance for Stable Server Operation

Figure 1
Outline of development processes. (The figure shows the phases from planning
through basic design, specific design, design testing, and validation testing
to commercial production and shipping support, separated by review meetings
and assessment conferences and overseen by the architect team and development
process monitor.)

Here, the design phase has the most impact on quality, and quality assurance
methods in this phase can be broadly divided into two types. The first applies
know-how gained in previously developed products to the design-review process
in new product development and strives to prevent the recurrence of problems
experienced in the past. The second type uses logic simulators and
server-structure simulators incorporating advanced techniques with the aim of
improving design quality. The development processes from planning to design
testing are conducted under the supervision of the design department, while
those from validation testing onward are handled by the quality assurance
department, which checks the quality of new products from the customer’s
viewpoint. The characteristic elements of these development processes are
summarized below.

2.1 Architect team and development process monitor
To enhance the quality of the entire system in addition to that of individual
components, an architect team is formed from members of each
component-development group. The team’s tasks range from reaching consensus on
specifications and planning a development process to actual process
management. The position of “development process monitor” has also been
introduced inside the quality assurance department, which checks the process
phase transitions from a third-party perspective, so that the validity of the
development process can be judged from both inside and outside viewpoints.

2.2 High quality right from the start
In the waterfall style of development, incorporating high quality in the
design phase is an important point in providing the customer with a stable
system. In the past, the design testing phase went only as far as performing
function checks on a component-by-component basis. This approach could result
in development delays due, for example, to fatal logic changes when performing
validation testing at the system level after entering the testing phase. For
this reason, the development department and quality assurance department
decided to work together on emphasizing system quality right from the
development phase.

2.2.1 Design quality improvement by proactive evaluation
To improve design quality, validation items concerned with high load
contention (note 1) and accelerated margin tests (note 2) have been moved up
to design testing, thereby providing a mechanism for detecting critical
problems caused by operation timing and fluctuations. In addition, to
stabilize the quality of commercial production as early as possible, the
production process is launched at the test-machine manufacturing step and
manufacturing quality is examined closely. This enables latent problems in the
commercial production process to be resolved beforehand.

To improve the evaluation accuracy, the test coverage in each test phase is
quantified and diagnosis rates are calculated to provide feedback to design
testing. These measures provide a mechanism for detecting critical failures at
an early stage so that appropriate measures can be taken.

Note 1) For example, high load contention tests increase and decrease the
frequency of memory access from both the input/output ports and central
processing unit.
Note 2) Accelerated margin tests change the operation speed of a semiconductor
device by varying the voltage, temperature, frequency, etc.

2.2.2 Process improvement in server-structure evaluation
In the past, server-structure evaluation in the quality assurance department
consisted of checking equipment in the validation testing phase. However, a
manufacturing problem discovered at that time could require some time to
resolve, causing shipping to be delayed.

In response to this problem, the support department, commercial-production
department, and quality assurance department now join forces from the specific
design stage to perform thorough cross-checking using pilot models and virtual
simulators that use Fujitsu’s Virtual Product Simulator tool. The aim here is
to eliminate manufacturing problems early. These activities enable an
integrity level equal to that of the production model to be achieved in the
validation testing phase, thereby achieving stable quality.

3. Evaluation methods for assuring stable operation
This section describes evaluation methods for assuring stable operation in
servers. An essential requirement of a mission-critical server is stable
operation with no interruption of the customer’s business processes. To meet
this requirement, many measures for ensuring reliability are implemented in
hardware and software at the design stage. These measures are evaluated in an
environment equivalent to actual operation in the customer’s system. The
following introduces these characteristic evaluation methods.

3.1 System RAS evaluation for customer’s operating environment
Past validation testing involved product specification reviews and testing
based on internal standards. However, this approach did not take into account
the customer’s actual operating environment, which led to various problems.
The majority of these problems involved defective recovery operations in
hardware and software caused by the timing of part failures. In the customer’s
operating environment, software can behave in unforeseen ways leading to
system crashes. This presented a quality assurance problem with regard to
recovery operations after the failure of a part linking hardware and software.

To deal with this problem, Fujitsu created a verification tool that links
recovery operations and hardware/software and can thus evaluate system
reliability, availability, and serviceability (RAS) in the customer’s assumed
operating environment. This tool automatically and repeatedly generates false
hardware failures and then checks resulting system operation to isolate
problems that depend on the timing of failures.

Figure 2
Outline of BBC tester. (The figure shows the BBC tester together with a
server, a storage system, an IP router, and an industry standard server. All
pins are automatically clipped by a robot. Pseudo-failures are generated by
0-V clipping of all physical pins and exhaustive RAS testing is performed for
part failures. BBC: Black box clip. IP: Internet protocol.)
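
The idea of repeatedly injecting false failures to isolate timing-dependent
recovery problems can be sketched as follows. This is only an illustration
under stated assumptions: the paper does not describe the tool's interface, so
`sweep_injection_timing`, the `recovers` callback, and the toy system with a
vulnerable 40 to 42 ms window are all hypothetical, and a simple exhaustive
sweep over injection times stands in for whatever scheduling the actual tool
uses.

```python
from typing import Callable, List


def sweep_injection_timing(
    recovers: Callable[[int], bool],
    window_ms: int,
    step_ms: int = 1,
    repeats: int = 3,
) -> List[int]:
    """Inject a false failure at each time offset and collect the offsets
    at which the system under test failed to recover.

    `recovers(t)` is a hypothetical hook: it injects one false failure at
    offset t (in ms) and returns True if recovery completed normally.
    """
    bad_offsets = []
    for t in range(0, window_ms, step_ms):
        # Repeat each injection so that intermittent failures are less
        # likely to be missed.
        if not all(recovers(t) for _ in range(repeats)):
            bad_offsets.append(t)
    return bad_offsets


def toy_recovers(t_ms: int) -> bool:
    """Hypothetical system under test: recovery breaks only when the failure
    lands in a narrow window, for example during a state handover."""
    return not (40 <= t_ms <= 42)
```

Calling `sweep_injection_timing(toy_recovers, window_ms=100)` on this toy
system returns the offsets inside the vulnerable window, which is the kind of
timing dependence such a tool is meant to expose.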

In addition, problems related to failures that occur only rarely in the
customer’s system, such as short circuits between signals, are also checked
using Fujitsu’s Black Box Clip tester, which automatically and repeatedly
generates failures of this type (Figure 2). This tester clips physical signal
pins in hardware to 0 V to generate pseudo-failures and evaluate their impact
on the system from the RAS perspective. This evaluation is performed
exhaustively for all pins in the pursuit of high quality across the entire
system.

3.2 System-wide compatibility evaluation
System evaluation is not limited to the abovementioned system RAS evaluation.
It also targets compatibility between hardware and software by extracting
configurations (partitioning, memory, and input/output) envisioned by product
specifications from a hardware perspective and considering combinations of
operating system versions and patches from a software perspective (Figure 3).
Stable operation is also assured by combining benchmark tools and data
processing to perform system evaluations under high-load, low-load, and
high-load-contention operating conditions.

Figure 3
Evaluation of compatibility between hardware and software. (The figure shows a
server evaluated across software configurations, namely patches and operating
system version numbers, and hardware combinations, namely partitioning,
memory, input/output, and so on.)

3.3 Application of automation technology
In the past, testing was limited to fixed system configurations, which meant
that sufficient testing could not be performed with
regard to timing, completeness, etc. To solve this problem, Fujitsu is
combining a function for dynamically changing the system configuration
(dynamic reconfiguration) with automation technology to perform detailed
configuration testing. In this way, problems such as memory leaks, which in
the past were difficult to detect until they became obvious, can be detected
at an early stage through constant monitoring by an automated tool.

4. Quality management
To continuously provide products that satisfy customer needs, Fujitsu is
expanding quality improvement activities at every phase of product
development, commercial production, and customer service on the basis of the
quality assurance concepts in quality management systems (ISO9001).

4.1 Design quality management
Design quality is built into upstream development processes. In other words,
the results output from the planning, basic design, and specific design phases
determine product quality. To raise the degree of completion in each of these
design phases, Fujitsu not only performs progress and problem checking, as is
generally done, but also draws up a list of development risks that can be
envisioned beforehand in the planning phase while also assessing risk
conditions at review meetings and assessment conferences held at the time of
phase transitions.

The review meetings and assessment conferences decide whether to allow a phase
transition to occur by performing examinations according to assessment
standards established for each phase and by assessing the impact of the risk
on subsequent processes. In addition, process monitoring (note 3) by a third
party enables problems arising in each development phase to be reported
periodically to the project leader so that the risk of development-process
delays and quality degradation can be reduced at an early stage.

Note 3) Process monitoring monitors and objectively evaluates whether the
rules and procedures established in the development processes are being
observed.

4.2 Commercial-production quality management
In the commercial-production stage, manufacturing quality is not the only
matter of concern. It is also important to achieve a balance of quality, cost,
and delivery (QCD).

On the basis of this idea, Fujitsu is working to construct optimal
manufacturing lines in pursuit of QCD and proactively introduce new test
processes after the design review stage. Fujitsu is also establishing
checkpoints in the stage preceding shipping assessment and introducing
processes for observing the characteristics of manufacturing systems that can
maintain stable shipping at production plants. After commercial production
begins, quality targets are set for each manufacturing process, and the
plan–do–check–act (PDCA) cycle is repeated in quality improvement activities
to maintain quality and pursue QCD.

The quality information obtained in these activities is shared by various
business departments and the quality assurance department with the aim of
achieving company-wide dissemination of quality information. This quality
information is also being used as input for improving design quality in
subsequent server models.

4.3 Operation quality management
To supply products with exceptional quality and maintain stable operation in
delivered products, Fujitsu sets quality targets for each product and promotes
field quality management. These quality targets are determined by surveying
quality trends throughout the industry with an eye to exceeding the quality
levels of other companies. Target values are subject to
periodic review to improve quality even further.

Operation quality is maintained mainly by data evaluation based on statistical
methods and recurrence prevention activities based on thorough root cause
analysis.

Data evaluation based on statistical methods borrows ideas from reliability
engineering to analyze trends in early life failure, random failure, and
wear-out failure for each model and component/unit. The results are used to
predict future quality trends and take preventive actions as soon as possible.

If a peculiar trend is observed from trend analysis or failure cause, an
expert team conducts a thorough hunt for the root cause and oversees
activities aimed at halting the spread of damage and preventing recurrence of
the failure. If a design factor is involved, both the cause of the defect
being built in and the cause of the failure occurring are traced to their
roots using the “5 Whys” method of analysis, and the PDCA cycle is repeated to
provide feedback to the development processes.

Operating quality data is shared among the development, quality assurance, and
support departments as well as plant departments at regular quality meetings
in an effort to improve quality.

5. Focus on monozukuri
Fujitsu’s core server products are manufactured by FJIT, whose business policy
is to improve customer satisfaction through the pursuit of QCD. To this end,
FJIT interacts closely with the Fujitsu development department and quality
assurance department and works constantly to achieve optimal monozukuri while
repeating the PDCA cycle to make daily improvements with a QCD balance in
mind.

Fujitsu has recently initiated company-wide production-innovation activities
by introducing the Toyota Production System to enhance monozukuri from a
customer-centric perspective. Prime activities here are continuous flow
processing focused on one-piece-at-a-time production and streamlining by
process combining and process mixing.

6. Future issues
Recent improvements in semiconductor technology have been accompanied by
higher levels of integration to the point where logic circuits that in the
past could be achieved only using several chips can now be consolidated on a
single chip. This development makes for smaller and lighter equipment but also
magnifies the risk since a single logic failure can now have a big impact on
the development process through, for example, the need for time-consuming
fixes. This risk makes logic design simulation and testing that can cover new
technologies all the more important in design reviews.

There is also a demand for power-saving equipment as Cloud services that use
large-scale systems in data centers expand. However, while energy-saving modes
exist for reducing power when equipment is not being used, they also present
new problems related to operation timing. This situation calls for the
development of evaluation techniques from a new perspective.

7. Conclusion
This paper described Fujitsu’s efforts to develop quality assurance techniques
to support stable operation in its server products. These efforts seek to
fulfill departmental missions through optimal QCD while improving quality
assurance in response to ever-changing market trends and customer needs.
Looking forward, Fujitsu will pursue evaluation techniques from new
perspectives in the face of market trends toward Cloud computing and will work
to develop optimal quality management from the perspective of the Cloud
customer.


Masafumi Matsuo
Fujitsu Ltd.
Mr. Matsuo is engaged in the evaluation and quality management of enterprise
servers.

Yuji Uchiyama
Fujitsu Ltd.
Mr. Uchiyama is engaged in the evaluation of enterprise servers.

Yuichi Kurita
Fujitsu Ltd.
Mr. Kurita is engaged in the quality management of enterprise servers.
