
BENCHMARKS

In computing, a benchmark is the act of running a computer program, a set of
programs, or other operations in order to assess the relative performance of an
object, normally by running a number of standard tests and trials against it.

Performance benchmarking is the science of making objective assessments of the
performance of one system over another. Benchmarks are also useful for assessing
performance improvements obtained by upgrading a computer or its components.
Good benchmarks enable us to cut through advertising hype and statistical tricks.
Ultimately, good benchmarks will identify the systems that provide good
performance at the most reasonable cost.

Each benchmark tries to answer the question: “What computer should I buy?”
Clearly, the answer is “the system that does the job with the lowest
cost-of-ownership”. Cost-of-ownership includes project risks, programming costs,
operations costs, hardware costs, and software costs.

The term benchmark is also commonly applied to the elaborately designed
benchmarking programs themselves.

Benchmarking examples

The following examples show how a hardware specification that names a particular
processor (Before) can be rewritten in vendor-neutral, benchmark-based terms
(After).

1. Before:
   Intel Pentium 4, 3.0 GHz, 800 MHz FSB, 1 MB cache
   After:
   x86 microprocessor with a performance giving a minimum score of 193 under
   the SYSmark 2004 rating

2. Before:
   Intel Pentium 4, 3 GHz or equivalent
   After:
   x86 microprocessor with the following performance scores:
   - between 165 and 205 under the SYSmark 2004 overall office productivity
     benchmark
   - between 200 and 235 under the SYSmark 2004 overall internet content
     creation benchmark
   - between 180 and 220 under the SYSmark 2004 rating


Benchmarks provide a method of comparing the performance of various subsystems
across different chip/system architectures.

Purpose of benchmarks
As computer architecture advanced, it became more difficult to compare the
performance of various computer systems simply by looking at their specifications.
Therefore, tests were developed that allowed comparison of different architectures.

For example, Pentium 4 processors generally operated at a higher clock frequency
than Athlon XP or PowerPC processors, but this did not necessarily translate to
more computational power; a processor with a slower clock frequency might
perform as well as or even better than a processor operating at a higher frequency.
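
The classic CPU time equation makes this concrete: execution time = instruction
count × cycles per instruction (CPI) / clock rate. Below is a minimal sketch in
Python, using hypothetical instruction counts and CPI values, not measured
figures for any real processor:

    def cpu_time(instructions, cpi, clock_hz):
        # Classic CPU performance equation:
        # time = instruction count * cycles per instruction / clock rate
        return instructions * cpi / clock_hz

    # Hypothetical numbers, for illustration only.
    # A 3.0 GHz CPU needing 1.5 cycles per instruction...
    time_fast_clock = cpu_time(1e9, 1.5, 3.0e9)   # 0.50 s
    # ...loses to a 2.0 GHz CPU needing only 0.9 cycles per instruction.
    time_slow_clock = cpu_time(1e9, 0.9, 2.0e9)   # 0.45 s
    print(f"3.0 GHz CPU: {time_fast_clock:.2f} s; "
          f"2.0 GHz CPU: {time_slow_clock:.2f} s")

Here the slower-clocked design wins because it completes more work per cycle,
which is exactly the kind of effect benchmarks are meant to expose.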

Benchmarks are designed to mimic a particular type of workload on a component
or system. Synthetic benchmarks do this with specially created programs that
impose the workload on the component; application benchmarks run real-world
programs on the system. While application benchmarks usually give a much better
measure of real-world performance on a given system, synthetic benchmarks are
useful for testing individual components, like a hard disk or networking device.
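
As a minimal sketch of the synthetic approach (illustrative Python, not any
standard benchmark suite), the following imposes a purely CPU-bound workload
and reports how long it takes:

    import time

    def synthetic_cpu_workload(n):
        # Purely synthetic integer workload: it exercises the CPU
        # without corresponding to any real application.
        total = 0
        for i in range(n):
            total += i * i
        return total

    start = time.perf_counter()
    synthetic_cpu_workload(10_000_000)
    elapsed = time.perf_counter() - start
    print(f"synthetic CPU workload: {elapsed:.3f} s")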

Benchmarks are particularly important in CPU design, giving processor architects
the ability to measure and make tradeoffs in microarchitectural decisions. For
example, if a benchmark extracts the key algorithms of an application, it will
contain the performance-sensitive aspects of that application. Running this much
smaller snippet on a cycle-accurate simulator can give clues on how to improve
performance.

Manufacturers commonly report only those benchmarks (or aspects of benchmarks)
that show their products in the best light. They have also been known to
misrepresent the significance of benchmarks, again to show their products in
the best possible light. Taken together, these practices are called bench-marketing.

Ideally, benchmarks should substitute for real applications only if the application
is unavailable, or too difficult or costly to port to a specific processor or
computer system. If performance is critical, the only benchmark that matters is
the target environment's application suite.

Challenges of benchmarks

- Some vendors have been accused of "cheating" at benchmarks: doing things that
  give much higher benchmark numbers but make things worse on the actual likely
  workload.
- There are few (if any) high-quality benchmarks that help measure the
  performance of batch computing, especially high-volume concurrent batch and
  online computing. Batch computing tends to be much more focused on the
  predictability of completing long-running tasks correctly before deadlines,
  such as end of month or end of fiscal year. Many important core business
  processes, such as billing, are batch-oriented and probably always will be.
- Benchmarking institutions often disregard or do not follow the basic
  scientific method. This includes, but is not limited to: small sample size,
  lack of variable control, and limited repeatability of results (a sketch of
  a more careful measurement procedure follows this list).
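
As a minimal sketch of a more defensible procedure (illustrative Python, with
a hypothetical workload function standing in for the code under test), repeating
a measurement several times and reporting the spread makes results far easier
to verify than a single number:

    import statistics
    import time

    def workload():
        # Hypothetical stand-in for the code under test.
        sum(i * i for i in range(1_000_000))

    # Repeat the measurement to expose run-to-run variation,
    # rather than reporting a single (possibly lucky) number.
    samples = []
    for _ in range(10):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)

    print(f"mean {statistics.mean(samples):.4f} s, "
          f"stdev {statistics.stdev(samples):.4f} s, "
          f"min {min(samples):.4f} s")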

Properties of benchmarks

There are seven vital characteristics for benchmarks. These key properties are:
[1] Relevance: Benchmarks should measure relatively vital features.
[2] Representativeness: Benchmark performance metrics should be broadly
accepted by industry and academia.
[3] Equity: All systems should be fairly compared.
[4] Repeatability: Benchmark results can be verified.
[5] Cost-effectiveness: Benchmark tests are economical.
[6] Scalability: Benchmark tests should measure from single server to multiple
servers.
[7] Transparency: Benchmark metrics should be easy to understand.

Types of benchmark
1. Real program
   o word processing software
   o tool software of CAD
   o user's application software (e.g., MIS)
2. Component benchmark / microbenchmark
   o core routine consisting of a relatively small and specific piece of code
   o measures the performance of a computer's basic components [5]
   o may be used for automatic detection of a computer's hardware parameters,
     such as number of registers, cache size, and memory latency
3. Kernel
   o contains key code
   o normally abstracted from an actual program
   o popular kernel: the Livermore loops
   o LINPACK benchmark (contains basic linear algebra subroutines written in
     Fortran)
   o results are represented in Mflop/s (a minimal sketch follows this list)
4. Synthetic benchmark
   o procedure for programming a synthetic benchmark:
     - take statistics of all types of operations from many application programs
     - get the proportion of each operation
     - write a program based on the proportions above
   o types of synthetic benchmark:
     - Whetstone
     - Dhrystone
5. I/O benchmarks
6. Database benchmarks
   o measure the throughput and response times of database management systems
     (DBMS)
7. Parallel benchmarks
   o used on machines with multiple cores and/or processors, or on systems
     consisting of multiple machines
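
As a minimal sketch of the kernel style and its Mflop/s reporting (illustrative
Python, not the actual LINPACK code), the following times a simple linear-algebra
kernel and converts the count of floating-point operations into Mflop/s:

    import time

    def daxpy(a, x, y):
        # Simple linear-algebra kernel: y = a*x + y (2 flops per element).
        # DAXPY is one of the basic subroutines used inside LINPACK.
        return [a * xi + yi for xi, yi in zip(x, y)]

    n = 1_000_000
    x = [1.0] * n
    y = [2.0] * n

    start = time.perf_counter()
    daxpy(3.0, x, y)
    elapsed = time.perf_counter() - start

    flops = 2 * n   # one multiply and one add per element
    print(f"{flops / (elapsed * 1e6):.1f} Mflop/s")
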
SPEC

The Standard Performance Evaluation Corporation (SPEC) is a non-profit
corporation formed to establish, maintain, and endorse a standardized set of
relevant benchmarks that can be applied to the newest generation of
high-performance computers. SPEC develops benchmark suites and also reviews and
publishes submitted results from its member organizations and other benchmark
licensees.

SPEC members and associates

SPEC Members:

3DLabs * Acer Inc. * Advanced Micro Devices * Apple Computer, Inc. *
ATI Research * Azul Systems, Inc. * BEA Systems * Borland * Bull S.A. *
CommuniGate Systems * Dell * EMC * Exanet * Fabric7 Systems, Inc. *
Freescale Semiconductor, Inc. * Fujitsu Limited * Fujitsu Siemens *
Hewlett-Packard * Hitachi Data Systems * Hitachi Ltd. * IBM * Intel * ION
Computer Systems * JBoss * Microsoft * Mirapoint * NEC - Japan *
Network Appliance * Novell * NVIDIA * Openwave Systems * Oracle *
P.A. Semi * Panasas * PathScale * The Portland Group * S3 Graphics Co.,
Ltd. * SAP AG * SGI * Sun Microsystems * Super Micro Computer, Inc. *
Sybase * Symantec Corporation * Unisys * Verisign * Zeus Technology

SPEC Associates:

California Institute of Technology * Center for Scientific Computing (CSC)
* Defence Science and Technology Organisation - Stirling * Duke
University * JAIST * Kyushu University * Leibniz Rechenzentrum -
Germany * National University of Singapore * New South Wales
Department of Education and Training * Purdue University * Queen's
University * Rightmark * Stanford University * Technical University of
Darmstadt * Texas A&M University * Tsinghua University * University of
Aizu - Japan * University of California - Berkeley * University of Central
Florida * University of Illinois - NCSA * University of Maryland *
University of Modena * University of Nebraska, Lincoln * University of
New Mexico * University of Pavia * University of Stuttgart * University of
Texas at Austin * University of Texas at El Paso * University of Tsukuba *
University of Waterloo * VA Austin Automation Center

SPEC Supporting Members:

EP Network Storage Performance Lab * SuSE Linux AG

SPEC Tools

- SPEC SERT Suite 2.0. The SERT suite 2.0 adds a single-value metric,
  reduces runtime, improves automation and testing, and broadens device
  and platform support.
- SPEC SERT Suite 1.1.1. The SERT suite 1.1.1 is the most current SERT
  version supported by the U.S. EPA Energy Star v2.0 program. Designed to
  be simple to configure and use via a comprehensive graphical user
  interface, the SERT suite uses a set of synthetic worklets to test discrete
  system components such as processors, memory, and storage, providing
  detailed power consumption data at different load levels.
- SPEC Chauffeur WDK Tool. The Chauffeur™ WDK (Worklet Development
  Kit) Tool was designed to simplify the development of workloads for
  measuring both performance and energy efficiency.
- PTDaemon. The power and temperature daemon (also known as PTDaemon) is
  used to offload the work of controlling a power analyzer or temperature
  sensor during measurement intervals to a system other than the SUT
  (system under test).

SPEC benchmarks

- CPU
- Graphics/Applications
- HPC/OMP
- Java Client/Server
- Mail Servers
- Network File System
- Web Servers

SPEC CPU benchmark

SPEC CPU2000 V1.3

- Technology evolves at a breakneck pace.
- SPEC CPU2000 is the next-generation industry-standardized CPU-intensive
  benchmark suite.
- SPEC designed CPU2000 to provide a comparative measure of compute-intensive
  performance across the widest practical range of hardware.
- The implementation resulted in source code benchmarks developed from
  real user applications.

These benchmarks measure the performance of the processor, memory, and compiler
on the tested system.
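
SPEC CPU suites report performance as ratios of a reference machine's run time
to the measured run time, aggregated with a geometric mean. A minimal sketch of
that arithmetic (with hypothetical benchmark names and timings, not real
CPU2000 results):

    import math

    # Hypothetical timings in seconds; not real CPU2000 results.
    reference_times = {"bench_a": 1400.0, "bench_b": 1800.0, "bench_c": 1100.0}
    measured_times  = {"bench_a":  350.0, "bench_b":  600.0, "bench_c":  275.0}

    # Each benchmark's ratio: how many times faster than the reference machine.
    ratios = [reference_times[b] / measured_times[b] for b in reference_times]

    # The overall score is the geometric mean of the per-benchmark ratios,
    # which keeps any single benchmark from dominating the result.
    geomean = math.prod(ratios) ** (1.0 / len(ratios))
    print(f"per-benchmark ratios: {[round(r, 2) for r in ratios]}")
    print(f"overall score (geometric mean): {geomean:.2f}")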
