
BENCHMARK

In computing, a benchmark is the act of running a computer program, a set of
programs, or other operations in order to assess the relative performance of
an object, normally by running a number of standard tests and trials against it.

Each benchmark tries to answer the question: “What computer should I buy?”
Clearly, the answer is “The system that does the job with the lowest
cost-of-ownership”. Cost-of-ownership includes project risks, programming
costs, operations costs, hardware costs, and software costs.

The term benchmark is also commonly used to refer to the elaborately designed
benchmarking programs themselves.

Benchmarking examples

1. Before
   • Intel Pentium IV, 3.0 GHz, 800 MHz FSB, 1 MB Cache
   After
   • x86 microprocessor with a performance giving a minimum score of 193
     under the Sysmark 2004 rating benchmark
2. Before
   • Intel Pentium 4, 3 GHz or equivalent
   After
   • x86 microprocessor with the following performance scores:
     • between 165 and 205 under the Sysmark 2004 overall office
       productivity benchmark
     • between 200 and 235 under the Sysmark 2004 overall internet
       content creation benchmark
     • between 180 and 220 under the Sysmark 2004 rating

Performance benchmarking is the science of making objective assessments of
the performance of one system relative to another. Benchmarks are also useful
for assessing the performance improvements obtained by upgrading a computer
or its components.

Benchmarks provide a method of comparing the performance of various
subsystems across different chip and system architectures.

Purpose of Benchmark

As computer architecture advanced, it became more difficult to compare the
performance of various computer systems simply by looking at their
specifications. Therefore, tests were developed that allowed comparison of
different architectures.
For example, Pentium 4 processors generally operated at a higher clock
frequency than Athlon XP or PowerPC processors, which did not necessarily
translate to more computational power; a processor with a slower clock
frequency might perform as well as or even better than a processor operating
at a higher frequency. See BogoMips and the megahertz myth.

Benchmarks are designed to mimic a particular type of workload on a
component or system. Synthetic benchmarks do this with specially created
programs that impose the workload on the component; application benchmarks
run real-world programs on the system. While application benchmarks usually
give a much better measure of real-world performance on a given system,
synthetic benchmarks are useful for testing individual components, such as a
hard disk or networking device.
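
To make the distinction concrete, here is a minimal sketch in Python; the
function names, iteration counts, and placeholder workload are illustrative
assumptions rather than part of any real benchmark suite. It contrasts a
synthetic benchmark, which imposes an artificial workload on one component,
with an application benchmark, which simply times a real program.

import time

def synthetic_cpu_benchmark(iterations=5_000_000):
    # Synthetic benchmark: an artificial loop that stresses only integer
    # arithmetic on the CPU, not any real application logic.
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += (i * i) % 7
    elapsed = time.perf_counter() - start
    return iterations / elapsed          # rough "operations per second" score

def application_benchmark(run_real_workload):
    # Application benchmark: time an actual program supplied by the caller
    # (run_real_workload stands in for a real word processor, CAD job, etc.).
    start = time.perf_counter()
    run_real_workload()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"synthetic score: {synthetic_cpu_benchmark():,.0f} ops/s")
    real_job = lambda: sorted(range(2_000_000, 0, -1))   # placeholder workload
    print(f"application time: {application_benchmark(real_job):.3f} s")

The synthetic score isolates one component (CPU integer arithmetic), while the
application timing depends on whatever the real workload actually exercises.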

Benchmarks are particularly important in CPU design, giving processor
architects the ability to measure and make tradeoffs in microarchitectural
decisions. For example, if a benchmark extracts the key algorithms of an
application, it will contain the performance-sensitive aspects of that
application. Running this much smaller snippet on a cycle-accurate simulator
can give clues on how to improve performance.

Manufacturers commonly report only those benchmarks (or aspects of
benchmarks) that show their products in the best light. They have also been
known to misrepresent the significance of benchmarks, again to show their
products in the best possible light. Taken together, these practices are
called bench-marketing.

Ideally, benchmarks should only substitute for real applications when the
application is unavailable, or too difficult or costly to port to a specific
processor or computer system. If performance is critical, the only benchmark
that matters is the target environment's application suite.

Challenges of benchmarks

• Some vendors have been accused of "cheating" at benchmarks: doing
  things that give much higher benchmark numbers but make things worse
  on the actual likely workload.
• There are few (if any) high-quality benchmarks that help measure the
  performance of batch computing, especially high-volume concurrent
  batch and online computing. Batch computing tends to be much more
  focused on the predictability of completing long-running tasks correctly
  before deadlines, such as end of month or end of fiscal year. Many
  important core business processes are batch-oriented and probably
  always will be, such as billing.
• Benchmarking institutions often disregard or do not follow basic
  scientific method. This includes, but is not limited to: small sample size,
  lack of variable control, and limited repeatability of results (a minimal
  repeated-run measurement sketch follows this list).
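
As a hedge against the small-sample-size and repeatability problems above, a
benchmark harness can run the measured workload several times and report
summary statistics rather than a single number. The Python sketch below is an
illustrative assumption, not part of any standard benchmark; the run and
warmup counts are arbitrary.

import statistics
import time

def measure(workload, runs=10, warmup=2):
    # Run the workload several times so one noisy measurement is not mistaken
    # for the true result; report the spread as well as the median.
    for _ in range(warmup):              # warm caches, lazy initialisation, etc.
        workload()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "stdev_s": statistics.stdev(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }

if __name__ == "__main__":
    # Placeholder workload standing in for the benchmark under test.
    print(measure(lambda: sum(i * i for i in range(200_000))))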

Properties of benchmarks
There are seven vital characteristics for benchmarks. These key properties are:
[1] Relevance: Benchmarks should measure relatively vital features.
[2] Representativeness: Benchmark performance metrics should be broadly
accepted by industry and academia.
[3] Equity: All systems should be fairly compared.
[4] Repeatability: Benchmark results can be verified.
[5] Cost-effectiveness: Benchmark tests are economical.
[6] Scalability: Benchmark tests should scale from a single server to multiple
servers.
[7] Transparency: Benchmark metrics should be easy to understand.

Types of benchmark

1. Real program
   o word processing software
   o CAD tool software
   o user's application software (e.g., MIS)
2. Component benchmark / microbenchmark
   o a core routine consisting of a relatively small and specific piece of code
   o measures the performance of a computer's basic components[5]
   o may be used for automatic detection of a computer's hardware
     parameters, such as the number of registers, cache size, and memory
     latency
3. Kernel
   o contains key code
   o normally abstracted from an actual program
   o popular kernels: the Livermore loops
   o the LINPACK benchmark (contains basic linear algebra subroutines
     written in Fortran)
   o results are reported in Mflop/s (a small illustrative sketch follows this
     list)
4. Synthetic benchmark
   o procedure for programming a synthetic benchmark:
     • take statistics of all types of operations from many application
       programs
     • get the proportion of each operation
     • write a program based on the proportions above
   o types of synthetic benchmark:
     • Whetstone
     • Dhrystone
5. I/O benchmarks
6. Database benchmarks
   o measure the throughput and response times of database management
     systems (DBMS)
7. Parallel benchmarks
   o used on machines with multiple cores and/or processors, or systems
     consisting of multiple machines
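
As an illustration of the kernel style (item 3 above), the following Python
sketch times a small linear algebra routine in the spirit of the basic
subroutines used by LINPACK-like benchmarks and converts the elapsed time into
Mflop/s. It is a simplified stand-in, not the actual LINPACK code; the problem
size and repeat count are arbitrary assumptions.

import time

def daxpy(alpha, x, y):
    # Kernel: y <- alpha*x + y, a basic linear algebra operation of the kind
    # abstracted into kernel benchmarks.
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def kernel_benchmark(n=1_000_000, repeats=20):
    x = [1.0] * n
    y = [2.0] * n
    start = time.perf_counter()
    for _ in range(repeats):
        y = daxpy(0.5, x, y)
    elapsed = time.perf_counter() - start
    flops = 2 * n * repeats              # one multiply and one add per element
    return flops / elapsed / 1e6         # Mflop/s

if __name__ == "__main__":
    print(f"{kernel_benchmark():.1f} Mflop/s")

A real kernel benchmark uses optimised, compiled code; the point of the sketch
is only the structure: time a key loop, count its floating-point operations,
and report the rate.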

SPEC
The Standard Performance Evaluation Corporation (SPEC) is a non-profit
corporation formed to establish, maintain and endorse a standardized set of
relevant benchmarks that can be applied to the newest generation of high-
performance computers. SPEC develops benchmark suites and also reviews
and publishes submitted results from its member organizations and other
benchmark licensees.

SPEC benchmarks

• CPU
• Graphics/Applications
• HPC/OMP
• Java Client/Server
• Mail Servers
• Network File System
• Web Servers

SPEC Tools

• SPEC SERT Suite 2.0. The SERT suite 2.0 adds a single-value metric,
  reduces runtime, improves automation and testing, and broadens device
  and platform support.
• SPEC SERT Suite 1.1.1. The SERT suite 1.1.1 is the most current SERT
  version supported by the U.S. EPA Energy Star v2.0 program. Designed
  to be simple to configure and use via a comprehensive graphical user
  interface, the SERT suite uses a set of synthetic worklets to test discrete
  system components such as processors, memory and storage, providing
  detailed power consumption data at different load levels.
• SPEC Chauffeur WDK Tool. The Chauffeur™ WDK (Worklet Development
  Kit) Tool was designed to simplify the development of workloads for
  measuring both performance and energy efficiency.
• PTDaemon. The power/temperature daemon (also known as PTDaemon)
  is used to offload the work of controlling a power analyzer or
  temperature sensor during measurement intervals to a system other
  than the SUT (system under test).

SPEC members and associates

SPEC Members:

• 3DLabs * Acer Inc. * Advanced Micro Devices * Apple Computer, Inc. *
ATI Research * Azul Systems, Inc. * BEA Systems * Borland * Bull S.A. *
CommuniGate Systems * Dell * EMC * Exanet * Fabric7 Systems, Inc. *
Freescale Semiconductor, Inc. * Fujitsu Limited * Fujitsu Siemens *
Hewlett-Packard * Hitachi Data Systems * Hitachi Ltd. * IBM * Intel *
ION Computer Systems * JBoss * Microsoft * Mirapoint * NEC - Japan *
Network Appliance * Novell * NVIDIA * Openwave Systems * Oracle *
P.A. Semi * Panasas * PathScale * The Portland Group * S3 Graphics Co.,
Ltd. * SAP AG * SGI * Sun Microsystems * Super Micro Computer, Inc. *
Sybase * Symantec Corporation * Unisys * Verisign * Zeus Technology *

SPEC Associates:

• California Institute of Technology * Center for Scientific Computing (CSC)
* Defence Science and Technology Organisation - Stirling * Duke
University * JAIST * Kyushu University * Leibniz Rechenzentrum -
Germany * National University of Singapore * New South Wales
Department of Education and Training * Purdue University * Queen's
University * Rightmark * Stanford University * Technical University of
Darmstadt * Texas A&M University * Tsinghua University * University of
Aizu - Japan * University of California - Berkeley * University of Central
Florida * University of Illinois - NCSA * University of Maryland *
University of Modena * University of Nebraska, Lincoln * University of
New Mexico * University of Pavia * University of Stuttgart * University of
Texas at Austin * University of Texas at El Paso * University of Tsukuba *
University of Waterloo * VA Austin Automation Center *

SPEC Supporting Members:

• EP Network Storage Performance Lab * SuSE Linux AG *
