Exploiting Multi-Core Processors, Flash Memory and High Performance Networking

Today's Web 2.0 and cloud computing datacenters severely underutilize the tremendous technology advances of recent years. Much potential is untapped, and scaling, performance, power, space, and complexity issues remain. This presentation examines these leading-edge technologies and discusses the challenges and opportunities in taking advantage of their promise.
Published by: Best Tech Videos on Apr 27, 2009
Copyright: Attribution Non-commercial


Exploiting Multi-Core, Flash, and High Performance Networking in Web 2.0 and Clouds: Opportunities and Challenges
Dr. Brian O’Krafka, Fellow
Darpan Dinker, Vice President of Database Technologies

© 2009

SCHOONER INFORMATION TECHNOLOGY: CONFIDENTIAL

1

Web 2.0 and Cloud Challenges
Unmanageable Data Growth
• Web traffic
• Web transactions
• User interaction response time
[Charts: database size, web traffic (millions of users), web transactions (billions of transactions), and response time, each plotted over time]

Rack, Power, and Pipe Usage
• Increasing power costs
• Limited datacenter rack space
• Inefficient, underutilized hardware means wasted energy

Data Complexity
• Too much data to process effectively
• Requires extensive data partitioning, application-level mapping, caching, replication/recovery, load balancing
• Commodity, non-application-specific hardware difficult to use and manage
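The application-level mapping listed under Data Complexity is often a hash from key to server. A minimal sketch of that idea follows; the server names and the `server_for` function are hypothetical and do not come from the presentation.

```python
import hashlib

# Hypothetical server names, for illustration only.
SERVERS = ["cache-0", "cache-1", "cache-2", "cache-3"]


def server_for(key: str, servers=SERVERS) -> str:
    """Map a key to one server with simple hash-modulo partitioning."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    for k in ("user:1", "user:2", "session:abc"):
        print(k, "->", server_for(k))
```

With modulo partitioning, changing the number of servers remaps most keys, which is one reason the slide counts partitioning and mapping as a source of complexity.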

Computing Trends and New Technologies

Four transformational technologies:
• Multi-core processors
• State-of-the-art, enterprise-class flash memory
• Low-latency interconnects
• Optimized data access and caching applications


Multi-Core Processors
Benefits:
• High computational density (4-8 cores/chip)
• High-speed data access (on-chip caches)
• High thread-level throughput
• Multi-chip coherency
• Standard and commodity products

Challenges:
• Requires extensive optimization to exploit, including:
– High thread-level parallelism in software
– Granular concurrency control
– Memory affinity to a core
– Thread affinity to a core
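One common form of granular concurrency control is lock striping: several locks each guard a slice of the data, so threads touching different keys rarely block each other. The sketch below is illustrative only; `StripedCounterMap` and `run_demo` are hypothetical names, and the presentation does not say which technique Schooner uses.

```python
import threading


class StripedCounterMap:
    """Counters guarded by several locks instead of one global lock."""

    def __init__(self, stripes: int = 16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._shards = [{} for _ in range(stripes)]

    def _stripe(self, key) -> int:
        # Keys that hash to different stripes never contend for a lock.
        return hash(key) % len(self._locks)

    def increment(self, key, amount: int = 1) -> None:
        i = self._stripe(key)
        with self._locks[i]:  # blocks only users of this stripe
            shard = self._shards[i]
            shard[key] = shard.get(key, 0) + amount

    def get(self, key) -> int:
        i = self._stripe(key)
        with self._locks[i]:
            return self._shards[i].get(key, 0)


def run_demo(threads: int = 8, per_thread: int = 1000) -> int:
    """Increment 32 shared keys from several threads; return the grand total."""
    counters = StripedCounterMap()

    def worker():
        for n in range(per_thread):
            counters.increment(f"key-{n % 32}")

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return sum(counters.get(f"key-{n}") for n in range(32))
```

In standard CPython the global interpreter lock limits real parallel speedup, so this shows the locking structure rather than a performance result.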

Leading-Edge Example: Intel® Core™ i7

• Announced April 1, 2009
• 4 cores/chip, 8 simultaneous threads/chip
• 32 kB L1, 512 kB L2, shared inclusive 8 MB L3 cache
• Fast switching, low power 45nm high-k metal gate silicon
• Multi-chip shared memory coherency with point-to-point memory interconnects (4x memory bandwidth)
• Commodity and standard

Enterprise-Class Flash Memory
Compared to HDD:
• 100x faster
• More reliable

Compared to DRAM:
• Consumes 1/100th the power of DRAM
• Much higher density, and therefore higher capacity
• Cheaper than DRAM
• Persistent when written

Can be organized into modules of different capacities, form factors, and physical and programmatic interfaces

Flash Memory Challenges
Write access behaves differently from read access

Writes are done in large blocks (~128 kB)
• Small writes need to be buffered and combined into large blocks before writing (write coalescing)

Before writing, a region must be erased
• Background garbage collection is needed to create free regions for writing

Limits on how many times a region can be erased (~100k)
• Writes must be spread uniformly across the total flash memory subsystem to maximize effective lifetime (wear levelling)
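Write coalescing can be sketched as a buffer that accepts small writes and passes only full blocks to the device. The sketch below assumes the ~128 kB block size quoted on the slide and a device object that simply exposes `write(bytes)`; `CoalescingWriter` is a hypothetical name, not a real driver interface.

```python
import io

BLOCK_SIZE = 128 * 1024  # ~128 kB write unit, as quoted on the slide


class CoalescingWriter:
    """Buffers small writes and emits only full blocks to the device."""

    def __init__(self, device, block_size: int = BLOCK_SIZE):
        self._device = device  # assumption: any object with write(bytes)
        self._block_size = block_size
        self._buffer = bytearray()
        self.blocks_written = 0

    def write(self, data: bytes) -> None:
        self._buffer.extend(data)
        # Emit as many full blocks as the buffer currently holds.
        while len(self._buffer) >= self._block_size:
            self._emit(self._buffer[: self._block_size])
            del self._buffer[: self._block_size]

    def flush(self) -> None:
        """Zero-pad and write any partial block, for example at shutdown."""
        if self._buffer:
            self._emit(self._buffer.ljust(self._block_size, b"\x00"))
            self._buffer.clear()

    def _emit(self, block) -> None:
        self._device.write(bytes(block))
        self.blocks_written += 1


if __name__ == "__main__":
    dev = io.BytesIO()
    writer = CoalescingWriter(dev)
    for _ in range(100):
        writer.write(b"x" * 4096)  # one hundred 4 kB writes
    writer.flush()
    print(writer.blocks_written, "blocks,", len(dev.getvalue()), "bytes")
```

Data held in the buffer is lost on power failure unless the buffer itself is protected, which is one reason coalescing interacts with persistence control.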

All Flash Memory Is Not the Same

            Read B/W       Write B/W        Erase Lat.     Read Lat.      Cost per GB
HDD         100 MB/s       150.00 MB/s      -              5,000.00 us    $0.10
NAND MLC    250 MB/s       70.00 MB/s       3.5 ms         85.00 us       $3.50
NAND SLC    250 MB/s       170.00 MB/s      1.5 ms         75.00 us       $11.00
NOR SLC     58 MB/s        0.13 MB/s        5,000.00 ms    0.27 us        $70.00
DRAM        2,000 MB/s     2,000.00 MB/s    -              0.08 us        $75.00


NOR vs. NAND Comparison
NOR flash memory chips:
• Low density
• Long write and erase latencies
• High cost
• Minimal technology investment by tier 1 memory suppliers
• Used primarily in consumer devices

NAND flash memory chips:
• Leading SSDs all designed with NAND flash
• 2-8x capacity of equivalent-sized NOR chips
• 5x bandwidth for large reads, 100x write bandwidth
• 1/7th the cost of NOR

Flash: SLC vs. MLC Comparison
• MLC increases density by storing more than a single bit per memory cell
• MLC costs less than SLC
• MLC write bandwidth and erase latency ~2.5 times slower than SLC
• MLC has significantly lower lifetime than SLC
• SLC dominant in enterprise flash usage


Wear Levelling: Typical Flash Lifetimes

InnoDB - DBT2
  Drives                         8
  Capacity per drive             64 GB
  Throughput                     45,000 TPM
  NewOrders/second               750
  Buffer writes/NewOrder         15
  Bytes/page                     4096
  Writes/sec per node            11250
  Writes/sec per drive           1406
  MB/sec per drive               5.5
  Seconds to rewrite drive       11930
  Hours to rewrite drive         3.3
  Write amplification            3.8
  Page reprogram hours           0.9
  Lifetime (hours)               87211
  Lifetime (years)               10.0

Memcache
  Drives                         8
  Capacity per drive             64 GB
  Throughput                     240,000 req/sec
  SET fraction                   10%
  Bytes/object                   2048
  SETs/sec per node              24000
  Writes/sec per drive           3000
  MB/sec per drive               5.9
  Seconds to rewrite drive       11185
  Hours to rewrite drive         3.1
  Write amplification            3.8
  Page reprogram hours           0.8
  Lifetime (hours)               81760
  Lifetime (years)               9.3
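The lifetime figures above follow from a short chain of arithmetic, reproduced in the sketch below. It assumes that "64 GB" means 64 GiB, that a year is 8,760 hours, and that the erase limit is the ~100k cycles quoted on the flash challenges slide. The function and parameter names are illustrative and not from the presentation.

```python
GIB = 2**30
ERASE_CYCLES = 100_000  # assumption: the ~100k erase limit quoted earlier
HOURS_PER_YEAR = 8760   # assumption: 365-day year


def flash_lifetime(drive_gib, writes_per_sec_per_drive, bytes_per_write,
                   write_amplification, erase_cycles=ERASE_CYCLES):
    """Estimate drive lifetime from a steady per-drive write rate."""
    bytes_per_sec = writes_per_sec_per_drive * bytes_per_write
    rewrite_seconds = drive_gib * GIB / bytes_per_sec       # one full logical rewrite
    rewrite_hours = rewrite_seconds / 3600
    reprogram_hours = rewrite_hours / write_amplification   # one full physical rewrite
    lifetime_hours = reprogram_hours * erase_cycles
    return {
        "mib_per_sec": bytes_per_sec / 2**20,
        "rewrite_seconds": rewrite_seconds,
        "rewrite_hours": rewrite_hours,
        "reprogram_hours": reprogram_hours,
        "lifetime_hours": lifetime_hours,
        "lifetime_years": lifetime_hours / HOURS_PER_YEAR,
    }


# InnoDB - DBT2 column: 45,000 TPM -> 750 NewOrders/s x 15 page writes over 8 drives
innodb = flash_lifetime(64, 750 * 15 / 8, 4096, 3.8)

# Memcache column: 240,000 req/s x 10% SETs over 8 drives
memcache = flash_lifetime(64, 240_000 * 0.10 / 8, 2048, 3.8)

if __name__ == "__main__":
    for name, result in (("InnoDB - DBT2", innodb), ("Memcache", memcache)):
        print(name, round(result["lifetime_hours"]), "hours,",
              round(result["lifetime_years"], 1), "years")
```

Under these assumptions the function gives about 87,211 hours (10.0 years) for the InnoDB column and about 81,760 hours (9.3 years) for the Memcache column, matching the slide.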


Flash Form Factor/Interface: SSDs vs. PCIe Cards
• Flash memory can be installed using SSD or PCIe cards
• PCIe flash cards use lots of server processor cycles and memory for garbage collection, write coalescing, wear levelling, mapping:
– SSDs have ASICs+memory that perform these functions efficiently close to the flash, freeing server processor cores and memory for productive use
• SSDs provide higher degree of parallelism, balance, and maintainability over PCIe flash cards:
– Configurations can be adjusted to match workload capacity, bandwidth, latency requirements
– Servers can be provisioned with higher capacity than PCIe flash memory subsystems
– Easier to replace SSDs than PCIe cards

The Intel® X25-E Extreme SSD
• Leading-edge example of flash technology
• 64 GB 2.5” SATA SSD
• 10 parallel channels accessing SLC NAND flash memory
• Native command queuing for up to 32 concurrent operations
• Each device delivers 55k IOPS at 75 us read latency, with write buffering for virtually instantaneous writes
• On-board ASIC provides write coalescing, wear levelling, space management


Challenges: Incorporating Flash Memory
Requires optimized apps and server operating environment to utilize potential I/O throughput and bandwidth, including:
• High degree of parallelism
• Granular concurrency control
• Low-latency access paths, interrupt batching
• Write synchronization
• Space management, caching, persistence control, wear-level control

Flash memory driver, controller, and device firmware optimization and tuning are required to match workload characteristics
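Keeping a flash device busy means having many requests in flight at once. The sketch below issues block reads from a thread pool sized to 32, the concurrent-operation figure quoted for the X25-E. It is an illustration using an ordinary temporary file, not a description of Schooner's I/O path; `parallel_read` and `demo` are hypothetical names, and `os.pread` is available on Unix-like systems only.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096
QUEUE_DEPTH = 32  # assumption: one worker per outstanding request


def parallel_read(path: str, block_numbers, queue_depth: int = QUEUE_DEPTH):
    """Read blocks concurrently; results come back in the order requested."""
    fd = os.open(path, os.O_RDONLY)
    try:
        def read_one(n):
            # pread takes an explicit offset, so threads can share one descriptor.
            return os.pread(fd, BLOCK, n * BLOCK)

        with ThreadPoolExecutor(max_workers=queue_depth) as pool:
            return list(pool.map(read_one, block_numbers))
    finally:
        os.close(fd)


def demo() -> bool:
    """Write 64 distinct blocks to a temp file and read them back in parallel."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for n in range(64):
            f.write(bytes([n]) * BLOCK)
        path = f.name
    try:
        blocks = parallel_read(path, range(64))
        return all(b == bytes([n]) * BLOCK for n, b in enumerate(blocks))
    finally:
        os.unlink(path)
```

Reads of a freshly written file are usually served from the operating system's page cache, so this demonstrates the request pattern rather than device performance.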

High-Performance Interconnects
                 Latency    Bandwidth
1Gb Ethernet     25.0 us    1 Gb/sec
10Gb Ethernet    6.0 us     10 Gb/sec

Benefits:
• Latencies as low as six microseconds between server nodes
• Feasible to distribute workloads across multiple servers and use replication to multiple server nodes
• Provide high availability and data integrity

Challenges:
• Require apps to handle parallel communication, efficient initiation, completion, asynchrony, batching
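A rough cost model shows why batching matters at these latencies. The sketch assumes that each message pays the quoted latency once as a fixed cost, that messages are sent one after another with no pipelining, and that payload time is simply bits divided by link bandwidth. These are simplifying assumptions for illustration, and `transfer_time_us` is a hypothetical helper.

```python
import math


def transfer_time_us(n_objects: int, object_bytes: int, latency_us: float,
                     gbit_per_sec: float, batch: int = 1) -> float:
    """Estimated time in microseconds to send n_objects, `batch` per message."""
    messages = math.ceil(n_objects / batch)
    bits_per_us = gbit_per_sec * 1000  # 1 Gb/s is 1000 bits per microsecond
    wire_us = n_objects * object_bytes * 8 / bits_per_us
    return messages * latency_us + wire_us


if __name__ == "__main__":
    # 10 Gb Ethernet figures from the table: 6.0 us latency, 10 Gb/sec
    print(transfer_time_us(100, 2048, 6.0, 10))             # one object per message
    print(transfer_time_us(100, 2048, 6.0, 10, batch=100))  # a single batched message
```

With the 10 Gb Ethernet figures and 2,048-byte objects, 100 separate messages cost about 764 us in this model, against about 170 us when they are batched into one message.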

Challenges: Utilizing New Technologies
Using multi-core, flash, and high-speed networking in Web 2.0 and cloud computing environments requires:
• High thread-level parallelism and granular concurrency control at each level
• Data/thread affinity management down to the core level
• Optimized path lengths for data access and thread context switching
• Multiple caching levels to exploit component technology access time and bandwidth variations
• Optimization for simultaneous multi-threading and flash memory
• Selecting, tuning, and balancing component technologies and configurations to match workloads
• Measurement and analysis tools for implementation and deployment optimization
• Application and storage distribution
• Failure-tolerant replication and recovery mechanisms
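The "multiple caching levels" item can be illustrated with a small, fast tier in front of a larger, slower one. In the sketch below a bounded LRU map stands in for DRAM and a plain dictionary stands in for flash. `TieredStore` is a hypothetical name and the write-through policy is an assumption; the presentation does not describe Schooner's actual caching policy.

```python
from collections import OrderedDict


class TieredStore:
    """Small LRU 'DRAM' tier in front of a larger 'flash' tier (a dict here)."""

    def __init__(self, dram_capacity: int):
        self._dram = OrderedDict()  # fast tier, bounded, least recent first
        self._flash = {}            # slow tier, holds every object
        self._capacity = dram_capacity
        self.dram_hits = 0
        self.flash_hits = 0

    def put(self, key, value) -> None:
        self._flash[key] = value    # assumption: write-through to the slow tier
        self._cache(key, value)

    def get(self, key):
        if key in self._dram:
            self._dram.move_to_end(key)  # mark as most recently used
            self.dram_hits += 1
            return self._dram[key]
        value = self._flash[key]         # raises KeyError if the key is absent
        self.flash_hits += 1
        self._cache(key, value)          # promote into the fast tier
        return value

    def _cache(self, key, value) -> None:
        self._dram[key] = value
        self._dram.move_to_end(key)
        if len(self._dram) > self._capacity:
            self._dram.popitem(last=False)  # evict the least recently used entry
```

The hit counters make it easy to see how the fast tier's size affects the share of requests that fall through to the slower tier.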

Schooner Scalable Hardware Platform
• Intel Nehalem 5550 processors
• IBM System x3650 M2 server platform
• 512GB flash memory subsystem with highly parallel flash controllers and Intel X25-E SSDs
• High throughput networking: 1/10-GbE


Schooner Operating Environment
Networked Service Application
• 1/10 GbE Network Management
• Application Protocol Handling

Data Fabric
• Object Attributes
• Thread and Core Management
• Synchronization/Concurrency Management
• DRAM Cache Management
• Container Management
• Object Metadata Management
• Replication Management

Flash Management Subsystem
• Space Allocation and Shard Management
• Object Replacement (cache mode)
• Persistency Management
• Tiered Storage Management

Admin
• Configure
• Monitor
• Control
• Optimize

Flash Access
• Asynchronous IO Handling
• Data Striping, Interrupt Batching
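Data striping spreads consecutive blocks across drives so that a large transfer engages all of them. The sketch below shows a generic round-robin layout. The default of 8 drives echoes the drive count on the lifetime slide, but the actual Schooner layout is not described in the presentation and `stripe_location` is a hypothetical helper.

```python
def stripe_location(logical_block: int, n_drives: int = 8, stripe_blocks: int = 1):
    """Return (drive index, block index on that drive) for round-robin striping."""
    stripe = logical_block // stripe_blocks   # which stripe unit the block is in
    drive = stripe % n_drives                 # stripe units rotate across drives
    block_on_drive = (stripe // n_drives) * stripe_blocks + logical_block % stripe_blocks
    return drive, block_on_drive


if __name__ == "__main__":
    for block in range(10):
        print(block, "->", stripe_location(block))
```

A larger `stripe_blocks` keeps runs of consecutive blocks on one drive, trading parallelism on small transfers for fewer drives touched per request.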


Schooner Appliance for Memcached

Increases throughput and network bandwidth, and improves power efficiency, compared to legacy Memcached servers


Schooner Appliance for MySQL Enterprise

[Chart: DBT2 TPM per Watt, InnoDB/Disks vs. Schooner; vertical axis 0 to 150; data labels 55,000 and 7,000]

Provides 8x performance increase over legacy disks


Schooner Data Access Appliances
First two Schooner appliances:
• Schooner Appliance for Memcached
• Schooner Appliance for MySQL Enterprise

• Designed for Web 2.0 and cloud computing datacenters
• Provide 8x performance improvements
• Consume 1/8th power and space
• Lower TCO by 60%
• Provide 100% compatibility with existing applications and management tools

Q&A


Comparison of MySQL Performance

NOTE: Durability not guaranteed for “Commodity/Opt. Flash” configuration

Contact Us

For more information:
www.schoonerinfotech.com
(650) 328-4200
info@schoonerinfotech.com
