
2018 10th International Conference on Intelligent Human-Machine Systems and Cybernetics

Understanding Cache Policy to Evaluate Write Performance with Hierarchical Hardware

Lin Qian, Zhu Mei, Hengmao Pang, Jun Yu, Guangxin Zhu, Haiyang Chen, Min Bu
State Grid Electric Power Research Institute
(SGEPRI)
Nanjing, China
qianlin@sgepri.sgcc.com.cn, meizhu2016@aliyun.com, panghengmao@sgepri.sgcc.com.cn, yujun@sgepri.sgcc.com.cn,
zhuguangxin@sgepri.sgcc.com.cn, chenhaiyang@sgepri.sgcc.com.cn, bumin@sgepri.sgcc.com.cn

978-1-5386-5836-9/18/$31.00 ©2018 IEEE
DOI 10.1109/IHMSC.2018.10176

Abstract—This paper first briefly reviews the status of write-optimized distributed file systems, focusing on the issues related to write optimization. It then introduces full-stack cache acceleration and the related technology of storage tiering, followed by a detailed description of SSD cache acceleration technology. The performance advantages are demonstrated through comparative testing: with the Flash cache, the processing time of the four-node configuration is 64% lower than that of the SATA hard disk. Finally, challenges that may be faced in future research work are discussed.

Keywords—write feature; cache promoting; storage tiering; flash cache; memory acceleration
I. INTRODUCTION
In recent years, with the continuous development of emerging technologies such as big data and cloud computing, the volume of data in the world has increased dramatically, posing severe challenges to storage capacity and performance. The distributed file system has become a research hotspot as the underlying mass-data storage system supporting the big data era [1].

Data reading and writing are the two basic operations provided by a distributed file system. The read performance of a distributed file system [2] is already very high, thanks to the read performance of physical media such as memory and SSD, and because read operations do not need to consider data-security guarantees; as a research topic, reads have therefore received far less attention than write optimization.

When writing data, a distributed file system must resolve the contradiction between the consistency of written data imposed by the distributed architecture [3] and overall system performance; this has become a key concern in the field of distributed storage in recent years. FAST, the most important academic conference in the storage field, covered write-optimized file systems as a dedicated topic in 2015 and 2016. Among these directions, write optimization with memory acceleration [4] plays a supporting role in significantly improving system performance; under the premise of guaranteed data-storage security, using write buffers to alleviate the file-system stalls caused by frequent concurrent IO is a hot topic.

This article focuses on research work on write-optimized distributed file systems. Specifically, memory is used as the acceleration medium for physical disks to build a multi-level cache, so that hot and cold data are placed adaptively in the storage system. Section 1 of this paper briefly introduces the current status of write-optimized distributed file systems and the issues related to write optimization. Section 2 describes the related techniques of full-stack cache acceleration and storage tiering. Section 3 describes full-stack cache acceleration, focusing on SSD cache acceleration technology. Section 4 proves the performance advantages of cache acceleration through comparison tests against HDD and SATA disks, summarizes the full text, and gives a preliminary discussion of future research directions.

II. FLASH ACCELERATION RELATED TECHNOLOGIES

A. Full-stack cache acceleration
With the rapid development of hardware technology, attention has turned to exploiting the locality of programs. Memory, SSD, NVM, and 3D XPoint are used as the first-level storage media, and the storage location of data is adjusted dynamically to balance the conflict between data capacity and system performance. At the same time, we note that the performance bottleneck of flash storage media is the write operation, not only because a write is more time-consuming than a read, but also because reads are affected by the lock mechanism while a write is still in flight.

In this research direction, as early as the 2nd FAST International Conference in 2003, N. Megiddo et al. [5] proposed an adaptive cache algorithm (ARC) to accelerate storage IO performance. In the following years, cache-based optimization of storage IO performance became a hot topic at FAST, MSST, and other important international storage conferences, continuing through 2016 with new applications such as mobile.

S. Huang et al. [6] proposed using flash media and adaptive algorithms to optimize underlying storage performance. R. Koller et al. [7] provided high-performance storage through client-side flash acceleration, extending full-stack IO optimization from the client down to the underlying storage. C. Li, P. Shilane, et al. [8] achieve an organic balance of performance and capacity by optimizing capacity usage in the SSD acceleration process. Dulcardo Arteaga et al. [9] solved two problems in virtualized environments: on the one hand, Flashcache acceleration is provided on demand according to the dynamic load of each VM; on the other hand, dynamic cache migration is used to overcome the performance degradation caused by insufficient cache capacity.

B. Storage tiering
Under normal circumstances, different application types and data types exhibit different underlying storage IO processing characteristics. Storage tiering technology dynamically adapts which type of data should be stored on which kind of performance-matched storage medium. Because the capacity of high-speed storage media such as SSD is limited, hot and cold data must be swapped once the data volume exceeds a capacity threshold; this process is called dynamic data tiering.

As can be seen in Figure 1, a virtual volume is presented at the operating-system level. The data-tiering control algorithm for the volume spans a DRAM cache, an SSD cache, an SSD non-volatile storage layer, and an HDD volume storage layer; the control strategy adapts to IO volume, read/write ratios, IO locations, IO parameters, and the ratio of fast to slow media in each subsystem. The emergence of technologies such as 3D XPoint and NVM [10] over the past few years has pushed storage tiering toward a multi-tier, interactive storage architecture that integrates resource information dynamically in real time and, under the premise of data security and consistency, maximizes physical-storage performance with non-volatile high-speed media.

Figure 1. Storage layered framework

The existing data tiered-storage strategies [11] are mainly divided into three types: write-back, read-only, and write-through (a minimal sketch of the three write paths follows the list):
• Write-back mode: all read data is cached on the high-speed SSD device (if SSD space is insufficient, data in the cache is synchronized to the ordinary hard disk according to the replacement strategy); write operations are also cached on the SSD. When the dirty data (data inconsistent with the HDD) on the SSD reaches a certain threshold, the dirty data is synchronized back to the HDD.
• Read-only mode: only read operations are cached, with the same caching process as write-through mode; write operations are not cached. This mode is well suited to read-mostly scenarios.
• Write-through mode: read data is all cached on the high-speed SSD device; write operations are cached on the SSD and simultaneously written to the HDD. This mode suits workloads that read more than they write and whose written data is recently accessed hot data. Because every write also goes to the HDD, write performance is not optimal, but the data is the safest.
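To make the three modes concrete, the following is a minimal Python sketch under simplifying assumptions: ssd and hdd are plain dictionaries standing in for the fast and slow block devices, and read/write are hypothetical helpers rather than code from any of the systems cited above.

```python
# Minimal sketch of the three tiered-storage modes described above.
# `ssd`/`hdd` stand in for block devices; `dirty` tracks blocks that
# exist on the SSD but have not yet been flushed to the HDD.
ssd, hdd, dirty = {}, {}, set()

def read(block):
    # All three modes cache read data on the high-speed SSD device.
    if block not in ssd:
        ssd[block] = hdd.get(block)   # promote from the HDD on a miss
    return ssd[block]

def write(block, data, mode):
    if mode == "write-back":
        ssd[block] = data             # SSD only: the fastest write path
        dirty.add(block)              # synchronized to the HDD later
    elif mode == "write-through":
        ssd[block] = data             # cached on the SSD...
        hdd[block] = data             # ...and written to the HDD: safest
    elif mode == "read-only":
        hdd[block] = data             # writes bypass the cache entirely
```

The sketch makes the trade-off visible: write-back touches only the SSD on the write path, which is why its performance is best and why it alone needs the dirty-data flushing described next.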
Data write-back refers to flushing dirty data from the SSD to the HDD; it exists only in write-back mode. Data write-back is generally governed by several system parameters: a threshold (once the proportion of dirty data reaches a certain ratio, write-back starts), an aging time (data that has not been operated on for a certain period can be written back), and a write-back rate limit (because flushing data from the SSD to the HDD is slow, the write-back rate must be bounded). The timing and rate of write-back greatly affect write performance. If the threshold is set very high, then once it is reached the drain rate will be far lower than the rate at which data is written into the SSD; the SSD will run out of space for newly written data, writes will go directly to the ordinary HDD, and write acceleration is lost. A sketch of this flush-trigger logic follows.
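As one way to picture how these parameters interact, here is a minimal Python sketch; the names DIRTY_THRESHOLD, AGING_SECONDS, MAX_FLUSH_PER_TICK, and flush_to_hdd are illustrative assumptions, not taken from a specific cache implementation.

```python
import time

hdd = {}                       # stand-in for the slow backing device

def flush_to_hdd(block, data):
    hdd[block] = data          # stand-in for a real SSD-to-HDD copy

DIRTY_THRESHOLD = 0.20         # flush once 20% of cached blocks are dirty
AGING_SECONDS = 300            # idle blocks older than this are flushable
MAX_FLUSH_PER_TICK = 64        # rate limit: draining to the HDD is slow

def writeback_tick(cache, now=None):
    """Flush a bounded batch of dirty blocks from the SSD cache.

    `cache` maps block -> [data, last_access_time, is_dirty].
    """
    now = time.time() if now is None else now
    dirty = [b for b, (_, _, d) in cache.items() if d]
    if len(dirty) < DIRTY_THRESHOLD * max(len(cache), 1):
        # Below the threshold, only aged-out blocks are written back.
        dirty = [b for b in dirty if now - cache[b][1] > AGING_SECONDS]
    for block in dirty[:MAX_FLUSH_PER_TICK]:   # honor the rate limit
        flush_to_hdd(block, cache[block][0])
        cache[block][2] = False                # block is clean again
```

Setting DIRTY_THRESHOLD too high reproduces the failure mode described above: the drain in writeback_tick cannot keep up, the SSD fills with dirty blocks, and new writes fall through to the HDD.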
In write-back mode there are four data replacement strategies [12]. In actual use, the user can select an appropriate strategy according to the specific situation, keeping hotspot data on the SSD as far as possible so that the SSD's performance advantage is maximized. (Minimal sketches of these eviction choices follow the list.)

FIFO (First In, First Out): the simplest replacement strategy. Its essence is to always evict the block that has been in the SSD the longest, i.e., the block that entered the SSD earliest is replaced first, on the assumption that the earliest-written block is the least likely to be read or written again. This strategy is ideal only when the address space is accessed in linear order; otherwise it is inefficient.

LRU (Least Recently Used): evicts the block that has gone unused for the longest time. When a block in the SSD must be replaced, the one that has not been used for the longest period is selected. The LRU strategy depends on the time each block was last used; because of its bookkeeping complexity, there is an LRU approximation, NUR (Not Recently Used).

LFU (Least Frequently Used): evicts the block used the fewest times. When a block in the SSD must be replaced, the one with the lowest use count is selected. The LFU strategy depends on the number of times each block has been used.

RAND: random replacement. When a block in the SSD must be replaced, the victim block is chosen by a hardware or software random number.
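Minimal sketches of the four eviction choices, assuming per-block bookkeeping dictionaries (arrival index, last-access time, use count); these helpers are illustrative and are not code from [12].

```python
import random

def fifo_victim(arrival):     # arrival: block -> insertion counter
    return min(arrival, key=arrival.get)      # evict the earliest entrant

def lru_victim(last_used):    # last_used: block -> last-access time
    return min(last_used, key=last_used.get)  # evict the longest unused

def lfu_victim(use_count):    # use_count: block -> number of uses
    return min(use_count, key=use_count.get)  # evict the least used

def rand_victim(blocks):      # RAND: victim chosen by a random number
    return random.choice(list(blocks))

# Example: with these last-access times, LRU evicts "b".
assert lru_victim({"a": 10.0, "b": 3.5, "c": 7.2}) == "b"
```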

III. SSD CACHE ACCELERATION TECHNOLOGY
Device Mapper is a mapping framework from logical devices to physical devices provided in the Linux 2.6 kernel. Under this mechanism, users can easily formulate management strategies for storage resources according to their own needs.

Device Mapper is registered in the kernel as a block-device driver. It involves three important object concepts: the mapped device, the mapping table, and the target device:
• Mapped device: a logical abstraction, which can be understood as a logical device exported by the kernel.
• Mapping table: describes the mapping from a mapped device to its target devices.
• Target device: the segment of physical space onto which a mapped device is mapped; for the logical device represented by a mapped device, it is the physical device to which that logical device maps.

The three Device Mapper objects, together with the target-driver plug-ins, form an iterable device tree, as shown below:

Figure 2. Device Mapper kernel object mapping hierarchy

As can be seen from the figure, the top-level root node of the tree is the mapped device that is ultimately exported as the logical device, and the leaf nodes are the underlying physical devices represented by target devices. The smallest device tree consists of a single mapped device and a single target device. Each target device is exclusively owned by one mapped device and can be used only by that mapped device; a mapped device can map to one or more target devices, and a mapped device can itself serve as a target device of a higher-level mapped device, so the layering can in theory be iterated indefinitely under the Device Mapper architecture.

The essential function of Device Mapper is to forward IO requests from the logical mapped device to the corresponding target devices, according to the mapping relationships and the IO processing rules described by the target drivers.
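That forwarding can be pictured as a table lookup, loosely modelled on a linear mapping table; the Row structure and the device names ssd0/hdd0 are illustrative, not the kernel's actual data structures.

```python
from collections import namedtuple

# One row of a mapping table: a run of sectors of the mapped (logical)
# device, plus the target device and offset that back it.
Row = namedtuple("Row", "start length target target_offset")

table = [
    Row(start=0,    length=1000, target="ssd0", target_offset=0),
    Row(start=1000, length=9000, target="hdd0", target_offset=0),
]

def map_io(sector):
    """Forward a logical sector to (target device, physical sector)."""
    for row in table:
        if row.start <= sector < row.start + row.length:
            return row.target, row.target_offset + (sector - row.start)
    raise ValueError("sector outside the mapped device")

print(map_io(500))    # ('ssd0', 500)
print(map_io(4242))   # ('hdd0', 3242)
```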
DM-Cache builds on the Device Mapper principle: ordinary HDD disks and SSD disks are treated as target devices in DM and aggregated into one virtual device (the cache device), and blocks of the ordinary HDD disks are mapped to SSD blocks. Following this mapping, DM-Cache converts operations on the ordinary HDD disks into operations on the SSD device, thereby keeping hotspot data on the SSD. The mapping of HDD disk blocks to SSD device blocks may be a simple linear mapping or a hash mapping, as sketched below.
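A toy sketch of the two placement schemes, assuming a fixed number of SSD cache blocks; this illustrates the idea only and is not DM-Cache's actual metadata layout.

```python
N_CACHE_BLOCKS = 1024   # number of cache blocks available on the SSD

def linear_map(hdd_block):
    # Linear (direct-mapped) placement: each HDD block has exactly one
    # candidate SSD slot, so a lookup is a single comparison.
    return hdd_block % N_CACHE_BLOCKS

def hash_map(hdd_block):
    # Hash placement scatters adjacent HDD blocks across the cache,
    # reducing slot conflicts for sequential working sets.
    return hash((hdd_block, 0x9E3779B9)) % N_CACHE_BLOCKS

tags = {}               # SSD slot -> HDD block currently cached there

def lookup(hdd_block, place=hash_map):
    """Return the SSD slot holding `hdd_block`, or None on a cache miss."""
    slot = place(hdd_block)
    return slot if tags.get(slot) == hdd_block else None
```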

IV. PERFORMANCE TESTING

A. Comparison test with HDD hard disk
Here we chose Flashcache as the cache software under test. To compare the performance of the Flash cache against the HDD hard disk, the test generated 4 concurrent request streams through nginx, which forwarded the load to 6 clients performing mixed concurrent reads and writes. We recorded the average response time and per-transaction processing for each load generator, the total number of transactions, and the nmon data of every node and client, and used the Filebench tool for result analysis.

Flash cache, 5 nodes, 400 concurrent, 50% read, 50% write, 8k files:

Figure 3. Flash cache test

HDD hard disk, 5 nodes, 400 concurrent, 50% read, 50% write, 8k files:

Figure 4. HDD test

From the above data, we can clearly see that with the Flash cache the read/write performance is greatly improved, about 6 times that of the ordinary HDD hard disk (372 ops/s versus 61 ops/s, see Table I).

TABLE I. RANDOM READ AND WRITE TEST RESULTS OF FLASH CACHE AND HDD

Disk Type     | 5 Node    | Notes
Flash cache   | 372 ops/s | 5 nodes, 400 concurrency, 50% read, 50% write, 8k file
HDD hard disk | 61 ops/s  | 5 nodes, 400 concurrency, 50% read, 50% write, 8k file

B. Comparison test with SATA hard disk
To compare the performance of the Flash cache against the SATA hard disk, the scenario was as follows:
• 800 concurrent random reads and writes
• File sizes: 128k (60%), 1M (40%)
• Initialization files per client: 5000; folders: 50
• Read/write mix: 60% read, 40% write
• Network type: 10Gb
• Number of nodes: 4, 5

TABLE II. RANDOM READ AND WRITE TEST RESULTS OF FLASH CACHE AND SATA

Disk Type            | 4 Node | 5 Node | Notes
SATA 7200 1T         | 635 ms | 432 ms | SATA disks attached through a RAID card; the RAID card has 512 MB of memory, faster than a bare disk
PCI-e as Flash Cache | 227 ms | 171 ms | PCI-e cache in front of SATA; cache size 100 GB

From the above data, we can clearly see that with the Flash cache, the processing time of the four-node configuration is 64% lower than that of the SATA hard disk: from 635 ms down to 227 ms, a reduction of (635 - 227) / 635 ≈ 64%.

V. CONCLUSIONS
With the rapid popularization of Internet applications, designing and developing a write-optimized distributed file system that can handle large numbers of small files has become a hot topic in storage research. This article focused on the key technology of flash-accelerated write optimization for improving database application performance, and demonstrated the performance advantages of this technology through comparative testing. The authors' analysis is that future write-optimized distributed file systems still have room for optimization in terms of network latency, local file-system load awareness, and further optimization of the metadata structure.

ACKNOWLEDGMENT
This work was financially supported by the State Grid Corporation of Science and Technology Project (WBS number: 521104170019).

REFERENCES
[1] Zhou Jiang, Wang Weiping, Meng Dan, Ma Can, Gu Xiaoyan, Jiang Jie, "Key technology in distributed file system towards big data analysis," Journal of Computer Research & Development, vol. 51, no. 2, pp. 382-394, Feb. 2014.
[2] Y. Zhu, "Improved read performance in a cost-effective, fault-tolerant parallel virtual file system (CEFT-PVFS)," in Proceedings of the 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, pp. 730-735, 2003.
[3] J. Chen, Z. Tan, F. Wu, C. Xie, "SJournal: a new design of journaling for file systems to provide crash consistency," in Proceedings of the 9th IEEE International Conference on Networking, Architecture, and Storage, pp. 53-62, 2014.
[4] K. Chen, R. B. Bunt, D. L. Eager, "Write caching in distributed file systems," in Proceedings of the International Conference on Distributed Computing Systems, pp. 457-466, 1995.
[5] N. Megiddo and D. S. Modha, "ARC: a self-tuning, low overhead replacement cache," in Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST 03), pp. 115-130, Berkeley, CA, USA, 2003.
[6] S. Huang, Q. Wei, J. Chen, C. Chen, and D. Feng, "Improving flash-based disk cache with lazy adaptive replacement," in Proceedings of the 29th IEEE Symposium on Mass Storage Systems and Technologies (MSST 13), pp. 1-10, 2013.
[7] R. Koller, L. Marmol, R. Rangaswami, S. Sundararaman, N. Talagala, and M. Zhao, "Write policies for host-side flash caches," in Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST 13), 2013.
[8] C. Li, P. Shilane, F. Douglis, H. Shim, S. Smaldone, and G. Wallace, "Nitro: a capacity-optimized SSD cache for primary storage," in Proceedings of the 2014 USENIX Annual Technical Conference (ATC 14), pp. 501-512, 2014.
[9] D. Arteaga, J. Cabrera, J. Xu, S. Sundararaman, and M. Zhao, "CloudCache: on-demand flash cache management for cloud computing," in Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST 16), pp. 352-369, 2016.
[10] N. S. Islam, M. Wasi-Ur-Rahman, X. Lu, D. K. Panda, "High performance design for HDFS with byte-addressability of NVM and RDMA," in Proceedings of the International Conference on Supercomputing, pp. 1-14, 2016.
[11] J. Sim, "Dynamically configuring regions of a main memory in a write-back mode or a write-through mode," Advanced Micro Devices Inc., California, 2014.
[12] Tanwir, G. Hendrantoro, A. Affandi, "Early result from adaptive combination of LRU, LFU and FIFO to improve cache server performance in telecommunication network," in International Seminar on Intelligent Technology & Its Applications, pp. 429-432, 2015.
