You are on page 1of 1

H-Store:

An aims to enable automatically and transparently maintain data on the cheaper


secondary storage, allowing more data, than the available memory. Instead of maintaining an
LRU (least recently-used) list like H-Store Anti-Caching , Siberia performs offline classication
of hot and data by logging tuple accesses rst, and then analyzing them offline to predict the top
tuples with the highest estimated access frequencies, using an efficient parallel classication
algorithm based on exponential smoothing. The record access logging method incurs less
overhead than an LRU list in terms of both memory and CPU usage. Adaptive range lters that
are used to lter the access to disk. Besides, in order to make it transactional even when a
transaction accesses both hot and cold data, it transactionally coordinates between hot and cold
stores so as to guarantee consistency, by using a durable update memo to temporarily record
notices that specify the current status of cold records.

Efficient Parallel Classication Algorithm

The rst design issue is the assignment of data to the processors. The computation time at a node
is proportional to the number of records at that node. So, the best way to achieve load balance at
the root node is to fragment each attribute list horizontally into p fragments of equal sizes and to
assign each fragment to a different processor. We assume that the initial assignment of data to the
processors remains unchanged throughout the process of classication. With this approach, the
splitting decision may cause the records of a node to be distributed unevenly among processors.
If the computations for all the records of all the nodes at a level are performed before doing any
synchronizing communication between the processors, then the per-node imbalance would not be
a concern. This approach of doing per-level communications as against per-node
communications should also perform better on the parallel machines with high communication
latencies, especially because the number of nodes will be large at the levels much deeper in tree.

You might also like