Introduction

This paper proposes a design for a tag storage system.

Such a system must be flexible enough to execute several operations efficiently, including inserting, removing, and finding triplets. Given the expected workloads, my design is tailored for finding triplets quickly rather than for inserting or removing them quickly. Solving this problem is important because it applies to many current systems, such as Flickr and del.icio.us.

[Figure: a cache in main memory backed by a disk drive]

Figure 1: Cache in memory and disk drive

Overview of Storage System

In my design, the storage system consists of a cache in main memory and a disk drive (see Figure 1). The cache improves the efficiency of responses to FIND calls. The disk drive is where all of the triplets are actually stored. When fulfilling a FIND call, the system first checks the cache for a matching call. If no match is found in the cache, the system scans a specific portion of the disk drive to produce the result.

Disk Drive Organization

For numerous reasons, the most important component in the system is the disk drive. To fulfill INSERT and REMOVE calls, we must access the disk efficiently. Additionally, the cache will not have a hit for every FIND call, so it is necessary to minimize the number of seeks when finding triplets on disk. Based on the workloads, I decided to maximize the efficiency of FIND calls instead of INSERT and REMOVE calls; I provide the rationale for this tradeoff in a later section.

Each triplet inserted into the storage system is stored on disk five times. Each copy is indexed by a different combination of the triplet's subject, relationship, and object (see Table 1). For example, the first row in Table 1 shows that every triplet is indexed by its subject and relationship. These combinations are used by a hash function (described in detail in the next section) to determine precisely where on disk to store each copy of the triplet. With this implementation, the system can execute FIND calls extremely quickly because the data is indexed and optimized for each type of FIND call. Whether zero, one, or two asterisks appear in a FIND call, the data has been indexed for all cases except (*, r, *). The system does not index each triplet based only on its relationship because there are generally so few distinct relationships; instead, the entire disk is scanned to find all triplets with a certain relationship. Also, note that the system does not index each triplet by its full subject, relationship, and object combination. Instead, it uses just the subject and relationship to locate those triplets on disk.

Subject and Relationship
Subject and Object
Relationship and Object
Subject Only
Object Only

Table 1: The five different ways a triplet is indexed on disk.
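As a concrete illustration, the five index keys for a single triplet could be derived as follows. This is only a sketch; the key format and the function name are assumptions, not part of the paper's design.

```python
def index_keys(subject, relationship, object_):
    """Return the five keys under which a triplet is indexed (Table 1)."""
    return [
        ("SR", subject, relationship),   # Subject and Relationship
        ("SO", subject, object_),        # Subject and Object
        ("RO", relationship, object_),   # Relationship and Object
        ("S", subject),                  # Subject Only
        ("O", object_),                  # Object Only
    ]

# One triplet yields five keys, so it is stored at five locations on disk.
keys = index_keys("file://tigerwoods.jpg", "isa", "golfer")
```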


The one billion tags we need to store take up roughly 100 GB. Because each triplet is stored five times on disk, roughly 500 GB of the 1000 GB available to the system is used. Each block on the disk drive begins with a metadata section about that specific block; a more detailed description of this section appears in the discussion of the INSERT function. One assumption I have made is that there is a simple mechanism for determining which parts of a block are occupied by triplets. This mechanism eliminates the need for reshuffling after a triplet is removed and allows gaps to occur between triplets.
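The occupancy mechanism is left unspecified above; one simple possibility is a per-block bitmap with one bit per 100-byte triplet slot. This is a hypothetical sketch, not part of the paper's specification, and the class and method names are invented for illustration.

```python
class SlotBitmap:
    """Tracks which 100-byte triplet slots of a 4096-byte block are occupied."""
    SLOTS = 4096 // 100  # 40 slots per block

    def __init__(self):
        self.bits = 0  # bit i set => slot i is occupied

    def allocate(self):
        """Return the index of a free slot and mark it occupied, or None if full."""
        for i in range(self.SLOTS):
            if not (self.bits >> i) & 1:
                self.bits |= 1 << i
                return i
        return None

    def free(self, i):
        """Mark slot i free; neighboring slots need not be reshuffled."""
        self.bits &= ~(1 << i)

bm = SlotBitmap()
first = bm.allocate()
second = bm.allocate()
bm.free(first)  # leaves a gap between occupied slots, as the design allows
```

Freeing a slot simply clears a bit, which is why removal requires no reshuffling of the remaining triplets.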

Hash Map and Hash Function

On top of the entire disk drive sits a single hash map. Its primary purpose is to minimize the number of seeks when responding to a FIND call, though it is also used to maximize the efficiency of INSERT and REMOVE calls. Each key in the hash map is one of the combinations listed in Table 1, and each value is the block number on the disk drive for that particular combination. Thus, information about a triplet is passed to the hash map, and the corresponding block number is returned. For example, if Subject = “Tiger” and Object = “Golfer” is passed to the hash map, a block number for this specific subject and object pair is returned. This block number can then be used to find the triplets with a subject of “Tiger” and an object of “Golfer”.
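The paper does not specify the hash function itself; one plausible sketch maps the concatenated key fields to a block number modulo the number of blocks. The choice of SHA-256 and the function name are assumptions made for illustration only.

```python
import hashlib

NUM_BLOCKS = 250_000_000  # roughly 1000 GB of 4096-byte blocks

def block_number(*fields):
    """Map an index key (e.g. a subject and object pair) to a disk block number."""
    key = "\x00".join(fields).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BLOCKS

# The same key always maps to the same block, so FIND can seek directly to it.
blk = block_number("Tiger", "Golfer")
```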

Use Main Memory as a Cache

The main memory contains a cache (see Figure 2) of results for specific FIND calls. Each tag in the cache is a specific FIND call, and each data component contains the triplets corresponding to that FIND call. An incoming FIND call is first checked against the cache by comparing it to each of the tags. If a match is found, the data for that tag is returned. If no match is found, the results are located on the disk drive, and then the FIND call and its results are stored in the cache. If the cache is full, the system randomly chooses a tag and data component to be replaced by the new entry.
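The cache described above can be sketched in a few lines. The class and method names are hypothetical; only the behavior (tag lookup plus random eviction when full) comes from the text.

```python
import random

class FindCache:
    """Caches FIND results; evicts a randomly chosen entry when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # tag (the FIND call) -> data (list of triplets)

    def lookup(self, tag):
        """Return cached results for this FIND call, or None on a miss."""
        return self.entries.get(tag)

    def store(self, tag, data):
        """Insert a result; if the cache is full, replace a random entry."""
        if tag not in self.entries and len(self.entries) >= self.capacity:
            victim = random.choice(list(self.entries))
            del self.entries[victim]
        self.entries[tag] = data

cache = FindCache(capacity=2)
cache.store(("*", "isa", "golfer", 0, 1000), ["<file://tigerwoods.jpg,isa,golfer>"])
hit = cache.lookup(("*", "isa", "golfer", 0, 1000))
```

Random replacement keeps the implementation trivial, at the cost of occasionally evicting a popular entry that an LRU policy would have kept.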

Index | Tag                               | Data
  …   |                                   |
  47  | FIND(“*”,“isa”,“monkey”,0,1000)   | <file://marmoset.jpg,isa,monkey>,…
  48  | FIND(“*”,“isa”,“golfer”,0,1000)   | <file://tigerwoods.jpg,isa,golfer>,…
  …   |                                   |

Figure 2: Main memory is used as a cache.

Implementation of API – INSERT()

The INSERT function inserts the provided triplet into the disk five times in total, once for each of the indexing combinations described earlier (see Table 1). For example, to index the triplet by its subject and relationship, we first pass the subject and relationship to the hash function, which determines the block that should store the triplet. A similar procedure is then followed for the four other ways of inserting the triplet into the disk. After inserting the data into the disk, the cache should be updated so that newly inserted triplets appear in results immediately.


Of course, it is possible that a certain block becomes full of data. In this case, the new triplet is written into the next adjacent block with an opening. The INSERT function also updates the metadata section of the original block (the block calculated by the hash function) to reference the largest block number that has been written to. For example, if the hash function returns block 50 for an insert, but there are no openings until block 80, then block 50's metadata section will store the number 80. If another triplet that maps to block 50 is later inserted at block 75, block 50's metadata section will still hold the value 80.

INSERT(String subject, String relationship, String object) {
    blockNum = hash(subject, relationship);
    block = read(blockNum);
    if (block is full) {
        openBlockNum = findNextOpenBlock(blockNum);
        write(subject, relationship, object, openBlockNum);
        updateMetadata(blockNum, openBlockNum);
    } else {
        write(subject, relationship, object, blockNum);
    }
    updateCache(subject, relationship, object);
    // do the same thing as above for each line in Table 1
}

Figure 3: Pseudocode for INSERT()

The likelihood that blocks fill up frequently is relatively low. Due to the duplication, we are inserting 5 billion triplets into the system. If each key from the hash function maps to only 10 values on average (a conservative estimate), we are left with 500 million unique keys. The disk has 1000 GB of storage, which is equivalent to roughly 250 million 4096-byte blocks. This calculation indicates that, on average, only two keys map to the same block. Thus, only 20 values are stored in each block, which takes up 20 * 100 bytes = 2000 bytes, well under the 4096 bytes available in each block.
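The capacity arithmetic above can be checked directly. The 100-byte triplet size and 4096-byte block size are taken from the text; the exact block count works out to about 244 million, which the paper rounds to 250 million.

```python
TRIPLETS = 1_000_000_000
COPIES = 5
TRIPLET_BYTES = 100
BLOCK_BYTES = 4096
DISK_BYTES = 1000 * 10**9  # 1000 GB

stored = TRIPLETS * COPIES                # 5 billion stored triplets
unique_keys = stored // 10                # assuming ~10 values per key on average
blocks = DISK_BYTES // BLOCK_BYTES        # ~244 million blocks
keys_per_block = unique_keys / blocks     # ~2 keys per block
bytes_per_block = 2 * 10 * TRIPLET_BYTES  # 2000 bytes, within the 4096 available
```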

Implementation of API – REMOVE()

The REMOVE function removes the provided triplet from the disk five times in total, once for each of the indexing combinations described earlier (see Table 1). It uses a similar process and the same hash function as the INSERT function. The metadata section of the block calculated by the hash function may need to be updated to reflect the removal. After removing the data from the disk, the cache should be updated so that recently removed triplets no longer appear in results.

REMOVE(String subject, String relationship, String object) {
    blockNum = hash(subject, relationship);
    modifiedBlockNum = deleteTriplet(subject, relationship, object, blockNum);
    updateMetadata(blockNum, modifiedBlockNum);
    updateCache(subject, relationship, object);
    // do the same thing as above for each line in Table 1
}

Figure 4: Pseudocode for REMOVE()

Implementation of API – FIND()

As explained earlier, the FIND function first checks the cache to see if the results for a specific call are available. If not, the disk is scanned to find the results. By inserting each triplet into the disk five times, we have optimized the FIND function. For example, if the client calls the FIND function with a subject, a relationship, and a wildcard for the object, we pass this information to the hash function, which determines the block, N, at which to begin scanning. The system first reads block N and determines whether it can fulfill the request. If not, the system reads into memory all of the blocks up to the block number stored in N's metadata section. This ensures that the system has read all of the relevant data, which means it can handle any range (start and count) provided in a FIND call. Additionally, if the data for two keys of the hash map overlaps, the system simply filters out the unwanted data. A similar procedure is followed for the other types of FIND calls, with three exceptions. If a FIND call is for (*, *, *) or (*, relationship, *), the system simply scans the entire disk. If a FIND call is for (subject, relationship, object), the system passes only the subject and relationship to the hash function.


FIND(String subject, String relationship, String object, int start, int count) {
    results = checkCache(subject, relationship, object, start, count);
    if (results are not null) {
        return results;
    }
    // hash() disregards asterisks
    blockNum = hash(subject, relationship, object);
    block = read(blockNum);
    done = determineIfDone(subject, relationship, object, start, count, block);
    if (done) {
        results = getResults(subject, relationship, object, start, count, block);
        updateCache(subject, relationship, object, start, count, results);
        return results;
    }
    blocks[] = read(blockNum, largestWrittenBlockNum(block));
    results = getResults(subject, relationship, object, start, count, blocks[]);
    updateCache(subject, relationship, object, start, count, results);
    return results;
}

Figure 5: Pseudocode for FIND()

Implementation of API – SHUTDOWN()

At all times, every triplet in the storage system can be found on disk. The cache in main memory is simply lost when the computer is turned off. Thus, no additional steps need to be taken to support a clean shutdown.

Workload Analysis

For a FIND call, the system first reads an initial block. If more results are needed, FIND reads the next n blocks as determined by the initial block's metadata. Thus, the total time for FIND is 12.17 ms + 0.06 ms + 12.17 ms + n * 0.06 ms. The number n should remain relatively low; if n = 10, the total time for FIND is 25 ms. Note that this total is likely an overestimate for an average FIND call, because many requests are fulfilled by reading only the initial block and some are fulfilled from the cache. In general, every FIND call requires at most two disk seeks.

For an INSERT call, the system first reads an initial block. If the initial block is full, n more blocks are read to find an opening. (I assume an opening is found at this stage; otherwise more disk seeks and block reads are necessary.) The system then writes the triplet to disk and, if necessary, updates the initial block's metadata, which requires a disk seek and n block writes. Thus, the total time for INSERT is 5 * (12.17 ms + 0.06 ms + 12.17 ms + n * 0.06 ms + 12.17 ms + n * 0.06 ms). The 5 reflects that the system stores the triplet on disk five times. If n = 10, the total time for INSERT is 188.85 ms. In general, most INSERT calls complete with 5 * 3 = 15 disk seeks, though more may be necessary if the system has a hard time finding openings for the triplet.

For a REMOVE call, the system first reads an initial block. If the triplet is not found there, REMOVE reads the next n blocks as determined by the initial block's metadata. The system then writes the contents back to disk with the triplet removed and, if necessary, updates the initial block's metadata, which requires a disk seek and n block writes. Thus, the total time for REMOVE is 5 * (12.17 ms + 0.06 ms + 12.17 ms + n * 0.06 ms + 12.17 ms + n * 0.06 ms). The 5 reflects that the system removes the triplet from disk five times. If n = 10, the total time for REMOVE is 188.85 ms. In general, every REMOVE call requires at most 5 * 3 = 15 disk seeks.
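The totals above can be reproduced from the stated seek time (12.17 ms) and per-block transfer time (0.06 ms). The function structure is a sketch of the paper's arithmetic; the numbers themselves come from the analysis.

```python
SEEK_MS = 12.17   # one disk seek
BLOCK_MS = 0.06   # reading or writing one block

def find_ms(n):
    """Seek and read the initial block, then seek and read n more blocks."""
    return SEEK_MS + BLOCK_MS + SEEK_MS + n * BLOCK_MS

def insert_ms(n):
    """Per copy: read initial block, read n blocks for an opening, write n blocks.
    Multiplied by 5 because the triplet is stored five times."""
    per_copy = (SEEK_MS + BLOCK_MS) + (SEEK_MS + n * BLOCK_MS) + (SEEK_MS + n * BLOCK_MS)
    return 5 * per_copy

remove_ms = insert_ms  # REMOVE follows the same seek/read/write pattern

# find_ms(10) gives 25 ms and insert_ms(10) gives 188.85 ms, matching the totals above.
```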


Design Analysis and Discussion

The main design tradeoff I made was to maximize the efficiency of FIND calls instead of INSERT and REMOVE calls. This allows data to be found extremely quickly, but INSERT and REMOVE take dramatically longer because the data is replicated five times on disk. I made this decision because the workloads are primarily made up of FIND calls. For the Flickr++ workload, 90% of the total requests are FIND calls. For the library workload, 100% of the total requests are FIND calls because the inserts are done statically before any FIND calls. Since there was such a strong focus on finding data quickly, I optimized for this case.

The main goal of my design was to maximize the efficiency of FIND calls, and this has been achieved by duplicating the data and using a hash function to map to a specific part of the disk. Nonetheless, there are a few limitations. First, the design is not very effective when numerous keys share the same block. Second, when several thousand triplets are added at one block, all of the subsequent blocks fill up, and newly inserted triplets are pushed far down the disk. This increases both INSERT and FIND times because the data sits far from where the hash function claims it is. Third, if new types of FIND calls are created and we wish to optimize for them, my design becomes increasingly less efficient: the total times for INSERT and REMOVE grow each time the system optimizes for a new FIND call, because the data is duplicated yet again. Fourth, if new types of FIND calls are created, the system may run out of disk space; if the data must be duplicated ten times, the system will likely exhaust its disk.


Conclusion

In summary, my design is tailored for finding triplets quickly rather than for inserting or removing them quickly. This has been achieved by duplicating the data and optimizing for specific FIND calls. The system uses a cache to improve the efficiency of FIND even further, and a hash map placed on top of the disk drive allows the system to seek directly to the blocks where data is likely stored. One issue that must be resolved before implementation is the choice of a simple mechanism for determining which parts of a block are occupied by triplets. I also recommend determining how to avoid the inefficient case in which the hash function points to a certain block but the data has been pushed to another block far away. Although the current system is correct and efficient, there are likely ways to optimize it further in the future.

Word Count: 2,522

