
Published in: Proc. of the 2nd Int. Symposium on Parallel Algorithms/Architectures Synthesis, 17.-21. March 1997 in Aizu-Wakamatsu, Japan, IEEE Computer Society Press, 1997, pp. 348 - 355

Introducing Parallelism in Multimedia Database Systems
Günther Specht, Stephan Zimmermann, Alexander Clausnitzer
Department of Computer Science, Technische Universität München
Orleansstr. 34, D-81667 München, Germany
email: {specht, zimmerms}@informatik.tu-muenchen.de

Abstract
We discuss possibilities of parallelizing multimedia database systems, based in particular on our experiences with our multimedia database system MultiMAP and our parallel database system MIDAS. We describe both initial systems in brief. Our main purpose is to examine when parallelism is of advantage and where sequential processing in multimedia databases is sufficient.

1. Introduction
The increasing amount of data and links managed by multimedia and hypermedia systems requires the use of multimedia database systems. Compared to conventional processing of data in file systems (as used in the WWW), multimedia database systems offer efficient, index-supported access as well as all the advantages of transaction-protected multi-user mode and recovery. We have developed the multimedia database system MultiMAP, which is currently used in several fields of application. It is based on a relational database system. A second group in our department is working on the parallelization of relational database systems. This group has developed the parallel database system MIDAS. Sections 2 and 3 contain a short description of both initial systems. Section 4 discusses possibilities of parallelizing multimedia database systems. We distinguish between inter- and intra-transaction parallelism and analyze query complexities down to the relational algebra level using query execution plans. In particular, we examine when parallelism is of advantage and where sequential processing in multimedia databases is sufficient.

2. The multimedia database system MultiMAP
2.1. The initial system:
MultiMAP is an interactive, extensible hypermedia database system, in which text, images, arbitrary objects on the images, audio and video can be connected by links. Link processing and evaluation is also integrated in the system. MultiMAP runs on Unix workstations (Sun4 Sparc, HP, etc.) using a client/server architecture, also over NFS. The main focus of MultiMAP is the support of fast and simple (mouse supported) building of applications. Using a database instead of the file systems that are still very common in multimedia systems has advantages such as:
• integrated processing of large amounts of multimedia data,
• optimized storage due to efficient access paths and index structures,
• multiple complex search possibilities,
• referential integrity of links,
• transaction-protected multi-user mode and
• full recovery capability.

MultiMAP implements a hypermedia engine and uses the relational database system TransBase™ as a backend for internal data maintenance. Figure 1 shows the architecture of MultiMAP. MultiMAP is conceptually based on an extension of the Dexter Hypertext Reference Model [7], which is an acknowledged standard for hypermedia systems today. This is especially important for the power of the link concept, and we will refer to this model in detail later, when discussing parallelization ideas. The link concept of MultiMAP goes beyond usual WWW links:
1. Support of arbitrary n:m links: Not only 1:1, but arbitrary n:m links are supported. This is an extension of classical hypertext structures, which can only manage n:1 links. For 1:n links, an additional selection component is necessary.
2. Extension of the hypertext concept to arbitrary graphical objects: Link exits and link targets can be arbitrarily outlined objects on an image (for example the course of a river or a plot of land on a map). In particular, these do not need to be approximated by rectangles.


In addition to links, it is possible to execute a full text search on all text, image and object names of the database. The full text search is integrated in the object search and behaves like an additional dynamic link. All objects and links are managed inside the database; therefore they satisfy referential integrity and support multi-user access, making development easier.

[Figure 1: Architecture of the multimedia database system MultiMAP. An external previewer (client) communicates with the hypermedia engine (server), which consists of presentation manager, fulltext query manager, insert/update component, link manager and SQL interface manager. The engine runs on the extended relational database system TransBase (TA manager, catalog manager, SQL query compiler, SQL query optimizer, query execution plan interpreter, lock manager, access structure manager with B-trees and indexes, segment manager, recovery manager), which sits on the storage system (discs, CD jukebox). Input devices are video camera, frame grabber, scanner and OCR.]

2.2. We have also implemented the complete system as an object-oriented database, using the object-oriented database system O2 as a backend. We did this in order to obtain comparisons regarding modelling, effectiveness and performance. A detailed examination [9] showed that object-oriented databases do not show the desired performance and user interactivity with huge amounts of data. Thus, this paper deals with parallelization ideas for the relational variant of MultiMAP.

2.3. Applications:
MultiMAP is already used in a series of applications, partially with large amounts of data and high user activity. We present only a few of them:
1. An important field of application is the multimedia processing of maps and city maps for urban information systems or administrative domains. A Munich city guide is already completed in most parts. The system will also be used in the domains of environmental planning and mapping out biotopes.
2. MultiMED: A further field of application is the multimedia processing of X-ray images in medicine, including detail images and verbal or written medical reports. A prototype has been developed in collaboration with the St. Bernward hospital in Hildesheim, Germany.
3. MultiBHT: A third field of application lies in linguistics, in the multimedia processing of results of language analysis, in order to construct correct grammars and to develop text-critical editions. An application for Old Hebrew exists in the Institute for Assyriology and Hethitology of the University of Munich.
4. MultiLIB: A multimedia guide through the university library of the University of Munich and its branches. The purpose is to enable students to find books, their location and access rights, and the opening times of the library, and to offer support in catalogue queries.

In its different applications, the system must prove to be worthwhile in very variable user environments: MultiLIB is directed to the broad public, especially to students. Therefore the system must be accessible from all the branches, and most applications are read-only. MultiBHT is an application in research. Here there are only few users, but with more intensive sessions and very variable result sets, and every access is a read/write access. Thus there are peak loads in both directions. In addition, the extraction of multimedia objects like images, text, sound and blinking requires higher CPU and net usage than in conventional databases. In MIDAS we are already developing a parallel relational database system. This raises the questions of how far parallel systems are suited to serve as backends for hypermedia applications, if the backend can be replaced, or what principles can be adopted from there for an integrated, parallel hypermedia database system. That leads us to the study that we present here: how and when parallelizing a database system is advantageous. Hence, we will introduce MIDAS in the next section.

3. Architecture of the parallel database system MIDAS
MIDAS (MunIch Parallel DAtabase System) [3, 4] is a parallel relational database system. It is well suited to serve as a testbed for the exploration of various parallel database technologies and their integration into database system architecture. To fit the needs of today's complex database applications it is necessary to increase the performance of relational database systems. One of the enabling technologies perceived is parallelism. Since MIDAS could be used to parallelize MultiMAP, we want to describe in brief the MIDAS architecture and its features.

MIDAS has a client/server architecture (see figure 2). MIDAS clients are database applications. They are sequential programs issuing transactions to the MIDAS server. Each transaction in turn consists of retrieval/update queries. The queries are purely descriptive (SQL). They do not contain any parallel constructs. A MIDAS client can run on any computer having access to the MIDAS server.

[Figure 2: Architecture of the parallel database system MIDAS. MIDAS clients connect through the SQL interface to the MIDAS server, which consists of the MIDAS access system and the MIDAS execution system with tasks T1, T2, ..., Tn. The server reaches the databases in the filesystem through an I/O interface.]

The MIDAS server provides a SQL interface to retrieve and manipulate efficiently the contents of a relational database. The MIDAS server is a set of processes running on several nodes of a workstation cluster. The server is embedded into PVM (Parallel Virtual Machine), a software package that permits a heterogeneous collection of Unix computers hooked together by a network to be used as a single large parallel computer [2]. PVM seems to be well suited because it provides much flexibility. Flexibility and scalability are the two main goals which must be accomplished by the implementation of the MIDAS server. The possibility to start new PVM tasks dynamically helps to implement a scalable system. The server is composed of two layers: the MIDAS access system and the MIDAS execution system.

The MIDAS access system supports parallelism between different MIDAS clients. This kind of parallelism is usually called inter-transaction parallelism. For that purpose, the access system provides a mechanism so that an arbitrary and varying number of clients can issue their queries to the MIDAS server in parallel. Since these queries can be evaluated by server processes on different nodes of the server, this is real inter-transaction parallelism and not a form of quasi-parallelism on one node. Whenever a new client arrives, the server can increase its degree of parallelism by starting new PVM tasks within the server. These new PVM tasks are dedicated to the additional work caused by new clients. The second task of the access system is to compile, optimize and parallelize the queries issued by the clients. The access system generates query execution plans (QEPs) which can be performed by the execution system in parallel in order to answer the queries of the clients.

The MIDAS execution system is responsible for the parallel execution of queries in accordance with their QEPs. Since the execution system can work simultaneously on several QEPs that were derived from a single query by the parallelizer inside the access system, the evaluation of this query can take place in parallel. This kind of parallelism is usually called intra-query parallelism. Interim results can be exchanged between the nodes involved in one QEP.

4. Discussion of the parallelizing of hypermedia database systems
Because MultiMAP uses relational database technology as a backend, and because MIDAS offers a parallel relational database interface, MultiMAP can either be built directly on MIDAS or could at least use its technology. Therefore a detailed analysis of when a parallelization is advantageous needs to be done. We analyse the following points:
• Increase of user parallelism (number of users) through real inter-transaction parallelism versus kernel replication in the old system.
• The intra-transaction parallelism parallelizes the latter also inside the queries, e.g. by using parallel joins, selections, projections etc.
• Images and audios can lead to an I/O bottleneck. As a solution we will discuss a parallelization in the sense of the RAID technology.
• Finally we discuss the integration of parallel compression and decompression algorithms into our client/server system.

4.1. Inter-transaction parallelism (user parallelism):
In sequential systems, as in many other applications, a simple quasi-parallelization is accomplished by starting a separate database kernel for each user, using reentrant code. (Today, TP-monitor techniques are typically used.) But because all kernels run interleaved and sequentialized on one machine, the maximal number of kernels that exist simultaneously is no bigger than 10 per workstation. With a higher number of users and short queries, the applications of MultiMAP profit the most from inter-transaction parallelism. These are MultiLIB and MultiMED, because they typically have many users with read access or disjoint write access.

4.2. Intra-transaction parallelism
Usually, in hypermedia systems there exist two entries into the system, first through link navigation and second directly through fulltext search:
• The most common access to the data occurs by link navigation.
• The second way to enter the hypermedia net is the fulltext search.
Internally both entries require a complex query evaluation. In this section we will analyse their parallelization potential. First of all, we will analyse the parallelization of the query execution plans (QEPs). We will therefore analyse the complexities of these operations based on the Dexter model and then we will discuss whether a parallelization would pay.

4.2.1. Parallelizing the link navigation
In MultiMAP, in contrast to file-system based hypermedia systems, the internal representation of links is based on the Dexter reference model [7]. We had to transfer the Dexter model into the relational model first.

Representation of nodes and links in the Dexter Reference Model: In the following, we introduce the representation of nodes and links in the Dexter reference model only as far as is necessary for understanding the analysis that follows. The storage layer describes the network of nodes and links. The storable hypermedia objects are called components. There are three different types of components: atom, link and composite component. Atoms are simple nodes; composite components are hierarchically composed nodes (DAGs). For identification, each component has a “global unique identifier” (UID). It is important to emphasize that links are defined as entities of their own and not implicitly in the components. A link is defined by two (or more) specifiers, where each specifier contains exactly one anchor. Anchors can be components or parts within a component (span-to-span links). Anchors are locally defined inside their components, by a locally unique anchor identifier (AID) and an anchor value that points to the exact target inside the component. This is illustrated in figure 3.

[Figure 3: Modelling links within the Dexter Reference Model. Link #89 has two specifiers, each with component_spec, anchor_ID, direction and presentation_spec: the specifier (#123, #1, FROM) resolves to atom #123 and the specifier (#256, #5, TO) resolves to composite #256. Each component carries component_info (attributes, presentation_spec, anchors with ID and value) and its content.]

If we build the entity-relationship diagram for this model and convert it into a relation schema, the following simplified basic relations result:

Atom (UID: integer, [component-info-attributes as author, date etc.], presentation_spec: string, content: BLOB) /* here simplified as BLOB, in detail: IMAGE, AUDIO, MPEG, etc. */
Anchor (UID: integer, AID: integer, value: string)
Link (UID: integer, [component-info-attributes as author, date etc.], presentation_spec: string, type: string)
Specifier (link_UID: integer, anchor_UID: integer, anchor_AID: integer, direction: string, presentation_spec: string)
Composite (UID: integer, [component-info-attributes as author, date etc.], presentation_spec: string, content: BLOB)
Composite-includes (UID_father: integer, UID_child: integer)

Primary keys (underlined in the printed schema) carry an index for faster access, here a prefix-B*-tree.

To determine the complexity of link navigation, it is important to take a closer look at all the joins that need to appear in the database. The access behaviour at link navigation looks like this:
• The user clicks a link marker. Thus, as an input we get the AID and UID of the source anchor.
• Search for link sources with corresponding UID and AID in the relation specifier to determine the link UID.
• Search for all the corresponding link target specifiers (= selfjoin over link_UID).
• Join with the relation anchor to retrieve the value (exact link target specification).
• For representation, get the component that contains the link target (selection over UID on the relations atom or composite or link). (Because the result will contain BLOBs, i.e. huge amounts of data, we will discuss this part only in section 4.3.)
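The relation schema and the first four navigation steps above can be tried out with a small sketch. It uses Python's built-in sqlite3 module instead of TransBase, omits the component-info attributes, and assumes the primary keys shown in the CREATE TABLE statements (the underlining of the original schema is not available here). The sample rows follow figure 3; the anchor values and the function names `build_example` and `navigate` are purely illustrative.

```python
import sqlite3

# Simplified Dexter relations; the key choices are assumptions of this sketch.
SCHEMA = """
CREATE TABLE atom (uid INTEGER PRIMARY KEY, presentation_spec TEXT, content BLOB);
CREATE TABLE composite (uid INTEGER PRIMARY KEY, presentation_spec TEXT, content BLOB);
CREATE TABLE composite_includes (uid_father INTEGER, uid_child INTEGER,
                                 PRIMARY KEY (uid_father, uid_child));
CREATE TABLE anchor (uid INTEGER, aid INTEGER, value TEXT, PRIMARY KEY (uid, aid));
CREATE TABLE link (uid INTEGER PRIMARY KEY, presentation_spec TEXT, type TEXT);
CREATE TABLE specifier (link_uid INTEGER, anchor_uid INTEGER, anchor_aid INTEGER,
                        direction TEXT, presentation_spec TEXT,
                        PRIMARY KEY (link_uid, anchor_uid, anchor_aid));
"""


def build_example():
    """Create an in-memory database holding the link of figure 3."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO atom VALUES (123, 'text', ?)", (b"source text",))
    conn.execute("INSERT INTO composite VALUES (256, 'map', ?)", (b"target map",))
    conn.executemany("INSERT INTO anchor VALUES (?, ?, ?)",
                     [(123, 1, "chars 10-20"), (256, 5, "polygon river")])
    conn.execute("INSERT INTO link VALUES (89, 'default', 'reference')")
    conn.executemany("INSERT INTO specifier VALUES (?, ?, ?, ?, ?)",
                     [(89, 123, 1, "from", "default"),
                      (89, 256, 5, "to", "default")])
    return conn


def navigate(conn, uid, aid):
    """Follow all links leaving anchor (uid, aid); return (uid, aid, value) targets."""
    # Step 2: link sources with matching UID and AID give the link UIDs.
    link_uids = [row[0] for row in conn.execute(
        "SELECT link_uid FROM specifier WHERE anchor_uid = ? AND anchor_aid = ? "
        "AND direction IN ('from', 'bidirect')", (uid, aid))]
    targets = []
    for link_uid in link_uids:
        # Step 3: all other specifiers of the same link that may act as targets.
        specs = conn.execute(
            "SELECT anchor_uid, anchor_aid FROM specifier WHERE link_uid = ? "
            "AND direction IN ('to', 'bidirect') "
            "AND (anchor_uid <> ? OR anchor_aid <> ?)",
            (link_uid, uid, aid)).fetchall()
        for t_uid, t_aid in specs:
            # Step 4: the anchor value is the exact target specification.
            row = conn.execute("SELECT value FROM anchor WHERE uid = ? AND aid = ?",
                               (t_uid, t_aid)).fetchone()
            if row is not None:
                targets.append((t_uid, t_aid, row[0]))
    return targets


conn = build_example()
print(navigate(conn, 123, 1))
```

The sketch evaluates the steps one at a time to mirror the list. A database system would instead evaluate them as a single query with joins, and the final step (fetching the component with its BLOB content) is left out.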

Link navigation in SQL:

SELECT a.UID, a.AID, a.value, s2.presentation_spec
FROM specifier s1, specifier s2, anchor a
WHERE s1.anchor_UID = <input> /* Selection */
AND s1.anchor_AID = <input>
AND s1.direction IN (‘from’, ‘bidirect’)
AND s1.link_UID = s2.link_UID /* Selfjoin */
AND s2.direction IN (‘to’, ‘bidirect’)
AND (s1.anchor_UID ≠ s2.anchor_UID OR s1.anchor_AID ≠ s2.anchor_AID) /* Post selection */
AND s2.anchor_UID = a.UID /* 2. Join */
AND s2.anchor_AID = a.AID

Analysis: We get two select operations and two joins (maybe even three, if the attributes of the relation link are relevant, e.g. the link type). For an exact analysis, the structure of the internal query execution plan (QEP) on the level of relational algebra operations is important. See figure 4.

[Figure 4: Query Execution Plan for a link navigation query. From bottom to top: INDEX-SEL ON SPECIFIER with A_UID = <INPUT> and A_AID = <INPUT>; SELECT DIRECTION IN (‘FROM’, ‘BI’); PROJ [L_UID, A_UID, A_AID]; INDEX-JOIN ON L_UID with RELATION SPECIFIER; SELECT S2.DIRECTION IN (‘TO’, ‘BI’) AND (S1.A_UID <> S2.A_UID OR S1.A_AID <> S2.A_AID); PROJ [S2.A_UID, S2.A_AID, S2.PRE]; INDEX-JOIN ON (UID, AID) with RELATION ANCHOR; PROJ; UNIQ.]

This QEP is highly optimized, provided that the lines marked as post selections in the SQL query above are translated to selections and not to joins. Since the join is the most expensive operation, with complexity O(m * log n), we assumed it would be worthwhile to parallelize it. For efficiency, an index-equijoin is already used. Thus one might consider that it would be worthwhile to fragment the first relation for the parallel join instead of the second, since (m/f) * log n < m * log (n/f), with f being the number of fragments. But a closer look shows that the incoming m (which is always the left side of the join in figure 4) is really always small! For the first join m would be rather 1, since the first selection is on the source anchor, and normally only one source anchor is used in one link. For the second join the incoming m is on average about 1.8: 1:n links are supported by the model, but mostly 1:1 links are used, and where 1:k links are used, k is small again (on average < 10).

Now, what is the result? Since the intermediate results are always very small, the corresponding join relation always fits in main memory, and the join is already index supported, it is not worthwhile to parallelize the link navigation. This result astonished us, and although it is negative, it is important to know it.
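The cost argument can be checked with a few lines of arithmetic. The sketch below takes the two per-node cost expressions from the text, (m/f) * log n for fragmenting the incoming relation and m * log (n/f) for fragmenting the indexed relation, and evaluates them for some values. The logarithm base 2 and all concrete numbers are assumptions made for illustration only.

```python
import math


def sequential_cost(m, n):
    """Index join of m incoming tuples against an index over n tuples."""
    return m * math.log2(n)


def cost_fragment_outer(m, n, f):
    """Per-node cost if the incoming relation is split into f fragments."""
    return (m / f) * math.log2(n)


def cost_fragment_inner(m, n, f):
    """Per-node cost if the indexed relation is split into f fragments."""
    return m * math.log2(n / f)


# With a large incoming relation, fragmenting it is the cheaper variant.
print(cost_fragment_outer(1000, 10**6, 4) < cost_fragment_inner(1000, 10**6, 4))

# With m around 1.8, the whole sequential join costs only a few dozen
# comparison steps, so there is almost nothing left to distribute.
print(sequential_cost(1.8, 10**6))
```

Note that the inequality from the text only holds when n is much larger than f. For n = f, for example, log (n/f) is zero and the comparison reverses.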

4.2.2. Parallelization of fulltext search
As a second access point, besides link navigation, modern hypermedia systems offer a fulltext search. Now, fulltext search means that not only any single word can be searched for, but also arbitrarily complex composed search queries can be processed, including truncation, AND/OR links and distance search. As an example see figure 5. Here query execution plans (QEPs) of any complexity occur that can be parallelized. The joins can then be split into parallel ones, since all single subqueries are of the same structure, which is exactly that of the inner nodes marked in figure 5. A detailed analysis for the digital library system OMNIS can be found in [5].

[Figure 5: Query Execution Plan for the fulltext retrieval query “parallel%, multimedia%”. Two branches of identical structure, one for ‘PARALLEL%’ and one for ‘MULTIMEDIA%’, each from bottom to top: INDEX-SEL ON WORDS; PROJ [Word-ID]; INDEX-JOIN ON Word-ID with RELATION DOCUMENT; PROJ [UID, AID]; SORT. The two branches are combined by MERGE-JOIN ON UID, followed by UNIQ.]

The biggest bottleneck in the context of fulltext search in multimedia databases does not occur due to complex search queries, which appear rarely in multimedia systems (in contrast to digital library systems), but mainly during the input of new text nodes. There are two different techniques for this, the n-gram technique and the whole word technique, which we use. Both techniques are mainly used in the context of electronic libraries [1]. Both procedures initially need two relations. The first one, called word, contains all the words (resp. all the n-grams) of the text that occur and a word identifier (number) for them. The second one, called document, contains all the occurrences of each word in the form of the corresponding UIDs and AIDs of the components. The document table is larger than the table word by a factor of 15 to 400, which simply shows that most of the words are used several times.

At the input of a new text node, every word of the new text needs to be looked up in the relation word of the existing database. If it has already been registered, its word identifier must be passed back and a new entry in the document relation needs to be made. If not, it must be registered in the relation word. The n-fold search (where n = number of words in the text that is to be introduced) means a clear deceleration. This problem is easier than the arbitrary retrieval problem.

What parallelizations and data fragmentations are of interest in our input scenario? First of all, for the first part of the input procedure for new text nodes it would be sufficient to use a parallel selection operation on the table word, with a fragmentation on the attribute word. For the second part of the input procedure the selection on the table document can be parallelized by a partition of that table by the attribute word-id. It is worth noticing that this fragmentation would also support the parallelization of complex fulltext retrieval queries.
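The input procedure and the two fragmentations can be sketched as follows. This is not the MultiMAP or MIDAS implementation: the class `FulltextIndex`, the CRC32-based assignment of words to fragments and the rule for generating word identifiers are assumptions of this sketch, chosen so that a word and all its document entries end up in the same fragment. Python threads only illustrate the structure here and do not give a real speedup.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


class FulltextIndex:
    """Relations word and document, each split into f fragments."""

    def __init__(self, fragments=4):
        self.f = fragments
        # Fragment k of relation word: word -> word_id.
        self.word = [dict() for _ in range(fragments)]
        # Fragment k of relation document: (word_id, uid, aid) entries.
        self.document = [list() for _ in range(fragments)]

    def _fragment_of(self, word):
        # Fragmentation on the attribute word (stable hash, an assumption).
        return zlib.crc32(word.encode("utf-8")) % self.f

    def _insert_into_fragment(self, k, words, uid, aid):
        table = self.word[k]
        for w in words:
            word_id = table.get(w)
            if word_id is None:
                # New word: register it. The id is built so that
                # word_id % f == k, which partitions document by word-id.
                word_id = len(table) * self.f + k
                table[w] = word_id
            self.document[k].append((word_id, uid, aid))

    def add_text_node(self, uid, aid, text):
        """Look up or register every word of a new text node, one task per fragment."""
        buckets = [[] for _ in range(self.f)]
        for w in text.lower().split():
            buckets[self._fragment_of(w)].append(w)
        with ThreadPoolExecutor(max_workers=self.f) as pool:
            futures = [pool.submit(self._insert_into_fragment, k, bucket, uid, aid)
                       for k, bucket in enumerate(buckets)]
            for future in futures:
                future.result()  # re-raise any error from a worker

    def lookup(self, word):
        """Return the sorted (uid, aid) pairs of all components containing word."""
        k = self._fragment_of(word)
        word_id = self.word[k].get(word)
        if word_id is None:
            return []
        return sorted({(uid, aid) for wid, uid, aid in self.document[k]
                       if wid == word_id})


idx = FulltextIndex(fragments=4)
idx.add_text_node(1, 0, "parallel multimedia database")
idx.add_text_node(2, 0, "parallel join")
print(idx.lookup("parallel"))
```

Because each task touches only its own fragment of both relations, the tasks need no locking among themselves. A retrieval query for a single word likewise touches exactly one fragment.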

4.3. Parallelization of compression and decompression
One of the most time-consuming operations while dealing with pictures, videos or audios in multimedia databases is compression and decompression. There are already some algorithms published on parallel JPEG and MPEG compression [10, 6]. Of course, using them will speed up response time. Depending on the system architecture of multimedia databases, compression and decompression can be integrated in the multimedia database server or, more often, are components of the external previewer. The latter has the benefit that less data is transported via the net. But then the client machine has to be a parallel machine as well, which is too expensive today, since we are using small and cheap machines as clients (previewer) in a local distributed environment. Since the aim of this study is finding possibilities for parallelizing the hypermedia database server, parallel compression and decompression algorithms can only be used on the server side, if the net offers enough bandwidth. They are especially important for the input component, since the creation of compressed movies is time consuming and thus is currently not implemented as a long transaction, but as a preprocessor. Only the storing of the finished compressed data is under transaction protection.

4.4. Parallelization of the I/O-accesses
A frequent bottleneck in multimedia database systems is the simultaneous disk access for retrieving huge amounts of data, e.g. images, audio, video sequences and software, managed by the database system. As a solution, a parallelization through several disks (disk farms) has been proposed, using the RAID architecture (RAID = Redundant Arrays of Inexpensive Discs), including an optimized caching [11]. The RAID architecture distinguishes the following levels: Level 0 is data striping, by cyclically distributing consecutive blocks b1 ... bn to the disks d1 ... dk: bi --> dj, where j = i mod k. Level 1 defines mirrored disks: two or three disks at a time are written identically. Levels 2 to 5 introduce additional blocks for parity bits as error code. The error code for levels 2 to 4 is stored on a separate disk (data striping plus parity disk), while level 5 stores the parity sections interleaved on the disks as well. Mirrored disks lead to an increased storage cost, and in the hypermedia applications there are only few access conflicts on the same page (exception: access pages, but those are kept in main memory). Thus level 1 does not have any major advantages over levels 2 to 5. We expect RAID level 5 to offer the most favourable disk parallelization for our hypermedia applications. Compared to that, loading short tuples as used in link navigation does not carry weight at the I/O interfaces.

5. Conclusion
We have examined the different parallelization possibilities for hypermedia database systems. First, we introduced our multimedia database system MultiMAP and our parallel database system MIDAS. Both are based on relational database technology. Then we examined the different possibilities of parallelization for the MultiMAP system, from inter-parallelism over intra-parallelism to I/O-parallelism. The surprising result is that parallelizing the link navigation is not advantageous, but parallelizing the second entry, the fulltext search, is very promising. There, a high performance is achieved by clever horizontal and vertical partitioning or by redundancy.

References
[1] R. Bayer, P. Vogel, S. Wiesener: OMNIS/Myriad Document Retrieval and its Database Requirements. Proc. of Int. Conference on Database and Expert Systems Applications (DEXA ‘94), Athen, Springer, Berlin 1994
[2] A. Beguelin, J. Dongarra, A. Geist, R. Manchek, V. Sunderam: A Users’ Guide to PVM Parallel Virtual Machine. ORNL/TM-12187, Oak Ridge National Laboratory, 1994
[3] G. Bozas, M. Jaedicke, A. Listl, B. Mitschang, A. Reiser: Using PVM to implement a Parallel Database System. Proc. of the 1st European PVM Users Group Meeting, October 1994
[4] G. Bozas, M. Jaedicke, A. Listl, B. Mitschang, A. Reiser, S. Zimmermann: On transforming a Sequential SQL-DBMS into a Parallel One: First Results and Experiences of the MIDAS Project. Proc. of 2nd Int. Euro-Par Conference, Lyon, Parallel Processing, LNCS 1123, Springer, Berlin 1996
[5] A. Clausnitzer, R. Lehn, C. Nippl, M. Pawlowski, S. Zimmermann: Accelerating Document Management Systems by Parallel Database Technology. GI-Jahrestagung 1996, Klagenfurt, Austria, September 1996
[6] S. Ghandeharizadeh, L. Ramos: Continuous Retrieval of Multimedia Data Using Parallelism. 1992
[7] F. Halasz, M. Schwartz: The Dexter Hypertext Reference Model. Communications of the ACM, Vol. 37, No. 2, Feb. 1994, pp. 30-39 (first published in 1990)
[8] S. Khoshafian, A. B. Baker: MultiMedia and Image Databases. Morgan Kaufmann Publishers, 1996
[9] G. Specht, M. Hofmann: Migration Evaluation of a Multimedia Information System from a Relational to an Object-Oriented Database System. Technical Report, TU Munich, 1996 (in German)
[10] P. Tiwari, E. Viscito: A Parallel MPEG-2 Video Encoder with Look-Ahead Rate Control. Research Report RC 20243, IBM Research Division, Yorktown Heights, NY 1995
[11] A. Thomasian, J. Menon: Performance Analysis of RAID 5 Disk Arrays with a Vacationing Server Model for Rebuild Mode Operation. Proc. 5th Int. Conference on Data Engineering, Houston, TX, IEEE Computer Press, Los Alamitos, CA, 1994