
A Concept of Distributed Storage and Parallel Processing for Large-Size Images

N.L. Kazanskiy¹, S.B. Popov¹, V.A. Soifer²

¹ Image Processing Systems Institute of the RAS; Samara State Aerospace University named after acad. S.P. Korolyov (SSAU), Samara, Russian Federation
² Samara State Aerospace University named after acad. S.P. Korolyov (SSAU), Russian Federation

Abstract: An approach to image data management in distributed storage and parallel processing systems, based on the concept of a distributed image, is suggested. Within the proposed concept, a decentralized method for dynamic balancing and management of parallel computing, and visualization solutions for large-size distributed images are suggested.

Keywords: image processing, parallel processing, distributed image, image data partitioning, dynamic load balancing

1. Introduction

Currently, there is a growing interest in parallel and distributed image processing systems. Primarily this is due to the vital need for treating large-size images. There is a stable tendency toward an increase in the size of generated images. Large-size satellite images of the Earth's surface are becoming more available; several satellites provide information in real time in the public domain [11][17][19]. Modern electron microscopes produce gigabytes of high-resolution data. The increasing size of images causes problems in their processing, storage and transfer [7]. This is primarily due to the fact that the steady growth of personal computer performance has established an interactive way of using the most popular image processing tools: processing parameters, or the completion time of iterative procedures, are selected by the user visually. At the same time, the amounts of data processed are comparable with, if not exceeding, the main storage capacity of up-to-date high-performance workstations, whereas popular universal software systems for image processing recommend that double the amount of data of the largest image under processing take no more than 75% of the available main memory capacity. When this condition is violated, the run time required even for the simplest operations on a large-size image becomes unacceptably long for interactive processing. In real research, a representative set of large-size images can occupy a significant part of the permanent storage capacity, which can significantly reduce

the efficiency of the entire computer. If the work is performed by a team of researchers, duplication of all or some of the images is inevitable [5]. Multistage processing creates a number of auxiliary images whose volume can exceed the original several times over, exacerbating the problem of free disk space. Centralized deployment of large-size images on specialized data warehouses imposes extra requirements on network bandwidth [16][20]. Note that a single request for image visualization usually generates significant traffic, because all of the data is requested and transmitted even though an essentially smaller part of it may actually be displayed. Image processing on the user's computer increases the traffic at least twofold. Image processing directly on the storage server requires a substantial increase in the computing power of the server and, consequently, in the cost of storing data on it. Image processing using distributed or parallel systems addresses many of the aforesaid problems. However, most researchers solve the problems of designing parallel image processing systems in isolation from the issue of how and where the image data is stored. In the meantime, the effectiveness of a distributed system for processing large-size images is largely determined by the data access methods in the image processing software and the visualization convenience of the processing results. In this paper, by analyzing the benefits and bottlenecks of popular parallel image processing systems, a new approach to organizing data in distributed systems for image processing and storage is suggested. This approach is based on the concept of the distributed image. Within this framework, a decentralized method of dynamic balancing and management of the parallel computing process, and visualization solutions for large-size distributed images are discussed.
Preprint submitted to Elsevier. September 3, 2010

2. Distributed image concept: implementation challenges

The efficiency of an image processing system is best characterized by the user request response time. The response time is composed of the time for retrieving data from the storage device, the image processing time (if required), and the time for transmitting the resulting image data to the user's computer for subsequent visualization or processing. Bottlenecks of parallel image processing systems are the preliminary stage of transmitting the original image data to the distributed system computers and the final stage of merging the processed fragments into a whole image [4]. Attempts to parallelize or significantly reduce these necessary but unproductive stages of distributed image processing led to the idea of the distributed image [21]. In this approach, the data of the images under processing is stored directly on the computers performing the parallel processing. In the parallel program, each task processes the image fragment located on the computer that runs the task; the result of processing is stored there as part of the new distributed image produced jointly by all the tasks involved in the parallel processing. Even though this idea seems apparent, it has not become widespread because researchers have failed to offer satisfactory solutions to the following challenges:
- How to perform an optimal (or close to optimal) image data partition for a priori unknown subsequent processing tasks?
- How to solve the problem of load balancing of the processing computers when the data partitioning has been pre-performed?
- How to ensure an adequate level of fault tolerance for the distributed storage of image fragments?
- How to ensure the virtual integrity of the distributed image?
- How to ensure an acceptable level of interactivity in the visualization of distributed images?
On the one hand, when creating the distributed image, the data partitioning is performed in advance.
On the other hand, neither the subsequent image processing tasks nor the optimal subimage size ensuring load balancing of the processing computers are known a priori. Besides, when operating in shared mode it becomes impossible to predict the actual load of a particular computer, which requires either dynamic balancing under the conditions of a pre-given data decomposition or data redistribution between the computers during processing, which adversely affects the overall processing efficiency. If the dynamic balancing is characterized by frequent and considerable load variations while processing, the data redistribution may come

to naught the gain due to the preliminary data allocation between the processing computers. The availability of an algorithm for load balancing of cluster computers under a centralized data distribution manager, proposed in [20], was reported to outweigh the prospective benefits of parallel data input/output in the case of preliminary image distribution between the computers, because that variant of data distribution makes the above load balancing algorithm inoperative. Below, we discuss the selection of a decomposition technique well suited to the majority of parallel image processing operations.

3. Image processing description formalism

In most applications of image processing, including computer vision, we can single out the following three types of operations [3].
1. Low-level operations. While performing transformations on all image pixels, these operations form either a modified image, a vector structure, or even a single value. The calculations are normally local in nature, being defined as the set of operations required to form an image pixel, a vector component, or a resulting value, respectively. Examples include filtration, contrast and edge enhancement, geometric image transforms, histogram construction, derivation of various statistical parameters, and the like.
2. Intermediate-level operations. Here belong diverse image segmentation operations resulting in structural descriptions of images, for example, lists of descriptions of object boundaries in the image.
3. High-level operations. These operations include semantic processing of the diverse structural descriptions derived at the intermediate level, producing some sort of solution. By way of illustration, mention may be made of pattern recognition operations, semantic scene interpretation, and the like.
Note that it is the implementation of a sequence of low-level operations on the set of original images that serves as a starting point in many typical applications in image processing and computer vision systems.
In these applications, the major costs are associated with the low-level operations, which require considerable computational and storage resources. The response time of the entire application software is in most cases determined by the time it takes to perform the low-level operations. In general form, the procedure of image processing is presumed to involve either an originally prescribed sequence of solution steps or the possibility of decomposing the solution technique into sub-techniques realizable as low-level operations. In each step, solution elements are processed and/or produced. Formally, this process can be represented as follows.

In the I-th step, the transform F^I is used to build the necessary solution elements, generally represented as a set of parameters, {r}^I, and a set of images, {a}^I:

({a}^I, {r}^I) = F^I({a}^{I-1}, {r}^{I-1}, q^I),

where {a}^{I-1} = ∪_{i=0}^{I-1} {a}^i and {r}^{I-1} = ∪_{i=0}^{I-1} {r}^i.
Here {a}^{I-1}, {r}^{I-1}, and q^I are the sets of input images, input parameters, and characteristic parameters of the transform F^I, respectively. Any of these sets can be empty at any particular step; {a}^0 and {r}^0 are the original sets of images and parameters of the problem under analysis. The solution steps containing the corresponding elements of the solution are conveniently represented as a solution graph of the problem. The set of solution steps (operations F^I of the algorithm) is put in one-to-one correspondence with a set of nodes. If the solution element of a particular operation is utilized in another operation, an edge is drawn linking the corresponding nodes, outgoing from the original node. The number of edges outgoing from each node equals the number of images and parameters formed at this step. When a solution element serves as an argument for several operations, there may be even more outgoing edges. When discussing the organization of the computational process in image processing systems there is no need to describe in detail the specific steps of the processing technique, as developing specific solution steps is the concern of the application designer. What we are concerned with is the information type of the image processing program, which defines the way in which the program handles the image. Most researchers single out the following types of low-level operations on the image [13][15][18][19]:
- Point operations (PO).
- Local neighborhood operations (LNO), or local operations in a sliding window.
- Global operations (GO), usually based on the 2D Fourier, or a similar, transform.
- Global reduce operations, or operations for computing image parameters, including statistical ones.
- Geometrical transforms.
In a more general classification, operations are divided into global and local. In what follows, we use the notation of Ritter's image algebra [14].
Global image processing methods are characterized by the fact that all pixel values a(x) ∈ F of the original image a ∈ F^X are employed to compute each pixel value c(x) ∈ F of the resulting image c ∈ F^X:

c(x) = Γ_x(a),  x ∈ X,

where Γ_x is the transformation operator at point x; X is the spatial domain of the processed images; and a, c ∈ F^X are F-valued images on X. Global reduce operations are aimed at obtaining an estimate (or description) of the image under processing. The estimate may contain integral characteristics of the original image, geometric characteristics of various objects found in the image, or the assignment of the original image or its fragments to a definite class. The reduce transform determines a necessary parameter r over an image a ∈ F^X:

r = Γ a = Γ_{x∈X} a(x).
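As a concrete illustration of global reduce operations (a sketch of our own using NumPy, which the paper does not prescribe), the reduce transform collapses all pixels of an image into a single value or a compact description:

```python
import numpy as np

# A tiny 4x4 "image" a in F^X, with F the integers.
a = np.arange(16).reshape(4, 4)

# Global reduce r = Γ_{x∈X} a(x): here Γ is summation,
# so the whole image collapses to one scalar.
r = a.sum()

# Other reduces from the same family: maximum and a histogram,
# i.e. integral characteristics of the original image.
r_max = a.max()
hist, _ = np.histogram(a, bins=4, range=(0, 16))
print(r, r_max, hist.tolist())
```

Whatever the associative operation Γ is, the result is insensitive to the order in which pixels are visited, which is exactly the property exploited later when reduces are parallelized over fragments.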

With local image processing methods, each output pixel is formed by analyzing the corresponding neighbourhood in the input image. Local processing comprises point operations and sliding-window processing. Point processing involves both unary and binary operations. With a unary operation, each pixel a(x) of the original image a ∈ F^X is transformed into the corresponding pixel c(x) of the output image c ∈ F^X:

c(x) = γ(a(x)),  x ∈ X.

With a binary operation, each pixel of the output image c ∈ F^X results from an operation on the corresponding pixels of two original images a ∈ F^X and b ∈ F^X:

c(x) = γ(a(x), b(x)),  x ∈ X.

Operations of local sliding-window processing form an image c ∈ F^X each pixel of which (a value c(x) ∈ F at a point x ∈ X) is the result of a transformation performed over the corresponding neighbourhood Q(x) of the original image a ∈ F^X:

c(x) = Γ_{y∈Q(x)} a(y),  x ∈ X,

where Q(x) = {z ∈ Z² : z = (x₁ + y₁, x₂ + y₂), y ∈ Y} is the arbitrary neighbourhood of a point x ∈ X formed by a template set Y. In this case, the most popular processing mode is taking a weighted sum of neighbouring pixels, with the processing window being a square centred at the current pixel:

c = a ⊕ q = {(x, c(x)) : c(x) = Σ_{y∈Y} a(x + y) · q(y)},

where the point set Y has the same dimension as the spatial domain X of the processed image and is defined as Y = {(y₁, y₂, ..., y_n) : |y_i| ≤ k_i, y_i ∈ Z}; that is, the processing window appears as a rectangular neighbourhood symmetrically located relative to the current pixel, the window size for each coordinate being 2k_i + 1. For plane images, when k₁ = k₂ = 1, the

local processing is performed by a 3 × 3 sliding window; when k₁ = k₂ = 2, by a 5 × 5 window; and so on. There is a special class of local neighbourhood operators called recursive neighbourhood operators (RNO). In an RNO, each pixel c(x) of the resulting image depends not only on a neighbourhood Q(x) of the original image a ∈ F^X, but also on pixels of a neighbourhood of the resulting image c ∈ F^X. Such operations are usually implemented using the following iterative procedure:

a^(p) = a^(p-1) ⊕ q,  p = 1, 2, ..., P,

in which P processing operations are performed in sequence on the original image a^(0) ∈ F^X. Geometric transformations can hardly be classified as either global or local operations. A geometric transformation is given by some coordinate transformation function y = G(x), which assigns to a point x ∈ X (with coordinates (x₁, ..., x_n)) of the original image a ∈ F^X the point y ∈ Y of the resulting image c ∈ F^Y with coordinates (y₁, ..., y_n). In the general case, every pixel c(y) of the resulting image c ∈ F^Y is formed as a result of some aggregation Ψ of the values of those original image points that are spatially close to the coordinate x = G⁻¹(y):

c(y) = Ψ({a(⌊x⌋ + z) : z ∈ {0, 1}²,  x = G⁻¹(y)}),

where ⌊x⌋ denotes the floor function, which returns the largest integer that is less than or equal to x. This operation cannot be called global, because the calculation of a resulting image pixel needs no more than four pixels of the original image. Nor can it be called local, since these four points in the general case do not lie in a local neighbourhood of the resulting pixel. Thus, although when looking at the image as data its two-dimensionality comes to the foreground, the majority of image processing algorithms and methods are essentially sequential. On the one hand, this shows itself in the intrinsic computation structure of the algorithms: a very popular technique is progressive scanning of image pixels with local transformations (point operations or sliding-window processing) [3, 4].
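The weighted-sum window operation c = a ⊕ q above can be sketched as follows (our own NumPy illustration; the zero padding at the image borders is an assumption we make, since the paper does not fix a border policy):

```python
import numpy as np

def window_sum(a, q):
    """c(x) = sum over y in Y of a(x+y) * q(y), for a (2k1+1) x (2k2+1) kernel q."""
    k1, k2 = q.shape[0] // 2, q.shape[1] // 2
    # Zero-pad so every pixel has a full neighbourhood Q(x).
    ap = np.pad(a, ((k1, k1), (k2, k2)), mode="constant")
    c = np.zeros(a.shape, dtype=float)
    # Accumulate one shifted, weighted copy of the image per template point y.
    for y1 in range(-k1, k1 + 1):
        for y2 in range(-k2, k2 + 1):
            c += q[y1 + k1, y2 + k2] * ap[k1 + y1 : k1 + y1 + a.shape[0],
                                          k2 + y2 : k2 + y2 + a.shape[1]]
    return c

a = np.ones((5, 5))
q = np.full((3, 3), 1.0 / 9.0)   # 3x3 averaging window, i.e. k1 = k2 = 1
c = window_sum(a, q)
print(c[2, 2])                   # interior pixel sees the full window -> 1.0
```

Interior pixels reproduce the weighted sum exactly, while border pixels are attenuated by the assumed zero padding; in the distributed setting of the later sections, the padding role is played by the overlapping (shadow) data of neighbouring fragments.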
On the other hand, a considerable proportion of complex processing methods can be implemented through the sequential application of a complete set of typical operations on the image. It is this circumstance that allows diverse image processing software systems developed for universal computers to be efficiently applied to solving a wide range of research and applied problems of video-information processing and analysis. Fortunately, the structure of the majority of low-level algorithms makes possible a natural parallelization based on their intrinsic regularity.

4. Parallelization of image processing

Below, we consider the major parallelization variants of low-level image processing operations. The most natural way to perform parallel processing is through decomposition of data based on the output image [15]: the resulting image pixels are broken down into non-overlapping fragments, with each fragment being formed in parallel on a separate unit of the multiprocessor cluster or distributed-system computer; afterwards, the fragments may be fused into an integral image. The image can be decomposed in the following ways: one-dimensional decomposition in terms of a single coordinate, or two-dimensional decomposition [21]. The choice of the original image decomposition depends on the type of processing operation [16].
Point processing. In parallel point-processing algorithms, the breakdown of the original image(s) fully corresponds to that of the output image. The efficiency of point processing, both for a single operation and for a sequence of operations, is independent of the way in which the image is decomposed into fragments.
Local processing over a sliding window. These algorithms can easily be made parallel provided that the original image is broken down into overlapping fragments. The size of the overlap depends on that of the sliding window. The decomposition may be either one- or two-dimensional. Because this type of processing results in a distributed image containing non-overlapping fragments, for such operations (in particular RNO) to be executed sequentially, the overlapping fragments should be interchanged prior to proceeding to the subsequent stage. The waiting time during which the data of the distributed image are matched may differ between decomposition variants, depending on the number of fragments and on the communication environment parameters. For example, for a small number of fragments and a high latency value, the one-dimensional decomposition is preferable.
However, when the number of fragments increases, the volume of synchronization data transmitted is significantly smaller for the two-dimensional decomposition.
Global reduce operations. Being additive for the most part, the reduce operators are practically insensitive to the way in which the image under processing is broken down into fragments. However, for determining some parameters it may be useful to have overlapping regions of image fragments.
Global processing operations. With these operations, parallelization is most easily implemented through decomposition of the output image and replication of the entire original image on all computers involved in the processing. If it turns out impossible to replicate the image in full for whatever reason, there will be an intensive data exchange between the processing tasks executed in parallel. In this case, the optimal choice of data distribution often depends on the operation algorithm employed, and it appears very difficult to make the choice in advance.
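The trade-off between one- and two-dimensional decomposition can be quantified by the amount of overlap (halo) data that fragments must exchange. The following rough model is our own illustration, not taken from the paper: an N × N image, M units, a window half-size k, and for the 2D case a square grid of fragments.

```python
import math

def halo_pixels_1d(n, m, k):
    """Row-strip decomposition: M strips create M-1 internal cuts,
    and each cut exchanges two full-width borders of height k."""
    return 2 * (m - 1) * k * n

def halo_pixels_2d(n, m, k):
    """Square-block decomposition on a sqrt(M) x sqrt(M) grid: the internal
    cut length is (g-1)*n in each of the two directions, each cut again
    exchanging two borders of width k."""
    g = math.isqrt(m)
    assert g * g == m, "this sketch assumes M is a perfect square"
    return 2 * 2 * (g - 1) * k * n

n, k = 4096, 1
for m in (4, 16, 64):
    print(m, halo_pixels_1d(n, m, k), halo_pixels_2d(n, m, k))
```

The model reproduces the qualitative statement in the text: as M grows, the 1D halo volume grows linearly in M while the 2D volume grows only with sqrt(M), so the two-dimensional decomposition transmits significantly less synchronization data for many fragments, whereas the 1D variant needs fewer messages (two neighbours per strip) and thus wins under high latency.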

Geometric transforms. These include scaling and rotation operations, as well as non-linear, general-form coordinate transformations. The optimal way to minimize the effort related to data transmission while performing a general-form geometric transform on the distributed image is decomposition into square blocks. However, with a small-angle image rotation, the one-dimensional decomposition is more effective. In both cases, it is useful to break the original image down into overlapping fragments.
For low-level operations implemented in parallel through data decomposition, the most popular way of organizing computations has been the use of a management program combined with centralized storage of the images under processing. Using the dedicated management program, the original images are decomposed, with the fragments distributed between processing tasks and subsequently assembled into the resultant image. According to the specific set of operations required by the processing problem, the management program either assembles a complete image from interim images and then distributes it according to a new decomposition scheme, or performs an interchange of the overlapping fragments held by the processors if the decomposition is to remain unchanged at the subsequent processing step. The efficiency of the above-described approach to parallel or distributed image processing is significantly degraded by bottlenecks, which include (1) the link to the image storage subsystem in the situation when the tasks involved in fragment processing send requests for data almost simultaneously, and (2) the management program itself, which performs the image decomposition, distributes the fragments between the processing tasks and assembles the resultant image [17][20].
It is vitally important to obviate, or at least essentially reduce, an important but time-wasting stage of distributed image processing that consists in dispatching the original image data to the distributed system computers, followed by the subsequent fusion of the processed fragments into a resultant image. This can be achieved by pre-distributing the image data between the processing computers, with the datasets being stored where they are processed. Each fragment found on a separate network computer is processed individually, with the processing result stored on the same computer as part of a new distributed image obtained as a result of the operation of all computers involved in the distributed processing. Thus, in the distributed systems described above, the distributed image is the major storage element.

5. Distributed image structure

The distributed image represents a data structure that prescribes the technique and parameters of image decomposition into fragments, the list of computers where the fragments are stored, and their location and storage format.

Figure 1: Formation of distributed image data structure

In essence, it describes the way in which the resultant image can be built from the fragments contained therein. Summing up the above analysis of image decomposition variants for different processing operations, decomposition of the distributed image into overlapping fragments seems preferable. At the core of the proposed approach is the idea that the proper size of the fragment overlap is dictated not by the parameters of the a priori unknown subsequent processing problem, but by the necessity of ensuring a balanced loading of the computers and an adequate level of fault tolerance of the distributed storage of image fragments. The distributed image is defined as a set of overlapping image fragments. The fragment stored on each of the M computers is formed as follows. All image lines are divided into 2M same-size blocks. The distributed image fragment on the m-th computer contains two major blocks, numbered 2m-1 and 2m. These major blocks define a variant of decomposition into non-overlapping fragments. To obtain the decomposition into overlapping fragments, which simultaneously provides a minimal fault probability, two additional so-called shadow blocks are stored on the given computer. These are copies (shadows) of the adjacent major blocks on the neighboring computers, numbered 2m-2 and 2m+1. Note that the lower major block is stored as a shadow on the computer with the smaller number, and the higher one on the computer with the larger number; see Fig. 1. It is the data of the major blocks which is formed on the computer in the course of distributed image processing. Once processing has been completed, the shadow data is interchanged. Using the above distributed image data structure, it becomes possible to retrieve the image following a storage unit fault; additionally, in most operations the major block pixels can be formed without data interchange between neighboring computers in the course of distributed image processing.
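The block layout just described can be sketched as follows (a hypothetical helper of our own, not the authors' code; computers are numbered m = 1..M, blocks 1..2M, and the edge computers simply lack one shadow block):

```python
def fragment_blocks(m, M):
    """Blocks held by computer m (1 <= m <= M) of a distributed image
    whose lines are divided into 2M equal blocks."""
    major = [2 * m - 1, 2 * m]                   # produced locally
    shadow = [b for b in (2 * m - 2, 2 * m + 1)  # copies of the neighbours'
              if 1 <= b <= 2 * M]                # adjacent major blocks
    return major, shadow

# Four computers, eight blocks: computer 2 keeps majors 3 and 4,
# plus shadows of block 2 (from computer 1) and block 5 (from computer 3).
print(fragment_blocks(2, 4))
```

With this layout every interior block exists on exactly two computers (once as a major block, once as a shadow), which is what enables recovery after a single storage unit fault.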
Thus, the distributed image represents an array of overlapping image fragments, stored in a distributed manner.

On each storage unit, the fragment contains major and shadow data. While the two are peer data in terms of reading, in the course of distributed processing a particular processor produces only the major block pixels, with the shadow data acquired at the end of processing from the computers responsible for the corresponding major blocks.

6. Algorithm of dynamic computational load distribution

The distributed image decomposition principle we propose makes possible an original algorithm for dynamic computational load distribution when performing point processing (PO) or sliding-window local processing (LNO and RNO). Below, we present the idea of the method by looking into the interaction of two neighboring processors, the m-th and the (m+1)-th, when forming the resultant image fragment contained in the data blocks numbered 2m and 2m+1. Note that, having all the information needed, each processing unit is able to do all this work independently (or almost all of it, in the case of sliding-window local processing). The m-th unit proceeds to form the lines of the 2m-th block starting from the first line, in increasing order, whereas the (m+1)-th unit starts to form the lines of the (2m+1)-th block from the last block line, in decreasing order. After a certain number of lines have been formed, the neighboring units inform each other how long it has taken to do the work. Based on this information, the predicted image line number is computed at which the co-processing will be finished. Thus, while processing, the neighbouring units advance toward each other, reporting their respective computation rates on reaching preset stages. The predicted image line at which the processes will meet is continuously updated depending on the current load of the processors. Thus, all the processes stop practically simultaneously, with the time delta being no larger than the time it takes to process a single line.
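The meeting-line prognosis can be sketched as follows (our own illustration of the scheme, not the authors' implementation: each unit reports how many lines it has finished and its per-line time, and the remaining lines are split in proportion to the observed speeds):

```python
def predicted_meeting_line(total_lines, done_up, t_up, done_down, t_down):
    """Predict the line where two units advancing toward each other meet.

    done_up / t_up:     lines finished and per-line time of the unit working
                        upward from line 1;
    done_down / t_down: the same for the unit working downward from the
                        last line (total_lines).
    Returns the predicted meeting line, counted from the bottom of the range.
    """
    remaining = total_lines - done_up - done_down
    # Rates are inversely proportional to per-line times; split the
    # remaining lines in proportion to the two speeds.
    share_up = remaining * (1.0 / t_up) / (1.0 / t_up + 1.0 / t_down)
    return done_up + share_up

# 1000 lines, both units have finished 100 lines, but the upward unit is
# twice as fast (1 ms vs 2 ms per line), so it should claim two thirds
# of the remaining 800 lines.
print(predicted_meeting_line(1000, 100, 1.0, 100, 2.0))
```

Re-evaluating this prediction each time a progress message arrives gives exactly the continuously updated meeting line described above: as the load of either processor changes, its reported per-line time changes and the prediction shifts accordingly.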
Each unit is simultaneously involved in two such processes, alternately forming the higher block lines in increasing order and the lower block lines in decreasing order. Thus, with the resultant image formed in full, the data of the new distributed image will be distributed non-uniformly between the computers. The storage units of the newly formed image will then interchange their data in background mode so that the distributed image structure is adjusted to the required form. Note, however, that the user can get the requested data straight away once the processing is completed. From what time, and how often, does a processing unit need to inform its neighbor of the amount of processing currently done? The simplest variant may be as follows: once the first line is processed, a data message is sent containing the processing time. Based on the information acquired from

Figure 2: Distributed processing of image lines on four processing units

the neighboring unit, it becomes possible to predict the number of the line at which the processes will meet. The greater the number of lines processed prior to sending the message, the higher the prognosis accuracy. Note, however, that since the performance of the neighbouring units may differ significantly, this number cannot be very large. Let us consider an idealized non-balanced computation system, one unit of which has an α > 1 times higher performance than the others. With M same-performance processing units, the time it takes to process an image containing N lines is

T₀ = τN/M,

where τ is the per-line processing time of a typical unit, each unit processing N/M lines. If one of the units has an α times higher performance, the processing time becomes

T₁ = τN/(M - 1 + α),

with the higher-performing computer processing αN/(M - 1 + α) lines and the others N/(M - 1 + α) lines each. Since each unit stores data for at most 2N/M lines (its two major and two shadow blocks), if the magnitude α exceeds

α_opt = 2(M - 1)/(M - 2),

the processing time will be defined by the lower-performing units and equal to

T₂ = τN(M - 2)/(M(M - 1)),

with N(M - 2)/(M(M - 1)) lines processed per lower-performing unit.
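These timings can be checked numerically. The sketch below (ours, with τ the per-line time of a typical unit) switches between the T₁ and T₂ regimes at α_opt:

```python
def alpha_opt(M):
    """Performance ratio above which the fast unit saturates its 2N/M local lines."""
    return 2 * (M - 1) / (M - 2)

def processing_time(N, M, alpha, tau=1.0):
    """Idealized time to process N lines on M units, one of them alpha times
    faster, when no unit may process more than the 2N/M lines it stores."""
    if alpha <= alpha_opt(M):
        return tau * N / (M - 1 + alpha)       # T1: the fast unit is not saturated
    return tau * N * (M - 2) / (M * (M - 1))   # T2: the slow units dominate

M, N = 4, 1200
print(alpha_opt(M))                # 3.0 for four units
print(processing_time(N, M, 1.0))  # T0 = N/M lines' worth of time
print(processing_time(N, M, 3.0))  # the fast unit takes exactly its 2N/M = N/2 lines
```

At α = α_opt the two branches coincide (for M = 4, N = 1200: T₁ = T₂ = 200τ), and raising α further brings no additional gain, since the fast unit has no more local data to process.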

The proposed algorithm is able to uniformly distribute the load between the computers even when one unit has a three-times higher performance compared to the others. Half of the image lines will be processed by the higher-performing unit, and the remaining units will each be responsible for 1/6. The line distribution between the processing units for this case is depicted in Fig. 2. With an increasing number of processing/storage units, the performance difference that can be compensated for without time loss gradually decreases to two-fold. The reduction of the image processing time with increasing performance of one unit for a varied number of computing units in the distributed system is given in Fig. 3. The advantage of the proposed algorithm is that it is fully decentralized, ensuring a uniform computational load distribution provided that the performance ratio of the neighboring units is not larger than two.

Figure 3: Relative processing time reduction as a function of the performance of a single unit

7. Fault-tolerance of distributed images

An essential benefit of distributed storage systems is their fault tolerance, i.e. the ability to ensure data recovery following the fault of a unit of the distributed system. This is fulfilled by means of data redundancy in the form of overlapping image fragments. The redundancy required for data recovery following the fault of a storage unit can be minimized using the 1D decomposition in the form of horizontal strips with half-height overlap. This results in a two-fold increase of the total data volume per image in the distributed storage system. The 2D decomposition with the same-size overlap results in a four-fold increase in the stored data volume, while being unable to offer increased overall fault tolerance. Extra redundancy is associated with the necessity to store not only the image fragment but also its outline, designed to ensure an acceptable visualization interactivity level. The redundancy of the distributed storage system can be essentially reduced through hierarchical image compression techniques that support multiresolution [3]. With the available variants of these techniques, it is possible to implement error-free data compression, which is critical when used in a distributed image processing system.

8. Image visualization

Visualization of large-size images requires an acceptable interactivity level, which means that within 1–2 seconds the user must be able to preview a scaled-down version of the full image in a visualization application window of a typical size of 1200 × 1000 pixels. In doing so, the storage system needs to form an adequate, several-times-thinned image (e.g. Earth-surface images should be thinned up to 25 times along each coordinate). For large-size distributed images, the problem can be addressed using so-called image outlines available at each or some of the units of the distributed storage system. In this case, the visualization application, calling the unit with the fastest access from the user's computer, obtains the data necessary for the image preview. The number of storage units containing the image outlines can be chosen based on the configuration of the local network of users sharing the distributed storage system, or on an acceptable data redundancy level for a given system. When viewing the image in full-size mode, only a chosen image fragment is displayed for the user. In the scrolling mode, data spooling takes place. To achieve a user-friendly interactivity level, it is highly recommended that spooling is performed in the prepaging mode. In this case, it is most convenient to decompose the image into rectangular fragments [5]. Overlapped fragments make scrolling smoother.

9. Conclusion

We have proposed a concept of distributed image data organization that addresses most of the current challenges. It allows the load to be dynamically distributed during distributed image processing with the aid of a decentralized balancing algorithm. An adequate level of fault tolerance of distributed image fragment storage is achieved. For most operations, the decomposition employed allows a new image to be formed without data transfer between the neighboring computers during distributed image processing. The data organization discussed allows the distributed image to be visualized with an adequate interactivity level.

In further articles we plan to consider in more detail the issues of implementing the dynamic distribution of the computational load between the units of a distributed system for large-size image processing, fault-tolerant storage of distributed images, and their visualization.

This work was financially supported by the Russian-American Basic Research and Higher Education (BRHE) program and RFBR grant No. 10-07-00533.
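As a closing illustration, the 1D decomposition into horizontal strips with half-height overlap can be sketched in a few lines of Python (a minimal sketch under our own naming assumptions; the paper itself gives no code). Each unit owns one strip of rows and additionally stores half-strip margins taken from its neighbours, so the rows owned by a failed interior unit remain recoverable from the surviving units:

```python
def strip_bounds(height, n_units):
    # 1D decomposition: unit i owns rows [top, bottom) and stores the
    # extended range [stored_top, stored_bottom), which adds half a
    # strip height of overlap on each side (clipped to the image).
    base = height // n_units
    half = base // 2
    strips = []
    for i in range(n_units):
        top = i * base
        bottom = height if i == n_units - 1 else (i + 1) * base
        stored_top = max(0, top - half)
        stored_bottom = min(height, bottom + half)
        strips.append((top, bottom, stored_top, stored_bottom))
    return strips


def recoverable(strips, failed):
    # The failed unit's owned rows must be fully covered by the
    # ranges stored on the surviving units.
    top, bottom, _, _ = strips[failed]
    covered = set()
    for i, (_, _, st, sb) in enumerate(strips):
        if i != failed:
            covered.update(range(st, sb))
    return all(r in covered for r in range(top, bottom))
```

For six units and a 1200-row image, each interior unit stores 400 rows while owning 200, reproducing the two-fold increase of the stored data volume noted in Section 7, and the loss of any single interior unit leaves its rows recoverable. The outermost half-strips of the edge units have no neighbour on one side in this sketch, so a practical system would replicate them separately, a detail the sketch omits.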

References
[1] W.E. Alexander, D.S. Reeves, and C.S. Gloster. Parallel Image Processing with the Block Data Parallel Architecture. Proceedings of the IEEE, 84(7):947–968, 1996.
[2] M. Aritsugi, H. Fukatsu, and Y. Kanamori. Several partitioning strategies for parallel image convolution in a network of heterogeneous workstations. Parallel Computing, 27(3):269–293, 2001.
[3] D. Ballard and C. Brown. Computer Vision. Prentice Hall, 1982.
[4] W. Caarls, P.P. Jonker, and H. Corporaal. Skeletons and Asynchronous RPC for Embedded Data- and Task-Parallel Image Processing. IEICE Transactions on Information and Systems, E89-D(7):2036–2043, 2006.
[5] A. Clematis, D. D'Agostino, and A. Galizia. A Parallel IMAGE Processing Server for Distributed Applications. In Parallel Computing: Current & Future Issues of High-End Computing, Proceedings of the International Conference ParCo 2005, pages 607–614, Genova, 2006.
[6] P. Czarnul, A. Ciereszko, and M. Fraczak. Towards Efficient Parallel Image Processing on Cluster Grids using GIMP. In Lecture Notes in Computer Science, Vol. 3037, pages 451–458, Berlin / Heidelberg, 2004. Springer.
[7] M.V. Gashnikov, N.I. Glumov, S.B. Popov, V.V. Sergeyev, and E.A. Farberov. Software System for Transmitting Large-Size Images via the Internet. Pattern Recognition and Image Analysis, 11(2):430–432, 2001.
[8] B. Gennart and R.D. Hersch. Computer-Aided Synthesis of Parallel Image Processing Applications. In Proceedings Conf. Parallel and Distributed Methods for Image Processing III, SPIE International Symposium on Optical Science, Engineering and Instrumentation, pages 48–61, Denver, Colorado, 1999.
[9] P.P. Jonker and W. Caarls. Application Driven Design of Embedded Real-Time Image Processors. In Proceedings of ACIVS 2003 (Advanced Concepts for Intelligent Vision Systems), pages 1–8, Ghent, Belgium, 2003.
[10] P.P. Jonker, J.G.E. Olk, and C. Nicolescu. Distributed bucket processing: A paradigm embedded in a framework for the parallel processing of pixel sets. Parallel Computing, 34(12):735–746, 2008.
[11] G. Klimeck, M. McAuley, R. Deen, F. Oyafuso, G. Yagi, E.M. Dejong, and T.A. Cwik. Near Real-Time Parallel Image Processing using Cluster Computers. In Space Mission Challenges for Information Technology, 2003.
[12] A. Merigot and A. Petrosino. Parallel processing for image and video processing: Issues and challenges. Parallel Computing, 34(12):694–699, 2008.
[13] C. Nicolescu and P.P. Jonker. EASY-PIPE - An Easy to use parallel image processing environment based on algorithmic skeletons. In Proceedings 15th International Parallel and Distributed Processing Symposium (IPDPS 2001), pages 1151–1157. IEEE Computer Society, 2001.
[14] G.X. Ritter and J.N. Wilson. Handbook of Computer Vision Algorithms in Image Algebra. CRC Press, Boca Raton, 1996.
[15] F.J. Seinstra, D. Koelma, and J.M. Geusebroek. A software architecture for user transparent parallel image processing. Parallel Computing, 28(7-8):967–993, 2002.
[16] J. Serot and D. Ginhac. Skeletons for parallel image processing: an overview of the SKIPPER project. Parallel Computing, 28(12):1685–1708, 2002.
[17] Z. Shen, J. Luo, G. Huang, D. Ming, W. Ma, and H. Sheng. Distributed computing model for processing remotely sensed images based on grid computing. Information Sciences, 177(2):504–518, 2007.
[18] V.A. Soifer, editor. Computer Image Processing, Part I: Basic concepts and theory. VDM Verlag Dr. Müller, 2010.
[19] V.A. Soifer, editor. Computer Image Processing, Part II: Methods and algorithms. VDM Verlag Dr. Müller, 2010.
[20] J.M. Squyres, A. Lumsdaine, and R.L. Stevenson. A toolkit for parallel image processing. In Parallel and Distributed Methods for Image Processing II, Proceedings of SPIE 3452, pages 69–80, San Diego, California, USA, 1998.
[21] R.-i. Taniguchi, Y. Makiyama, N. Tsuruta, S. Yonemoto, and D. Arita. Software platform for parallel image processing and computer vision. Proceedings of SPIE, 3166(2):210, 1997.
