(Lecture Notes in Computer Science 525) Thomas Ottmann (Auth.), Oliver Günther, Hans-Jörg Schek (Eds.) - Advances in Spatial Databases: 2nd Symposium, SSD '91, Zurich, Switzerland, August 28–30, 1991
is associated to Q_1, and p_i is associated to Q_i, i <= Lp. For example, in Figure 3 where N = 3, the prefix P = 010110 = (Lp = 6, Vp = 22) is associated to the quadrant number 31. This number is obtained by associating Q_1 to the first bit (from the left), Q_2 to the second one, Q_3 to the third one, and so forth; thus, #(P) = Q_2 + Q_4 + Q_5 + Lp/2 = 31.

The hashing function used is defined as follows: for any prefix P = p_1 p_2 p_3 ... p_Lp = (Lp, Vp),

    H(P) = Lp/2 + sum_{i=1..Lp} p_i * Q_i,

where Q_i = 2 * (4^(N - ceil(i/2) + 1) - 1)/3 if i is odd, and Q_i = (4^(N - ceil(i/2) + 1) - 1)/3 if i is even.

The quadtree can be implemented as a list, which allows sequential access, or as a tree, which permits direct access to a specific node. The hashing function described above makes it possible to implement the FI-Quadtree as a list of prefixes while still allowing direct access. Thus, for any node, the access is performed in just one logical disk access.

3.2 Insertion Algorithm

As the FI-Quadtree is a full structure, it is not necessary to store the prefixes explicitly. They are only used to compute the node addresses on disk, where their corresponding Image_Ids fields are stored. In an Image_Ids field, a single bit is associated to each image identifier. For instance, using one byte to code the Image_Ids field allows us to insert eight images in the meta-image. If we suppose that one field is composed of a bytes, then the meta-image could hold from 1 up to an upper bound of 8a images. It would be penalizing to fix in advance the number of images that the meta-image can store; therefore, we consider the meta-image to be dynamic. When it is empty or contains fewer than 8 images, just one byte is associated to each field. When there are 8 images, the FI-Quadtree is reorganized and two bytes are associated to each field. In the general case, when there are 8a images, the FI-Quadtree is reorganized and (a+1) bytes are associated to each Image_Ids field.
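As a sketch of the direct addressing, the following hypothetical `fi_hash` computes the preorder rank of a node in the full quadtree of height N; it reproduces the Figure 3 example (N = 3, P = 010110, quadrant number 31). The function name and the parity-dependent weights Q_i are assumptions consistent with that worked example, not taken verbatim from the paper.

```python
# Sketch of direct node addressing in an FI-Quadtree (hypothetical names).
# A prefix is a bit string p_1 ... p_Lp of even length; the hash maps it to a
# unique slot in a flat table, so each node costs one logical disk access.

def subtree_size(n: int, depth: int) -> int:
    """Nodes in a full quadtree rooted at the given depth: (4^(n-depth+1) - 1) / 3."""
    return (4 ** (n - depth + 1) - 1) // 3

def fi_hash(prefix: str, n: int) -> int:
    """Preorder rank of the node addressed by `prefix` in a full quadtree of height n."""
    h = len(prefix) // 2                       # the Lp/2 term: one step down per level
    for i, bit in enumerate(prefix, start=1):  # i = 1 .. Lp
        if bit == '1':
            s = subtree_size(n, (i + 1) // 2)  # level k = ceil(i/2)
            h += 2 * s if i % 2 == 1 else s    # Q_i doubles on the high bit of each pair
    return h

# The example from the text: N = 3, P = 010110 -> quadrant number 31.
print(fi_hash("010110", 3))  # -> 31
```

The preorder rank interpretation matches the example exactly: the set bits of 010110 are p_2, p_4, p_5, with Q_2 = 21, Q_4 = 5, Q_5 = 2, so 21 + 5 + 2 + 6/2 = 31.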
To insert an image, we have to code it as a linear quadtree, and for each prefix P of the quadtree, the identifier of the image is inserted in the field s(P). We note that the coding is performed using the notion of active nodes [Shaffer87].

4 Fuzzy Search

In this section, we expose and analyze the fuzzy search capability provided by the structure. First, we give an idea of the complexity of the displacement of the pattern through the FI-Quadtree; then we expose and analyze the principle of the fuzzy search. We consider a pattern composed of M_pref prefixes. These prefixes address a set of quadrants having a total of M_pix pixels. The pattern is contained in a minimal rectangle of L x l pixels. It is coded as a linear quadtree of M_pref prefixes. We have to find the image that contains it.

4.1 Moving in the FI-Quadtree

For a pattern included in an L x l minimal rectangle, the total number of translations through the meta-image is (2^N - L + 1) * (2^N - l + 1). Therefore, the larger this rectangle is, the smaller the number of translations. This operation is very costly [Walsh88, Touir90a]. However, note that the translation can be done simultaneously with the filtering [Touir90b]. Such a process allows the filtering to be stopped in a given position if it is shown that the pattern cannot exist in that position of the FI-Quadtree; this means that no image contains the pattern in such a position. The translation is then interrupted and performed again in another position. To reduce the number of translations, the user can indicate a particular region that could contain the pattern. If this region is R x r pixels in size, the number of translations is then reduced to (R - L + 1) * (r - l + 1).

4.2 "Microscopic" Filtering

In this section, we present the principle of the filtering. First we give the definition of a naive global matching between a pattern and an image; then we introduce a new definition which is more precise.
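The insertion step of Section 3.2 can be sketched as follows; a minimal illustration, assuming a flat table of Image_Ids bitfields addressed by a direct-access hash (passed in as `fi_hash`). The class and method names are illustrative, not from the paper.

```python
# Sketch of FI-Quadtree insertion: one Image_Ids bitfield per node of the full
# quadtree; inserting an image sets its bit in the field of every prefix of
# the image's linear quadtree. Names are hypothetical.

def total_nodes(n: int) -> int:
    """A full quadtree of height n has (4^(n+1) - 1) / 3 nodes."""
    return (4 ** (n + 1) - 1) // 3

class FIQuadtree:
    def __init__(self, n: int):
        self.n = n
        self.fields = [0] * total_nodes(n)   # one Image_Ids bitfield per node

    def insert(self, image_id: int, prefixes, fi_hash):
        """Set the bit of `image_id` in the field s(P) of every prefix P."""
        for p in prefixes:
            self.fields[fi_hash(p, self.n)] |= 1 << image_id

    def images_at(self, prefix, fi_hash):
        """Image identifiers whose bit is set in the field of `prefix`."""
        field = self.fields[fi_hash(prefix, self.n)]
        return {i for i in range(field.bit_length()) if field >> i & 1}
```

Python's arbitrary-precision integers conveniently mimic the paper's dynamic meta-image: a field simply grows past one byte once more than eight images are inserted, whereas the paper's on-disk layout requires an explicit reorganization from a-byte to (a+1)-byte fields.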
An image I contains a pattern M if all the prefixes of M match with some prefixes of I. This definition is not sufficient and not precise. Indeed, let us suppose that we search a pattern represented by (0000001010, 0000011000, 0000100, 0000111), and that there exists an image having a prefix P = 0000; we note that all the prefixes of M match with P, and consequently that I contains M. The detection is thus not precise. To improve the detection and the filtering, we introduce two matching criteria corresponding to two filtering levels. The first is a microscopic level (or prefix level) represented by Definition 1, and the second is a macroscopic level (or pattern level) represented by Definition 2.

Definition 1: Let Q = q_1 q_2 q_3 ... q_Lq = (Lq, Vq) and P = p_1 p_2 p_3 ... p_Lp = (Lp, Vp) be two prefixes of M and I respectively; we say that Q matches with P to within a factor d, match_d(Q, P), if Lq = Lp + d, d >= 0, and q_i = p_i for i = 1, ..., Lp.

We note that: if d = 0, Q and P represent the same quadrant; if d = 2, the quadrant associated to Q is included in the one of P and has a quarter of its size; and so forth. The larger d is, the less precise the matching.

4.3 Filtering Distance

In a second definition, we take two new parameters into consideration. N_pref defines the number of prefixes of an image that match the pattern, and N_pix defines the number of pixels addressed by the set of N_pref prefixes.

Definition 2: We define the filtering ratio (or filtering distance) between M and I by:

    d(M, I) = 1/2 * (N_pref/M_pref + N_pix/M_pix)

Note that if d(M, I) = 1, the search is an exact one. Furthermore, because for any image I, M_pref >= N_pref and M_pix >= N_pix, we have d(M, I) <= 1. Thus, the precision of the search can be defined by the user. This means that the user gives a precision coefficient K (0 < K <= 1), and the system searches for any image I that verifies d(M, I) >= K. According to these definitions, we can compare the filtering distances of two images I1, I2 to a given pattern M.
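The two filtering levels can be sketched directly; function names are illustrative, and prefixes are taken to be bit strings as in Section 3. The containment test in `match_delta` encodes the quadrant-inclusion condition of Definition 1.

```python
# Sketch of the two matching criteria (Definitions 1 and 2); hypothetical names.

def match_delta(q: str, p: str, delta: int) -> bool:
    """Definition 1: Q matches P to within delta iff Lq = Lp + delta, delta >= 0,
    and P's quadrant contains Q's quadrant (P is a bit-prefix of Q)."""
    return delta >= 0 and len(q) == len(p) + delta and q.startswith(p)

def filtering_distance(n_pref: int, n_pix: int, m_pref: int, m_pix: int) -> float:
    """Definition 2: d(M, I) = 1/2 * (N_pref / M_pref + N_pix / M_pix)."""
    return 0.5 * (n_pref / m_pref + n_pix / m_pix)

print(match_delta("010110", "0101", 2))   # quarter-size sub-quadrant -> True
print(filtering_distance(4, 22, 4, 22))   # exact match -> 1.0
```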
We note that d(M, I1) > d(M, I2) if

    (N_pref,1/M_pref + N_pix,1/M_pix) > (N_pref,2/M_pref + N_pix,2/M_pix),

where (N_pref,i, N_pix,i) are the parameters of Definition 2 computed for image Ii.

4.4 Fuzzy Filtering Algorithm

According to these two definitions and the definition of the hashing function H, the principle of the search algorithm is the following. For a given position of the pattern, two variables N_pref and N_pix are associated to each image and initialized to zero.

a- for each prefix P of the pattern resulting from the translation, access H(P) and modify (N_pref, N_pix) for each image identifier contained in the field s(P);
b- select the candidate image CI where d(M, CI) = MAX{d(M, I); I belongs to the set of the inserted images};
c- insert CI in the set of candidate images;
d- select a new position of the pattern and go to step a.

Finally, select the image SI (the solution of the problem) where d(M, SI) = MAX{d(M, CI); CI belongs to the set of candidate images}.

4.5 Improvement of the Fuzzy Search

As introduced above (Section 2), fuzzy search consists in displacing the pattern over the FI-Quadtree and, for each position, in performing a filtering and selecting a candidate image. Thus, before beginning the search in a new position, the filtering that corresponds to the previous position has to be completed and a candidate image selected. This method is not well adapted to the operation and generates an important I/O complexity. Indeed, if the searched pattern has almost the same size as the main memory buffer associated to the FI-Quadtree, the number of disk accesses is nearly equal to the number of positions where the pattern has to be searched. To solve this problem of an excessive number of disk accesses, we execute the filtering in several steps. We filter the pattern in a set of positions, only for the FI-Quadtree nodes that are in main memory; if a filtering is not completed and needs a disk access, it is temporarily interrupted and a new filtering is performed in another position of the pattern, in order to finish the partial filtering later.

5.
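The steps a-d above can be sketched as a search loop. This is a simplified illustration: `fi_field(pos, prefix)` is a hypothetical accessor standing in for the disk access to the field s(P) at a given translation, and the interleaved I/O-saving scheme of Section 4.5 is deliberately omitted.

```python
# Sketch of the fuzzy search loop (steps a-d); all names are illustrative.

def fuzzy_search(positions, pattern_prefixes, image_ids, fi_field,
                 m_pref, m_pix, pixels_of):
    """Return (image_id, d) maximizing d(M, I) = 1/2*(N_pref/M_pref + N_pix/M_pix)
    over all translations of the pattern."""
    best_id, best_d = None, -1.0
    for pos in positions:
        # one (N_pref, N_pix) accumulator per inserted image, reset per position
        acc = {i: [0, 0] for i in image_ids}
        for prefix in pattern_prefixes:
            for i in fi_field(pos, prefix):        # step a: probe the field s(P)
                acc[i][0] += 1
                acc[i][1] += pixels_of(prefix)
        for i, (n_pref, n_pix) in acc.items():     # steps b-c: best candidate
            d = 0.5 * (n_pref / m_pref + n_pix / m_pix)
            if d > best_d:
                best_id, best_d = i, d
    return best_id, best_d                         # final selection over candidates
```

Keeping only the running maximum is equivalent to collecting the per-position candidates and selecting the best at the end, as the paper's step-by-step description does.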
Experimental Results

The FI-Quadtree has been implemented as part of the development of an Intelligent Search System in Graphics and Images [Cheiney90]. This implementation is realized on a Sun SPARC workstation. All the inserted images are 1024 x 1024 pixels in size. We check the performance of the insertion, deletion and search operations. The measurements consist in computing the execution time and the number of disk accesses for each operation. In order to obtain a more reliable estimation of the processing time, (1) each operation is applied many times and the average of all the obtained results is taken, and (2) the inserted images are very disparate: the number of prefixes that compose each of them varies between 25 and 500,000.

5.1 Insertion

The experimental results of this operation show that the insertion of a complex image (composed of more than 100,000 prefixes) requires an important processing time. This complexity is due to (1) the reading of the linear quadtree of the image to be inserted and (2) the processing and updating of the FI-Quadtree. We check this operation with different sizes of the buffer assigned to the FI-Quadtree. Figures 4 and 5 (where the buffer size is successively equal to 2048, 8192 and 16384 bytes) show that the processing time of this operation varies linearly with the complexity of the image, whereas the I/O complexity varies with the complexity of the image and with the distribution of the quadrants in the image.

[Figure 4: Variation of the time with the number of prefixes. Figure 5: Variation of the I/O with the number of prefixes.]

5.2 Fuzzy search

It is difficult to give an exact idea of the complexity of this operation using the experimental results.
Indeed, (1) the complexity of the search varies with the number of positions where the pattern has to be searched before reaching the solution; the searched pattern could be found immediately, or checked in each possible position without finding any solution. (2) The complexity of the search depends on the number of prefixes that compose the searched pattern; searching for two patterns composed respectively of 30 and 500,000 prefixes does not have the same complexity. Thus, we test this operation with patterns composed of various numbers of prefixes, and after searching in various positions noted P_i, i = 1, 2 and 3. Figure 6 shows the general behavior of this operation with the size of the searched pattern and P_i.

[Figure 6: Variation of the complexity of the search with the number of prefixes]

6. Conclusions

The main contribution of this paper is the proposal of a new quadtree-based data structure allowing a fuzzy search of patterns in an image database. We have investigated different types of manipulations (insertion, searching and data organization) within this structure, and we have shown that it is well adapted for content-oriented retrieval and fuzzy search. We have supposed that the inserted images are binary ones and that the processing is sequential. In forthcoming work, we will investigate parallel processing for the fuzzy search using this data structure, where the behavior of this problem will be analyzed.

7. References

[Ang89] C.H. Ang, H. Samet: "Node Distribution in a PR Quadtree", In Proceedings 1st International Symposium on Large Spatial Databases, Santa Barbara, USA, July 1989.
[Chang88] S.K. Chang, C.W. Yan, T. Arndt, D. Dimitroff: "An Intelligent Image Database System", IEEE Transactions on Software Engineering, Vol. 14, No. 5, 1988.
[Chang89] S.K. Chang, E. Jungert, Y.
Li: "The Design of Pictorial Databases Based upon the Theory of Symbolic Projections", In Proceedings 1st International Symposium on Large Spatial Databases, Santa Barbara, USA, July 1989.
[Cheiney90] J.P. Cheiney, B. Kerhervé: "Image Data Storage and Manipulations for Multimedia Database Systems", In Proceedings 4th International Conference on Spatial Data Handling, Zurich, Switzerland, July 1990.
[Gargantini82] I. Gargantini: "An Effective Way to Represent Quadtrees", In Communications of the ACM, Vol. 25, No. 12, 1982.
[Meyer-Wegener89] K. Meyer-Wegener, V.Y. Lum, C.T. Wu: "Image Management in a Multimedia Database System", In Proceedings IFIP TC 2.6 Working Conference on Visual Database Systems, Tokyo, Japan, April 1989.
[Kedem82] G. Kedem: "The Quad-CIF Tree: A Data Structure for Hierarchical On-Line Algorithms", In Proceedings 19th Design Automation Conference, Las Vegas, USA, June 1982.
[Morton66] G.M. Morton: "A Computer Oriented Geodetic Data Base and a New Technique in File Sequencing", IBM Ltd., Ottawa, Canada, 1966.
[Orenstein86] J. Orenstein: "Spatial Query Processing in an Object-Oriented Database System", In Proceedings ACM SIGMOD'86 International Conference on Management of Data, Washington, USA, May 1986.
[Samet84] H. Samet: "The Quadtree and Related Hierarchical Data Structures", In ACM Computing Surveys, Vol. 16, No. 2, 1984.
[Samet90a] H. Samet: "Applications of Spatial Data Structures", Addison-Wesley, 1990.
[Samet90b] H. Samet: "The Design and Analysis of Spatial Data Structures", Addison-Wesley, 1990.
[Shaffer87] C.A. Shaffer, H. Samet: "Optimal Quadtree Construction Algorithms", In Computer Vision, Graphics and Image Processing, Vol. 37, No. 3, 1987.
[Tamura84] H. Tamura, N. Yokoya: "Image Database Systems: A Survey", In Pattern Recognition, Vol. 17, No. 1, 1984.
[Touir90a] A. Touir, B. Kerhervé: "Shape Translation in Images Encoded by Linear Quadtree", In Proceedings IFIP TC 5.10 Working Conference on Modeling in Computer Graphics, Tokyo, Japan, April 1991.
[Touir90b] A.
Touir: "Search Algorithms in Image Databases", Internal Report ENST/INF/BD/90_10.
[Walsh88] T.R. Walsh: "Efficient Axis-Translation of Binary Digital Pictures by Blocks in Linear Quadtree Representation", In Computer Vision, Graphics and Image Processing, Vol. 41, No. 3, 1988.
[Woelk86] D. Woelk, W. Kim, W. Luther: "An Object-Oriented Approach to Multimedia Databases", In Proceedings ACM SIGMOD'86 International Conference on Management of Data, Washington, USA, May 1986.

Meta-Knowledge and Data Models

THE IMPORTANCE OF METAKNOWLEDGE FOR ENVIRONMENTAL INFORMATION SYSTEMS(1)

F. J. Radermacher
Forschungsinstitut für anwendungsorientierte Wissensverarbeitung (FAW)
Helmholtzstr. 16, D-7900 Ulm

Abstract

This paper deals with the importance of metaknowledge as a key topic for new and challenging applications of information systems. To make this point tangible, a number of interesting applications, particularly in the field of environmental information systems, are discussed. With regard to these applications, difficulties resulting from the distribution of information, from the implicit usage of knowledge by the people involved, and difficulties concerning the transformation of data into information are addressed. The insights given here essentially build on a number of projects and workshops on this topic at the FAW in Ulm.

1. The problem framework

Typical modern information systems process an abundance of data available from many sources, but metaknowledge about that data is usually either not available at all or not available in an explicitly given form. Often, enormous amounts of low-level data must somehow be aggregated to obtain meaningful insights. This task generally entails the application of methods within a particular model framework as a means of creating new data from available data. New data may address new questions, the clarification of ambiguities, or even the treatment of inconsistencies between different sources of (low-level) data.
Alternatively, information may be aggregated as a basis for meaningful answers to overarching questions posed, for example, by high-level decision makers.

((1) This paper strongly builds on the articles [15, 28] given in the references.)

Such questions occur frequently in connection with modern information systems [29]. They are addressed in a number of FAW projects, particularly topics in the area of environmental information systems [9, 11, 19, 28]. This is an economically and politically important field, relevant to both the public and government [2, 16, 19]. In the environmental area, the FAW projects ZEUS [13, 14, 18, 20] and WINHEDA [4, 28] deal with the integration of different data sources, using a number of particular models. Applications in ZEUS deal with water management and particularly with the identification of sites for ground-water monitoring stations [18]. The field of GIS [10] is also relevant to these projects and to the FAW project RESEDA [32], which deals with remote sensing. Remote sensing will eventually produce terabytes per day of valuable image data that cannot reasonably be processed with present techniques [27]. But in the long run, remote sensing constitutes one of the few realistic hopes for organizing regular monitoring of the state of the environment worldwide. This holds similarly for automation in chemical water analysis, e.g., as pursued in the FAW project WANDA [35, 37]. The environmental area is also a field where the availability of metaknowledge will be of crucial importance. Data from many sources and quite different modeling frameworks will have to be integrated, and any broad automation in this respect will work only if formalized metaknowledge is available for all data bases and model bases involved. This is also particularly true for any natural language access to data bases, as done in the FAW project NAUDA.
In all of these areas, the FAW is active in pursuing ways to help bring about the formulation of organizational and standards frameworks that in the long run will make such metaknowledge available [11, 15].

2. The crucial importance of environmental monitoring and encountered problems with heterogeneity

Given the dangers to the state of the earth, due, above all, to overpopulation, increasing global consumption, and increasingly dangerous types of waste, one of the most urgent problems we must address is environmental monitoring on both a local and a global scale [16, 36]. A prominent example of such an effort is the environmental information system (UIS) of the State of Baden-Württemberg [2, 11, 21, 26], which addresses the (semi-)automatic access to enormous bodies of information available on the status of the environment and tries to integrate this information with knowledge concerning administrative processes and responsibilities in this domain. Similar advances have also been made on a worldwide scale. For instance, the United Nations has initiated the UN Environment Program, which includes the Earth-Watch programs Global Environment Monitoring System (GEMS), the Decentralized Environmental Information System (INFOTERRA), and the International Register of Potentially Toxic Chemicals (IRPTC). These UN activities are complemented by similar German and other European programs. Of particular importance to the topics addressed here is the recent HEM initiative [22] within GEMS, whose focus is the harmonization of environmental measurements. In fact, this initiative is the first attempt on an international scale to fully address the topic of metaknowledge management for dissimilar sources of information on the environment. The HEM initiative reflects the negative experiences of the last decade with uncoordinated, non-standardized approaches to data collection.
Whenever a clear model framework was missing, graveyards of incomparable data have resulted more often than new knowledge sources.

3. Integration of distributed heterogeneous data bases

The next fundamental step in the integration of data base systems as part of information systems technology is the integration of distributed heterogeneous data bases (which is precisely the topic of the FAW project WINHEDA, referred to later). In fact, in a recent report to the National Science Foundation (NSF) [3], the US data base community stated that this task will be one of the greatest challenges in the data base field for the next decade. It is a field in which the US data base community is trying to maintain its present strong position in data base technology and where many activities are going on worldwide [1, 3, 5, 10, 31, 33, 34, 38]. Important topics discussed concern non-standard data base systems, autonomy, extendability, cooperation, and federation of data base systems (for a number of clarifying examples from applications, cf. [34]). The paramount importance of the integration of distributed heterogeneous data bases results directly from the nature of many applications. That this now comes so strongly into the view of research programs is due to the progress in data base technology and, even more, in communication facilities, particularly computer networks. Major technical problems result from different data structures, conceptual schemata, query languages, and network protocols [34]. But the hardest problems with integration are not technical in nature; rather, they are due to the semantics of the concepts and attributes used. This aspect also includes the possibility of inconsistent or contradictory data in the different data sources [1, 3, 34]. Addressing semantic differences is difficult and quite different from the more technical issues. The problem requires knowledge about the nature of the data stored in different places.
Future automated solutions will require the representation and availability of particular knowledge that then must be appropriately used. One of the fundamental questions that will arise in this framework is whether concepts and attributes can be translated into a standardized frame of reference of a modest, manageable size [15, 23, 28], or whether an almost complete computer understanding of language, coupled with an efficient representation of general and common-sense knowledge of the world, will be required. The latter approach is followed in the CYC project [25] at the Microelectronics and Computer Technology Corporation (MCC) in Austin: a bold undertaking to achieve the described goal via the integration of up to 100 million pieces of knowledge statements in a huge integrated knowledge base and processing framework.

4. Dealing with metaknowledge

When different information sources have to be integrated and accessed automatically, the availability and proper use of metaknowledge concerning the different sources is essential [8, 15, 24]. This aspect is particularly evident in attempts to integrate inconsistent or even contradictory data, as is often the case with the integration of heterogeneous distributed data bases. In this context, metaknowledge concerns the precise definition and classification of the data involved, in a kind of self-explanatory way, where again the necessity for a proper reference framework, or a system such as the one being developed in the CYC project at MCC, should be mentioned. Relevant aspects of the metaknowledge needed may be the quality of the data, its origin, the forecasting potential, the updating sequence, and so on. Note that such metaknowledge is also essential in providing for any advanced natural language access to such systems, which is the topic of the FAW project NAUDA. This results from the need to answer questions more complicated than the information stored directly in the data base itself [11, 12, 33, 39].
Examples of such questions might be: What kind of information is stored in the data base? What is the quality of the data? What kinds of questions can, or cannot, be answered with a certain precision based on particular data? Furthermore, metaknowledge will have to include information on physical access paths to information systems, and also on access rights to knowledge sources. Within the HEM meta data base design project [22], mentioned above, which has been initiated by the UN, aspects of metaknowledge will include: the name of the data base, geographic scope, data content, keywords, date of inception, update frequency, measurement techniques, classification standards, accuracy, quality control, level of detail, geographic referencing, responsible organization, contact name, and conditions of access.

5. Data and information

In many public discussions, it has become common to emphasize the difference between data and information [15]. We often speak of "graveyards of data" which we cannot handle, while on the other side we identify a huge lack of real information. It is hard to formalize this difference, though. Generally, by data we mean quite simple and elementary aspects of some context. Usually, the nature of basic data is quite clear, and such data is available in great number. Typically, one might think of certain values for chemical substances or the number of citizens living in particular areas of a town. Usually, with regard to a fact, there exists a clear method for identifying whether such a basic value is correct or not, be it concentrations of chemical substances or persons living somewhere or not. This also means that we have the feeling that data is something solid, something clear, something not questionable, that can be obtained or inferred in a standardized way.
Contrary to that, information seems to be something on a higher level, something that cannot be easily measured or obtained, but rather something that has to be compiled in a difficult and not so obvious way from many items of data with respect to particular questions. For the most part, decision makers need information to make their decisions, and not plain data. For instance, they might seek information concerning the quality of the air in a particular area, or information concerning the development of the social structure of some residential area over time. Notions like quality, social structure, development, and so on, present the hardest problems. Usually, such relevant information is closely related to problems and tasks which must be addressed in a particular context. Certainly, it is often difficult to distinguish whether something is data or information. But if one looks into the historical context of gaining information from data in the area of environmental information systems, it has been the case that there were experts with a great and detailed insight into the nature of the particular data involved in that process, the way this data was generated, and the way this data was used for providing answers. The intuition and personal insight of those users of the data and of the methods have guaranteed reasonable results to a considerable extent. From that point of view, the strong involvement of experts from the environmental administration in any formulation of answers to questions from the political realm is well justified, though this procedure usually requires a lot of time. Recently, with all the increasing technical options and the expanding requirements from society, new developments became unavoidable.
6. Greater distances between data and information

These new applications seek to integrate ever-larger and more distributed sets of data to deal with ever more ambitious applications, in situations that are characterized by a high degree of expectation and personal involvement in society and politics. It is therefore necessary to involve the data in processes of aggregation, where the data comes from vastly different places and islands of data. Unfortunately, given the distributed character, there is ever less pre-knowledge available concerning the origin of the data, the implicitly used models, and the potential of available algorithms. Information is missing, both with people and in the data itself. This results in extreme limits with respect to a sound integration of the data. Actually, it is a disturbing observation for our society that it is often impossible to obtain needed additional information concerning old data at all; the information providers are no longer there or cannot remember. Given this characteristic situation, the quest for meta-information is the need to store additional information with every data item, such as information concerning data sources, on how to find related data, how to connect it with other data, physical access methods, and what the access rights are. The point of view taken here with respect to a solid use of data comes from philosophical epistemology [30] and sees the semantics of data as coming from a modeling context that tells us how algorithms might process the data. Following such a recursive process, the necessary information for applications should eventually be obtained from higher levels of the data hierarchy and be composed of sound applications of algorithms to recursively obtained data.
The essential aspects are therefore developments in the field of model management, which should lead to environments for the right kind of integration of models, algorithms, and data, in the sense of a model-driven application of algorithms to data in order to generate and further process such new data. This is certainly not a topic to be dealt with in the data base context alone [34]. Other contributions will have to come, for instance, from model management approaches [6, 7, 16, 17, 30].

7. Information for high-level decision makers in the environmental field

In the field of the environment, it seems particularly difficult to be prepared, from the data acquisition point of view, even for some of the most important topics that might come up in the future [2, 21, 26]. Also, the algorithmic methods used and the background models involved are particularly difficult. We know, for instance, that distribution models for ground water pollution that use simple linear interpolation methods will usually lead to strongly incorrect results. One also has to take into account that there is high political involvement in this topic and that many kinds of responsibility are involved. Consequently, there is much controversy surrounding the interpretation of data. On the other hand, it is a field that many private citizens and politicians want to be engaged in and where they want to find a proper course of action. So we should offer to them, whenever possible, the ability to exploit huge data sets with the kinds of methods capable of extracting answers to interesting questions; and the answers will often lead to follow-up questions. If we do not have this basis, we must take into account that a number of questions will never be asked, reducing the quality of the decision making [19, 29]. In this sense, this is an area where we have to be particularly concerned with handling these questions.
This means we have to deal with distributed heterogeneous data sources, and also that models, algorithms, and data have somehow to be integrated. It is therefore a particularly challenging field in which to study the questions of metaknowledge handling. Certainly, the work on the Baden-Württemberg Environmental Information System is particularly innovative in this respect. The FAW is especially glad to be involved in such basic and challenging work, and to bring the topics presented here into this framework. In particular, projects such as ZEUS, RESEDA, WANDA, WINHEDA, and NAUDA at the FAW provide many insights and practical achievements, and continually offer valuable feedback on what can, and cannot, be accomplished in this area.

Summary

(1) The really tough modeling and algorithmic tasks in environmental information processing cannot be automated at present. From that point of view, personal information systems, such as the issue-based information system approach (IBIS), are of importance [14].
(2) Chemical information can be represented but is very special in character [35].
(3) Remote sensing constitutes a great hope for systematic monitoring but has particularly challenging requirements. The coupling with GIS is necessary in that respect.
(4) The topic of distributed heterogeneous data bases has to be addressed with much more emphasis. In doing so, the metaknowledge aspect is of crucial importance, as it is in natural language access systems. A major insight is that the right kind of data base design in the first place makes it much easier to deal with the heterogeneity and with natural language access.
(5) Object-oriented geoinformation systems coupled with tools from AI, statistics, and decision theory will result in considerable steps forward in the field of environmental information systems.
Such progress will underline that mathematics and computer science now have great potential to offer when research for the environment has to be organized.

Acknowledgments

I would like to thank the many friends, colleagues, co-workers, and project partners who contributed to the ideas given in this text through many discussions and intensive project work.

References

1. Alonso, R., Garcia-Molina, H., Salem, K.: Concurrency Control and Recovery for Global Procedures in Federated Database Systems, IEEE Data Engineering 10, No. 3, 5-11, September 1987
2. Baumhauer, W.: Umweltpolitik in Baden-Württemberg am Beispiel des Umweltinformationssystems, BDVI-Forum 3/1989
3. Brodie, M. et al.: Database Systems: Achievements and Opportunities, Report of the NSF Invitational Workshop on Future Directions in DBMS Research, 1990
4. Endrikat, A., Michalski, R.: The WINHEDA Prototype: Knowledge-Based Access to Distributed Heterogeneous Knowledge Sources, FAW Technical Report FAW-TR-91012, 1991
5. Garcia-Molina, H., Wiederhold, G., Lindsay, B.: Research Directions for Distributed Databases, in: ACM SIGMOD Record - Special Issue on Future Directions for Database Research, W. Kim (Ed.), Vol. 19, No. 4, 1990
6. Gaul, W., Schader, M. (Eds.): Data, Expert Knowledge and Decisions, Springer-Verlag, Berlin-Heidelberg-New York, 1988
7. Geoffrion, A.: Structured Modeling, UCLA Graduate School of Management, Los Angeles, 1988
8. Greenberg, B.V.: Developing an Expert System for Edit and Imputation, ECSC-EEC-EAEC, Brüssel, 1989
9. Günther, O.: Data Management in Environmental Information Systems, in: Proceedings 5. Symposium für den Umweltschutz, Informatik-Fachberichte, Springer-Verlag, Berlin-Heidelberg-New York, 1990
10. Günther, O., Buchmann, A.: Future Trends in Spatial Databases, in: IEEE Data Engineering Bulletin - Special Issue on Directions for Future DBMS Research and Development, W. Kim (Ed.), Vol. 13, No. 4, 1990