Requirements for a Bioinformatics Infrastructure in Germany for future Research with bio-economic Relevance

Recommendations of the BioEconomyCouncil


Table of Contents

Summary
Introduction
    Bio-economic potential of modern biosciences
    Bioinformatics topics
Recommendations
    a) Infrastructure
    b) Optimisation of the use of computing capacities
    c) Development of long-term strategies for research, action, and funding
Attachment
    Current bioinformatics facilities – Examples of potential expertise centres
Glossary

Summary ¹

In the coming years, the application of biological knowledge and methods gleaned from the biosciences will be of increasing economic relevance. In this regard, bioinformatics will play an important role. The bioinformatics infrastructure needs to be expanded in order to enable further research, as well as the use of research findings, to meet bio-economic requirements. The central topics in the area of bioinformatics are the development of flexible pipelines running in parallel for input, provision, and analysis (data management), the enhancement of statistical methods of analysis (data analysis), and optimisation of predictive models (data processing). In order to shape these topics as best as possible, the following action is needed:

• Establishment of a bioinformatics infrastructure consisting of a number of local, well-equipped, and specialised centres of expertise and a comprehensive coordinating body with the following responsibilities:
    • Networking and funding local centres of expertise for the purposes of ensuring the development of technology
    • Increasing knowledge transfer between biology research and bioinformatics
    • Establishing standards for storing and analysing data
    • Making the necessary software tools freely available and standardising interfaces
• Optimisation of the use of computing capacities, in order to
    • improve the utilisation of local resources through comprehensive resource planning
    • improve the conditions for transferring data via cloud computing (costs, security, labour)
    • provide supercomputers for specific applications
• Development of long-term strategies for research, action, and funding, in order to
    • improve the conditions for joint public-private funding of collaborative projects
    • promote the sustainability of available data resources

¹ The BioEconomyCouncil would like to thank the members of the Steering Committee Alfred Pühler, Frank Oliver Glöckner, Alexander Goesmann, Thomas Hartsch, Eric von Lieres, Klaus Mayer, Norbert Reinsch, Chris-Carolin Schön, Wolfgang Wiechert, and Ralf Zimmer, as well as all those who participated in the workshop “Bioinformatics”, all of whom made a vital contribution to developing these Recommendations.

Introduction

Bio-economic potential of modern biosciences

Biology in flux

With the emergence of a broad spectrum of new methods and technologies, biology research has in recent years become a science generating massive amounts of data. The combination of new research technologies – such as next-generation sequencing, high-throughput precision phenotyping, and so-called OMICS technologies – and bioinformatics tools for linking and analysing the generated data enables a deep understanding of biological interrelationships. This ranges from detailed knowledge of the genetic make-up of individual species or individual organisms, to mechanisms for expressing their phenotypic characteristics, to complex interactions that take place within an ecosystem. The simultaneous development of bioinformatics constitutes the prerequisite for the storage, global exchange, and analysis of this data volume.

From basic research to applied science

The growing understanding of the mechanisms underlying the expression of characteristics in organisms is generating new possibilities for sustainable, economic use of biological resources. This includes, for instance, the targeted improvement in the breeding of agricultural crops and farm animals, the development of new and useful biotechnology processes, and more accurate orientation of crop protection and veterinary medicine. The data available today and in the future increasingly permit a comprehensive modelling of both the process of central metabolism and selected individual synthesis pathways. This permits the development of, inter alia, systems biology approaches for the targeted supplementation and optimisation of current breeding processes. In so-called “synthetic biology”, such models can serve as the basis for the targeted redesign of entire metabolic pathways in technologically useful organisms. Sustainable economic concepts for extracting biocatalysts and bioactive agents from various organisms can also be developed by exploiting the recently created possibilities for directly accessing the genetic material of microorganisms that cannot be cultivated on a laboratory scale. In addition, a deeper understanding of evolutionary interrelationships will also contribute to the discovery and use of new biological potentials with the aid of biodiversity research.

The various fields of biology research – from basic research to applied research – show a similar need for action in the area of bioinformatics. The bioinformatics infrastructure, which is currently insufficient for meeting the needs of research, is increasingly emerging as the limiting factor for the optimal future use of the entire bio-economic potential of modern biosciences.

The trend-setting significance of bioinformatics has already been recognised in many European and non-European countries. Some countries, such as the Netherlands (NBIC), Switzerland (SIB), and France (ReNaBi), already have comprehensive, well-organised bioinformatics structural programmes in place; other countries, such as Sweden (BILS), are currently developing them. At the international level as well, efforts are already underway to establish overarching infrastructure programmes in order to achieve better networking and data exchange (e.g. ELIXIR in Europe). With respect also to international exchange, but in particular to the competitiveness of German research, this makes it all the more urgent to push forward the development of a German bioinformatics infrastructure.

Bioinformatics topics

The bioinformatics spectrum ranges from the fundamental problem of data management, which comprises, in particular, data maintenance and structuring, to data analysis, e.g. biometric analysis of data, integration of disparate data types, overarching meta-analyses, and the transformation of findings and techniques from basic research into applied research and development. In this regard, modelling and simulation of complex systems are playing an increasingly important role. Examples of this include statistical and quantitative genetics and population genetics.

Data management

In the area of data management, the challenge is not only to manage the exponentially growing amount of data but also to take into account the great heterogeneity of primary data. In order to meet these challenges, it is necessary to develop flexible pipelines running in parallel for input, provision, and analysis. In view of the enormous amount of data, the main challenge is to develop efficient strategies for reducing the complexity and volume of, in particular, primary data from sequencing analysis by using suitable methods for data reduction and compression. In particular, it is essential that systems be created not just for storing and structuring the generated data but also for making such data available for analysis and interpretation. Intuitive tools are needed for visualising and examining the data.
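The idea of pipelines running in parallel for input, provision, and analysis can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the stage names (`ingest`, `provide`, `analyse`), the plain-string record format, and the use of lossless `zlib` compression and GC content as a stand-in analysis are all hypothetical choices, not part of the Recommendations.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib


def ingest(raw: str) -> str:
    """Input stage: normalise a raw sequence record (hypothetical format)."""
    return raw.strip().upper()


def provide(seq: str) -> bytes:
    """Provision stage: compress the record losslessly for storage or transfer."""
    return zlib.compress(seq.encode("ascii"))


def analyse(blob: bytes) -> float:
    """Analysis stage: GC content of the decompressed sequence (stand-in analysis)."""
    seq = zlib.decompress(blob).decode("ascii")
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def run_pipeline(records, workers=4):
    """Run all three stages for each record; records are processed in parallel."""
    def process(raw):
        return analyse(provide(ingest(raw)))

    # pool.map returns the results in the same order as the input records.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, records))


if __name__ == "__main__":
    print(run_pipeline([" acgt ", "GGCC", "aatt"]))
```

Threads are used here only to keep the sketch short; a production pipeline for sequencing data would more likely distribute the stages across processes or cluster nodes.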

Data analysis

New, efficient, statistical approaches for data analysis have to be developed in parallel with data management. The genetic analysis of complex features, such as yield and resource efficiency, using genome analysis and precision phenotyping produces high-dimensional data volumes, whose optimal use requires the continual refinement of statistical methods of analysis. Today, there are already a number of individual data sets containing molecular and phenotypic information, as well as biological resources and analysis tools, whose optimal use can be ensured only by bringing together all existing information. This requires innovative bioinformatics concepts designed to establish knowledge bases that ensure the linking of individual databases and thus the integration of heterogeneous data as well. The integration and comparative analysis of data and results from various research and application areas (e.g. molecular biology, physiology, biotechnology, biodiversity research, and breeding), as well as sensor, process, and analysis data from plant and animal research, form the basis for interdisciplinary and translational research.

Data processing

The rational and data-driven selection, identification, and validation of suitable models similarly require the availability of local and central computer resources, as well as the development of customised, scalable software for their efficient use. They are essential for the creation of functional models and simulation approaches, which constitute a fundamental building block for future bio-economic action. The optimal planning of new experiments or the targeted performance optimisation of, e.g., biotechnological processes governing conversion of materials requires, on the one hand, statistically valid estimates about propagated data uncertainties and, on the other, robust predictions about new measurements. Both aspects are of fundamental significance for the combination of these two especially relevant optimisation strategies.

Existing expertise centres

Some of the problems discussed here are currently being worked on, and sophisticated solutions and systems are already in place. In the competence networks for agricultural research (CROP.SENSe.net, PHENOMICS, and Synbreed) and in other areas, bioinformatics has already been successfully integrated in agricultural and biosciences research collaborations. Moreover, Germany has implemented a recognised, exemplary university system for training young bioinformatics researchers.

In the area of plant research, the initiative “GABI / Plant Biotechnology of the Future” has already resulted in the development of outstanding, internationally recognised expertise in a number of areas in green bioinformatics. This has created excellent conditions for establishing a multi-tiered green bioinformatics platform. Worthy of mention in the area of animal research are the projects of the FUGATO (Functional Genome Analysis in Animal Organisms) initiative. In the area of microbiology, such research initiatives as GenoMik/PathoGenoMik and the European excellence network “Marine Genomics Europe” have in recent years made an important contribution to the successful development of microbial genome research in Germany and Europe.

Therefore, with regard to the development of expertise centres, it seems appropriate that, first, existing bioinformatics facilities be strengthened and then, where needed, the development of new centres be promoted.

In the field of plant research, there are already five potential expertise centres: the Munich Bioinformatics Centre, Plant Bioinformatics at Gatersleben-Halle, the Tübingen-Hohenheim region, the institutes of the so-called “ABCD/J” region (Aachen, Bonn, Cologne, Düsseldorf, Jülich), and the Max Planck Institute of Molecular Plant Physiology in Golm. In the field of animal research, mention should be made of the data centres Vit Verden, the Bavarian State Research Center for Agriculture in Grub, the Leibniz Institute for Farm Animal Biology in Dummerstorf, and the Animal Breeding and Genetics Department at the University of Göttingen. The Bremen region – with the Max Planck Institute for Marine Microbiology, the Jacobs University, the Center for Marine Environmental Sciences at the University of Bremen, and the Alfred Wegener Institute for Polar and Marine Research in Bremerhaven – constitutes a potential expertise centre for the field of environmental microbiology and biodiversity research. The Center for Biotechnology (CeBiTec) at the University of Bielefeld would make a suitable expertise centre in the field of biotechnology. In the field of the modelling of biochemical networks and supercomputing applications in systems biology, the Jülich Research Centre likewise has an international reputation.

Recommendations

a) Infrastructure

In order to ensure that biological data has the most efficient, long-term, and sustainable use for research and commercial application, a coordinated, networked bioinformatics infrastructure should be developed, which also takes into account the aspect of translating research results from basic research to application. The key to developing a modern, evolving, and efficient bioinformatics infrastructure lies in the establishment of a two-track organisational structure, which, on the one hand, has a number of local, well-equipped, and specialised expertise centres and, on the other, provides for a comprehensive body for networking and coordinating these centres. In particular, with regard to technical equipment, it must be taken into account that specialised individual solutions are often required in light of the fact that highly specialised analyses are sometimes undertaken.

It should be noted here that the planned bioinformatics infrastructure is not to be reserved strictly for tasks with bio-economic relevance. Rather, it needs to be investigated to what extent cross-networking with other fields of life sciences might be expedient.

Local expertise centres

By bundling know-how and technical facilities, local expertise centres ensure the development of bioinformatics approaches to solving specific problems. They provide the necessary computing capacities, locally where possible. In addition, supercomputing capacities should be set up for special applications and issues. In addition, bioinformatics technologies used on a wide basis, such as software and databases, are to be maintained, enhanced, and made available for broad use in research projects.

Potential locations for expertise centres would be those that (i) already have an established reputation in their field, (ii) have available, in addition to bioinformatics tools, sufficient computing resources and professionals, (iii) are firmly embedded in the research environment through national and international collaborations, and (iv) have permanent structures for educating young researchers and training users (see Attachment).

The tasks of the expertise centres would furthermore include the broad creation of competences in the bioinformatics analysis of genomic and postgenomic data. Close collaboration between experimental and data-generating structures, along with development of bioinformatics competence, is already proving to be an important structural component. Through long-term support for and networking of local expertise centres, and through the centralised provision of bioinformatics knowledge and services, smaller research groups and newcomers to the field can immediately be put into a position to generate new biological knowledge from the data, without themselves first needing to create their own bioinformatics infrastructure. This leads to a long-lasting strengthening of genome research in Germany.

In various networks, joint training units have proved to be a superb means of transferring knowledge between the institutions involved. In addition, they strengthen the network between the institutions.

In addition, the training and fostering of young researchers should be expanded, for instance, in graduate schools. This will make it easier to create ties between experts in genome research and bioinformatics.

Access to reference data sets that have been verified by experts (biocuration) is increasingly proving to be the key technology for high-quality analysis of biological data. The same applies for the provision of specialised databases and tools for genome and biodiversity research, as well as the search for new enzymes and processes for biotechnological applications.

The comprehensive coordinating body

The comprehensive body acts as the coordination, contact, and information interface between expertise centres, biology and bioinformatics research institutions, and other users and interest groups. While the main task of the expertise centres is providing specific tools for analysing research data from various fields of the biosciences, the overarching body promotes the development and use of jointly needed tools and standards.

By networking local bioinformatics expertise centres, it is possible to promote in a targeted manner the development of technology in the various fields of applied biology, as well as in basic research, to develop standards for storing and analysing data, and to develop concepts for the sustainability of available data resources. By promoting the exchange of information between the various centres, as well as between these and other national and international points of contact from research and industry, a common foundation is created for addressing the variety of bioinformatics issues and tasks.

As coordinated by the comprehensive body, all expertise centres take part in the development of common foundations of bioinformatics and moreover serve as a point of contact for specific issues. The development of standard operating procedures, uniform interfaces, and conscientious data documentation by the expertise centres are to be coordinated by the comprehensive body. This will speed up comparative analyses and ensure the quality of analyses on a lasting basis.

In establishing the comprehensive body, the first step should be to develop a “lean” coordination structure, which initially establishes the necessary network between the existing centres. In order to set up a coordinated structure, it is first necessary at the organisational level to have a small circle of persons declare themselves willing to take on the responsibility of developing the concept of the comprehensive body and to create the requisite networking with the research institutions that are potential candidates for expertise centres. This could be achieved by setting up an oversight group, whose members would coordinate the activities of the various centres and oversee further development. In this way, close ties between the overarching authority and the bioinformatics centres could be achieved by having the members from the expertise centres form the core of the oversight group. The activity of this group should initially focus on the development of a network structure and the coordination of development and standardisation projects.

In the long term, transfer to a broader-based institution with a permanent staff should be sought, in order to be able to ensure support for researchers, a high standard of education in bioinformatics, and communication to the public at a high level. Similar to the way this is currently being practised in the Netherlands (NBIC) and Sweden (BILS), the comprehensive body will act as the initial, intermediary point of contact for bioinformatics issues, in the field of science, with companies, and with international institutions. A single point of contact such as this would also be capable of promoting exchange and collaboration with companies and public research institutions. Moreover, it can contribute to strengthening the transfer of knowledge between biology research and bioinformatics.

In addition, a funding model has to be found that permits an institution such as this to be promoted in the long term, as well as the necessary resources for the expertise centres. Initial funding is to be sought from BMBF.

b) Optimisation of the use of computing capacities

The rapidly accelerating generation of data in recent years and the resulting requirements in the area of data analysis, particularly for sequencing analyses and for statistical comparisons of genotypes and phenotypes, cannot be managed with standalone computers, or only at great effort. Analogous developments are to be observed with simulation methods for model-based data analysis and experiment planning in statistical genetics and systems biology. Today, the typical computing needs of individual working groups are not continuous but rather characterised by peak loads with regard to time.

Use of local resources

Local servers or computer clusters make sense for covering basic needs and are very common. The use of local resources is especially advantageous where the data is to be processed interactively. The utilisation of local computer clusters can be improved substantially through alternating or simultaneous performance of tasks from other fields that require significant computing power. However, this means that all involved working groups need to carry out comprehensive resource planning. In addition, it often is advantageous to use the capacities of central computing centres.

Use of external resources – cloud computing

In order to be able to manage peak loads that occur irregularly, large amounts of external computing capacity can be leased (cloud computing). However, in comparison to local servers and clusters, this triggers higher costs, and the security of confidential data normally cannot be ensured. The lack of data security is considered to be highly problematic, in particular, where collaborations with private companies are involved. In addition, the necessity of having to repeatedly transfer large amounts of data to the cloud creates considerable added effort. From the standpoint of bioinformatics, the continual enhancement of cloud technologies with respect to security and performance has a high priority.

Use of external resources – supercomputers

In the research environment, memory and computing capacities for complex calculations are widely available through time slots on supercomputers. However, supercomputers are only partially suitable for unplanned peak loads, since slots have to be accessed during specific time windows. Therefore, supercomputer centres can be seen as a very good complement to the necessary management and expansion of bioinformatics hardware capacities, but not as a technical solution in and of themselves. In order to be able to use not only local computers and computer clusters but also central supercomputers, the needed software tools must be freely available and the interfaces must be standardised.
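The trade-off between local capacity and irregular peak loads can be made concrete with a small arithmetic sketch. The figures below are invented for illustration only, and the function names `split_demand` and `local_utilisation` are hypothetical; the sketch merely shows how much demand would spill over to external resources (cloud or supercomputer slots) for a given local cluster size, and how well that cluster would be utilised.

```python
def split_demand(demand, local_capacity):
    """Split each period's demand (core-hours) into a local and an external share."""
    plan = []
    for required in demand:
        local = min(required, local_capacity)
        plan.append((local, required - local))
    return plan


def local_utilisation(demand, local_capacity):
    """Fraction of the available local core-hours that is actually used."""
    used = sum(min(required, local_capacity) for required in demand)
    return used / (local_capacity * len(demand))


if __name__ == "__main__":
    # Invented weekly demand of one working group, in core-hours.
    weekly_demand = [100, 400, 150, 900]
    capacity = 300  # invented local cluster capacity per week

    plan = split_demand(weekly_demand, capacity)
    external_total = sum(external for _, external in plan)
    print(plan)
    print("external core-hours:", external_total)
    print("local utilisation:", round(local_utilisation(weekly_demand, capacity), 3))
```

Pooling the demand of several working groups in the same calculation shows why joint resource planning matters: peaks that do not coincide raise local utilisation without raising the external share.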

c) Development of long-term strategies for research, action, and funding

Bringing academic research together with private companies

With respect to the future role that bioinformatics will play in the bio-economy, efforts must be made to secure increased joint public-private funding of collaborative projects. Currently, however, there are still numerous obstacles standing in the way of collaboration between academic research and private companies. These range from the problem of publishing research data from public-private partnerships to patent issues.

Sustainability of data resources

The concepts regarding the sustainability of available data resources are of fundamental importance for all downstream analytical and knowledge-generating processes. For instance, in addition to ensuring researcher access to state-of-the-art equipment and the technologies needed to generate data, such as for high-throughput sequencing, transcriptomics, proteomics, and metabolomics, it must also be ensured that the acquired data are available for a wide spectrum of applications and for long periods and are retrievable. In addition to data generated through experiments, these include their metadata, such as the description of the experiment, origin and nature of the biological material used, and the analytical methods used. In order to make such a description uniform and more easily accessible for analytical methods, it is possible to employ “controlled vocabularies” or ontologies yet to be created.
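How a controlled vocabulary makes metadata uniform and machine-checkable can be sketched as follows. This is a minimal illustration, not a proposed standard: the field names, the `ExperimentMetadata` record, and the vocabulary terms are all hypothetical placeholders for terms that a real ontology would define.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary: field name -> permitted terms.
CONTROLLED_VOCABULARY = {
    "technology": {"sequencing", "transcriptomics", "proteomics", "metabolomics"},
    "material": {"leaf", "root", "seed", "whole organism"},
}


@dataclass(frozen=True)
class ExperimentMetadata:
    """Metadata accompanying one experimental data set (illustrative fields)."""
    description: str      # free-text description of the experiment
    organism: str         # origin of the biological material
    material: str         # nature of the biological material (controlled)
    technology: str       # data-generating technology (controlled)
    analysis_method: str  # analytical method used


def validate(meta):
    """Return a list of error messages; an empty list means the record is valid."""
    errors = []
    for field_name, allowed in CONTROLLED_VOCABULARY.items():
        value = getattr(meta, field_name)
        if value not in allowed:
            errors.append(f"{field_name}: '{value}' is not a controlled term")
    return errors


if __name__ == "__main__":
    record = ExperimentMetadata(
        description="Drought stress time series",
        organism="Hordeum vulgare",
        material="leaf",
        technology="RNA-seq",  # free-text variant, rejected by the vocabulary
        analysis_method="differential expression",
    )
    print(validate(record))
```

Rejecting free-text variants at submission time is what later allows data sets from different centres to be retrieved and compared by the same query.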

Attachment

Current bioinformatics facilities – Examples of potential expertise centres

a) Plants

Munich Bioinformatics Centre

Involved institutions: Institute of Bioinformatics and Systems Biology at the Helmholtz Centre in Munich (HMGU); Technical University of Munich (TUM)

Specialisation: Inter alia, comprehensive expertise in analysing, depicting, and providing genomes of classic plant model organisms (arabidopsis, medicago, brachypodium), as well as cultivated plants (rice, corn, barley, wheat, rye, tomato, sunflower). Analysis of next-generation sequencing data and their correlation with genomic, evolutionary, and biologically functional issues with regard to model organisms and agricultural crops and farm animals. Extensive expertise in the fields of statistical genetics, quantitative genetics, and animal and plant breeding; ongoing appointment procedure for population genetics and biostatistics. Development of statistical methods for the analysis of quantitative characteristics, for the functional analysis of biodiversity, and for the resolution of important yield-determining mechanisms.

Networking: Coordination/participation in two long-term research collaborations: AgroClustEr Synbreed, sponsored by BMBF, and the DFG special research field “Molecular mechanisms regulating yield and yield stability in plants”. Comprehensive participation in projects of the GABI / Plant Biotechnology of the Future initiative. The participating groups are closely networked with national and international consortiums and initiatives in Europe and the US, such as the International Wheat Genome Sequencing Consortium (IWGSC), the International Barley Genome Sequencing Consortium (IBSC), and the European plant genome infrastructure platform transPlant. Collaboration with companies under public-private partnerships.

Staff: About 45 individuals in the fields of bioinformatics, plant and animal breeding, and molecular biology, expanded by ongoing appointment procedures for population genetics (W2) and biostatistics (W3).

Education: The two institutions together ensure the education of talented young scientists through the TUM courses of study in bioinformatics, biology, agricultural sciences, and molecular biotechnology. In order to develop interface competence between undergraduate and graduate students and ensure interdisciplinary networking between various departments, both institutions take part jointly in seminar series and summer schools.

Software: R package synbreed for genomic prediction of complex phenotypes

Databases: No information

Computer infrastructure: Powerful computer cluster at HMGU, Leibniz computing centre, and Munich Center of Advanced Computing

Plant Bioinformatics at Gatersleben-Halle

Involved institutions: Institute of Computer Science at the Martin Luther University (MLU) in Halle-Wittenberg; Leibniz Institute of Plant Biochemistry (IPB) in Halle; Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) in Gatersleben

Specialisation: Analysis of next-generation sequencing data (assembly, RNA-Seq, and ChIP-Seq), DNA signals, molecular phylogeny, diversity studies, image analysis (microscopy images, high-throughput phenotyping of plants), systems biology (modelling of metabolism and flow analyses), analysis of biological networks, visualisation and visual data analysis of biological data, applied informatics in the fields of metabolomics and mass spectrometry, databases and data integration of laboratory data management, development of databases, data integration, and information retrieval, including cross-domain data analysis.

Networking: Numerous national and international collaborations with countries such as Greece, Iran, Turkey, the Netherlands, Sweden, Finland, Japan, Israel, Spain, Switzerland, Canada, Russia, the UK, France, Austria, the US, and Australia. There are also collaborations with industry (BASF Plant Sciences, Bayer Crop Sciences, Boehringer, KWS, and many other companies involved in plant breeding). In addition, active collaborative work is continuing for standards in systems biology (SBML, SBGN), the proteomics initiative (mzML, TraML), the citability of (primary) research data (DataCite - DOI), and the representation of biological and experimental meta- and primary data (ISA-TAB).

Staff:
MLU: 3 professorships, with an additional 4 budgeted (HH) and 3 outside-funded (DM) researcher positions
IPB: a group with four researchers (group head, 2 HH and 1 DM position)
IPK: working groups (of which 2 are purely DM-funded) in the field of plant bioinformatics, with a total of 29 researchers (7 HH and 22 DM)

Education: In addition to the Diplom degree programme in bioinformatics (MLU Halle), which has existed since 1999 and is now being phased out, there are Bachelor's and Master's programmes in bioinformatics (MLU Halle), the Bachelor's and Master's programmes in biotechnology (Hochschule Anhalt, Köthen campus), and the Bachelor's programme in computer science (Hochschule Harz, Wernigerode campus), doctorates in bioinformatics (MLU Halle), as well as extensive teaching of bioinformatics modules (University of Kiel).

Software: The following tools, inter alia, have been developed: Alida (documentation of data analyses), CentiBin/CentiLin (centrality analyses in networks), FBASimVis (flux balance analysis), HIVE (integrative analysis of multimode data), IAP (image analysis of high-throughput phenotyping data), Jstacs (library for statistical analyses and sequence classification), KGML-ED (KEGG pathway editor), LAILAPS (search engine for information retrieval for user-specific relevance analysis), MiToBo (Microscope Image Analysis Toolbox), MotifAdjuster/MiMB/Dispom (transcription factor binding sites annotation and prediction), SBGN-ED (Systems Biology Graphical Notation Editor), Vanted (analysis of OMICS data in the network context), tools for metabolite identification, Bioconductor packages (xcms, CAMERA, Rdisop, mzR), and PHHMM (analysis of array-CGH data).

Databases: Participation in the development of public databases in the field of plant bioinformatics (selection): GBIS (Federal central genebank information system), MetaCrop (information system for metabolism in plants), and MassBank (mass spectrometry reference data).

Hohenheim: In Hohenheim. GenomeMapper). molecular modelling (BALL). education in bioinformatics and statistical genomics consists of the BSc and MSc programmes “Agricultural Science”. with emphasis on plant breeding. including: Metagenomics (MEGAN package). short-read assembly (LOCAS). there is one cluster at the University and one at the Max Planck campus. Staff ZBIT: 14 working groups in various areas of bioinformatics Software ZBIT: Numerous software packages for use in green bioinformatics are being devel- oped in Tübingen. SMP comput- er (8x4-core Opteron/256 gigabyte main memory). which is maintained by staff with permanent positions. Max Planck Institute for Intelligent Systems. and NGS transcriptome analysis (rQUANT). . integrated next- generation sequencing analysis (SHORE package). high-performance cluster (1840 cores/~2. There are currently about 220 undergraduate and graduate students and 50 PhD candidates studying at various schools. Additional external computing capacities are used via collaborations. R-hy- pred) an analysis of breeding programmes (PLABSTAT. NGS aligner (QPALMA. cis-elements. proteomics (OpenMS). which. simulation of geno. R-selectiongain. Education ZBIT: In 1998 the University of Tübingen established the first course of study in Germany for bioinformatics. Hohenheim: In Hohenheim. systems biology (BN++ [BioMiner]. hierarchical storage manage- ment (HSM) system (~65 terabyte/9 terabyte online access). beginning in 2012. Galaxy server (gene prediction. the MSc programme “Crop Sciences”. Hohenheim: A number of software packages for statistical genomics were devel- oped in Hohenheim. education in bioinformatics consists of BSc/ MSc/PhD programmes. There is a research collaboration between the Institute of Plant Breeding (Schmid) and the Max Planck Institute in Tübingen (Weigel). will be main- tained by an employee with a permanent position. 
Friedrich Miescher Laboratory Hohenheim: University of Hohenheim (Agricultural Sciences faculty). University Hospital in Tübingen. Max Planck Institute for Developmental Biology. PALMapper. with emphasis on plant and animal breeding. There are currently about 100 undergraduate and graduate students and 20 PhD candidates in this field. central SAN memory network) Bioinformatics activities in Tübingen and Hohenheim Involved institutions Interdisciplinary Center for Bioinformatics Tübingen (ZBIT): Eberhard Karls University in Tübingen. Computer infrastructure High-performance cluster (90 nodes/200 gigabyte main memory). 3D visualisation station. phylogenies (SplitsTree). and phenotypes (phenosim. and IPB computing cloud (650 CPU cores. Today. the departments within the Institute of Plant Breeding operate a cluster with more than 100 nodes. etc.). R-mvngGrAd) Databases No information Computer infrastructure ZBIT: In Tübingen. State Research Centre for Plant Breeding Specialisation ZBIT: Various areas of bioinformatics Hohenheim: Statistical genomics Networking Hohenheim: Various international collaborations in connection with GABI/PLANT2030 and SYNBREED. Examples include: Genetic mapping (PLABQTL).

TagFinder. Golm.. bioinformatics-oriented employees in the Mathematical Modelling and Systems Biology working group (Prof. Selbig): 10+ employees (1 chair. 2 PhD candidates. Postgres. Commercial software (CLC. network reconstruction from OMICS data. bioinformatics group. R. Humboldt University in Berlin. public domain software (R. 4 postdocs. Perl. in the context of breeding. Aberdeen University. Exchange with the region’s bioinformatics groups (Leibniz Institute of Plant Biochemistry in Halle. IPK Gatersleben. Windows. Ludwig-Maximilians-Universität München. the IPK in Gatersleben. the IMB in Aachen. Max Planck Institute of Molecular Plant Physiology in Golm Involved institutions Max Planck Institute of Molecular Plant Physiology (MPIMP. development and provision of methods for functional classification of RNA Networking With working groups of MPIMP and the University of Potsdam that are conducting experiments.. University of Vienna. Matlab. 2 PhD candidates). MATLAB. Java. University of Potsdam: Bioinformatics working group (Prof. the National Institute of Biology in Ljubljana. in particular. RNA: Studies of sequence-structure-function relationships of RNA molecules. GoBioSpace) experiments. Ludwig Maximilians University in Munich. as well as bioinformatics-oriented employees in numerous working groups. MeV. student employees). Web programming. specifically. Genome-wide association studies: Development of tools for detecting genotype-phenotype associations. C. National Institute of Biology Ljubljana. 1 system administrator. databases: SQL. development and application of statistical methods of OMICS data analysis. Statistica. Specialised software for analysing proteomics (IOMACS) and metabolomics (inter alia.
C#. a. 8 European partners of an EU-MC-ITN Education University of Potsdam: inter alia. marker identification. Systems biology: Analysis of OMICS data against the backdrop of signalling and metabolic pathways. Operating systems: Linux. e. MySQL. MPIMB in Berlin-Dahlem). Huisinga). metabolomics data and next-generation sequencing data. Numerous national and international contacts via projects: MPIMP: inter alia. etc. 2 programmers. chair Prof. non-coding RNA (miRNA). University of Erlangen. programming languages/environments: Python. 6 postdocs. 8 European partners of an EU-MC-ITN Staff MPIMP: About 300 employees. Mathematica). several bioinformatics-oriented employees in other working groups Software A variety of self-developed standalone and Web-based software tools for OMICS data analysis (MetaGeneAlyse and pcaMethods for statistical data analysis. University of Potsdam Golm.

TROST (potato water-stress data). genomic models. and proteins). Animal Breeding and Genetics Department Involved institutions Georg August University in Göttingen. association mapping. breeding organisation for cattle. GWDG) in Göttingen Specialisation Scientific computing with diverse data structures. Bioinformatics researchers Staff About 120 employees (departments. Fortran. SAS) Databases Oracle Computer infrastructure Dispersed server systems. ChlExDa (Chlamydomonas experimental data). programmers. Computer infrastructure MPIMP: 12 servers. NGS Small-Reads-DB. system maintenance) Software Proprietary software development (JAVA. FUGATO projects. GABI-PD. processing of data from high-throughput genotyping. state inspection associations. Research Data Processing Company (Gesellschaft für wissenschaftliche Datenverarbeitung mbH. 48TB central hard-drive memory b) Animals Vit Verden Involved institutions VIT is a service computer centre for organised animal breeding (e. GABI primary database. number of cores: 88 Uni-Potsdam: 10 host computers with 96 computing cores. horses. AraNet (expression correlation networks in model plants). Linux clusters Georg August University in Göttingen. GC/MS data). ChlamyCyc (Chlamydomonas metabolic pathways. sheep. genes. processing of animal breeding-specific data. RLooM (RNA loop structures). database experts. high-throughput phenotyping. Centre for Statistics at the University of Göttingen . Programmers. 10 workstation computers. DFG-GRK 1664 “Scaling Problems in Statistics”. g. Databases Golm metabolome database (GMD. population genetics Networking Synbreed. and pigs. and next-generation sequencing. 40TB disk space.
genotyping laboratories) Specialisation Computer applications and genetic-statistical analyses in the area of animal husbandry and animal breeding Networking German university institutes and research facilities in the area of animal breeding Other national and international computing centres and research institutions in the area of animal breeding Education Engineering degrees (specialisation in animal breeding).

and other partners in the field Education R courses for the International Leibniz Graduate School DiVa and FBN PhD candidates Biomarker lab for students in molecular biotechnology and computer science at the University of Rostock Gene set enrichment as part of the course Molecular Bioinformatics II under the master’s courses of study Molecular Life Sciences and Computer Science at the University of Lübeck Linear module and mixed linear models in the master’s course of study Animal Sciences at the University of Rostock. breeding associations. BLAST. R. 2.1 GHz clocking. Chair for Systems Biology and Bioinformatics Institute for Neuro- and Bioinformatics. and physics Staff Department head. processing of raw data (e. SAMtools) Sequencing databases (e. g. Maple) Common statistics software (Statistica. ZPLAN+. 18196 Dummerstorf Specialisation Integrative bioinformatics with farm animals (cow. SPSS. Biostatistics Group. mathematics. g. University of Lübeck and Institute for Animal Breeding and Husbandry. evaluation of genetic parameters.000GB RAM GWDG systems for data security and server administration Various department-based servers. VCE. a total of more than 5. Phylip. functional characteristics. C++. PHRAP. Aas. Christian Albrechts University in Kiel Institute for Bioengineering and Food Science. etc. AMD Opteron with 24 cores. 128GB RAM Leibniz Institute for Farm Animal Biology Involved institutions Leibniz Institute for Farm Animal Biology (FBN). ) Animal breeding-specific software (e. Wilhelm-Stahl-Allee 2. 10 PhD candidates. and behavioural characteristics Statistical genomics. g. PHRED) sequencing comparisons and multi-alignment (e. breeding planning. evaluation of breeding potential. Python. ASReml. 2 research associates. 2 programmers Software Mathematical software (Mathematica. PEST. 2 postdocs. bwa. EMBL) Programming languages R. …) Software for sequencing data. pig) specifically for performance characteristics.
Fortran Databases MySQL and Oracle databases at GWDG Computer infrastructure GWDG: Cluster with several parallel computers (Intel Xeon and AMD Opteron systems) with batch systems. University of Life Sciences. g. e. genetic statistics. performance testing organisations.000 cores and more than 18. graduate researchers in agricultural science. population genetics in breeding Ontology for behavioural characteristics Networking Phenomics competence network University of Rostock. Education PhD researchers in agricultural science. SAS. g. Norway Vit Verden.

Breeding value of the above-mentioned breeds in 44 characteristics Database systems: Oracle, MySQL. SAS.. ASReml. Vienna State Office for GeoInformation and Land Development in Kornwestheim Bavarian State Board of Trustees for Animal Processing Producers Education No information Staff 6 researchers. respectively) at the University of Rostock (collaboration agreement) Bavarian State Research Center for Agriculture in Grub Involved institutions Bavarian State Research Center for Agriculture in Grub Specialisation Genomic evaluation of the breeding potential of cattle and pigs Genome-wide association studies involving cattle and pigs Networking Technical University of Munich Christian Albrechts University in Kiel University of Hohenheim ZuchtData GmbH. MiX99. Staff Working group Biomathematics and Bioinformatics with four researchers Working group Animal Breeding and Genetics with four researchers Junior working group Integrative Bioinformatics for Cattle (2 DM positions for five years) Post-doctoral position for ontology development (1 DM position for five years) Software Simulation of genotype distributions and phenotypes typical for farm animals Algorithm development for integrative bioinformatics for farm animals Ontology development with emphasis on animal behaviour Databases Project database for the phenomics competence network Project database for integrative bioinformatics for cattle Computer infrastructure FBN currently has five computer servers with a total of 124 nodes.. Beagle. Fleckvieh and Braunvieh in Germany. findhap V2. 15. in part Illumina 777K Bead-Chip (approx. approx.000 Fleckvieh genotypes. 2 programmers Software Software used: R. Software developed in-house: under Fortran and Perl Databases Genotypes: All genotypes of the cattle breeds. mainly Illumina 54K Bead-Chip. 5.000 pigs of the breeds German Landrace (Deutsche Landrasse) and German Large White (Deutsches Edelschwein). In addition.
it is possible to use external computer capacities (2 clusters with 30 and 10 nodes. DMU. approx. 1.000 Braunvieh genotypes.500). Illumina 60K Bead-Chip Phenotypes: Performance data for the above-mentioned breeds in 44 characteristics since 1990. Pedigree data for the above-mentioned breeds since 1950. Genotypes of 2.

Education In the area of teaching. A. and 64GB of RAM under AIX5L c) Microbiology and biotechnology The bioinformatics technology platform at the Center for Biotechnology (CeBiTec) at the University of Bielefeld Involved institutions The “Bioinformatics Resource Facility” (BRF) – a research-oriented service and development institution resulting from the DFG’s bioinformatics initiative (2001) – administers the computers used by CeBiTec units. Select projects: GenoMik. Staff In addition to the director position. which are all funded by the University. through the integration and new development of database applications for efficient storage of generated primary data. ranges from the analysis of microbial genomes and metagenomes. Currently. GABI-Kat. 700 external users are registered. CLIB Graduate Cluster. and it supports various large-scale projects in genome research. NuGGET. Goesmann – Computational Genomics). to animal cell cultures (e. a BRF coordinating committee was formed for planning and shaping the enhancement of the CeBiTec computer infrastructure. Stoye – Genome Informatics. Marine Genomics Europe. in particular. bioinformatics education at the University of Bielefeld consists of bachelor’s and master’s courses of study in “Bioinformatics and Genome Research” and “Natural Sciences Informatics”. with approximately 55% of them coming from Germany. Hofestädt – Bioinformatics & Medical Informatics) and other working groups (T. Networking In addition to the deployment of developed systems in various genome and post-genome projects at CeBiTec. 3 student assistants are continually deployed for routine activities. AnnoBeet. SysMap. 1 Linux workstation with Oracle-DB. Specialisation The spectrum of research projects. Computational Metagenomics & Single Cell Genomics. numerous external partners also make use of the Bielefeld infrastructure under national and international collaborations. and plants (wall cress. R. R.
1 IBM 550Q with 2 Power5 quad-core processors. 19 or 64 GB of RAM per computer. g. GenoMik-Transfer. Chinese hamster ovary cells). the BRF has 6 permanent employee positions for system administration. Giegerich – Practical Informatics. Sczyrba . Dual-Xeon 4- and 6-core 16. most of which are focused on biotechnology. PathoGenoMik. algae. Nattkemper – Biodata Mining & Neuroinformatics. Computer infrastructure Windows workstation computers (standard 4 GB of RAM). 5 Linux workstations under Debian. yeasts). grapevine. Baake – Biomathematics & Theoretical Bioinformatics. with this being funded from CeBiTec’s budgeted resources. to the processing of fungi (in particular. Grain Legumes. For the purposes of joint representation of the interests of the various CeBiTec working groups. as well as through the implementation of software for analysing larger amounts of genome-based data. rapeseed). as well as the master’s course of study “Genome-Based Systems Biology”. In addition. A.

RAMEDIS. South America. D.. QAlign. REGANOR. For GPU-based approaches. CPA. BIOIMAX. Software In addition to administration and enhancement of the technical infrastructure. Databases DAWIS-M. the Bielefeld Bioinformatics Server (BiBiServ) provides further bioinformatics applications for anonymous users. EDGAR SARUMAN.. a Convey HC-1ex system with a full complement of RAM is also available. more than 350 bioinformatics workstations are provided today with minimal administrative effort. This equipment corresponds to an investment of more than EUR 6 million over the past 10 years from DFG. etc. which have been furnished with up to 96 CPU cores and a maximum of 1024GB of RAM. PathAligner. BACCardI. For special bioinformatics applications with. BRF deploys three tape systems with a possible final capacity of 16PB. In particular. STCDB. PathFinder. BioDWH. GISMO. BMBF. r2cat. RNAshapes. Through the use of energy-saving and inexpensive terminals. g. Another important component of the Bielefeld bioinformatics infrastructure is the virtual work environment based on Sun Ray thin clients. For instance. The emphasis is on DNA sequencing analysis and genome annotation (SAMS. etc. RNAcast. it offers an additional possibility for providing external users with newly developed tools on an established platform. Computer infrastructure CeBiTec’s hardware park today comprises a computing output of approximately 25 TeraFLOPS (796 CPUs. This has enabled BRF to operate virtually without interruption and in an extremely stable manner for 12 years. Genalyzer. and EU projects. for diverse DNA sequencing analyses. 4. This is used. e. such as read mapping with the software SARUMAN developed in Bielefeld. the suitability of this type of virtual workstation for worldwide use was proved through successful operation of the terminal in WAN in. inter alia. These systems are connected with one another through an integration layer called “BRIDGE”.
each with two NVIDIA Tesla M2070-GPU cards. RNAhybrid. 4 servers with a total of 12 TimeLogic DeCypher-FPGA cards are deployed to accelerate BLAST analyses. Genlight. A computer cluster is available for processing primary data and for additional high-throughput analysis. In addition to the above-described Web-based programme packages and the virtual work environment. Conveyor). For the independent development of FPGA-based algorithms. i. Coryne-RegNet. high-throughput analysis in the area of transcriptomics (EMMA). VANESA. GenDB. inter alia.4PB. With its Web services-based pro- gramming interfaces. For the purposes of long-term archiv- ing of raw data and daily data security. Also worthy of mention here are special computers that have been tailored specially to the specific needs of bioinformat- ics. and metabolomics (MeltDB). proteomics (QuPE). CardioVINEdb. including the reconstruction of meta- bolic pathways (CARMEN). PASSTA. and Asia. the BRF works in the field of applied bioinformatics by actively developing software solutions for high-throughput analyses in the field of genome and postgenome research. BIIGLE. were procured. Gecko. three IBM iDataPlex servers. Europe. Myco-Reg- Net. as well as from special grants and the University’s own budgeted funds. as well as general data management and visualisation (ProMeTra). . e. such as the annotation of genomes or metagenome analyses. very high memory needs. TACOA. Other software tools used at the Institute for Bioinformatics at CeBiTec: CARMA. various application servers are available. an online storage capacity of 433TB and a gross back-up capacity of approximately 1.

International Tomato Annotation Group. University of Toronto: BAR Viewer. bioinformatics is now slated to be included in biology. B-IT Center. Max Planck Institutes in Golm and Tübingen. International Medicago Genome Annotation Group. molevol. Wageningen University. University of Düsseldorf: Metagenomics. de/~bioinf/). phenotype and mapping database. An additional professorship for statistical genetics is planned. Perth Plant Energy Biology Center SUBA. International Arabidopsis Informatics Consortium. International Plant Phenotyping Network. Bonn. INRA France. inter alia. Léon. University of Bielefeld. d) Plants and microbiology Bioinformatics in the ABCD / J Region Involved institutions Technical University (RWTH) of Aachen University of Bonn University of Düsseldorf Jülich Research Centre Max Planck Institute for Plant Breeding Research in Cologne Specialisation RWTH Aachen: RWTH Aachen has bolstered itself in the field of white systems biology (Prof. and the medical department has called for the formation of a bioinformatics group. University of Bonn: MSc Life Science Informatics. metabolic networks Networking Existing collaborations with. Prof. Tomato/potato trait. bioinformatics modules in the biotechnology course of study. IPK. Fraunhofer Institute (Aachen Fraunhofer FIT: Life Science Informatics. University of Düsseldorf: An extensive bioinformatics curriculum is available for biologists (http://www. Jülich Research Centre: European Plant Phenotyping Network (EPPN). bioinformatics in MSc Crop Science. Ewert heads the modelling work for the Agricultural Faculty at the University of Bonn (especially yield modelling). Max Planck Institute for Plant Breeding Research in Cologne: Expertise in the field of plant breeding and genetics. iPlant. University of Bonn: With the Institute for Crop Sciences and Protection of Resources and Prof. PlantsDB. Blank). at IBG1 (Prof. In addition. Wiechert).
the University of Bonn has expertise in the field of breeding potential. there is great expertise in white biotechnology and also in modelling of networks and biotechnological processes. Bonn: Fraunhofer SCAI-Bioinformatics) Education RWTH Aachen: To date. MIPS Munich.

At IBG1: a modelling department (biochemical networks and biotechnology processes) At IBG2: a modelling group (structure-function models root and sprout) Associated with BioSC: The Max Planck Institute for Plant Breeding Research in Cologne with three groups with direct access to green bioinformatics (Dr. Koornneef). DB (correlation databases) University of Cologne: Aramemnon database (group of Prof. function predictions http://afawe. coupling of phenotype-genotype databases Computer infrastructure Jülich Research Centre: Supercomputing center. AHRD. 13CFLUX (substance flow analysis). capacities at the Jülich Plant Phenotyping Center (JPPC) are to be expanded further. permanent University of Düsseldorf: Two bioinformatics chairs within the Computer Sciences course of study Jülich Research Centre: For the planned successor to the Gabi primary database: follow-up funding of 2 FTE through the research centre in connection with the appointment commitment to Prof. phenotype databases. Software RWTH Aachen: Mercator (MapMan annotation). Flügge) University of Bonn: AFAWE. In connection with the DPPN (German Plant Phenotyping Network): access to the IT structure at IBG2 (Prof. CSB. Jimenez-Gomez (adaptive genomics and genetics). Stich (quantitative crop genetics) as well as with Prof. Dr. R packages for ChIP-chip/ChIP-Seq (ChipR). Usadel at FZJ will likewise comprise additional staff from basic funding. and Dr. The working group of Prof. mpipz. Corto University of Bonn: Function annotation: PhyloFun. Robin (microarray analysis). system administrator. aggregators/workflow tools for Web services University of Düsseldorf: PhyloPythiaS Jülich Research Centre: OMIX (network editor). a position as research associate/professor is to be filled University of Bonn: Junior researcher group leader as research associate. CADET (chromatography) Databases RWTH Aachen: MapMan (functional classes). Schurr). mpg. de Jülich Research Centre: Gabi primary database successor. Usadel. Schneeberger (NGS mapping). Staff RWTH Aachen: Beginning in mid-2012. PageMan (visualisation of OMICS data). MapMan. R-Robin (RNA seq analyses).
as well as clusters at IBG1 and IBG2 . R-Robin (RNA seq analyses). PageMan (visualisation of OMICS data). MapMan. Schnee- berger (NGS mapping).Attachment Staff RWTH Aachen: Beginning in mid-2012.

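AWI / MARUM: Long-term archiving and publication of biological environment data. as well as at the ecosystem level. OAI-PMH. In addition. EU projects EuroFleets (Towards an alliance of European research fleets). Using various metadata standards and protocols (OGC-CS. as central network architect.04.009). In addition. It also coordinates the EU project Micro B3 (Biodiversity. AWI: Current bioinformatics applications at AWI include the modelling of ecological niches for diatoms (project of the Hustedt diatom collection). Azadinium spinosum). on the one hand. and ontologies. In addition to sequencing analysis and classification (binning). the Max Planck Institute for Marine Microbiology/Jacobs University is involved in a number of current research projects: BMBF project MIMAS (Microbial Interactions in MArine Systems). terrigenous carbon in the sea). e) Environmental microbiology and biodiversity research Bremen infrastructure for environmental microbiology and biodiversity research Involved institutions Max Planck Institute for Marine Microbiology in Bremen Jacobs University in Bremen Center for Marine Environmental Sciences (MARUM) at the University of Bremen Alfred Wegener Institute for Polar and Marine Research (AWI) in Bremerhaven Specialisation Max Planck Institute for Marine Microbiology/Jacobs University: The bioinformatics focus of the Max Planck Institute for Marine Microbiology/Jacobs University is microbial diversity and genome research. with PANGAEA® acting. the Max Planck Institute for Marine Microbiology/Jacobs University plays an active role in the development of metadata standards. to the ELIXIR project (European Life Sciences Infrastructure for Biological Information). in particular. the Max Planck Institute for Marine Microbiology/Jacobs University is involved in various national and European infrastructure projects: DFG project CIBAS (Center for integrative Biodiversity Analysis and Synthesis). Biotechnology). In addition. PANGAEA has played an active role in recent years in developing geodata infrastructures and relevant standards. close contacts have been created to the European Bioinformatics Institute (EBI) and. exchange formats. Bioinformatics. as “data and metadata distributor” and. ABCD). PANGAEA played a critical role in the development of the citability of data and the creation of DataCite. and the breakdown of toxic pathways in dinoflagellates (shellfish poisoning. biocon. EMBRC (European Marine Biological Resource Centres). Data are typically made available via central portal services. the recording of time variations of microplankton algae communities in the face of global changes. SAW-Leibniz project ATKIM (degradability of arctic. portal operator. 1016/j. on the other. and MIRRI (Microbial Resource Research Infrastructure). Since 2009 services have been provided that enable dynamic cross-referencing of data and article. DiGIR. org/10. g. the emphasis is particularly on the development and operation of reference databases (SILVA project) and the integration of diversity and function data with environmental parameters (Megx project). transcriptomics studies of adaption/acclimation in higher organisms. In addition. including from Science Direct (e. EuroMarine (Integration of European Marine Research Networks of Excellence). EU projects MAMBA (Marine Metagenomics for New Biotechnological Applications). doi. 2010. and BioVeL (Biodiversity Virtual e-Laboratory). various portals and search engines are provided with content from PANGAEA®. and broker between various e-infrastructures. g. e.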
as well as molecular characterisation of ecosystems in sea ice (MacSeaIce project) and biotechnology applications of cold-adapted in-situ oil-degrading marine bacteria. Networking Max Planck Institute for Marine Microbiology/Jacobs University: With its bioinformatics expertise. in order to improve the exchange of data and the interoperability of data and databases.: http://dx.
Also deserving of mention: Participation in genome sequencing projects for key organisms.: http://dx. In addition. and broker between various e-infrastructures. g. e.

Dublin Core. EMODNET Bio and Tara-Oceans. PANGAEA employs the data warehouse software from Sybase (IQ). The ARB and SILVA database project was established more than 20 years ago in order to meet this challenge. CoralFish (CP). In order to be able to analyse this flood of data. micro-satellite marker design (STAMP) Databases Max Planck Institute for Marine Microbiology/Jacobs University: SILVA: The European database for ribosomal RNA sequences (www. Education Bachelor’s course of study in applied computational mathematics with specialisation in bioinformatics at Jacobs University. ribosomal RNA has become the gold standard. as well as through programme research (e. Crispus. de/projects). harmful algal blooms. ESONET/EMSO. through research collaborations (e. on a national level. 1 group leader AWI / MARUM: 5 postdocs. 1 team assistant. Collaboration by students on projects in connection with internships. and student assistants. genome sequencing consortiums (Micromonas. and others) AWI: In recent years. standardisation (MetaBar. P. roughly 2. it was actively involved in more than 140 national. classification. HYPOX (CP). AWI: Comparative genomics (Phylogena). S. g. CDinFusion). DIF. ). INTERDYNAMIK and SOPRAN – for a complete list. C3-GRID). HYPOX. EUR-OCEANS and EUROMARIN. it can support any number of metadata standards (ISO19xxx. g. Glaciecola). latissima. ARB and SILVA are internationally recognised tools for processing. comparative metagenomics (MGMCMC). online tutorials. PANGAEA maintains broad collaborations with scientific publishers (Elsevier. and in the past 15 years. sea ice meta-transcriptome). cylindrus. data integration (Megx. curating. Hyas. Springer. as well as. European. TaxSOM. 7 million sequences as of January 2012). pseudonana. and international projects (currently IODP (NSF). Darwin Core etc. Ch. de) Microbial biodiversity research is principally based on the analysis of marker genes. EPOCA. and because of its modularity. Th. arb-silva. and for years the number of publicly available rDNA sequences has been growing exponentially.
doubling every 12–18 months (currently. specialised reference databases and software tools are of critical importance. and for years the number of publicly available rDNA sequences has been growing exponentially. Staff Max Planck Institute for Marine Microbiology/Jacobs University: 9 postdocs. Wiley. Periodic bioinformatics workshops and on-site training of users. E. 1 group leader AWI Computing Centre/Bio/Bioinformatics: 4 Postdocs Software Diversity and phylogeny (ARB/SILVA). 9 PhD candidates. 2 master’s students. coastline research. which is primarily used as a preliminary step in compiling data products. and transcriptome sequencing projects (Krill. and a master’s course of study in marine microbiology at the International Max Planck Research School. BIOACID. siliculosus. annotation (JCoast). E. 2 technicians. . see www. In addition. and analysing rDNA sequences in biodiversity research and for industrial quality control and medical diagnostics. net) AWI / MARUM: In recent years. TaxoMeter). In addition. EPOCA (CP). IODP. F. guided research modules. pangaea. The software is used for various projects (inter alia. huxleyi. binning (TETRA. brachycara. In this regard. CARBOCHANGE. microalgae communities (Pyloassigner). 2 technicians. inter alia. 3 data managers. ESONET (NoE). primarily next-generation sequencing and microarrays). EUROBASIN (IP). Bioinformatics course and internship at the University of Bremen. ecological chemistry) and junior researcher groups (PLANKTOSENS). AGU. the PANGAEA® group has developed open-source software (Schindler & Diepenbroek 2008) for building portals and connecting data providers. AWI has developed and expanded an extensive research and applications profile in modern OMICS methods (today. Marine Genomics Europe Network of Excellence). EMSO (CP). Networking AWI / MARUM: PANGAEA is an accredited world data centre in both the ICSU World Data System (WDS) and the WMO Information System (WIS).
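The growth figures quoted for public rDNA sequences (a doubling every 12 to 18 months, from roughly 2.7 million sequences as of January 2012) can be turned into a rough projection. The following is a minimal sketch in Python, assuming idealised exponential growth with a constant doubling time; the function name and the 36-month horizon are illustrative assumptions and are not taken from the ARB or SILVA projects.

```python
def projected_sequences(start_count: float, doubling_months: float,
                        elapsed_months: float) -> float:
    """Idealised exponential growth: the count doubles every `doubling_months`."""
    return start_count * 2 ** (elapsed_months / doubling_months)


# Roughly 2.7 million sequences as of January 2012 (figure from the text).
START_COUNT = 2.7e6
# Illustrative horizon chosen for this sketch (an assumption, not from the text).
HORIZON_MONTHS = 36

for doubling in (12, 18):  # doubling times quoted in the text, in months
    projected = projected_sequences(START_COUNT, doubling, HORIZON_MONTHS)
    print(f"doubling every {doubling} months: "
          f"about {projected / 1e6:.1f} million sequences after {HORIZON_MONTHS} months")
```

Under these assumptions, the number of sequences would grow four- to eightfold within three years, which is the scale of growth that reference databases and analysis tools would need to absorb.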

de) The broad spectrum of WDC-MARE databases. such as EMBL-EBI/ENA and PANGAEA. archive storage for sequence analysis. 1 SMP nodes 16-core Opteron. metagenomic annotations. The focus is on geo-referenceable data from the fields of oceanography. genome annotation. receiving the majority of its funding from project data management and the development of geodata infrastructures. net): Megx. genome assembly. In addition. paleoceanography. marine geology. The GSC also oversees the development of ontologies. and environmental sciences. The Max Planck Institute for Marine Microbiology/Jacobs University is in charge of the GSC and administers the central databases for the standards and specifications issued to date and a GSC reference implementation in XML. 5 billion data points on approximately 40. the Consortium was recently able to publish the MIMARKS (Minimum Information about a MArker gene Sequence) standard and the MIxS (Minimum Information about any (x) Sequence) specifications. 160-core Intel E7-883. The operational platform is the information system PANGAEA. which are distributed across the entire gamut of geo-. net was developed in 2005 as the first integrated database in the field of environmental microbiology. The close networking of Megx. metagenomic Markov chain Monte Carlo Bayesian statistics. 24-core dual-Opteron cluster. in combination with intuitive visualisation of results. 32GB RAM. the Genomic Standards Consortium (GSC). ocean modelling. ocean/sea ice/paleoclimate models. This resulted first in the MIGS (Minimum Information about a Genome Sequence) and the MIMS (Minimum Information about a Metagenome Sequence) standards for genome and metagenome information. AWI: PLANKTONNET biodiversity platform. bio- and environmental sciences. and it permits concentrated access to microbial genome information and biodiversity in the context of the environment. permanently monitored). annotation. 3 TFlop/s. 000 different parameters from all of the world’s seas and continents. In this regard.
assembly/ mapping 454-ILLUMINA genomics/transcriptomics. The system currently contains approximately 450. phylogenetic placement of 454 sequencing data. large-scale niche modelling. provides users with a dynamic look at biodiversity and function in the context of the environment. and transcriptomics. global environmental parameters are generated on the fly from oceanographic data sources. took it upon itself to draft guidelines for a compact yet representative number of desirable additional data for sequencing information. aid in the research of global environmental changes. Sun (Oracle) Secure Global Desktop (Web-based) and Sun-Ray thin clients for distributed work on virtual workstations. data assimilation. 56TB GFS file system. Phylogenie.net (www. through its Environment Ontology. After several more years of development. 2. 1 SGI UV100 20-blade. AWI: 12-node vector computer NEC SX8R. Datenbanken und Services.gensc. 2PB archive storage SL8500 (LTO/3). g.56TB RAM. 3. high-throughput phylogenetic placement of 454 sequencing data. Hustedt Diatom Research Centre (collec- tion data) Standardisation and MPI-Bremen / Jacobs University: Genomic Standards Consortium (www. phylogeny. transcrip- tomics annotation and mapping. 96TB file system with InfiniteStorage. AWI / MARUM: PANGAEA® – Publisher for Earth & Environmental Science (ICSU World Data Center) (www. for habitat classification.000 data sets with more than 6.pangaea. Use of services: Internationally via Web pages and Web services.megx. Collaboration with companies via Bremen-based Ribocon GmbH (spun off in 2005 by the Max Planck Institute). e.net with public sequencing and environmental data repositories.org) ontologies Founded in 2005 in Oxford. Webserver. netapp scalable storage systems. Galaxy workflows are used at AWI. and marine biology. Computer infrastructure Max Planck Institute for Marine Microbiology: 500 Cores als Cluster mit 60Tb Stor- age (ausfallsicher. composed of international researchers. Megx.

Glossary

BILS: Bioinformatics Infrastructure for Life Sciences. Decentralised national research infrastructure for bioinformatics in Sweden, which is supported by the Swedish Research Council.
Biocatalysts: Biocatalysts are polymer biomolecules that accelerate biochemical reactions in organisms by lowering or (less frequently) raising the activation energy in reactions.
Biocuration: Comprises the translation and integration of biological data in a database, enabling the data to be linked with scientific literature and other data sets.
Biodiversity: Concept that describes the diversity of life on the three levels of ecosystems, species, and genes. A fourth level is considered to be the diversity of interrelationships within and between the other three levels, which is termed functional biodiversity.
Cloud computing: Cloud computing describes the approach of making abstracted IT infrastructures (e.g. computing capacity, data storage, network capacities, or even finalised software) available via a network in a manner that is dynamically adapted to needs.
Computer cluster: A number of networked computers. The objective of “clustering” is usually to increase computing capacity or availability as compared with individual computers.
Cropsense: Network for complex sensor technology for crop research, breeding, and data management.
Data integration: Bringing together of data from a variety of different sources.
ELIXIR: European Life Sciences Infrastructure for Biological Information, a pan-European initiative to develop a permanent European bioinformatics infrastructure.
FUGATO: Research programme on functional genome analysis in animal organisms sponsored by the German Federal Ministry of Education and Research (BMBF).
GABI/Plant Biotechnology of the Future: Research programme in the field of future-oriented plant biotechnology sponsored by the German Federal Ministry of Education and Research (BMBF) and private companies.
Genome: The entirety of an organism’s genetic information.
Genomics: Field of research that looks at organisms at the level of their genome data.
GenoMik: Research and sponsorship initiative “Genome Research on Microorganisms – GenoMik” launched in 2001 by the German Federal Ministry of Education and Research (BMBF) in order to create the structural and substantive conditions for the use of the potential of microorganisms by way of global, genome-based research approaches.
High-throughput precision phenotyping: Automated methods by which a large number of phenotypings are performed with high throughput.

Knowledge bases: Special databases for knowledge management.
Metabolome: The entirety of an organism’s metabolites.
Metabolomics: Field of research that looks at organisms at the level of their metabolites.
Metadata: Data that contain information about other data.
Model-based data analysis: Statistical data analysis using models that are tailored to the respective problem and that seek to identify possible mechanisms of the underlying processes.
NBIC: Netherlands Bioinformatics Centre, a Dutch bioinformatics network with expertise in the areas of research, teaching, and support.
Next-generation sequencing: New methods for DNA sequencing, which make increased throughput possible.
OMICS technologies: All-encompassing description of technologies used to analyse the entirety of an organism’s particular system level, e.g. all genes (genomics), all transcripts (transcriptomics), all proteins (proteomics), or all metabolites (metabolomics).
Ontology: Formally structured, linguistic depictions of a set of terms and the relationships between them in a given subject matter. They are used to exchange “knowledge” in digital and formal form between application software and services.
PathoGenoMik: Guideline of the German Federal Ministry of Education and Research for the funding of research projects within the ERA-NET PathoGenoMics “Transnational Pathogenomics: Prevention, Diagnosis, Treatment, and Monitoring of Human Infectious Diseases” as part of the framework programme “Biotechnology – Using and Shaping Opportunities”.
Phenomics: Competence network for agricultural and nutrition research sponsored by the German Federal Ministry of Education and Research. It represents a systems-biological approach to the genotype-phenotype depiction of the farm animals cattle and pigs.
Phenotyping: Quantitative analysis of key functions and structures of organisms and biological systems and the underlying physiological, molecular, and genetic mechanisms.
Postgenomic data: Biological data that analyse cellular activities in their entirety and thus go beyond the purely genetic level of data collection.
Primary data: Sequencing data of DNA, RNA, and protein molecules.
Proteome: The entirety of all proteins expressed in an organism at a certain time.
Proteomics: Field of research that looks at organisms at the level of their proteins.

ReNaBi: Réseau National des plates-formes Bioinformatiques, a French bioinformatics network structure.
SIB: Swiss Institute of Bioinformatics, a federation of bioinformatics research groups of leading Swiss universities and the Swiss Federal Institute of Technology.
Standard operating procedures: Procedures describing what happens during a process.
Supercomputer: The fastest computer of its time. A typical feature of a modern supercomputer is its large number of processors, which can access shared periphery equipment and a partially shared main memory. Supercomputers are often employed for computer simulations in the area of high-performance calculations.
Synbreed: Competence network sponsored by the German Federal Ministry of Education and Research for establishing an interdisciplinary centre for genome-based breeding research involving crops and farm animals. Group of researchers from plant and animal breeding, molecular biology, bioinformatics, and human medicine, together with the involvement of university, institutional and industrial collaboration partners.
Synthetic Biology: Field bordering on molecular biology, organic chemistry, engineering, nano-biotechnology, and information technology, with the objective of constructing biological systems and microorganisms with the aid of standardised building blocks.
Systems biology: Bioscience whose objective is understanding the complex and dynamic biological processes of cells and organisms in their entirety.
Transcriptome: The entirety of all transcripts expressed in an organism at a certain time.
Transcriptomics: Field of research that looks at organisms at the level of their transcripts.

Members of the “Bioinformatics Workshop” Steering Committee

Prof. Dr. Alfred Pühler (Chairman), CeBiTec / University of Bielefeld
Prof. Dr. Frank Oliver Glöckner, Max Planck Institute for Marine Microbiology/Jacobs University in Bremen
Dr. Alexander Goesmann, CeBiTec / University of Bielefeld
Dr. Thomas Hartsch, GeneData AG
Dr. Eric von Lieres, Jülich Research Centre
Dr. Klaus Mayer, Munich Information Centre for Protein Sequences (MIPS) / Helmholtz Centre in Munich
Prof. Dr. Norbert Reinsch, Leibniz Institute for Farm Animal Biology (FBN) in Dummerstorf
Prof. Dr. Chris-Carolin Schön, Technical University of Munich
Prof. Dr. Wolfgang Wiechert, Jülich Research Centre
Prof. Dr. Ralf Zimmer, Ludwig Maximilians University in Munich

Professor of Soil Protection and Recultivation at the Dr. Mettenleiter Prof. h. Thomas Hirth Director of the Water Research Institute at the Head of the Fraunhofer Institute for Interfacial University of Alberta in Edmonton. Middle East and Africa. Dr. Folkhard Isermeyer Scientific Executive Director of the German President of the Johann Heinrich von Thünen In- Research Centre for Geosciences at the Helmholz stitute. Dr. Hüttl (chairman) Prof. Dr. Dr. Dr. Dr. and Fisheries in Braunschweig Academy of Science and Engineering (acatech). Dr.c. Federal Research Institute for Rural Areas. Dr. Dr. Manfred Schwerin Chairman of the Executive Board of the Jülich Professor of Animal Breeding at the University Research Centre of Rostock.c. Andreas J. Thomas C.c. member of the Senate of the National Academy of Science and Engineering (acatech) Prof.c. Büchting the German Biotechnology Industry Association (deputy chairman) (Deutsche Industrievereinigung Biotechnologie) Chairman of the Supervisory Board of (DIB) KWS SAAT AG Prof. (deputy chairman) Federal Research Institute for Animal Health. Dr. Switzerland Prof. Helmut Born Secretary General of the Deutscher Bauern­ Prof. Hannelore Daniel Technical University of Munich. Stefan Marcinowski Brandenburg University of Technology in Cottbus Member of the Board of Executive Directors of BASF SE. on Professor of Molecular Biology. Centre in Potsdam. Dr. One Equity Partners Europe. Achim Bachem Prof. Dr. Alfred Pühler (ZEF) at the University of Bonn Centre for Biotechnology / University of Bielefeld Prof. Christian Patermann Advisor on knowledge-based bio-economics to Prof. Max Planck Riems Island Institute of Molecular Plant Physiology and the University of Potsdam Dr. Reinhard F. (German Farmers Association) R&D Director for Europe. . Alexander Zehnder (permanent guest) Prof. Dr. h. Chairman of the Leibniz Institute for Farm Animal Biology (FBN) in Dummerstorf Dr. Horgen. Dr. 
Canada Engineering and Biotechnology and the Institute for Interfacial Engineering at the University of Stuttgart. Wiltrud Treffenfeldt verband e. Dow Europe. Utz-Hellmuth Felcht Dr. Prof. Chairman of the Management Board of Dr. Dr. mult. h. Dr.Members of the Bio-economy Research and Technology Council Prof. Dr.c. Dr. Joachim von Braun the State of North Rhine-Westphalia (deputy chairman) Director at the Center for Development Research Prof. Holger Zinke Managing Director. h. V. Bernd Müller-Röber President of the Friedrich Loeffler Institute. Chairman of BRAIN AG Munich. Fritz Vahrenholt Chair for Nutrition Physiology CEO of RWE Innogy GmbH Prof. h. President of the National Forestry. Dr.

The BioEconomyCouncil would like to thank the German Federal Ministry of Education and Research for its funding, as well as the National Academy of Science and Engineering (acatech) for administrative support. Special thanks are owed to the outside experts who provided valuable information for this paper. The BioEconomyCouncil is solely responsible for the content of the recommendations.

The BioEconomyCouncil’s work is supported by an administrative office:
Dr. Claus Gerhard Bannick (Head)
Dr. Andrea George (academic research assistant)
Dr. Katja Leicht (academic research assistant)
Petra Ortiz Arrebato (assistant)
Ulrike von Schlippenbach (academic research assistant)
Dr. Elke Witt (academic research assistant)
Dr. Eva Wendt (academic research assistant)
Julian Braun, Martin Schmidt (student research assistants)

PUBLICATION DETAILS
Publisher: Published by the BioEconomy Research and Technology Council (BÖR) © BÖR, Berlin (2012)
Registered address: Charlottenstraße 35 – 36, 10117 Berlin
Design and layout by Oswald + Martin Werbeagentur, Berlin
Printed by Brandenburgische Universitätsdruckerei
ISSN 1869-1404, ISBN 978-3-942044-66-0 (print edition), ISBN 978-3-942044-67-7 (online version)
The German National Library lists this publication in the National Bibliography. Detailed bibliographic data can be found at http://dnb.d-nb.de.

Publisher: Forschungs- und Technologierat Bioökonomie (BÖR) © BÖR, Berlin (2012)
Contact: Geschäftsstelle des BioÖkonomieRats, Charlottenstraße 35 – 36, 10117 Berlin
Tel.: 030 767718911
Fax: 030 767718912
E-Mail: info@biooekonomierat.de
Internet: www.biooekonomierat.de