Leonid Kalinichenko · Yannis Manolopoulos
Oleg Malkov · Nikolay Skvortsov
Sergey Stupnikov · Vladimir Sukhomlin (Eds.)
Data Analytics
and Management
in Data Intensive Domains
XIX International Conference, DAMDID/RCDL 2017
Moscow, Russia, October 10–13, 2017
Revised Selected Papers
Communications
in Computer and Information Science 822
Commenced Publication in 2007
Founding and Former Series Editors:
Phoebe Chen, Alfredo Cuzzocrea, Xiaoyong Du, Orhun Kara, Ting Liu,
Dominik Ślęzak, and Xiaokang Yang
Editorial Board
Simone Diniz Junqueira Barbosa
Pontifical Catholic University of Rio de Janeiro (PUC-Rio),
Rio de Janeiro, Brazil
Joaquim Filipe
Polytechnic Institute of Setúbal, Setúbal, Portugal
Igor Kotenko
St. Petersburg Institute for Informatics and Automation of the Russian
Academy of Sciences, St. Petersburg, Russia
Krishna M. Sivalingam
Indian Institute of Technology Madras, Chennai, India
Takashi Washio
Osaka University, Osaka, Japan
Junsong Yuan
University at Buffalo, The State University of New York, Buffalo, USA
Lizhu Zhou
Tsinghua University, Beijing, China
More information about this series at http://www.springer.com/series/7899
Editors
Leonid Kalinichenko, Federal Research Center “Computer Science and Control”, Russian Academy of Sciences, Moscow, Russia
Yannis Manolopoulos, Open University of Cyprus, Latsia, Cyprus
Oleg Malkov, Institute of Astronomy, Russian Academy of Sciences, Moscow, Russia
Nikolay Skvortsov, Federal Research Center “Computer Science and Control”, Russian Academy of Sciences, Moscow, Russia
Sergey Stupnikov, Federal Research Center “Computer Science and Control”, Russian Academy of Sciences, Moscow, Russia
Vladimir Sukhomlin, Moscow State University, Moscow, Russia
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This CCIS volume published by Springer contains the proceedings of the XIX International Conference on Data Analytics and Management in Data-Intensive Domains (DAMDID/RCDL 2017), which took place during October 9–13 at the Department of Computational Mathematics and Cybernetics of Lomonosov Moscow State University. The DAMDID series of conferences was planned as a multidisciplinary forum for researchers and practitioners from various domains of science and research, promoting cooperation and the exchange of ideas in the area of data analysis and management in domains driven by data-intensive research. The conference content draws on approaches to data analysis and management developed in specific data-intensive domains (DID) of X-informatics (where X = astro, bio, chemo, geo, med, neuro, physics, chemistry, material science, etc.) and the social sciences, as well as in various branches of informatics, industry, new technologies, finance, and business.
Traditionally, DAMDID/RCDL proceedings are published locally before the conference as a collection of the full texts of all regular and short papers accepted by the Program Committee, as well as abstracts of posters and demos. Soon after the conference, the texts of the regular papers presented there are submitted for online publication in a volume of the European repository of CEUR Workshop Proceedings, and the volume content is indexed in DBLP and Scopus. Since 2016, a DAMDID/RCDL volume of post-conference proceedings, containing up to one third of the submitted papers previously published in the CEUR Workshop Proceedings, has been published by Springer in the Communications in Computer and Information Science (CCIS) series. Each paper selected for the CCIS post-conference volume is modified as follows: the paper receives a new title; it is significantly extended (with at least 30% new content); and it refers to its original version in the CEUR Workshop Proceedings. CCIS is abstracted/indexed in DBLP, Google Scholar, EI-Compendex, Mathematical Reviews, SCImago, and Scopus.
The program of DAMDID/RCDL 2017, as with the previous editions of the conference, reflects, alongside the traditional data management topics, a rapid move toward data science and data-intensive analytics. This year's program included carefully selected invited keynote talks related to rapidly developing DID. The respective plenary sessions were also aimed at attracting the attention of researchers in the selected DID. A preconference plenary session on October 9 included two talks: the keynote talk by Stefano Ceri, Professor of Database Systems at the Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB) of Politecnico di Milano, and the invited talk by Zoltan Szallasi, MD, senior research scientist at the Children's Hospital Informatics Program, Harvard Medical School. The session was devoted to the development of methods and techniques for genomics and diagnostics in various application domains (from health care to criminalistics). Stefano Ceri considered the implementation issues of new-generation DNA sequencing techniques in the European project GeCo, which applies big data technologies; the talk by Zoltan Szallasi gave an overview of approaches to genomic-based diagnostics in various application domains. In more detail, the tutorial given by Zoltan Szallasi on October 10 presented the application of genomic diagnostics in cancer immunotherapy. The problems of the data deluge in astronomy and approaches to their solution were considered in the keynote talk by Giuseppe Longo (Professor of Astrophysics at the University of Naples Federico II). On the basis of their talks, Zoltan Szallasi, Stefano Ceri with co-authors, and Giuseppe Longo with co-authors provided invited full papers for this CCIS volume.
The conference Program Committee reviewed 75 submissions for the conference and eight submissions for the PhD workshop. For the workshop, five papers were accepted and three were rejected. For the conference, 47 submissions were accepted as full papers, 12 as short papers, two as posters, and two as demos, whereas 12 submissions were rejected. According to the conference program, these 59 oral presentations (of the full and short papers) were structured into 19 sessions, including: Data Analysis Projects in Astronomy; Semantic Web Techniques in DID; Special Purpose DID Infrastructures (two sessions); Distributed Computing; System Efficiency Evaluation; Data Analysis Projects in Neuroscience; Specific Data Analysis Techniques; Ontological Models and Applications (two sessions); Heterogeneous Database Integration; Text Analysis in the Humanities (two sessions); Data Analysis Projects in Various DID; Organization of Experiments in Data-Intensive Research; Digital Library Projects; Knowledge Representation and Discovery; Approaches for Problem Solving in DID; and Applications of Machine Learning. Although most of the presentations were dedicated to the results of research conducted in organizations of the Russian Federation (including Kazan, Moscow, Novosibirsk, Obninsk, Omsk, Orel, Pereslavl-Zalessky, Saint Petersburg, Tomsk, Yaroslavl, and Zvenigorod), the DAMDID/RCDL 2017 conference also had an international character, evidenced by 12 talks (four of them invited) prepared by notable foreign researchers from Armenia (Yerevan), Bahrain (Manama), Belarus (Minsk), Bulgaria (Sofia), Germany (Düsseldorf, Kiel), the UK (Harwell), Greece (Thessaloniki), Italy (Milan, Naples), and the USA (Harvard).
For the proceedings, 19 papers were selected by the Program Committee (16 peer-reviewed and three invited papers); after careful editing, they are included in this volume, structured into seven sections: Data Analytics (two papers); Next-Generation Genomic Sequencing: Challenges and Solutions (two papers); Novel Approaches to Analyzing and Classifying Various Astronomical Entities and Events (six papers); Ontology Population in Data-Intensive Domains (three papers); Heterogeneous Data Integration Issues (four papers); Data Curation and Data Provenance Support (one paper); and Temporal Summaries Generation (one paper). Of these, eight papers (more than one third of the papers selected) were prepared by foreign researchers (from Bulgaria, Germany, Greece, Italy, the UK, and the USA).
DAMDID/RCDL 2017 would not have been possible without the support of the
Russian Foundation for Basic Research, the Federal Agency of Scientific Organizations
of the Russian Federation and the Federal Research Center Computer Science and
Control of the Russian Academy of Sciences. Finally, we thank Springer for publishing
this proceedings volume, containing the invited and selected research papers, in their
CCIS series. The Program Committee of the conference appreciates the opportunity to use the Conference Management Toolkit (CMT) sponsored by Microsoft Research, which provided great support during the various phases of the paper submission and reviewing process.
General Chair
Igor Sokolov Federal Research Center Computer Science and Control
of RAS, Russia
Organizing Committee
Elena Zubareva Lomonosov Moscow State University, Russia
Dmitry Briukhov Federal Research Center Computer Science and Control
of RAS, Russia
Nikolay Skvortsov Federal Research Center Computer Science and Control
of RAS, Russia
Dmitry Kovalev Federal Research Center Computer Science and Control
of RAS, Russia
Evgeny Morkovin Lomonosov Moscow State University, Russia
Irina Karzalova Federal Research Center Computer Science and Control
of RAS, Russia
Yulia Trusova Federal Research Center Computer Science and Control
of RAS, Russia
Evgeniy Ilyushin Lomonosov Moscow State University, Russia
Dmitry Gouriev Lomonosov Moscow State University, Russia
Vladimir Romanov Lomonosov Moscow State University, Russia
Supporters
Coordinating Committee
Igor Sokolov (Co-chair) Federal Research Center Computer Science and Control
of RAS, Russia
Nikolay Kolchanov (Co-chair) Institute of Cytology and Genetics, SB RAS,
Novosibirsk, Russia
Leonid Kalinichenko (Deputy Chair) Federal Research Center Computer Science
and Control of RAS, Russia
Arkady Avramenko Pushchino Radio Astronomy Observatory, RAS, Russia
Pavel Braslavsky Ural Federal University, SKB Kontur, Russia
Vasily Bunakov Science and Technology Facilities Council, Harwell,
Oxfordshire, UK
Alexander Elizarov Kazan (Volga Region) Federal University, Russia
Alexander Fazliev Institute of Atmospheric Optics, RAS, Siberian Branch,
Russia
Alexei Klimentov Brookhaven National Laboratory, USA
Mikhail Kogalovsky Market Economy Institute, RAS, Russia
Vladimir Korenkov JINR, Dubna, Russia
Mikhail Kuzminski Institute of Organic Chemistry, RAS, Russia
Sergey Kuznetsov Institute for System Programming, RAS, Russia
Vladimir Litvine Evogh Inc., California, USA
Archil Maysuradze Moscow State University, Russia
Oleg Malkov Institute of Astronomy, RAS, Russia
Alexander Marchuk Institute of Informatics Systems, RAS, Siberian Branch,
Russia
Igor Nekrestjanov Verizon Corporation, USA
Boris Novikov St. Petersburg State University, Russia
Nikolay Podkolodny ICaG, SB RAS, Novosibirsk, Russia
Aleksey Pozanenko Space Research Institute, RAS, Russia
Vladimir Serebryakov Computing Center of RAS, Russia
Yury Smetanin Russian Foundation for Basic Research, Moscow
Vladimir Smirnov Yaroslavl State University, Russia
Sergey Stupnikov Federal Research Center Computer Science and Control
of RAS, Russia
Konstantin Vorontsov Moscow State University, Russia
Viacheslav Wolfengagen National Research Nuclear University MEPhI, Russia
Program Committee
Karl Aberer EPFL, Lausanne, Switzerland
Plamen Angelov Lancaster University, UK
Alexander Afanasyev Institute for Information Transmission Problems, RAS,
Russia
Arkady Avramenko Pushchino Observatory, Russia
Ladjel Bellatreche LIAS/ISAE-ENSMA, Poitiers, France
Pavel Braslavski Ural Federal University, Yekaterinburg, Russia
Vasily Bunakov Science and Technology Facilities Council, Harwell, UK
Evgeny Burnaev Skoltech, Russia
George Chernishev St. Petersburg State University, Russia
Yuri Demchenko University of Amsterdam, The Netherlands
Boris Dobrov Research Computing Center of MSU, Russia
Alexander Elizarov Kazan Federal University, Russia
Alexander Fazliev Institute of Atmospheric Optics, SB RAS, Russia
Sergey Gerasimov Lomonosov Moscow State University, Russia
Vladimir Golenkov Belarusian State University of Informatics and
Radioelectronics, Belarus
Vladimir Golovko Brest State Technical University, Belarus
Olga Gorchinskaya FORS, Moscow, Russia
Evgeny Gordov Institute of Monitoring of Climatic and Ecological
Systems SB RAS, Russia
Valeriya Gribova Institute of Automation and Control Processes FEBRAS,
Far Eastern Federal University, Russia
Maxim Gubin Google Inc., USA
Natalia Guliakina Belarusian State University of Informatics and
Radioelectronics, Belarus
Ralf Hofestadt University of Bielefeld, Germany
Leonid Kalinichenko FRC CSC RAS, Moscow, Russia
George Karypis University of Minnesota, Minneapolis, USA
Nadezhda Kiselyova IMET RAS, Russia
Alexei Klimentov Brookhaven National Laboratory, USA
Mikhail Kogalovsky Market Economy Institute, RAS, Russia
Vladimir Korenkov Joint Institute for Nuclear Research, Dubna, Russia
Sergey Kuznetsov Institute for System Programming, RAS, Russia
Sergei O. Kuznetsov National Research University Higher School
of Economics, Russia
Dmitry Lande Institute for Information Recording, NASU, Russia
Giuseppe Longo University of Naples Federico II, Italy
Natalia Loukachevitch Moscow State University, Russia
Ivan Lukovic University of Novi Sad, Serbia
Oleg Malkov Institute of Astronomy, RAS, Russia
Data Analytics
1 Introduction
Data mining and analysis is nowadays well understood from the algorithmic side: thousands of algorithms have been proposed, and the number of success stories is overwhelming and has fueled the big data hype. At the same time, brute-force application of algorithms is still the standard. Nowadays, data analysis and data mining algorithms are taken for granted. They transform data sets and hypotheses into conclusions. For instance, clustering algorithms check, for a given data set and a portfolio of clustering requirements, whether this portfolio can be supported, and in the positive case provide a set of clusters as output. The Hopkins index is one of the criteria that allow one to judge whether clusters exist within a data set. A systematic approach to data mining has already been proposed in [3, 16]. It is based on mathematics and mathematical statistics and is thus able to handle errors, biases, and the configuration of data mining as well. Our experience in large data mining projects in archaeology, ecology, climate research, medical research, etc. has shown, however, that the ad-hoc and brute-force mining just described is the prevailing approach. The results are taken for granted and believed despite pitfalls in modelling, understanding, flow of work, and data handling. As a consequence, the results often become dubious due to these misconceptions and pitfalls.
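As an illustration of the clustering-tendency criterion, the following sketch (our addition, not code from the paper) computes the Hopkins statistic for a numeric data set; values near 0.5 suggest uniformly random data, while values near 1 suggest clustering structure:

```python
import numpy as np

def hopkins(X, m=None, seed=None):
    """Hopkins statistic of a data set X (rows are points).

    Values near 0.5 indicate spatial randomness; values near 1
    indicate a clustering tendency.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = m or max(1, n // 10)
    lo, hi = X.min(axis=0), X.max(axis=0)

    # u_i: distance from m uniformly sampled points to their nearest data point
    U = rng.uniform(lo, hi, size=(m, d))
    u = np.array([np.linalg.norm(X - p, axis=1).min() for p in U])

    # w_i: distance from m sampled data points to their nearest other data point
    idx = rng.choice(n, size=m, replace=False)
    w = np.array([np.sort(np.linalg.norm(X - X[i], axis=1))[1] for i in idx])

    return u.sum() / (u.sum() + w.sum())

# Two tight, well-separated blobs: a strong clustering tendency is expected.
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(0.0, 0.1, (100, 2)),
                   rng.normal(10.0, 0.1, (100, 2))])
print(hopkins(blobs, m=20, seed=1))   # close to 1
```

Note that the threshold at which a Hopkins value is taken as evidence of clusters is itself a configuration decision of the kind this paper argues should be made explicit.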
Data are the main source of information in data mining and analysis. Their quality properties have been neglected for a long time, although modern data management makes it possible to handle these problems. In [15] we compare the critical findings, or pitfalls, of data mining in the environmental sciences reported in [20] with resolution techniques that can be applied to overcome them. The algorithms typically used to solve data mining and analysis tasks are themselves another source of pitfalls. It is often neglected that an algorithm also has an application area, application restrictions, data requirements, and results of a certain granularity and precision. These problems must be tackled systematically if we want to rely on the results of mining and analysis; otherwise the analysis may become misleading, biased, or impossible. Therefore, we explicitly treat the properties of mining and analysis. A similar observation can be made for data handling.
Data mining is typically based not on a single model but rather on a model ensemble or model suite, in which the associations among the models must be explicitly specified. Reasoning techniques combine methods from logics (deductive, inductive, abductive, counter-inductive, etc.), from artificial intelligence (hypothetic, qualitative, concept-based, adductive, etc.), from computational methods (algorithmics [6], topology, geometry, reduction, etc.), and from cognition (problem representation and solving, causal reasoning, etc.).
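A minimal sketch of this idea (ours; the class names are hypothetical, not the authors' implementation) represents a model suite as a container in which the associations among models are explicitly specified objects rather than implicit conventions:

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    name: str
    kind: str  # e.g. "cluster model", "classification model"

@dataclass
class ModelSuite:
    models: dict = field(default_factory=dict)
    associations: list = field(default_factory=list)

    def add(self, model: Model) -> None:
        self.models[model.name] = model

    def associate(self, source: str, target: str, role: str) -> None:
        # an association among models must be explicitly specified
        if source not in self.models or target not in self.models:
            raise KeyError("both models must belong to the suite")
        self.associations.append((source, target, role))

suite = ModelSuite()
suite.add(Model("segments", "cluster model"))
suite.add(Model("churn", "classification model"))
suite.associate("segments", "churn", "provides input features for")
```

The point of the explicit association list is that a reader of the suite can see how one model's output feeds another, instead of reconstructing that flow from code or folklore.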
In this paper, we use a model-based approach to data mining and data analysis. The models function as mediators in investigation, reasoning, communication, and understanding processes. In this context, each model consists of a ‘normal model’ and a ‘deep model’. Normal models represent the parts of the models that their users are aware of, while deep models comprise the users’ implicit influences, assumptions, prejudices, and expert knowledge, given by their culture, (academic) domain, or personal beliefs. Normally, only normal models are the focus of presentations and discussions. They are therefore mostly well understood and elaborated. Deep models, in contrast, are rarely addressed. They form an implicit (typically discipline-specific) consensus on the ways of handling and interpreting the normal models. In effect, it can be hard for researchers to interpret normal models of foreign domains in the way the models’ creators intended. In addition, certain reasons for (design) decisions might be irreproducible for people outside the domain. We therefore propose an approach to data mining and analysis that explicitly includes the handling of deep models in its design.
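The normal/deep split can be sketched as a small data structure (a hypothetical illustration of ours, not the paper's formalism): the deep part records the otherwise implicit assumptions so that researchers from other domains can inspect them:

```python
from dataclasses import dataclass, field

@dataclass
class NormalModel:
    # the explicit part that users are aware of and discuss
    structure: dict

@dataclass
class DeepModel:
    # implicit influences made explicit: disciplinary conventions,
    # assumptions, prejudices, personal beliefs
    assumptions: list = field(default_factory=list)

@dataclass
class GuidedModel:
    normal: NormalModel
    deep: DeepModel

    def documented_assumptions(self) -> list:
        return list(self.deep.assumptions)

m = GuidedModel(
    NormalModel({"algorithm": "k-means", "k": 4}),
    DeepModel(["Euclidean distance is meaningful for these attributes",
               "four groups are expected by disciplinary convention"]),
)
```

Here the choice "k = 4" is visible in the normal model, while the reason for it lives in the deep model, which is exactly the information a researcher from a foreign domain would otherwise lack.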
First, we introduce our notion of a model. Next, we show how data mining can be designed. We apply this investigation to systematic modelling and later to systematic data mining. Our goal is to develop a holistic and systematic framework for data mining and analysis. Many issues are outside the scope of this paper, such as a literature review, a formal introduction of the approach, and a detailed discussion of data mining application cases.
Remark. A previous version of this paper was presented at the 19th ‘Data Analytics and Management in Data Intensive Domains’ conference (DAMDID) in 2017. Under the title ‘Data Mining Design and Systematic Modelling’, that version has already been published in the corresponding CEUR workshop proceedings [34]. The present version has been revised: Sect. 5 was added to strengthen the practical relevance of the approach, and other parts were shortened to keep the paper at a comfortable reading length. As a result, the focus of this version lies somewhat more on deep models.
Deep Model Guided Data Analysis 5
¹ The quality can, for example, be characterised through quality characteristics [27] such as correctness, generality, usefulness, comprehensibility, parsimony, robustness, novelty, etc.
6 Y. O. Kropp and B. Thalheim
Problem solving and modelling, however, typically considers six aspects [15]:
1. Application, problems, and users: The domain consists of a model of the application, a specification of the problems under consideration, the tasks that are issued, and profiles of the users.
2. Context: The context of a problem is anything that could support the problem solution, e.g. the sciences’ background, theories, knowledge, foundations, and concepts to be used for the problem specification, the problem background, and solutions.
3. Technology: Technology is the enabler and defines the methodology. It provides [22] means for the flow of problem-solving steps, the flow of activities, distribution, collaboration, and exchange.
4. Techniques and methods: Techniques and methods can be given as algorithms. Specific algorithms are data improvers and cleaners, data aggregators, data integrators, controllers, checkers, acceptance determiners, and termination algorithms.
5. Data: Data have their own structure, quality, and life span. They are typically enhanced by metadata. Data management is a central element of most problem-solving processes.
6. Solutions: The solutions to problem solving can be given formally, illustrated by visual means, and presented by models. These models are typically only normal models; the deep model and the matrix are already provided by the context and accepted by the community of practice, depending on the needs of this community for the given application scenario. Therefore, models may be the final result of a data mining and analysis process, besides other means.
Comparing these six spaces with the six parameters, we discover that only four of the spaces have been considered in data mining so far: the user and application space as well as the representation space are missing.
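A small sketch (our illustration, with hypothetical space names) makes this coverage comparison mechanical: given which of the six spaces a mining setup specifies, it lists the neglected ones:

```python
PROBLEM_SPACES = ("application", "context", "technology",
                  "techniques", "data", "solutions")

def uncovered_spaces(setup: dict) -> list:
    """Return the problem-solving spaces a setup leaves unspecified."""
    return [s for s in PROBLEM_SPACES if not setup.get(s)]

# A typical brute-force mining setup fixes four spaces and neglects two:
setup = {
    "context": "climate research background",
    "technology": "mining workbench",
    "techniques": "clustering, classification",
    "data": "sensor time series with metadata",
}
missing = uncovered_spaces(setup)
print(missing)  # → ['application', 'solutions']
```

The neglected entries correspond to the user/application space and the representation (solutions) space that the text identifies as missing from current practice.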
The approach presented so far allows us to revise and reformulate the model-oriented data mining process on the basis of well-defined engineering [14, 24] or, alternatively, of systematic mathematical problem solving [21]. Figure 2 displays this revision. We realize that the first two phases are typically implicitly assumed and not considered. We concentrate on the non-iterative form; iterative processes can be handled in a similar way.
classifiers and rules, dependences among parameters and data subsets, predictor
analysis, synergetics, blind or informed or heuristic investigation of the search space,
and pattern learning.
² https://www.sfb1266.uni-kiel.de/en.
³ “Scales of Transformation - Human Environmental Interaction in Prehistoric and Archaic Societies”.
In the CRC 1266 we are currently developing and deploying a method that supports the approach of this paper by implementing such a storage facility. The next paragraph briefly describes its main idea, which is based on so-called ‘viewpoints’.
⁴ The phases before the actual data mining process.
⁵ The structure of that database is not the topic of this paper, but it is roughly based on the schema presented in [33].
⁶ https://www.djangoproject.com/.
⁷ Short: aDNA.
DNA sequence data, but these are hard for non-physicians to interpret. In a viewpoint for archaeologists, this raw sequence data might be needless and therefore not integrated. Conversely, in the research of aDNA experts, the exact stratigraphy of the excavation sites might be needless too. Another example could be scientists who investigate distribution patterns of pottery: they do not need all the available information about the pottery found (such as weight, dimensions, or descriptions of ornaments), but just the coordinates of its excavation and its classification.
This description of the view structures may eventually also include viewpoint-specific labels for types, properties, etc. Such a specification might be necessary because different academic disciplines and domains may use the same labels for different objects/subjects, or different labels for the same objects/subjects.
4. the selection (of instances) - Another important piece of information concerns the data in the views. In contrast to the previous point, this one deals with instances. Even if a type is adequate for a viewpoint because it fits the research focus, this does not necessarily imply that all data tuples of this type are adequate as well. Further filtering of the available data might be necessary. Such requirements are typically based on thresholds on properties/attributes or metadata. Common examples from archaeology originate from restrictions of the research focus. Referring back to the previous example of the scientists who investigate distribution patterns of pottery, these researchers might, for a specific context, be interested only in pottery from a specific region or (ancient) time.
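The instance filtering described above can be sketched as predicate-based selection (our illustration; the pottery attributes and values are hypothetical):

```python
def select(instances, predicates):
    """Keep only the tuples that satisfy every viewpoint predicate."""
    return [t for t in instances if all(p(t) for p in predicates)]

pottery = [
    {"id": 1, "region": "RegionA", "period": -3500},
    {"id": 2, "region": "RegionB", "period": -2800},
    {"id": 3, "region": "RegionA", "period": -1200},
]

# Viewpoint restriction: one specific region and one (ancient) time span.
predicates = [
    lambda t: t["region"] == "RegionA",
    lambda t: -4000 <= t["period"] <= -2000,
]
print(select(pottery, predicates))   # keeps only the tuple with id 1
```

The thresholds live in the viewpoint specification, not in the analysis code, so a foreign researcher can read off why certain tuples were excluded.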
5. the adaptation/adjustment (of values) - Finally, there can be a need for (minor) transformations of the instance values. This can be necessary if the values from different sources come in different forms and shall be mapped to the standard form of the viewpoint’s domain. Such mapping or transformation rules can, for instance, be used to change the unit of a property by adapting its values (e.g. an attribute is converted from centimeters to meters by multiplying its values by 0.01). Another example could be the transfer of coordinates to a different coordinate reference system.
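Such transformation rules can be sketched as plain value-mapping functions (our illustration, not the CRC 1266 implementation):

```python
def scale_unit(factor):
    """Build a rule that rescales a value, e.g. for a unit conversion."""
    def rule(value):
        return value * factor
    return rule

cm_to_m = scale_unit(0.01)    # centimeter to meter, as in the text
length_m = cm_to_m(250.0)     # a find recorded as 250 cm becomes 2.5 m
```

A coordinate reference system transfer would be a rule of the same shape, mapping a coordinate pair instead of a scalar.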
To create such specifications for viewpoints, it is useful to apply the systematic approach presented in Sect. 4 to collect the required information. This standardizes their construction and eases the process.
6 Conclusion
The literature on data mining is fairly rich. Mining tools have already reached the maturity to support any kind of data analysis, provided that the data mining problem is well understood, the intentions behind the models are properly understood, and the problem is professionally set up. Data mining aims at the development of model suites that allow one to derive dependable, and thus justifiable, conclusions from the given data set. Data mining is a process that can be based on a framework for systematic modelling driven by a deep model and a matrix. Textbooks on data mining typically explore algorithms as blind search. Data mining is, however, a specific form of modelling; therefore, we can combine modelling with data mining in a more sophisticated form.