------------------------ Welcome to GeoKettle -----------------------
Release 2.5
Open source spatial ETL tool for corporate data integration.
What is GeoKettle?
------------------
GeoKettle is a powerful, metadata-driven Spatial ETL tool dedicated to
the integration of different data sources, spatial or not, for building
and updating geospatial files, databases, data warehouses and web services.
GeoKettle enables the Extraction of data from data sources; the
Transformation of data in order to correct errors, cleanse the data,
change the data structure and make it compliant with defined standards; and
the Loading of transformed data into a target DataBase Management System
(DBMS) in OLTP or OLAP/SOLAP mode, a GIS file or a geospatial web service.
GeoKettle is a spatially-enabled version of the generic ETL tool Kettle
(Pentaho Data Integration). GeoKettle also benefits from the geospatial
capabilities of mature, robust and well-known open source libraries such as
JTS, GeoTools, deegree, OGR and, via a plugin, Sextante.
GeoKettle is particularly useful when a user wants to automate complex and
repetitive data processing without writing any specific code, to convert
data between various formats, to migrate data from one DBMS to
another, to feed data into various DBMS, to populate
analytical data warehouses for decision support purposes, etc.
This special distribution of Kettle includes extensions which enable the
use of geospatial (GIS) data. Like Kettle, GeoKettle is released under
the GNU Lesser General Public License (LGPL).
GeoKettle was originally developed by the GeoSOA research group (headed by
Prof. Thierry Badard, http://geosoa.scg.ulaval.ca) at the Department of
Geomatics Sciences of Laval University, Quebec City, Quebec, Canada. It is
now developed and professionally supported by Spatialytics
(http://www.spatialytics.com), a company specialized in GeoBI (Geospatial
Business Intelligence) software development.
About GeoKettle version numbering:
----------------------------------
Since the last 3.2.0 version (a.k.a. 3.2.0-20090609), it was decided to
change the version numbering of GeoKettle.
3.2.0 was a reference to the version of Kettle on which GeoKettle was based.
Current versions are important milestones for the project, as they provide a
significant number of new features as well as better performance and robustness.
The previous numbering system did not reflect this. That is why it was decided
to change the version numbering and to number the new versions 2.x, which better
emphasizes the substantial work performed to provide them.
However, it is important to note that the 2.x versions will be the last versions of
GeoKettle based on the Kettle 3.2 code base. Thanks to the tremendous work of
the Kettle developers, future versions of GeoKettle will be more pluggable into
Kettle and will no longer be a friendly, spatially enabled fork of Kettle.
Hence, it will be possible to add the spatial extensions provided by GeoKettle to
any Kettle/PDI 5.x installation.
What's new?
-----------
Since release 2.0:
Please see details at:
http://docs.spatialytics.com/doku.php?id=en:spatialytics_etl:000_version_history#what_s_new_in_versions_30_of_spatialytics_etl_and_25_of_geokettle
Since release 3.2.0-20090609:
Please see details at:
http://wiki.spatialytics.org/doku.php?id=projects:geokettle:documentation:what_is_new_in_version_2.0
Since release 3.1.0-20081103:
- The GeoKettle extensions were ported to the new Pentaho Data
Integration (PDI) version 3.2.0-stable. As such, this release of GeoKettle
includes all the improvements from the new PDI version.
- Added a "GIS File Output" step. At present, this step supports the
writing of Shapefiles.
- Added support for Spatial Reference Systems (SRS). SRS metadata was
added to ValueMeta for Geometry fields. Steps to set an SRS ("Set SRS")
and to transform the coordinates of geometries (reproject) from one SRS
to another ("SRS Transformation") have also been developed. The SRS
support is based on GeoTools' implementation of coordinate reference
systems (org.opengis.referencing package).
- In line with SRS support, SRS metadata for PostGIS and Oracle Spatial
DBMS is retrieved and written when reading/writing geometry columns.
To conform to integrity constraints when writing to a geometry column,
one must ensure that the SRS in GeoKettle matches the one defined for
the geometry column in the DBMS.
- Reading and writing SRS metadata (in the form of .PRJ files containing
WKT definitions of SRS) is also supported when reading/writing
Shapefiles in the GIS File Input/Output steps.
- Updated GeoTools libraries to version 2.5.5 and JTS to version 1.10.
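The "SRS Transformation" step relies on GeoTools to reproject geometries. Purely to illustrate what reprojecting a single coordinate involves, the following sketch hand-codes one well-known case: WGS 84 longitude/latitude (EPSG:4326) to spherical Web Mercator (EPSG:3857). It is a hypothetical example, not the GeoTools API and not GeoKettle's implementation; `ReprojectSketch`, `Point` and `lonLatToWebMercator` are invented names.

```java
// Hypothetical illustration of reprojection; NOT the GeoTools API used by GeoKettle.
// Converts WGS 84 longitude/latitude (degrees) to spherical Web Mercator (metres).
public class ReprojectSketch {

    // Radius of the sphere used by spherical Web Mercator, in metres.
    static final double EARTH_RADIUS_M = 6378137.0;

    // A projected coordinate pair.
    record Point(double x, double y) {}

    static Point lonLatToWebMercator(double lonDeg, double latDeg) {
        // Mercator y diverges at the poles, so they cannot be projected.
        if (Math.abs(latDeg) >= 90.0) {
            throw new IllegalArgumentException("Latitude must be strictly between -90 and 90 degrees");
        }
        double lon = Math.toRadians(lonDeg);
        double lat = Math.toRadians(latDeg);
        double x = EARTH_RADIUS_M * lon;
        double y = EARTH_RADIUS_M * Math.log(Math.tan(Math.PI / 4 + lat / 2));
        return new Point(x, y);
    }

    public static void main(String[] args) {
        // Approximate location of Quebec City, in degrees.
        Point quebec = lonLatToWebMercator(-71.2, 46.8);
        System.out.printf("x=%.1f y=%.1f%n", quebec.x(), quebec.y());
    }
}
```

A real SRS transformation also has to deal with datum shifts and a catalogue of thousands of reference systems, which is why delegating to a library such as GeoTools is the practical choice.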
Since release 2.5.2-20080531:
- The GeoKettle extensions were ported to the new Pentaho Data
Integration (PDI) version 3.1.0-GA. As such, this release of GeoKettle
includes all the improvements from the new PDI version.
- Changed the core Geometry object framework from GeOxygene to the JTS
Topology Suite (JTS).
- Added native support for Oracle Spatial and MySQL geospatial DBMS.
- Speed improvements: due to the upgrade of the PDI core and to the JTS
library, GeoKettle now offers better throughput. We measured a typical
speedup in row throughput of between 15% and 60% (depending on the
transformation) when using geospatial data.
What is geospatial data?
------------------------
Geospatial data is used to locate geographic features on a map. It is
used mainly in Geographic Information Systems (GIS) to create maps and
perform spatial analysis. Geospatial data can be classified into two main
categories: raster data, which is composed of bitmap images covering an
area on the surface of the earth (e.g. satellite or aerial imagery) and
vector data, in which individual features are represented by
vector-based geometric primitives (such as points, lines and polygons).
For example, a road can be represented as a series of line segments
(what is often called a "LineString") on a map.
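To make the vector model concrete, the sketch below represents a road as a LineString in plain Java, the language GeoKettle and its JTS and GeoTools libraries are written in. It is a hypothetical, minimal model for illustration only: `LineStringSketch`, `Coordinate` and `LineString` are not the JTS or GeoKettle classes, and planar (projected) coordinates are assumed so that segment lengths are simple Euclidean distances.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical, minimal model of vector geometry for illustration only.
// These types are NOT the JTS or GeoKettle API; planar coordinates are assumed.
public class LineStringSketch {

    // One vertex, expressed in a planar (projected) coordinate system.
    record Coordinate(double x, double y) {}

    // An ordered series of vertices; each consecutive pair forms one line segment.
    record LineString(List<Coordinate> vertices) {
        LineString {
            if (vertices.size() < 2) {
                throw new IllegalArgumentException("A LineString needs at least two vertices");
            }
            vertices = List.copyOf(vertices); // defensive, immutable copy
        }

        int segmentCount() {
            return vertices.size() - 1;
        }

        // Total length: sum of the Euclidean lengths of all segments.
        double length() {
            double total = 0.0;
            for (int i = 1; i < vertices.size(); i++) {
                Coordinate a = vertices.get(i - 1);
                Coordinate b = vertices.get(i);
                double dx = b.x() - a.x();
                double dy = b.y() - a.y();
                total += Math.sqrt(dx * dx + dy * dy);
            }
            return total;
        }

        // OGC Well-Known Text (WKT) encoding, e.g. "LINESTRING (0.0 0.0, 3.0 4.0)".
        String toWkt() {
            StringJoiner points = new StringJoiner(", ", "LINESTRING (", ")");
            for (Coordinate c : vertices) {
                points.add(c.x() + " " + c.y());
            }
            return points.toString();
        }
    }

    public static void main(String[] args) {
        // A road made of two straight segments.
        LineString road = new LineString(List.of(
                new Coordinate(0, 0),
                new Coordinate(3, 4),
                new Coordinate(3, 10)));

        System.out.println(road.segmentCount()); // 2
        System.out.println(road.length());       // 11.0
        System.out.println(road.toWkt());        // LINESTRING (0.0 0.0, 3.0 4.0, 3.0 10.0)
    }
}
```

In GeoKettle itself, Geometry fields are backed by JTS geometries, which additionally carry SRS metadata and support topological predicates; this sketch deliberately leaves all of that out.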
You may have to deal with geospatial data if your organization uses GIS
software (e.g. ESRI ArcGIS or MapInfo) or has to handle spatial data in
a GIS file format (e.g. Shapefile), an XML encoding (e.g. GML,
KML) or a spatial DBMS (e.g. PostGIS, Oracle Spatial). From an ETL
perspective, you may want to automate the transformation and loading of
geospatial data from heterogeneous sources into a database. And
increasingly, business intelligence applications rely on geospatial
data to enhance the user experience (e.g. map displays, end-user
software such as Spatial OLAP) and to provide location-aware analysis
functionalities. This creates a need for spatially-enabled ETL tools,
supporting the extraction of geospatial data from various sources,
transformation of this data (including spatial analysis functions
handling the geometry of geographic features) and loading to a
spatially-enabled data warehouse.
GeoKettle aims to fulfill these requirements. It offers the full range
of functionality of Pentaho Data Integration (Kettle), and extends it
with a new "Geometry" data type for geospatial vector data. It also
features input/output support for GIS file formats, spatial DBMS and
OGC-compliant web services such as SOS and CSW. It also provides
spatial analysis functions (e.g. topological predicates), scripting
support (with JavaScript) for Geometry objects and advanced geoprocessing
capabilities.
Using GeoKettle
---------------
GeoKettle can be used in exactly the same way as Pentaho Data Integration.
Please refer to the PDI user documentation included in this
distribution, as well as to the wiki dedicated to the GeoKettle project
(http://wiki.spatialytics.org/doku.php?id=projects:geokettle).
Demo transformations showing the use of the geospatial extensions are
included in this distribution, in the samples/transformations/geokettle
directory.
If you encounter a bug or want to see a new feature added to GeoKettle, please
do not hesitate to file a ticket in the bug/issue tracking system available
at http://trac.spatialytics.com/geokettle.

Upcoming features
-----------------
A roadmap is available at http://trac.spatialytics.com/geokettle.
License and copyright
---------------------
Like Pentaho Data Integration, GeoKettle is distributed under the GNU
Lesser General Public License (LGPL). Included libraries (GeoTools, JTS,
PostGIS driver wrapper) are also LGPL (or a compatible license). Some
other libraries (JDBC drivers, Oracle SDOAPI) are closed source but
included in binary form according to their respective end-user licenses.
Please refer to the included LICENSE.txt file for details.
The GeoKettle extensions are Copyright (C) 2009- Spatialytics,
(C) 2007-2009, GeoSOA research group, Department of geomatics sciences,
Laval University, Quebec, Canada.
Pentaho Data Integration (Kettle) is Copyright (C) 2007-2008, Pentaho
Corporation.
Contact and mailing lists
-------------------------
For future releases and more information, visit us at
http://www.geokettle.org.
All comments or questions about GeoKettle are welcome! A forum is available
at http://www.spatialytics.com/forum. Three sections are dedicated to
GeoKettle:
- users-spatial_etl, for problems, questions and comments about the
usage of GeoKettle.
- dev-spatial_etl, for problems, questions and comments relating
to development with GeoKettle, and for feature requests.
- international-francais, for French users who are not comfortable with
the English language. They can ask for help in French in this section.
To subscribe to or unsubscribe from the forum, please visit:
http://www.spatialytics.com/forum
How to get involved?
--------------------
There is a lot of work to do on a project like GeoKettle, and your help
will be greatly appreciated. We would gladly welcome any contribution
to further development and implementation, as well as feedback on the usage
of GeoKettle. Nevertheless, it is often hard for new developers or users to
work out where they can help. To begin with, we suggest that you subscribe to
the GeoKettle forums (http://www.spatialytics.com/forum) and listen in for a
while to see how others make contributions.
You can get a local working copy of the latest code by checking out
GeoKettle's SVN repository. Review the todo list and choose a task, or perhaps
you have noticed something that needs to be corrected. Make the changes, do
the testing, generate a patch, and post it to the GeoKettle developers forum.
Documentation writers and translators are usually the most wanted, so if
you would like to help but are not familiar with the innermost technical details,
don't worry: we have work for you! ;-)
Contributors to GeoKettle must sign a Contributor License Agreement
(http://dev.spatialytics.com/cla/contributor_license_agreement.pdf).
Acknowledgments
---------------
We would like to recognize the past contributions to GeoKettle from the
following organizations and people:
The NSERC Industrial Research Chair in Geospatial Databases for Decision
Support (held by Prof. Yvan Bedard,
http://mdspatialdb.chair.scg.ulaval.ca), for partial financial support
to the research project in which the development of GeoKettle started.
Professor Stefan Keller of HSR Hochschule für Technik Rapperswil,
Switzerland, for involving and co-supervising two computer science
students, Pascal Hobus and Sven Goldinger, in the development of
GeoKettle as part of their bachelor's degree final thesis.