DATA MODELS IN GIS

Mahesh K. Jat
Department of Civil Engineering, Malaviya National Institute of Technology Jaipur

DATA MODELS IN GIS
OUTLINE:
- Overview of models
- Data and levels of measurements
- Raster and vector models
- Conversion between models
- Databases

GIS Analysis
- Much of GIS analysis and description consists of investigating the properties of geographic features and determining the relationships between them.

Geographic information
Characteristics of Geographic Information
- Location!
- Volume
- Dimensionality: point, line, area
- Continuity: feature, field

Building complex features
- Simple geographic features can be used to build more complex ones.
- Areas are made up of lines, which are made up of points represented by their coordinates.
- Areas = {Lines} = {Points}
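The Areas = {Lines} = {Points} idea can be sketched as nested data structures. This is only an illustration under assumed names (`Point`, `Line`, `Area` and the unit square are hypothetical, not taken from any GIS package):

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # an (x, y) coordinate pair


@dataclass
class Line:
    points: List[Point]  # ordered points that trace the line

    def length(self) -> float:
        # Sum the straight-line distance between consecutive points.
        return sum(math.dist(a, b) for a, b in zip(self.points, self.points[1:]))


@dataclass
class Area:
    boundary: List[Line]  # lines that together enclose the area

    def perimeter(self) -> float:
        return sum(line.length() for line in self.boundary)


# Hypothetical unit square built from four lines, each built from two points.
square = Area([
    Line([(0.0, 0.0), (1.0, 0.0)]),
    Line([(1.0, 0.0), (1.0, 1.0)]),
    Line([(1.0, 1.0), (0.0, 1.0)]),
    Line([(0.0, 1.0), (0.0, 0.0)]),
])
print(square.perimeter())
```

The area never stores coordinates directly; it reaches them only through its lines, which mirrors the hierarchy on the slide.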

DIGITAL INFORMATION
- GIS requires that both data and maps be represented as numbers
  - stored on devices that can store only 0s and 1s
  - sent through a "pipe" consisting of 0s and 1s
  - processed as 0s and 1s
- GIS places data into the computer's memory in a physical data structure (i.e. files and directories).
- files can be written in binary or as ASCII text.
  - binary is faster to read and smaller.
  - ASCII can be read by humans and edited but uses more space.
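A small sketch of the binary versus ASCII trade-off, assuming four sample coordinate values stored as 32-bit integers (the values and the 4-byte layout are illustrative assumptions):

```python
import struct

# Sample coordinate values (assumed for illustration).
values = [4753456, 623412, 4753436, 623424]

# ASCII: human-readable and editable, one character per digit plus separators.
ascii_bytes = " ".join(str(v) for v in values).encode("ascii")

# Binary: each value packed as a little-endian 4-byte integer.
binary_bytes = struct.pack("<4i", *values)

print(len(ascii_bytes), "bytes as ASCII")
print(len(binary_bytes), "bytes as binary")

# The binary form is unreadable to a person but decodes to the same numbers.
decoded = list(struct.unpack("<4i", binary_bytes))
```

The size gap grows with the number of digits per value, which is why large rasters are usually stored in binary.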

DATA
- locational and attribute data in a GIS
- attribute type: discrete vs continuous
  - discrete: presumed to occur at distinct locations, with empty locations having a value of zero for the attribute in question
  - continuous: feature occurs throughout the geographical region; no locations are empty

Properties of Features
- size
- distribution
- pattern
- contiguity
- neighborhood
- shape
- scale
- orientation

Basic properties of geographic features .

DATA
Levels of Measurement:
- four levels are commonly recognized: nominal, ordinal, interval and ratio
- each subsequent level includes all characteristics of preceding levels
- data available at higher levels can be reduced to lower levels; the opposite is not true

LEVEL OF MEASUREMENTS
Nominal Scale
- objects are classed into groups; groups possess arbitrary labels (numbers/names), e.g. religion, land use/cover
- discrete variable

LEVEL OF MEASUREMENTS
Ordinal Scale
- categorization plus an ordering/ranking of data, e.g. street, country road, highway
- can identify larger/smaller but cannot comment on the degree of difference between values
- K=5, L=3, M=1 is equivalent to K=500, L=300, M=10
- discrete variables

LEVEL OF MEASUREMENTS
Interval Scale
- measurements arranged in rank, and the distance between measurements is known
- no "true" zero point, e.g. elevation/topographic lines, temperature in °C
- discrete or continuous

LEVEL OF MEASUREMENTS
Ratio Scale
- like interval scaling: both rank and separation are known, but there is also a known, fixed starting point, e.g. temperature on Kelvin scale, speed
- continuous and discrete

DATA MODELS – REPRESENTING DATA
1. Reality: total phenomena as they actually exist
2. Conceptual Data Model: describes and defines included entities (how they will be represented)
3. Logical Data Model: logical organization of the database elements
4. Physical Data Model or File Structure: how information will be structured for access

DATA MODELS
- logical data model is how data are organized for use by the GIS.
- GISs have traditionally used either raster or vector for maps.
  - raster: based on pixels
  - vector: based on points, lines and polygons
- while most GIS systems can handle raster and vector, only one is used for the internal organization of spatial data.

DATA MODELS
- rasters and vectors can be flat files … if they are simple

Raster-based line, flat file:
0000000000000000
0001100000100000
1010100001010000
1100100001010000
0000100010001000
0000100010000100
0001000100000010
0010000100000001
0111001000000001
0000111000000000
0000000000000000

Vector-based line, flat file (one coordinate pair per row):
4753456  623412
4753436  623424
4753462  623478
4753432  623482
4753405  623429
4753401  623508
4753462  623555
4753398  623634

RASTER DATA MODELS
- basic unit is cells or pixels, which are uniformly spaced
- each cell/pixel has spatial and spectral information, e.g. digital elevation data and digital images
- spatially exhaustive sampling of the area of interest
- every cell has a value, even if it is "missing."
- cell has a resolution, given as the cell size in ground units.
- higher resolution, smaller cell dimensions
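Because cells are uniformly spaced, the cell that holds any ground coordinate follows from the grid origin and the cell size alone. A minimal sketch, assuming the origin is the upper-left corner and rows count downward (the function name and the example extent are hypothetical):

```python
def cell_index(x, y, x_min, y_max, cell_size):
    """Return (row, col) of the cell containing ground coordinate (x, y).

    Assumes (x_min, y_max) is the upper-left corner of the grid and that
    cell_size is given in the same ground units as x and y.
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    return row, col


# A 100 x 100 unit extent sampled with 10-unit cells.
print(cell_index(25.0, 75.0, 0.0, 100.0, 10.0))
```

Halving `cell_size` doubles the number of rows and columns, which is the "higher resolution, smaller cell dimensions" relationship.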

RASTER DATA MODELS
[Figure: generic structure for a grid, labelling grid extent, resolution, columns, rows and grid cell]


RASTER DATA MODELS
[Figure: fining of resolution]


Sources of Raster Data
- Satellite data: LANDSAT, SPOT, IRS
- Scanned aerial photography
- Digital orthophotography
- Scanned maps and documents

From where do we get Raster Data?
- SCANNED aerial photographs: photographs are NOT raster images, but SCANNED images ARE
- SCANNED maps
- Satellite images


CREATING RASTER DATA MODELS
- creating raster is like laying a grid over a map
  - code each cell with a value representing attribute
  - every cell has a value, even if null or zero (integers, ratios, etc.)
- values for each cell are written into a file
  - spreadsheet, data base, word processor
  - imported into GIS so it can be reformatted
- each pixel presumably has one value. In reality, is this correct? (mixed pixel issue)

RASTER AND MISSING DATA
GIS data layer as a grid with a large section of "missing data," in this case, the zeros in the ocean off of New York and New Jersey.

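MIXED PIXEL ISSUE
[Figure: three ways of coding the same block of mixed cells; cell values listed as extracted]
- Water dominates: W W W W W W G G G
- Winner takes all: W G W W W G G G G
- Edges separate: W E W E E E G G G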

MIXED PIXEL ISSUE
[Figure: rules for assigning a value to a mixed water/land cell: "Largest share", "Presence/Absence", "Central point" and "Percent occurrence"; the example cells are labelled 35%, 70%, 80% and 100%]
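Three of the coding rules named on this slide can be sketched as functions of the cover shares inside one cell. The function names and the 35/65 split are assumptions for illustration; the "central point" rule needs the cell geometry and is left out:

```python
def largest_share(shares):
    """Code the cell with the class that covers most of it."""
    return max(shares, key=shares.get)


def presence(shares, cls):
    """Code the cell True if the class occurs anywhere in it."""
    return shares.get(cls, 0) > 0


def percent_occurrence(shares, cls):
    """Code the cell with the percentage of its area covered by the class."""
    return 100 * shares.get(cls, 0) / sum(shares.values())


# Hypothetical cell that is 35% water and 65% land.
cell = {"water": 35, "land": 65}
print(largest_share(cell), presence(cell, "water"), percent_occurrence(cell, "water"))
```

The same cell is coded "land", "water present" or "35" depending on the rule, which is why the rule has to be documented with the data.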


CREATING RASTER DATA MODELS
- raster data visualized as map layers
  - map layer: data describing a single characteristic for a location
  - multiple items of information require multiple layers
- creates problems: raster databases can become enormous
  - each map layer has thousands of cells

RASTER DATA MODELS
Advantages
- simple data structures
- each cell can be owned by only one feature.
- overlay and combination of maps and remote sensed images easy
- simulation easy, because cells have the same size and shape
- technology is cheap

RASTER DATA MODELS
Advantages
- some spatial analysis methods simple to perform
  - local: cell by cell calculations
  - focal: models cell value based on neighbours
  - zonal: models cell value based on geographical areas
  - global: models cell value based on all cells
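The four kinds of raster operation can be sketched on a tiny grid in plain Python. The grid, the zone layer and the function names are assumptions for illustration; a real GIS would use an optimized array library:

```python
grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
zones = [[0, 0, 1],
         [0, 1, 1],
         [2, 2, 2]]


def local_op(grid, func):
    """Local: apply func to each cell independently."""
    return [[func(v) for v in row] for row in grid]


def focal_mean(grid, row, col):
    """Focal: mean of the 3 x 3 neighbourhood, clipped at the grid edge."""
    rows, cols = len(grid), len(grid[0])
    window = [grid[r][c]
              for r in range(max(0, row - 1), min(rows, row + 2))
              for c in range(max(0, col - 1), min(cols, col + 2))]
    return sum(window) / len(window)


def zonal_sum(grid, zones):
    """Zonal: total of the cell values inside each zone."""
    totals = {}
    for grid_row, zone_row in zip(grid, zones):
        for value, zone in zip(grid_row, zone_row):
            totals[zone] = totals.get(zone, 0) + value
    return totals


def global_max(grid):
    """Global: a single value derived from all cells."""
    return max(max(row) for row in grid)
```

Each function only needs row and column indices, never coordinates, because every cell has the same size and shape.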

RASTER DATA MODELS
Disadvantages
- volumes of graphic data
- use of large cells to reduce data volumes
- poor at representing points, lines and areas; good at surfaces
- must often include redundant or missing data
- network linkages are difficult to establish
- projection transformations are time consuming

COMPRESSION TECHNIQUES
- raster compression techniques used in GIS are run-length encoding and quad trees

Run-length Encoding (more efficient)
- values often occur in runs across several cells
- form of spatial autocorrelation
- e.g. array 0 0 0 1 1 0 0 1 1 1 0 0 1 1 1 would be entered as 3 0 2 1 2 0 3 1 2 0 3 1

RUN-LENGTH CODING
Legend: A. Mixed Conifer  B. Douglas Fir  C. Oak Savannah  D. Grassland
- Row-by-row coding: CCCCCBBDCCCCBBDCCCBBBDDCBBA ADDDDBAADDBBBAADDDAAAADDDA AAA
  - 56 entries for 7x8 array
- Run-length coding: 5C 2B 1D 4C 2B 1D 3C 3B 2D 1C 2B 2A 4D 1B 2A 2D 3B 2A 3D 4A 3D 4A
  - 22 pairs (44 entries) for 7x8 array
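Run-length encoding as described on these two slides can be sketched in a few lines. The function names are hypothetical; the two inputs are the slide examples, with the 7x8 array read as one continuous string:

```python
from itertools import groupby


def rle_encode(cells):
    """Collapse consecutive equal values into (run length, value) pairs."""
    return [(len(list(run)), value) for value, run in groupby(cells)]


def rle_decode(pairs):
    """Expand (run length, value) pairs back into the original sequence."""
    return [value for count, value in pairs for _ in range(count)]


binary_row = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1]
print(rle_encode(binary_row))

# The 7x8 land cover array, read row by row as one string of class codes.
cover = ("CCCCCBBDCCCCBBDCCCBBBDDCBBA"
         "ADDDDBAADDBBBAADDDAAAADDDA"
         "AAA")
cover_pairs = rle_encode(cover)
print(len(cover), "cells ->", len(cover_pairs), "pairs")
```

Note that runs in this sketch continue across row boundaries, which is what gives the pair count quoted on the slide.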

COMPRESSION TECHNIQUES
Quadtree Compression
- hierarchical data model using a variable-sized grid cell
- finer subdivisions are used in areas requiring finer detail (higher resolution)
- pixel in each higher layer is derived from average or majority of 4 pixels from the lower layer
- not as efficient for more variable or complex data
- used primarily as a way to store data for rapid retrieval on display devices
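A minimal sketch of the variable-sized cell idea, assuming a square grid whose side is a power of two. This is a simple region quadtree that keeps a block whole when all its cells agree; it does not implement the average/majority pyramid mentioned above, and the function names and sample grid are hypothetical:

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a single value for a uniform block, else a list of four children.

    Children are ordered NW, NE, SW, SE. Cell values must not be lists.
    """
    if size is None:
        size = len(grid)
    first = grid[row][col]
    uniform = all(grid[r][c] == first
                  for r in range(row, row + size)
                  for c in range(col, col + size))
    if uniform:
        return first
    half = size // 2
    return [build_quadtree(grid, row, col, half),
            build_quadtree(grid, row, col + half, half),
            build_quadtree(grid, row + half, col, half),
            build_quadtree(grid, row + half, col + half, half)]


def count_leaves(node):
    """Number of stored blocks in the tree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


sample = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 0, 1],
          [0, 0, 1, 1]]
tree = build_quadtree(sample)
print(tree, count_leaves(tree))
```

Only the south-east quadrant is subdivided, because it is the only one with mixed values; a highly variable grid would subdivide everywhere and save nothing.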

QUAD TREE STRUCTURE

RASTER DATA FORMAT

most raster formats are digital image formats.
most GISs accept TIF, GIF, JPEG or encapsulated PostScript, which are not georeferenced. DEMs are true raster data formats.

VECTOR DATA MODELS
- think of world as a space populated by discrete features of various shapes and kinds: points, lines, areas.
- any location in space may be empty or occupied by one or more point, line or area.

VECTOR DATA MODELS
point
- zero-dimensional abstraction of an object represented by a single X,Y co-ordinate.
- normally represents a geographic feature too small to be displayed as a line or area
- stored by their real (earth) coordinates

VECTOR DATA MODELS
line
- set of ordered co-ordinates that represent the shape of geographic features too narrow to be displayed as an area at the given scale, or linear features with no area
- lines and areas are built from sequences of points in order.
- lines have a direction to the ordering of the points.

VECTOR DATA MODELS
polygon
- feature used to represent areas.
- defined by the lines that make up its boundary and a point inside its boundary for identification.
- have attributes that describe the geographic feature they represent.

Areas are lines are points are coordinates .

VECTOR DATA MODELS
- vector data evolved the arc/node model in the 1960s.
- an area consists of lines and a line consists of points.
- points, lines, and areas can each be stored in their own files, with links between them.
- endpoint of a line (arc) is called a node. arc junctions are only at nodes.
- stored with the arc is the topology (i.e. the connecting arcs and left and right polygons).

Topology  A set of rules on how objects relate to each other  Major difference in file formats  Higher level objects have special topology rules .

Topology Definition
- The science and mathematics of relationships used to validate the geometry of vector entities, and for operations such as network tracing and tests of polygon adjacency.
- The study of geometric properties that do not change when the forms are bent, stretched or undergo similar geometric transformations.


Why Topology Matters
- Error Detection
  - open polygons
  - unlabeled polygons
  - slivers
  - polygons that cannot exist next to each other
- Network Modeling

Show Placitas  Arc Node Topology Cover# Lpoly# and Rpoly# Tnode fnode  Label errors .

Higher Level Object  Regions  Networks  TIN – Triangulated irregular network  Dynamic Segmentation .

Regions
- Overlapping areas with different attributes (e.g. fire history)
- Disconnected areas with the same attributes (e.g. Hawaii)

Networks
- Road systems, power grids, water supply, sewerage systems, drainage network
- Continuous connected networks
- Rules for displacement in a network
- Attribute value accumulations due to displacements

TIN  Vector Surface Model  Triangulated Irregular Network  A set of nonoverlapping triangles each with a constant gradient  A TIN can honor original input elevations .

TOPOLOGY
- topological data structures dominate GIS software.
- topology, stored explicitly, allows automated error detection and elimination.
- rarely are maps topologically clean when digitized or imported.
- GIS has to be able to build topology from unconnected arcs.

TOPOLOGY
[Figure: Arc/Node Map Data Structure with Files. Polygon "A" is bounded by arcs 1 and 2, which run through points 1 to 13]

Points File:
  1 x y, 2 x y, 3 x y, 4 x y, 5 x y, 6 x y, 7 x y, 8 x y, 9 x y, 10 x y, 11 x y, 12 x y, 13 x y

Arcs File:
  1   1,2,3,4,5,6,7
  2   1,8,9,10,11,12,13,7

File of Arcs by Polygon:
  A: 1,2, Area, Attributes

TOPOLOGY
- topologically structured database for ease of retrieval and implementation of spatial-relational operations.
- relationship between nodes, arcs and polygons.
- advantages:
  - simple, elegant and efficient relational database construction and analysis
  - complete topology makes map overlay feasible.
  - topology allows many GIS operations to be done without accessing the point files.

VECTOR DATABASE CREATION
- database creation involves several stages:
  - input of the spatial data
  - input of the attribute data
  - linking spatial and attribute data
- spatial data is entered via digitized points and lines, scanned and vectorized lines or directly from other digital sources
- once the spatial data has been entered, much work is still needed before it can be used

VECTOR DATABASE CREATION
Building Topology
- once points are entered and geometric lines are created, topology must be "built"
- this involves calculating and encoding relationships between the points, lines and areas
- this information may be automatically coded into tables of information in the database

Topological Model  Topology: mathematical method to define spatial relationships  Arc-node data model Arc: a series of points that start and end at a node Node: an intersection point where two or more arcs meet .

Topological Data Spatial Operations
- Contiguity: spatial relationship of adjacency, e.g. bus stand adjacent to railway station
- Connectivity: interconnected pathways or networks, e.g. street and trail networks, stream networks

Basic arc topology
[Figure: arc 1 runs from node n1 to node n2, with polygon A on its left and polygon B on its right]

Topological Arcs File
Arc  From  To  PL  PR  n1x  n1y  n2x  n2y
1    n1    n2  A   B   x    y    x    y

A topological structure for the arcs.
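A sketch of how the From/To/PL/PR columns answer adjacency and connectivity questions without touching any coordinates. Arc 1 follows the table above; arcs 2 and 3, the "outside" polygon and all function names are hypothetical additions to close the two polygons:

```python
from typing import NamedTuple


class Arc(NamedTuple):
    from_node: str
    to_node: str
    left: str   # PL: polygon on the left of the arc
    right: str  # PR: polygon on the right of the arc


arcs = {
    1: Arc("n1", "n2", "A", "B"),        # from the slide
    2: Arc("n2", "n1", "A", "outside"),  # assumed: closes polygon A
    3: Arc("n2", "n1", "outside", "B"),  # assumed: closes polygon B
}


def adjacent_polygons(arcs, polygon):
    """Contiguity: polygons that share an arc with the given polygon."""
    neighbours = set()
    for arc in arcs.values():
        if arc.left == polygon:
            neighbours.add(arc.right)
        elif arc.right == polygon:
            neighbours.add(arc.left)
    return neighbours


def arcs_at_node(arcs, node):
    """Connectivity: arcs that start or end at the given node."""
    return sorted(arc_id for arc_id, arc in arcs.items()
                  if node in (arc.from_node, arc.to_node))


print(adjacent_polygons(arcs, "A"), arcs_at_node(arcs, "n1"))
```

Both queries read only the arcs table, which is the sense in which topology lets operations run without accessing the point files.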

TOPOLOGY
- Topological data structures dominate GIS software.
- Topology allows automated error detection and elimination.
- Rarely are maps topologically clean when digitized or imported.
- A GIS has to be able to build topology from unconnected arcs.
- Nodes that are close together are snapped.
- Slivers due to double digitizing and overlay are eliminated.

Slivers Sliver .

Unsnapped node .

Topology Matters
- The tolerances controlling snapping, elimination, and merging must be considered carefully, because they can move features.
- Complete topology makes map overlay feasible.
- Topology allows many GIS operations to be done without accessing the point files.
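Why the snapping tolerance deserves care can be seen in a small sketch. This is a simple first-come rule under assumed names (`snap_nodes` and the sample nodes are hypothetical), not the algorithm of any particular GIS:

```python
import math


def snap_nodes(nodes, tolerance):
    """Replace each node with the first earlier node lying within tolerance."""
    kept = []
    snapped = []
    for node in nodes:
        match = next((k for k in kept if math.dist(k, node) <= tolerance), None)
        if match is None:
            kept.append(node)
            match = node
        snapped.append(match)
    return snapped


nodes = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0)]
print(snap_nodes(nodes, 0.1))  # the near-duplicate node is merged
print(snap_nodes(nodes, 2.0))  # too large: a distinct node is moved as well
```

With a tolerance of 0.1 only the digitizing error is removed; with 2.0 the third node, a genuinely separate feature, is pulled onto the first.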

VECTOR DATABASE CREATION
Editing
- during topology generation process, problems such as overshoots, undershoots and spikes are either flagged for editing by the user or corrected automatically
- automatic editing involves the use of a tolerance value which defines the width of a buffer zone around objects within which adjacent objects should be joined

VECTOR DATA MODELS
Advantages
- good representation of structures (points, lines, polygons)
- compact and more efficient
- topology can be completely described
- accurate graphics
- retrieval, updating and generalization of graphics and attributes possible
- work well with pen and light-plotting devices and tablet digitizers.

VECTOR DATA MODELS
Disadvantages
- complex data structures
- combination of several vector polygon maps or polygon and raster maps through overlay creates difficulties
- simulation is difficult
- display and plotting can be expensive
- technology is expensive
- not good at continuous coverage or plotters that fill areas.
- TIN must be used to represent volumes.

VECTOR DATA FORMATS
- vector formats are either page definition languages or preserve ground coordinates.
- page languages are HPGL, PostScript, and AutoCAD DXF.
- true vector GIS data formats include ArcView Shapefiles and ArcGIS Interchange Files (E00), which has topology.

VECTOR DATA MODELS
List of coordinates "spaghetti"
- simple
- easy to manage
- no topology
- lots of duplication, hence need for large storage space
- very often used in CAC (computer assisted cartography)

VECTOR DATA MODELS
Vertex Dictionary
- no duplication, but still this model does not use topology
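The difference between the "spaghetti" list and a vertex dictionary can be sketched as a conversion. The two adjoining squares and the function name are hypothetical:

```python
def to_vertex_dictionary(polygons):
    """Store each distinct coordinate once and refer to it by index."""
    vertices = []    # the dictionary: index -> (x, y)
    index_of = {}    # reverse lookup while building
    references = {}  # polygon name -> list of vertex indices
    for name, coords in polygons.items():
        ids = []
        for point in coords:
            if point not in index_of:
                index_of[point] = len(vertices)
                vertices.append(point)
            ids.append(index_of[point])
        references[name] = ids
    return vertices, references


# Spaghetti form: the shared edge (1,0)-(1,1) is stored twice.
spaghetti = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
vertices, references = to_vertex_dictionary(spaghetti)
print(len(vertices), references)
```

Eight stored coordinates become six, but nothing records that A and B are neighbours; that relationship would still have to be computed, which is the missing topology.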

VECTOR DATA MODELS
Dual Independent Map Encoding (DIME)
- developed by US Bureau of the Census
- nodes (intersections of lines) are identified with codes
- assigns a directional code in the form of a "from node" and a "to node"
- both street addresses and UTM coordinates are explicitly defined for each link

VECTOR TO RASTER EXCHANGE
- data exchange by translation (export and import) can lead to significant errors in attributes and in geometry.
- efficient data exchange is important for the future of GIS.


ADVANCED DATA MODELS - TIN
- triangulated irregular network is a set of elevation points which have been connected to form a network of triangles.
- developed in early 1970s as a simple way to build a surface
- the sample points are connected by lines to form triangles.
- within each triangle the surface is usually represented by a plane
- triangles fit together in a manner which simulates the face of the land.


ADVANCED DATA MODELS - TIN
- irregularly spaced sample points can be adapted to the terrain
  - rough terrain: more points
  - smooth terrain: fewer points
- an irregularly spaced sample is more efficient

ADVANCED DATA MODELS - TIN
- TINs can be seen as polygons having attributes of slope, aspect and area.
- three vertices having elevation attributes
- TIN models work best in areas with sharp breaks in slope
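Since each triangle is a plane through three (x, y, z) vertices, its slope and aspect follow from the plane's normal vector. A sketch under stated assumptions: x points east, y points north, aspect is the downslope compass bearing in degrees, and the function name is hypothetical:

```python
import math


def triangle_slope_aspect(p1, p2, p3):
    """Return (slope, aspect) in degrees for a TIN triangle.

    Aspect is None for a flat triangle, where no downslope direction exists.
    """
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # Normal vector of the plane: cross product of two triangle edges.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz == 0:
        raise ValueError("vertical or degenerate triangle")
    if nz < 0:  # make the normal point upward regardless of vertex order
        nx, ny, nz = -nx, -ny, -nz
    horizontal = math.hypot(nx, ny)
    slope = math.degrees(math.atan2(horizontal, nz))
    if horizontal == 0:
        return slope, None
    # The upward normal leans toward the downhill side.
    aspect = math.degrees(math.atan2(nx, ny)) % 360.0
    return slope, aspect


# Plane z = y: rises toward the north, so it faces south at 45 degrees.
slope, aspect = triangle_slope_aspect((0, 0, 0), (1, 0, 0), (0, 1, 1))
print(slope, aspect)
```

Because the gradient is constant inside a triangle, these two numbers describe every location within it.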


ADVANCED DATA MODELS - TIN
Advantages
- ability to describe the surface at different levels of resolution
- efficiency in storing data
- allows simple calculation of basin areas, slopes, channels, and many other geometric parameters
Disadvantages
- in many cases require visual inspection and manual control of the network

DATABASES
- a spatial database is a collection of spatially referenced data that acts as a model of reality
- these selected phenomena are deemed important enough to represent in digital form
- the digital representation might be for some past, present or future time period

DIGITAL DATABASES
- Scaleless: data can be stored at the level of detail found in the environment
- cartographer is responsible for choosing the content and resolution
- scale critical factor:
  - level of resolution set by field instruments
  - digitizing: resolution of instrument and abstraction and production factors

DIGITAL DATABASES
- problems when using data sets of different resolutions, e.g. roads may not line up
  - resolved using ancillary source materials
- additional problems when using data sets of different themes, e.g. combining elevation and drainage data: water running uphill or non-level lakes

DIGITAL DATABASES
Value of databases:
- Cost of creation: cheaper to get data from an existing database
- Appropriateness of use
- Lack of alternative data sources
- Graphic output

METADATA
- "data about the data"
- could include data elements that:
  - identify the data
  - describe projection, content, quality of data
  - identify the custodians and access conditions to the data
- describes the action taken when handling databases of varying scale

Dataset information
Title: Ortofotos'95
Abstract: Ortofotos'95 is a collection of ortho-rectified aerial photographs. These aerial photographs cover Portugal and were obtained in August 1995 in false color infra red film at scale 1:40 000. CNIG, The Directorate General of Forests and The Paper Mill industry are the owners of the aerial photographs (in paper format).
Type of dataset: Airborne data>Aerial photos
Locations: Portugal
Temporal Range: 1995-
Dataset scales: 1:25 000-1:50 000
Dataset resolution: 1 - 3 meters
Dataset quality remarks: Acquisition of data: aerial photographs; the film is scanned at very high resolution and ortho-rectified using DTM derived from topographic cartography at scale 1:25 000
Information creation date: 1999-10-29

DATABASES
- pre-1970s: command line based with read and write to hard disk, tapes, diskettes
- database approach: all reading and writing through simple interface (no need to care about tapes, etc.)
- small GIS projects: sufficient to store geographic information as simple files.
- with large data volumes and number of data users, best to use a database management system (DBMS)
- relational design has been the most useful (since 1980s)

DATABASE MANAGEMENT SYSTEMS
- contain tables or feature classes in which:
  - rows: entities, records, observations, features
    - all information about one occurrence of a feature
  - columns: attributes, fields, data elements, variables
    - one type of information for all features
- key field is an attribute whose values uniquely identify each row

Parcel Table
Parcel #   Address        Block   $ Value
8          501 N Hi       1       105,450
9          590 N Hi       2       89,780
36         1001 W. Main   4       101,500
75         1175 W. 1st    12      98,000

(Parcel # is the key field; each row is an entity; each column is an attribute.)

DATABASES - RDBM
- tables are related or joined using a common record identifier (column variable) present in both tables
Example:
- goal: produce map of values by district/neighbourhood
- problem: no district code available in parcel table

Parcel Table
Parcel #   Address        Block   $ Value
8          501 N Hi       1       105,450
9          590 N Hi       2       89,780
36         1001 W. Main   4       101,500
75         1175 W. 1st    12      98,000

DATABASES - RDBM

- solution: join parcel table containing values with geography table containing location codings, using Block as key field

Parcel Table
Parcel #   Address        Block   $ Value
8          501 N Hi       1       105,450
9          590 N Hi       2       89,780
36         1001 W. Main   4       101,500
75         1175 W. 1st    12      98,000

Block is the secondary or foreign key.

Geography Table
Block   District   Tract   City
1       A          101     Dallas
2       B          101     Dallas
4       B          105     Dallas
12      E          202     Garland
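The join described here can be sketched with plain Python dictionaries. The data are the two tables from the slide; the function names are hypothetical, and a real DBMS would express the same thing as a SQL join:

```python
parcels = [
    {"parcel": 8,  "address": "501 N Hi",     "block": 1,  "value": 105450},
    {"parcel": 9,  "address": "590 N Hi",     "block": 2,  "value": 89780},
    {"parcel": 36, "address": "1001 W. Main", "block": 4,  "value": 101500},
    {"parcel": 75, "address": "1175 W. 1st",  "block": 12, "value": 98000},
]

# Geography table keyed by Block, the field shared with the parcel table.
geography = {
    1:  {"district": "A", "tract": 101, "city": "Dallas"},
    2:  {"district": "B", "tract": 101, "city": "Dallas"},
    4:  {"district": "B", "tract": 105, "city": "Dallas"},
    12: {"district": "E", "tract": 202, "city": "Garland"},
}


def join_on_block(parcels, geography):
    """Attach the geography columns to every parcel with a matching block."""
    return [{**p, **geography[p["block"]]}
            for p in parcels if p["block"] in geography]


def value_by_district(joined):
    """Total parcel value per district, ready to be mapped."""
    totals = {}
    for row in joined:
        totals[row["district"]] = totals.get(row["district"], 0) + row["value"]
    return totals


joined = join_on_block(parcels, geography)
print(value_by_district(joined))
```

After the join every parcel row carries a district code, so the values can be summarized and mapped by district even though the parcel table never stored one.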

DATABASES - RDBM
Relational Linkages
[Figure: water right locations, with spatial attributes linked to descriptive attributes]

DATABASES
Advantage
- very flexible
- export data to another system easily
- enables simple operations, e.g. search for records satisfying some condition

Description                          Thickness     Code
New Ice                              <10 cm        1
Nilas, Ice Rind                      0-10 cm       2
Young Ice                            10-30 cm      3
Grey Ice                             10-15 cm      4
Grey-White Ice                       15-30 cm      5
First-Year Ice                       30-200 cm     6
Thin First-Year Ice                  30-70 cm      7
Thin First-Year Ice, first stage     30-50 cm      8
Thin First-Year Ice, second stage    50-70 cm      9
Medium First-Year Ice                70-120 cm     1.
Thick First-Year Ice                 120-200 cm    4.
Old Ice                                            7.
Second-Year Ice                                    8.
Multi-Year Ice                                     9.