University of Hawassa

Faculty of Agriculture

Department of Animal & Range Sciences

NaRM 326: Remote Sensing (RS) & Geographic Information System (GIS)
UNIT 2:

COMPUTER REPRESENTATIONS OF GEOGRAPHIC INFORMATION:

(RASTER DATA STRUCTURE / RASTER REPRESENTATIONS)

This lecture note introduces the raster data structure, which is particularly
useful for handling continuous geographic fields (continuous surfaces) but is
also often used for other types of data. In a raster database, the data is stored
as cell values in a matrix, and this is a very important difference from the
vector data structure.

In this example, a piece of land contains three classes (objects): lake, town and
forest. To convert this landscape to a raster data structure, a grid (matrix) is
overlaid on the landscape and each class is given a unique code (identifier),
in this case lake = 1, town = 2 and forest = 3. Each cell in the matrix represents
a certain area in the real world, depending on the size of the cell.
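A minimal sketch of the idea, assuming a small hypothetical 4 x 5 grid and an assumed cell size of 30 m (neither comes from the slide): the landscape becomes a matrix of class codes, and because every cell covers a fixed ground area, the area of each class is simply its cell count times the cell area. The names `LEGEND`, `grid`, `CELL_SIZE_M` and `area_per_class` are illustrative only.

```python
# Hypothetical 4 x 5 raster of the example landscape; codes follow the
# text: 1 = lake, 2 = town, 3 = forest.
LEGEND = {1: "lake", 2: "town", 3: "forest"}

grid = [
    [1, 1, 3, 3, 3],
    [1, 1, 3, 2, 2],
    [3, 3, 3, 2, 2],
    [3, 3, 3, 3, 3],
]

CELL_SIZE_M = 30  # assumed: each cell covers 30 m x 30 m on the ground


def area_per_class(grid, cell_size):
    """Count the cells holding each code and convert the count to area (m^2)."""
    counts = {}
    for row in grid:
        for code in row:
            counts[code] = counts.get(code, 0) + 1
    return {LEGEND[code]: n * cell_size ** 2 for code, n in counts.items()}


print(area_per_class(grid, CELL_SIZE_M))
```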

A raster database is made of columns and rows. The rows are numbered starting
from the upper-left corner of the database, unlike in a vector database, where the
origin of the coordinate system is at the lower-left corner. Each cell is
identified by an index made up of its column number and its row number. For
example, the index “(7, 5)” corresponds to the cell in column 7 and row 5.

Knowing the index of a cell is not enough to know where the cell is located
geographically (on Earth), since the index only gives the cell's position in the
matrix. To georeference the entire grid, reference points for at least two corners
of the grid are necessary. To calculate the location of an individual cell, we also
need to know the cell resolution (the area the cell covers in reality).

The true location of each cell (coordinates) can be calculated based on the
minimum X and Y coordinates of the grid. Most often, the coordinates represent
the location of the CENTER of the cell. Check the grid on the previous slide to
make sure that you understand how the computation works.
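The computation can be sketched as follows, assuming 1-based indices, columns counted from the left and rows counted from the top as described above, square cells, and coordinates given for the cell CENTER. Because rows start at the top edge, the sketch also needs the number of rows to work upward from the minimum Y coordinate. The function name `cell_center` and the example grid (8 rows, 10 m cells, lower-left corner at 0, 0) are hypothetical.

```python
def cell_center(col, row, x_min, y_min, n_rows, cell_size):
    """Return the (x, y) coordinates of the center of cell (col, row).

    Indices are 1-based; columns are counted from the left edge and rows
    from the top edge, so the row count is needed to measure up from y_min.
    """
    x = x_min + (col - 0.5) * cell_size
    y = y_min + (n_rows - row + 0.5) * cell_size
    return x, y


# Hypothetical grid: 8 rows of 10 m cells, lower-left corner at (0, 0).
# Cell (7, 5) is the index used as an example in the text.
print(cell_center(7, 5, x_min=0.0, y_min=0.0, n_rows=8, cell_size=10.0))
```

The half-cell offset (`- 0.5` and `+ 0.5`) is what moves the result from the cell's corner to its center.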

The value of a cell can represent a specific element or object in a landscape,
for example 1 = lake, but it can also be an ID number for that cell. The ID
number can then be linked to more complex attribute data (for example text
files, tables, video files or pictures). In most cases, however, the cell value is
of the first type: it represents an object in the real world, which means that
each type of thematic information is stored in a separate raster database.
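A minimal sketch of the second case, where the cell stores an ID that points into a separate attribute table. The grid, the IDs, the table contents and the `lookup` helper are all hypothetical; a dictionary stands in for whatever table, text file or media link a real GIS would use.

```python
# Hypothetical raster whose cell values are ID numbers rather than classes.
id_grid = [
    [101, 101, 102],
    [101, 103, 102],
]

# Hypothetical attribute table keyed by ID; the records could just as well
# point to text files, pictures or video files.
attributes = {
    101: {"class": "lake", "name": "Lake A", "photo": "lake_a.jpg"},
    102: {"class": "town", "name": "Town B", "photo": "town_b.jpg"},
    103: {"class": "forest", "name": "Forest C", "photo": "forest_c.jpg"},
}


def lookup(grid, col, row):
    """Return the attribute record for a cell (1-based column and row, rows from the top)."""
    cell_id = grid[row - 1][col - 1]
    return attributes[cell_id]


print(lookup(id_grid, 2, 2)["name"])
```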

The cell values can be either numbers or characters, depending on the needs of
the user. Numerical data can be stored in different formats, e.g. as byte (8-bit),
integer (16-bit) or real (32-bit) data, depending on whether decimals (fractions)
and/or negative values are needed and on the size of the numbers to be stored.
The database can also optionally be stored in BINARY (numbers), ASCII
(numbers + characters + symbols) or LOGICAL (expressions yielding true/false
results) format.
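The choice of numeric format can be sketched as a small rule: use byte if all values are whole numbers between 0 and 255, integer if they are whole numbers that fit in 16 bits, and real otherwise. The helper `smallest_type` and the `FORMATS` mapping are hypothetical; Python's standard `struct` module is used only to confirm the storage size of each format (`B` = unsigned 8-bit, `h` = signed 16-bit, `f` = 32-bit float).

```python
import struct

# The three formats named in the text, mapped to struct format codes.
FORMATS = {"byte": "B", "integer": "h", "real": "f"}


def smallest_type(values):
    """Pick the most compact of the three formats that can hold every value."""
    if all(float(v).is_integer() for v in values):
        lo, hi = min(values), max(values)
        if 0 <= lo and hi <= 255:
            return "byte"  # whole numbers 0..255
        if -32768 <= lo and hi <= 32767:
            return "integer"  # whole numbers, possibly negative, 16-bit range
    return "real"  # decimals, or whole numbers outside the 16-bit range


for vals in ([0, 1, 2, 3], [-12, 350], [12.5, 0.3]):
    kind = smallest_type(vals)
    print(vals, kind, struct.calcsize(FORMATS[kind]), "byte(s) per cell")
```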

When creating a raster database, the first step is to decide the resolution of the
grid (the size of the cells). It is very rare for the resolution to differ between the
X and Y dimensions, although this is theoretically possible; normally, the cells are
squares with equal X and Y resolutions. All cells must be given a value (if there is
nothing to represent in a cell, the value zero can be given, for example). Raster
data therefore requires a lot of storage space on your computer, because the
raster structure does not allow for “empty” cells (compare with the example on
the slide: the zero cells contain no useful information, but they still have to be
stored).

One may ask: “what happens if more than one geometrical object is found
within the same cell?” One solution is to code the cell with the class that covers
the dominant area within it. In this example, the forest covers a bigger area than
the other classes, so the cell could be coded as “forest”. Another method is to
code the cell with the class found at the center of the cell; in this example, the
cell would then be coded as “lake”. As you can see in the example, the result may
differ considerably depending on which coding method is selected. If the creator
of a raster database is aware of the algorithm used during the rasterization
process, he or she should always add this information to the documentation
(metadata) associated with the raster database!
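The two coding rules can be compared with a short sketch. It assumes a hypothetical cell that has been split into a 5 x 5 sub-grid of samples (codes as before: 1 = lake, 2 = town, 3 = forest), laid out so that forest covers most of the cell while lake lies at its center, as in the slide's example. The functions `dominant_class` and `center_class` are illustrative names, not a particular GIS tool's rasterization options.

```python
from collections import Counter

# Hypothetical cell split into a 5 x 5 sub-grid of samples.
cell = [
    [3, 3, 3, 3, 3],
    [3, 3, 1, 1, 3],
    [3, 3, 1, 1, 2],
    [3, 3, 3, 2, 2],
    [3, 3, 3, 3, 2],
]


def dominant_class(samples):
    """Code that covers the largest share of the cell (most frequent sample)."""
    counts = Counter(code for row in samples for code in row)
    return counts.most_common(1)[0][0]


def center_class(samples):
    """Code found at the center of the cell (odd-sized sub-grid assumed)."""
    return samples[len(samples) // 2][len(samples[0]) // 2]


print(dominant_class(cell), center_class(cell))  # 3 (forest) vs 1 (lake)
```

Here forest covers 17 of the 25 samples, so the dominant-area rule gives forest, while the center rule gives lake: the same cell ends up with two different codes depending on the method.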

A very important problem with the raster data structure is that it does not permit
the user to know anything about what happens inside a cell. The cell is the
smallest unit in the database and anything that is smaller than the cell will not
show in the database. The following three slides will illustrate this for different
object types.

If point data is stored in a raster structure, it is not possible to know exactly
where the points were situated within the cells. To increase the precision, the
cell size should be reduced (but more cells = more data = more storage space on
your computer!). There is always a trade-off between resolution and memory.
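A minimal sketch of this trade-off, assuming a hypothetical 100 m x 100 m area whose upper-left corner is at (0, 100) and two hypothetical points. With large cells both points fall in the same cell, so their separate positions are lost; smaller cells separate them, but the number of cells grows with the square of the reduction in cell size. The helper `point_to_cell` is illustrative and uses 1-based indices with rows counted from the top.

```python
import math


def point_to_cell(x, y, x_min, y_max, cell_size):
    """Return the 1-based (col, row) of the cell containing point (x, y).

    Rows are counted downward from the top edge (y_max), as in the text.
    """
    col = math.floor((x - x_min) / cell_size) + 1
    row = math.floor((y_max - y) / cell_size) + 1
    return col, row


# Hypothetical 100 m x 100 m area with its upper-left corner at (0, 100).
EXTENT = 100.0
p1, p2 = (12.0, 88.0), (52.0, 61.0)

for size in (100.0, 50.0, 10.0):
    n_cells = int(EXTENT / size) ** 2
    c1 = point_to_cell(*p1, x_min=0.0, y_max=EXTENT, cell_size=size)
    c2 = point_to_cell(*p2, x_min=0.0, y_max=EXTENT, cell_size=size)
    print(f"{size:>5} m cells: {n_cells:>3} cells, p1 -> {c1}, p2 -> {c2}")
```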

Information about the exact location of linear objects is lost in a similar way when
translating to raster data. In this example, the red and the black line networks
will be represented in exactly the same manner in the raster data structure,
despite the fact that they are very different from each other in reality.
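The same loss can be sketched for lines. This assumes two hypothetical polylines, one straight and one zigzag, drawn inside the same row of 10 m cells. For brevity the sketch marks the cells a line touches by sampling many points along each segment; this is a simplification chosen for illustration, not a standard line-rasterization algorithm such as Bresenham's. Cells here are 0-based and counted from the lower-left corner to keep the code short.

```python
def rasterize_line(vertices, cell_size, steps=100):
    """Return the set of (col, row) cells touched by a polyline.

    Each segment is sampled at `steps` + 1 evenly spaced points; a cell is
    marked if any sample falls inside it. Indices are 0-based from the
    lower-left corner.
    """
    cells = set()
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        for i in range(steps + 1):
            t = i / steps
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            cells.add((int(x // cell_size), int(y // cell_size)))
    return cells


straight = [(0, 5), (29, 5)]
zigzag = [(0, 1), (5, 9), (10, 1), (15, 9), (20, 1), (25, 9), (29, 1)]

# Both lines touch exactly the same three cells, so the raster cannot tell them apart.
print(rasterize_line(straight, 10) == rasterize_line(zigzag, 10))
```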

An important advantage of raster data is that continuous surfaces (continuous
geographic fields) can be represented in a very realistic manner. Topography or
temperature, for example, occur everywhere and vary gradually over a surface,
and are consequently ideal for storing in a raster database.

To summarize the particularities of the raster data structure: it does not allow for
empty cells, which in turn makes raster databases large in terms of storage space
on the computer's hard disk. However, more sophisticated raster software
normally uses different types of data compression, similar to those used for
compressing image files, to reduce the storage space.
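The text does not say which compression methods particular programs use; as one simple illustration, here is a sketch of run-length encoding, a technique commonly applied to raster and image data. It stores each run of identical cell values once, together with its length, so long stretches of zero cells shrink to a single pair. The function names and the example row are hypothetical.

```python
def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))  # start a new run
    return runs


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]


# Hypothetical raster row: mostly zero ("nothing here") cells.
row = [0, 0, 0, 0, 0, 0, 3, 3, 3, 0, 0, 0]
encoded = rle_encode(row)
print(encoded)  # [(0, 6), (3, 3), (0, 3)]
```

Twelve cell values become three pairs, and decoding restores the row exactly, so no information is lost.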

Factors influencing the size of a raster database are: the number of columns and
rows, which in turn depends on the cell size if the database is to cover the same
geographical area; the data type, which depends on the type of numerical data
stored (e.g. if only integers between 0 and 255 are to be stored, the data can be
stored as Byte data, whereas values of very low or very high magnitude with
several decimals require the Real data type); the file type (e.g. whether the data
file is stored in a compact binary format instead of the more storage-demanding
ASCII format); and the data compression type. In most GIS software handling
raster data, the user has at least limited control over how data is stored, and as a
general rule a database should not be bigger (in terms of memory) than needed.
This means that the user must select data types, etc. that are appropriate for the
type of data being stored.
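The first three factors combine into a simple estimate for an uncompressed binary raster: number of columns x number of rows x bytes per cell. The sketch below assumes a hypothetical 10 km x 10 km study area, counts partial cells at the edge as whole cells, and ignores file headers and compression; `raster_size_bytes` is an illustrative name.

```python
import math

# Bytes per cell for the three numeric data types named in the text.
BYTES_PER_CELL = {"byte": 1, "integer": 2, "real": 4}


def raster_size_bytes(width_m, height_m, cell_size_m, data_type):
    """Estimate the uncompressed size of a binary raster in bytes.

    size = columns x rows x bytes per cell; partial cells at the edge are
    counted as whole cells. File headers and compression are ignored.
    """
    n_cols = math.ceil(width_m / cell_size_m)
    n_rows = math.ceil(height_m / cell_size_m)
    return n_cols * n_rows * BYTES_PER_CELL[data_type]


# Hypothetical 10 km x 10 km study area at different resolutions and types.
for cell, dtype in ((100, "byte"), (10, "byte"), (10, "real")):
    size = raster_size_bytes(10_000, 10_000, cell, dtype)
    print(f"{cell:>3} m, {dtype:<4}: {size:>9,} bytes")
```

Going from 100 m to 10 m cells multiplies the size by 100, and switching from Byte to Real multiplies it by another 4.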

Many GIS software programs offer the possibility to save the data as ASCII or
binary data. In a computer, the bit is the smallest possible unit of information and
can be considered a sort of “switch” that is either ON (1) or OFF (0). Eight (8) bits
form a byte. Here is an example of how a number can be stored using one byte:
you start at the RIGHT and ADD the values of all the bits that are 1. In a byte,
reading from left to right, the bits have the place values
$2^7, 2^6, 2^5, 2^4, 2^3, 2^2, 2^1, 2^0$ (that is, 128, 64, 32, 16, 8, 4, 2, 1)
(note: the base 2 is why it is called BINARY!). If all bits have the value 1, the
byte equals 255, while if they are all 0, the byte equals 0. If data is stored as
ASCII, at least one byte is needed for each single digit: e.g. the number 57 on the
slide, which can be represented with one byte when stored as binary data, will
require two bytes when stored as ASCII data, one for the digit “5” and one for the
digit “7”.
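The byte arithmetic and the binary/ASCII comparison can be checked with a short sketch. The helper `byte_value` is a hypothetical name; it follows the rule in the text (start at the right, add the place value of every bit that is 1). The bit pattern `00111001` is simply the binary form of 57, not necessarily the one drawn on the slide.

```python
def byte_value(bits):
    """Value of an 8-bit string such as "00111001".

    Start at the RIGHT: the rightmost bit is worth 2**0, the next 2**1,
    and so on up to 2**7 for the leftmost bit; add the values of the 1 bits.
    """
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly eight 0/1 characters")
    return sum(2 ** i for i, b in enumerate(reversed(bits)) if b == "1")


print(byte_value("00111001"))     # 57
print(byte_value("11111111"))     # 255
print(len(bytes([57])))           # 1: 57 stored as one binary byte
print(len("57".encode("ascii")))  # 2: one ASCII byte per digit
```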

The main points to consider for the raster data structure are: the way of storing
data is quite simple and easy to understand; data handling can be somewhat slow
if the cells are small and large areas are to be represented; raster data is very
efficient for representing continuous surfaces; and the data format is particularly
suitable for combining with remote sensing data, since this type of data is always
stored in raster format.

In summary, discrete data is ideally represented in a vector database. In this
example, a person walks from point A to point B and crosses a number of private
properties. The borders of the properties are very accurately defined and
discrete, so the person passes immediately from one property to the next.

Now consider the same route as in the previous slide, but with topography
instead of properties. Topography has no discrete borders; on the contrary, it
varies continuously along the route, and consequently this type of data is better
represented with raster data. Today, most GIS software programs can handle both
vector and raster data models.

(VECTOR DATA STRUCTURE / VECTOR REPRESENTATIONS)
