https://sites.google.com/site/computing9691/

Chapter 2.3 Data types and data structures


2.3 (a) define and use different data types e.g. integer, real, Boolean, character and
string

A data type is a method of interpreting a pattern of bits.
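
As a minimal sketch of this idea, the Python fragment below reads the same four bytes as an integer, as a real number and as text. The byte values and the big-endian byte order are arbitrary choices made for illustration.

```python
import struct

# The same four bytes, interpreted under three different data types.
raw = bytes([0x42, 0x48, 0x00, 0x00])

as_integer = struct.unpack(">i", raw)[0]   # 32-bit signed integer, big-endian
as_real = struct.unpack(">f", raw)[0]      # 32-bit IEEE 754 floating-point number
as_text = raw[:2].decode("ascii")          # first two bytes read as ASCII characters

print(as_integer)   # 1112014848
print(as_real)      # 50.0
print(as_text)      # BH
```

The bit pattern never changes; only the data type used to interpret it does.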

Intrinsic data types


Intrinsic data types are the data types that are defined within a particular programming language.

There are numerous different data types. They are used to make the storage and processing of
data easier and more efficient. Different databases and programming systems have their own set
of intrinsic data types, but the main ones are:

• Integer;
• Real;
• Boolean;
• String;
• Character;
• Date;
• Container.

Integer
An integer is a positive or negative number that does not contain a fractional part. Integers are
held in pure binary for processing and storage. Note that some programming languages
differentiate between short and long integers (more bytes being used to store long integers).

Real
A real is a number that can have a fractional part, written with a decimal point (for example,
36.85). In many systems, real numbers are referred to as singles and doubles, depending upon the
number of bytes in which they are stored.

Boolean
A Boolean is a data type that can store one of only two values – usually these values are True or
False. In many systems a Boolean is stored in one byte, for example with True stored as 11111111
and False as 00000000, although the exact representation depends on the language.

String
A string is a series of characters, usually written in quotation marks in program code. A string
is sometimes just referred to as ‘text’. Any type of alphabetic or numeric data can be stored as a
string: “Birmingham City”, “3/10/03” and “36.85” are all examples of strings. Each character
within a string may be stored in one byte using its ASCII code; modern systems often use a
Unicode encoding, which may take two or more bytes per character. In principle the maximum
length of a string is limited only by the available memory, although some languages impose their
own limit.

Notes:
• if dates or numbers are stored as strings then they will not be sorted correctly; they will
be sorted according to the ASCII codes of the characters – “23” will be placed before
“9”;
• telephone numbers must be stored as strings or the leading zero will be lost.
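
Both notes can be checked with a short Python sketch. The telephone number is a made-up value used only for illustration.

```python
numbers_as_text = ["9", "23", "100"]

# Strings are compared character by character using their character codes,
# so "100" and "23" both come before "9".
print(sorted(numbers_as_text))            # ['100', '23', '9']

# Converting to integers first gives the numeric order.
print(sorted(numbers_as_text, key=int))   # ['9', '23', '100']

# A telephone number held as an integer loses its leading zero.
phone_text = "01214960000"   # hypothetical number, for illustration only
print(int(phone_text))       # 1214960000
```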

Character
A character is any letter, digit, punctuation mark or space, which takes up a single unit of
storage (usually a byte).

Dates
In many computer systems dates are stored as a ‘serial’ number that counts a unit of time from a
fixed reference date – for example, the number of seconds since January 1st, 1904 (thus the serial
number also contains the time). The reference date and the unit differ between systems. Although
the serial numbers are used for processing purposes, the results are usually presented in one of
several ‘standard’ date formats – for example, dd/mm/yyyy, or dd MonthName, yyyy. Dates usually
take 8 bytes of storage.
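
A minimal sketch of a serial date in Python, assuming the 1904 reference date mentioned above. The names EPOCH, to_serial and from_serial are invented for this example.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1904, 1, 1)   # reference date assumed for this sketch

def to_serial(moment):
    """Return the whole number of seconds between EPOCH and moment."""
    return int((moment - EPOCH).total_seconds())

def from_serial(serial):
    """Convert a serial number of seconds back into a date and time."""
    return EPOCH + timedelta(seconds=serial)

serial = to_serial(datetime(1904, 1, 2, 0, 0, 30))
print(serial)                                      # 86430
print(from_serial(serial).strftime("%d/%m/%Y"))    # 02/01/1904
print(from_serial(serial).strftime("%d %B, %Y"))   # 02 January, 1904
```

Arithmetic and comparisons are done on the serial number; the formatted text is produced only for display.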

Comparison of the common data types:

2.3 (b) define and use arrays (one- and two-dimensional) for solving simple
problems (this should include initialising arrays, reading data into arrays and
performing a simple serial search on a one-dimensional array)
A data structure is a collection of different data items that are stored together in a clearly defined
way. Two common data structures are arrays and records.

Array
An array is a data structure that allows a set of items of identical data type to be stored
together using the same identifier name.

Arrays are declared in a similar way to standard variables, except that the array size and
dimensions are included. For example, the following declares an array that reserves five
locations in memory and labels these as ‘Names’:

DIM Names(4) As String

The five individual locations are Names(0), Names(1), Names(2), Names(3), Names(4).
Each data item is called an element of the array. To reference a particular element a programmer
must use the appropriate index. For example, the following statement assigns data to the 5th
element:

Names(4) = "Johal"

Arrays simplify the processing of similar data. An algorithm for getting five names from the user
and storing them in the array Names is shown below:

Dim Names(4) As String

For i = 0 To 4
    Input Value
    Names(i) = Value
Next i
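
The same algorithm can be sketched in Python. A Python list stands in for the array, and a fixed list of sample names stands in for keyboard input so that the example runs unattended; read_names is a name invented here.

```python
def read_names(source, count=5):
    """Read `count` values from `source` into a fixed-size list."""
    names = [""] * count           # equivalent of Dim Names(4) As String
    for i in range(count):         # i runs from 0 to 4
        names[i] = next(source)    # equivalent of Input Value, Names(i) = Value
    return names

# Sample data only; a real program would read each value from the user.
sample = iter(["Jones", "Smith", "Patel", "Khan", "Johal"])
names = read_names(sample)
print(names[4])   # Johal
```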

One-dimensional arrays
A one-dimensional array is a data structure in which the array is declared using a single index
and can be visually represented as a list.

The following diagram shows the visual representation of the array Names(4):

Two-dimensional arrays
A two-dimensional array is a data structure in which the array is declared using two indices and
can be visually represented as a table.

The following diagram shows the visual representation of an array Students(4,2):

Each individual element can be referenced by its row and column indices. For example:

Students(0,0) is the data item “JONES”
Students(2,1) is the item “M”
Students(1,2) is the item “LAMBOURNE”
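
As a hedged illustration, a two-dimensional array can be modelled in Python as a list of rows. Only the three elements quoted above are filled in; every other element is left empty because its contents are not given here.

```python
ROWS, COLUMNS = 5, 3   # Students(4,2) has row indices 0..4 and column indices 0..2

# Build the table row by row so that every row is a separate list.
students = [["" for _ in range(COLUMNS)] for _ in range(ROWS)]

# Element (row, column) is written students[row][column].
students[0][0] = "JONES"
students[2][1] = "M"
students[1][2] = "LAMBOURNE"

print(students[0][0])     # JONES
print(len(students))      # 5
print(len(students[0]))   # 3
```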

Initialising an array
Initialising an array is a procedure in which every element of the array is set to a starting
value – typically “” for a string array, or 0 for a numeric array.

Initialisation ensures that the array does not contain values left over from a previous use
elsewhere in the program.

Algorithm for initialising a one-dimensional numeric array:

DIM TestScores(9) As Integer
DIM Index As Integer
FOR Index = 0 TO 9
    TestScores(Index) = 0
NEXT

Algorithm for initialising a two-dimensional string array:

DIM Students(4,2) As String
DIM RowIndex As Integer, ColumnIndex As Integer
FOR RowIndex = 0 TO 4
    FOR ColumnIndex = 0 TO 2
        Students(RowIndex,ColumnIndex) = ""
    NEXT
NEXT
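
In Python, both initialisations can be written in one line each. The sketch below also shows a common pitfall that is specific to Python lists, not to the pseudo-code above: repeating a row with * makes every row the same object.

```python
# One-dimensional numeric array: ten elements, all set to 0.
test_scores = [0] * 10

# Two-dimensional string array: 5 rows by 3 columns, all set to "".
students = [[""] * 3 for _ in range(5)]

# Pitfall: this form repeats ONE row object five times, so changing
# one row appears to change them all.
shared = [[""] * 3] * 5
shared[0][0] = "X"

students[0][0] = "X"
print(shared[1][0])     # X  (unintended)
print(students[1][0])   # prints an empty line (row 1 is untouched)
```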

Serial search on an array

The following pseudo-code can be used to search an array to see if an item X exists:

01 DIM Index As Integer
02 DIM Flag As Boolean
03 Index = 0
04 Flag = False
05 Input X
06 REPEAT
07     IF TheArray(Index) = X THEN
08         Output Index
09         Flag = True
10     END IF
11     Index = Index + 1
12 UNTIL Flag = True OR Index > Maximum Index Of TheArray

Note that the variable Flag (lines 04 and 09) is used to indicate when the item has been found and
to stop the loop repeating unnecessarily (line 12 ends the loop if Flag has been set to True).

To complete the search algorithm, some lines should be added after the loop to handle the case
where the item X was not found in the array:

13 IF Flag = False THEN
14     Show Message "Item not found"
15 END IF
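
A minimal Python sketch of the same serial search. It differs from the pseudo-code in two assumed details: it tests the loop condition before each pass, so an empty array is handled safely, and it returns -1 rather than showing a message when the item is missing.

```python
def serial_search(the_array, x):
    """Return the index of the first element equal to x, or -1 if absent."""
    index = 0
    found = False   # plays the role of Flag
    while not found and index < len(the_array):
        if the_array[index] == x:
            found = True
        else:
            index += 1
    return index if found else -1

names = ["Jones", "Smith", "Patel", "Khan", "Johal"]   # sample data
print(serial_search(names, "Patel"))   # 2
print(serial_search(names, "Lee"))     # -1
```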

2.3 (c) design and implement a record format

User-defined data types


At times it is useful for programmers to be able to define their own data types. In such cases it is
common for these user-defined data types to contain items from several of the different intrinsic
types mentioned above.

User-defined Types (UDTs) are VB's way of implementing data structures. In C/C++, they are
called structs, in Pascal and COBOL they are called records (they are also called "group" items
in COBOL).

The following rules apply to UDTs:

• UDTs may be declared only at the module level (you may not declare a UDT in an individual
Sub or Function);
• UDTs may have Public (project-level) or Private (module-level) scope. If the keyword Public or
Private is omitted, the default is Public;
• UDTs with Public scope may only be defined in standard modules, not forms.

The syntax for defining a UDT is:

[Public | Private] Type TypeName
    Variable1 As datatype
    ...
    Variablen As datatype
End Type

For example, to define a UDT for an employee record, you might code the following:

Public Type EmployeeRecord
    strEmpName As String
    dtmHireDate As Date
    sngHourlyRate As Single
End Type

However, the definition alone is not enough. The Type definition is basically a "template" on
which other variables are defined; the template itself does not store any data. To use a UDT, you
must define a variable "As" the name following the keyword "Type" (in this case,
"EmployeeRecord"). For example:

Dim udtEmpRec As EmployeeRecord

The above defines a variable called "udtEmpRec" which has the attributes defined by the
structure "EmployeeRecord". Thus, it is "udtEmpRec" which you refer to in your procedural
statements, NOT "EmployeeRecord". To reference an individual element of the structure, you
must qualify that element with the "udt" variable that you defined. For example, the following
code places data in the individual elements of udtEmpRec:

udtEmpRec.strEmpName = "JOE SMITH"
udtEmpRec.dtmHireDate = #1/15/2001#
udtEmpRec.sngHourlyRate = 25.50
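
For comparison, a record of this kind can be sketched in Python with a dataclass. This is an analogue rather than a translation of the VB code: the field names are adapted to Python style and the default hire date is a placeholder assumed for the example.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class EmployeeRecord:
    # The class is only a template; no data is stored until a variable is created.
    emp_name: str = ""
    hire_date: date = date(1900, 1, 1)   # placeholder default, an assumption
    hourly_rate: float = 0.0

emp_rec = EmployeeRecord()               # equivalent of Dim udtEmpRec As EmployeeRecord
emp_rec.emp_name = "JOE SMITH"
emp_rec.hire_date = date(2001, 1, 15)    # #1/15/2001# is 15 January 2001
emp_rec.hourly_rate = 25.50

print(emp_rec.emp_name)         # JOE SMITH
print(emp_rec.hire_date.year)   # 2001
```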

You can declare any number of variables "As" the UDT that you have defined. For example:

Dim udtEmpRec2 As EmployeeRecord
Dim audtEmpRec(1 To 10) As EmployeeRecord

Note that the second definition above declares an array of the EmployeeRecord type. To refer to
an individual field (such as strEmpName) in a particular "record number", you would code
something like the following:

audtEmpRec(5).strEmpName = "BILL JONES"

You can also have an array at the elementary level. For example:

Public Type EmployeeRecord
    strEmpName As String
    dtmHireDate As Date
    sngHourlyRate As Single
    dblQuarterlyEarnings(1 To 4) As Double
End Type

If you have this declaration:

Dim udtEmpRec As EmployeeRecord

Then a valid reference would be:

Print udtEmpRec.dblQuarterlyEarnings(2)

If you have this declaration:

Dim udtEmpRec(1 To 5) As EmployeeRecord

Then a valid reference would be:

Print udtEmpRec(3).dblQuarterlyEarnings(2)

You can have nested UDTs. Consider the following declarations:

Public Type EmployeeName
    FirstName As String
    MidInit As String
    LastName As String
End Type

Public Type EmployeeRecord
    udtEmpName As EmployeeName
    dtmHireDate As Date
    sngHourlyRate As Single
    dblQuarterlyEarnings(1 To 4) As Double
End Type

Dim udtEmpRec As EmployeeRecord

You would reference the EmployeeName fields as follows:

udtEmpRec.udtEmpName.FirstName
udtEmpRec.udtEmpName.MidInit
udtEmpRec.udtEmpName.LastName
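
A hedged Python sketch of the same ideas: a record nested inside a record, an array inside a record, and an array of records. The hire date field is left out for brevity, Python lists are indexed from 0 rather than 1, and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EmployeeName:
    first_name: str = ""
    mid_init: str = ""
    last_name: str = ""

@dataclass
class EmployeeRecord:
    # default_factory gives every record its own name and its own list.
    emp_name: EmployeeName = field(default_factory=EmployeeName)
    hourly_rate: float = 0.0
    # Four quarterly figures; index 0 holds quarter 1 (VB uses 1 To 4).
    quarterly_earnings: List[float] = field(default_factory=lambda: [0.0] * 4)

# Equivalent of Dim udtEmpRec(1 To 5) As EmployeeRecord
emp_recs = [EmployeeRecord() for _ in range(5)]
emp_recs[2].emp_name.last_name = "JONES"     # sample value
emp_recs[2].quarterly_earnings[1] = 5200.0   # sample figure for quarter 2

print(emp_recs[2].emp_name.last_name)        # JONES
print(emp_recs[2].quarterly_earnings[1])     # 5200.0
```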

Benefits of defined data types

Whether intrinsic or user-defined, data types within a programming language:
• enable the compiler to reserve the correct amount of memory for the data – e.g. 4 bytes
for an integer;
• trap errors that a programmer has made and errors that a user of a program can make – a
variable defined as an integer cannot be given a fractional value;
• restrict the values that can be given to the data – a Boolean cannot be given the value
“maybe”;
• restrict the operations that can be performed on the data – a string cannot be divided by
10.
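
The last two points can be illustrated in Python, with one caveat: Python checks types while the program runs rather than at compile time, so the errors appear as exceptions. The helper names divide_by_ten and to_integer are invented for this sketch.

```python
def divide_by_ten(value):
    """Divide value by 10, reporting a type error instead of crashing."""
    try:
        return value / 10
    except TypeError:
        return "type error: cannot divide " + type(value).__name__

def to_integer(text):
    """Convert text to an integer, or return None if it is not a whole number."""
    try:
        return int(text)
    except ValueError:
        return None

print(divide_by_ten(40))        # 4.0
print(divide_by_ten("36.85"))   # type error: cannot divide str
print(to_integer("42"))         # 42
print(to_integer("3.5"))        # None
```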

Using an array with differing data types

If an array is to be used to store data of different data types, then:

1. The different data must be defined within a Record:

RECORD Student
    Name: String
    Gender: Character
    Age: Integer
END RECORD

2. The array may now be declared to contain items of the record type Student:

DIM MyArray(6) As Student

3. Each record can now be stored as a single item within the array.
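
The three steps above can be sketched in Python as follows. The student details are sample data, and the list of seven items mirrors indices 0 to 6.

```python
from dataclasses import dataclass

# Step 1: define the record.
@dataclass
class Student:
    name: str = ""
    gender: str = ""
    age: int = 0

# Step 2: declare an array of seven records (indices 0..6).
my_array = [Student() for _ in range(7)]

# Step 3: store a whole record as a single item of the array.
my_array[0] = Student("Jones", "M", 17)   # sample data

print(my_array[0].name)   # Jones
print(my_array[0].age)    # 17
```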

Key terms
The following terms are used to describe parts of a database:

Field
A field is a single category of data within a database, which appears in all the records of a table –
it is a column within a table.

Record
A record is a collection of fields that contains data about a single item or person – it is a row
within a table.

Table
A table is a collection of related records.

Key fields
A key field is used to identify the records within a database. There are two types of keys:
• primary key;
• secondary key.

Primary key
The primary key is a unique field that identifies a single record.
Some ‘natural’ primary keys are:
• CarRegistrationNumber;
• ISBN – a 10- or 13-digit code that uniquely identifies a book;
• MAC address – a 6-byte number that uniquely identifies a network card;
• NationalInsuranceNumber – can uniquely identify employees of a company (not
usable for under 16s or for non-British nationals!).

Secondary key
A secondary key is a non-unique field used in a search; a search on it may produce more than
one matching record.

Some typical secondary keys are:
• LastName;
• PostCode;
• DateOfBirth.
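
The difference between the two kinds of key can be sketched in Python. The customer records, field names and function names below are all invented for illustration.

```python
customers = [
    {"ref_code": "C001", "last_name": "Jones", "post_code": "B1 1AA"},
    {"ref_code": "C002", "last_name": "Smith", "post_code": "B2 2BB"},
    {"ref_code": "C003", "last_name": "Jones", "post_code": "B3 3CC"},
]

# A primary key is unique, so it can index the records directly.
by_ref_code = {c["ref_code"]: c for c in customers}

def find_by_primary_key(ref_code):
    """Return the single matching record, or None."""
    return by_ref_code.get(ref_code)

def find_by_secondary_key(last_name):
    """Return every matching record; there may be none, one or many."""
    return [c for c in customers if c["last_name"] == last_name]

print(find_by_primary_key("C002")["last_name"])   # Smith
print(len(find_by_secondary_key("Jones")))        # 2
```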

2.3 (d) estimate the size of a file from its structure and the number of records
The basic formula for estimating the size of a file is:

If we consider a file with 200 records, which stores the details of an organisation’s customers:

CUSTOMER(RefCode, Name, PostCode, Telephone, DoB, Age)

We can estimate the size of the record as follows:

Thus 200 records would require:

Note that to determine the maximum field length, an extreme case was considered and several
bytes were added to be safe.
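
A minimal sketch of such an estimate in Python, assuming that the file size is roughly the record size multiplied by the number of records, plus an allowance for overheads. The field sizes and the 10% overhead figure are illustrative assumptions, not values taken from these notes.

```python
# Field sizes in bytes. These figures are illustrative assumptions only.
FIELD_SIZES = {
    "RefCode": 6,
    "Name": 30,
    "PostCode": 8,
    "Telephone": 15,
    "DoB": 8,
    "Age": 1,
}

def estimate_file_size(field_sizes, record_count, overhead_percent=10):
    """Estimate a file size in bytes from its record structure."""
    record_size = sum(field_sizes.values())          # size of one record
    data_size = record_size * record_count           # size of all records
    overhead = data_size * overhead_percent // 100   # assumed allowance
    return data_size + overhead

print(sum(FIELD_SIZES.values()))              # 68
print(estimate_file_size(FIELD_SIZES, 200))   # 14960
```

With these assumed sizes, 200 records need about 14,960 bytes, which is a little under 15 KB.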

2.3 (e) store, retrieve and search for data in files


2.3 (f) use the facilities of a procedural language to perform file operations
(opening, reading, writing, updating, inserting, appending and closing) on
sequential files

Handling data within files


Before a program can use a file, the file needs to be opened. The program needs to specify whether the
file is to be opened for writing, or opened only for reading. After the data has been read from or written
to the file, the file then needs to be closed.
All algorithms for handling data within files must have the following lines:

OPEN [New/Existing] File in READ/WRITE MODE
...
CLOSE All Files

Note that these lines are not necessarily at the beginning and the end of the code, but they must be in
place to make sure that the file(s) is opened and closed correctly.

Adding data
Serial file
Adding data is simple – it is added to the end of the file:

OPEN File in WRITE MODE
GOTO End of File
WRITE NewData
CLOSE File
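
A minimal Python sketch of adding data to a serial file, assuming a text file with one record per line. Python has a dedicated append mode ("a") that positions writes at the end of the file, so no separate "go to end" step is needed; plain write mode ("w") would erase the existing contents. The function name and demonstration file are invented for this example.

```python
import os
import tempfile

def append_record(path, new_data):
    """Add one line of data to the end of a serial (unordered) text file."""
    with open(path, "a", encoding="utf-8") as f:   # "a" writes at the end of the file
        f.write(new_data + "\n")
    # Leaving the with-block closes the file, even if the write fails.

# Demonstration on a temporary file.
demo_path = os.path.join(tempfile.mkdtemp(), "serial.txt")
append_record(demo_path, "Jones")
append_record(demo_path, "Smith")

with open(demo_path, encoding="utf-8") as f:
    print(f.read().splitlines())   # ['Jones', 'Smith']
```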

Sequential file
The addition of data to a sequential file is more complicated than in a serial file, because the record must
be inserted into the correct position – not just at the end of the file.
In practice, when a record is added to a sequential file, it is usual to create a new file and copy all the
records from the old file, but insert the new record in its correct position.

An algorithm for this is shown below:

OPEN a NewFile in WRITE MODE
OPEN ExistingFile in READ MODE
READ First Record in ExistingFile
REPEAT
    IF key of SelectedRecord in ExistingFile < key of NewRecord THEN
        COPY SelectedRecord into NewFile
    ELSE
        COPY NewRecord into NewFile
        COPY SelectedRecord into NewFile
    END IF
    READ Next Record in ExistingFile
END REPEAT when NewRecord has been copied OR when EOF is reached
IF NewRecord has not been copied THEN
    COPY NewRecord into NewFile
ELSE
    COPY ALL remaining records from ExistingFile into NewFile
END IF
CLOSE NewFile and ExistingFile
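
The copy-and-insert approach can be sketched in Python. The record layout is an assumption made for the example: one record per line, with an integer key before the first comma. The names key_of and insert_record are invented here.

```python
import os
import tempfile

def key_of(line):
    """Key field: the integer before the first comma (assumed format)."""
    return int(line.split(",", 1)[0])

def insert_record(existing_path, new_path, new_line):
    """Copy existing_path to new_path, placing new_line in key order."""
    inserted = False
    with open(existing_path, encoding="utf-8") as old, \
         open(new_path, "w", encoding="utf-8") as new:
        for line in old:
            line = line.rstrip("\n")
            if not inserted and key_of(line) >= key_of(new_line):
                new.write(new_line + "\n")   # new record goes just before this one
                inserted = True
            new.write(line + "\n")           # every existing record is copied
        if not inserted:                     # new key is the largest: add at the end
            new.write(new_line + "\n")

# Demonstration with sample data in a temporary folder.
folder = tempfile.mkdtemp()
old_path = os.path.join(folder, "existing.txt")
new_path = os.path.join(folder, "new.txt")
with open(old_path, "w", encoding="utf-8") as f:
    f.write("10,Jones\n20,Smith\n40,Patel\n")

insert_record(old_path, new_path, "30,Khan")
with open(new_path, encoding="utf-8") as f:
    print(f.read().splitlines())   # ['10,Jones', '20,Smith', '30,Khan', '40,Patel']
```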

Searching for/retrieving data


Serial file
To retrieve data from a serial file, a program must examine the first record and then all subsequent
records until the desired one is found or until the end of the file has been reached.
The following algorithm does this:

OPEN File in READ MODE
READ First Record
SET Variable Found = False
REPEAT
    IF RequiredRecord = SelectedRecord THEN
        SET Variable Found = True
    ELSE
        READ Next Record
    END IF
END REPEAT when Found = True OR when EOF is reached
CLOSE File

Note that to be sure that a record does not exist in a serial file, every single record must be examined.
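
A hedged Python sketch of the serial-file search, assuming one record per line with the key before the first comma. The function name and sample records are invented for this example.

```python
import os
import tempfile

def find_in_serial_file(path, required_key):
    """Return the first record whose key matches, or None if there is none."""
    with open(path, encoding="utf-8") as f:   # open the file for reading
        for line in f:                        # read records one at a time until EOF
            record = line.rstrip("\n")
            key = record.split(",", 1)[0]
            if key == required_key:
                return record                 # found: stop reading
    return None                               # EOF reached without a match

# Demonstration: the records are in no particular order.
serial_path = os.path.join(tempfile.mkdtemp(), "serial.txt")
with open(serial_path, "w", encoding="utf-8") as f:
    f.write("C003,Jones\nC001,Smith\nC002,Patel\n")

print(find_in_serial_file(serial_path, "C001"))   # C001,Smith
print(find_in_serial_file(serial_path, "C009"))   # None
```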

Sequential file
Searching a sequential file is the same as searching a serial file, except that the search only has to
continue until a record with a higher Key field is reached – this would mean that the data is not in the file.

OPEN File in READ MODE
READ First Record
SET Variables Found = False, Exists = True
REPEAT
    IF RequiredRecord = SelectedRecord THEN
        SET Variable Found = True
    ELSE
        READ Next Record
        IF Key of SelectedRecord > Key of RequiredRecord THEN
            Exists = False
        END IF
    END IF
END REPEAT when Found = True OR Exists = False OR when EOF is reached
CLOSE File
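
The early exit can be sketched in Python. The example assumes one record per line, stored in ascending order of an integer key before the first comma. It also returns the number of records examined, purely to show that the search stops early; that count is an addition made for this illustration.

```python
import os
import tempfile

def find_in_sequential_file(path, required_key):
    """Return (record or None, number of records examined)."""
    examined = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            examined += 1
            record = line.rstrip("\n")
            key = int(record.split(",", 1)[0])
            if key == required_key:
                return record, examined   # found
            if key > required_key:
                break                     # passed the position: it cannot exist
    return None, examined

# Demonstration: the records are held in key order.
seq_path = os.path.join(tempfile.mkdtemp(), "sequential.txt")
with open(seq_path, "w", encoding="utf-8") as f:
    f.write("10,Jones\n20,Smith\n30,Patel\n40,Khan\n")

print(find_in_sequential_file(seq_path, 20))   # ('20,Smith', 2)
print(find_in_sequential_file(seq_path, 25))   # (None, 3)
```

The search for key 25 stops at the record with key 30, without reading the rest of the file.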

Deleting data
Serial or sequential file
OPEN a NewFile in WRITE MODE
OPEN ExistingFile in READ MODE
READ First Record in ExistingFile
REPEAT
    IF Key of SelectedRecord <> Key of RecordToDelete THEN
        COPY SelectedRecord into NewFile
    ELSE
        DO NOT COPY SelectedRecord
    END IF
    READ Next Record in ExistingFile
END REPEAT when EOF is reached
CLOSE NewFile and ExistingFile
DELETE ExistingFile
RENAME NewFile to Name of ExistingFile
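
A minimal Python sketch of deletion by copying, assuming one record per line with the key before the first comma. The standard-library call os.replace performs the final delete and rename in a single step; the ".new" suffix for the temporary file is an arbitrary choice made here.

```python
import os
import tempfile

def delete_record(path, key_to_delete):
    """Rewrite the file at path without the record that has key_to_delete."""
    temp_path = path + ".new"
    with open(path, encoding="utf-8") as old, \
         open(temp_path, "w", encoding="utf-8") as new:
        for line in old:
            if line.split(",", 1)[0] != key_to_delete:
                new.write(line)      # copy every record except the unwanted one
    os.replace(temp_path, path)      # new file takes over the old file's name

# Demonstration with sample data.
data_path = os.path.join(tempfile.mkdtemp(), "customers.txt")
with open(data_path, "w", encoding="utf-8") as f:
    f.write("C001,Jones\nC002,Smith\nC003,Patel\n")

delete_record(data_path, "C002")
with open(data_path, encoding="utf-8") as f:
    print(f.read().splitlines())   # ['C001,Jones', 'C003,Patel']
```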
