
Data representation

User-defined data types

 When object-oriented programming is not being used, a programmer may choose not to use any user-defined data types. However, for a large program, their use makes the program less error-prone and easier to understand. They also impose fewer restrictions, because the programmer can define exactly the types a problem needs. The built-in data types are the same for every program, but there cannot be a built-in record type because each problem needs its own definition of a record.

i. Composite data types:

Composite user-defined data types have a definition with a reference to at least one

other type.

 ==Record Data type:== a data type that contains a fixed number of components, which can be of different types. It allows the programmer to collect together values with different data types when these form a coherent whole. It could be used to implement a data structure where one or more of the variables defined are pointer variables.


 TYPE <main identifier>

 DECLARE <subidentifier1> : <built in data type>

 DECLARE <subidentifier2> : <built in data type>

 ENDTYPE

 <main identifier>.<sub identifier(x)> ← <value>
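
As a rough illustration of the record idea in a real language, the sketch below uses a Python dataclass. The type name `StudentRecord` and its field names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    # A fixed number of components, each of a different built-in type.
    # Field names are hypothetical, for illustration only.
    name: str
    year_group: int
    average_mark: float


# Create one record, then assign to a single field using dot notation,
# which mirrors <main identifier>.<sub identifier> <- <value>.
student = StudentRecord(name="Asha", year_group=12, average_mark=71.5)
student.average_mark = 74.0
```

The dataclass plays the role of the type definition, and `student` is an identifier declared with that type.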

 ==Set Data type:== allows a program to create sets and to apply the mathematical

operations defined in set theory. Operations like:

 • Union

 • Difference

 • Intersection

 • Include an element in the set

 • Exclude an element from the set

 • Check whether an element is in a set
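
The operations listed above map directly onto Python's built-in `set` type. This is only a sketch of the idea; the example values are made up.

```python
vowels = {"a", "e", "i", "o", "u"}
letters = {"a", "b", "c", "e"}

union = vowels | letters          # every element in either set
difference = vowels - letters     # elements in vowels but not in letters
intersection = vowels & letters   # elements in both sets

letters.add("z")                  # include an element in the set
letters.discard("b")              # exclude an element from the set
has_a = "a" in letters            # check whether an element is in the set
```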

 ==Objects and Classes:== in object-oriented programming, a program defines the classes to be used; these are all user-defined data types. Then, for each class, the objects must be defined.

ii. Non-Composite data types:

Non-composite user-defined data types don’t involve a reference to another type. When a programmer uses a simple built-in type, the only requirement is for an identifier to be declared with that type. A user-defined type, by contrast, has to be explicitly defined before an identifier of that type can be created, unlike built-in data types such as string, integer and real.

 ==Enumerated Data type:== a list of possible data values. The values have an implied order, which allows comparisons to be made; for example, value2 is greater than value1. The values are not strings and must not be quoted. The list is countable, so the set of values is finite.


 TYPE <Datatype> = (<value1>,<value2>,<value3>…)


 DECLARE <identifier> : <datatype>
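
One way to see the implied ordering of an enumerated type is Python's `IntEnum`, sketched below. The type name `Season` and its values are hypothetical examples, not part of the notes.

```python
from enum import IntEnum


class Season(IntEnum):
    # The values are identifiers, not quoted strings, and their
    # numbering gives them an order that comparisons can use.
    SPRING = 1
    SUMMER = 2
    AUTUMN = 3
    WINTER = 4


current = Season.SUMMER
autumn_is_later = Season.AUTUMN > current
```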

 ==Pointer Data type:== used to reference a memory location. It may be used to construct dynamically varying data structures. The pointer definition has to relate to the type of the variable that is being pointed to. A pointer doesn’t hold a data value but a reference (address) to the data.


 TYPE <Datatype> = ^<type name>


 DECLARE <identifier> : <datatype>

 <assignment value> ← <identifier>^

A special use of a pointer variable is to access the value stored at the address pointed to. The pointer variable is then said to be dereferenced.
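
Python has no explicit pointer type, so the sketch below only approximates the idea: an object reference stands in for the pointer, and following the reference stands in for dereferencing. The `Node` type and the helper `to_list` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None  # reference to another Node, like a pointer field


# Build a small, dynamically growing structure (a linked list).
head = Node(10)
head.next = Node(20)


def to_list(start):
    """Follow the references from node to node and collect the values."""
    values = []
    node = start
    while node is not None:
        values.append(node.value)  # reading node.value is the "dereference" step
        node = node.next
    return values
```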


File organization and access

The contents of a file of any type are stored using a defined binary code that allows the file to be used in the way intended. However, for storing data to be used by a computer program, there are only two defined file types: a text file or a binary file.

 A text file contains data stored according to a defined character code, such as ASCII or Unicode. A text file can be created using a text editor.

 A binary file is a file designed for storing data to be used by a computer program. It stores data in its internal representation (for example, an integer value might be stored in 2 bytes using 2's complement representation so that negative numbers can be represented), and the file is created using a specific program. Its organization is based on records (a collection of fields containing data values): file → records → fields → values

Methods of file organization

 ==Serial files:== contain records that have no defined order. A text file may be a serial file, where the file has repeating lines that are each terminated by end-of-line character(s). There is no end-of-record character, so a record in a serial file must have a defined format to allow data to be input and output correctly. To access a specific record, the program has to read through every record until it is found.

File access: Successively read record by record until the required data is found, which is very slow. Uses:

 Batch processing

 Backing up data on magnetic tape

 Banks record transactions involving customer accounts every time there is a transaction

 ==Sequential files:== have records that are ordered. They are suited to long-term storage of data and so are considered an alternative to a database. For a sequential file to be ordered, a key field is required whose values are unique and sequential, so that a record can be located easily. A sequential database file is more efficient than a text file because of data integrity, privacy and reduced data redundancy; a change in one file would update any other files affected. Primary keys in a DBMS (database management system) need to be unique but not ordered, whereas the key field in a sequential file needs to be both ordered and unique. A particular record is found by sequentially reading the value of the key field until the required value is found.

File access:

Successively read the value in the key field until the required key is found.

To edit/delete data:

Create a new version of the file. Data is copied from the old file to the new file until the

record is reached which needs editing or deleting. For deleting, reading and copying of

the old file continue from the next record. If a record has been edited, the new version is

written to the new file and the remaining records are copied to the new file.
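
The copy-to-a-new-file procedure described above can be sketched as follows. To keep the example self-contained, the "files" are plain lists of `(key, data)` pairs; the function name `rewrite_sequential` and its parameters are hypothetical.

```python
def rewrite_sequential(old_records, key, new_data=None):
    """Build the new version of a sequential file.

    old_records: list of (key, data) pairs in key order.
    key:         key of the record to edit or delete.
    new_data:    replacement data for an edit, or None to delete.
    """
    new_records = []
    for record_key, data in old_records:
        if record_key == key:
            if new_data is not None:
                # Edit: write the new version of the record.
                new_records.append((record_key, new_data))
            # Delete: the record is simply not copied.
        else:
            # Every other record is copied unchanged.
            new_records.append((record_key, data))
    return new_records


old_file = [(1, "a"), (2, "b"), (3, "c")]
after_delete = rewrite_sequential(old_file, 2)
after_edit = rewrite_sequential(old_file, 2, "B")
```

Because every record is copied in its original order, the new file stays ordered by key field.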

 ==Direct access/random access files:== access isn't defined by a sequential reading of the file (hence "random"). This is well suited to larger files, which take longer to access sequentially. Data in a direct access file is stored in an identifiable record, which can be found by initial direct access to a nearby record followed by a limited serial search. The position chosen for a record must be calculated from data in the record, so that the same calculation can be carried out when there is a subsequent search for the data. One method is a hashing algorithm, which takes the key field as input and outputs a value for the position of the record relative to the start of the file. To access a record, its key is hashed to a specific location. The algorithm also takes into account the potential maximum length of the file, which is the number of records the file will store.

 eg: If the key field is numeric, divide by a suitably large number and use the remainder to find a position. This does not guarantee unique positions. If a hash position is calculated that duplicates one already calculated from a different key, the next position in the file is used. This is why a search involves direct access possibly followed by a limited serial search, and why the method is considered partly direct and partly serial.

File access:

The value in the key field is submitted to the hashing algorithm, which provides the same value for the position in the file that it provided when the data was input. The program goes to that hashed position and, because of collisions in the hashed positions, may then carry out a short linear search. This is the fastest method of access.
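
A minimal sketch of this scheme is shown below, with the file modelled as a fixed-size list of slots. The file size of 11, the function names and the wrap-around to the start of the file are all assumptions made for the example; deleted-record flags are not handled.

```python
FILE_SIZE = 11  # assumed maximum number of records the file will store


def hash_position(key):
    """Remainder after division gives a position relative to the start."""
    return key % FILE_SIZE


def store(slots, key, data):
    """Place a record at its hashed position, or the next free one."""
    pos = hash_position(key)
    for _ in range(FILE_SIZE):
        if slots[pos] is None:
            slots[pos] = (key, data)
            return pos
        pos = (pos + 1) % FILE_SIZE  # collision: try the next position
    raise RuntimeError("file is full")


def find(slots, key):
    """Direct access to the hashed position, then a limited serial search."""
    pos = hash_position(key)
    for _ in range(FILE_SIZE):
        if slots[pos] is None:
            return None  # an empty slot means the key was never stored
        if slots[pos][0] == key:
            return slots[pos][1]
        pos = (pos + 1) % FILE_SIZE
    return None


slots = [None] * FILE_SIZE
store(slots, 15, "record A")  # 15 % 11 = 4
store(slots, 26, "record B")  # 26 % 11 = 4 as well, so it goes to position 5
```

The same `hash_position` calculation is used for storing and for searching, which is what makes the direct access step possible.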

To edit/delete data:

Only create a new file if the current file is full. A deleted record can have a flag set so

that in a subsequent reading process the record is skipped over. This allows it to be



Most suited for when a program needs a file in which individual data items might be

read, updated or deleted.

Factors that determine the file organization to use

 How often do transactions take place, how often does one need to add data?
 How often does it need to be accessed, edited, or deleted?

Real numbers and normalized floating-point

 Real number: A number that contains a fractional part.

 Floating-point representation: The approximate representation of a real number using

binary digits.

 Format: Number = ±Mantissa × Base^Exponent

 Mantissa: The non-zero part of the number.

 Exponent: The power to which the base is raised in order to accurately represent the number.


 Base: The number of values the number system allows a digit to take. This is 2 in the case of floating-point representation.

The floating point representation stores a value for the mantissa and a value for the exponent.


A defined number of bits is used for what is called the significand or mantissa, ±M. The remaining bits are for the exponent, E. The radix, R, is not stored in the representation as it has an implied value of 2 (each digit being 0 or 1). Suppose a real number is stored using 8 bits: four bits for the mantissa and four bits for the exponent, each using two's complement representation. The exponent is stored as a signed integer. The mantissa has to be stored as a fixed-point real value. The binary point can be placed immediately after the first bit (the sign bit) or before the last bit. The former produces smaller spacing between the values that can be represented and is preferred. Floating-point representation also has a greater range than fixed-point representation.

Converting a denary value expressed as a real number into a floating point binary representation: Most fractional parts do not convert to a precise representation, because binary fractional places represent a half, a quarter, an eighth and so on. Only fractions that are a sum of these values, such as .5, .25 or .75, can be converted exactly. The fractional part is converted by repeatedly multiplying by two and recording the whole number part each time.

For example, for 8.63: 0.63 * 2 = 1.26, giving .1; 0.26 * 2 = 0.52, giving .10; 0.52 * 2 = 1.04, giving .101; and so on until the required number of bits is reached.
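
The multiply-by-two method can be written as a short loop. This is a sketch only; the function name `fraction_to_bits` is hypothetical, and any bits beyond `n_bits` are simply discarded rather than rounded.

```python
def fraction_to_bits(fraction, n_bits):
    """Convert a fractional part (0 <= fraction < 1) to n_bits binary places."""
    bits = ""
    for _ in range(n_bits):
        fraction *= 2
        whole = int(fraction)   # the whole number part is the next bit
        bits += str(whole)
        fraction -= whole       # carry on with the remaining fractional part
    return bits


first_three = fraction_to_bits(0.63, 3)   # the .63 from 8.63
exact = fraction_to_bits(0.75, 4)         # .75 converts exactly
```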

The method for converting a positive value is:

1. Convert the whole number part

2. Add the sign bit 0

3. Convert the fractional part. Combine the two parts, which gives an exponent value of zero. Then shift the binary point to the beginning of the number, which gives a higher exponent value. Depending on the number of bits, add extra 0's at the end of the mantissa and at the beginning of the exponent.

4. Adjust the position of the binary point and change the exponent accordingly to

achieve a normalized form.

Therefore: 8.75 -> 1000 -> 01000 -> .11 -> 01000.11 -> 0.100011 (mantissa) -> 0100011000 0100 (10 bits for M and 4 for E).
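
The 8.75 example can be reproduced with the sketch below, which assumes a 10-bit mantissa and a 4-bit exponent, both in two's complement, with the binary point immediately after the sign bit. The function name `to_normalised` is hypothetical, only positive values are handled, and extra mantissa bits are truncated rather than rounded.

```python
def to_normalised(value, mantissa_bits=10, exponent_bits=4):
    """Return (mantissa, exponent) bit strings for a positive value."""
    if value <= 0:
        raise ValueError("this sketch handles positive values only")

    # Move the binary point until the mantissa is in the range 0.5 <= m < 1,
    # adjusting the exponent to compensate.
    mantissa = value
    exponent = 0
    while mantissa >= 1:
        mantissa /= 2
        exponent += 1
    while mantissa < 0.5:
        mantissa *= 2
        exponent -= 1

    lowest = -(2 ** (exponent_bits - 1))
    highest = 2 ** (exponent_bits - 1) - 1
    if not lowest <= exponent <= highest:
        raise OverflowError("exponent does not fit in the bits available")

    # Sign bit 0, then the fractional bits by repeated multiplication by two.
    bits = "0"
    for _ in range(mantissa_bits - 1):
        mantissa *= 2
        whole = int(mantissa)
        bits += str(whole)
        mantissa -= whole

    # Two's complement bit pattern of the exponent.
    exponent_pattern = format(exponent % (2 ** exponent_bits), f"0{exponent_bits}b")
    return bits, exponent_pattern


result = to_normalised(8.75)
```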

 For negatives, use 2's complement.

 When implementing the floating point representation, a decision has to be made regarding the total number of bits to be used and how many to use for the mantissa and for the exponent.


 Usually, the choice for the total number of bits will be provided as an option when the program is written; however, the split between the two parts will have been determined by the floating point processor.

 If there were a choice, note that increasing the number of bits for the mantissa would give better precision but would leave fewer bits for the exponent, reducing the range of possible values, and vice versa. For maximum precision, it is necessary to normalize a floating point number.

 Optimum precision is only achieved when full use is made of the bits in the mantissa, that is, by using the largest possible magnitude for the value represented by the mantissa.


 Also, the two most significant bits must be different: 01 for positives and 10 for negatives.


 -Both of the following equal 2, but the second is the more precise because it uses the higher-order bits in the mantissa:


 0.125 * 2^4 = 2 0 001 0100

 0.5 * 2^2 = 2 0 100 0010

-For negatives.

 0.25 * 2^4 = -4 1 110 0100

 1.0 * 2^2 = -4 1 000 0010

When the number is represented with the highest magnitude for the mantissa, the two most significant bits are different, which shows that the number is in a normalized representation.
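
The bit patterns in the examples above can be checked by decoding them. The sketch assumes both parts are in two's complement and that the binary point sits immediately after the mantissa's sign bit; the function names are hypothetical.

```python
def twos_complement(bits):
    """Integer value of a two's complement bit string."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 2 ** len(bits)  # a leading 1 carries negative weight
    return value


def decode(mantissa, exponent):
    """Value of mantissa * 2^exponent for two bit strings."""
    # Dividing by 2^(n-1) places the binary point after the sign bit.
    m = twos_complement(mantissa) / 2 ** (len(mantissa) - 1)
    e = twos_complement(exponent)
    return m * 2 ** e


positive_unnormalised = decode("0001", "0100")  # 0.125 * 2^4
positive_normalised = decode("0100", "0010")    # 0.5 * 2^2
negative_unnormalised = decode("1110", "0100")  # mantissa 1.110
negative_normalised = decode("1000", "0010")    # mantissa 1.000
largest = decode("0111", "0111")                # largest positive pattern
```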

How a number could be normalized: for a positive number, the bits in the mantissa are

shifted left until the most significant bits are 0 followed by 1. For each shift left the value

of the exponent is reduced by 1. The same process of shifting is used for a negative

number until the most significant bits are 1 followed by 0. In this case, no attention is
paid to the fact that bits are falling off the most significant end of the mantissa. Thus

normalization is shifting bits to the left until the 2 most significant bits are different.
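
The shifting process can be sketched on a bit string as below. The function name `normalise` is hypothetical, the exponent is kept as a plain integer for simplicity, and a mantissa of all zeros is returned unchanged because it can never reach a normalized form.

```python
def normalise(mantissa, exponent):
    """Shift the mantissa left until its two most significant bits differ.

    mantissa: two's complement bit string, binary point after the first bit.
    exponent: integer exponent, reduced by 1 for each shift left.
    """
    if "1" not in mantissa:
        return mantissa, exponent  # zero has no normalized form
    while mantissa[0] == mantissa[1]:
        # The bit that falls off the most significant end is ignored,
        # and a 0 is brought in at the least significant end.
        mantissa = mantissa[1:] + "0"
        exponent -= 1
    return mantissa, exponent


positive = normalise("0001", 4)   # 0.125 * 2^4
negative = normalise("1110", 4)   # mantissa 1.110 with exponent 4
```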

Problems with using floating point numbers:

1. The conversion of real denary values to binary usually needs a degree of approximation, compounded by the restriction on the number of bits used to store the mantissa. These rounding errors can become significant after multiple calculations. The only way of preventing a serious problem is to increase the precision by using more bits for the mantissa. Programming languages therefore offer options to work in double/quadruple precision.


2. With four bits for the mantissa and four for the exponent, the highest value that can be represented is 112 (0.875 * 2^7), so the range is limited. A result larger than the highest value produces an overflow condition. If a result is smaller than the smallest value that can be stored, there is an underflow error condition. Such a very small number can be turned into zero, but this carries risks, for example if the value is then used in a multiplication or division.
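
Both problems can be observed with Python's built-in `float`, which is a 64-bit double-precision value and so has far more bits than the small formats used in these notes. The specific numbers below are chosen only to trigger each effect.

```python
# Rounding error: 0.1 has no exact binary representation, so adding it
# ten times does not give exactly 1.0.
total = 0.0
for _ in range(10):
    total += 0.1
is_exact = total == 1.0

# Underflow: the true result is too small to store and becomes zero.
underflow = 1e-200 * 1e-200

# Overflow: the true result is too large to store and becomes infinity.
overflow = 1e308 * 10
```

Dividing by `underflow` afterwards would raise a division-by-zero error, which is one of the risks of silently turning a very small result into zero.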

eg: One use of floating point numbers is in extended mathematical procedures involving repeated calculations, such as weather forecasting, which uses a mathematical model of the atmosphere.
