Data Representation: User-Defined Data Types

Data representation
User-defined data types

 When object-oriented programming is not being used, a programmer may choose not to
use any user-defined data types. However, in a large program, their use makes the
program less error-prone and easier to understand. They also remove the restrictions of
the built-in types by letting the programmer define types that fit the problem. Built-in
data types are the same for every program. However, there can't be a built-in record type
because each different problem needs its own definition of a record.
i. Composite data types:
Composite user-defined data types have a definition with a reference to at least one
other type.
 ==Record Data type:== a data type that contains a fixed number of components that
can be of different types. It allows the programmer to collect together values with
different data types when these form a coherent whole. It could be used for the
implementation of a data structure where one or more of the variables defined are
pointer variables.
 TYPE <main identifier>
   DECLARE <subidentifier1> : <built in data type>
   DECLARE <subidentifier2> : <built in data type>
 ENDTYPE

 <main identifier>.<subidentifier(x)> ← <value>
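The record definition above can be sketched in Python with a dataclass. This is only an illustration: the type name `Student` and its fields are hypothetical and do not come from the notes.

```python
from dataclasses import dataclass


@dataclass
class Student:
    """Hypothetical record type: a fixed set of fields of different types."""
    name: str
    age: int
    average_mark: float


# Create one record, then assign to a single field with dot notation,
# mirroring <main identifier>.<subidentifier> <- <value>.
pupil = Student(name="Asha", age=17, average_mark=72.5)
pupil.age = 18
```

Each field keeps its own type, and the whole record can be passed around as one value.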
 ==Set Data type:== allows a program to create sets and to apply the mathematical
operations defined in set theory. Operations like:
 • Union
 • Difference
 • Intersection
 • Include an element in the set
 • Exclude an element from the set
 • Check whether an element is in a set
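The set operations listed above map directly onto Python's built-in `set` type. The sketch below uses two made-up sets of letters purely as an illustration.

```python
vowels = {"a", "e", "i", "o", "u"}
early_letters = {"a", "b", "c", "d", "e"}

union = vowels | early_letters          # every element in either set
difference = vowels - early_letters     # in vowels but not in early_letters
intersection = vowels & early_letters   # in both sets

early_letters.add("f")                  # include an element in the set
early_letters.discard("b")              # exclude an element from the set
contains_c = "c" in early_letters       # check whether an element is in a set
```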
 ==Objects and Classes:== in object-oriented programming, a program defines the
classes to be used; they are all user-defined data types. Then, for each class, the objects
must be defined.
ii. Non-Composite data types:
Non-composite user-defined data types don’t involve a reference to another type. When
a programmer uses a simple built-in type (such as string, integer or real), the only
requirement is for an identifier to be named with a defined type. A user-defined type,
by contrast, has to be explicitly defined before an identifier of that type can be created.
 ==Enumerated Data type:== a list of possible data values. The values defined here
have an implied order, which allows comparisons to be made: value2 is greater than
value1. The values are not strings and must not be quoted. The list is countable, so the
set of values is finite.
 TYPE
 <Datatype> = (<value1>,<value2>,<value3>…)
 ENDTYPE
 DECLARE <identifier> : <datatype>
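As a rough Python equivalent of the enumerated type above, `enum.IntEnum` gives named values with an order that supports comparison. The type name `Day` and its members are hypothetical.

```python
from enum import IntEnum


class Day(IntEnum):
    """Hypothetical enumerated type; the order follows the assigned numbers."""
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3


today = Day.TUESDAY                 # DECLARE <identifier> : <datatype>, then assign
is_after_monday = today > Day.MONDAY
number_of_values = len(Day)         # the list of values is finite
```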
 ==Pointer Data type:== used to reference a memory location. It may be used to
construct dynamically varying data structures. The pointer definition has to relate to the
type of the variable that is being pointed to. A pointer doesn’t hold a data value itself
but a reference (address) to where the data is stored.
 TYPE
 <Datatype> = ^<type name>
 ENDTYPE
 DECLARE <identifier> : <datatype>
 <assignment value> ← <identifier>^
A special use of a pointer variable is to access the value stored at the address it points to.
The pointer variable is then said to be dereferenced.
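Python has no pointer syntax of its own, but the standard `ctypes` module can illustrate the idea of a typed pointer and of dereferencing. This is a minimal sketch; the variable names are made up.

```python
import ctypes

number = ctypes.c_int(42)          # a variable of the pointed-to type
ptr = ctypes.pointer(number)       # a pointer typed to c_int, holding number's address

value_read = ptr.contents.value    # dereference: read the value at that address
ptr.contents.value = 7             # dereference: write through the pointer
value_after = number.value         # the original variable has changed
```

The pointer stores no data value of its own, so writing through it changes `number`.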
File organization and access

The contents of a file of any type are stored using a defined binary code that allows the
file to be used in the way intended. For storing data to be used by a computer program,
however, there are only two defined file types: a text file or a binary file.
 A text file contains data stored according to a defined character code such as ASCII
or Unicode. A text file can be created using a text editor.
 A binary file is a file designed for storing data to be used by a computer program. It
stores data in its internal representation (for example, an integer value might be stored in
2 bytes in 2's complement representation so that negative numbers can be represented),
and the file is created using a specific program. Its organization is based on records (a
collection of fields containing data values): file → records → fields → values
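The difference between the two file types can be seen by encoding the same integer both ways. The sketch below uses Python's `struct` module; the big-endian byte order (`>`) is an arbitrary choice made for the illustration.

```python
import struct

value = -5

# Text file style: the characters '-' and '5' stored by their character codes.
as_text = str(value).encode("ascii")

# Binary file style: the internal representation, here a 2-byte
# two's complement integer (format code "h"), big-endian.
as_binary = struct.pack(">h", value)

# Reading the binary form back requires knowing the same format.
restored = struct.unpack(">h", as_binary)[0]
```

Here `as_text` is `b"-5"` and `as_binary` is `b"\xff\xfb"`, the 16-bit two's complement pattern for -5.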
Methods of file organization
 ==Serial files:== contain records that have no defined order. A text file may be a serial
file where the file has repeating lines, each terminated by an end-of-line character(s).
There's no end-of-record character. A record in a serial file must have a defined format
to allow data to be input and output correctly. To access a specific record, the program
has to read through every record until it is found.
File access: Records are read successively, one by one, until the required data is found,
so access is very slow.
Uses:
 Batch processing
 Backing up data on magnetic tape
 Banks recording each transaction on a customer account as it occurs
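Serial access can be sketched as a loop that reads record after record until the wanted one turns up. The comma-separated record layout, the transaction IDs and the function name `find_record` are all assumptions made for this example; an in-memory `io.StringIO` stands in for a real file.

```python
import io


def find_record(file, wanted_id):
    """Read line by line; return the fields of the first record whose
    first field equals wanted_id, or None if the end of file is reached."""
    for line in file:
        fields = line.rstrip("\n").split(",")
        if fields[0] == wanted_id:
            return fields
    return None


# Records appear in the order they were written, with no ordering by key.
log = io.StringIO("T3,deposit,50\nT1,withdrawal,20\nT2,deposit,75\n")
match = find_record(log, "T1")
```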
 ==Sequential files:== have records that are ordered. They are suited to long-term
storage of data and so can be considered an alternative to a database. A key field is
required for a sequential file to be ordered; its values must be unique and sequential, so
that records can be located easily. Compared with a plain text file, a database file offers
better data integrity, better privacy and less data redundancy, and a change in one file
updates any other files affected. Primary keys in a DBMS (database management system)
need to be unique but not ordered, unlike the key field of a sequential file, which needs
to be both ordered and unique. A particular record is found by sequentially reading the
value of the key field until the required value is found.
File access:
Successively read the value in the key field until the required key is found.
To edit/delete data:
Create a new version of the file. Data is copied from the old file to the new file until the
record that needs editing or deleting is reached. For a deletion, reading and copying of
the old file continues from the next record. For an edit, the new version of the record is
written to the new file and the remaining records are then copied to the new file.
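The copy-to-a-new-file procedure can be sketched as follows. Lists of `(key, data)` tuples stand in for the old and new files, and the function name `rewrite_file` is hypothetical.

```python
def rewrite_file(old_records, key, new_record=None):
    """Build the new version of a sequential file.

    Records are copied in order. The record matching `key` is replaced by
    `new_record` (an edit) or simply not copied (a deletion)."""
    new_records = []
    for record in old_records:
        if record[0] == key:
            if new_record is not None:
                new_records.append(new_record)   # edit: write the new version
            # deletion: write nothing and carry on with the next record
        else:
            new_records.append(record)           # copy unchanged
    return new_records


old_file = [(101, "Ali"), (102, "Bea"), (104, "Cy")]
after_delete = rewrite_file(old_file, 102)
after_edit = rewrite_file(old_file, 104, (104, "Cyrus"))
```

The key order is preserved because records are copied in the order they are read.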
 ==Direct access/random access files:== access isn't defined by a sequential reading
of the file. This suits large files, where sequential access would take too long. Data in
a direct access file is stored in an identifiable record, which can be found by an initial
direct access to a nearby record followed by a limited serial search. The position chosen
for a record must be calculated using data in the record, so that the same calculation can
be carried out when there's a subsequent search for the data. One method is a hashing
algorithm, which takes the key field as input and outputs a value for the position of the
record relative to the start of the file. To access a record, the key is hashed to a specific
location. The algorithm also takes into account the potential maximum length of the
file, which is the number of records the file will store.
 eg: If the key field is numeric, divide by a suitably large number and use the remainder
as the position. This will not give unique positions. If a hash position is calculated
that duplicates one already calculated from a different key, the next position in the file is
used. This is why a search involves direct access possibly followed by a limited serial
search, and why the access is considered partly direct and partly serial.
File access:
The value in the key field is submitted to the hashing algorithm, which provides the
same value for the position in the file that it provided when the algorithm was used at
the time of data input. The program goes to that hashed position and, because of
collisions in the hashed positions, may then carry out a short linear search. This gives
the fastest access of the three methods.
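A minimal sketch of hashed storage and retrieval is given below. It assumes a file of 7 record slots held in a list, a remainder-based hash, and wrap-around to the start when the end of the file is reached; the wrap-around and the names `store` and `fetch` are assumptions for the example, not details from the notes.

```python
FILE_SIZE = 7                      # assumed maximum number of records
slots = [None] * FILE_SIZE         # each slot holds (key, data) or None


def hash_position(key):
    """Divide the numeric key by the file size and use the remainder."""
    return key % FILE_SIZE


def store(key, data):
    """Go to the hashed position; on a collision, try the next position."""
    position = hash_position(key)
    for _ in range(FILE_SIZE):
        if slots[position] is None:
            slots[position] = (key, data)
            return position
        position = (position + 1) % FILE_SIZE
    raise RuntimeError("file is full")


def fetch(key):
    """Direct access to the hashed position, then a short serial search."""
    position = hash_position(key)
    for _ in range(FILE_SIZE):
        record = slots[position]
        if record is None:
            return None            # an empty slot means the key is not stored
        if record[0] == key:
            return record[1]
        position = (position + 1) % FILE_SIZE
    return None


first_position = store(15, "Ali")    # 15 mod 7 = 1
second_position = store(22, "Bea")   # 22 mod 7 = 1 as well, so slot 2 is used
```

This sketch does not handle deleted records; with deletion flags, `fetch` would need to skip flagged slots rather than stop at them.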
To edit/delete data:
Only create a new file if the current file is full. A deleted record can have a flag set so
that in a subsequent reading process the record is skipped over. This allows it to be
overwritten.
Uses:
Most suited for when a program needs a file in which individual data items might be
read, updated or deleted.
Factors that determine the file organization to use:
 How often do transactions take place, and how often does data need to be added?
 How often does the data need to be accessed, edited, or deleted?
Real numbers and normalized floating-point representation
 Real number: A number that contains a fractional part.
 Floating-point representation: The approximate representation of a real number using
binary digits.
 Format: Number = ±Mantissa × Base^Exponent
 Mantissa: The part of the representation that holds the significant digits of the number.
 Exponent: The power to which the base is raised in order to scale the mantissa to the
value of the number.
 Base: The number of values the number system allows a digit to take; 2 in the case of
floating-point representation.
The floating-point representation stores a value for the mantissa and a value for the
exponent.
A defined number of bits is used for what is called the significand or mantissa, ±M.
The remaining bits are for the exponent, E. The radix, R, is not stored in the
representation as it has an implied value of 2 (the digits being 0 and 1). Suppose a real
number is stored using 8 bits: four bits for the mantissa and four bits for the exponent,
each using two's complement representation. The exponent is stored as a signed integer.
The mantissa has to be stored as a fixed-point real value. The binary point can be placed
either at the beginning, immediately after the sign bit, or just before the last bit. The
former produces smaller spacing between the values that can be represented and is
preferred. Floating-point representation also gives a greater range than fixed-point
representation using the same number of bits.
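To make the 8-bit layout concrete, the sketch below decodes a mantissa and an exponent given as bit strings. It assumes both fields are two's complement and that the binary point sits immediately after the mantissa's sign bit, as described above; the function names are hypothetical.

```python
def twos_complement(bits):
    """Interpret a bit string as a two's complement signed integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value


def decode(mantissa_bits, exponent_bits):
    """Return mantissa * 2^exponent, with the binary point placed
    immediately after the mantissa's sign bit."""
    mantissa = twos_complement(mantissa_bits) / (1 << (len(mantissa_bits) - 1))
    exponent = twos_complement(exponent_bits)
    return mantissa * 2 ** exponent


two = decode("0100", "0010")          # 0.5 * 2^2
minus_four = decode("1000", "0010")   # -1.0 * 2^2
largest = decode("0111", "0111")      # 0.875 * 2^7
```

With four bits for each field, `largest` works out as 112.0, the biggest value this format can hold.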
Converting a denary value expressed as a real number into a floating-point binary
representation: Most fractional parts do not convert to a precise representation, because
binary fractional places represent a half, a quarter, an eighth and so on. Only fractions
that are a sum of these values, such as .5 or .75, can be converted exactly. The fractional
part is converted by repeatedly multiplying by two and recording the whole number part
each time.
For example, for 8.63: 0.63 * 2 = 1.26, giving .1; 0.26 * 2 = 0.52, giving .10; 0.52 * 2 =
1.04, giving .101; and so on until the required number of bits is reached.
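The multiply-by-two method can be written as a short loop. The sketch below takes the fractional part as text and uses `fractions.Fraction` so that the denary arithmetic is exact; the function name `fraction_to_bits` is made up for this example.

```python
from fractions import Fraction


def fraction_to_bits(fraction_text, bit_count):
    """Convert a denary fraction such as "0.63" to `bit_count` binary
    places by repeatedly doubling and recording the whole number part."""
    fraction = Fraction(fraction_text)
    bits = ""
    for _ in range(bit_count):
        fraction *= 2
        whole = int(fraction)      # 0 or 1
        bits += str(whole)
        fraction -= whole          # keep only the fractional part
    return bits


three_places = fraction_to_bits("0.63", 3)   # the worked example: .101
exact_case = fraction_to_bits("0.75", 4)     # a half plus a quarter
```

For 0.63 the remaining fraction never reaches zero, so any fixed number of bits is an approximation, whereas 0.75 stops exactly at .11.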
The method for converting a positive value is:
1. Convert the whole number part.
2. Add the sign bit 0.
3. Convert the fractional part and combine the two parts. At this stage the exponent
value is zero.
4. Move the binary point to the beginning, immediately after the sign bit, increasing
the exponent by 1 for each place moved, so that the number is in normalized form.
Depending on the number of bits available, add extra 0's at the end of the mantissa and
at the beginning of the exponent.
Therefore: 8.75 -> 1000 -> 01000 -> .11 -> 01000.11 -> 0.100011 (mantissa), exponent 4 ->
0100011000 0100 (10 bits for M and 4 for E).
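The steps above can be sketched for positive values as follows. The function name `encode_positive` is hypothetical, the defaults match the 10-bit mantissa and 4-bit exponent of the worked example, and the sketch truncates extra mantissa bits and does not check that the exponent fits in its field.

```python
from fractions import Fraction


def encode_positive(value, mantissa_bits=10, exponent_bits=4):
    """Return (mantissa, exponent) bit strings for a positive value,
    normalized so the mantissa starts with 01."""
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    mantissa = Fraction(value)
    exponent = 0
    # Move the binary point left until the value is below 1.
    while mantissa >= 1:
        mantissa /= 2
        exponent += 1
    # Move it right if the value is below one half (small numbers).
    while mantissa < Fraction(1, 2):
        mantissa *= 2
        exponent -= 1
    # Keep the bits after the sign bit; anything beyond them is truncated.
    scaled = int(mantissa * 2 ** (mantissa_bits - 1))
    # Two's complement bit pattern of the exponent.
    exponent_field = exponent & (2 ** exponent_bits - 1)
    return (format(scaled, f"0{mantissa_bits}b"),
            format(exponent_field, f"0{exponent_bits}b"))


mantissa, exponent = encode_positive(8.75)
```

For 8.75 this gives the mantissa `0100011000` and the exponent `0100`, matching the worked example.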
 For negatives, use 2's complement.
 When implementing the floating point representation, a decision has to be made
regarding the total number of bits to be used and how many for the mantissa and
exponent.
 Usually, the choice for the total number of bits will be provided as an option when the
program is written; however, the split between the two parts will have been determined
by the floating-point processor.
 If there were a choice, note that increasing the number of bits for the mantissa gives
better precision but leaves fewer bits for the exponent, thus reducing the range of
possible values, and vice versa. For maximum precision, it is necessary to normalize a
floating-point number.
 Optimum precision is only achieved when full use is made of the bits in the mantissa,
which means using the largest possible magnitude for the value represented by the
mantissa.
 Also, the two most significant bits must be different: 01 for positives and 10 for
negatives.
 For positives, both of the following equal 2, but the second makes fuller use of the
bits in the mantissa and is the normalized form:
 0.125 * 2^4 = 2     0 001 0100
 0.5 * 2^2 = 2       0 100 0010
 For negatives, both of the following equal -4, and again the second is the normalized
form:
 -0.25 * 2^4 = -4    1 110 0100
 -1.0 * 2^2 = -4     1 000 0010
When the number is represented with the highest magnitude for the mantissa, the two
most significant bits are different, and the number is said to be in a normalized
representation.
How a number can be normalized: for a positive number, the bits in the mantissa are
shifted left until the most significant bits are 0 followed by 1. For each shift left the value
of the exponent is reduced by 1. The same process of shifting is used for a negative
number until the most significant bits are 1 followed by 0. In this case, no attention is
paid to the fact that bits are falling off the most significant end of the mantissa. Thus
normalization is shifting bits to the left until the 2 most significant bits are different.
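The shifting rule can be sketched on bit strings. The function name `normalize` is hypothetical, the exponent is passed as an ordinary integer for simplicity, and a mantissa of all zeros is returned unchanged because zero has no normalized form.

```python
def normalize(mantissa_bits, exponent):
    """Shift the mantissa left until its two most significant bits differ,
    reducing the exponent by 1 for each shift."""
    if "1" not in mantissa_bits:
        return mantissa_bits, exponent          # zero cannot be normalized
    while mantissa_bits[0] == mantissa_bits[1]:
        # The leading bit falls off and a 0 enters at the right-hand end.
        mantissa_bits = mantissa_bits[1:] + "0"
        exponent -= 1
    return mantissa_bits, exponent


positive = normalize("0001", 4)    # 0.125 * 2^4
negative = normalize("1110", 4)    # -0.25 * 2^4
```

These give `("0100", 2)` and `("1000", 2)`, that is 0.5 * 2^2 and -1.0 * 2^2: the same values, 2 and -4, in normalized form.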
Problems with using floating-point numbers:
1. The conversion of real denary values to binary mostly needs a degree of approximation,
followed by a further restriction from the number of bits used to store the mantissa.
These rounding errors can become significant after multiple calculations. The usual way
of preventing a serious problem is to increase the precision by using more bits for the
mantissa. Programming languages therefore offer options to work in double or quadruple
precision.
2. The range is limited. With the 8-bit format described above (four bits each for the
mantissa and the exponent), the highest value that can be represented is
0.875 * 2^7 = 112. A result larger than the highest value produces an overflow
condition. If a result is smaller in magnitude than the smallest value that can be stored,
there is an underflow error condition. Such a very small number can be turned into zero,
but this carries risks, for example if the value is later used in a multiplication or division.
eg: One use of floating-point numbers is in extended mathematical procedures
involving repeated calculations, such as weather forecasting, which uses a mathematical
model of the atmosphere.
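Both problems can be observed in Python, whose `float` type is assumed here to be an IEEE 754 double (true on common platforms). The sketch is an illustration only and is not tied to the 8-bit format used in these notes.

```python
import math

# 1. Rounding error: 0.1 has no exact binary representation, so adding
#    it ten times does not give exactly 1.0.
total = sum([0.1] * 10)
is_exactly_one = (total == 1.0)

# 2. Overflow: a result too large for the format becomes infinity.
overflowed = 1e308 * 10

#    Underflow: a result too small for the format becomes zero.
underflowed = 5e-324 / 2

# Dividing by the underflowed value now fails, one of the risks of
# silently replacing a tiny number with zero.
try:
    1.0 / underflowed
    division_failed = False
except ZeroDivisionError:
    division_failed = True
```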
