You are on page 1of 59

CP 111: Principles of Programming

Data Types

02/02/2024 05:38 AM CP 111: Principles of Programming 1


Introduction
 Data type:
 Defines a collection of data objects and a set of
predefined operations on those objects.
 An object represents an instance of a user-defined
(abstract data) type.
 Computer programs produce results by manipulating
data.
 Descriptor:
 It isa collection of the attributes of a variable.
 An implementation a descriptor is a collection of
memory cells that store variable attributes.

02/02/2024 05:38 AM CP 111: Principles of Programming 2


Introduction
 Descriptor:
 If the attributes are static, descriptor are required only
at compile time.
 They are built by the compiler, usually as a part of the
symbol table, and are used during compilation.
 For dynamic attributes, part or all of the descriptor
must be maintained during execution.
 Descriptors are used for type checking and by
allocation and deallocation operations.
 One design issue for all data types
 What operations are defined and how are they
specified?

02/02/2024 05:38 AM CP 111: Principles of Programming 3


Primitive Data Types
 Also known as fundamental data types:
 Data types not defined in terms of other types.
 Almost all programming languages provide a set of
primitive data types.
 Some primitive data types are merely reflections of
the hardware.
 Others require only a little non-hardware support for
their implementation.
 The primitive data types of a language, along with one
or more type constructors provide structured types.

02/02/2024 05:38 AM CP 111: Principles of Programming 4


Primitive Data Types: Integer
 Almost always an exact reflection of the hardware,
so the mapping is trivial.
 Integer types are supported by the hardware.
 There may be as many as eight different integer
types in a language.
 Java has four: byte, short, int, and long.
 C++ has: int, long

02/02/2024 05:38 AM CP 111: Principles of Programming 5


Primitive Data Types: Floating Point
 Model real numbers, but only as approximations for
most real values.
 On most computers, floating-point numbers are stored
in binary, which exacerbates the problem.
 Another problem is the loss of accuracy through
arithmetic operations.
 Languages for scientific use support at least two
floating-point types; sometimes more:
 float: Single precision IEEE floating point
standard

02/02/2024 05:38 AM CP 111: Principles of Programming 6


Primitive Data Types: Floating Point
 Model real numbers, but only as approximations for
most real values.
 Languages for scientific use support at least two
floating-point types; sometimes more:
 double: double precision IEEE floating point
standard

02/02/2024 05:38 AM CP 111: Principles of Programming 7


Primitive Data Types: Decimal
 Most larger computers that are designed to support
business applications have hardware support for
decimal data types.
 Decimal types store a fixed number of decimal
digits, with the decimal point at a fixed position in
the value.
 They are primary data types for business data
processing and are therefore essential to COBOL.
 C# also supports decimal data type
 Advantage: accuracy of decimal values.
 Disadvantages: limited range since no exponents are
allowed, and its representation wastes memory.
02/02/2024 05:38 AM CP 111: Principles of Programming 8
Primitive Data Types: Complex
 Some languages support complex data types:
 The languages include Fortran, Python, C99
 Each value consists of two floats, the real part and
the imaginary part.
 Literal form (in Python): (7 + 3j), where 7 is the real
part and 3 is the imaginary part

02/02/2024 05:38 AM CP 111: Principles of Programming 9


Primitive Data Types: Boolean
 Range of values: two elements, one for “true” and
one for “false”.
 It is the simplest of all data types
 Introduced by ALGOL 60.
 They are used to represent switched and flags in
programs.
 The use of Booleans enhances readability.
 One popular exception is C89, in which numeric
expressions are used as conditionals.
 In such expressions, all operands with nonzero values
are considered true, and zero is considered false.

02/02/2024 05:38 AM CP 111: Principles of Programming 10


Primitive Data Types: Character
 Character types are stored as numeric codings
(ASCII / Unicode).
 Traditionally,the most commonly used coding was
the 8-bit code ASCII (American Standard Code for
Information Interchange).
 A 16-bit character set named Unicode has been
developed as an alternative.
 Includes characters from most natural languages
 Java was the first widely used language to use the
Unicode character set.
 Since then, it has found its way into JavaScript
and C#.

02/02/2024 05:38 AM CP 111: Principles of Programming 11


Primitive Data Types: Character
 32-bit Unicode (UCS-4)
 Supported by Fortran, starting with 2003
 Example of character data type usage in C++:
char grade=‘A’;

02/02/2024 05:38 AM CP 111: Principles of Programming 12


Character String Type
 A character string type is one in which values are
sequences of characters.
 Important design issues:
 Is it a primitive type or just a special kind of array?
 Is the length of objects static or dynamic?
 C and C++ use char arrays to store char strings
 Provides a collection of string operations through a
standard library whose header is string.h.

02/02/2024 05:38 AM CP 111: Principles of Programming 13


Character String Type: Operations
 Typical operations:
 Assignment and copying
 Comparison (=, >, etc.)
 Catenation
 Substring reference
 Pattern matching

02/02/2024 05:38 AM CP 111: Principles of Programming 14


Character String Type: Operations
 Some of the most commonly used library functions
for character strings in C and C++ are
 strcpy: copy strings
 strcat: catenates on given string onto another
 strcmp: lexicographically compares (the order of
their codes) two strings
 strlen: returns the number of characters, not counting
the null

02/02/2024 05:38 AM CP 111: Principles of Programming 15


Character String Type in Certain
Languages
 C and C++
 Not primitive
 Use char arrays and a library of functions that provide
operations
 SNOBOL4 (a string manipulation language)
 Primitive
 Many operations, including elaborate pattern
matching

02/02/2024 05:38 AM CP 111: Principles of Programming 16


Character String Type in Certain
Languages
 Fortran and Python
 Primitive type with assignment and several operations
 Java
 Primitive via the String class
 Perl, JavaScript, Ruby, and PHP
 Provide built-in pattern matching, using regular
expressions.

02/02/2024 05:38 AM CP 111: Principles of Programming 17


String Length Options
 Limited Dynamic Length Strings
 Allow strings to have varying length up to a declared
and fixed maximum set by the variable’s definition
 Exemplified by the strings in C.
 Dynamic Length Strings
 Allows strings various length with no maximum.
 Requires the overhead of dynamic storage allocation
and deallocation but provides flexibility.
 Example: Perl and JavaScript.
 Ada supports all three string length options.

02/02/2024 05:38 AM CP 111: Principles of Programming 18


String Length Options
 Static Length String:
 The length can be static and set when the string is
created.
 This is the choice for the immutable objects of:
 Java’s String class
 Similar classes in the C++ standard class library
 .NET class library available to C#.

02/02/2024 05:38 AM CP 111: Principles of Programming 19


Array Types
 An array is an aggregate of homogeneous data
elements.
 An individual element is identified by its position in
the aggregate, relative to the first element.
 A reference to an array element in a program often
includes one or more non-constant subscripts.
 Such references require a run-time calculation to
determine the memory location being referenced.

02/02/2024 05:38 AM CP 111: Principles of Programming 20


Array Design Issues
 Important considerations are:
 What types are legal for subscripts?
 Are subscripting expressions in element references
range checked?
 When are subscript ranges bound?
 When does allocation take place?
 What is the maximum number of subscripts?
 Can array objects be initialized?
 Are any kind of slices supported?

02/02/2024 05:38 AM CP 111: Principles of Programming 21


Array Indexing
 Indexing is a mapping from indices to elements.
 The mapping can be shown as:
 map(array_name, index_value_list) → an element
 Example in Ada
 Sum := Sum + B(I);
 Because ( ) are used for both subprogram parameters
and array subscripts in Ada, this results in reduced
readability.
 C-based languages such as C++ and Java use [ ] to
delimit array indices.

02/02/2024 05:38 AM CP 111: Principles of Programming 22


Array Indexing
 Two distinct types are involved in an array type:
 The element type
 The type of the subscripts.
 The type of the subscript is often a sub-range of
integers.
 Ada allows other types as subscripts, such as
Boolean, char, and enumeration.
 Among contemporary languages, C, C++, Perl, and
Fortran don’t specify range checking of subscripts,
but Java, and C# do.

02/02/2024 05:38 AM CP 111: Principles of Programming 23


Array Indexing
 Two distinct types are involved in an array type:
 The element type
 The type of the subscripts.
 The type of the subscript is often a sub-range of
integers.
 Ada allows other types as subscripts, such as
Boolean, char, and enumeration.
 Among contemporary languages, C, C++, Perl, and
Fortran don’t specify range checking of subscripts,
but Java, and C# do.

02/02/2024 05:38 AM CP 111: Principles of Programming 24


Subscript Binding and Array Categories
 The binding of subscript type to an array variable is
usually static
 The subscript value ranges are sometimes
dynamically bound.
 In C-based languages, the lower bound of all index
ranges is fixed at 0; Fortran 95, it defaults to 1.
 Static Array
 is one in which the subscript ranges are statically
bound and storage allocation is static (done before run
time).
 Advantages: efficiency “No allocation &
deallocation.”
 Arrays declared in C & C++ function that includes the
static
02/02/2024 modifier areCPstatic.
05:38 AM 111: Principles of Programming 25
Subscript Binding and Array Categories
 Fixed stack-dynamic array
 is one in which the subscript ranges are statically
bound, but the allocation is done at elaboration time
during execution.
 Advantages:
 Space efficiency
 A large array in one subprogram can use the same
space as a large array in different subprograms.
 Example:
 Arrays declared in C & C++ function without the
static modifier are fixed stack-dynamic arrays.

02/02/2024 05:38 AM CP 111: Principles of Programming 26


Subscript Binding and Array Categories
 Stack-dynamic array
 is one in which the subscript ranges are dynamically
bound, and the storage allocation is dynamic “during
execution.”
 Once bound they remain fixed during the lifetime of
the variable.
 Advantages
 Flexibility-the size of the array is not known until
the array is about to be used.

02/02/2024 05:38 AM CP 111: Principles of Programming 27


Subscript Binding and Array Categories
 Stack-dynamic array
 The user inputs the number of desired elements for
array List.
 The elements are then dynamically allocated when
execution reaches the declare block.
 When execution reaches the end of the block, the
array is deallocated.
Get (List_Len);
declare
List : array (1..List_Len) of Integer;
begin
...
end;

02/02/2024 05:38 AM CP 111: Principles of Programming 28


Subscript Binding and Array Categories
 Fixed heap-dynamic array
 is one in which the subscript ranges are dynamically
bound.
 Storage allocation is dynamic, but they are both fixed
after storage is allocated.
 The bindings are done when the user program
requests them, rather than at elaboration time and the
storage is allocated on the heap, rather than the stack.

02/02/2024 05:38 AM CP 111: Principles of Programming 29


Subscript Binding and Array Categories
 Fixed heap-dynamic array
 Example:
C & C++ also provide fixed heap-dynamic arrays,
the function malloc and free are used in C.
 The operations new and delete are used in C++.
 In Java all arrays are fixed heap dynamic arrays,
once created, they keep the same subscript ranges
and storage.

02/02/2024 05:38 AM CP 111: Principles of Programming 30


Subscript Binding and Array Categories
 Heap-dynamic array is characterized by:
 The subscript ranges are dynamically bound.
 The storage allocation is dynamic, and can change
any number of times during the array’s lifetime.
 Advantages:
 Flexibility-arrays can grow and shrink during
program execution as the need for space changes.
 Example:
 C# provides heap-dynamic arrays using an array
class ArrayList.

02/02/2024 05:38 AM CP 111: Principles of Programming 31


Subscript Binding and Array Categories
 Heap-dynamic array is characterized by:
 Perl and JavaScript also support heap-dynamic arrays.
 For example, in Perl we could create an array of
five numbers with
@list = {1, 2, 4, 7, 10);
 Later, the array could be lengthened with the push
function, as in
push(@list, 13, 17);
 Now the array’s value is (1, 2, 4, 7, 10, 13, 17).

02/02/2024 05:38 AM CP 111: Principles of Programming 32


Array Initialization
 Some language allow initialization at the time of
storage allocation
 C, C++, Java, C# example
int list [] = {4, 5, 7, 83}
 Character strings in C and C++:
char name [] = “freddie”;
 Arrays of strings in C and C++
char *names [] = {“Bob”, “Jake”, “Joe”};
 Java initialization of String objects
String[] names = {“Bob”, “Jake”, “Joe”};

02/02/2024 05:38 AM CP 111: Principles of Programming 33


Array Initialization
 Some language allow initialization at the time of
storage allocation
 C, C++, Java, C# example
int list [] = {4, 5, 7, 83}
 Character strings in C and C++:
char name [] = “freddie”;
 Arrays of strings in C and C++
char *names [] = {“Bob”, “Jake”, “Joe”};
 Java initialization of String objects
String[] names = {“Bob”, “Jake”, “Joe”};

02/02/2024 05:38 AM CP 111: Principles of Programming 34


Record Types
 A record is a possibly heterogeneous aggregate of
data elements in which the individual elements are
identified by names.
 Arrays stores multiple values of the same type.
 A record is ideal for storing multiple related values of
different types.

02/02/2024 05:38 AM CP 111: Principles of Programming 35


Record Types
 Design issues for records:
 What is the syntactic form of references to the field?
 Are elliptical references allowed?
 Elliptical references to something are indirect
rather than clear.
 The field is named, but any or all of the enclosing
record names can be omitted, as long as the
resulting reference is unambiguous in the
referencing environment.

02/02/2024 05:38 AM CP 111: Principles of Programming 36


Record Types
 Support of Records by different languages:
 In C, C++, and C#, records are supported with the
struct data type.
 In C++, structures are a minor variation on classes.
 COBOL uses level numbers to show nested records;
others use recursive definition.

02/02/2024 05:38 AM CP 111: Principles of Programming 37


Record Types
 Examples of record definition in different languages:
 Definition of records in COBOL
01 EMP-REC.
02 EMP-NAME.
05 FIRST PIC X(20).
05 MID PIC X(10).
05 LAST PIC X(20).
02 HOURLY-RATE PIC 99V99.

02/02/2024 05:38 AM CP 111: Principles of Programming 38


Record Types
 Examples of record definition in different languages:
 Definition of records in Ada: Record structures are
indicated in an orthogonal way
type Emp_Rec_Type is record
First: String (1..20);
Mid: String (1..10);
Last: String (1..20);
Hourly_Rate: Float;
end record;
Emp_Rec: Emp_Rec_Type;

02/02/2024 05:38 AM CP 111: Principles of Programming 39


Reference to Records
 Most language use dot notation
 RecordName.FieldName, example Emp_Rec.Name
 Fully qualified references must include all record
names.
 Elliptical references allow leaving out record names
as long as the reference is unambiguous, for example
in COBOL
 FIRST, FIRST OF EMP-NAME, and FIRST of EMP-
REC are elliptical references to the employee’s first
name

02/02/2024 05:38 AM CP 111: Principles of Programming 40


Operations on Records
 Assignment is very common if the types are
identical
 Example: StudentRec.RegistrationNumber=“T22-03-
00111”
 Ada allows record comparison
 Ada records can be initialized with aggregate literals
 COBOL provides MOVE CORRESPONDING
 Copies a field of the source record to the
corresponding field in the target record

02/02/2024 05:38 AM CP 111: Principles of Programming 41


Evaluation of Records vs Arrays
 Records are used when collection of data values is
heterogeneous.
 Access to array elements is much slower than access
to record fields, because subscripts are dynamic
(field names are static).
 Dynamic subscripts could be used with record field
access, but it would disallow type checking and it
would be much slower.

02/02/2024 05:38 AM CP 111: Principles of Programming 42


Union Data Type
 A union is a type whose variables are allowed to
store different type values at different times during
execution.
 Design issues
 Should type checking be required?
 Should unions be embedded in records?

02/02/2024 05:38 AM CP 111: Principles of Programming 43


Union Types
 Free Union
 Provide union constructs in which there is no
language support for type checking.
 Supported by Fortran, C, and C++.
 Discriminant Union
 Require that each union include a type indicator.
 Supported by Ada

02/02/2024 05:38 AM CP 111: Principles of Programming 44


Ada Union Type
type Shape is (Circle, Triangle, Rectangle);
type Colors is (Red, Green, Blue);
type Figure (Form: Shape) is record
Filled: Boolean;
Color: Colors;
case Form is
when Circle => Diameter: Float;
when Triangle => Leftside, Rightside: Integer;
Angle: Float;
when Rectangle => Side1, Side2: Integer;
end case;
end record;
02/02/2024 05:38 AM CP 111: Principles of Programming 45
Evaluation of Unions
 Free unions are unsafe
 Do not allow type checking
 Java and C# do not support unions
 Reflective of growing concerns for safety in
programming language
 Ada’s descriminated unions are safe

02/02/2024 05:38 AM CP 111: Principles of Programming 46


Pointer Types
 A pointer type variable has a range of values that
consists of memory addresses and a special value,
nil
 Provide the power of indirect addressing.
 Provide a way to manage dynamic memory.
 A pointer can be used to access a location in the area
where storage is dynamically created (usually called a
heap).

02/02/2024 05:38 AM CP 111: Principles of Programming 47


Pointer Types
 Design issues:
 What are the scope of and lifetime of a pointer
variable?
 What is the lifetime of a heap-dynamic variable?
 Are pointers restricted as to the type of value to which
they can point?
 Are pointers used for dynamic storage management,
indirect addressing, or both?
 Should the language support pointer types, reference
types, or both?

02/02/2024 05:38 AM CP 111: Principles of Programming 48


Operations on Pointers
 A pointer type usually includes two fundamental
pointer operations, assignment and dereferencing.
 Assignment sets a pointer variable’s value to some
useful address.
 Done on pointers of the same type, otherwise, type
casting will be required.

 Example in C++:
int a = 5;
int *b = &a;
02/02/2024 05:38 AM CP 111: Principles of Programming 49
Operations on Pointers
 A pointer type usually includes two fundamental
pointer operations, assignment and dereferencing.
 Dereferencing is used to access or manipulate data
contained in memory location pointed to by a
pointer.
 In C++, dereferencing is explicitly specified with
the (*) as a prefix unary operation.
 Example: If ptr is a pointer variable with the value
7080, and the cell whose address is 7080 has the
value 206, then the statement j = *ptr sets j to 206

02/02/2024 05:38 AM CP 111: Principles of Programming 50


Problems with Pointers
 Dangling Pointers: a pointer points to a heap-
dynamic variable that has been deallocated.
 Steps to create a dangling pointer
i. Pointer p1 is set to point at a new heap-dynamic
variable.
ii. Set a second pointer p2 to the value of the first
pointer p1.
iii. The heap-dynamic variable pointed to by p1 is
explicitly deallocated (setting p1 to nil), but p2 is
not changed by the operation.
 P2 is now a dangling pointer.

02/02/2024 05:38 AM CP 111: Principles of Programming 51


Problems with Pointers
 Dangling pointers are dangerous because of the
following reasons:
 The location being pointed to may have been
allocated to some new heap-dynamic variable.
 If the new variable is not the same type as the old
one, type checks of uses of the dangling pointer
are invalid.
 Even if the new one is the same type, its new value
will bear no relationship to the old pointer’s
dereferenced value.

02/02/2024 05:38 AM CP 111: Principles of Programming 52


Problems with Pointers
 Dangling pointers are dangerous because of the
following reasons:
 If the dangling pointer is used to change the heap-
dynamic variable, the value of the heap-dynamic
variable will be destroyed.
 It is possible that the location now is being
temporarily used by the storage management system
 Possibly as a pointer in a chain of available blocks
of storage
 Thereby allowing a change to the location to cause
the storage manager to fail.

02/02/2024 05:38 AM CP 111: Principles of Programming 53


Problems with Pointers
 Lost Heap Dynamic Variables (wasteful)
 It is variable that is no longer referenced by any
program pointer “no longer accessible by the user
program.”
 Such variables are often called garbage because they
are not useful for their original purpose.
 Also they can’t be reallocated for some new use by
the program.
 The process of losing heap-dynamic variables is
called memory leakage.

02/02/2024 05:38 AM CP 111: Principles of Programming 54


Problems with Pointers
 Creating Lost Heap-Dynamic Variables:
 Pointer p1 is set to point to a newly created heap-
dynamic variable.
 p1 is later set to point to another newly created heap-
dynamic variable.
 The first heap-dynamic variable is now inaccessible,
or lost.

02/02/2024 05:38 AM CP 111: Principles of Programming 55


Pointers in Ada
 Some dangling pointers are disallowed because
dynamic objects can be automatically deallocated at
the end of pointer's type scope.
 The lost heap-dynamic variable problem is not
eliminated by Ada (possible with
UNCHECKED_DEALLOCATION)

02/02/2024 05:38 AM CP 111: Principles of Programming 56


Pointers in C/C++
 Extremely flexible but must be used with care
 Unlike the pointers of Ada, which can only point into
the heap, C and C++ pointers can point at virtually
any variable anywhere in memory.
 Used for dynamic storage management and
addressing
 Pointer arithmetic is possible
 Makes their pointers more interesting than those of
the other programming languages.
 Explicit dereferencing and address-of operators

02/02/2024 05:38 AM CP 111: Principles of Programming 57


Reference Types
 C++ includes a special kind of pointer type called a
reference type that is used primarily for formal
parameters.
 A formal parameter is a variable that you specify
when you determine the subroutine or function.
 Provides the advantages of both pass-by-reference
and pass-by-value.

02/02/2024 05:38 AM CP 111: Principles of Programming 58


Reference Types
 Java extends C++’s reference variables and allows
them to replace pointers entirely
 References are references to objects, rather than being
addresses.
 Since Java class instances are implicitly deallocated
(there is no explicit deallocation operator), there
cannot be a dangling reference.
 C# includes both the references of Java and the
pointers of C++.
 However, the use of pointers is strongly discouraged.
 In fact, any method that uses pointers must include the
unsafe modifier.
02/02/2024 05:38 AM CP 111: Principles of Programming 59

You might also like