
6.1 : Introduction

6.2 : Primitive Data Type

Definition: A data type is a collection of data values together with a set of predefined operations on those values.

Uses of the type system of a programming language: error detection, program modularization, and documentation.

Descriptor: the collection of the attributes of a variable. If the attributes are all static, the descriptor is required at compile time only. If any attribute is dynamic, the descriptor must be maintained during execution because it is used by the run-time system.

In all cases, descriptors are used for type checking and to build the code for the allocation and deallocation operations.

Definition: Primitive data types are data types that are not defined in terms of other types. Some primitive types are merely reflections of the hardware.

Primitive data types can be divided into 3 categories: 1. Numeric types 2. Character types 3. Boolean types

Numeric types play a central role among the types supported by any language. They can be classified into 4 categories: 1. Integer 2. Floating point 3. Complex 4. Decimal

Integer is the most common primitive numeric data type. Many computers now support several sizes of integers. For example, common integer sizes are byte, short, int, and long, and these sizes are supported by many programming languages. A signed integer is represented as a string of bits, with one of the bits (typically the leftmost) representing the sign. In contrast, an unsigned integer is a bit string without a sign bit and is often used for binary data.

Floating-point types model real numbers, but the representations are only approximations for many real values. Floating-point values are represented as fractions and exponents, a form borrowed from scientific notation. The set of values that a floating-point type can represent is defined in terms of precision and range.

Some programming languages, such as Fortran and Python, support a complex data type. Complex values are represented as ordered pairs of floating-point values. Languages that support a complex type include operations for arithmetic on complex values.

Decimal data types store a fixed number of decimal digits. Advantage: they can precisely store decimal values, at least those within a restricted range, which cannot be done with floating point. Disadvantage: the range of values is restricted because no exponents are allowed. Decimal types are stored very much like character strings, using binary-coded decimal (BCD) for the decimal digits, stored either one digit per byte or packed two digits per byte (4 bits per digit).
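The precision difference between floating-point and decimal types can be seen in a short Python sketch. One assumption to note: Python's decimal module implements decimal floating-point arithmetic rather than the BCD storage described above, so it illustrates only the exactness of decimal values, not their storage layout.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so their sum is only an approximation of 0.3.
float_sum = 0.1 + 0.2

# Decimal values built from strings keep every decimal digit exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(float_sum == 0.3)                # False
print(decimal_sum == Decimal("0.3"))   # True
```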

Boolean types are the simplest of all types. Their range of values has only 2 elements (true and false). They were introduced in ALGOL 60. In C99 and C++, numeric expressions can be used as if they were Boolean, with all nonzero values considered true and zero considered false; this is not allowed in Java and C#. Booleans are used to represent switches or flags in programs. A Boolean value could be represented by a single bit, but because a single bit of memory cannot be accessed efficiently on many machines, it is often stored in a byte.

Character data are stored in computers as numeric codings. The most commonly used coding was the 8-bit ASCII code, which uses the values 0 to 127 to code 128 different characters. Another coding is the 8-bit ISO 8859-1 character code, which allows 256 different characters. As business became more global, the need for computers to handle more character sets also grew, and the Unicode Consortium published UCS-2, a 16-bit character set. The first 128 characters of Unicode are identical to those of ASCII. Java was the first widely used language to use the Unicode character set.

A character string type is one whose values consist of sequences of characters. In general, a string is an array of characters of a particular length. For example, in C++:
string mystring = "This is a string"; cout << mystring;
In C and C++, strings are stored as arrays of char, and library functions are used to manipulate them. For example:
const char *str = "apples"; (a character string represented by a char pointer)
strcmp (used to compare two given strings)

Java and C# support strings through their standard class libraries, for example with the String class. Fortran 95 treats strings as a primitive type and provides assignment for them.

Static Length String


The length can be static and set when the string is created. Shorter values assigned to a fixed-length string are automatically padded with spaces to the fixed length. In some languages, a fixed-length string that has been declared is filled with null characters until it is used.

Compile-time descriptor for a static string (figure): Static string | Length | Address

Limited Dynamic Length String


This option allows a string to have a varying length up to a declared, fixed maximum set by the variable's definition. Such a string can store any number of characters between zero and the maximum, and the end of the string may be marked with a null character, as in C. A limited dynamic string requires a run-time descriptor to store both the fixed maximum length and the current length. Example:
class LD_String {                 // limited dynamic string class
private:
    char physical_buffer[1000];   // fixed-size storage for the characters
    int logical_size_buff = 100;
    int inc_size = 100;
    int lb_index;                 // index into the logical buffer
    int str_len;                  // current string length
    // remaining members are not shown in the source
};

Run-time descriptor for a limited dynamic string (figure): Limited dynamic string | Maximum length | Current length | Address
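The role of the run-time descriptor can be sketched in Python. This is only an illustration under stated assumptions: the class name LimitedDynamicString and its methods are hypothetical, and a Python list stands in for the reserved storage.

```python
class LimitedDynamicString:
    """Hypothetical limited dynamic string: fixed maximum, varying current length."""

    def __init__(self, max_length):
        self.max_length = max_length        # fixed by the variable's definition
        self.current_length = 0             # updated on every assignment
        self.buffer = [""] * max_length     # storage reserved up front

    def assign(self, text):
        # The descriptor's maximum length is checked at run time.
        if len(text) > self.max_length:
            raise ValueError("string exceeds the declared maximum length")
        self.buffer[:len(text)] = list(text)
        self.current_length = len(text)

    def value(self):
        # Only the first current_length cells hold the current value.
        return "".join(self.buffer[:self.current_length])


name = LimitedDynamicString(10)
name.assign("Ada")
print(name.value(), name.current_length)    # Ada 3
```

The buffer never grows or shrinks; only current_length changes, which is why both values must be kept in the descriptor.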

Dynamic Length Strings


This option allows strings to have varying length with no maximum. It requires the overhead of dynamic storage allocation and deallocation, because the storage to which a dynamic length string is bound must grow and shrink dynamically. There are 2 approaches to supporting the dynamic allocation and deallocation:
1. Linked list: when a string grows, the new cells can come from anywhere in the heap. The drawback is the extra storage occupied by the links in the list representation.
2. Store complete strings in adjacent storage cells: when a string grows, a new area of memory is found that can store the complete new string, and the old part is moved to this area. The cells used for the old string are then deallocated.

Example: #include <string.h>
const int MSIZE = 24; // string grows in increments of MSIZE bytes

An enumeration type is one in which all of the possible values, which are named constants, are provided in the definition. For example:
enum days { Mon, Tue, Wed, Thu, Fri, Sat, Sun };

C and Pascal were the first widely used languages to include an enumeration type. The enumeration constants are implicitly assigned internal integer values.
For example, enum colors { red, blue, green, yellow, black }; has the internal values 0, 1, 2, 3, 4.

Unlike other languages, Ada allows the same literal to appear in more than one enumeration declaration in the same referencing environment; such literals are called overloaded literals. In Java, enumeration types are subclasses of the class Enum and can include data fields, constructors, and methods. In C++, numeric values can be assigned explicitly to enumeration constants. For example:
enum colors { red = 1, blue = 100, green = 1000 };

A subrange type is a contiguous subsequence of an ordinal type. There are no design issues that are specific to subrange types. In Ada, subranges are included in the category of types called subtypes. For example:
type Days is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);
subtype Weekdays is Days range Mon..Fri;
subtype Index is Integer range 1..100;

Subrange types can increase readability by making it clear to readers that variables of a subtype can store only a certain range of values. Assigning a value outside the range can be detected as an error.
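Python has no subrange types, so the range check that Ada performs for subtypes can only be imitated. The sketch below assumes a hypothetical Subrange helper whose check method must be called explicitly on every assignment; the Days enumeration mirrors the Ada example.

```python
from enum import IntEnum


class Days(IntEnum):
    MON = 0
    TUE = 1
    WED = 2
    THU = 3
    FRI = 4
    SAT = 5
    SUN = 6


class Subrange:
    """Hypothetical helper: a contiguous range of an ordinal type."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def check(self, value):
        # Reject any value outside low..high, as a subtype constraint would.
        if not (self.low <= value <= self.high):
            raise ValueError(f"{value!r} is outside the subrange")
        return value


weekdays = Subrange(Days.MON, Days.FRI)
index = Subrange(1, 100)

today = weekdays.check(Days.WED)      # accepted
try:
    weekdays.check(Days.SAT)          # rejected at run time
except ValueError as error:
    print("range error:", error)
```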

6.5 Array Type


Array
o a group of data elements of the same type
o an individual element is identified by its position in the group
o subscript expressions are used to make references to each element
C, C++, Java, Ada, C#
o all elements must be of the same type
o pointers are restricted to point to a single type
JavaScript, Python, Ruby
o variables are typeless references to objects or data values

Arrays and Indices
Syntactic mechanism for referencing an element
o aggregate name
o selector consisting of one or more subscripts or indices
o a static selector is one in which all subscripts are constants; otherwise the selector is dynamic
Selection can be thought of as a finite mapping from subscript values to elements.

Problem with using parentheses to enclose subscript expressions
o parentheses are also used to enclose parameters in subprogram calls
o this reduces readability
Fortran and Ada
o parentheses are used to enclose subscripts
Most other languages
o brackets are used to delimit array indices

Two distinct types are involved in an array type
o the element type
o the type of the subscripts, usually a subrange of integers
o Ada also allows Boolean, character, and enumeration types as subscripts
Subscript range errors are common, so range checking is needed for reliability.

Perl subscripting
o @ begins the names of arrays
o $ begins the names of scalars and of references to array elements
o example: the array @subjects, the element $list[1]
o negative subscripts are allowed and count from the end: for an array with subscripts 0..4, $list[-2] refers to the element with subscript 3

Subscript Bindings and Array Categories
o the binding of the subscript type to an array variable is usually static
o subscript ranges are sometimes dynamically bound
o the lower bound of the subscript range is implicit in some languages
o C-based languages: fixed at 0
o Fortran 95: defaults to 1 but can be set to any integer literal
o other languages: specified by the programmer

Five categories of arrays:

Static array
o subscript ranges are statically bound and storage allocation is static
o Advantage: efficiency, because no dynamic allocation or deallocation is required
o Disadvantage: storage for the array is bound for the entire execution time

Fixed stack-dynamic array
o subscript ranges are statically bound, but allocation is done at declaration elaboration time during execution
o Advantage: space efficiency (compared with static arrays)
o Disadvantage: required allocation and deallocation time

Stack-dynamic array
o both subscript ranges and storage allocation are dynamically bound at elaboration time
o Advantage: flexibility

Fixed heap-dynamic array
o similar to a fixed stack-dynamic array, but both subscript ranges and storage binding are done when the user program requests them during execution, and the storage is allocated from the heap
o Advantage: the array's size fits the problem (flexibility)
o Disadvantage: allocation from the heap takes longer than allocation from the stack

Heap-dynamic array
o binding of the subscript ranges and storage allocation is dynamic and can change any number of times during the array's lifetime
o Advantage: the array can grow and shrink during execution
o Disadvantage: allocation and deallocation take longer and may happen many times
in C#
o the ArrayList class
in Java
o similar to the C# ArrayList, but subscripting is not supported
in Perl
o arrays grow with push and unshift
o assigning the empty list, (), leaves an array with no elements
o length of an array = largest subscript + 1
in JavaScript
o arrays grow in the same way as in Perl
o negative subscripts are not supported
o subscript values need not be contiguous

Array initialization
Fortran 95
o an array can be initialized by assigning it an array aggregate in its declaration
o for a single-dimensional array, the aggregate is a list of literals delimited by parentheses and slashes
o example: Integer, Dimension (2) :: List = (/1, 2/)
C declaration
o int arrayList [] = {1, 3, 7, 9, 5, 6};
o the compiler sets the length of the array
C and C++
o an array of char can be initialized with a character string
  char subject [] = "Programming";
  the array subject has 12 elements (11 characters plus the terminating null character)
o an array of strings can be initialized with string literals
  char *subjects [] = {"Formal Method", "Data Structure"};
  this is an array of pointers to characters

In Java
o similar syntax is used to define and initialize an array of references to String objects
o example: String[] subjects = {"Formal Method", "Data Structure"};
In Ada
o values can be listed in the order in which they are to be stored
  List : array (1..5) of Integer := (1, 2, 3, 4, 5);
o values can be directly assigned to index positions using the arrow operator (=>)
  Bunch : array (1..5) of Integer := (1 => 23, 4 => 20, others => 0);
In Python
o example: [x * x for x in range(12) if x % 3 == 0] creates the list [0, 9, 36, 81]

Array Operations
An array operation is one that operates on an array as a unit.
Most common operations
o assignment
o catenation
o comparison for equality and inequality
o slices
In Ada
o allows array assignment and catenation, which is specified by the ampersand (&)
In Python
o arrays are called lists
o provides array assignment, although it is only a reference change
o has operations for array catenation (+) and element membership (in)
o includes tuples, which are immutable and use parentheses to enclose their literals

In Ruby
o arrays are references to objects
o when == is used between two arrays, the result is true only if the two arrays have the same length and the corresponding elements are equal
o arrays can be catenated with an Array method
Fortran 95
o includes elemental operations
o includes intrinsic (library) functions for matrix multiplication, matrix transpose, and vector dot product
APL
o the 4 basic arithmetic operations are defined for vectors and matrices, for example A + B
o has special operators that take other operators as operands, for example A +.x B

Rectangular and Jagged Arrays
Rectangular array
o a multidimensional array in which all rows have the same number of elements and all columns have the same number of elements
o models a rectangular table
o not supported in C, C++, or Java
o supported in Fortran, Ada, and C#
o subscript expressions in references to elements are placed in a single pair of brackets
o example: ArrayList[2,4]
Jagged array
o the lengths of the rows need not be the same
o supported in C, C++, and Java
o separate brackets are used for each dimension
o example: ArrayList[2][4]
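The difference between the two shapes can be sketched with Python nested lists. Note the assumption: Python has no built-in rectangular array type, so a list of rows behaves as a jagged array and the rectangular shape below holds only by construction.

```python
# Rectangular by construction: every row has the same number of elements.
rectangular = [[0] * 4 for _ in range(3)]

# Jagged: each row is a separate list, so row lengths may differ.
jagged = [[1], [2, 3], [4, 5, 6]]

print([len(row) for row in rectangular])   # [4, 4, 4]
print([len(row) for row in jagged])        # [1, 2, 3]

# Separate brackets are used for each dimension.
print(jagged[2][1])                        # 5
```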

Slices
o a slice is some substructure of an array
o it is a mechanism for referencing part of an array as a unit
Example:
o Python declaration

in Fortran 95
o if mat is a matrix, specific columns can be specified
o example: mat(:, 2) refers to the second column
in Perl
o slices come in two forms: a list of specific subscripts or a range of subscripts
o example: @list[1..5] = @list2[1, 2, 3, 4, 5];
in Ruby
o supports slices with the slice method of its Array object
o the method takes 3 forms of parameters: a single integer expression, two integer expressions, or a range
in Ada
o only highly restricted slices are allowed
o example: if List has the index range 1..100, then List(2..4) is a slice of List consisting of the 3 elements indexed from 2 to 4
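Slices of Python lists can be sketched as follows. The variable names and values are chosen only for illustration. Python has no direct column slice for nested lists, so the column is built with a list comprehension.

```python
vector = [2, 4, 6, 8, 10, 12, 14, 16]
mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print(vector[3:6])      # [8, 10, 12]      elements with subscripts 3, 4, 5
print(vector[0:7:2])    # [2, 6, 10, 14]   every second element
print(mat[1])           # [4, 5, 6]        the second row
print(mat[0][0:2])      # [1, 2]           part of the first row

# The second column, gathered element by element.
second_column = [row[1] for row in mat]
print(second_column)    # [2, 5, 8]
```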

Implementation of Array Types


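o implementing arrays requires more compile-time effort than implementing primitive types
o access function for a single-dimensioned array, generalized for an arbitrary lower bound:
  address(list[k]) = address(list[lower_bound]) + ((k - lower_bound) * element_size)

Compile-time descriptor
o the descriptor includes the information needed to construct the access function
o no run-time descriptor is needed if run-time checking of index ranges is NOT done and the attributes are all STATIC

True multidimensional arrays
o more complex to implement
o 2 common ways of mapping a multidimensional array to memory: row major order and column major order

Access function
o maps the base address and a set of index values to the address in memory of the element specified by the index values
o for a two-dimensional array stored in row major order with lower bounds of 1:
  location(a[i,j]) = address of a[1,1] + ((((number of rows above the ith row) * (size of a row)) + (number of elements left of the jth column)) * element_size)
o with n elements per row, this is address of a[1,1] + (((i - 1) * n) + (j - 1)) * element_size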
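The row major access function can be written out as a small Python function. This is a sketch under stated assumptions: the function name, parameter names, and the sample base address are hypothetical, and both dimensions share the same lower bound.

```python
def row_major_address(base, i, j, row_length, element_size, lower_bound=1):
    """Address of a[i, j] for a two-dimensional array stored in row major order."""
    rows_above = i - lower_bound        # complete rows stored before row i
    elements_left = j - lower_bound     # elements before column j within row i
    return base + (rows_above * row_length + elements_left) * element_size


# A 3 x 4 array of 4-byte elements whose first element a[1,1] is at address 1000.
print(row_major_address(1000, 1, 1, 4, 4))   # 1000
print(row_major_address(1000, 2, 3, 4, 4))   # 1024
print(row_major_address(1000, 3, 4, 4, 4))   # 1044
```

For a[2,3], one full row of 4 elements lies above and 2 elements lie to the left, so the offset is (4 + 2) * 4 = 24 bytes.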

6.6 Associative Array


Associative array
o an unordered collection of data elements
o indexed by an equal number of values called keys
o user-defined keys must be stored in the structure
o Perl's design is used here to illustrate the structure
o supported by Python, Ruby, Lua, Java, C++, C#
Non-associative array
o indices do not need to be stored

Structure and Operations
in Perl
o associative arrays are called hashes
o the namespace for hashes is distinct: hash names begin with '%'
o each hash element consists of a key (string type) and a value (scalar type)
o a hash can be assigned a literal value:
  %marks = ("abby" => 78, "bob" => 80, "raymond" => 50, "haratio" => 100);
o individual element values are referenced with the key value in braces and the hash name replaced by a scalar variable name
o hashes are not scalars, but the value part of an element is a scalar, so the reference starts with the $ sign
  example: $marks{"haratio"} = 99;
o a new element is added using the same assignment statement form
o an element is removed with the delete operator
  delete $marks{"abby"};
o a hash is emptied by assigning it the empty list
  %marks = ();
o the size of a Perl hash is dynamic
o the exists operator returns true or false, depending on whether its operand key is in the hash
  if (exists $marks{"robert"}) ...
in Python
o associative arrays are called dictionaries
o the values are all references to objects
in Ruby
o similar to Python
o keys can be any objects rather than only strings
PHP arrays
o are both normal arrays and associative arrays
o allow both indexed and hashed access to elements
o elements can be created with simple numeric indices or with string hash keys
In Lua
o Lua's table is an associative array
o keys and values can be of any type
o a table can be used as a traditional array, as an associative array, or as a record
o for arrays, brackets are used around the keys
o for records, the keys are the field names, and dot (.) notation can be used for references to fields
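The Perl hash operations above have close counterparts in a Python dictionary. The sketch reuses the marks data from the Perl literal; the value 65 for the key "robert" is an invented sample.

```python
marks = {"abby": 78, "bob": 80, "raymond": 50, "haratio": 100}

marks["haratio"] = 99         # change the value of an existing element
marks["robert"] = 65          # add a new element with the same assignment form
del marks["abby"]             # remove an element

print("robert" in marks)      # True   (membership test, like Perl's exists)
print("abby" in marks)        # False
print(len(marks))             # 4      (the size changes dynamically)

marks.clear()                 # empty the dictionary
print(len(marks))             # 0
```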

Implementing Associative Array


Perl's associative arrays
o implemented for fast lookups
o provide relatively fast reorganization when array growth requires it
PHP's arrays
o elements are placed in memory through a hash function
o all elements are linked together
o the links support iterative access to elements through the current and next functions

6.7 Record
Example record: College Student
o Name (string)
o Student num. (integer)
o Grade point average (float)

Characteristics of a record
o fields can be of different sizes
o fields are stored in adjacent memory locations
o each field is identified by an identifier
In C, C++, C#: the struct data type
In Python, Ruby: hashes

Design issues
o What is the syntactic form of references to fields?
o Are elliptical references allowed?

COBOL
Level numbers show the hierarchical structure of the record. PICTURE clauses give the format of the field storage locations.

01 EMPLOYEE-RECORD.
    02 EMPLOYEE-NAME.
        05 FIRST PICTURE IS X(20).
        05 MIDDLE PICTURE IS X(10).
        05 LAST PICTURE IS X(20).
    02 HOURLY-RATE PICTURE IS 99V99.

X(20): a field of 20 alphanumeric characters
99V99: a number field of 4 decimal digits with the decimal point in the middle

Ada
type Employee_Name_Type is record
    First : String (1..20);
    Middle : String (1..10);
    Last : String (1..20);
end record;
type Employee_Record_Type is record
    Employee_Name : Employee_Name_Type;
    Hourly_Rate : Float;
end record;
Employee_Record : Employee_Record_Type;

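COBOL field reference
MIDDLE OF EMPLOYEE-NAME OF EMPLOYEE-RECORD

Ada field reference (also C, C++): dot notation
Employee_Record.Employee_Name.Middle

Both of these are fully qualified references, in which every enclosing record name is included.

COBOL allows elliptical references, in which enclosing record names can be omitted as long as the reference is unambiguous:
MIDDLE
MIDDLE OF EMPLOYEE-NAME
MIDDLE OF EMPLOYEE-RECORD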

Operations on records
- Assignment (common): the types of the two sides must be identical
- Comparison (Ada): equality and inequality
- Move (COBOL): the MOVE CORRESPONDING statement copies fields that have the same name in both records

MOVE CORRESPONDING INPUT-RECORD TO OUTPUT-RECORD

01 INPUT-RECORD.
    02 NAME.
        05 LAST PICTURE IS X(20).
        05 MIDDLE PICTURE IS X(15).
        05 FIRST PICTURE IS X(20).
    02 EMPLOYEE-NUMBER PICTURE IS 9(10).
    02 HOURS-WORKED PICTURE IS
01 OUTPUT-RECORD.
    02 NAME.
        05 FIRST PICTURE IS X(20).
        05 MIDDLE PICTURE IS X(15).
        05 LAST PICTURE IS X(20).
    02 EMPLOYEE-NUMBER PICTURE IS 9(10).
    02 GROSS-PAY PICTURE IS

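Array vs. Record
- Array: elements are of the same type and are processed in the same way; they are accessed by indices
- Record: fields are heterogeneous and are processed in different ways; they are addressed by field names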
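A nested record with dot-notation field references can be sketched with Python dataclasses. The class names mirror the Ada Employee_Record example; the field values are invented samples.

```python
from dataclasses import dataclass


@dataclass
class EmployeeName:
    first: str
    middle: str
    last: str


@dataclass
class EmployeeRecord:
    employee_name: EmployeeName   # a record nested inside a record
    hourly_rate: float


# Sample values, invented for illustration.
employee_record = EmployeeRecord(EmployeeName("Mary", "Ann", "Smith"), 42.5)

# Fully qualified reference using dot notation.
print(employee_record.employee_name.middle)   # Ann

# Dataclasses compare field by field, similar to record equality in Ada.
copy = EmployeeRecord(EmployeeName("Mary", "Ann", "Smith"), 42.5)
print(employee_record == copy)                # True
```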

6.8 Union
- A union is a type whose variables may store values of different types at different times
- Safe (Ada)
- Unsafe (Fortran, C, C++)
- Not supported (Java, C#)

Design issues
Should type checking be required? Note that any such type checking must be dynamic. Should unions be embedded in records?

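Discriminated union (Ada)
- each union includes a tag, or discriminant, that acts as a type indicator
Free union (Fortran, C, C++)
- programmers are given complete freedom from type checking

Constrained variant variable
- the tag is treated like a named constant
Unconstrained variant variable
- the variant can be changed only by assigning the entire record, including the tag
- this disallows inconsistent records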

type Shape is (Circle, Triangle, Rectangle);
type Colors is (Red, Green, Blue);
type Figure (Form : Shape) is record
    Filled : Boolean;
    Color : Colors;
    case Form is
        when Circle =>
            Diameter : Float;
        when Triangle =>
            Left_Side : Integer;
            Right_Side : Integer;
            Angle : Float;
        when Rectangle =>
            Side_1 : Integer;
            Side_2 : Integer;
    end case;
end record;

Figure variable declarations:
Figure_1 : Figure; (unconstrained variant record)
Figure_2 : Figure(Form => Triangle);

Figure_1 := (Filled => True, Color => Blue, Form => Rectangle, Side_1 => 12, Side_2 => 3);

Figure_2 is constrained to be a triangle and cannot be changed to another variant.

Implementation of union types
- the same address is used for every possible variant
- sufficient storage for the largest variant is allocated
- the tag of a discriminated union is stored with the variant in a record-like structure
- a complete description of each variant must be stored at compile time

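Ada example
type Node (Tag : Boolean) is record
    case Tag is
        when True => Count : Integer;
        when False => Sum : Float;
    end case;
end record;

Compile-time descriptor for this type (figure): the discriminated union entry holds the tag (Tag, of type BOOLEAN), an offset, and the address of a case table. The case table maps True to the descriptor of Count (name, type Integer) and False to the descriptor of Sum (name, type Float).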
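The run-time check that makes a discriminated union safe can be imitated in Python. This is a hypothetical sketch, not Ada semantics in full: the Node class, the VariantError exception, and the property named total (standing in for the Ada field Sum) are all assumptions made for illustration.

```python
class VariantError(Exception):
    """Raised when a field of the inactive variant is referenced."""


class Node:
    """Hypothetical discriminated union with a Boolean tag."""

    def __init__(self, tag, value):
        self.assign(tag, value)

    def assign(self, tag, value):
        # The tag can change only together with the whole record,
        # so the tag and the stored variant never disagree.
        self._tag = bool(tag)
        self._value = int(value) if self._tag else float(value)

    @property
    def count(self):
        if not self._tag:
            raise VariantError("Count is not in the current variant")
        return self._value

    @property
    def total(self):
        if self._tag:
            raise VariantError("Sum is not in the current variant")
        return self._value


node = Node(True, 3)
print(node.count)            # 3
node.assign(False, 2.5)      # change the variant by assigning the entire record
print(node.total)            # 2.5
```

Both variants share the single _value attribute, which loosely mirrors the shared storage of the variants.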

6.9 Pointer and Reference Type

A pointer type is one in which the variables have a range of values that consists of memory addresses and a special value, nil. Pointers provide the power of indirect addressing and a method to manage dynamic memory. A pointer can be used to access a location in the area where storage is dynamically created (called the heap).

Design issues
- What are the scope and lifetime of a pointer variable?
- What is the lifetime of a heap-dynamic variable?
- Are pointers restricted as to the type of value to which they can point?
- Are pointers used for dynamic storage management, indirect addressing, or both?
- Should the language support pointer types, reference types, or both?

Two fundamental operations: assignment and dereferencing
- Assignment is used to set a pointer variable's value to some useful address
- Dereferencing yields the value stored at the location represented by the pointer's value
- Dereferencing can be explicit or implicit
- C++ uses an explicit operation via *: j = *ptr sets j to the value located at ptr

Figure: the assignment operation j = *ptr. The pointer ptr holds the address 7080 of an anonymous dynamic variable whose value is 206, so the assignment sets j to 206.

Dangling pointers (dangerous)
- a dangling pointer points to a heap-dynamic variable that has been deallocated

Lost heap-dynamic variable
- an allocated heap-dynamic variable that is no longer accessible to the user program (often called garbage)
- example: pointer p1 is set to point to a newly created heap-dynamic variable, and p1 is later set to point to another newly created heap-dynamic variable
- the process of losing heap-dynamic variables is called memory leakage

Pointers in Ada
- some dangling pointers are disallowed because dynamic objects can be automatically deallocated at the end of the pointer type's scope
- the lost heap-dynamic variable problem is not eliminated by Ada (explicit deallocation is possible with Unchecked_Deallocation)

Pointers in C and C++
- extremely flexible but must be used with care
- pointers can point at any variable regardless of when or where it was allocated
- used for dynamic storage management and addressing
- pointer arithmetic is possible
- explicit dereferencing and address-of operators
- the domain type need not be fixed (void *)
- void * can point to any type and can be type checked (cannot be dereferenced)

float stuff[100];
float *p;
p = stuff;
*(p+5) is equivalent to stuff[5] and p[5]
*(p+i) is equivalent to stuff[i] and p[i]

Reference types
- C++ includes a special kind of pointer type called a reference type that is used primarily for formal parameters; it gives the advantages of both pass-by-reference and pass-by-value
- Java extends C++'s reference variables and allows them to replace pointers entirely; Java references are references to objects, rather than being addresses
- C# includes both the references of Java and the pointers of C++

Evaluation of pointers
- dangling pointers and dangling objects are problems, as is heap management
- pointers are like goto's: they widen the range of cells that can be accessed by a variable
- pointers or references are necessary for dynamic data structures, so we can't design a language without them

Representations of pointers
- large computers use single values
- Intel microprocessors use segment and offset

Tombstone: an extra heap cell that is a pointer to the heap-dynamic variable
- the actual pointer variable points only at tombstones
- when a heap-dynamic variable is deallocated, the tombstone remains but is set to nil
- costly in time and space
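The tombstone idea can be sketched in Python. Everything here is a hypothetical model: the Tombstone and Pointer classes and the deallocate function are invented for illustration, and None plays the role of nil.

```python
class Tombstone:
    """Extra cell that holds the only direct link to the heap-dynamic variable."""

    def __init__(self, value):
        self.target = [value]     # a one-element list stands in for the heap cell


class Pointer:
    """A pointer variable that points only at a tombstone."""

    def __init__(self, tombstone):
        self.tombstone = tombstone

    def dereference(self):
        # Every access goes through the tombstone: costly, but dangling use is caught.
        if self.tombstone.target is None:
            raise RuntimeError("dangling pointer: the variable was deallocated")
        return self.tombstone.target[0]


def deallocate(pointer):
    # The tombstone itself remains, but it is set to nil.
    pointer.tombstone.target = None


p1 = Pointer(Tombstone(206))
p2 = Pointer(p1.tombstone)        # a second pointer to the same variable
print(p2.dereference())           # 206

deallocate(p1)
try:
    p2.dereference()              # p2 would otherwise be a dangling pointer
except RuntimeError as error:
    print(error)
```

The extra level of indirection on every dereference and the tombstones that are never reclaimed are the time and space costs named above.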

Locks-and-keys: pointer values are represented as (key, address) pairs
- heap-dynamic variables are represented as the variable plus a cell for an integer lock value
- when a heap-dynamic variable is allocated, a lock value is created and placed in the lock cell and in the key cell of the pointer

Heap management
- a very complex run-time process
- single-size cells vs. variable-size cells
- two approaches to reclaim garbage:
  Reference counters (eager approach): reclamation is gradual
  Mark-sweep (lazy approach): reclamation occurs when the list of available space becomes empty

Reference counters: maintain a counter in every cell that stores the number of pointers currently pointing at the cell
- Disadvantages: space required, execution time required, complications for cells connected circularly
- Advantage: it is intrinsically incremental, so significant delays in the application execution are avoided
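Both the eager reclamation and the problem with circularly connected cells can be seen in a small simulation. This is a sketch under stated assumptions: the Cell class, the add_pointer and remove_pointer functions, and the reclaimed list are hypothetical, and each cell has at most one outgoing pointer.

```python
class Cell:
    def __init__(self, name):
        self.name = name
        self.ref_count = 0        # number of pointers currently pointing at this cell
        self.points_to = None     # at most one outgoing pointer, to keep the sketch small


reclaimed = []                    # names of cells returned to the free list


def add_pointer(target):
    target.ref_count += 1


def remove_pointer(target):
    target.ref_count -= 1
    if target.ref_count == 0:     # eager: reclaim as soon as the count reaches zero
        reclaimed.append(target.name)
        if target.points_to is not None:
            remove_pointer(target.points_to)


# Chain: program -> a -> b
a, b = Cell("a"), Cell("b")
add_pointer(a)
a.points_to = b
add_pointer(b)
remove_pointer(a)                 # a is reclaimed, then b

# Cycle: program -> c -> d -> c
c, d = Cell("c"), Cell("d")
add_pointer(c)
c.points_to = d
add_pointer(d)
d.points_to = c
add_pointer(c)
remove_pointer(c)                 # the program drops c, but d still points at it

print(reclaimed)                  # ['a', 'b']
print(c.ref_count, d.ref_count)   # 1 1
```

The cells c and d are unreachable from the program, yet their counters never reach zero, so they are never reclaimed.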

Mark-sweep: the run-time system allocates storage cells as requested and disconnects pointers from cells as necessary; mark-sweep then begins
- every heap cell has an extra bit used by the collection algorithm
- all cells are initially set to garbage
- all pointers are traced into the heap, and reachable cells are marked as not garbage
- all garbage cells are returned to the list of available cells
- Disadvantages: in its original form, it was done too infrequently, and when it was done, it caused significant delays in application execution. Contemporary mark-sweep algorithms avoid this by doing it more often, which is called incremental mark-sweep
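The three phases can be sketched over a toy heap. The representation is an assumption made for illustration: the heap is a dictionary that maps each cell number to the list of cells it points to, roots are the cells referenced directly by program pointers, and the function name mark_sweep is hypothetical.

```python
def mark_sweep(heap, roots):
    """Return the sorted list of garbage cells in a toy heap."""
    # Phase 1: every cell is initially assumed to be garbage.
    reachable = {cell: False for cell in heap}

    # Phase 2: trace every pointer into the heap and mark the cells reached.
    pending = list(roots)
    while pending:
        cell = pending.pop()
        if not reachable[cell]:
            reachable[cell] = True
            pending.extend(heap[cell])

    # Phase 3: cells still unmarked go back to the list of available cells.
    return sorted(cell for cell, marked in reachable.items() if not marked)


# Cells 5 and 6 point at each other but cannot be reached from the root.
heap = {1: [2, 3], 2: [4], 3: [], 4: [2], 5: [6], 6: [5]}
print(mark_sweep(heap, roots=[1]))   # [5, 6]
```

Unlike reference counting, the circularly connected cells 5 and 6 are reclaimed, because reachability rather than a pointer count decides what is garbage.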

Figure: an example of the actions of the marking algorithm. Dashed lines show the order of node marking.

Type Checking
- generalize the concept of operands and operators to include subprograms and assignments
- type checking is the activity of ensuring that the operands of an operator are of compatible types
- a compatible type is one that is either legal for the operator, or is allowed under language rules to be implicitly converted, by compiler-generated code, to a legal type; this automatic conversion is called a coercion
- a type error is the application of an operator to an operand of an inappropriate type
- if all type bindings are static, nearly all type checking can be static
- if type bindings are dynamic, type checking must be dynamic
- a programming language is strongly typed if type errors are always detected
- advantage of strong typing: it allows the detection of the misuses of variables that result in type errors
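Coercion and dynamic type checking can both be observed in Python, whose type bindings are dynamic. In the sketch below, the variable names are chosen only for illustration.

```python
# Coercion: the int operand is implicitly converted to float before the addition.
coerced = 2 + 3.5
print(coerced, type(coerced).__name__)   # 5.5 float

# Type error: + is not defined for an int and a str, and no coercion applies.
# Because type bindings are dynamic, the error is detected at run time.
try:
    result = 1 + "a"
    error_detected = False
except TypeError:
    error_detected = True

print(error_detected)                    # True
```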
