You are on page 1of 36

Strings, Records and Arrays

String
direct representation: Pascal string var s:string[11];

begin s:='now is the';


Symbol table

id type space offset s string 12 100

layout

Indirect Representation - a Java string

id type length offset S String 4 100

{String S; S=now is the;

Advantages of direct representation


1. Speed: access is faster since there is one less level of indirection. 2. No memory leaks, or garbage collector overhead. 3. Time performance is deterministic.

Advantages of indirect representation


1. Compile time simplicity - it is easier to work out usage of address space on the stack. 2. Flexibility - can handle strings and arrays of dynamically determined size. 3. Generality - all objects other than base types are represented by their addresses.

Records
A record it typically represented as a contiguous area of memory with the addresses of the elds of the record going rising as one moves through the elds.

record x:integer; b:byte; s:string[10] end;

records continued
This would 3 2 1 0 +-------+ | s | +-------+ | s | +---+-+-+ | s ||b| +---+-+-+ | x | +-------+ be 16 bytes organised as offset in word Address 12 8 4 0 note indicates length byte of string s

The nature of vector types


Languages can be categorized by the degree of dynamism associated with their vector types and the implications this has for how vector operations are done. Continuum Most dynamic -> least dynamic Lisp -> APL -> Algols -> Fortran

The nature of vector types

APL Algol Fortran like like like A+ Delphi C J Algol60 Pascal Scheme S-algol Ada Java ML Hi Haskell Note that at level of abstraction I treat lists, arrays and vectors as syntactic sugar over the basic concept of a sequence of values.

lisp like

Dynamism
Dynamic rank Fixed rank mutable array size Fixed rank array size xed at point of declaration - may depend on function parameters Statically known array size

Bounds
Are they checked Do they start at zero

Lisp Like
A list is a sequence of values which may be either atoms or lists. This allows lists of dynamically dened rank, and even of variable rank. (( 2, 3, 7), 6) The rst element of the list is a list of 3 atoms and the second element is an atom. Thus the notion of rank is not dened in this case.

APL Like
All arrays have a well dened rank ( number of dimensions ), and a well dened shape ( number of elements in a dimension) , but these can vary for a given array variable in the course of computation. APL: x n x gets the sequence of integers from 1 to n S-Algol : let x:= vector 1::n of 1 x becomes the array of n storage locations, all initialised to 1 Java: x= new int[n]; x becomes the array of n storage locations initialised to 0

APL Like
Note the difference in degrees of initialisation supported.

Algol 60 like
On entry to a block the array size is specied by values that may be computed at run time but once the variable has been created its size is un-varying.

read(x); begin integer [1:x] a; for i:=1 to x step 1 do read(a[i]); ....


a has a xed size during this block and is discarded on leaving the block.

Fortran Like :
In this case the compiler knows as soon as it has parsed the declaration exactly how much store will be used. The size of the array is xed. Additional issue here is whether the bounds have to start at a xed number such as 0 or 1 or can be dened by the programmer: C: int x[6]; reserve 6 store locations numbered 0 to 5 Pascal : var sales: array[1996..2004] of integer; reserve 8 store location numbered 1996 to 2004 The latter form has some advantages from the standpoint of intelligibility of code.

Representations
Lisp Like

((2,3,5),7)

Tags
Words can be tagged using the top bit of each word to indicate a pointer. This allows arbitrary list structures to be disambiguaged but this has the effect of reducing arithmetic precision. Assume you have a 32 bit word. Then the normal sign bit ( bit 31) is reserved as a tag. You only have the normal range of +ve numbers available as integers 0..231 1 We have to map this into a range of -ve and +ve numbers. Do this with the operations: getint(t)=t + 230 putint(i,t) t i 230

Tags
Some Lisp Machines have provided 40 bit words with 8 bit tags. Tags distinguished different types.

APL type arrays


these can be reshaped as the following A+ example shows

a:= 6 a 0 1 2 3 4 5 a 6 b:= 2 3 a b 0 1 2 3 4 5 b

APL type arrays


2 3
Here is the shape and reshape operator. is the operator that generates a sequence of integers.

Note that each variable is stored as a data vector and a shape vector which says how to interpret the data vector.

dynamic stack arrays


In this case the array has both upper and lower bounds supplied when it is created:

begin integer a,b,c,d; read(a,b,c,d); begin integer array x[a:b.c:d]; .....

dynamic stack arrays

Algol-Like arrays
Note that the array is created on the stack, the stack pointer being moved down when the array declaration is encountered.

dynamic stack arrays


The array descriptor contains a pointer to the start of the array. 1. initialise the bounds in the descriptor 2. compute the space required s = (1+upb1lwb1) (1 + upb2 lwb2) 3. subtract s from the stack pointer 4. set the pointer eld of the descriptor to the stack pointer.

Fortran style arrays


C arrays int a[2][3]; Compiler statically reserves 6 words of memory for the array. Allocated typically on the stack, but in data segment if they are declared as static.

Static Arrays
Denition: An array is static if its bounds are known at compile time. Examples

var a: array[ 1..3] of integer; b: array[ 0..7] of byte;


But also less obviously

const maxlen=8; var c: array[ 0.. maxlen-1] of char;


since maxlen-1 can be evaluated at compile time

A simple 0 based byte array


b: array[ 0..7] of byte;
relative word address 4 b[7] b[6] b[5] b[4] 0 b[3] b[2] b[1] b[0] The variable b represents an address in memory, call this @(b), the elements of b are obtained at the relative addresses shown above. Thus the term b[2] refers to mem[@(b)+4]

Use of the identier table


ID table identier base addr bytes lwb esize type a 0080 12 b 008c 8 c 0094 8 means translates to 1 0 0 4 1 1 int int uint

b[2]:=7 mem[@(b)+2] :=7 mem[008c+2]:=7 mem[008e]:=7 mov byte [008e],7 C606 8E00 07

assembler code machine code

var a: array[ 1..3] of integer;


relative address description 8 a[3] 4 a[2] 0 a[1] The variable a represents an address in memory, call this @(a)=0080, the elements of a are obtained at the relative addresses shown above. Thus the term a[2] refers to mem[@(a)+4] mem[0080+4] mem[0084]

Access to an arbitrary element of a


a[b[2]]:=6 The address of the element of a cannot be determined at compile time since it depends on the run time values in the array b. We know b[2] is in mem[008e] so

a[b[2]]:=6 a[mem[008e]]:=6 mem[@(a)+(mem[008e]-lwb(a))*esize(a)]:=6 mem[0080 +( mem[008e]- 1)*4]:=6 edi:=mem[008e] movzx edi,byte[0x008e] ; mem[0080+edi*4]:=6mov dword[0x0080+edi*4],6 ;

Access to an arbitrary element of a

movzx edi, byte [0x008e] 660FB63E 8E00 mov dword[0x0080+edi*4],66667C704BD 80000000 06000

General rule:
for a 1 dimensional static array translate according to x[i] mem[@x + (i lwb(x)) esize(x)] where i is an arbitrary expression

2 D array
var e:array[1..4,3..6] of integer; General rule f [i, j] mem[@f + ((i lwb1(f )) rs(f ) + j lwb2(f )) esize(f )] rs(f ) = upb(f )lwb(f )+1 row sizeAlgol-Like arrays assume @p = 0040, @q=0044, @e= 00c0 e[p,q]:=5 mem[@e+((p-1)*4+q-3)*4]:=5

mov edi,[0040] sub edi,1 imul edi,edi,4 add edi,[0044]

; ; ; ;

get p -lwb * row size add q

2 D array
sub edi,3 ; -lwb mov dword [00c0+edi*4],5

You might also like