
Pointers and reference types

Pointer Type Definition:

 A pointer type is a data type in programming languages that allows variables to
store memory addresses as values. Along with valid memory addresses, pointer
variables can also hold a special value, often called nil or NULL, which signifies
that the pointer doesn't currently point to any valid memory location.

 nil is used to indicate that a pointer is not currently referencing any memory cell,
essentially pointing to nothing.

Uses of Pointers:

 Indirect Addressing: Pointers provide a mechanism for indirect addressing,
similar to what is commonly used in assembly language programming. This
means that instead of directly accessing the data stored at a memory address,
pointers can be used to indirectly access that data.

 Dynamic Storage Management: Pointers enable dynamic memory allocation,
allowing programs to allocate memory at runtime as needed. This dynamic
allocation typically occurs in an area of memory called the heap. Pointers can
then be used to access and manipulate data stored in this dynamically allocated
memory.
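
The two uses can be illustrated with a minimal C++ sketch. The function names set_through_pointer and sum_of_squares are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>

// Indirect addressing: write to a variable through its address.
void set_through_pointer(int* target, int value) {
    assert(target != nullptr);  // a null pointer refers to no memory cell
    *target = value;            // dereference, then store
}

// Dynamic storage management: the cells are created on the heap at run time.
// They have no names of their own and are reachable only through the pointer.
// Assumes n >= 0.
int sum_of_squares(int n) {
    int* squares = new int[n];  // allocate n cells on the heap
    int total = 0;
    for (int i = 0; i < n; ++i) {
        squares[i] = i * i;
        total += squares[i];
    }
    delete[] squares;           // release the heap storage
    return total;
}
```

A caller would pass the address of an existing variable to the first function, for example set_through_pointer(&x, 42), while the second function needs no named storage at all.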

Heap-Dynamic Variables:

 Variables allocated dynamically from the heap are referred to as heap-dynamic
variables. These variables are often created without explicit identifiers
and are accessed through pointer variables.

 Because they have no names, they are sometimes called anonymous variables.

Comparison with Other Types:

 Not Structured Types: Pointers are distinct from structured types like arrays and
records. While arrays and records organize data into structured formats,
pointers simply hold memory addresses.

 Different from Scalar Variables: Scalar variables store data directly, whereas
pointers store memory addresses that reference other variables or data.

Reference Types vs. Value Types:

 Pointers are classified as reference types because they reference or point to
other variables, rather than directly storing data themselves. In contrast, scalar
variables are examples of value types because they store data directly.

Enhancing Language Writability:

 Pointers contribute to the writability of programming languages by enabling more
flexible and efficient data structures and memory management techniques.
For instance, implementing dynamic structures like binary trees becomes
significantly easier with pointers.
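
As a rough illustration of that claim, here is a minimal C++ sketch of a binary search tree built from heap-allocated nodes. The names Node, insert, contains, min_key, and destroy_tree are hypothetical and not taken from the notes.

```cpp
#include <cassert>

// Hypothetical node type: each node links to its children through pointers.
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Insert a key; the tree grows one heap-allocated node at a time.
Node* insert(Node* root, int key) {
    if (root == nullptr) return new Node(key);
    if (key < root->key)      root->left  = insert(root->left, key);
    else if (key > root->key) root->right = insert(root->right, key);
    return root;  // duplicate keys are ignored
}

bool contains(const Node* root, int key) {
    while (root != nullptr) {
        if (key == root->key) return true;
        root = (key < root->key) ? root->left : root->right;
    }
    return false;
}

// Smallest key; the tree must not be empty.
int min_key(const Node* root) {
    assert(root != nullptr);
    while (root->left != nullptr) root = root->left;
    return root->key;
}

// Release every node (post-order) so that nothing is leaked.
void destroy_tree(Node* root) {
    if (root == nullptr) return;
    destroy_tree(root->left);
    destroy_tree(root->right);
    delete root;
}
```

No maximum size has to be chosen in advance: the tree holds as many nodes as the heap can supply.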

Example with Fortran 77:

In Fortran 77, a language lacking support for pointers and dynamic memory allocation,
implementing dynamic data structures like a binary tree becomes challenging.

Manual Memory Management:

 In Fortran 77, memory allocation is typically static and determined at compile
time. This means that memory for variables must be allocated before the program
runs and cannot be dynamically adjusted during runtime.

 Implementing a binary tree requires nodes to be dynamically created and
linked together as the tree grows. Without dynamic memory allocation, the
programmer must preallocate memory for a fixed number of tree nodes,
potentially leading to inefficient memory usage or running out of memory if the
maximum number of nodes is exceeded.

Pool of Available Tree Nodes:

 Since dynamic memory allocation is not available, the programmer must maintain
a pool of available tree nodes beforehand. This pool would consist of preallocated
memory blocks that can be used to create tree nodes as needed.

 Managing this pool of nodes requires careful bookkeeping to keep track of which
nodes are currently in use and which ones are available for allocation.

Parallel Arrays:

 To represent the binary tree structure without pointers, programmers might resort
to using parallel arrays.

 One array might store the data associated with each node, while another array
could store indices or references indicating the parent-child relationships
between nodes.

 This approach is complex and error-prone, especially when performing operations
like insertion, deletion, or traversal on the binary tree.
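
The parallel-array technique can be sketched as follows. The sketch is written in C++ rather than Fortran 77 so that it matches the other examples, and every name in it (MAX_NODES, key_of, left_of, right_of, next_free, new_node, insert_key) is an assumption made for illustration.

```cpp
#include <cassert>

// Assumed capacity: with no dynamic allocation it must be guessed up front.
const int MAX_NODES = 8;
const int NONE = -1;            // plays the role of a nil link

// Parallel arrays: entry i of each array describes node i.
int key_of[MAX_NODES];
int left_of[MAX_NODES];
int right_of[MAX_NODES];
int next_free = 0;              // nodes [0, next_free) are in use
int root = NONE;

// Take one node from the preallocated pool; NONE means the pool is exhausted.
int new_node(int key) {
    assert(next_free >= 0 && next_free <= MAX_NODES);  // bookkeeping invariant
    if (next_free == MAX_NODES) return NONE;
    int n = next_free++;
    key_of[n] = key;
    left_of[n] = right_of[n] = NONE;
    return n;
}

// Insert into the binary search tree; returns false when the pool is full.
bool insert_key(int key) {
    int n = new_node(key);
    if (n == NONE) return false;
    if (root == NONE) { root = n; return true; }
    int cur = root;
    while (true) {
        // Smaller keys go left; equal or larger keys go right.
        int& link = (key < key_of[cur]) ? left_of[cur] : right_of[cur];
        if (link == NONE) { link = n; return true; }
        cur = link;
    }
}
```

Array indices stand in for addresses here, and the ninth insertion fails no matter how much memory the machine has, which is the limitation described above.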

Guessing Maximum Number of Nodes:

 Without dynamic memory allocation, the programmer must estimate the
maximum number of nodes the binary tree might contain. This requires predicting
the upper limit of the tree's size, which can be challenging and may lead to wasted
memory if the estimate is too high or runtime errors if it's too low.

 Moreover, the need to guess the maximum number of nodes limits the flexibility
and scalability of the program, as it cannot efficiently handle cases where the
actual number of nodes exceeds the estimated limit.

Awkward and Error-Prone:

 Overall, managing a dynamic data structure like a binary tree without pointers in
Fortran 77 is cumbersome, error-prone, and inefficient. The lack of dynamic
memory allocation and pointer support imposes significant limitations on the
programmer, making it challenging to implement and maintain complex data
structures.

Design Issues:

The primary design issues particular to pointers are the following:

• What are the scope and lifetime of a pointer variable?

• What is the lifetime of a heap-dynamic variable (the value a pointer references)?

• Are pointers restricted as to the type of value to which they can point?

• Are pointers used for dynamic storage management, indirect addressing, or both?

• Should the language support pointer types, reference types, or both?


Pointer operations

Languages that provide a pointer type usually include two fundamental pointer operations:
assignment and dereferencing.

Assignment Operation:

 The assignment operation sets a pointer variable's value to a memory
address.

 If pointers are used for managing dynamic storage, their initialization typically
occurs through memory allocation mechanisms, either by operators or built-in
subprograms.

 If pointers are used for indirect addressing to variables not dynamically
allocated, there must be explicit mechanisms to fetch the address of a
variable, which can then be assigned to the pointer variable.

Interpretation of Pointer Variables:

 A pointer variable in an expression can be interpreted in two ways:

    As a reference to the contents of the memory cell to which the pointer
   variable is bound; that content is an address.

    As a reference to the value in the memory cell whose address is stored in
   the pointer variable's own cell; this is known as dereferencing the pointer.

 Dereferencing thus means accessing the value stored at the memory address held
by the pointer variable.

Explicit vs. Implicit Dereferencing:

 Dereferencing of pointers can be either explicit or implicit depending on the
programming language.

 In some languages, like Fortran 95+, dereferencing is implicit, while in others it
occurs only when explicitly specified.

 For example, in C++, dereferencing is explicitly specified using the asterisk (*) as
a prefix unary operator.

Example

In Fortran

program implicit_dereferencing
  implicit none
  integer, target :: arr(3)   ! target attribute is required before a pointer can be associated with arr
  integer, pointer :: ptr

  arr = [1, 2, 3]
  ptr => arr(1)   ! ptr points to the first element of arr
  print *, ptr    ! implicit dereference: prints the value ptr points to
end program implicit_dereferencing

In this Fortran example, ptr is associated with the first element of the array arr. When we print
ptr, Fortran automatically dereferences the pointer and prints the value stored at that memory
location.

Example:

In C++

#include <iostream>

int main() {
    int num = 10;
    int *ptr = &num;  // ptr stores the memory address of num

    std::cout << "Value of num: " << num << std::endl;
    std::cout << "Value of ptr: " << ptr << std::endl;  // prints an address

    // Dereferencing ptr to get the value stored at that memory location
    std::cout << "Dereferenced value of ptr: " << *ptr << std::endl;
    return 0;
}

In this C++ example, ptr is explicitly dereferenced using the asterisk (*) operator to access the
value stored at the memory address it points to.

Pointer Operations with Records:

 When pointers point to records (structures in C/C++ or records/objects in other
languages), syntax for referencing fields in those records varies among languages.

 In C/C++, there are two common ways to reference fields in a record using a
pointer. One involves explicit dereferencing with (*p).age, while the other uses
the arrow operator ->, combining dereferencing and field reference as p->age.

 In Ada, pointers to records are implicitly dereferenced, allowing for a more
straightforward syntax such as p.age.
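
The two C/C++ forms can be compared side by side in a short sketch. The record type Person and the helper functions are hypothetical, introduced only to show the syntax.

```cpp
#include <cassert>
#include <string>

// Hypothetical record type used only for illustration.
struct Person {
    std::string name;
    int age;
};

// Explicit dereference followed by field selection.
int age_explicit(const Person* p) {
    assert(p != nullptr);
    return (*p).age;   // parentheses are needed: . binds tighter than unary *
}

// Arrow operator: dereference and field selection in one step.
int age_arrow(const Person* p) {
    assert(p != nullptr);
    return p->age;
}
```

Both functions read the same field; the arrow form is simply shorthand for the parenthesized one.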

Heap Management and Allocation:

 Languages that support dynamic memory allocation using pointers must include
explicit allocation operations.

 Allocation is often specified through subprograms like malloc() in C or using
operators like new in object-oriented languages like C++.

 Languages like C++ typically require explicit deallocation of heap-allocated
memory using operators like delete to prevent memory leaks.

Consider the following example of dereferencing: If ptr is a pointer variable with the value 7080 and the
cell whose address is 7080 has the value 206, then the assignment

j = *ptr

sets j to 206.

Pointer Problems:

 Definition of Dangling Pointers:


 A dangling pointer, or dangling reference, is a pointer that contains the address of
a heap-dynamic variable that has been deallocated.

 When a pointer is left pointing to a memory location that has been deallocated, it
becomes a dangling pointer because it no longer points to a valid memory
location.

 Risks Associated with Dangling Pointers:

 Dangling pointers are dangerous for several reasons:

    They may point to a memory location that has been reallocated to a new
   heap-dynamic variable. If the new variable has a different type, type checks
   on uses of the dangling pointer are invalid and unrelated data is accessed.

    Even if the new dynamic variable is of the same type, its value has no
   relationship to the old pointer's dereferenced value.

    Using a dangling pointer to modify the heap-dynamic variable destroys the
   value of the new variable and can corrupt data.

    The memory location may currently be in use by the storage management
   system itself, so a write through the dangling pointer can cause
   unexpected behavior or system failures.

 Creation of Dangling Pointers:

 Dangling pointers are often created when a pointer continues to hold the address
of a heap-dynamic variable after it has been deallocated.

 This can happen, for example, when one pointer is assigned the value of another
pointer, and the variable is then deallocated through the original pointer without
updating the second pointer.

Example in C++ where a dangling pointer is created:

1. Two pointers arrayPtr1 and arrayPtr2 are declared.

2. arrayPtr2 is assigned a new heap-dynamic variable.

3. arrayPtr1 is assigned the value of arrayPtr2.

4. The heap-dynamic variable is deallocated through arrayPtr2 using the delete
operator, leaving arrayPtr1 as a dangling pointer.

In C++, both arrayPtr1 and arrayPtr2 are dangling pointers after the
deallocation, because the delete operator does not change the value of the pointer itself.

#include <iostream>

int main() {
    // Step 1: Declare two pointers
    int* arrayPtr1;
    int* arrayPtr2;

    // Step 2: Assign a new heap-dynamic variable to arrayPtr2
    arrayPtr2 = new int[5];  // Allocate memory for an array of 5 integers

    // Step 3: Assign the value of arrayPtr2 to arrayPtr1
    arrayPtr1 = arrayPtr2;

    // Step 4: Deallocate the memory pointed to by arrayPtr2
    delete[] arrayPtr2;

    // At this point, arrayPtr1 is a dangling pointer: it still holds the address
    // of the array, but that memory has been deallocated. arrayPtr2 dangles too.

    // Attempt to use arrayPtr1 (dangling pointer).
    // WARNING: the next line has undefined behavior; it may crash (for example
    // with a segmentation fault) or print unexpected data.
    std::cout << "Accessing using dangling pointer arrayPtr1: " << arrayPtr1[0] << std::endl;
    return 0;
}


 We declare two pointers arrayPtr1 and arrayPtr2.


 arrayPtr2 is assigned memory dynamically allocated on the heap using the new operator
to create an array of 5 integers.
 The value of arrayPtr2 is then assigned to arrayPtr1.
 We deallocate the memory pointed to by arrayPtr2 using the delete[] operator.
 After deallocation, arrayPtr1 becomes a dangling pointer because it still holds the
address that was previously allocated to arrayPtr2, but that memory has been
deallocated.
 Attempting to access or use arrayPtr1 after deallocation would lead to undefined
behavior, as shown in the example by trying to access arrayPtr1[0].

Avoiding Dangling Pointers:

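To prevent dangling pointers, ensure that every pointer either points to a valid
memory location or holds null.

When deallocating memory, it's best practice to set the pointer to null or nil
immediately afterward. Any other pointer that aliases the same block, such as
arrayPtr1 in the example above, must be reset as well, because resetting one
pointer does not affect its copies.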
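
A minimal sketch of the reset-after-delete habit follows, assuming two hypothetical helpers named release and read_first. It protects only the pointer passed in; aliases still have to be handled by the programmer.

```cpp
#include <cassert>

// Delete an array and reset the caller's pointer in one step, so the pointer
// used for the deallocation cannot dangle afterwards.
void release(int*& ptr) {
    delete[] ptr;     // deleting a null pointer is a harmless no-op
    ptr = nullptr;
}

// Guarded access: refuses to dereference a null pointer.
bool read_first(const int* ptr, int& out) {
    if (ptr == nullptr) return false;
    out = ptr[0];
    return true;
}
```

Because release takes the pointer by reference, a second call on the same pointer is safe, and read_first reports failure instead of touching freed memory.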

Lost Heap-Dynamic Variables

Definition of Lost Heap-Dynamic Variables:

 A lost heap-dynamic variable is a dynamically allocated memory block that becomes
inaccessible to the user program.

 These variables are often referred to as "garbage" because they are no longer useful for
their original purpose, and they cannot be reallocated for any new use within the
program.

 In simpler terms, when memory is allocated dynamically on the heap, but the program
loses track of it without deallocating it, it leads to memory leakage.

Creation of Lost Heap-Dynamic Variables:

 Lost heap-dynamic variables are typically created when a pointer initially points to a
newly allocated heap-dynamic variable, but is later reassigned to point to a different
dynamically allocated variable.
 This sequence of operations results in the first dynamically allocated variable becoming
inaccessible or lost because the program no longer has a reference to it.

Memory Leakage:

 Memory leakage occurs when lost heap-dynamic variables accumulate in a program
over time, consuming memory resources without being properly deallocated.

 Memory leakage is a problem regardless of whether the programming language uses
implicit or explicit memory deallocation mechanisms.

Impact and Consequences:

 Lost heap-dynamic variables can lead to inefficient memory usage, which can degrade
the performance of the program.

 In long-running programs or systems, memory leaks can accumulate over time,
eventually leading to the exhaustion of available memory resources and causing the
program to crash or become unstable.

Dealing with Memory Leaks:

 Language designers and developers need to address the issue of memory leaks to
ensure the stability and efficiency of software.

 Various strategies can be employed to mitigate memory leaks, including:

    Implementing robust memory management techniques, such as automatic
   garbage collection.

    Adopting best practices for memory allocation and deallocation, such as
   ensuring that each dynamically allocated memory block is properly released
   when it's no longer needed.

    Using tools and utilities for detecting and debugging memory leaks during the
   development and testing phases of software.

Example:

#include <iostream>

int main() {
    // Dynamically allocate memory for an integer array
    int *arrayPtr = new int[100];

    // Use the array for some operations
    for (int i = 0; i < 100; ++i) {
        arrayPtr[i] = i * 2;
    }

    // Assume some other part of the program reassigns arrayPtr to point to a
    // different memory block
    arrayPtr = new int[200];

    // At this point, the memory block allocated for the initial array is lost

    // Perform some operations with the new array
    // ...

    // Deallocate the second block; the first block (100 ints) is still leaked
    delete[] arrayPtr;

    return 0;
}

Dynamic Memory Allocation: The program dynamically allocates memory for an
integer array of size 100 using the new keyword. The pointer arrayPtr points to the
beginning of this memory block.

Usage of Memory Block: The program uses the allocated memory block to perform
some operations, such as initializing the array elements.

Reassignment of Pointer: At some point, another part of the program reassigns
arrayPtr to point to a different dynamically allocated memory block of size 200.
However, the program does not deallocate the memory block previously pointed to by
arrayPtr.

Lost Heap-Dynamic Variable: After the reassignment, the memory block originally
allocated for the integer array (size 100) becomes inaccessible to the program. Since the
program no longer has a reference to this memory block and hasn't deallocated it, it
becomes a lost heap-dynamic variable.

Memory Leakage: As a result of the lost heap-dynamic variable, memory leakage
occurs. The memory block originally allocated for the integer array (size 100) remains
allocated but cannot be accessed or reused by the program. This consumes memory
resources without being properly deallocated, leading to memory leakage.

Dealing with Memory Leaks: To address memory leaks, it's essential to ensure that
dynamically allocated memory blocks are properly deallocated when they're no longer
needed. In this example, the program should have deallocated the memory block
originally allocated for the integer array (size 100) before reassigning arrayPtr to point
to a new memory block. This can be done using the delete[] operator.
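
One way the fix could look is sketched below. The helper names regrow and make_doubles are hypothetical; the second function shows a common alternative in which std::vector owns the storage, so no explicit delete[] is needed.

```cpp
#include <cassert>
#include <vector>

// Replace a buffer with a new one without losing the old block.
// The old block is released before its address is overwritten.
// Assumes new_size >= 0.
int* regrow(int* old_block, int new_size) {
    delete[] old_block;            // release first, so nothing is lost
    return new int[new_size]();    // () value-initializes every element to 0
}

// Alternative sketch: std::vector releases its storage automatically on
// reassignment and at scope exit. Assumes n >= 0.
std::vector<int> make_doubles(int n) {
    std::vector<int> v(n);
    for (int i = 0; i < n; ++i) v[i] = i * 2;
    return v;
}
```

With regrow, the reassignment in the example would become arrayPtr = regrow(arrayPtr, 200); and the 100-element block would be returned to the heap instead of being leaked.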

Pointers in Ada

Access Types in Ada:

 Ada's pointers are referred to as access types. These types allow programmers to
manipulate memory directly, providing a mechanism for dynamic memory
allocation.

Partial Alleviation of the Dangling-Pointer Problem:

 Ada's design offers a partial solution to the dangling-pointer problem.

 Heap-dynamic variables, allocated using access types, may be implicitly
deallocated at the end of the scope of their pointer type. This means that when the
pointer type goes out of scope, the associated heap-dynamic variables are
automatically deallocated.

 This design choice aims to reduce the need for explicit deallocation, which can
lead to dangling pointers if not properly managed.

Limitations in Implementation:

 However, few, if any, Ada compilers actually implement this form of garbage
collection.

 As a result, the advantage of implicit deallocation is mostly theoretical.
Programmers often still need to manually deallocate memory using explicit
deallocation mechanisms.

Reduced Likelihood of Dangling Pointers:

 Ada's design also reduces the likelihood of dangling pointers by restricting access
to heap-dynamic variables.

 Since heap-dynamic variables can only be accessed by variables of a single type,
when the scope of that type declaration ends, no pointers can be left pointing at
the dynamic variable.

 This helps diminish the problem, as improper explicit deallocation, which is a
major source of dangling pointers, is less likely to occur due to the restricted
access.

Unchecked_Deallocation:

 Despite Ada's efforts to mitigate dangling pointers, it still provides an explicit
deallocator called Unchecked_Deallocation.

 The name of this operation is meant to discourage its use or at least warn the user
of its potential problems.

 However, using Unchecked_Deallocation can indeed cause dangling pointers if
not used correctly.

Lost Heap-Dynamic Variable Problem:

 Despite Ada's design efforts, the issue of lost heap-dynamic variables is not fully
eliminated.

 Lost heap-dynamic variables can still occur when dynamically allocated memory
blocks become inaccessible or unreachable without being deallocated.

Pointers in C and C++

Flexibility and Care:


 Pointers in C and C++ provide flexibility similar to addresses in assembly
languages.

 However, they must be used with great care due to the potential for errors such as
dangling pointers and lost heap-dynamic variables.

Dereferencing and Address-of Operator:

 In C and C++, the asterisk (*) denotes the dereferencing operation, while the
ampersand (&) denotes the operator for producing the address of a variable.

 For example, ptr = &init; assigns the address of init to ptr, and count = *ptr;
dereferences ptr to assign the value of init to count.

Pointer Arithmetic:

 Pointer arithmetic is possible in C and C++, allowing manipulation of memory
addresses.

 For example, ptr + index is a legal expression that calculates a new memory
address based on the index and the size of the data type pointed to by ptr.

Array Manipulation:

 Pointer arithmetic is commonly used for array manipulation.

 In C and C++, arrays use zero as the lower bound of their subscript ranges, and
array names without subscripts refer to the address of the first element.
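
A short sketch may make the relationship between pointer arithmetic and subscripting concrete. The function names sum and same_element are hypothetical.

```cpp
#include <cassert>

// Sum an array by walking a pointer across it.
int sum(const int* first, int count) {
    assert(first != nullptr || count == 0);
    int total = 0;
    for (const int* p = first; p != first + count; ++p) {
        total += *p;   // each ++p advances the address by sizeof(int) bytes
    }
    return total;
}

// list[index] and *(list + index) denote the same element.
bool same_element(const int* list, int index) {
    return &list[index] == list + index && list[index] == *(list + index);
}
```

Passing an array name such as list to either function works because the unsubscripted name is converted to the address of element 0.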

Pointers to Functions:

 Pointers in C and C++ can point to functions, allowing functions to be passed as
parameters to other functions.
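
As a small illustration, the following sketch passes a function to another function through a pointer parameter. The names square, flip_sign, and apply_twice are hypothetical.

```cpp
#include <cassert>

int square(int x) { return x * x; }
int flip_sign(int x) { return -x; }

// The parameter fn is a pointer to a function that takes and returns an int.
int apply_twice(int (*fn)(int), int value) {
    assert(fn != nullptr);
    return fn(fn(value));   // fn(...) is shorthand for (*fn)(...)
}
```

A call such as apply_twice(square, 3) passes the address of square; the function name converts to a pointer automatically.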

Void Pointers:

 C and C++ include pointers of type void *, which can point to values of any type.

 void * pointers are effectively generic pointers and are commonly used in
functions that operate on memory, as they allow passing pointers of any type.

 Type checking is not an issue with void * pointers because these languages
disallow dereferencing them.
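
The sketch below shows one typical use of void * as a generic pointer, under the assumption of two hypothetical helpers, copy_bytes and read_int. It relies on std::memcpy from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Generic byte-wise copy: void* lets the function accept an object of any
// trivially copyable type.
void copy_bytes(void* dest, const void* src, std::size_t n) {
    assert(dest != nullptr && src != nullptr);
    std::memcpy(dest, src, n);
}

// A void* must be converted back to a typed pointer before it is dereferenced.
int read_int(const void* p) {
    assert(p != nullptr);
    return *static_cast<const int*>(p);
}
```

The cast in read_int is where the programmer, not the compiler, takes responsibility for the type of the object pointed to.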

Reference Types
A reference type variable is similar to a pointer, with one important and fundamental difference: A
pointer refers to an address in memory, while a reference refers to an object or a value in memory.

Reference Types vs. Pointers:

 A reference type variable in programming languages like C++ and Java is similar
to a pointer but with a fundamental difference: a pointer refers to an address in
memory, whereas a reference refers to an object or value in memory directly.

 Arithmetic on addresses is natural, so pointers support it; arithmetic on
references is not sensible, so it is not applicable to references.

C++ Reference Types:

 C++ includes a special kind of reference type used primarily for formal
parameters in function definitions.

 C++ reference type variables are constant pointers that are implicitly
dereferenced, meaning they always refer to the object they are initialized with and
cannot be reassigned to refer to another object.

 They are specified by preceding the variable name with an ampersand (&).

In C++

int result = 0;

int &ref_result = result; // ref_result is a reference to result

ref_result = 100; // changing ref_result changes result

C++ Function Parameters:

 C++ reference types enable two-way communication between caller and called
functions.

 They allow passing values by reference, which means changes made to the
parameter inside the function affect the original value outside the function.

 This two-way communication is not possible with non-pointer primitive
parameter types in C++, which are passed by value.
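
The difference between the two parameter-passing modes can be seen in a minimal sketch. The function names swap_by_value and swap_by_reference are hypothetical.

```cpp
#include <cassert>

// Pass by value: the function works on copies, so the caller sees no change.
void swap_by_value(int a, int b) {
    int tmp = a;
    a = b;
    b = tmp;
}

// Pass by reference: a and b are aliases for the caller's variables.
void swap_by_reference(int& a, int& b) {
    int tmp = a;
    a = b;
    b = tmp;
}
```

The bodies are identical; only the ampersands in the parameter list decide whether the caller's variables are affected.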

Java Reference Types:

 In Java, reference variables can be assigned to refer to different class instances,
unlike C++ reference variables, which are constant.

 Java class instances are always accessed through reference variables; there is no
other way to refer to them.

In Java

String str1; // str1 is a reference to a String object

str1 = "Java literal string"; // assigning a string literal to str1

Java Memory Management:

 In Java, objects are implicitly deallocated by the garbage collector when they are
no longer referenced, eliminating the possibility of dangling references.

C# and Pointers:

 C# includes both references and pointers, but the use of pointers is discouraged,
and any subprogram using pointers must be marked as "unsafe".

 Objects pointed to by references are implicitly deallocated, but this is not true for
objects pointed to by pointers.

Object-Oriented Languages:

 In languages like Smalltalk, Python, Ruby, and Lua, all variables are references,
and they are always implicitly dereferenced.

 Direct access to the values of these variables is not possible; they can only be
accessed indirectly through references.

In Python

x = 10 # x is a reference to the integer value 10

Implementation of Pointer and Reference Types

Representations of Pointers and References:

 Pointers and references are typically single values stored in memory cells, although early
microcomputers had two-part addresses (segment and offset).

 In systems with two-part addresses, pointers and references were represented as pairs of
16-bit cells.

Solutions to the Dangling-Pointer Problem:


 Tombstones:
 Introduced by Lomet in 1975, tombstones are special cells associated with heap-
dynamic variables. They act as pointers to the variables and are set to nil when the
variable is deallocated, preventing pointers from referencing deallocated
variables.

 However, tombstones are costly in both time and space: every access through a
tombstone requires an extra level of indirection, and because tombstones are never
deallocated, their storage is never reclaimed.
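
The mechanism can be imitated in a few lines of C++. This is only a toy model under stated assumptions: the names Tombstone, SafePtr, allocate, deallocate, and deref are hypothetical, and a real implementation would live inside the language runtime rather than in user code.

```cpp
#include <cassert>

// A tombstone is an extra cell placed between pointers and the heap variable.
struct Tombstone {
    int* cell;   // address of the heap-dynamic variable, or nullptr once freed
};

// In this sketch a "pointer" is the address of a tombstone, never of the
// variable itself.
using SafePtr = Tombstone*;

SafePtr allocate(int value) {
    return new Tombstone{ new int(value) };
}

// Deallocate the variable but keep the tombstone, now set to nil.
void deallocate(SafePtr p) {
    assert(p != nullptr);
    delete p->cell;
    p->cell = nullptr;   // every alias sees this through the shared tombstone
}

// Checked dereference: reports failure instead of touching freed memory.
bool deref(SafePtr p, int& out) {
    if (p == nullptr || p->cell == nullptr) return false;
    out = *p->cell;      // two levels of indirection per access
    return true;
}
```

Both costs are visible here: deref follows two pointers instead of one, and the Tombstone object itself is deliberately never deleted.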

 Locks-and-Keys Approach:
 Used in UW-Pascal, this approach represents pointer values as ordered pairs (key,
address). Heap-dynamic variables are associated with a lock value, and every
access checks if the key value matches the lock value.

 Deallocation clears the lock to an illegal value, so any later access through a
pointer that still holds the old key fails the comparison and is detected.
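
A toy model of the locks-and-keys check follows. All names (LockedCell, KeyedPtr, ILLEGAL_LOCK, next_lock, allocate, deallocate, deref) are assumptions for illustration, and the sketch keeps the cell's storage alive after deallocation so that the lock can still be inspected, which a real runtime would handle differently.

```cpp
#include <cassert>

// Heap cell: a lock value stored next to the data.
struct LockedCell {
    unsigned lock;
    int value;
};

// Pointer value: an ordered pair (key, address).
struct KeyedPtr {
    unsigned key;
    LockedCell* address;
};

const unsigned ILLEGAL_LOCK = 0;   // assumed sentinel, never issued as a key
unsigned next_lock = 1;

KeyedPtr allocate(int value) {
    unsigned lock = next_lock++;
    return KeyedPtr{ lock, new LockedCell{ lock, value } };
}

// Clear the lock. Simplification: the storage is not returned to the heap.
void deallocate(KeyedPtr p) {
    assert(p.address != nullptr);
    p.address->lock = ILLEGAL_LOCK;
}

// Every access compares the pointer's key with the cell's lock.
bool deref(KeyedPtr p, int& out) {
    if (p.address == nullptr || p.address->lock != p.key) return false;
    out = p.address->value;
    return true;
}
```

Copies of a pointer carry the same key, so after deallocation every copy fails the comparison, not just the one used to free the cell.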

 Implicit Deallocation:
 The best solution is implicit deallocation by the runtime system, removing the
responsibility from programmers. Systems like LISP, Java, and C# adopt this
approach for reference variables.

Heap Management:

 Heap management involves allocating and deallocating memory on the heap, a complex
runtime process.

 Two situations are discussed: allocation and deallocation in fixed-size units and in
variable-size segments.

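 Mark-Sweep Process:
 A garbage collection algorithm in three phases: every heap cell is first flagged
as garbage; the collector then follows every pointer in the program and marks each
reachable cell as active; finally, cells still flagged as garbage are returned to
the list of available space.

 In its original form, mark-sweep ran infrequently, only when free space was
nearly exhausted, which is when it is most time-consuming. Incremental and partial
mark-sweep processes mitigate these issues.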
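
The three phases can be sketched on a toy heap of fixed-size cells. The layout and names (Cell, NIL, mark, collect) are assumptions for illustration; links are indices into a vector rather than real addresses.

```cpp
#include <cassert>
#include <vector>

const int NIL = -1;

// Fixed-size cell in a toy heap; links are indices into the heap vector.
struct Cell {
    int left = NIL;
    int right = NIL;
    bool marked = false;   // false means "presumed garbage"
    bool in_use = false;
};

// Mark phase: follow links from one root and flag every reachable cell.
void mark(std::vector<Cell>& heap, int i) {
    if (i == NIL || heap[i].marked) return;   // the marked test also stops cycles
    heap[i].marked = true;
    mark(heap, heap[i].left);
    mark(heap, heap[i].right);
}

// Full collection: reset all marks, mark from the roots, sweep the rest.
// Returns the number of cells reclaimed.
int collect(std::vector<Cell>& heap, const std::vector<int>& roots) {
    for (Cell& c : heap) c.marked = false;
    for (int r : roots) mark(heap, r);
    int reclaimed = 0;
    for (Cell& c : heap) {
        if (c.in_use && !c.marked) {
            c = Cell{};    // return the cell to the free state
            ++reclaimed;
        }
    }
    return reclaimed;
}
```

Because reachability is computed from the roots, two unreachable cells that point at each other are still reclaimed.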

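 Reference Counter Approach:

 Tracks the number of references to each heap cell. A cell is deallocated when its
reference count reaches zero.

 Reclamation is incremental, which reduces delays, but the counters cost space,
every pointer assignment costs time to update them, and circularly linked cells
never reach a count of zero without extra handling.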
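
C++ offers a library-level form of reference counting in std::shared_ptr, which can serve as a small demonstration. The two helper functions below are hypothetical; only std::shared_ptr, std::weak_ptr, std::make_shared, use_count, reset, and expired are standard library facilities.

```cpp
#include <cassert>
#include <memory>

// Returns the reference count observed while a second owner exists.
long count_with_alias(const std::shared_ptr<int>& p) {
    std::shared_ptr<int> alias = p;   // copying increments the counter
    return p.use_count();
}                                     // alias is destroyed here: counter decremented

// True if the cell was reclaimed as soon as its last owner let go.
bool reclaimed_at_zero() {
    std::shared_ptr<int> owner = std::make_shared<int>(206);
    std::weak_ptr<int> observer = owner;   // observes without owning
    owner.reset();                         // the count reaches zero here
    return observer.expired();
}
```

Reclamation happens at the moment the count drops to zero, which is the incremental behavior described above.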

 Variable-Size Cells:
 Managing heaps with variable-size cells presents additional challenges such as
setting indicators, complex marking processes, and list maintenance overhead.
