You are on page 1of 18

Strings

Strings are a feature of every programming language. A string can be defined as an ordered
sequence of characters or symbols encased in matching single or double quotation marks. Most
languages will support primary ASCII characters as well as Unicode representations. A character
will occupy one byte of memory. However, there are many additional ways of representing
strings.

In this reading, you will learn about the string data structure, how strings are represented across
different languages, what they are commonly used for and why programming languages make
strings immutable.

String representation
There are critical differences in how each language represents and supports strings. All
languages support the basic operations of creating, modifying, copying, and assigning strings to
variables. Additionally, everyday actions include concatenating, appending strings, finding the
substring and dealing with collections of strings.

When processing strings, many languages will allow you to perform algebraic actions on them.
For instance, String_A == String_B, which will return a true or a false or String_A < String_B which
determines which is alphabetically first. Some languages allow strings to represent variables in a
string but require a special symbol. Often this is a dollar sign $ before the variable name but
might also, in some instances, include being encased in curly brackets {}. Escape flags allow the
inclusion of symbols in a string. Imagine that you want to have quotations within a string.

String = “the man said \”two more pints please\” to the barman”

Here the use of the backslash \ ensured that the string would retain the quotation marks in the
sentence. Other escape symbols could be #, %,' or a double quote inside a string denoted by
single quotes. While the actual implementation may differ, the general approach to dealing with
strings is the same.

Common usage
One area that deals heavily with strings is Natural Language Processing (NLP). How strings are
encoded when reading them from various locations such as Twitter, PDF files, text documents,
or Reddit can raise unexpected issues. These can create complex debugging problems if the
application processes an unusual symbol representation that affects the results.

When writing strings into a program, it is common to apply tokenization. This is converting a
string into an array of smaller strings. Here you would identify a delimiter and break the string into
segments separated by the said delimiter. So, a paragraph might be broken into sentences
through the use of the period(.) or a sentence into individual words by way of segmenting by
space (" "). However, care should be taken when tokenizing text. Consider the text below that
has three period symbols (.).

At 3.30 I went to the shop and spent $40.40 dollars.

Conventionally, you'd read it as a single sentence, but the periods used to indicate the time and
amount of money are delimiters so the sentence is broken up into three parts.
Tokenization on processed, formatted data offers less of a challenge. A common way to store
formatted string representations is through Comma Separated Values (CSV) or Tab Separated
Values (TSV).

Processing such strings is known as parsing, and the returned array will equal the size of the
number of delimiters found, with the delimiter removed. Here is an example of a CSV file that
holds the columns' age, name, and hair color. Notice how there is no comma at the end. Unless
otherwise stated, the new line symbol also functions as a delimiter.
Immutability
An important concept to consider when dealing with strings is whether the string is mutable or
immutable. Mutability refers to your ability to change a string after it has been created. Some
languages like Ruby and PHP allow strings to be changed after creation. It is more common,
however, to use immutability, such as in Java, C#, JavaScript, Python, and Go.

There are several reasons why programming languages make strings immutable. For one, it can
reduce memory consumption. Instead of creating a variable that contains a string, a string pool is
designed to represent all strings used. The immutable approach reuses memory allocation by
having all instances of string point to one location. So, when a change happens with a string, say
it becomes strings, instead of changing the value of string, the variable is pointed to another
instance of strings. Or it's pointed to another immutable example representing strings, which is
then added to the string pool.

This is a vast memory-saving device. Consider reading existing words instead of creating a
memory location for each word, including repeating ones. Using a unique set reduces the
required space. The drawback is that if your application constantly changes texts, a memory
penalty is incurred for every alteration made

Integers
An integer is used to hold numeric values. An integer can either be signed (holds both positive
and negative numbers), or unsigned (will only hold positive numbers). Integers are represented
in binary and an essential representation will need 4 bytes.

In this reading, you will learn about the integer data structure, such as how integers are
represented, memory allocation for integers and the integer wrapper class in Java.

Integer representation
There are several ways to represent integers in binary. One typical example is sign-magnitude. If
integers are represented in binary, how can they distinguish between a positive and negative
value? Sign-magnitude proposes using an indicator on the far left of the binary number to denote
polarity.

Integer Sign-magnitude representation


2 0010
1 0001
+0 0000
-0 1000
-1 1001
-2 1010
An integer cannot represent fractions. For this, one would use decimal or float. A fixed number of
bytes is used when representing integers, the size of which can be specified in some languages.
There is a standardized approach for representing numbers called the IEEE 754 standard, which
outlines a common set of standards for representing all numbers.

Some high-level languages like Python and JavaScript subscribe to this approach and
encapsulate the initialization of integers to this fixed representation. This makes working with
numbers easy in these high-level dynamic languages but removes the ability to customize your
approach to enable memory optimization.
Memory allocation
Other statically typed languages like C++, Rust, Ada and C allow you to customize the size in
memory. C++ allows you to customize your integer. Using unsigned short int will only take 2 bytes.
This savings can mount up if your application includes many positive numbers that do not require
precision to the degree of fractions.

Rust goes one better and enables the instantiation of unsigned 1-byte integers. This limited int
can hold integers from 0 – 255. While that may not seem like a massive range to deal with, if you
are working with pixels, you would be working with a tuple of values in this range. Image
processing is a highly CPU-intensive process and being able to store 8 pixels for the standard
price of 1 is a massive saving.

Wrapper classes
In addition to primitive integers denoted by int, Java allows you to wrap the value of the primitive
into a wrapper class Integer. This allows several methods for dealing with integers, such as
converting from a string to a double, comparisons, maximum and minimum size and so on. The
integer class is immutable, which makes it thread-safe. The extra functionality and safety incur a
memory cost, and Integer object will take 16 bytes of memory to store.

Conclusion
There is enormous freedom in controlling how your figures are stored in memory. While a high-
level dynamically typed language gets up and running, it may have limited returns if your
application becomes resource heavy. Suppose you are working with Arduinos or other hobby
electronics. In that case, space becomes a premium, and the freedom to customize may be the
difference between a functional and non-functional application. However, if you are developing
an application that requires substantial amounts of available memory, using integers quickly and
not having to worry about the detail may be the way to go

Booleans
When testing your knowledge in this course, you are often presented with two options, multiple
choice and true or false. The latter is an example of a Boolean; a thing is or is not. In this
reading, you will learn more about Boolean data structures and the critical features for working
with them, conditional statements and logical operators.

Conditional statements
Boolean expressions can take a range of relational operators.

> Greater than


< Less than
>= Greater than or equal to
<= Less than or equal to
== Equal to
!= Not equal to
These enable you to interrogate some data before running different code options. Therefore,
Boolean expressions are often referred to as conditionals. Combine these with conditional
statements, and you have an early implementation of AI.
Conditional statements are: if, else, else if, while and so on. A conditional block can enable you to
execute one set of instructions in one condition and another if the conditions are different.
Consider the following diagram.
That same diagram is represented in this Python code:

1
2
3
4
5
6
if right > wrong:
doTheRightThing()
elif wrong > right:
doTheWrongThing()
else:
keepResearching()

Using Boolean expressions as indicators, you can then use relational operators to determine
which line of code is to be executed. In this example, you have a computer evaluating between
right and wrong. This has multiple applications and is commonly used. Finally, if you have not
met any of your conditions, you might employ a catch-all code block. Roombas are a prime
example of how this code can be used in everyday life. In place of right and wrong, you will have
rudimentary sensors that determine which motors are to be activated based on distance and
impact.

Logical operators
Additionally, you might want to expand the scope of your application by using logical operators.

Logical operators
|| Logical OR
&& Logical AND
! Logical NOT
These logic expressions can be combined with Boolean expressions to give your code greater
diversity.

1
2
3
4
5
6
7
8
if condition_1 || condition_2:
doActionOne()
elif condition_1 && condition_2:
doActionTwo()
elif !condition_1:
doActionFour()
else:
waitForInstruction()

In this code, four outcomes are checked. First, take condition_1 or condition_2, and let's stick with
the Roomba example. If proximity detection or impact detection is true, then action one might be
stop and reverse. If there's proximity and impact detection, then it'll trigger an alarm. If there is no
proximity detection, it will continue to send power to the forward motors. Finally, if no conditions
are met, there should be some fail-safe code ready to activate.

The use of Boolean logic is the backbone of circuit design. It is a way of interpreting Boolean
operators together. The shapes in the diagram below are gates triggered when certain Boolean
operators are fired. A signal is sent through the wires (depicted by the incoming and outgoing
lines). The AND operator says if both wires are true, then continue this line of execution. At the
same time, the NAND specifies that only two false assertions will trigger a reaction.
Combined, they can be used to make complex decisions and cause a device to operate in
diverse ways, depending on the input from the sensors.

1.
Question 1

What does it mean to say a Data Structure is a first-class object?

It is very quick at retrieving and storing data.

They are not memory intensive.

This means that a data structure can be passed to a function, returned as a result and generally
treated like any other variable.

2.
Question 2

What does it mean to parse a string?

To pass it to the compiler to execute instructions.

To remove symbols and uppercases from a string of text.

To remove items from a string not based on a given format.

Question 3
How many bytes does it normally take to represent a standard int?

4
16

4.
Question 4

A Boolean answer is one that will be either true or false?

True

False

5.
Question 5

Is it possible to copy an array?

Yes, but only through making a deep-copy.

Yes, but only through making a shallow-copy.

No.
Conclusion
To conclude, Boolean expressions are either true or false, and their equivalence in binary would
be 0 or 1. Just like with binary, you might think that only having two values results in limited
capability. But, they can achieve a surprisingly elevated level of complexity when combined with
other mathematical constructs like conditional statements and logical operators. Boolean can be
used in your computer programs to inform which operations should run automatedly and form the
backbone of circuitry diagrams.

Arrays
This reading demonstrates the array data structure, including how to initialize, manage and join
arrays. All programming languages have some form of an array, but how they operate might
have slight differences. For instance, some languages specify that an array must hold the same
element type, like a string array or int array.

In contrast, other languages allow mixed elements to be stored in an array. The functionality of
arrays might also differ. Sometimes, an array will be a fully-fledged object with complex
functionality. At the same time, it can be treated as a storage type for primitives; operations and
functionality would then be externally applied.

Initializing arrays
Arrays can be created statically or dynamically. In a static language, the array would be kept on
the stack and require that the array type be specified a priori. Dynamic languages offer more
fluidity, sometimes calling for the size to be set and do not require the type to be specified before
use. Such an instance would be stored on a heap. Arrays have indexes, which is a contiguous
value starting at 0 and increasing until the end of the array. When an item is required, the array's
name followed by square brackets and the index location are given.

array_name[0]

This example will retrieve the first item in the array. Any number could be used and the element
at that index location will be returned. Specifying a number greater than the size of the array
would throw an out-of-bounds error. A typical action you will make with an array is to iterate over
the array and investigate the elements.

1
2
3
4
n <- size of array arr
FOR (i <- 0;i<(n-1); i <-(i+1)) DO
process element arr[i]
END
The general shape of a for loop is the same in all approaches: you need the size of the array, an
integer i that increases at every iteration and a general for loop for syntax. The appearance may
vary slightly, but the underlying mechanism is the same. Starting at 0, go through each item in
the array and do something. An array will always have some way of accessing the size. Some
examples are .size(), len(array) and .length().

Managing arrays in memory


One fundamental difference regarding arrays in different programming languages is where they
are stored. An item can be stored in heap or stack memory. Stack memory is created when
running a function. It is created for that function and discarded after the execution. Items in this
memory allocation are only available for that function. In contrast, heap memory is created during
the execution of instructions and is available to all.

Care should be taken when altering elements in a static environment. On instantiation of an


array, memory space equal to the specified initialization size is created on the call stack. A call
stack is created to execute the goal when a function is called. Altering an array and returning it
from the stack may lead to corrupted memory as the stack is discarded after the function
completes.

Dynamic languages avoid this issue by storing the array in a heap. Thus, the array remains
unaffected when the function ends, and the stack is discarded. Stacks, by their nature, hold
contiguous memory blocks, making accessing the information more manageable. A heap is less
organized and it can take more time to access the elements. Once again, you should consider
the trade-off between size and convenience, or accessibility versus speed.

Ordinarily, when a program is finished with an array, the memory is deallocated and becomes
available for something else. How this deallocation and garbage collection is handled is down to
the programming language selected and can only improve performance if done effectively. Poor
memory management can lead to leaks, which may crash your application upon repeated calls.

Two memory-related concepts you may encounter are shallow copy and deep copy. The first
instance does not make a copy of the array but returns an index location. A deep copy will create
a new instance of an array. Making a shallow copy optimizes memory usage; however, you must
ensure that no unexpected changes are inadvertently made to an array shared by two variables.

Joining arrays
A matrix is a two-dimensional array (or an array composed of arrays) that can act like a table. It
can be used to represent rows and columns. It gives an element a 2D (x,y) coordinate.
Here x would refer to the rows and y to the columns. As before, one would use square brackets
to access an element in a two-dimensional array. However, two index locations are required for a
matrix, as below.

Matrix[x][y]

A matrix will exhibit a square shape with the same number of columns and rows. This need not
necessarily be the case. But if you choose not to maintain uniformity, ensure that your loop acts
accordingly.

1
2
3
4
5
6
7
int[][] matrix = new int[5][7]
for (int i = 0; i < matrix.length; i++) {
for(int j = 0; j < matrix[i].length; j++)
{
doSomethingNeo(matrix[i][j])
}
}

In this example, a 2D array called a matrix is initialized to hold integers. The outer collection can
contain five elements (integer arrays). The inner arrays all have a capacity of 7. The first array in
the outer array is selected with the i, the size is determined (7), and each element of that inner
array is then passed to the method. Notice how the size of the inner array was accessed
dynamically using the .length attribute. This ensures that there can be no out-of-bounds errors
during the iteration.

Conclusion
Arrays are a fundamental data structure that exists in most programming languages. They help
store related data. It is worth noting that their implementation varies depending on the language,
so care should be taken when using them. This reading taught you about the array data
structure, including how to initialize, manage and join arrays.

Objects
Introduction
In this reading, objects will be discussed. Objects are the building blocks of all code and play a
particularly important role in object-orientated programming (OOP). In this reading, the benefits
of using objects will be outlined, as well as some important terminology relating to the use of
objects.

Definition
An object is a programming concept that means that a structure has both state and behavior.
Here, behavior relates to the object's ability to perform some action. As you progress through this
course, you will witness many instances of objects exhibiting different behaviors. Concretely it
relates to calling the methods of an object. An example of this might be calling the sort method of
an array. This has the result of re-organizing the items of an array so that they are organized in
relation to one another.

State pertains to the information about an object. Another word for this that you may be familiar
with is object attributes. If you create a person class and you instantiate it to an object, an
example of a state might be the person's age or name. The behavior then might relate to an
action that is required of this person-object, like run or tackle.

Classes are commonly described as blueprints for an object. By extension, an object can be
described as an instance of a class. The most common use of objects is in OOP, where code is
encapsulated into objects, and these objects then interact with one another.

Example
One of the considerable strengths of using objects to instantiate a class is that you only need to
create one template for how an object will act. Then you are free to create multiple instances of
this class that can interact with one another. Consider a football team with 11 players. Only one
class needs to be outlined that can hold varying speed, agility and work rate characteristics. This
can seriously reduce the amount of code overhead in creating a football game. Characteristics
like speed and agility are known as instance variables and can be said to relate to the state of
the object. No unit outside of the object will share this instance. Two players can have a fitness
instance. Changing this in one will not affect the fitness instance found within the other objects.
You can change the instances of a class through the constructor or an internal method of the
object. In Java, a typical instantiation of a class is as follows:

Player player1 = new Player(agility = 54, speed = 88, fitness = 90);

Player player2 = new Player(agility = 90, speed = 64, fitness = 83);

In this example, the player class creates two objects, player1 and player2. The variables within
the class are set differently depending on the values found within the soft brackets. There also
needs to be a method to change the variable. Over the course of a match, a player may become
tired. A method can be called to reduce the variable instance to reflect this.

player1.set_fitness(80)

This will alter the fitness level of player1 but will not affect the other players. The methods used to
change attribute instances are generally called getters and setters.

Further control of the objects is available through the other methods found with the object. A
command such as

player1.kick()

would cause the object assigned to player1 to enact the procedure of kicking the ball.

The player class allows for the creation of a team with varying abilities that can play. Another
concept related to the use of objects is inheritance. Imagine there is a need to create another
human on the pitch, such as a referee, that does not play but still performs actions relating to the
game, such as running and general appearance. You may want to reuse some of the code found
in the player without creating a player. Here you would create a class that holds all common
attributes you wish to retain, and each object can inherit them, as shown.
Thus, the standard methods such as run, get-tired, and follow the ball may all be found within
Humans, but how these relate to the class player and class referee can be differentiated. This
notion of having one general concept(human) that can manifest in different forms (players,
managers, referees) is known as polymorphism. One shape, many forms. The classic example is
the use of shapes. One overarching shape will contain the area, diameter, and height concepts.
Then each instance of shape (square, triangle, circle) will apply the actual implementations of
how these attributes look in their given state.

Conclusion
In this reading, the concept of objects was explored, with a particular focus on its usefulness
when used in an object-orientated programming approach.

You might also like