Data Types

UNIT-II
Names, Bindings, and Scopes: Introduction, Names, Variables, Concept
of Binding, Scope, Scope and Lifetime, Referencing Environments,
Named Constants.
Data Types: Introduction, Primitive Data Types, Character String Types,
User Defined Ordinal Types, Array, Associative Arrays, Record, Union,
Tuple Types, List Types, Pointer and Reference Types, Type Checking,
Strong Typing, Type Equivalence.
Expressions and Statements: Arithmetic Expressions, Overloaded Operators, Type
Conversions, Relational and Boolean Expressions, Short Circuit Evaluation,
Assignment Statements, Mixed-Mode Assignment.
Control Structures: Introduction, Selection Statements, Iterative Statements,
Unconditional Branching, Guarded Commands.
Data Types:
A data type defines a collection of data values and a set of predefined operations on
those values. Computer programs produce results by manipulating data.
A data type in programming is like a container that holds certain kinds of information
and allows us to perform specific actions on that information.
 Matching Real-World Objects: When we write computer programs, we often
need to work with data that reflects real-world objects, like numbers, names, or
records of information. So, it's crucial for programming languages to have data
types that match these real-world objects.
 Evolution of Data Typing: Over the years, the way we handle data in
programming languages has changed. Early languages had limited options for data
types, but newer languages have become more flexible, allowing us to define
custom types that better fit our needs.
 User-Defined Types: We can create our own custom data types, which can make
our code easier to understand and modify. For example, instead of just using basic
types like "number" or "text," we could create a type like "Student" or
"Employee" with specific attributes.
 Abstract Data Types (ADTs): These are types whose interface (the operations
available to the user) is kept separate from the representation and implementation
of those operations. This separation lets us focus on what we can do with the data
type without worrying about how it is built internally.
 Structured Data Types: These are types that organize data in a structured way,
like arrays (lists of items) or records (collections of related information).
 Descriptors and Variables: Descriptors are like labels that describe the attributes
of variables, such as their type or size. They help with managing variables in
memory and ensuring that our code works correctly.
 Objects: In some languages, we use the term "object" to refer to instances of user-
defined types. For example, if we define a type called "Car," each individual car
in our program would be an "object" of that type.
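As a minimal sketch of the user-defined type and object ideas above, the Python fragment below defines a hypothetical Student type and creates one object of it. The field names and values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


# A user-defined type: groups related attributes under one name.
@dataclass
class Student:
    name: str
    roll_number: int
    gpa: float


# Each instance of the type is an object.
s = Student(name="Asha", roll_number=42, gpa=8.7)
print(s.name, s.roll_number)
```

Code that works with a Student reads more clearly than code that passes a loose name, number, and grade around separately.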
Primitive data types
 Data types that are not defined in terms of other types are called primitive data
types.
 Nearly all programming languages provide a set of primitive data types.
 To provide the structured types, the primitive data types of a language are used,
along with one or more type constructors.
Numeric Types
Integer
Integer Data Type: Integers are one of the most common primitive data types
used in programming. They represent whole numbers without any fractional or
decimal parts.
Sizes of Integers: Many modern programming languages support different sizes
of integers. For example, Java offers four sizes: byte, short, int, and long.
Signed and Unsigned Integers: Signed integers can represent both positive and
negative numbers, whereas unsigned integers represent only non-negative
numbers. Languages like C++ and C# offer support for unsigned integer types,
which are useful for handling binary data.
 Representation in Computers: Internally, computers represent integer
values as strings of binary digits (bits). One bit, typically the leftmost one,
is used to represent the sign of the number. Positive numbers are
represented directly, while negative numbers are represented using techniques
like twos complement or ones complement notation.
 Twos Complement Notation: Most modern computers use twos complement
notation to represent negative integers. In twos complement, the negative of a number
is obtained by taking the logical complement (flipping all the bits) of its positive
representation and adding one to it.
 Ones Complement Notation: Some older computers use ones complement
notation, where the negative of a number is obtained by taking the logical
complement of its absolute value. Its drawback is that zero has two
representations (all bits 0 and all bits 1).
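The two notations above can be sketched in a few lines of Python. The function names and the 8-bit default width are assumptions made for this illustration; real hardware works on fixed-width registers rather than Python's unbounded integers, which is why a mask is applied.

```python
def ones_complement(value: int, bits: int = 8) -> str:
    """Bit pattern of -value in ones complement: flip every bit of value."""
    mask = (1 << bits) - 1
    return format(~value & mask, f"0{bits}b")


def twos_complement(value: int, bits: int = 8) -> str:
    """Bit pattern of -value in twos complement: flip every bit, then add one."""
    mask = (1 << bits) - 1
    return format((~value + 1) & mask, f"0{bits}b")


print(format(5, "08b"))      # 00000101
print(ones_complement(5))    # 11111010
print(twos_complement(5))    # 11111011
```

Applying ones_complement to 0 gives 11111111, the second representation of zero, whereas twos_complement of 0 is again 00000000.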
Floating-Point
 Floating-Point Data Types: Floating-point data types are used to represent
real numbers in programming.
 Representation in Computers: Floating-point numbers are typically stored in
binary format on most computers.
 However, representing numbers like 0.1 in binary can lead to inaccuracies
because some numbers that are easy to represent in decimal notation can't be
represented exactly in binary form.
 Loss of Accuracy: Another issue with floating-point types is the loss of
accuracy that can occur through arithmetic operations. Performing operations
like addition, subtraction, multiplication, and division on floating-point
numbers can introduce small errors due to the limited precision of the
representation.
 IEEE Floating-Point Standard: Most modern computers use the IEEE
Floating-Point Standard 754 format to represent floating-point numbers. This
standard defines the formats for single-precision (usually called float) and
double-precision (usually called double) floating-point numbers.
 Float variables are typically stored in four bytes of memory, while double
variables occupy twice as much storage.
 Precision and Range: The collection of values that can be represented by a
floating-point type is defined in terms of precision and range.
 Precision refers to the accuracy of the fractional part of a value, measured in
terms of the number of bits used.
 Range encompasses both the range of fractions and the range of exponents,
determining the maximum and minimum values that can be represented.
Table – 1 Precision Representation
Precision          Base   Sign (bits)   Exponent (bits)   Significand (bits)
Single precision   2      1             8                 23+1
Double precision   2      1             11                52+1
(The +1 is the implicit leading bit of the significand, which is not stored.)
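The inexactness of 0.1 and the single-precision field widths listed in Table 1 can be checked with a short Python sketch. The helper single_fields is a hypothetical name introduced here; it relies only on the standard struct module to view a value as its 32-bit pattern.

```python
import struct

# 0.1 has no exact binary representation, so rounding errors appear.
print(0.1 + 0.2 == 0.3)   # False
print(0.1 + 0.2)          # 0.30000000000000004

# Storage sizes of the IEEE 754 single and double formats, in bytes.
print(struct.calcsize("<f"), struct.calcsize("<d"))  # 4 8


def single_fields(x: float):
    """Split a value stored in single precision into sign, exponent, fraction."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    sign = bits >> 31                # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits (biased by 127)
    fraction = bits & 0x7FFFFF       # 23 stored fraction bits
    return sign, exponent, fraction


print(single_fields(-0.75))  # (1, 126, 4194304)
```

Here -0.75 is -1.1 (binary) times 2 to the power -1, so the stored exponent is 127 - 1 = 126 and only the fraction bits after the implicit leading 1 are kept.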
Complex
 Some programming languages support a complex data type—for example,
Fortran and Python.
 Complex values are represented as ordered pairs of floating-point values.
 In Python, the imaginary part of a complex literal is specified by following it
with a j or J, for example, (7 + 3j).
 Languages that support a complex type include operations for arithmetic
on complex values.
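A brief Python sketch of complex literals and arithmetic follows. The second value w is an arbitrary choice made for the illustration.

```python
z = 7 + 3j
w = 1 - 2j

# A complex value is an ordered pair of floating-point parts.
print(z.real, z.imag)  # 7.0 3.0

# Arithmetic is built in for the complex type.
print(z + w)           # (8+1j)
print(z * w)           # (13-11j)
```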
Decimal data types

 Decimal data types are important for business tasks because they
accurately handle decimal values like those used in financial calculations.
 Unlike floating-point numbers, decimal types keep decimal points fixed and
maintain precision, ensuring that values like 0.1 are exact.
 Popular programming languages like COBOL, C#, and F# support decimal
data types because they are important for business applications.
 However, they have limitations, such as a smaller range compared to
floating-point types since they can't handle exponents.
 Decimal types are stored in memory using binary coded decimal (BCD)
representations, which use binary codes for each decimal digit.
 This storage method is less efficient than binary representation, meaning
it takes more memory to store decimal numbers compared to binary ones.
 Despite the memory inefficiency, decimal types have hardware support for
quick operations on machines used for business tasks.
 Overall, despite their drawbacks, decimal types are essential for accurately
managing decimal values in business applications.
 In the BCD numbering system, the given decimal number is segregated into
chunks of four bits for each decimal digit within the number.
 Each decimal digit is converted into its direct binary form (usually represented
in 4-bits).
1. Convert (123)10 to BCD
Writing each decimal digit as its 4-bit binary code:
1 -> 0001
2 -> 0010
3 -> 0011
Thus, the BCD representation is 0001 0010 0011.
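The digit-by-digit conversion above can be sketched as a small Python function. The name to_bcd is hypothetical, and the sketch handles only non-negative integers; it shows the encoding itself, not how any particular language stores its decimal type.

```python
def to_bcd(number: int) -> str:
    """Encode a non-negative integer as BCD, one 4-bit group per decimal digit."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return " ".join(format(int(digit), "04b") for digit in str(number))


print(to_bcd(123))           # 0001 0010 0011
print((123).bit_length())    # 7
```

The second line illustrates the storage cost mentioned earlier: 123 needs 12 bits in BCD but only 7 bits in pure binary.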

Character types
 Character types in programming languages are used to represent
individual characters, such as letters, digits, and symbols.
 Traditionally, character data was stored using 8-bit codes like ASCII or ISO
8859-1, allowing for the representation of a limited set of characters.
However, with the need for globalization and communication between
computers worldwide, more comprehensive character sets like Unicode have
become essential.
 Unicode, introduced in 1991, provides a standardized way to represent
characters from various languages and scripts.
 Initially, Unicode used a 16-bit character set called UCS-2. It was later
expanded to cover more characters with the 32-bit UCS-4 (UTF-32).
 UTF-16 (Unicode Transformation Format, 16-bit), sometimes loosely called
Unicode-16, is a character encoding that represents each Unicode character with
one or two 16-bit code units: characters in the original 16-bit range take one
unit, and all other characters take a pair of units. It is one of the encoding
schemes defined by the Unicode Consortium.
 Many programming languages, including Java, JavaScript, Python, Perl, C#,
and F#, have adopted Unicode for character representation.
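Example: In Python (which has no separate character type; a character is a string of length 1)
# Define a one-character string
char = 'A'
# ord() returns the Unicode code point of the character
print(ord(char))  # 65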
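To make the UTF-16 description concrete, the Python sketch below encodes three sample characters and counts their 16-bit code units. The choice of characters is an assumption made for illustration, and the helper name utf16_units is hypothetical.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    encoded = ch.encode("utf-16-le")   # little-endian, no byte-order mark
    return len(encoded) // 2


for ch in ["A", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", utf16_units(ch), "code unit(s)")
```

The letter A (U+0041) and the euro sign (U+20AC) each fit in one unit, while U+1F600 lies beyond the 16-bit range and needs two.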
Character String Types

 A character string type is one in which the values consist of sequences of
characters.
 Character string constants are used to label output, and the input and output of
all kinds of data are often done in terms of strings.
 Character strings also are an essential type for all programs that do character
manipulation.
Boolean types
 Boolean types are the simplest data types in programming,
consisting of just two possible values: true and false.
 They were first introduced in ALGOL 60 and have become standard in
most general-purpose programming languages since the 1960s.
 In C89, there is no Boolean type; numeric expressions are used as
conditionals instead. Any nonzero value is treated as true, while zero
is considered false.
 C99 and C++ do have a dedicated Boolean type, though they still allow
numeric expressions to be used as Boolean values.
 In contrast, languages like Java and C# strictly define boolean types
where only true and false values are allowed.
 Boolean types are commonly used to represent switches or flags in
programs, indicating whether a condition is true or false.
 Technically, a boolean value could be stored using just a single bit of
memory.
 However, accessing individual bits of memory can be inefficient on many
computer architectures.
 As a result, boolean values are typically stored in the smallest addressable
unit of memory, which is usually a byte (8 bits).
 This ensures efficient access and manipulation of boolean values, even though
it may require more memory than strictly necessary.
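Python happens to behave like C99 and C++ here: it has a dedicated bool type but also accepts numeric expressions as conditions. The sketch below shows that behaviour and, using the standard struct module's Boolean format code, that a packed Boolean occupies a whole byte. It illustrates the general idea only and says nothing about how Python stores bool objects internally.

```python
import struct

# Numeric expressions as conditions: zero is false, nonzero is true.
print(bool(0), bool(7), bool(-1))   # False True True

# One bit would suffice, but the standard struct encoding of a
# Boolean ("?") uses the smallest addressable unit, one byte.
packed = struct.pack("=?", True)
print(len(packed))                  # 1
```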
Strings and Their Operations
 The most common string operations are assignment, catenation, substring
reference, comparison, and pattern matching.
 A substring reference is a reference to a substring of a given string.
 Substring references are also called slices.
 If strings are not defined as a primitive type, string data is usually stored in
arrays of single characters and referenced as such in the language. This is the
approach taken by C and C++.
 C and C++ use char arrays to store character strings. These languages provide a
collection of string operations through standard libraries.
 The character strings are terminated with a special character, null, which is
represented with zero.
 Library functions that produce strings often supply the null character.
 The character string literals that are built by the compiler also have the null
character.
 For example, consider the following declaration:
char str[] = "apples";
In this example, str is an array of seven char elements: the six characters of
apples followed by the null character, written apples0, where 0 is the null
character.
 Some of the most commonly used library functions for character strings in C
and C++ are:
strcpy, which copies strings;
strcat, which concatenates one given string onto another;
strcmp, which lexicographically compares (by the order of their
character codes) two given strings; and
strlen, which returns the number of characters, not counting the null,
in the given string.
 The parameters and return values for most of the string manipulation
functions are char pointers that point to arrays of char.
 Parameters can also be string literals.
 The problem is that the functions in this library that move string data do not
guard against overflowing the destination.
 For example, consider the following call to strcpy:
strcpy(dest, src);
If the length of dest is 20 and the length of src is 50, strcpy will write
over the 30 bytes that follow dest.
 In addition to C-style strings, C++ also supports strings through its
standard class library, which is also similar to that of Java.
 Because of the insecurities of the C string library, C++ programmers
should use the string class from the standard library, rather than
char arrays and the C string library.
 In Java, strings are supported by the String class, whose values are
constant strings, and the StringBuffer class, whose values are changeable
and are more like arrays of single characters.
 These values are specified with methods of the StringBuffer class. C#
and Ruby include string classes that are similar to those of Java.
 Python includes strings as a primitive type and has operations for
substring reference, catenation, indexing to access individual characters,
as well as methods for searching and replacement.
 There is also an operation for character membership in a string. So,
even though Python’s strings are primitive types, for character and
substring references, they act very much like arrays of characters.
 Python strings are immutable, similar to the String class objects of
Java.
 In F#, strings are a class.
 Individual characters, which are represented in Unicode UTF-16, can be
accessed, but not changed.
 Strings can be catenated with the + operator.
 In ML, string is a primitive immutable type.
 It uses ^ for its catenation operator and includes functions for
substring referencing and getting the size of a string.
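The Python string operations listed above can be sketched as follows. The sample string is borrowed from the C example, and the searched and replaced substrings are arbitrary choices.

```python
s = "apples"

print(s + " and pears")      # catenation
print(s[1:4])                # substring reference (slice): ppl
print(s[0])                  # indexing a single character: a
print("p" in s)              # character membership: True
print(s.find("les"))         # searching: 3
print(s.replace("a", "A"))   # replacement returns a new string: Apples

# Strings are immutable: assigning to an element raises an error.
try:
    s[0] = "A"
except TypeError as err:
    print("immutable:", err)
```

Because replace returns a new string, the original value of s is still apples after the call.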
Perl, JavaScript, Ruby, and PHP include built-in pattern-matching
operations.
 In these languages, the pattern-matching expressions are loosely based on
mathematical regular expressions, and are themselves often called regular
expressions.
Consider the following pattern expression:
/[A-Za-z][A-Za-z\d]+/
 This pattern matches (or describes) the typical name form in
programming languages.
 The brackets enclose character classes. The first character class
specifies all letters; the second specifies all letters and digits (a
digit is specified with the abbreviation \d).
 If only the second character class were included, we could not
prevent a name from beginning with a digit.
 The plus operator following the second category specifies that
there must be one or more of what is in the category.
 So, the whole pattern matches strings that begin with a letter,
followed by one or more letters or digits.
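The name pattern can be tried out with Python's re module, as in the sketch below. The test strings are assumptions chosen for illustration, and fullmatch is used so that the whole string, not just part of it, must fit the pattern.

```python
import re

name_pattern = re.compile(r"[A-Za-z][A-Za-z\d]+")

for text in ["count1", "x", "9lives"]:
    print(text, bool(name_pattern.fullmatch(text)))
# count1 True
# x False
# 9lives False
```

Note that the single letter x is rejected: the plus operator demands at least one character after the first letter, so the pattern as written matches only names of two or more characters.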
Next, consider the following pattern expression:
/\d+\.?\d*|\.\d+/
 This pattern matches numeric literals. The \. specifies a literal
decimal point.
 The question mark quantifies what it follows to have zero or one
appearance.
 The vertical bar (|) separates two alternatives in the whole pattern.
 The first alternative matches strings of one or more digits,
possibly followed by a decimal point, followed by zero or more
digits;
 The second alternative matches strings that begin with a decimal
point, followed by one or more digits.
 Pattern-matching capabilities using regular expressions are included in the
class libraries of C++, Java, Python, C#, and F#.
To test whether an entire string is a numeric literal, the pattern can be anchored at
both ends. Because | has lower precedence than the anchors, the two alternatives must
be grouped in parentheses; without the group, ^ would apply only to the first
alternative and $ only to the second:
/^(\d+\.?\d*|\.\d+)$/
The parts of this pattern are:
^: Denotes the start of the string.
\d+: Matches one or more digits.
\.?: Matches zero or one occurrence of a literal decimal point.
\d*: Matches zero or more digits after the decimal point.
Example: "123.45", "0.123", "100"
|: Separates the two alternatives in the pattern. It functions like an OR
operator, allowing either alternative to match.
\.: Matches a literal decimal point.
\d+: Matches one or more digits after the decimal point.
$: Denotes the end of the string.
Example: ".123", ".5", ".007"
Putting it all together, the regular expression /^(\d+\.?\d*|\.\d+)$/ matches strings
that consist entirely of a numeric literal: either one or more digits, followed by an
optional decimal point and zero or more digits (\d+\.?\d*), or a decimal point
followed by one or more digits (\.\d+).
Examples of strings matched by this regular expression:
"123.45"
"0.123"
"100"
".123"
".5"
".007"
The anchored form is useful for validating user input against a numeric format. For
extracting numeric values from a larger string, or searching for numeric literals
within text, use the unanchored pattern /\d+\.?\d*|\.\d+/ instead.
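Both uses of the numeric-literal pattern, validating a whole string and finding literals inside a longer text, can be sketched with Python's re module. The sample inputs are assumptions made for illustration. Python's fullmatch requires the entire string to match, so the sketch needs no explicit anchors.

```python
import re

numeric = re.compile(r"\d+\.?\d*|\.\d+")

# Validation: the whole string must be a numeric literal.
for text in ["123.45", "100", ".5", "abc", "12a"]:
    print(text, bool(numeric.fullmatch(text)))

# Extraction: find every numeric literal inside a longer string.
print(numeric.findall("pay 3.50 or .75 now"))  # ['3.50', '.75']
```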
