In the last session, we introduced the primary data structure in Python: lists. We saw that a
list lets you store multiple values under a single variable, and that we can then iterate over
those values with a for loop. As a result, we can apply the same operation to many values using
a small set of commands. Lists therefore shorten code and keep related values together. This
data structure underlies much of what makes Python efficient and effective to work with.
Today, we will see three other data structures that build on lists and allow you to do the
same operations even more efficiently. These data structures are:
1. Dictionaries
2. Arrays
3. Data Frames
Dictionaries
Just like a list, a dictionary is a collection of many values. The main difference between lists
and dictionaries is the way indexing works. Recall that lists are ordered sequences of values.
This means that the list [1, 2] is not the same as the list [2, 1]. Because they are ordered
sequences, elements of a list can be indexed with integers such as -3, -2, -1, 0, 1, 2, 3, etc.
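As a quick illustration of integer indexing and ordering, here is a minimal sketch. The list `my_list` and its contents are made up for this example and do not come from the session.

```python
# Hypothetical list used only for illustration
my_list = ['a', 'b', 'c']

# Non-negative indices count from the start (0 is the first element)
first = my_list[0]

# Negative indices count from the end (-1 is the last element)
last = my_list[-1]

# Order matters: the same values in a different order form a different list
same_order = [1, 2] == [2, 1]

print(first, last, same_order)  # a c False
```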
In contrast, an index in a dictionary can be of many types: any hashable value, such as a string,
a number, or a tuple. An index in a dictionary is called a key. Within a dictionary, each key is
associated with a value, creating a key-value pair.
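The idea of key-value pairs can be sketched as follows. The dictionaries `person` and `lookup` and all of their contents are hypothetical and chosen only for illustration.

```python
# Hypothetical record; the keys and values are made up for illustration
person = {'name': 'Asha', 'age': 40, 'city': 'Pune'}

# Values are looked up by key rather than by position
print(person['name'])  # Asha

# Keys need not be strings: an int or a tuple also works
lookup = {1: 'one', (2, 3): 'a tuple key'}
print(lookup[(2, 3)])  # a tuple key

# Assigning to a new key adds a key-value pair
person['department'] = 'Economics'
print(len(person))  # 4
```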
In [3]: firstDict['Address']
'RNo. 9'
In [4]: 'The ' + str(firstDict['age']) + ' year old professor, ' + firstDict['name'] + ' has
firstDict == secondDict
True
In [6]: a = [1,2,3]
b = [3,2,1]
a==b
False
In [7]: secondDict['specialization']
---------------------------------------------------------------------------
at line 1 in <module>
KeyError: 'specialization'
The KeyError when working with dictionaries is similar to the IndexError when working with lists.
A KeyError means that you have used a nonexistent key as an index in a dictionary. In the same
way, an IndexError means that you have used a nonexistent index in a list.
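If you would rather avoid these errors than hit them, one possible approach is sketched below. The names `record` and `values` are hypothetical; the `in` operator, `dict.get`, and `try`/`except` are standard Python.

```python
record = {'name': 'Asha', 'age': 40}   # hypothetical dictionary
values = [10, 20, 30]                   # hypothetical list

# Check whether a key exists before using it
has_key = 'specialization' in record

# dict.get returns a default value instead of raising KeyError
specialization = record.get('specialization', 'unknown')

# Both errors can also be caught explicitly
try:
    record['specialization']
except KeyError as err:
    key_message = 'missing key: {0}'.format(err)

try:
    values[5]
except IndexError as err:
    index_message = 'bad index: {0}'.format(err)

print(has_key, specialization)
print(key_message)
print(index_message)
```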
In-class exercise:
The following table shows the marks of five students in five subjects:
Anand 65 78 85 58 72
Bhanu 83 64 74 94 65
Chetna 47 84 74 59 82
Durga 57 59 95 78 49
Eshwar 78 65 84 68 65
Exercise 1: Get this data into your Python environment using lists and using dictionaries.
Exercise 2: Now create lists that can be converted to a dictionary that stores the above table.
Arrays
Notice that we have to iterate over lists to make even simple calculations. This is inefficient:
a Python loop processes one value at a time and has to keep intermediate values around until the
loop ends. There are more efficient ways of doing mathematical operations on multiple values,
and we do not need to implement them ourselves because they are already written as functions in
various modules. The most widely used module for numerical operations is numpy. The fundamental
datatype in numpy is the ndarray, or n-dimensional array.
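To make the contrast concrete, here is a minimal sketch that scales one row of marks first with a loop over a list and then with a numpy array. The variable names are assumptions made for this example; the marks are Anand's row from the table above.

```python
import numpy as np

marks = [65, 78, 85, 58, 72]   # Anand's row from the table above

# With a list, we loop to scale each value
scaled_list = []
for m in marks:
    scaled_list.append(m / 100)

# With an ndarray, the same operation applies to every element at once
marks_array = np.array(marks)
scaled_array = marks_array / 100

print(scaled_list)
print(scaled_array)
print(marks_array.mean())
```

Both approaches give the same numbers; the array version simply needs no explicit loop.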
NdArray
The numpy module is based on one main object: ndarray. It stands for N-dimensional array. It is
a
Let's code!!!
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
array([1, 2, 3])
In [16]: type(firstArray)
numpy.ndarray
In [17]: firstArray.dtype # this gives the type of the data within the ndarray it is int in t
dtype('int64')
secondArray = np.array([[1.5,2.5],[3.5,4.5]])
display(secondArray)
secondArray.dtype
dtype('float64')
display(modFirstArray)
modFirstArray.dtype
dtype('float64')
# number of dimension, ndim, is the 1 for a vector, 2 for a matrix and can be any va
nDim1 = firstArray.ndim
nDim2 = secondArray.ndim
print('First array has {0} dimensions and second array has {1} dimensions'.format(nD
size1 = firstArray.size
size2 = secondArray.size
print('First array has {0} size and second array has {1} size'.format(size1,size2))
shape1 = firstArray.shape
shape2 = secondArray.shape
print('First array has {0} shape and second array has {1} shape'.format(shape1,shape
First array has (3,) shape and second array has (2, 2) shape
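The same attributes can be checked on a three-dimensional array such as the one displayed earlier. This is only a sketch: the names `cube` and `cube_float` are hypothetical, and the use of `astype` to change the element type is one possible way to do the conversion, not necessarily the one used in the session.

```python
import numpy as np

# Hypothetical name for a 3-dimensional array with the values shown earlier
cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(cube.ndim)   # 3
print(cube.size)   # 8
print(cube.shape)  # (2, 2, 2)

# astype returns a copy of the array with a different element type
cube_float = cube.astype(float)
print(cube_float.dtype)  # float64
```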
Data Frames
The final data structure we will study is the most important one: the DataFrame. The DataFrame
is the fundamental datatype in the pandas module. You can understand a DataFrame as building
on two of the data structures discussed above: arrays and dictionaries.
From a single array, we arrive at a pandas Series. A dictionary of arrays can be converted into
a DataFrame.
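Both conversions can be sketched in a few lines. The variable names and the column names `n` and `n_squared` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# A single array becomes a Series
squares = np.array([1, 4, 9, 16])
squares_series = pd.Series(squares)

# A dictionary of equal-length arrays becomes a DataFrame;
# the column names 'n' and 'n_squared' are made up for this example
frame = pd.DataFrame({
    'n': np.array([1, 2, 3, 4]),
    'n_squared': squares,
})

print(squares_series)
print(frame)
```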
import numpy as np
0 1
1 4
2 9
3 16
dtype: int64
display(firstArray)
array([1, 2, 3])
display(firstSeries.index)
In [37]: display(firstSeries.values)
array([ 1, 4, 9, 16])
Notice that the values attribute of the Series is actually a numpy array. Essentially, a pandas
Series improves on a numpy array in two ways:
In [39]: # we can give our own index. we will see this with the second series
a Alice
b Bob
c Connor
d Dana
dtype: object
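A Series like the one printed above can be built by passing an explicit index. This is a sketch of one way to do it, not necessarily the original cell; the name `names_series` is hypothetical.

```python
import pandas as pd

# Values and labels are passed as two lists of equal length
names_series = pd.Series(['Alice', 'Bob', 'Connor', 'Dana'],
                         index=['a', 'b', 'c', 'd'])

print(names_series['b'])      # label-based lookup
print(names_series.iloc[1])   # position-based lookup
```

Both lookups return the same element, which shows that a custom index adds labels without taking away positional access.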
Exercise 5: Build the above Series using dictionaries.
Dataframe
A DataFrame is an ordered collection of columns, each of which can hold values of a different
type. There are two index arrays:
1. Index associated with lines or rows. This is similar to the index in series.
2. Array of labels associated with each column.
You can think of a DataFrame as a dictionary of Series. The key to each Series is the column
name, and the values are the Series that make up each column. A DataFrame holds rectangular
data with an index and column names.
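The "dictionary of Series" view can be sketched as follows. All names and values here are hypothetical and chosen only to show the row index, the column labels, and columns of different types.

```python
import pandas as pd

# Hypothetical data: two Series that share the same row labels
names = pd.Series(['Alice', 'Bob', 'Connor'], index=['a', 'b', 'c'])
ages = pd.Series([31, 27, 45], index=['a', 'b', 'c'])

# The dictionary keys become the column names
people = pd.DataFrame({'name': names, 'age': ages})

print(people.index)             # row labels
print(people.columns)           # column labels
print(people['age'])            # each column is itself a Series
print(people.loc['b', 'name'])  # row label, then column label
```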
Exercise 6: Build a DataFrame that best holds the marks of the five students.
Key Takeaways
1. Python grows as new modules build on old modules to increase efficiency and
effectiveness.
2. Lists, dictionaries, arrays, Series and DataFrame all build on the previous datatypes and help
store more complex datasets while making operations on them more efficient.
3. Lists make iteration easy. Storing multiple values under a single variable is a boon.
4. Dictionaries help make indexing more meaningful and flexible.
5. Arrays make iteration unnecessary for mathematical operations.
6. Series and DataFrame take the best from all three worlds and help store complex datasets
and work on them.
We will compare each of these datatypes on how they enable working with data and where each
is useful.