
NumPy Tutorial

(Notebook authors: Prabhat and Sarvesh)

NumPy is one of the fundamental packages for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including
mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more. In this tutorial we will cover some of the basic concepts in NumPy, such as arrays and their operations, a
few mathematical functions, and sampling random numbers from some widely used probability distributions.

You can find the documentation for the latest stable NumPy version here, and more resources for learning NumPy here.

NumPy Overview
In [377… # Do you know which version of numpy you are using?
# Why is it important to know that? Answer it yourself with the reason.

import numpy as np
print(np.__version__)

1.25.2

In [378… # Is it necessary to always use "np" as an alias for numpy? Uncomment the line below and check it out
''' Ans: No. "np" is only the conventional alias; any valid name works. '''

import numpy as a
print(a.array([1]))

[1]

Arrays
NumPy works with "multidimensional homogeneous arrays", which are multidimensional tensors containing elements of the same data type.

Two dimensional arrays are similar to matrices that we are familiar with, whereas higher dimensional arrays are analogous to tensors from linear algebra.

In [379… # You can create a numpy array using an iterable (eg: lists, tuples)

arr1 = np.array([1,2,3,4,5]) # Using a list


arr2 = np.array((5,6,7,8), dtype=float) # Using a tuple (data type float)
arr3 = np.array(range(4)) # Using a 'range' object
# NOTE: range(4) starts at 0 and stops before 4
arr4 = np.array([[1,2],[3,4]]) # Using a list of lists to create a 2D array
print(arr1, arr2, arr3,"\n-----")
print(arr4,"\n-----")

# NumPy also has inbuilt functions to create arrays


arr5 = np.arange(6) # just like using a 'range' object
print(arr5)

[1 2 3 4 5] [5. 6. 7. 8.] [0 1 2 3]
-----
[[1 2]
[3 4]]
-----
[0 1 2 3 4 5]

In [380… # Q: Can you create arrays of non-numerical types? Like strings? Try it!
'''Yes '''
list1 = ["a","b","c","d"]
arr = np.array(list1)
print(arr)

# If you can, what can you do with such arrays?

# Q: Can you create an array of mixed types? Try it! What happens to the elements of the array?
# What can you conclude from this?

['a' 'b' 'c' 'd']
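One way to explore the mixed-type question is the short sketch below. It only assumes `np.array` and the `dtype` attribute; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# Integers and floats together: NumPy converts every element to float
ints_and_floats = np.array([1, 2.5, 3])
print(ints_and_floats, ints_and_floats.dtype)

# Numbers and strings together: every element ends up as a string
numbers_and_text = np.array([1, 2.5, "three"])
print(numbers_and_text, numbers_and_text.dtype)
```

In both cases the array keeps a single data type, which is consistent with the "homogeneous array" description above.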

In [381… # Remember, everything in python is an object, which means that the numpy array
# is also an object. You can see what type of object it is by using the
# type() function
type(arr1)

Out[381… numpy.ndarray

In [382… # numpy.ndarray is a homogeneous array, which means every array has a
# particular data type. What datatype does "arr1" have?
arr1.dtype

Out[382… dtype('int32')

In [383… # You can check the size of a numpy array along each dimension by using the "shape" attribute
# (the number of dimensions itself is given by the "ndim" attribute)
arr4.shape

Out[383… (2, 2)
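The difference between the attributes that describe an array's layout can be seen in a small sketch. The name `grid` is illustrative; it holds the same values as `arr4`.

```python
import numpy as np

grid = np.array([[1, 2], [3, 4]])

print(grid.ndim)   # number of dimensions (axes)
print(grid.shape)  # size along each dimension
print(grid.size)   # total number of elements
```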

In [384… # You can change the shape of an array using the "reshape" method.
arr6 = arr5.reshape(3,2)
print(arr6)

[[0 1]
[2 3]
[4 5]]

In [385… # Note that the default behaviour of 'reshape' is to arrange the values row-wise
# It is also possible to arrange the values column-wise by passing order='F'
arr6 = arr5.reshape(3,2,order='F')
print(arr6)

[[0 3]
[1 4]
[2 5]]

In [386… # What's the easiest way to create an array with shape (2,5) that returns
# the same array as the statement below? (Hint: Use the "numpy.arange" function)
#np.array([[1,2,3,4,5],[6,7,8,9,10]])

# Write code below:


# np.arange(start, stop) begins at 'start' and stops before 'stop'
arr = np.arange(1, 11).reshape(2, 5)
print(arr)

[[ 1 2 3 4 5]
[ 6 7 8 9 10]]

In [387… # numpy has some convenient functions to create certain special arrays ...
# How would you create a numpy array with shape (3,10) that contains all 0's?
# Write code below:
# np.zeros accepts the target shape directly as a tuple
zero_array = np.zeros((3, 10))
print(zero_array)

[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
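Besides `np.zeros`, NumPy has a few similar constructors for special arrays. The sketch below shows three of them; the variable names are illustrative, and each call takes the target shape (a tuple when there is more than one dimension).

```python
import numpy as np

ones_int = np.ones((2, 3), dtype=int)   # all ones, integer data type
sevens = np.full((2, 2), 7)             # filled with a constant value
identity = np.eye(3)                    # 3 x 3 identity matrix

print(ones_int)
print(sevens)
print(identity)
```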

In [388… # A very useful function to create a special array ...

# numpy.linspace is a function provided by the NumPy library that generates
# an array of evenly spaced numbers over a specified interval. It is particularly useful
# for creating numerical sequences for simulations, visualizations, and other applications
# that require regularly spaced intervals.
# Usage: np.linspace(start, stop, num=x)
# By default both the start and the stop values are included (endpoint=True)
start = 0.01
stop = 0.89
print(np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0))
arr = np.linspace(1 , 100 , num = 199)
print(arr)

[0.01 0.02795918 0.04591837 0.06387755 0.08183673 0.09979592
0.1177551 0.13571429 0.15367347 0.17163265 0.18959184 0.20755102
0.2255102 0.24346939 0.26142857 0.27938776 0.29734694 0.31530612
0.33326531 0.35122449 0.36918367 0.38714286 0.40510204 0.42306122
0.44102041 0.45897959 0.47693878 0.49489796 0.51285714 0.53081633
0.54877551 0.56673469 0.58469388 0.60265306 0.62061224 0.63857143
0.65653061 0.6744898 0.69244898 0.71040816 0.72836735 0.74632653
0.76428571 0.7822449 0.80020408 0.81816327 0.83612245 0.85408163
0.87204082 0.89 ]
[ 1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. 5.5 6. 6.5
7. 7.5 8. 8.5 9. 9.5 10. 10.5 11. 11.5 12. 12.5
13. 13.5 14. 14.5 15. 15.5 16. 16.5 17. 17.5 18. 18.5
19. 19.5 20. 20.5 21. 21.5 22. 22.5 23. 23.5 24. 24.5
25. 25.5 26. 26.5 27. 27.5 28. 28.5 29. 29.5 30. 30.5
31. 31.5 32. 32.5 33. 33.5 34. 34.5 35. 35.5 36. 36.5
37. 37.5 38. 38.5 39. 39.5 40. 40.5 41. 41.5 42. 42.5
43. 43.5 44. 44.5 45. 45.5 46. 46.5 47. 47.5 48. 48.5
49. 49.5 50. 50.5 51. 51.5 52. 52.5 53. 53.5 54. 54.5
55. 55.5 56. 56.5 57. 57.5 58. 58.5 59. 59.5 60. 60.5
61. 61.5 62. 62.5 63. 63.5 64. 64.5 65. 65.5 66. 66.5
67. 67.5 68. 68.5 69. 69.5 70. 70.5 71. 71.5 72. 72.5
73. 73.5 74. 74.5 75. 75.5 76. 76.5 77. 77.5 78. 78.5
79. 79.5 80. 80.5 81. 81.5 82. 82.5 83. 83.5 84. 84.5
85. 85.5 86. 86.5 87. 87.5 88. 88.5 89. 89.5 90. 90.5
91. 91.5 92. 92.5 93. 93.5 94. 94.5 95. 95.5 96. 96.5
97. 97.5 98. 98.5 99. 99.5 100. ]
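The `endpoint` and `retstep` parameters used in the call above are easier to see on a small interval. This is a minimal sketch with illustrative variable names.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive samples
points, step = np.linspace(0.0, 1.0, num=5, retstep=True)
print(points, step)

# endpoint=False leaves out the stop value, so the spacing changes
open_points = np.linspace(0.0, 1.0, num=5, endpoint=False)
print(open_points)
```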

Slicing arrays
In [389… # Akin to a regular list, we can slice numpy arrays as well, in various ways
# Note that indexing starts at 0 and not 1, and that the last index is not included

print(arr1)
print(arr1[0:5])
print(arr1[0:4])
print(arr1[0:4:1])
print(arr1[0:4:2])
print(arr1[::-1])

[1 2 3 4 5]
[1 2 3 4 5]
[1 2 3 4]
[1 2 3 4]
[1 3]
[5 4 3 2 1]

In [390… # Unlike regular lists, we can slice multi-dimensional arrays as well


arr = np.arange(36).reshape(6,6)
print(arr)
print(arr[1:4,2:5])
# arr[ row1 : row2 , column1 : column2 ]

[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]
[24 25 26 27 28 29]
[30 31 32 33 34 35]]
[[ 8 9 10]
[14 15 16]
[20 21 22]]

Copies
new_array = arr does not copy anything: both names refer to the same array

In [391… # What happens when you assign an array to a new variable?

arr = np.arange(6)
print(arr)

new_arr = arr
print(new_arr)

new_arr[2] = 100 # Here we are changing a value in new_arr only, but ...

print(arr)
print(new_arr)

[0 1 2 3 4 5]
[0 1 2 3 4 5]
[ 0 1 100 3 4 5]
[ 0 1 100 3 4 5]

To create an independent copy, use arr.copy()

In [392… # Why did "arr" also change when only new_arr was changed?
# Because "new_arr = arr" was executed, the reference to the array object that
# "arr" holds was also assigned to "new_arr", so both names point to the same array

# To create a new copy of the array, use the "copy()" method


arr = np.arange(6)
new_arr = arr.copy()

new_arr[2] = 100 #changing new_arr but not changing arr


print(arr, new_arr)

[0 1 2 3 4 5] [ 0 1 100 3 4 5]
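The difference between a second name, a slice, and a real copy can be checked directly. The sketch below uses `np.shares_memory`; the variable names are illustrative. Note that a basic slice is a view onto the same data, which goes slightly beyond what the cells above show.

```python
import numpy as np

base = np.arange(6)
alias = base                 # second name for the same object, nothing is copied
window = base[1:4]           # a basic slice is a view onto the same data
independent = base.copy()    # separate data

print(alias is base)
print(np.shares_memory(base, window))
print(np.shares_memory(base, independent))

window[0] = 100              # writes through to base[1]
print(base, independent)
```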

Mathematical Operations
In [393… # You can calculate the sum of all the elements in an array using np.sum()
print(np.sum(arr1))

15

In [394… # The numpy.ndarray object has many methods (member functions).
# Another way to calculate the sum of an array is to use one of these methods.
print(arr1)

'''1. arr.sum() '''


print(f"the sum is {arr1.sum()}")

# There are many aggregate operations you can run on numpy arrays other than
# the sum() method.
'''2. arr.mean() '''
print(f"the mean is {arr1.mean()}") #calculate mean

'''3. arr.std() '''


print(f"the std deviation is {arr1.std()}") #calculate standard deviation

'''4. min and max'''


print(f"the min is {arr1.min()}")
print(f"the max is {arr1.max()}")

[1 2 3 4 5]
the sum is 15
the mean is 3.0
the std deviation is 1.4142135623730951
the min is 1
the max is 5

In [395… # All basic operators like +,-,*,/ perform element-wise operations
# when dealing with arrays of the same shape
arr = np.arange(4)
print(arr, arr*arr)

[0 1 2 3] [0 1 4 9]

In [396… # How to obtain an array that has the squares of each element in "arr"?
# Type code below:
squared_arr = arr*arr
print(squared_arr)

[0 1 4 9]
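Squaring can be written in a few equivalent ways. The sketch below compares them; the variable names are illustrative.

```python
import numpy as np

base = np.arange(4)

by_product = base * base        # element-wise product
by_power = base ** 2            # element-wise power
by_function = np.square(base)   # NumPy function with the same effect

print(by_product, by_power, by_function)
```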

Operations on 2d array

In [397… # In the case of multi-dimensional arrays, the above functions can take an additional parameter 'axis'
# to define the dimension along which the operation is to be performed. For example
arr = np.arange(36).reshape(6,6)
print(arr,"\n-----")

print(arr.sum(axis=0),"\n-----") #sum of each column


print(arr.sum(axis=1)) #sum of each row

[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]
[24 25 26 27 28 29]
[30 31 32 33 34 35]]
-----
[ 90 96 102 108 114 120]
-----
[ 15 51 87 123 159 195]

Vector and Matrix operations


You can calculate the dot product between two vectors using numpy.dot() .

In [398… arr1 = np.arange(4)


arr2 = arr1 + 3
print(arr1)
print(arr2)

print(np.dot(arr1, arr2))

[0 1 2 3]
[3 4 5 6]
32
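The result 32 can be reproduced by hand: multiply the vectors element-wise and add up the products. A minimal sketch, with illustrative names `u` and `v` holding the same values as the vectors above:

```python
import numpy as np

u = np.arange(4)   # [0 1 2 3]
v = u + 3          # [3 4 5 6]

manual = (u * v).sum()   # 0*3 + 1*4 + 2*5 + 3*6
print(manual, np.dot(u, v), u @ v)
```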

What happens when you run numpy.dot() on 2 matrices?

Ans: Matrix multiplication. An (a x b) matrix multiplied by a (b x d) matrix gives an (a x d) matrix.

In [399… arr1 = np.arange(6).reshape(3,2)


arr2 = np.ones((2,3))
print(arr1,"\n-----")
print(arr2,"\n-----")

# The following lines are all equivalent; each one calculates the matrix product of the two arrays
print(np.dot(arr1,arr2),"\n-----")
print(arr1.dot(arr2),"\n-----")
print(arr1 @ arr2)

[[0 1]
[2 3]
[4 5]]
-----
[[1. 1. 1.]
[1. 1. 1.]]
-----
[[1. 1. 1.]
[5. 5. 5.]
[9. 9. 9.]]
-----
[[1. 1. 1.]
[5. 5. 5.]
[9. 9. 9.]]
-----
[[1. 1. 1.]
[5. 5. 5.]
[9. 9. 9.]]

In [400… # What happens when the following code is run?
# Ans: an error, because the inner dimensions do not match (3 vs 4)

arr1 = np.arange(6).reshape(2,3)
arr2 = np.arange(12).reshape(4,3)
'''print(arr1,"\n-----")
print(arr2,"\n-----")
print(arr1 @ arr2)'''

# ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0,
# with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 4 is different from 3)

Out[400… 'print(arr1,"\n-----")\nprint(arr2,"\n-----")\nprint(arr1 @ arr2)'
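Instead of hiding the failing lines inside a string, the shape rule can be checked and the error caught explicitly. This is a sketch with illustrative names (`left`, `right`, `matmul_error`), using the same shapes as the cell above.

```python
import numpy as np

left = np.arange(6).reshape(2, 3)
right = np.arange(12).reshape(4, 3)

# Matrix multiplication needs left.shape[1] == right.shape[0]
print(left.shape, right.shape, left.shape[1] == right.shape[0])

matmul_error = None
try:
    left @ right
except ValueError as err:
    matmul_error = err
    print(f"matmul failed: {err}")

# Transposing the second operand makes the inner dimensions agree
product = left @ right.T
print(product.shape)
```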

To take the transpose, use arr = arr.T or arr = np.transpose(arr)

In [401… # As you might have guessed, the matrix dimensions are not conducive for matrix multiplication
# One has to often 'transpose' a matrix prior to carrying out operations like multiplication
# This can be done by a method `.T` as shown below
# We can also use the `np.transpose()` function to achieve the same result

print(arr1,"\n-----") # 2 x 3 matrix
print(arr2.T,"\n-----") # 3 x 4 matrix
print(arr1 @ arr2.T,"\n-----") # Now there will be no error
print(arr1 @ np.transpose(arr2)) # Same effect as the above statement

[[0 1 2]
[3 4 5]]
-----
[[ 0 3 6 9]
[ 1 4 7 10]
[ 2 5 8 11]]
-----
[[ 5 14 23 32]
[ 14 50 86 122]]
-----
[[ 5 14 23 32]
[ 14 50 86 122]]

Using Linear Algebra library methods np.linalg.

To calculate the determinant, use det = np.linalg.det(arr1)

In [402… # Let's look into some standard operations that can be performed on square matrices.
# You can obtain the determinant of a square matrix by using the `np.linalg.det()` function

arr1 = np.arange(9).reshape(3,3)
arr1 *= arr1
# arr1 = np.array([1,2,10,5,6,3,8,9,23]).reshape(3,3)
print(arr1,"\n-----")
print("Value of determinant " , np.linalg.det(arr1))

[[ 0 1 4]
[ 9 16 25]
[36 49 64]]
-----
Value of determinant -216.00000000000006

To calculate the inverse, use inv_arr = np.linalg.inv(arr1)

In [403… # You can use the `np.linalg.inv()` function to calculate the inverse of a
# square matrix. What will happen when you run the following code?

print(np.linalg.inv(arr1))

[[ 0.93055556 -0.61111111 0.18055556]
[-1.5 0.66666667 -0.16666667]
[ 0.625 -0.16666667 0.04166667]]
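A computed inverse can be sanity-checked by multiplying it with the original matrix, which should give the identity up to floating-point rounding. A minimal sketch, where `squares` is an illustrative name for the same matrix as `arr1`:

```python
import numpy as np

squares = np.arange(9).reshape(3, 3) ** 2
inverse = np.linalg.inv(squares)

# Compare with np.allclose rather than ==, because of rounding error
is_identity = np.allclose(squares @ inverse, np.eye(3))
print(is_identity)
```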

In [404… # Q: What changes will you make to the above matrix to make it invertible?
# Try it out here ...
''' Squaring the entries element-wise (arr1 *= arr1) gives this matrix a non-zero determinant, so it becomes invertible '''
# What do you think will be the output if we try finding a determinant and
# inverse for a rectangular matrix? Do try it out.
''' ValueError '''
# Write code here:

Out[404… ' ValueError '
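Both questions can be tried out with the sketch below. It assumes the un-squared matrix `np.arange(9).reshape(3,3)` as the non-invertible case; the variable names are illustrative. The exception is caught as `np.linalg.LinAlgError`, which is the error type NumPy's linear algebra routines raise.

```python
import numpy as np

singular = np.arange(9).reshape(3, 3)    # rows are linearly dependent
singular_det = np.linalg.det(singular)
print(singular_det)                      # zero, up to rounding error

rectangular = np.arange(6).reshape(2, 3)
det_error = None
try:
    np.linalg.det(rectangular)
except np.linalg.LinAlgError as err:
    det_error = err
    print(f"det failed: {err}")
```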

In [405… # Now let's find out the eigen values and vectors of the matrix

mat = np.arange(1,10).reshape(3,3)
eigen_value, eigen_vector = np.linalg.eig(mat)

print(f"Eigen values are {eigen_value}")


print(f"Eigen vectors are \n {eigen_vector}")

Eigen values are [ 1.61168440e+01 -1.11684397e+00 -1.30367773e-15]


Eigen vectors are
[[-0.23197069 -0.78583024 0.40824829]
[-0.52532209 -0.08675134 -0.81649658]
[-0.8186735 0.61232756 0.40824829]]
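The output of `np.linalg.eig` can be checked against the defining relation A v = λ v. In the sketch below (illustrative names), column `i` of `vectors` is paired with `values[i]`.

```python
import numpy as np

matrix = np.arange(1, 10).reshape(3, 3)
values, vectors = np.linalg.eig(matrix)

# Check matrix @ v == value * v for each eigenpair, up to rounding error
for i in range(len(values)):
    v = vectors[:, i]
    print(np.allclose(matrix @ v, values[i] * v))
```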

Broadcasting arrays

In [406… # Let's see what happens when we multiply a 2D array with
# 1) a scalar
# 2) a 2D array with the same shape

arr = np.arange(9).reshape(3,3)
print(arr,"\n-----")

print(arr*2,"\n-----") # Scalar multiplication


print(arr*arr) # Element-wise multiplication

[[0 1 2]
[3 4 5]
[6 7 8]]
-----
[[ 0 2 4]
[ 6 8 10]
[12 14 16]]
-----
[[ 0 1 4]
[ 9 16 25]
[36 49 64]]

In [407… # The scalar multiplication is as expected, and the array multiplication
# is element-wise, which is also expected.
# What would happen when the above array is multiplied with a 1D array?
arr2 = np.arange(3)
print(arr,"\n-----")
print(arr2,"\n-----")

print(arr2*arr)

[[0 1 2]
[3 4 5]
[6 7 8]]
-----
[0 1 2]
-----
[[ 0 1 4]
[ 0 4 10]
[ 0 7 16]]

What happened here? To perform the multiplication meaningfully, "arr2" is broadcast to an array of shape (3,3), after which element-wise multiplication is performed. There are 2 rules for broadcasting:

1. Identify the array with fewer dimensions and increase its number of dimensions (by prepending "1" to the shape) until it matches that of the other array.
2. Identify dimensions along which an array has size "1" and stretch the array along those dimensions so that its size matches the other array.

In the above example, the shape of arr2 changed as follows: (3,) -> (1,3) -> (3,3) following which element-wise multiplication took place.
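The two steps can be carried out by hand to confirm that they give the same result as the automatic broadcast. This is a sketch with illustrative names; `np.newaxis` and `np.broadcast_to` are used here only to make each step visible.

```python
import numpy as np

matrix = np.arange(9).reshape(3, 3)
row = np.arange(3)                            # shape (3,)

as_2d = row[np.newaxis, :]                    # rule 1: shape (1, 3)
stretched = np.broadcast_to(as_2d, (3, 3))    # rule 2: shape (3, 3)

print(as_2d.shape, stretched.shape)
print(np.array_equal(row * matrix, stretched * matrix))
```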

In [408… # Try to figure out what happens in the following example


arr1 = np.arange(12).reshape(3,4)
arr2 = np.arange(4)
print(arr1,"\n-----")
print(arr2,"\n-----")

print(arr1+arr2)

[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
-----
[0 1 2 3]
-----
[[ 0 2 4 6]
[ 4 6 8 10]
[ 8 10 12 14]]

In [425… # Can you explain why the following code throws an error?
try:
    arr1 = np.arange(12).reshape(3,4)
    arr2 = np.arange(3)
    print(arr1,"\n-----")
    print(arr2,"\n-----")
    print(arr1+arr2) # Error in this line

# Remove the try/except wrapper to see the full traceback

except ValueError as e:
    print(f"An error occurred: {e}")

[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
-----
[0 1 2]
-----
An error occurred: operands could not be broadcast together with shapes (3,4) (3,)
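The addition fails because shapes are compared from the last dimension backwards, and 4 does not match 3. If the intention is to add one value per row, a possible fix is to give the 1D array a trailing dimension of size 1. A sketch with illustrative names:

```python
import numpy as np

table = np.arange(12).reshape(3, 4)
per_row = np.arange(3)                # one value per row, shape (3,)

column = per_row[:, np.newaxis]       # shape (3, 1) is compatible with (3, 4)
shifted = table + column
print(shifted)
```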

You can read more about broadcasting in the documentation.

Comparison Functions
In [424… # Say you want to check whether a numpy array contains any number greater
# than 5. Would a regular python-styled comparison work?
try:
    arr = np.arange(8)
    if arr > 5:
        print("True")

except ValueError as e:
    print(f"An error occurred: {e}")

An error occurred: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [411… # It didn't work! But why didn't it?


# Let's try to understand what the > operator does
arr > 5

Out[411… array([False, False, False, False, False, False, True, True])

To check whether any number in the array is greater than x, use np.any(arr > x).

For example, check = np.any(arr > 5) sets check to a single boolean.

In [412… # The > operator, like any other comparison operator in numpy, returns a boolean
# array of the same size. To check if any number in the array is greater than 5
# use numpy.any().
np.any(arr > 5)

Out[412… True

To check if all the numbers in the array are greater than 5, you can use

numpy.all()

In [413… # To check if all the numbers in the array are greater than 5, you can use
# numpy.all()
np.all(arr > 5)

Out[413… False

In [414… # What if you want to see which elements in arr are greater than 5?
# You can use the boolean array as an index to index the original array!
print(arr)
print(arr>5)
print(arr[arr > 5])

[0 1 2 3 4 5 6 7]
[False False False False False False True True]
[6 7]

In [415… # Say you want to create a new array arr2 which has the same size as arr,
# but all the elements less than or equal to 5 are replaced by 5. How would
# you create such an array? (Hint: can you assign values to indexed arrays?)

# Write code here:
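One possible answer to this exercise is sketched below; it is not the only approach. The names `values` and `clipped` are illustrative.

```python
import numpy as np

values = np.arange(8)

clipped = values.copy()      # copy first so the original array stays unchanged
clipped[clipped <= 5] = 5    # the boolean mask selects the elements to overwrite
print(clipped)

# np.maximum gives the same result without an explicit mask
print(np.maximum(values, 5))
```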

Constants
In [416… # A number of oft used constants are defined in the numpy package
# These include numpy.inf, numpy.e, numpy.pi

print(np.inf, np.e, np.pi)

# numpy constants are documented at: https://numpy.org/doc/stable/reference/constants.html

inf 2.718281828459045 3.141592653589793

Random Number Generation


Random number generation is an important tool in data science. It is used to generate random events during simulations. For example, the data sets that you have used in exercises E2 were created using controlled random number generation!

In [417… # Let's check how to generate 5 random numbers between 0 and 1

rg = np.random.default_rng()
print(rg.random(5))

# Did you observe that every time you run this cell, you get 5 different numbers?
# Can you keep them constant? We will try it out in the next cell

# For now, Can you try to generate a 2D random array?


# Write your code here

[0.05587056 0.54761447 0.63688762 0.00323869 0.29701179]
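For the 2D question, one option is to pass a shape tuple to `random` instead of a single count. A minimal sketch (the shape `(2, 3)` is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng()
samples = rng.random((2, 3))   # 2 rows, 3 columns, values in [0, 1)
print(samples)
```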

In [418… # Run this cell multiple times and you will see the numbers in the array are always constant
# What is the reason for this? It is because we have provided a 'seed' value of 42

# Providing a seed value allows you to reproduce the same random numbers, which helps to
# repeat, verify and validate simulations and experiments

rg = np.random.default_rng(42)
print(rg.random(5))

# Now change the seed and re-run the cell, you will find a different set of numbers repeating ...
rg = np.random.default_rng(56)
print(rg.random(5))

# Review the documentation at https://numpy.org/doc/stable/reference/random/generator.html

[0.77395605 0.43887844 0.85859792 0.69736803 0.09417735]


[0.72612605 0.01690281 0.98080107 0.72460421 0.54421216]

In [419… # The above statements create random numbers between 0 and 1. What if we want integers?

rg = np.random.default_rng(42)
print(rg.integers(22))

# Now can you generate 5 random integers using rg.integers()?


# Try it yourself by referring to the documentation for numpy.random.default_rng().integers()
# https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.integers.html

# Further, how will you generate a 3 x 5 matrix of random integers between 0 and 45?
# Write your code here ...
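A possible solution to both questions is sketched below. It reads "between 0 and 45" as including 45, which is an assumption; by default the upper bound of `integers` is excluded, so `endpoint=True` is passed to include it.

```python
import numpy as np

rng = np.random.default_rng(42)

# 5 integers drawn from 0 to 21 (the upper bound 22 is excluded)
five = rng.integers(22, size=5)
print(five)

# 3 x 5 matrix of integers from 0 to 45, upper bound included
grid = rng.integers(0, 45, size=(3, 5), endpoint=True)
print(grid)
```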

File operations using numpy


In [420… # you can save a numpy array to a file

arr = np.arange(12).reshape(6,2,order='F')
np.savetxt("numpy-array.csv", arr, delimiter=',',fmt='%.4f',newline='\n', header='y,x', footer='', comments='', encoding=None)

In [421… # Read back the array from the file and print it
new_arr = np.loadtxt("numpy-array.csv", delimiter=',', skiprows=1, comments='#', encoding=None)
print(new_arr)

[[ 0. 6.]
[ 1. 7.]
[ 2. 8.]
[ 3. 9.]
[ 4. 10.]
[ 5. 11.]]

In [422… # Multiple numpy arrays can be written to a single binary (.npz) file using the "np.savez" function
# (use "np.savez_compressed" if the file should also be compressed)
# Data written in this way can be read back using the "np.load" function
# Refer to the documentation at
# https://numpy.org/doc/stable/reference/generated/numpy.savez.html and
# https://numpy.org/doc/stable/reference/generated/numpy.load.html
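A round trip through `np.savez` and `np.load` might look like the sketch below. The file name `arrays.npz` and the keyword names `first` and `second` are hypothetical; a temporary directory is used so that no file is left behind.

```python
import os
import tempfile

import numpy as np

first = np.arange(5)
second = np.linspace(0.0, 1.0, num=3)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "arrays.npz")   # hypothetical file name
    np.savez(path, first=first, second=second)  # keyword names become the keys

    # np.load returns a dictionary-like archive; close it when done
    with np.load(path) as archive:
        restored_first = archive["first"]
        restored_second = archive["second"]

print(restored_first, restored_second)
```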

That's it ...
In [423… # This unit has introduced the basics of numpy and numpy arrays.
# The functionality covered in this unit is sufficient to start using numpy for data analysis.
# We will start solving problems using these concepts and functions ...

# The numpy package has many more sub modules, functions and methods.
# Definitely visit the following links:

# Quickstart guide to numpy: https://numpy.org/doc/stable/user/quickstart.html


# Reference guide to all the numpy classes and methods: https://numpy.org/doc/stable/reference/index.html
# You can refer to the entire numpy documentation at https://numpy.org/doc/stable/index.html
# There is much more to be 'discovered' in numpy, as it will be in almost all the Python libraries!
# Have fun!
