
NUMPY LIBRARY

The numpy library is one of the core packages in Python's scientific software stack. Many other
Python data analysis libraries require numpy as a prerequisite, because they use its ndarray data
structure as a building block. The Anaconda Python distribution we installed in part 1 comes with
numpy.

Numpy implements a data structure called the N-dimensional array or ndarray. ndarrays are
similar to lists in that they contain a collection of items that can be accessed via indexes. On the
other hand, ndarrays are homogeneous, meaning they can only contain objects of the same type
and they can be multi-dimensional, making it easy to store 2-dimensional tables or matrices.
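A small sketch of what "homogeneous" means in practice, assuming numpy is installed and imported under its usual alias np. The variable names here are illustrative only. When a list mixes integers and floats, np.array() converts every item to one common type:

```python
import numpy as np

# A Python list can mix types freely.
mixed_list = [1, 2.5, 3]

# np.array() converts every item to one common type (here, float).
mixed_array = np.array(mixed_list)

print(mixed_array)        # every element is now a float
print(mixed_array.dtype)  # float64
```

The integers 1 and 3 are stored as 1.0 and 3.0, so the whole array shares a single dtype.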

To work with ndarrays, we need to load the numpy library. It is standard practice to load numpy
with the alias "np" like so:

In [1]: import numpy as np

The "as np" after the import statement lets us access the numpy library's functions using the
shorthand "np."

In [2]: data1 = [[1,2,3,4],[5,6,7,8]]

arr1 = np.array(data1)
print(arr1)

print("Dimension of array is : ",arr1.ndim)

print("Shape of array is :", arr1.shape)

print("Data type of array is : ",arr1.dtype)

print("Class of data is : ",data1.__class__)

print("Class of array is : ",arr1.__class__)

[[1 2 3 4]
 [5 6 7 8]]
Dimension of array is : 2
Shape of array is : (2, 4)
Data type of array is : int32
Class of data is : <class 'list'>
Class of array is : <class 'numpy.ndarray'>

In [3]: print("The sequence of 3 numbers from zero : ",np.arange(3))
print("The sequence of 3 numbers from 2 :", np.arange(2,5))
print("The sequence of 3 numbers from 5 with a difference of 3 : ",np.arange(5,12,3))
print("The sequence of 3 numbers from 5 with a difference of 3 : ",np.linspace(5,11,3))
print("The array of 5 zeros : ",np.zeros(5))
print("The matrix of 3 by 5 of zeros : ",np.zeros((3,5)))

The sequence of 3 numbers from zero : [0 1 2]
The sequence of 3 numbers from 2 : [2 3 4]
The sequence of 3 numbers from 5 with a difference of 3 : [ 5 8 11]
The sequence of 3 numbers from 5 with a difference of 3 : [ 5. 8. 11.]
The array of 5 zeros : [0. 0. 0. 0. 0.]
The matrix of 3 by 5 of zeros : [[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
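The cell above produces the same three values with np.arange() and np.linspace(), but the two functions treat their arguments differently. The following sketch (variable names are illustrative) highlights the difference: np.arange() takes a step and leaves out the stop value, while np.linspace() takes a count and includes the stop value by default.

```python
import numpy as np

# np.arange(start, stop, step): the stop value is NOT included.
stepped = np.arange(5, 12, 3)      # 5, 8, 11

# np.linspace(start, stop, num): the stop value IS included by default.
spaced = np.linspace(5, 11, 3)     # 5.0, 8.0, 11.0

print(stepped, stepped.dtype)      # integer dtype (int32 or int64, by platform)
print(spaced, spaced.dtype)        # float64
```

This is why the np.arange() call needs a stop of 12 (anything above 11) to reach 11, while np.linspace() is given 11 directly.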

In [4]: ar2 = np.array([[2,3,6],[3,4,7]],float)
print("Creating ",ar2.shape,"matrix using numpy :",ar2)

Creating (2, 3) matrix using numpy : [[2. 3. 6.]
 [3. 4. 7.]]

Create an ndarray by passing a list to the np.array() function:

In [5]: my_list = [1, 2, 3, 4] # Define a list
my_array = np.array(my_list) # Pass the list to np.array()
type(my_array) # Check the object's type

Out[5]: numpy.ndarray

To create an array with more than one dimension, pass a nested list to np.array():

In [6]: second_list = [5, 6, 7, 8]
two_d_array = np.array([my_list, second_list])
print(two_d_array)

[[1 2 3 4]
 [5 6 7 8]]

An ndarray is defined by the number of dimensions it has, the size of each dimension and the
type of data it holds. Check the number and size of dimensions of an ndarray with the shape
attribute:

In [7]: two_d_array.shape

Out[7]: (2, 4)

The output above shows that this ndarray is 2-dimensional, since there are two values listed, and
the dimensions have length 2 and 4. Check the total size (total number of items) in an array with
the size attribute:

In [8]: two_d_array.size

Out[8]: 8

Check the type of the data in an ndarray with the dtype attribute:

In [9]: two_d_array.dtype

Out[9]: dtype('int32')
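The shape, ndim and size attributes are tied together, which the short sketch below checks on an array like the one above (the name example is illustrative): ndim is the number of entries in shape, and size is the product of those entries.

```python
import numpy as np

example = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# ndim is the number of entries in shape.
print(example.ndim == len(example.shape))            # True

# size is the product of the entries in shape (2 * 4 = 8).
print(example.size == int(np.prod(example.shape)))   # True
```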

Numpy has a variety of special array creation functions. Some handy array creation functions
include:

In [10]: # np.identity() to create a square 2d array with 1's across the diagonal
np.identity(n = 5) # Size of the array

Out[10]: array([[1., 0., 0., 0., 0.],
                [0., 1., 0., 0., 0.],
                [0., 0., 1., 0., 0.],
                [0., 0., 0., 1., 0.],
                [0., 0., 0., 0., 1.]])

In [11]: # np.eye() to create a 2d array with 1's across a specified diagonal
np.eye(N = 3, # Number of rows
M = 5, # Number of columns
k = 1) # Index of the diagonal (main diagonal (0) is default)

Out[11]: array([[0., 1., 0., 0., 0.],
                [0., 0., 1., 0., 0.],
                [0., 0., 0., 1., 0.]])

In [12]: # np.ones() to create an array filled with ones:
np.ones(shape= [2,4])

Out[12]: array([[1., 1., 1., 1.],
                [1., 1., 1., 1.]])

In [13]: # np.zeros() to create an array filled with zeros:
np.zeros(shape= [4,6])

Out[13]: array([[0., 0., 0., 0., 0., 0.],
                [0., 0., 0., 0., 0., 0.],
                [0., 0., 0., 0., 0., 0.],
                [0., 0., 0., 0., 0., 0.]])

Array Indexing and Slicing
Numpy ndarrays offer numbered indexing and slicing syntax that mirrors the syntax for Python
lists:

In [14]: one_d_array = np.array([1,2,3,4,5,6])
one_d_array[3] # Get the item at index 3

Out[14]: 4

In [15]: one_d_array[3:] # Get a slice from index 3 to the end

Out[15]: array([4, 5, 6])

In [16]: one_d_array[::-1] # Slice backwards to reverse the array

Out[16]: array([6, 5, 4, 3, 2, 1])

If an ndarray has more than one dimension, separate indexes for each dimension with a comma:

In [17]: # Create a new 2d array
two_d_array = np.array([one_d_array, one_d_array + 6, one_d_array + 12])
print(two_d_array)

[[ 1 2 3 4 5 6]
 [ 7 8 9 10 11 12]
 [13 14 15 16 17 18]]

In [18]: # Get the element at row index 1, column index 4
two_d_array[1, 4]

Out[18]: 11

In [19]: # Slice elements starting at row index 1 and column index 4
two_d_array[1:, 4:]

Out[19]: array([[11, 12],
                [17, 18]])

In [20]: # Reverse both dimensions (180 degree rotation)
two_d_array[::-1, ::-1]

Out[20]: array([[18, 17, 16, 15, 14, 13],
                [12, 11, 10, 9, 8, 7],
                [ 6, 5, 4, 3, 2, 1]])
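As a brief illustration of the comma syntax, the sketch below rebuilds an array with the same values as two_d_array (under the illustrative name grid) and compares comma indexing with chained brackets. A lone colon keeps an entire dimension.

```python
import numpy as np

grid = np.arange(1, 19).reshape(3, 6)   # same values as two_d_array above

# Comma syntax indexes both dimensions in one step.
print(grid[1, 4])      # 11

# Chained brackets give the same element: grid[1] is row 1, then [4] picks from it.
print(grid[1][4])      # 11

# A lone colon keeps a whole dimension: every row, column index 4.
print(grid[:, 4])      # [ 5 11 17]
```

Both forms return the same element, but the comma form is the usual choice because it does not build an intermediate row first.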

Reshaping Arrays
Numpy has a variety of built-in functions to help you manipulate arrays quickly without having to
use complicated indexing operations.

Reshape an array into a new array with the same data but different structure with np.reshape():

In [21]: np.reshape(two_d_array, # Array to reshape
(6,3)) # Dimensions of the new array

Out[21]: array([[ 1, 2, 3],
                [ 4, 5, 6],
                [ 7, 8, 9],
                [10, 11, 12],
                [13, 14, 15],
                [16, 17, 18]])
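Reshaping only works when the new shape holds exactly the same number of items as the original. The sketch below (names are illustrative) shows two related behaviors: passing -1 for one dimension asks numpy to infer it, and a shape with the wrong total raises a ValueError.

```python
import numpy as np

values = np.arange(1, 19)            # 18 elements

# -1 asks numpy to infer that dimension: 18 / 6 = 3 columns.
inferred = values.reshape(6, -1)
print(inferred.shape)                # (6, 3)

# A shape whose product is not 18 cannot hold the data.
try:
    values.reshape(4, 5)
except ValueError as err:
    print("Cannot reshape:", err)
```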

Unravel a multi-dimensional array into 1 dimension with np.ravel():

In [22]: np.ravel(a=two_d_array, order='C') # Use C-style unraveling (by rows)

Out[22]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
                18])

In [23]: np.ravel(a=two_d_array, order='F') # Use Fortran-style unraveling (by columns)

Out[23]: array([ 1, 7, 13, 2, 8, 14, 3, 9, 15, 4, 10, 16, 5, 11, 17, 6, 12,
                18])

Alternatively, use ndarray.flatten() to flatten a multi-dimensional array into 1 dimension and
return a copy of the result:

In [24]: two_d_array.flatten()

Out[24]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
                18])
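The word "copy" matters here. As a rough sketch (variable names are illustrative), flatten() always returns an independent array, while ravel() returns a view of the original data whenever it can, which is the case for an ordinary row-ordered array like this one:

```python
import numpy as np

source = np.arange(6).reshape(2, 3)

flat_copy = source.flatten()   # always a new, independent array
flat_view = source.ravel()     # a view of the same data when possible

source[0, 0] = 99              # modify the original

print(flat_copy[0])            # 0  (the copy is unaffected)
print(flat_view[0])            # 99 (the view sees the change)
print(np.shares_memory(source, flat_view))   # True
```

Use flatten() when the result will be modified independently of the original array.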

Get the TRANSPOSE of an array with ndarray.T:

In [25]: two_d_array.T

Out[25]: array([[ 1, 7, 13],
                [ 2, 8, 14],
                [ 3, 9, 15],
                [ 4, 10, 16],
                [ 5, 11, 17],
                [ 6, 12, 18]])

JOINING ARRAYS/CONCATENATION
Join arrays along an axis with np.concatenate():

In [38]: two_d_array

Out[38]: array([[ 1, 2, 3, 4, 5, 6],
                [ 7, 8, 9, 10, 11, 12],
                [13, 14, 15, 16, 17, 18]])

In [35]: array_to_join = np.array([[10,20,30],[40,50,60],[70,80,90]])
np.concatenate( (two_d_array,array_to_join), # Arrays to join
axis=1) # Join side by side (adds columns)

Out[35]: array([[ 1, 2, 3, 4, 5, 6, 10, 20, 30],
                [ 7, 8, 9, 10, 11, 12, 40, 50, 60],
                [13, 14, 15, 16, 17, 18, 70, 80, 90]])

In [27]: array_to_join = np.array([[10,20,30],[40,50,60],[70,80,90]])
np.concatenate( (two_d_array,array_to_join), # Arrays to join
axis=2) # Axis to join upon

---------------------------------------------------------------------------
AxisError Traceback (most recent call last)
<ipython-input-27-ea78051e0a29> in <module>()
1 array_to_join = np.array([[10,20,30],[40,50,60],[70,80,90]])
2 np.concatenate( (two_d_array,array_to_join), # Arrays to join
----> 3 axis=2) # Axis to join upon

AxisError: axis 2 is out of bounds for array of dimension 2

In [39]: array_to_join = np.array([[10,20,30],[40,50,60],[70,80,90]])
np.concatenate( (two_d_array,array_to_join), # Arrays to join
axis=0) # Axis to join upon

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-39-adc6385139bc> in <module>()
1 array_to_join = np.array([[10,20,30],[40,50,60],[70,80,90]])
2 np.concatenate( (two_d_array,array_to_join), # Arrays to join
----> 3 axis=0) # Axis to join upon

ValueError: all the input array dimensions except for the concatenation axis must match exactly

In [41]: array_to_join1 = np.array([[10,20,30,40,50,60],[70,80,90,110,120,130],[140,150,160,170,180,190]])
np.concatenate( (two_d_array,array_to_join1), axis=0 ) # Stack vertically (adds rows)

Out[41]: array([[ 1, 2, 3, 4, 5, 6],
                [ 7, 8, 9, 10, 11, 12],
                [ 13, 14, 15, 16, 17, 18],
                [ 10, 20, 30, 40, 50, 60],
                [ 70, 80, 90, 110, 120, 130],
                [140, 150, 160, 170, 180, 190]])

In [42]: array_to_join1.shape

Out[42]: (3, 6)
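The two errors above follow one rule: the arrays must agree on every dimension except the one being joined, and the axis must exist. The sketch below expresses that rule as a small check. The function can_concatenate is a hypothetical helper written for this illustration, not part of numpy, and it only handles non-negative axis values.

```python
import numpy as np

def can_concatenate(first, second, axis):
    """Hypothetical helper: True if two arrays can be joined along `axis`."""
    if first.ndim != second.ndim or not 0 <= axis < first.ndim:
        return False
    # Compare the shapes with the join axis removed.
    first_rest = first.shape[:axis] + first.shape[axis + 1:]
    second_rest = second.shape[:axis] + second.shape[axis + 1:]
    return first_rest == second_rest

wide = np.zeros((3, 6))
square = np.zeros((3, 3))

print(can_concatenate(wide, square, axis=1))  # True:  both have 3 rows
print(can_concatenate(wide, square, axis=0))  # False: 6 columns vs 3
print(can_concatenate(wide, square, axis=2))  # False: no axis 2 in a 2-d array
```

Comparing the .shape of each array before calling np.concatenate() is usually enough to predict which axis will work.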

Array Math Operations


Creating and manipulating arrays is nice, but the true power of numpy arrays is the ability to
perform mathematical operations on many values quickly and easily. Unlike built-in Python
lists, ndarrays let you use math operators like +, -, / and * to perform basic math operations
on every element at once:

In [43]: two_d_array + 100 #Add 100 to each element

Out[43]: array([[101, 102, 103, 104, 105, 106],
                [107, 108, 109, 110, 111, 112],
                [113, 114, 115, 116, 117, 118]])

In [44]: two_d_array - 100 #Subtract 100 from each element

Out[44]: array([[-99, -98, -97, -96, -95, -94],
                [-93, -92, -91, -90, -89, -88],
                [-87, -86, -85, -84, -83, -82]])

In [45]: two_d_array * 2 #Multiply each element by 2

Out[45]: array([[ 2, 4, 6, 8, 10, 12],
                [14, 16, 18, 20, 22, 24],
                [26, 28, 30, 32, 34, 36]])

In [46]: two_d_array / 2 #Divide each element by 2

Out[46]: array([[0.5, 1. , 1.5, 2. , 2.5, 3. ],
                [3.5, 4. , 4.5, 5. , 5.5, 6. ],
                [6.5, 7. , 7.5, 8. , 8.5, 9. ]])

In [47]: two_d_array ** 2 #Square each element

Out[47]: array([[ 1, 4, 9, 16, 25, 36],
                [ 49, 64, 81, 100, 121, 144],
                [169, 196, 225, 256, 289, 324]], dtype=int32)

In [48]: two_d_array % 2 # Take modulus of each element

Out[48]: array([[1, 0, 1, 0, 1, 0],
                [1, 0, 1, 0, 1, 0],
                [1, 0, 1, 0, 1, 0]], dtype=int32)
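To see why this differs from plain Python lists, the sketch below applies the same operators to a list and to an ndarray built from it (variable names are illustrative). On a list, * repeats and + joins; on an ndarray, both act on each element.

```python
import numpy as np

numbers = [1, 2, 3]

# With a list, * repeats the list and + joins two lists.
print(numbers * 2)              # [1, 2, 3, 1, 2, 3]
print(numbers + numbers)        # [1, 2, 3, 1, 2, 3]

# With an ndarray, the same operators act on each element.
array = np.array(numbers)
print(array * 2)                # [2 4 6]
print(array + array)            # [2 4 6]
```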

Mathematical Operations using two different arrays
You can also use the basic math operators on two arrays with the same shape. When operating
on two arrays, the basic math operators function in an element-wise fashion, returning an array
with the same shape as the originals:

In [49]: small_array1 = np.array([[1,2],[3,4]])
print(small_array1)
small_array1 + small_array1

[[1 2]
 [3 4]]
Out[49]: array([[2, 4],
                [6, 8]])

In [50]: small_array1 - small_array1

Out[50]: array([[0, 0],
                [0, 0]])

In [51]: small_array1 * small_array1

Out[51]: array([[ 1, 4],
                [ 9, 16]])

In [52]: small_array1 / small_array1

Out[52]: array([[1., 1.],
                [1., 1.]])

In [53]: small_array1 ** small_array1

Out[53]: array([[ 1, 4],
                [ 27, 256]], dtype=int32)
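Because * is element-wise, it is not matrix multiplication. As a brief aside (not covered in the cells above), the sketch below contrasts it with the @ operator, which performs the row-by-column matrix product. The names m, elementwise and matrix_prod are illustrative.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

elementwise = m * m    # multiplies matching positions: 1, 4, 9, 16
matrix_prod = m @ m    # row-by-column matrix multiplication: 7, 10, 15, 22

print(elementwise)
print(matrix_prod)
```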

OTHER MATHEMATICAL OPERATIONS


Numpy also offers a variety of named math functions for ndarrays. There are too many to cover
in detail here, so we'll just look at a selection of the most useful ones for data analysis:

MEAN
In [54]: print(two_d_array)
np.mean(two_d_array) # Get the mean of all the elements in an array with np.mean()

[[ 1 2 3 4 5 6]
 [ 7 8 9 10 11 12]
 [13 14 15 16 17 18]]

Out[54]: 9.5

In [55]: # Provide an axis argument to get means across a dimension
np.mean(two_d_array, axis = 1) # Get means of each row

Out[55]: array([ 3.5, 9.5, 15.5])

In [56]: np.mean(two_d_array, axis = 0) # Get means of each column

Out[56]: array([ 7., 8., 9., 10., 11., 12.])

STANDARD DEVIATION
In [57]: # Get the standard deviation of all the elements in an array with np.std()

In [58]: np.std(two_d_array)

Out[58]: 5.188127472091127

In [59]: # Provide an axis argument to get standard deviations across a dimension
np.std(two_d_array, axis = 0) # Get stdev for each column

Out[59]: array([4.89897949, 4.89897949, 4.89897949, 4.89897949, 4.89897949,
                4.89897949])

In [60]: # Provide an axis argument to get standard deviations across a dimension
np.std(two_d_array, axis = 1) # Get stdev for each row

Out[60]: array([1.70782513, 1.70782513, 1.70782513])
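It may help to see what np.std() computes. By default it returns the population standard deviation, the square root of the mean squared deviation from the mean. The sketch below rebuilds an array with the same values as two_d_array (named data here for illustration), reproduces that calculation by hand, and shows the ddof argument, which switches to the sample version.

```python
import numpy as np

data = np.arange(1, 19).reshape(3, 6)

# Population standard deviation: root of the mean squared deviation.
manual_std = np.sqrt(np.mean((data - data.mean()) ** 2))

print(np.isclose(manual_std, np.std(data)))   # True

# ddof=1 divides by (n - 1) instead of n, giving the sample standard deviation.
sample_std = np.std(data, ddof=1)
print(sample_std > manual_std)                # True
```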

ROW & COLUMN SUMS


In [61]: # Sum the elements of an array across an axis with np.sum()
np.sum(two_d_array, axis=1) # Get the row sums

Out[61]: array([21, 57, 93])

In [62]: np.sum(two_d_array, axis=0) # Get the column sums

Out[62]: array([21, 24, 27, 30, 33, 36])

LOGS & OTHER OPERATIONS


In [63]: # Take the log of each element in an array with np.log()
np.log(two_d_array)

Out[63]: array([[0. , 0.69314718, 1.09861229, 1.38629436, 1.60943791,
                 1.79175947],
                [1.94591015, 2.07944154, 2.19722458, 2.30258509, 2.39789527,
                 2.48490665],
                [2.56494936, 2.63905733, 2.7080502 , 2.77258872, 2.83321334,
                 2.89037176]])

In [64]: # Take the square root of each element with np.sqrt()
np.sqrt(two_d_array)

Out[64]: array([[1. , 1.41421356, 1.73205081, 2. , 2.23606798,
                 2.44948974],
                [2.64575131, 2.82842712, 3. , 3.16227766, 3.31662479,
                 3.46410162],
                [3.60555128, 3.74165739, 3.87298335, 4. , 4.12310563,
                 4.24264069]])

RANDOM IN NUMPY
In [65]: np.random

Out[65]: <module 'numpy.random' from 'C:\\Users\\Shivani\\Anaconda3\\lib\\site-packages\\numpy\\random\\__init__.py'>

In [66]: np.random.rand(1) # Random real number

Out[66]: array([0.87829669])

In [67]: np.random.rand(2,3) #rand(no. of rows, no. of columns)

Out[67]: array([[0.6760826 , 0.82210927, 0.48448645],
                [0.29640984, 0.7259102 , 0.00114724]])

In [68]: np.random.randint(1)

Out[68]: 0

In [69]: np.random.randint(5)

Out[69]: 1

In [70]: np.random.randint(100,500,6) # np.random.randint(low (inclusive), high (exclusive), no. of elements)

Out[70]: array([436, 400, 362, 166, 414, 166])
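The outputs above change on every run. A minimal sketch of two points that are easy to miss, assuming the same legacy np.random functions used in this section: seeding the generator with np.random.seed() makes the numbers repeatable, and np.random.randint() includes its lower bound but excludes its upper bound. The variable names are illustrative.

```python
import numpy as np

# Seeding the legacy global generator makes the sequence repeatable.
np.random.seed(0)
first_run = np.random.randint(100, 500, 6)

np.random.seed(0)
second_run = np.random.randint(100, 500, 6)

print(np.array_equal(first_run, second_run))   # True

# The lower bound is inclusive and the upper bound is exclusive.
draws = np.random.randint(5, size=1000)
print(draws.min() >= 0 and draws.max() <= 4)   # True
```

This also explains why np.random.randint(1) always returns 0: the only integer in the range from 0 up to, but not including, 1 is 0.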

In [71]: lista=[1,2,3,4,5,6,7,8,9,10]
ary1=np.array(lista)
print(ary1.min()) # find out minimum value in an array
print(ary1.max()) # find out maximum value in an array
print(ary1.argmax()) # find out the index of the max value in an array
print(ary1.argmin()) # find out the index of the min value in an array

1
10
9
0

In [72]: arr = np.random.random(10)


print("Random 10 real numbers :\n",arr)
mat=arr.reshape(2,5)
print("Matrix(2,5) :\n",mat)
zer = np.zeros((2,1))
app=np.append(mat,zer,axis=1)
print("Matrix after appending (2,6) :\n",app)
zer1=np.zeros((1,5))
app1=np.append(mat,zer1,axis = 0)
print("Matrix after appending (3,5) :\n",app1)

Random 10 real numbers :
 [0.90348546 0.94223808 0.60636848 0.39800106 0.60532874 0.82422691
 0.95672622 0.93134467 0.00516792 0.21928855]
Matrix(2,5) :
 [[0.90348546 0.94223808 0.60636848 0.39800106 0.60532874]
 [0.82422691 0.95672622 0.93134467 0.00516792 0.21928855]]
Matrix after appending (2,6) :
 [[0.90348546 0.94223808 0.60636848 0.39800106 0.60532874 0. ]
 [0.82422691 0.95672622 0.93134467 0.00516792 0.21928855 0. ]]
Matrix after appending (3,5) :
 [[0.90348546 0.94223808 0.60636848 0.39800106 0.60532874]
 [0.82422691 0.95672622 0.93134467 0.00516792 0.21928855]
 [0. 0. 0. 0. 0. ]]

In [73]: arr = np.arange(10)
print("Mean of array : ",arr.mean())
print("Mean of array : ",np.mean(arr))
print("Sum of array : ",np.sum(arr))
print("Standard Deviation of array : ",arr.std())
print("Square root of array:",np.sqrt(arr))

Mean of array : 4.5
Mean of array : 4.5
Sum of array : 45
Standard Deviation of array : 2.8722813232690143
Square root of array: [0. 1. 1.41421356 1.73205081 2. 2.23606798
 2.44948974 2.64575131 2.82842712 3. ]

In [74]: a1=np.linspace(3,5,3)
print("First array : ",a1)
a2=np.linspace(4,4,3)
print("Second array : ",a2)
print("Using greater_equal function : ",np.greater_equal(a1,a2))
print("Using equal function :",np.equal(a1,a2))

First array : [3. 4. 5.]
Second array : [4. 4. 4.]
Using greater_equal function : [False True True]
Using equal function : [False True False]
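The named comparison functions have operator equivalents. The sketch below rebuilds the two arrays from the cell above and checks that >= and == give the same element-wise results as np.greater_equal() and np.equal():

```python
import numpy as np

a1 = np.linspace(3, 5, 3)   # [3. 4. 5.]
a2 = np.linspace(4, 4, 3)   # [4. 4. 4.]

# The comparison operators are shorthand for the named functions.
print(np.array_equal(a1 >= a2, np.greater_equal(a1, a2)))   # True
print(np.array_equal(a1 == a2, np.equal(a1, a2)))           # True
```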
