You are on page 1of 91

Unit 03

Prepared By
Archana
Implementation of Python libraries (PANDAS
& NUMPY) & data visualization (Matplotlib):
• Introduction of Pandas and NUMPY
• Pandas: Environment set up, PANDAS –series, data frame, read CSV, cleaning
data, correlations, lotting, panel, basic functionality, descriptive statistics, function
application, iteration, and sorting.
• NUMPY: Introduction and environment set up, data types, array, indexing &
slicing, binary operators, string functions, mathematical functions, arithmetic
operations, statistical functions, sort, search & counting functions, matrix library,
linear algebra
• Plotting in Python: Installation of Matplotlib, Pyplot, plotting, markers, line,
labels and title, grids, subplot, scatter, bar, histograms, pie-charts
• Pandas is an open-source Python Library providing high-performance data
manipulation and analysis tool using its powerful data structures. The name
Pandas is derived from the word Panel Data – an Econometrics from
Multidimensional data.
• In 2008, developer Wes McKinney started developing pandas when in need of
high performance, flexible tool for analysis of data.
• Prior to Pandas, Python was majorly used for data munging and preparation. It
had very less contribution towards data analysis. Pandas solved this problem.
• Using Pandas, we can accomplish five typical steps in the processing and
analysis of data, regardless of the origin of data — load, prepare, manipulate,
model, and analyze.
• Python with Pandas is used in a wide range of fields including academic and
commercial domains including finance, economics, Statistics, analytics, etc.
What is Pandas
• Pandas is a Python library used for working with data sets.
• It has functions for analyzing, cleaning, exploring, and manipulating
data.
• The name "Pandas" has a reference to both "Panel Data", and "Python
Data Analysis" and was created by Wes McKinney in 2008.

Why Use Pandas?
• Pandas allows us to analyze big data and make conclusions based on
statistical theories.
• Pandas can clean messy data sets, and make them readable and
relevant.
• Relevant data is very important in data science.
What Can Pandas Do?
• Pandas gives you answers about the data. Like:
• Is there a correlation between two or more columns?
• What is average value?
• Max value?
• Min value?
Key Features of Pandas
Fast and efficient DataFrame object with default and customized
indexing.
Tools for loading data into in-memory data objects from different file
formats.
Data alignment and integrated handling of missing data.
Reshaping and pivoting of date sets.
Label-based slicing, indexing and subsetting of large data sets.
Columns from a data structure can be deleted or inserted.
Group by data for aggregation and transformations.
High performance merging and joining of data.
Time Series functionality.
Pandas – Environment Setup
• Standard Python distribution doesn't come bundled with Pandas module.
• A lightweight alternative is to install NumPy using popular Python package
installer, pip.
<pip install pandas>
• If you install Anaconda Python package, Pandas will be installed by default
with the
• following:
• Windows
• Anaconda (from https://www.continuum.io) is a free Python distribution for
SciPy
• stack. It is also available for Linux and Mac.
Pandas – Introduction to Data Structures
Pandas deals with the following three data structures:
Series
DataFrame
• Panel
• These data structures are built on top of Numpy array, which means they are
fast.
• Dimension & Description
• The best way to think of these data structures is that the higher dimensional
data structure is a container of its lower dimensional data structure.
• For example, DataFrame is a container of Series, Panel is a container of
DataFrame.
• Building and handling two or more dimensional arrays is a tedious task, burden is placed on the
user to consider the orientation of the data set when writing functions.
• But using Pandas data structures, the mental effort of the user is reduced.
• For example, with tabular data (DataFrame) it is more semantically helpful to think of the index
(the rows) and the columns rather than axis 0 and axis 1.
Mutability
• All Pandas data structures are value mutable (can be changed) and except Series all are size
mutable. Series is size immutable.
• DataFrame is widely used and one of the most important data structures. Panel is very less used.
• What is a Series?
• A Pandas Series is like a column in a table.
• It is a one-dimensional array holding data of any type.

<#Create a simple Pandas Series from a list:


a = [1, 7, 2]

myvar = pd.Series(a)

print(myvar)>
Labels
• If nothing else is specified, the values are labeled with their index
number. First value has index 0, second value has index 1 etc.
• This label can be used to access a specified value.
• <print(myvar[0])>
Create Labels
a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)
Key Points
Homogeneous data
Size Immutable
• Values of Data Mutable
• The table represents the data of a sales team of an organization with their overall performance
rating. The data is represented in rows and columns. Each column represents an attribute and each
row represents a person.
• What is a DataFrame?
• A Pandas DataFrame is a 2 dimensional data structure,
like a 2 dimensional array, or a table with rows and
columns.
Create DataFrame
• data = { "Porosity (%)": [0.5, 0.25, 0.10],"Permeability (mD)": [200,
300, 100]}
• #load data into a DataFrame object:
• df = pd.DataFrame(data)
• print(df)
• Locate Row
As you can see from the result above, the DataFrame is
like a table with rows and columns.
Pandas use the loc attribute to return one or more
specified row(s)
-------
#refer to the row index:
print(df.loc[0])
-----
Return row 0 and 1:
#use a list of indexes:
print(df.loc[[0, 1]])
Named Indexes
With the index argument, you can name your own indexes.

data = {"Porosity(%)": [0.35, 0.20, 0.10],"Permeability(mD)": [500,


300, 800]}
df = pd.DataFrame(data, index = ["Sl01", "Sl02", "Sl03"])
print(df)
Locate Named Indexes
Use the named index in the loc attribute to return the
specified row(s).
----
• #refer to the named index:
• print(df.loc["Sl01"])
Load Files Into a DataFrame
If your data sets are stored in a file, Pandas can load them into a
DataFrame.
DATA Frame
import pandas as pd
#pandas dataframe converts multi-dimensional dataset into 2D excel
like format”
movies_df=pd.read_csv('https://raw.githubusercontent.com/ammishra08
/MachineLearning/master/Datasets/movies.csv',sep=',')
movies_df
-----------------------------------------
#if the number of rows is not specified, the head() method will return
the top 5 rows.”
movies_df.head()
Panel
The panel is a three-dimensional data structure with heterogeneous data. It is hard to
represent the panel in graphical representation. But a panel can be illustrated as a
container of DataFrame.
Key Points
Heterogeneous data
Size Mutable
• Data Mutable
Pandas — Series
• Series is a one-dimensional labeled array capable of holding data of any type (integer,
string, float, python objects, etc.). The axis labels are collectively called index.
pandas.Series
A pandas Series can be created using the following constructor:
<pandas.DataFrame( data, index, dtype, copy)>
Create a DataFrame from Dict of Series
----------
• import pandas as pd
• d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
• 'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
• df = pd.DataFrame(d)
• print (df)
• --------

• #Column Selection
• #We will understand this by selecting a column from the DataFrame.
• ------
• import pandas as pd
• d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
• 'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
• df = pd.DataFrame(d)
• print (df ['one’])
• ---
• ## Adding a new column to an existing DataFrame object with column
label by passing new series
• print ("Adding a new column by passing as Series:")
• df['three']=pd.Series([10,20,30],index=['a','b','c'])
• print (df)
• print ("Adding a new column using the existing columns in
DataFrame:")
• df['four']=df['one']+df['three']
• print (df)
• Column Deletion
• Columns can be deleted or popped
• ---------------

• #Using the previous DataFrame, we will delete a column


• # using del function
• d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
• 'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
• 'three' : pd.Series([10,20,30], index=['a','b','c'])}
• df = pd.DataFrame(d)
• print ("Our dataframe is:")
• print (df)
• # using del function
• print ("Deleting the first column using DEL function:")
• del df['one']
• print (df)
• # using pop function
• print ("Deleting another column using POP function:")
• df.pop('two')
• print (df)
• --------------------
• Row Selection, Addition, and Deletion

• #Selection by Label Rows can be selected by passing row label to a loc


function.
• d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
• 'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
• df = pd.DataFrame(d)
• print (df.loc['b'])
• #Selection by integer location Rows can be selected by passing integer
location to an iloc function
• d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
• 'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
• df = pd.DataFrame(d)
• print (df.iloc[2])
• -----
• Slice Rows
• Multiple rows can be selected using ‘ : ’ operator.
• ---
• d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
• 'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
• df = pd.DataFrame(d)
• print df[2:4]
• -----
#Addition of Rows
#Add new rows to a DataFrame using the append function. This function will append the
#rows at the end.
---
• df = pd.DataFrame([[1, 2], [3, 4]], columns=['a','b'])
• df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a','b'])
• df = df.append(df2)
• print (df)
• -----
• Deletion of Rows
• Use index label to delete or drop rows from a DataFrame. If label is duplicated, then
• multiple rows will be dropped.
• If you observe, in the above example, the labels are duplicate. Let us drop a label and
will see how many rows will get dropped.

• df = pd.DataFrame([[1, 2], [3, 4]], columns=['a','b'])


• df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a','b'])
• df = df.append(df2)
• # Drop rows with label 0
• df = df.drop(0)
• print (df)
NumPy- Python Package
• NumPy is a Python package. It stands for 'Numerical Python.
• It is a library consisting of multidimensional array objects and a collection of routines for
processing of array.
• Numeric, the ancestor of NumPy, was developed by Jim Hugunin. Another package Numarray
was also developed, having some additional functionalities.
• In 2005, Travis Oliphant created the NumPy package by incorporating the features of Numarray
into the Numeric package. There are many contributors to this open source project.
• NumPy is a Python library used for working with arrays.
• It also has functions for working in the domain of linear algebra, Fourier transform, and matrices.
• NumPy was created in 2005 by Travis Oliphant. It is an open-source project and you can use it
freely.
• NumPy stands for Numerical Python.
NumPy
• NumPy is a Python library.
• NumPy is used for working with arrays.
• NumPy is short for "Numerical Python".
Why Use NumPy
• In Python we have lists that serve the purpose of arrays, but they are
slow to process.
• NumPy aims to provide an array object that is up to 50x faster than
traditional Python lists.
• The array object in NumPy is called ndarray, it provides a lot of
supporting functions that make working with ndarray very easy.
• Arrays are very frequently used in data science, where speed and
resources are very important.
Operations using NumPy
Using NumPy, a developer can perform the following operations:
Mathematical and logical operations on arrays.
Fourier transforms and routines for shape manipulation.
Operations related to linear algebra. NumPy has in-built
functions for linear algebra and random number generation.
NumPy − Environment
• Standard Python distribution doesn't come bundled with NumPy module.
• A lightweight alternative is to install NumPy using popular Python package installer, pip.
<pip install numpy>
To test whether NumPy module is properly installed, try to import it from Python prompt.
<import numpy>

If it is not installed, the error message will be displayed.

Alternatively, NumPy package is imported using the following syntax:


<import numpy as np>
NumPy − ndarray Object
• The most important object defined in NumPy is an N-dimensional array type
called ndarray.
• It describes the collection of items of the same type. Items in the collection can be
accessed using a zero-based index.
• Every item in a ndarray takes the same size of the block in the memory. Each
element in ndarray is an object of the data-type object (called dtype).
• Any item extracted from the ndarray object (by slicing) is represented by a Python
object of one of the array scalar types.
• The following diagram shows a relationship between ndarray, data type object
(dtype) and array scalar type:
• The basic ndarray is created using an array function in NumPy as follows:
<numpy.array>
• It creates an ndarray from any object exposing array interface, or from any method that returns an
array.
<numpy.array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0)>
<Type Code>
• import numpy as np
• a=np.array([1,2,3])
• print (a)
<Type Code>
• a = np.array([[1, 2], [3, 4]])
• print (a)
• <Type Code>
• # minimum dimensions
• a=np.array([1, 2, 3,4,5], ndmin=2)
• print (a)
• <Type Code>
• # dtype parameter
• a = np.array([1, 2, 3], dtype=complex)
• print (a)

• ---
• The ndarray object consists of contiguous one-dimensional segment of
computer memory, combined with an indexing scheme that maps each item to
a location in the memory block.
• Pandas – Basic Functionality
<Type Code>
• #Create a series with 100 random numbers
• s = pd.Series(np.random.randn(100))
• print (s)
• <Type Code>
• #axes Returns the list of the labels of the series
• #Create a series with 100 random numbers
• s = pd.Series(np.random.randn(4))
• print(s)
• print ("The axes are:")
• print (s.axes)
<Type Code>
• #ndim Returns the number of dimensions of the object. By definition, a Series
is a 1D data structure, so it returns 1.
• #Create a series with 4 random numbers
• s = pd.Series(np.random.randn(4))
• print (s)
• print ("The dimensions of the object:")
• print (s.ndim)
<Type Code>
• #size Returns the size(length) of the series.
• #Create a series with 4 random numbers
• s = pd.Series(np.random.randn(2))
• print (s)
• print ("The size of the object:")
• print (s.size)
<Type Code>
• #values Returns the actual data in the series as an array.
• #Create a series with 4 random numbers
• s = pd.Series(np.random.randn(4))
• print (s)
• print ("The actual data series is:")
• print (s.values)
• #Head & Tail
• #To view a small sample of a Series or the DataFrame object, use the
head() and the tail() methods.
• #head() returns the first n rows(observe the index values). The default
number of elements to display is five, but you may pass a custom
number.
• <Type Code>
• #Create a series with 4 random numbers
• s = pd.Series(np.random.randn(4))
• print ("The original series is:")
• print (s)
• print ("The first two rows of the data series:")
• print (s.head(2))
• #tail() returns the last n rows(observe the index values). The default number of
#elements to display is five, but you may pass a custom number.
• <Type Code>
• #Create a series with 4 random numbers
• s = pd.Series(np.random.randn(4))
• print ("The original series is:")
• print (s)
• print ("The last two rows of the data series:")
• print (s.tail(2))
• #Create a Dictionary of series
• d={'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
• 'Age':pd.Series([25,26,25,23,30,29,23]),
• 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
• #Create a DataFrame
• df = pd.DataFrame(d)
• print ("Our data series is:")
• print (df)
• #T (Transpose) Returns the transpose of the DataFrame. The rows and
columns will interchange.
• # Create a Dictionary of series
• d={'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
• 'Age':pd.Series([25,26,25,23,30,29,23]),
• 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
• # Create a DataFrame
• df = pd.DataFrame(d)
• print ("The transpose of the data series is:")
• print (df.T)
• #axes Returns the list of row axis labels and column axis labels.
• #Create a Dictionary of series
• d={'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
• 'Age':pd.Series([25,26,25,23,30,29,23]),
• 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
• #Create a DataFrame
• df = pd.DataFrame(d)
• print ("Row axis labels and column axis labels are:")
• print (df.axes)
• #dtypes Returns the data type of each column
• #Create a Dictionary of series
• d={'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
• 'Age':pd.Series([25,26,25,23,30,29,23]),
• 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
• #Create a DataFrame
• df = pd.DataFrame(d)
• print ("The data types of each column are:")
• print (df.dtypes)
• #empty Returns the Boolean value saying whether the Object is
empty or not; True indicates that the object is empty.
• #Create a Dictionary of series
• d={'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
• 'Age':pd.Series([25,26,25,23,30,29,23]),
• 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
• #Create a DataFrame
• df = pd.DataFrame(d)
• print ("Is the object empty?")
• print (df.empty)
• #ndim Returns the number of dimensions of the object. By definition,
DataFrame is a 2D object
• #Create a Dictionary of series
• d={'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
• 'Age':pd.Series([25,26,25,23,30,29,23]),
• 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
• #Create a DataFrame
• df = pd.DataFrame(d)
• print ("Our object is:")
• print (df)
• print ("The dimension of the object is:")
• print (df.ndim)
• #values Returns the actual data in the DataFrame as an NDarray.
• #Create a Dictionary of series
• d={'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
• 'Age':pd.Series([25,26,25,23,30,29,23]),
• 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
• #Create a DataFrame
• df = pd.DataFrame(d)
• print ("Our object is:")
• print (df)
• print ("The actual data in our data frame is:")
• print (df.values)
• #Head & Tail To view a small sample of a DataFrame object, use the head() and
tail() methods.

• #Head & Tail To view a small sample of a DataFrame object, use the head() and
tail() methods.
• #Create a Dictionary of series
• d={'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
• 'Age':pd.Series([25,26,25,23,30,29,23]),
• 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
• #Create a DataFrame
• df = pd.DataFrame(d)
• print ("Our data frame is:")
• print (df)
• print ("The first two rows of the data frame is:")
• print (df.head(2))
Pandas – Descriptive Statistics
• A large number of methods collectively compute descriptive statistics and other related operations on DataFrame.
• Most of these are aggregations like sum(), mean(), but some of them, like sumsum(), produce an object of the same size.
• Generally speaking, these methods take an axis argument, just like ndarray.{sum, std, ...}, but the axis can be specified by
name or integer:
DataFrame: “index” (axis=0, default), “columns” (axis=1)
<Type Code>
import pandas as pd
import numpy as np
#Create a Dictionary of series
d={'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
'Lee','David','Gasper','Betina','Andres']),
'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
#Create a DataFrame
df = pd.DataFrame(d)
print (df)
sum(): Returns the sum of the values for the requested axis. By default, axis is index
(axis=0).
<Type Code>
print (df.sum())
axis=1: This syntax will give the output as shown below.
<Type Code>
print (df.sum(1))
mean(): Returns the average value.
• <Type Code>
print (df.mean())
std(): Returns the Bressel standard deviation of the numerical columns
<Type Code>
print (df.std())
Functions & Description
• Let us now understand the functions under Descriptive Statistics in Python Pandas. The following
table list down the important functions:
Note: Since DataFrame is a Heterogeneous data structure. Generic operations don’t work
with all functions.
Functions like sum(), cumsum() work with both numeric and character (or) string data
elements without any error. Though n practice, character aggregations are never used
generally, these functions do not throw any exception.
Functions like abs(), cumprod() throw exception when the DataFrame contains character
or string data because such operations cannot be performed.
NumPy − Data Types
• NumPy supports a much greater variety of numerical types than Python does. The
following table shows different scalar data types defined in NumPy.
The parameters are:
• Object: To be converted to data type object
• Align: If true, adds padding to the field to make it similar to C-struct
• Copy: Makes a new copy of dtype object. If false, the result is reference to builtin
data type object

dt=np.dtype(np.int32)
print (dt)

#int8, int16, int32, int64 can be replaced by equivalent string 'i1',


'i2','i4', etc.
dt = np.dtype('i1')
print(dt)
• Each built-in data type has a character code that uniquely identifies it.
• NumPy − Array Attributes
ndarray.shape
• This array attribute returns a tuple consisting of array dimensions. It can also be used
to resize the array.

• a=np.array([[1,2,3],[4,5,6]])
• print (a.shape)

• a=np.array([[1,2,3],[4,5,6]])
• a.shape=(3,2)
• print (a)
• NumPy also provides a reshape function to resize an array.
• a = np.array([[1,2,3],[4,5,6]])
• b = a.reshape(3,2)
• print (b)

• ndarray.ndim
• This array attribute returns the number of array dimensions.

• a = np.arange(24)
• print (a)

• numpy.itemsize
• This array attribute returns the length of each element of array in bytes.

• x = np.array([1,2,3,4,5], dtype=np.int8)
• print (x.itemsize)
x = np.array([1,2,3,4,5])
print (x.flags)
• NUMPY: Introduction and environment set up, data types, array,
indexing & slicing, binary operators, string functions, mathematical
functions, arithmetic operations, statistical functions, sort, search &
counting functions, matrix library, linear algebra
NumPy Array Indexing
• Access Array Elements
• Array indexing is the same as accessing an array element.
• You can access an array element by referring to its index number.
• The indexes in NumPy arrays start with 0, meaning that the first
element has index 0, and the second has index 1 etc.

<import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[0])>
NumPy Array Slicing
• Slicing in python means taking elements from one given index to
another given index.
• We pass slice instead of index like this: [start:end].
• We can also define the step, like this: [start:end:step].
• If we don't pass start its considered 0
• If we don't pass end its considered length of array in that
dimension
• If we don't pass step its considered 1

<print(arr[1:5])>
NumPy Array Copy vs View
• The Difference Between Copy and View
• The main difference between a copy and a view of an array is that the copy
is a new array, and the view is just a view of the original array.
• The copy owns the data and any changes made to the copy will not affect
original array, and any changes made to the original array will not affect the
copy.
• The view does not own the data and any changes made to the view will
affect the original array, and any changes made to the original array will
affect the view.

<x = arr.copy()
arr[0] = 42
print(arr)
print(x)>
<x = arr.view()
arr[0] = 42
print(arr)
print(x)>
• NumPy Array Iterating
Iterating means going through elements one by one.
As we deal with multi-dimensional arrays in numpy, we can do this
using the basic for loop of python.
If we iterate on a 1-D array it will go through each element one by
one.

<for x in arr:
print(x)>
• Iterating 2-D Arrays
• In a 2-D array it will go through all the rows.
<arr = np.array([[1, 2, 3], [4, 5, 6]])
for x in arr:
print(x)>

<for x in arr:
for y in x:
print(y)>
NumPy Joining Array
Joining means putting the contents of two or more arrays in a
single array.
We pass a sequence of arrays that we want to join to
the concatenate() function, along with the axis. If axis is not
explicitly passed, it is taken as 0.

<arr1 = np.array([1, 2, 3])


arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)>
Join two 2-D arrays along rows (axis=1):

<arr1 = np.array([[1, 2], [3, 4]])


arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2), axis=1)
print(arr)>
NumPy − String Functions
numpy.char.add(): This function performs elementwise string concatenation
<print ('Concatenate two strings:')
print (np.char.add(['hello'],[' xyz']))
print ('\n')
print ('Concatenation example:')
print (np.char.add(['hello', 'hi'],[' abc', ' xyz’]))>

numpy.char.multiply(): This function performs multiple concatenation.


<print (np.char.multiply('Hello ',3))>
numpy.char.center()
• This function returns an array of the required width so that the input string is centered
and padded on the left and right with fillchar.
<# np.char.center(arr, width,fillchar)
print (np.char.center('hello', 20,fillchar='*’))>
numpy.char.capitalize()
• This function returns the copy of the string with the first letter capitalized
<print (np.char.capitalize('hello world’))>
numpy.char.title():
• This function returns a title cased version of the input string with the first letter of each
word capitalized.
• < print (np.char.title('hello how are you?')) >
numpy.char.lower()
• This function returns an array with elements converted to lowercase. It calls str.lower for
each element.
• < print (np.char.lower(['HELLO','WORLD']))
• print (np.char.lower('HELLO’))>
numpy.char.upper()
• This function calls str.upper function on each element in an array to return the uppercase
array elements.
• < print (np.char.upper('hello'))
• print (np.char.upper(['hello','world’]))>

• numpy.char.split()
• This function returns a list of words in the input string. By default, a whitespace is
used as a separator. Otherwise the specified separator character is used to spilt the
string.
< print (np.char.split ('hello how are you?'))
print (np.char.split ('TutorialsPoint,Hyderabad,Telangana', sep=',’))>

• numpy.char.splitlines()
• This function returns a list of elements in the array, breaking at line boundaries
< print (np.char.splitlines('hello\nhow are you?'))
print (np.char.splitlines('hello\rhow are you?’))>
numpy.char.replace()
• This function returns a new copy of the input string in which all occurrences of the
sequence of characters is replaced by another given sequence.
<print (np.char.replace ('He is a good boy', 'is', 'was’))>
NumPy − Mathematical Functions
• Trigonometric Functions
• NumPy has standard trigonometric functions which return trigonometric ratios for a given
angle in radians.
<a = np.array([0,30,45,60,90])
print ('Sine of different angles:')
# Convert to radians by multiplying with pi/180
print (np.sin(a*np.pi/180))
print ('\n')
print ('Cosine values for angles in array:')
print (np.cos(a*np.pi/180))
print ('\n')
print ('Tangent values for given angles:')
print (np.tan(a*np.pi/180))>
• NumPy − Arithmetic Operations
• Input arrays for performing arithmetic operations such as add(), subtract(), multiply(), and divide() must be either of
the same shape or should conform to array broadcasting rules.
<a = np.arange(9, dtype=np.float_).reshape(3,3)
print ('First array:')
print (a)
print ('\n')
print ('Second array:')
b = np.array([10,10,10])
print (b)
print ('\n')
print ('Add the two arrays:')
print (np.add(a,b))
print ('\n')
print ('Subtract the two arrays:')
print (np.subtract(a,b))
print ('\n')
print ('Multiply the two arrays:')
print (np.multiply(a,b))
print ('\n')
print ('Divide the two arrays:')
print (np.divide(a,b))>
• numpy.reciprocal()
• This function returns the reciprocal of argument, element-wise. For elements with
absolute values larger than 1, the result is always 0 because of the way in which
Python handles integer division. For integer 0, an overflow warning is issued.

• numpy.power()
• This function treats elements in the first input array as base and returns it raised to
the power of the corresponding element in the second input array.

• numpy.mod()
• This function returns the remainder of division of the corresponding elements in the
input array. The function numpy.remainder() also produces the same result.
The following functions are used to perform operations on array with complex numbers.
• numpy.real() returns the real part of the complex data type argument.
• numpy.imag() returns the imaginary part of the complex data type argument.
• numpy.conj() returns the complex conjugate, which is obtained by changing the
sign of the imaginary part.
• numpy.angle() returns the angle of the complex argument. The function has
degree parameter. If true, the angle in the degree is returned, otherwise the angle
is in radians.
• NumPy − Statistical Functions

• numpy.amin() and numpy.amax()


• These functions return the minimum and the maximum from the elements in the given
• array along the specified axis.

• numpy.ptp()
• The numpy.ptp() function returns the range (maximum-minimum) of values along an
• axis.

• numpy.percentile()
• Percentile (or a centile) is a measure used in statistics indicating the value below which
a given percentage of observations in a group of observations fall. The function
• numpy.percentile() takes the following arguments.
• numpy.median()
• Median is defined as the value separating the higher half of a data sample from the
lower half.

• numpy.mean()
• Arithmetic mean is the sum of elements along an axis divided by the number of
elements.
• The numpy.mean() function returns the arithmetic mean of elements in the array. If
the axis is mentioned, it is calculated along it.

• numpy.average()
• Weighted average is an average resulting from the multiplication of each component by
a factor reflecting its importance. The numpy.average() function computes the
weighted average of elements in an array according to their respective weight given in
another array.
• The function can have an axis parameter. If the axis is not specified, the array is
• flattened.
THANK YOU

You might also like