Demo Class 15 and 16102022 (Pandas in Python)

AI
Demo Class
Pandas Library in Python
CONTENT
1. Python and Its Features
2. Python Libraries
3. Pandas in Data Analysis
4. VTCA Short Course : Python Developer for AI
5. VTCA Short Course : AI Specialist for AI Engineer & Data Scientist
6. VTCA Short Course : Data Analytics for BI/BA
Summary
2. Python Libraries
Step-by-Step Guide to Creating Python Libraries
https://towardsdatascience.com/step-by-step-guide-to-creating-r-and-python-libraries-e81bbea87911
https://dataindependent.com/pandas/
Rows can be accessed using df.loc[('a','one')], df.loc[('a','one'),:] or df.loc[(slice(None),'one'), :]
Columns can be accessed using df['A'], df.loc[:,'A'] or df.loc[:, (slice(None), 'x')]
Multi-indexing across columns: df['A']['x'], df['A','x'] or df.loc[:,('A','x')]
To get specific items, use df.loc[('a','one'),('B','x')] or df.loc[('a','one'),'B']
Partial slicing is done using df.loc[('a', 'one'):('b', 'two')]
Reindexing is done with df.loc[[('a', 'one'), ('b', 'two')]] (result has two rows)
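The selection forms above can be tried on a small frame. This is a minimal sketch: the frame, its level names and its values are made up for illustration and are not part of the original slides.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row levels ('a'/'b' x 'one'/'two'), column levels ('A'/'B' x 'x'/'y').
rows = pd.MultiIndex.from_product([['a', 'b'], ['one', 'two']], names=['first', 'second'])
cols = pd.MultiIndex.from_product([['A', 'B'], ['x', 'y']])
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

row = df.loc[('a', 'one'), :]                   # one row, returned as a Series
ones = df.loc[(slice(None), 'one'), :]          # every row whose second level is 'one'
col_ax = df.loc[:, ('A', 'x')]                  # one column, returned as a Series
x_cols = df.loc[:, (slice(None), 'x')]          # every column whose second level is 'x'
item = df.loc[('a', 'one'), ('B', 'x')]         # a single value
span = df.loc[('a', 'two'):('b', 'one')]        # slice between two row keys, both ends included
picked = df.loc[[('a', 'one'), ('b', 'two')]]   # list of keys, result has two rows
```

Slicing with tuples relies on the index being sorted, which from_product guarantees for the sorted inputs used here.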
Boolean arrays of the same size as the index can be used for filtering, that is, to select rows indexed with True:
- df.loc[~df.index.isin([11])] ignores a specific index;
- df[df['A']>0] selects only rows with positive values in column A;
- df[(df['A'] > 2) & (df['B'] < 3)] shows a complex expression where the use of parentheses is important;
- df.query('a not in b') instead of df[~df['a'].isin(df['b'])];
- df.query('a in b and c < d') instead of df[df['a'].isin(df['b']) & (df['c'] < df['d'])].
df.where(df.A>0) returns a result of the same shape as df, with NaN in the rows that fail the condition, whereas df[df.A>0] returns only the rows that match the condition.
To select only rows with null values, use df[df.isnull().any(axis=1)].
Use df[~df.isnull().any(axis=1)] to do the reverse.
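The filtering expressions above can be compared side by side. The sketch below uses invented data; the column names a, b, c, d and the index labels are assumptions chosen to mirror the expressions in the text.

```python
import numpy as np
import pandas as pd

# Hypothetical data, indexed 10..13 so that label 11 can be excluded.
df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [2, 4, 6, 8],
    'c': [1.0, 2.0, np.nan, 0.0],
    'd': [3.0, 3.0, 3.0, 3.0],
}, index=[10, 11, 12, 13])

without_11 = df.loc[~df.index.isin([11])]        # drop the row labelled 11
both = df[(df['a'] > 1) & (df['b'] < 8)]         # each comparison needs its own parentheses
in_b = df.query('a in b and c < d')              # query form
in_b_mask = df[df['a'].isin(df['b']) & (df['c'] < df['d'])]  # equivalent boolean-mask form
kept_shape = df.where(df['a'] > 2)               # same shape as df, NaN where the condition fails
only_match = df[df['a'] > 2]                     # only the matching rows
with_null = df[df.isnull().any(axis=1)]          # rows containing at least one null
no_null = df[~df.isnull().any(axis=1)]           # rows with no nulls
```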
For MultiIndex DataFrame, isin() method can be used.
For example, df.loc[df.index.isin([('a','one'), ('b','two')])] or df.loc[df.index.isin(['a','c'], level=0)].
df.groupby('A').agg(['sum', 'mean'])
df.groupby('A').agg({'C': 'sum', 'D': lambda x: np.std(x)})
GroupBy uses a split-apply-combine workflow. Data is split using index or column values. A function is applied to each group independently. The results are combined into a single data structure. In the apply step, data can be aggregated (sum, mean, min, etc.), transformed (normalize data, fill missing values, etc.) or filtered (discard small groups, filter data based on group mean, etc.).
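The three kinds of apply step can be sketched as follows. The data is invented for illustration; only the column names A, C and D follow the agg() calls shown earlier.

```python
import pandas as pd

# Hypothetical data: 'A' is the grouping key, 'C' and 'D' are numeric columns.
df = pd.DataFrame({
    'A': ['x', 'x', 'y', 'y', 'y', 'z'],
    'C': [1, 2, 3, 4, 5, 6],
    'D': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

grouped = df.groupby('A')                          # split

# Aggregate: one row per group.
totals = grouped.agg({'C': 'sum', 'D': 'mean'})

# Transform: same length as the input; each D is centred on its group mean.
centred = df['D'] - grouped['D'].transform('mean')

# Filter: keep only the rows of groups with at least two members.
big_groups = grouped.filter(lambda g: len(g) >= 2)
```

In each case pandas combines the per-group results back into a single Series or DataFrame.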
When data is in "record" form, pivot() spreads rows into columns. Method melt() does the opposite.
Methods stack() and unstack() are designed for MultiIndex objects and are closely related to pivot.
unstack() can be applied to a Series or a DataFrame; stack() applies to a DataFrame. Here are a few examples:
• df.pivot(index='foo', columns='bar', values='baz'): Column 'foo' becomes the index, 'bar' values become new columns and values of 'baz' become the values of the new DataFrame. A more generalized API is df.pivot_table(), which allows for duplicate values of an index/column pair.
• df.melt(id_vars=['A','B']): Two columns are retained and the other columns are spread into rows. Two new columns named 'variable' (with old column names as values) and 'value' are introduced. The original index can be retained, but its values will be duplicated.
• df.stack(): Columns become part of a new inner-most index level. If the DataFrame has hierarchical column labels, the level can be specified as an argument.
• df.unstack('second') or df.unstack(1): Values of index level 'second' are spread into new columns. If no argument is supplied, the inner-most index level is spread.
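A round trip through these reshaping methods might look like the sketch below. The records are invented; the column names foo, bar and baz are taken from the pivot example above, and var_name/value_name are passed explicitly so the melted column names are predictable.

```python
import pandas as pd

# Hypothetical data in "record" form.
records = pd.DataFrame({
    'foo': ['one', 'one', 'two', 'two'],
    'bar': ['A', 'B', 'A', 'B'],
    'baz': [1, 2, 3, 4],
})

# pivot: 'foo' becomes the index, the values of 'bar' become columns.
wide = records.pivot(index='foo', columns='bar', values='baz')

# melt: back to long form, keeping 'foo' as an identifier column.
long_form = wide.reset_index().melt(id_vars=['foo'], var_name='bar', value_name='baz')

# stack: the columns move into a new inner-most index level, giving a Series.
stacked = wide.stack()

# unstack: the inner-most index level is spread back into columns.
unstacked = stacked.unstack()
```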
result = pd.merge(df1, df2, how="left",
on=["key1", "key2"])
or
result = pd.merge(df1, df2, left_on='county_ID',
right_on='countyid')
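To see how the how argument changes a merge, here is a small sketch. The two tables and their non-key columns are hypothetical; only the key names county_ID and countyid come from the call above.

```python
import pandas as pd

# Hypothetical tables whose key columns have different names.
df1 = pd.DataFrame({'county_ID': [1, 2, 3], 'population': [100, 200, 300]})
df2 = pd.DataFrame({'countyid': [2, 3, 4], 'area': [20.0, 30.0, 40.0]})

# Default how='inner': only keys present in both tables survive.
inner = pd.merge(df1, df2, left_on='county_ID', right_on='countyid')

# how='left': every row of df1 is kept, unmatched rows get NaN for df2's columns.
left = pd.merge(df1, df2, how='left', left_on='county_ID', right_on='countyid')
```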
Rows of one are concatenated to the other: pd.concat([df1, df2]).

To concatenate column-wise, use pd.concat([df1, df2], axis=1)
Database-style joins are possible with df.join(df1), in which the argument how accepts values such as left, right, outer and inner. By default, joins are based on the index. With the on argument, we can choose to join a column or index level of df with the index of df1.
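The difference between concatenating and joining can be sketched with three small frames. All names and values here are invented for illustration.

```python
import pandas as pd

# Hypothetical frames with labelled indexes.
df1 = pd.DataFrame({'x': [1, 2]}, index=['r1', 'r2'])
df2 = pd.DataFrame({'x': [3, 4]}, index=['r3', 'r4'])
other = pd.DataFrame({'y': [10, 30]}, index=['r1', 'r3'])

rows = pd.concat([df1, df2])               # row-wise: four rows, one column
side = pd.concat([df1, other], axis=1)     # column-wise: aligned on the union of index labels
joined = rows.join(other)                  # index-based join, how='left' by default
inner = rows.join(other, how='inner')      # only index labels present in both frames
```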
Statistical analysis in Pandas
Iterating Data Frames in Pandas
• items() (named iteritems() before pandas 2.0) − Iterates over the columns and yields (column label, Series) pairs
• iterrows() − Iterates over the rows and yields (index, Series) pairs
• itertuples() − Iterates over the rows and yields namedtuples
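A short sketch of the three iterators, using an invented two-row frame. It assumes a recent pandas version, where the column iterator is spelled items().

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({'student': ['ann', 'bob'], 'score': [90, 85]})

# items(): one (column label, Series) pair per column.
columns = [label for label, series in df.items()]

# iterrows(): one (index, Series) pair per row.
first_index, first_row = next(df.iterrows())

# itertuples(): one namedtuple per row, fields accessed by attribute.
students = [t.student for t in df.itertuples(index=False)]
```

Of the two row iterators, itertuples() is usually the faster one because it does not build a Series for every row.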
Sorting in Pandas
Operations on Text data in Pandas
Data Visualization in Pandas
4. VTCA Python Developer for AI
Python Developer
(60 hours - Online Course)
Python Variables and Operators
Python DataTypes
Python Conditions and Loops
Python Functions
Python Input and Output
Python Error Handling
Python Class/Object and Modules/Packages
Data Visualization
Data Science Problems
AI Maths
Algebra
Calculus
Statistics & Probabilities
Optimization
5. VTCA AI SPECIALIST- AI ENGINEER
AI Specialist - AI Engineer
Data Mining
Pipeline and Preprocessing
Exploration and Visualization
Feature Scaling & Engineering
Machine Learning Recommendation System
Regression and Classification
Unsupervised Learning
Ensemble Learning
Association Learning
RNN & LSTM & Deep Neural Network
Deep Learning
Computer Vision Segmentation
Transfer Learning
One and Two-Stage Object Detection
Natural Language Processing
Acoustic Modelling and Processing
Time Series Analysis
Recommendation System
Reinforcement Learning
5. VTCA AI SPECIALIST- DATA SCIENTIST
AI Specialist - Data Scientist
Data Mining
Pipeline and Preprocessing
Exploration and Visualization
Feature Scaling & Engineering
Machine Learning Recommendation System
Regression and Classification
Unsupervised Learning
Association & Ensemble Learning
RNN & LSTM & Deep Neural Network
SQL and NoSQL
Data Engineer Database Management
SQL, Execution Plan and Optimization
Database System Management
Big Data Analysis
Web Mining and Security
Data Scientist
Natural Language Processing
Time Series Analysis
Recommendation System
Reinforcement Learning
6. Analytics for BI/BA
Business Intelligence Analyst
BI/BA Life Cycle & Strategy
Data Warehouse Operations
Data Mining in BI/BA
Data Collection & Transformation
Exploratory Data Analysis
Data Analysis Expressions
Reporting and Dashboard
Integrating with Azure ML
Data Analytic Pipeline
Descriptive Analytics
Predictive Analytics
Diagnostic Analytics
Prescriptive Analytics
BI&BA Solution
Banking and Finance
HealthCare
Social Media
Transport & Logistics
THANK YOU
