
AI

Demo Class
Pandas Library in Python

CONTENT
1. Python and Its Features

2. Python Libraries

3. Pandas in Data Analysis

4. VTCA Short Course : Python Developer for AI

5. VTCA Short Course : AI Specialist for AI Engineer & Data Scientist

6. VTCA Short Course : Data Analytics for BI/BA

Summary

1. Python and Its Features
2. Python Libraries
Step-by-Step Guide to Creating Python Libraries

https://towardsdatascience.com/step-by-step-guide-to-creating-r-and-python-libraries-e81bbea87911
3. Pandas in Data Analysis
https://dataindependent.com/pandas/

Rows can be accessed using: df.loc[('a','one')], df.loc[('a','one'),:] or df.loc[(slice(None),'one'), :]
Columns can be accessed using: df['A'], df.loc[:,'A'] or df.loc[:, (slice(None), 'x')]
Multi-indexing across column levels: df['A']['x'], df['A','x'] or df.loc[:,('A','x')]
To get specific items, use df.loc[('a','one'),('B','x')] or df.loc[('a','one'),'B']
Partial slicing is done using df.loc[('a', 'one'):('b', 'two')]
Reindexing is done with df.loc[[('a', 'one'), ('b', 'two')]] (result has two rows)
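
The access patterns above can be tried on a small frame. This is a minimal sketch: the labels 'a'/'b', 'one'/'two', 'A'/'B' and 'x'/'y' follow the slide, while the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two-level row index and two-level column index.
rows = pd.MultiIndex.from_product([["a", "b"], ["one", "two"]])
cols = pd.MultiIndex.from_product([["A", "B"], ["x", "y"]])
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

row = df.loc[("a", "one"), :]                  # one row, returned as a Series
ones = df.loc[(slice(None), "one"), :]         # every row whose second level is "one"
col_a = df["A"]                                # sub-frame with columns x and y
x_cols = df.loc[:, (slice(None), "x")]         # columns ("A", "x") and ("B", "x")
item = df.loc[("a", "one"), ("B", "x")]        # a single scalar
part = df.loc[("a", "one"):("b", "one")]       # label slice, both ends included
picked = df.loc[[("a", "one"), ("b", "two")]]  # exactly the two listed rows
```

Slicing with tuples relies on the index being sorted, which from_product gives here because the inputs are already in order.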

Boolean arrays of the same size as the index can be used for filtering, that is, to select the rows indexed with True
- df.loc[~df.index.isin([11])] ignores a specific index;
- df[df['A']>0] selects only rows with positive values in column A
- df[(df['A'] > 2) & (df['B'] < 3)] shows a complex expression where the use of parentheses is important.
- df.query('a not in b') instead of df[~df['a'].isin(df['b'])]
- df.query('a in b and c < d') instead of df[df['a'].isin(df['b']) & (df['c'] < df['d'])]
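
The filters above can be checked on a small frame. This is a sketch with made-up values; the column names A and B and the index label 11 follow the slide.

```python
import pandas as pd

# Hypothetical frame; the index labels and values are illustrative only.
df = pd.DataFrame(
    {"A": [-1, 3, 4, 0], "B": [2, 1, 4, 2]},
    index=[10, 11, 12, 13],
)

without_11 = df.loc[~df.index.isin([11])]   # drop the row labelled 11
positive_a = df[df["A"] > 0]                # rows where A is positive
both = df[(df["A"] > 2) & (df["B"] < 3)]    # each comparison needs its own parentheses
same_query = df.query("A > 2 and B < 3")    # the same filter in query syntax
a_not_in_b = df.query("A not in B")         # same rows as df[~df["A"].isin(df["B"])]
```

Without the parentheses, & binds more tightly than the comparisons, so the expression would not mean what it appears to.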

df.where(df.A>0) keeps the original shape and replaces rows that do not match with NaN, whereas df[df.A>0] returns only the rows that match the condition.
To select only rows with null values, use df[df.isnull().any(axis=1)].
Use df[~df.isnull().any(axis=1)] to do the reverse.
For a MultiIndex DataFrame, the isin() method can be used.
For example, df.loc[df.index.isin([('a','one'), ('b','two')])] or df.loc[df.index.isin(['a','c'], level=0)].
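
A short sketch of these selections follows. The data is hypothetical, and the variable names (kept_shape, mdf and so on) are chosen here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [np.nan, 5.0, 6.0]})

kept_shape = df.where(df["A"] > 0)        # same shape; non-matching rows become NaN
matching = df[df["A"] > 0]                # only the matching rows
with_nulls = df[df.isnull().any(axis=1)]  # rows containing at least one null
complete = df[~df.isnull().any(axis=1)]   # rows with no nulls

# isin() on a MultiIndex, using hypothetical labels.
idx = pd.MultiIndex.from_product([["a", "b", "c"], ["one", "two"]])
mdf = pd.DataFrame({"v": range(6)}, index=idx)
by_tuple = mdf.loc[mdf.index.isin([("a", "one"), ("b", "two")])]  # full-key match
by_level = mdf.loc[mdf.index.isin(["a", "c"], level=0)]           # match on first level only
```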
df.groupby('A').agg(['sum', 'mean'])
df.groupby('A').agg({'C': 'sum', 'D': lambda x: np.std(x)})

GroupBy uses a split-apply-combine workflow. Data is split using index or column values. A function is applied to each group independently. The results are combined into a single data structure. In the apply step, data can be aggregated (sum, mean, min, etc.), transformed (normalize data, fill missing values, etc.) or filtered (discard small groups, filter data based on group mean, etc.).
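
The three kinds of apply step can be sketched as follows. The column names A, C and D follow the agg() calls above; the values and the group-size threshold of 3 are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["x", "x", "y", "y", "y"],
    "C": [1, 2, 3, 4, 5],
    "D": [10.0, 20.0, 30.0, 40.0, 50.0],
})
grouped = df.groupby("A")  # split on the values of column A

totals = grouped["C"].sum()                          # aggregate: one value per group
centered = df["D"] - grouped["D"].transform("mean")  # transform: result has the input's length
big_groups = grouped.filter(lambda g: len(g) >= 3)   # filter: keep groups with at least 3 rows
```

Aggregation shrinks the data to one row per group, transformation keeps the original row count, and filtering returns a subset of the original rows.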
When data is in "record" form, pivot() spreads rows into columns. Method melt() does the opposite.
Methods stack() and unstack() are designed for MultiIndex objects and they're closely related to pivot.
They can be applied on Series or DataFrame. Here are a few examples:

• df.pivot(index='foo', columns='bar', values='baz'): Column 'foo' becomes the index, 'bar' values
become new columns and values of 'baz' become the values of the new DataFrame. A more
generalized API is df.pivot_table(), which allows for duplicate values of an index/column pair.

• df.melt(id_vars=['A','B']): Two columns are retained and the other columns are spread into rows. Two
new columns named 'variable' (with old column names as values) and 'value' are introduced. The original
index can be retained but its values will be duplicated.

• df.stack(): Columns become part of a new inner-most index level. If the DataFrame has hierarchical
column labels, the level can be specified as an argument.

• df.unstack('second') or df.unstack(1): Values of index level 'second' are spread into new columns. If no
argument is supplied, the inner-most index level is spread.
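
A round trip through these methods might look like the sketch below. The column names foo, bar and baz follow the pivot() example; the values are hypothetical, and var_name and value_name are passed explicitly so the melted columns keep their original names.

```python
import pandas as pd

records = pd.DataFrame({
    "foo": ["one", "one", "two", "two"],
    "bar": ["A", "B", "A", "B"],
    "baz": [1, 2, 3, 4],
})

# Record form to wide form: 'bar' values become columns.
wide = records.pivot(index="foo", columns="bar", values="baz")

# Wide form back to one row per value.
long = wide.reset_index().melt(id_vars=["foo"], var_name="bar", value_name="baz")

stacked = wide.stack()              # columns become the inner-most index level
unstacked = stacked.unstack("bar")  # spread that level back into columns
```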
result = pd.merge(df1, df2, how="left", on=["key1", "key2"])
or
result = pd.merge(df1, df2, left_on='county_ID', right_on='countyid')

Rows of one DataFrame are concatenated to the other with pd.concat([df1, df2]).
To concatenate column-wise, use pd.concat([df1, df2], axis=1).
Database-style joins are possible with df.join(df1), in which the argument how takes values such as left, right, outer and inner. By default, joins are based on index. With the on argument, we can choose to join a column or index level of df with the index of df1.
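
The three ways of combining frames can be compared on two small tables. This is a sketch: the key columns county_ID and countyid follow the merge example above, while the other columns and all values are invented.

```python
import pandas as pd

# Hypothetical tables with differently named key columns.
df1 = pd.DataFrame({"county_ID": [1, 2, 3], "name": ["Ada", "Bay", "Cole"]})
df2 = pd.DataFrame({"countyid": [2, 3, 4], "pop": [200, 300, 400]})

# Left merge: every row of df1 is kept; unmatched rows get NaN in df2's columns.
merged = pd.merge(df1, df2, how="left", left_on="county_ID", right_on="countyid")

rows = pd.concat([df1, df1], ignore_index=True)  # rows stacked on top of each other
side_by_side = pd.concat([df1, df2], axis=1)     # columns placed side by side, aligned on index

# join() matches a column of df1 against the index of the other frame.
joined = df1.join(df2.set_index("countyid"), on="county_ID", how="inner")
```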
Statistical analysis in Pandas
Iterating Data Frames in Pandas

• items() (named iteritems() before pandas 2.0, where it was removed) − Iterates over the columns and results in (column name, Series) pairs

• iterrows() − Iterates over the rows and results in (index, series) pairs

• itertuples() − Iterates over the data rows and results in namedtuples
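
The three iteration methods can be compared side by side. This is a minimal sketch on a made-up frame; it uses items() because iteritems() was removed in pandas 2.0.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])

# items(): one (column name, Series) pair per column.
column_names = [name for name, column in df.items()]

# iterrows(): one (index label, Series) pair per row.
row_sums = {label: int(row.sum()) for label, row in df.iterrows()}

# itertuples(): one namedtuple per row; the index label is in .Index.
tuples = [(t.Index, t.x, t.y) for t in df.itertuples()]
```

For large frames, vectorized column operations are usually much faster than any of these loops.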
Sorting in Pandas
Operations on Text data in Pandas
Data Visualization in Pandas
4. VTCA Python Developer for AI
Python Developer
(60 hours - Online Course)
Python Variables and Operators
Python DataTypes
Python Conditions and Loops
Python Functions
Python Input and Output
Python Error Handling
Python Class/Object, Data Visualization
and Modules/Packages
Data Visualization
Data Science Problems
AI Maths
Algebra
Calculus
Statistics & Probabilities
Optimization
5. VTCA AI SPECIALIST- AI ENGINEER
AI Specialist - AI Engineer
(141 hours - Online Course)

Data Mining
Pipeline and Preprocessing
Exploration and Visualization
Feature Scaling & Engineering
Machine Learning Recommendation System
Regression and Classification
Unsupervised Learning
Ensemble Learning
Association Learning
RNN & LSTM & Deep Neural Network
Deep Learning
Computer Vision Segmentation
Transfer Learning
One and Two-Stage Object Detection
Natural Language Processing
Acoustic Modelling and Processing
Time Series Analysis
Recommendation System
Reinforcement Learning
5. VTCA AI SPECIALIST- DATA SCIENTIST
AI Specialist - Data Scientist
(141 hours - Online Course)

Data Mining
Pipeline and Preprocessing
Exploration and Visualization
Feature Scaling & Engineering
Machine Learning Recommendation System
Regression and Classification
Unsupervised Learning
Association & Ensemble Learning
RNN & LSTM & Deep Neural Network
SQL and NoSQL
Data Engineer Database Management
SQL, Execution Plan and Optimization
Database System Management
Big Data Analysis
Web Mining and Security
Data Scientist
Natural Language Processing
Time Series Analysis
Recommendation System
Reinforcement Learning
6. Analytics for BI/BA
Business Intelligence Analyst
(96 hours - Online Course)
BI/BA Life Cycle & Strategy
Data Warehouse Operations
Data Mining in BI/BA
Data Collection & Transformation
Explanatory Data Analysis
Data Analysis Expressions
Reporting and Dashboard
Integrating with Azure ML
Data Analytic Pipeline
Descriptive Analytics
Predictive Analytics
Diagnostic Analytics
Prescriptive Analytics
BI&BA Solution
Banking and Finance
HealthCare
Social Media
Transport & Logistics
THANK YOU
