0% found this document useful (0 votes)

44 views8 pages

Pandas Data Analysis Techniques

Pandas is a Python library used for working with and analyzing datasets. It allows users to clean, explore, and manipulate data efficiently. Pandas can be used to obtain summary statistics from data, correlate relationships between variables, and select specific rows and columns for further analysis. DataFrames are the primary data structure used in Pandas to organize and manipulate data.

Uploaded by

Priya S B

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

Topics covered

Pandas,
Data Indexing,
Data Standards,
Data Selection,
Data Efficiency,
Data Frameworks,
DataFrame,
Data Statistics,
Data Overview,
Data Reporting

0% found this document useful (0 votes)

44 views8 pages

Pandas Data Analysis Techniques

Uploaded by

Priya S B

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

Topics covered

Pandas,
Data Indexing,
Data Standards,
Data Selection,
Data Efficiency,
Data Frameworks,
DataFrame,
Data Statistics,
Data Overview,
Data Reporting

What is Pandas?

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.
The name "Pandas" has a reference to both "Panel Data", and "Python Data
Analysis" and was created by Wes McKinney in 2008.
Why Use Pandas?
Pandas allows us to analyze big data and make conclusions based on statistical
theories.
Pandas can clean messy data sets, and make them readable and relevant.
Relevant data is very important in data science.
Pandas gives you answers about the data. Like:

Is there a correlation between two or more columns?

What is average value?
Max value?
Min value?
Example:
import pandas

mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}

myvar = pandas.DataFrame(mydataset)

print(myvar)
Data Manipulation in Python using Pandas
Creating DataFrame
Let’s dive into the programming part. Our first aim is to create a Pandas
dataframe in Python, as you may know, pandas is one of the most used libraries
of Python.
# Importing the pandas library
import pandas as pd
# creating a dataframe object
student_register = pd.DataFrame()
# assigning values to the
# rows and columns of the dataframe
student_register['Name'] = ['Abhijit','Smriti', 'Akash', 'Roshni']
student_register['Age'] = [20, 19, 20, 14]
student_register['Student'] = [False, True,True, False]
print(student_register)
Output:
Name Age Student
0 Abhijit 20 False
1 Smriti 19 True
2 Akash 20 True
3 Roshni 14 False
Adding data in DataFrame using Append Function
# creating a new pandas
# series object
new_person = pd.Series(['Mansi', 19, True], index = ['Name', 'Age', 'Student'])
# using the .append() function
# to add that row to the dataframe
student_register.append(new_person, ignore_index = True)
print(student_register)
Output:

Name Age Student

0 Abhijit 20 False
1 Smriti 19 True
2 Akash 20 True
3 Roshni 14 False

Getting Statistical Analysis of Data

Before processing and wrangling any data you need to get the total overview of
it, which includes statistical conclusions like standard deviation(std), mean and
it’s quartile distributions.

# for showing the statistical

# info of the dataframe
print('Describe')
print(student_register.describe())
Output:
Describe
Age
count 4.000000
mean 18.250000
std 2.872281
min 14.000000
25% 17.750000
50% 19.500000
75% 20.000000
max 20.000000
The description of the output given by .describe() method is as follows:

 count is the number of rows in the dataframe.

 mean is the mean value of all the entries in the “Age” column.
 std is the standard deviation of the corresponding column.
 min and max are the minimum and maximum entry in the column
respectively.
 25%, 50% and 75% are the First Quartiles, Second Quartile(Median) and
Third Quartile respectively, which gives us important info on the
distribution of the dataset and makes it simpler to apply an ML model.
Dropping Columns from Data
Let’s drop a column from the data. We will use the drop function from the pandas.
We will keep axis = 1 for columns.

students = student_register.drop('Age', axis=1)

print(students.head())

Output:

Name Student
0 Abhijit False
1 Smriti True
2 Akash True
3 Roshni False
From the output, we can see that the ‘Age’ column is dropped.
Dropping Rows from Data
Let’s try dropping a row from the dataset, for this, we will use drop function.
We will keep axis=0.

students = students.drop(2, axis=0)

print(students.head())
Output:

Name Student
0 Abhijit False
1 Smriti True
3 Roshni False
In the output we can see that the 2 row is dropped.

Indexing and Selecting Data with Pandas

Indexing in Pandas :
Indexing in pandas means simply selecting particular rows and columns of data
from a DataFrame. Indexing could mean selecting all the rows and some of the
columns, some of the rows and all of the columns, or some of each of the rows
and columns. Indexing can also be known as Subset Selection.
Let’s take a DataFrame with some fake data, now we perform indexing on this
DataFrame. In this, we are selecting some rows and some columns from a
DataFrame. Dataframe with dataset.

Selecting a single columns

# importing pandas package

import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving columns by indexing operator
first = data["Age"]
print(first)
OUTPUT

Selecting multiple columns

# importing pandas package
import pandas as pd
# making daTa frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving multiple columns by indexing operator
first = data[["Age", "College", "Salary"]]
first
Output:

Indexing a DataFrame using .loc[ ] :

Selecting a single row
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]
print(first, "\n\n\n", second)
OUTPUT

EDA Unit2
No ratings yet
EDA Unit2
99 pages
EDA Unit II
No ratings yet
EDA Unit II
117 pages
Python Data Science: Pandas & ML Basics
100% (1)
Python Data Science: Pandas & ML Basics
41 pages
Pandas Tutorial
No ratings yet
Pandas Tutorial
33 pages
Python Pandas: Data Manipulation Guide
No ratings yet
Python Pandas: Data Manipulation Guide
84 pages
Pandas
No ratings yet
Pandas
26 pages
IP 12th Chapter 3
No ratings yet
IP 12th Chapter 3
9 pages
Creating DataFrames with Python Pandas
No ratings yet
Creating DataFrames with Python Pandas
5 pages
Pandas Module Overview and Usage Guide
No ratings yet
Pandas Module Overview and Usage Guide
15 pages
DataFrame Ac Win Final
No ratings yet
DataFrame Ac Win Final
30 pages
DevOps Session 3 Pandas
No ratings yet
DevOps Session 3 Pandas
33 pages
Pandas
No ratings yet
Pandas
27 pages
Data Handling Using Pandas-1
No ratings yet
Data Handling Using Pandas-1
60 pages
Azure Notebook: Jupyter Basics Guide
No ratings yet
Azure Notebook: Jupyter Basics Guide
14 pages
FDS Module 2 Notes
No ratings yet
FDS Module 2 Notes
24 pages
Ilovepdf Merged
No ratings yet
Ilovepdf Merged
16 pages
Data Manipulation with Pandas Basics
No ratings yet
Data Manipulation with Pandas Basics
36 pages
DAP 3 Module
No ratings yet
DAP 3 Module
62 pages
Introduction to Pandas for Data Analysis
No ratings yet
Introduction to Pandas for Data Analysis
4 pages
Pandas Data Structures and Operations
100% (2)
Pandas Data Structures and Operations
46 pages
Introduction to Pandas for Data Science
No ratings yet
Introduction to Pandas for Data Science
14 pages
For Assignment-3 (Final - Pandas - Lab)
No ratings yet
For Assignment-3 (Final - Pandas - Lab)
40 pages
Creating and Managing DataFrames in Python
No ratings yet
Creating and Managing DataFrames in Python
41 pages
Dataframes-I (Create - Selection)
No ratings yet
Dataframes-I (Create - Selection)
12 pages
Introduction to Pandas for Data Analysis
No ratings yet
Introduction to Pandas for Data Analysis
12 pages
Introduction to Python Pandas Library
No ratings yet
Introduction to Python Pandas Library
44 pages
Introduction to Pandas Library in Python
No ratings yet
Introduction to Pandas Library in Python
39 pages
Pandas and Python
No ratings yet
Pandas and Python
24 pages
Getting Start With Pandas
No ratings yet
Getting Start With Pandas
11 pages
Pandas For Machine Learning
No ratings yet
Pandas For Machine Learning
10 pages
Lab 9
No ratings yet
Lab 9
9 pages
Pandas DataFrame: Syntax and Usage
No ratings yet
Pandas DataFrame: Syntax and Usage
70 pages
Introduction to Pandas DataFrames
No ratings yet
Introduction to Pandas DataFrames
25 pages
Pandas DataFrame Basics Guide
No ratings yet
Pandas DataFrame Basics Guide
4 pages
Introduction to Pandas Library
No ratings yet
Introduction to Pandas Library
31 pages
Mastering Pandas: A Comprehensive Guide
No ratings yet
Mastering Pandas: A Comprehensive Guide
13 pages
Creating DataFrames in Python Pandas
No ratings yet
Creating DataFrames in Python Pandas
10 pages
Data Science Notes Unit-1 Part - 2
No ratings yet
Data Science Notes Unit-1 Part - 2
22 pages
Ip Study
No ratings yet
Ip Study
18 pages
Python Pandas Data Manipulation Guide
No ratings yet
Python Pandas Data Manipulation Guide
11 pages
Mastering Pandas: DataFrame Operations
100% (2)
Mastering Pandas: DataFrame Operations
33 pages
Introduction to Pandas for Data Analysis
No ratings yet
Introduction to Pandas for Data Analysis
25 pages
Pandas Dataframe
No ratings yet
Pandas Dataframe
8 pages
Python Unit Iv - Pandas
No ratings yet
Python Unit Iv - Pandas
36 pages
Pandas DataFrame Basics Guide
No ratings yet
Pandas DataFrame Basics Guide
9 pages
Unit 3
No ratings yet
Unit 3
10 pages
Data Handling with Python & SQL Guide
No ratings yet
Data Handling with Python & SQL Guide
42 pages
Introduction to Pandas DataFrames
100% (1)
Introduction to Pandas DataFrames
21 pages
Pandas Handbook
No ratings yet
Pandas Handbook
33 pages
04-Data Manipulation With Pandas
No ratings yet
04-Data Manipulation With Pandas
28 pages
Pandas DataFrame Creation Guide
No ratings yet
Pandas DataFrame Creation Guide
23 pages
Python 2.1.2
No ratings yet
Python 2.1.2
7 pages
Python
No ratings yet
Python
16 pages
Data Frame
No ratings yet
Data Frame
17 pages
Pandas Data Manipulation Guide
No ratings yet
Pandas Data Manipulation Guide
15 pages
Data Analysis with Pandas
No ratings yet
Data Analysis with Pandas
31 pages
Data Analysis - 5th Unit
No ratings yet
Data Analysis - 5th Unit
14 pages
Pandas Series & DataFrame Guide
No ratings yet
Pandas Series & DataFrame Guide
60 pages
Python Pandas EDA Guide for BTech Students
No ratings yet
Python Pandas EDA Guide for BTech Students
15 pages
Krisha CNS Lab Manual
No ratings yet
Krisha CNS Lab Manual
11 pages
Touch Screen Control for AC Motors
No ratings yet
Touch Screen Control for AC Motors
5 pages
CART
No ratings yet
CART
26 pages
BCA 6th Sem Network Programming
No ratings yet
BCA 6th Sem Network Programming
6 pages
Experiment # 2 (Basic Logic Gates) at Proteus
No ratings yet
Experiment # 2 (Basic Logic Gates) at Proteus
5 pages
Comprehensive User Stories for Food Delivery Platform
No ratings yet
Comprehensive User Stories for Food Delivery Platform
11 pages
Digital Signal Processing Course
No ratings yet
Digital Signal Processing Course
2 pages
Sandeep Kumar - Original
No ratings yet
Sandeep Kumar - Original
2 pages
Hexagon PPM Compatibility Matrix - Product Report: Enterprise Database Server
No ratings yet
Hexagon PPM Compatibility Matrix - Product Report: Enterprise Database Server
2 pages
Contrarian ETF Strategy Using VRP
No ratings yet
Contrarian ETF Strategy Using VRP
16 pages
Binomial Distribution & Hypothesis Testing
No ratings yet
Binomial Distribution & Hypothesis Testing
8 pages
Professional 3D Printer Buyers Guide Update - Aug 2024 - WEB
No ratings yet
Professional 3D Printer Buyers Guide Update - Aug 2024 - WEB
8 pages
Artificial Intelligence and Robotics Huimin Lu PDF Download
No ratings yet
Artificial Intelligence and Robotics Huimin Lu PDF Download
128 pages
Optimization and Simulation
No ratings yet
Optimization and Simulation
20 pages
Anritsu 2015
No ratings yet
Anritsu 2015
826 pages
Folio Views 3.1 Guide for Columbia Law
No ratings yet
Folio Views 3.1 Guide for Columbia Law
11 pages
Saudi Industrial Services Expansion
No ratings yet
Saudi Industrial Services Expansion
20 pages
C Programming Practice Problems Guide
No ratings yet
C Programming Practice Problems Guide
4 pages
Telephone Number Due Date: Bill Mail Service Tax Invoice
No ratings yet
Telephone Number Due Date: Bill Mail Service Tax Invoice
3 pages
SRM Institute of Science and Technology
No ratings yet
SRM Institute of Science and Technology
7 pages
Assessment 2: BSBHRM506 - Manage Recruitment Selection and Induction Process
No ratings yet
Assessment 2: BSBHRM506 - Manage Recruitment Selection and Induction Process
2 pages
Firepower 4200 Datasheet
No ratings yet
Firepower 4200 Datasheet
14 pages
QUE Practical Linux (2000)
100% (1)
QUE Practical Linux (2000)
704 pages
Boot
No ratings yet
Boot
13 pages
Alpha 5i PDF
No ratings yet
Alpha 5i PDF
4 pages
Unplugged Resource: Computational Thinking
No ratings yet
Unplugged Resource: Computational Thinking
34 pages
Diode Behavior: Voltage and Current Analysis
No ratings yet
Diode Behavior: Voltage and Current Analysis
3 pages
Google PESTEL/PESTLE Analysis & Recommendations
No ratings yet
Google PESTEL/PESTLE Analysis & Recommendations
4 pages
File Handling Concepts in C++
No ratings yet
File Handling Concepts in C++
9 pages
Uvodbc
No ratings yet
Uvodbc
182 pages

Pandas Data Analysis Techniques

Uploaded by

Pandas Data Analysis Techniques

Uploaded by

What is Pandas?

Pandas is a Python library used for working with data sets.

Is there a correlation between two or more columns?

Name Age Student

Getting Statistical Analysis of Data

# for showing the statistical

 count is the number of rows in the dataframe.

students = student_register.drop('Age', axis=1)

students = students.drop(2, axis=0)

Indexing and Selecting Data with Pandas

Selecting a single columns

# importing pandas package

Selecting multiple columns

Indexing a DataFrame using .loc[ ] :

You might also like