Welcome to Scribd!

Dataframes and Series: Allen Downey

Uploaded by

0% found this document useful (0 votes)

12 views27 pages

This document discusses exploratory data analysis in Python using Pandas and Matplotlib. It introduces DataFrames and Series for working with tabular data. The document reads birth weight data from the National Survey of Family Growth into a DataFrame and explores the data through filtering, aggregating, plotting and other basic operations. These techniques allow examining the data, finding patterns and answering questions like determining average birth weight in the United States.

Original Description:

Original Title

chapter1(2)

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

12 views27 pages

Dataframes and Series: Allen Downey

Uploaded by

ALEJANDRA NORIEGA TORRES

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 27

Search inside document

Dataframes and

Series
E X P L O R AT O R Y D ATA A N A LY S I S I N P Y T H O N

Allen Downey
Professor, Olin College
Using data to answer questions
What is the average birth weight of babies in the United States?

Find appropriate data, or collect it

Read data in your development environment

Clean and validate

EXPLORATORY DATA ANALYSIS IN PYTHON

National Survey of Family Growth (NSFG)
NSFG data, from the National
Center for Health Statistics

"nationally representative of
women 15-44 years of age in
the ... United States

"information on family life,

marriage and divorce,
pregnancy, infertility, use of
contraception, and general
and reproductive health."

EXPLORATORY DATA ANALYSIS IN PYTHON

Reading data
import pandas as pd
nsfg = pd.read_hdf('nsfg.hdf5', 'nsfg')
type(nsfg)

pandas.core.frame.DataFrame

EXPLORATORY DATA ANALYSIS IN PYTHON

Reading data
nsfg.head()

caseid outcome birthwgt_lb1 birthwgt_oz1 prglngth nbrnaliv agecon \

0 60418 1 5.0 4.0 40 1.0 2000
1 60418 1 4.0 12.0 36 1.0 2291
2 60418 1 5.0 4.0 36 1.0 3241
3 60419 6 NaN NaN 33 NaN 3650
4 60420 1 8.0 13.0 41 1.0 2191

agepreg hpagelb wgt2013_2015

0 2075.0 22.0 3554.964843
1 2358.0 25.0 3554.964843
2 3308.0 52.0 3554.964843
3 NaN NaN 2484.535358
4 2266.0 24.0 2903.782914

EXPLORATORY DATA ANALYSIS IN PYTHON

Columns and rows
nsfg.shape

(9358, 10)

nsfg.columns

Index(['caseid', 'outcome', 'birthwgt_lb1', 'birthwgt_oz1', 'prglngth',

'nbrnaliv', 'agecon', 'agepreg', 'hpagelb', 'wgt2013_2015'],
dtype='object')

EXPLORATORY DATA ANALYSIS IN PYTHON

Columns and rows

EXPLORATORY DATA ANALYSIS IN PYTHON

Each column is a Series
pounds = nsfg['birthwgt_lb1']
type(pounds)

pandas.core.series.Series

EXPLORATORY DATA ANALYSIS IN PYTHON

Each column is a series
pounds.head()

0 5.0
1 4.0
2 5.0
3 NaN
4 8.0
Name: birthwgt_lb1, dtype: float64

EXPLORATORY DATA ANALYSIS IN PYTHON

Let's start exploring!
E X P L O R AT O R Y D ATA A N A LY S I S I N P Y T H O N
Clean and Validate
E X P L O R AT O R Y D ATA A N A LY S I S I N P Y T H O N

Allen Downey
Professor, Olin College
Selecting columns
pounds = nsfg['birthwgt_lb1']

ounces = nsfg['birthwgt_oz1']

EXPLORATORY DATA ANALYSIS IN PYTHON

pounds.value_counts().sort_index()

0.0 6
1.0 34
2.0 47
3.0 67
4.0 196
5.0 586
6.0 1666
7.0 2146
8.0 1168
9.0 363
10.0 82
11.0 17
12.0 7
13.0 2
14.0 2
17.0 1
98.0 1
99.0 94
Name: birthwgt_lb1, dtype: int64

EXPLORATORY DATA ANALYSIS IN PYTHON

EXPLORATORY DATA ANALYSIS IN PYTHON
Describe
pounds.describe()

count 6485.000000
mean 8.055204
std 11.178893
min 0.000000
25% 6.000000
50% 7.000000
75% 8.000000
max 99.000000
Name: birthwgt_lb1, dtype: float64

EXPLORATORY DATA ANALYSIS IN PYTHON

Replace
pounds = pounds.replace([98, 99], np.nan)
pounds.mean()

6.703286384976526

ounces.replace([98, 99], np.nan, inplace=True)

EXPLORATORY DATA ANALYSIS IN PYTHON

Arithmetic with Series
birth_weight = pounds + ounces / 16.0
birth_weight.describe()

count 6355.000000
mean 7.120978
std 1.422236
min 0.000000
25% 6.375000
50% 7.187500
75% 8.000000
max 17.937500
dtype: float64

EXPLORATORY DATA ANALYSIS IN PYTHON

Let's practice!
E X P L O R AT O R Y D ATA A N A LY S I S I N P Y T H O N
Filter and Visualize
E X P L O R AT O R Y D ATA A N A LY S I S I N P Y T H O N

Allen Downey
Professor, Olin College
Histogram
import matplotlib.pyplot as plt

plt.hist(birth_weight.dropna(), bins=30)

plt.xlabel('Birth weight (lb)')

plt.ylabel('Fraction of births')
plt.show()

EXPLORATORY DATA ANALYSIS IN PYTHON

EXPLORATORY DATA ANALYSIS IN PYTHON
Boolean Series
preterm = nsfg['prglngth'] < 37
preterm.head()

0 False
1 True
2 True
3 True
4 False
Name: prglngth, dtype: bool

EXPLORATORY DATA ANALYSIS IN PYTHON

Boolean Series
preterm.sum()

3742

preterm.mean()

0.39987176747168196

EXPLORATORY DATA ANALYSIS IN PYTHON

Filtering
preterm_weight = birth_weight[preterm]
preterm_weight.mean()

5.577598314606742

full_term_weight = birth_weight[~preterm]
full_term_weight.mean()

7.372323879231473

EXPLORATORY DATA ANALYSIS IN PYTHON

Filtering
Other logical operators:

& for AND (both must be true)

| for OR (either or both can be true)

Example:

birth_weight[A & B] # both true

birth_weight[A | B] # either or both true

EXPLORATORY DATA ANALYSIS IN PYTHON

Resampling
NSFG is not representative

Some groups are "oversampled"

We can correct using resample_rows_weighted()

EXPLORATORY DATA ANALYSIS IN PYTHON

Finish it off!
E X P L O R AT O R Y D ATA A N A LY S I S I N P Y T H O N

Vineland Adaptive Behaviour Scales (VABS) - II
Document6 pages
Vineland Adaptive Behaviour Scales (VABS) - II
Alan Marsh
67% (3)
Dataframes and Series: Allen Downey
Document27 pages
Dataframes and Series: Allen Downey
Aarthi Muthuswamy
No ratings yet
Dataframes and Series: Allen Downey
Document27 pages
Dataframes and Series: Allen Downey
ALEJANDRA NORIEGA TORRES
No ratings yet
Exploratory Data Analysis in Python - Chapter1
Document27 pages
Exploratory Data Analysis in Python - Chapter1
ums kams
No ratings yet
Exploring Relationships: Allen Downey
Document38 pages
Exploring Relationships: Allen Downey
ALEJANDRA NORIEGA TORRES
No ratings yet
Limits of Simple Regression: Allen Downey
Document43 pages
Limits of Simple Regression: Allen Downey
ALEJANDRA NORIEGA TORRES
No ratings yet
Limits of Simple Regression: Allen Downey
Document43 pages
Limits of Simple Regression: Allen Downey
Aarthi Muthuswamy
No ratings yet
Probability Mass Functions: Allen Downey
Document37 pages
Probability Mass Functions: Allen Downey
ALEJANDRA NORIEGA TORRES
No ratings yet
Chapter 4
Document43 pages
Chapter 4
Komi David ABOTSITSE
No ratings yet
Chapter 1
Document31 pages
Chapter 1
Cerveza La Gaceta
No ratings yet
A Complete Guide To Survival Analysis in Python, Part 3 - KDnuggets
Document22 pages
A Complete Guide To Survival Analysis in Python, Part 3 - KDnuggets
8c354be21d
No ratings yet
Chapter 1
Document30 pages
Chapter 1
Komi David ABOTSITSE
No ratings yet
Powerlifting Analysis
Document35 pages
Powerlifting Analysis
api-357199306
No ratings yet
Chapter 3
Document54 pages
Chapter 3
Komi David ABOTSITSE
No ratings yet
Guido's Guide To PROC MEANS - A Tutorial For Beginners Using The SAS® System
Document11 pages
Guido's Guide To PROC MEANS - A Tutorial For Beginners Using The SAS® System
Rajasekhar Reddy
No ratings yet
Chapter 3
Document29 pages
Chapter 3
David Emad
No ratings yet
Chapter 2
Document73 pages
Chapter 2
Komi David ABOTSITSE
No ratings yet
Cluster Analysis in Python Chapter1 PDF
Document31 pages
Cluster Analysis in Python Chapter1 PDF
Fgpeqw
No ratings yet
Chapter 1
Document22 pages
Chapter 1
Rodrigo Pasten Cortes
No ratings yet
Practical 1
Document3 pages
Practical 1
Vatsal
No ratings yet
Cia 4 ML
Document60 pages
Cia 4 ML
Shivangi Gupta
No ratings yet
Hungerford 1987
Document4 pages
Hungerford 1987
王琦
No ratings yet
Interpreting Statistical Results in Research
Document52 pages
Interpreting Statistical Results in Research
Janneth On Paysbook
No ratings yet
Segmentation-Factor Analysis
Document50 pages
Segmentation-Factor Analysis
prashant200690
No ratings yet
Quiz 3 - Nguyen Van Chieu
Document4 pages
Quiz 3 - Nguyen Van Chieu
Đoàn Nhâm Tân
No ratings yet
Classification With Decision Trees I: Instructor: Qiang Yang
Document29 pages
Classification With Decision Trees I: Instructor: Qiang Yang
Poornima Venkatesh
No ratings yet
Advanced Statistical Methods II
Document44 pages
Advanced Statistical Methods II
Sherry Wang
No ratings yet
Introduction To EFA: Jennifer Brussow
Document30 pages
Introduction To EFA: Jennifer Brussow
Eduard Sondakh
No ratings yet
Sol Anova Stata PDF
Document14 pages
Sol Anova Stata PDF
Lourd Catalan
No ratings yet
Easy Differential Expression: F. Hahne and W. Huber
Document6 pages
Easy Differential Expression: F. Hahne and W. Huber
ulirschj
No ratings yet
Stat Intro Comments
Document14 pages
Stat Intro Comments
cristianodavidjones
No ratings yet
AIML Expt
Document7 pages
AIML Expt
D Slm
No ratings yet
Data Normal - HTM
Document10 pages
Data Normal - HTM
Eka Chairun
No ratings yet
StudyQuestions Regression (Logarithms)
Document21 pages
StudyQuestions Regression (Logarithms)
master8875
No ratings yet
Hypothesis Testing
Document17 pages
Hypothesis Testing
Jose Maria Arjona Caballero
No ratings yet
21BCSA92 - Nikhil - Khatri ML Project
Document38 pages
21BCSA92 - Nikhil - Khatri ML Project
cse.21bcsa92
No ratings yet
Pairwise Sequ Datab: Appos® ©mimfoimrdaifcltes
Document25 pages
Pairwise Sequ Datab: Appos® ©mimfoimrdaifcltes
Ved Classes
No ratings yet
Chapter4 PDF
Document34 pages
Chapter4 PDF
Hana Banana
No ratings yet
Chapter4 PDF
Document33 pages
Chapter4 PDF
Uriel Zamora
No ratings yet
Chapter4 PDF
Document33 pages
Chapter4 PDF
Uriel Zamora
No ratings yet
Chapter4 PDF
Document33 pages
Chapter4 PDF
Uriel Zamora
No ratings yet
Ijssc 35 PDF
Document8 pages
Ijssc 35 PDF
Tamba Junior
No ratings yet
Chapter4 PDF
Document33 pages
Chapter4 PDF
Uriel Zamora
No ratings yet
R Assignment
Document9 pages
R Assignment
Milind Baranwal
No ratings yet
Survival Part 5
Document36 pages
Survival Part 5
bmartindoyle6396
No ratings yet
Lampiran 6 Analisis Validitas Dan Reliabilitas
Document7 pages
Lampiran 6 Analisis Validitas Dan Reliabilitas
Vraja Kisori
No ratings yet
Oe-Introduction To Data Analytics - DR Dasharathraj K Shetty. Lecture Notes - Only For Teaching-Learning Purpose in The Classroom - Do Not Distribute
Document14 pages
Oe-Introduction To Data Analytics - DR Dasharathraj K Shetty. Lecture Notes - Only For Teaching-Learning Purpose in The Classroom - Do Not Distribute
Khushi Chaudhary
No ratings yet
Sparse Principal Component Analysis
Document23 pages
Sparse Principal Component Analysis
方鑫然
No ratings yet
Hypothesis Testing PDF
Document9 pages
Hypothesis Testing PDF
mdkashif1299
No ratings yet
The Use and Usefulness of Amplification Curve Analysis in Quantitative PCR
Document33 pages
The Use and Usefulness of Amplification Curve Analysis in Quantitative PCR
Anonymous chCYuhh
No ratings yet
Npar Tests: (Dataset1) C:/Users/Indah Bau/Documents/Cita/Asi - Sav
Document7 pages
Npar Tests: (Dataset1) C:/Users/Indah Bau/Documents/Cita/Asi - Sav
yodhaarspino
No ratings yet
1PNO The Rasch Testlet Model
Document25 pages
1PNO The Rasch Testlet Model
farisafny
No ratings yet
Reabilitas: Case Processing Summary
Document3 pages
Reabilitas: Case Processing Summary
Izzah Luphe Dya
No ratings yet
Factor Analysis of Caregiver Strain Questionnaire in Clinical Research
Document8 pages
Factor Analysis of Caregiver Strain Questionnaire in Clinical Research
Amaresh C Psw
No ratings yet
C2M2 - Assignment: 1 Risk Models Using Tree-Based Models
Document38 pages
C2M2 - Assignment: 1 Risk Models Using Tree-Based Models
Sarah Mendes
100% (1)
2012 Entropy StatstInform A.BayesianPerspective
Document12 pages
2012 Entropy StatstInform A.BayesianPerspective
Kenneth Ugalde
No ratings yet
Introduction To The Course: Rob Reider
Document36 pages
Introduction To The Course: Rob Reider
Aarthi Muthuswamy
No ratings yet
Explore: Uji Wilcoxon Pre-Post Sistol
Document5 pages
Explore: Uji Wilcoxon Pre-Post Sistol
Tri Putranto
No ratings yet
Frequencies: Lampiran 3 Statistik Deskriptif Variabel Penelitian
Document22 pages
Frequencies: Lampiran 3 Statistik Deskriptif Variabel Penelitian
Wahyou Trie
No ratings yet
The Matrixial Brain: Experiments in Reality
From Everand
The Matrixial Brain: Experiments in Reality
Paul Chaplin
No ratings yet
Bioinformation Discovery: Data to Knowledge in Biology
From Everand
Bioinformation Discovery: Data to Knowledge in Biology
Pandjassarame Kangueane
No ratings yet
DT-EDU-DEN80EDU01ABDS002 What Is Data Virtualization
Document21 pages
DT-EDU-DEN80EDU01ABDS002 What Is Data Virtualization
ALEJANDRA NORIEGA TORRES
No ratings yet
Chapter 1
Document43 pages
Chapter 1
ALEJANDRA NORIEGA TORRES
No ratings yet
Introduction To Exploratory Data Analysis: Justin Bois
Document45 pages
Introduction To Exploratory Data Analysis: Justin Bois
ALEJANDRA NORIEGA TORRES
No ratings yet
Probability Mass Functions: Allen Downey
Document37 pages
Probability Mass Functions: Allen Downey
ALEJANDRA NORIEGA TORRES
No ratings yet
Jaundice PDF
Document7 pages
Jaundice PDF
Sib Asuncion
No ratings yet
Safety Officer: Curriculam Viate Shahid Hussain
Document3 pages
Safety Officer: Curriculam Viate Shahid Hussain
abdullah
No ratings yet
WCC Fall 2020 v18n3 Final r1 p.40-47 Nutrition
Document8 pages
WCC Fall 2020 v18n3 Final r1 p.40-47 Nutrition
Bella Pratiwi
No ratings yet
Diabetes Mellitus and Chronic Obstructive Pulmonary Disease
Document6 pages
Diabetes Mellitus and Chronic Obstructive Pulmonary Disease
agil_ww
No ratings yet
DTP Poblacion Annex-E-2 Final
Document7 pages
DTP Poblacion Annex-E-2 Final
Mario N. Mari
No ratings yet
What Is An Intravenous Pyelogram (IVP) ?: X-Ray Ureters Bladder Contrast Material
Document6 pages
What Is An Intravenous Pyelogram (IVP) ?: X-Ray Ureters Bladder Contrast Material
Virgz Pal
No ratings yet
Wendy Silvers Master Series Interview - Dr. Judy Mikovits
Document92 pages
Wendy Silvers Master Series Interview - Dr. Judy Mikovits
Daniela30
No ratings yet
Skripsi Andi Rizal Taher
Document73 pages
Skripsi Andi Rizal Taher
imanudin kamil
No ratings yet
Nasopharyngeal Angiofibroma
Document51 pages
Nasopharyngeal Angiofibroma
kamal saud
No ratings yet
731 - Rcs Level 2 Jordan Raul Gala Taipe: Date / Time Student Score Passing Score Result
Document2 pages
731 - Rcs Level 2 Jordan Raul Gala Taipe: Date / Time Student Score Passing Score Result
Jordan Gala Taipe
No ratings yet
Psychosocial Integrity
Document48 pages
Psychosocial Integrity
popota
No ratings yet
Shillong Project
Document27 pages
Shillong Project
Pranay Jindal
No ratings yet
Cardiac Rehabilitation Following Open-Heart Surgery in Children
Document7 pages
Cardiac Rehabilitation Following Open-Heart Surgery in Children
egajaya
No ratings yet
Massager Pro: User Manual
Document8 pages
Massager Pro: User Manual
diegomatta
No ratings yet
Business Objective
Document29 pages
Business Objective
Rahul Kumar
No ratings yet
Russell-Taylor Delta-Tibial Interlocking Nail Technique
Document30 pages
Russell-Taylor Delta-Tibial Interlocking Nail Technique
mikeydaman
No ratings yet
Masterlist of Teaching and Non Teaching For Health
Document8 pages
Masterlist of Teaching and Non Teaching For Health
Jasmin Kerre Villarin
No ratings yet
Forcetriad Value Analysis Brief - ValleyLab
Document10 pages
Forcetriad Value Analysis Brief - ValleyLab
melquisedec
No ratings yet
Hale, Are Lesbians Women PDF
Document6 pages
Hale, Are Lesbians Women PDF
Tharántula Mulatu
No ratings yet
Final SHS 12 PR 2 Q1 Module 2
Document10 pages
Final SHS 12 PR 2 Q1 Module 2
Beakristine Cabrera
No ratings yet
Histological Approach To Nutritional Status of The Liver and Gut in Atlantic Cod Larvae With and Without Bile Salt Supplementation
Document14 pages
Histological Approach To Nutritional Status of The Liver and Gut in Atlantic Cod Larvae With and Without Bile Salt Supplementation
Jan Szczepaniak
No ratings yet
Stain Removal Effect of Novel Papain-And Bromelain-Containing Gels Applied To Enamel
Document6 pages
Stain Removal Effect of Novel Papain-And Bromelain-Containing Gels Applied To Enamel
SultanPatuahonPanggabean
No ratings yet
Physical Education: Quarter 2 - Module 1 Active Recreation (Fitness)
Document13 pages
Physical Education: Quarter 2 - Module 1 Active Recreation (Fitness)
쓰레기한국 팝
No ratings yet
Hematology Residency Training Program Handbook
Document319 pages
Hematology Residency Training Program Handbook
Paul Avelino Callupe
No ratings yet
Chronic Obstructive Pulmonary Disease (COPD) Is A Condition of
Document12 pages
Chronic Obstructive Pulmonary Disease (COPD) Is A Condition of
Diana Jalaynie S. Sambolawan
No ratings yet
Brainstorming (Nutrition)
Document2 pages
Brainstorming (Nutrition)
Renzy Lantaco Baguio
No ratings yet
Wickford Meats Sanitation Standard Operating Procedure (SSOP)
Document11 pages
Wickford Meats Sanitation Standard Operating Procedure (SSOP)
yoiber jimenez
No ratings yet
Tamir Hands of Clitroidectomy
Document5 pages
Tamir Hands of Clitroidectomy
Technic Reality
No ratings yet
Devcon F - Msds
Document10 pages
Devcon F - Msds
hamada
No ratings yet