

Data Science: Wrangling

Enrolled: (Self-paced)
Starts Jul 15, 2020
4-Jul
Data Science
Tools for Data Science
Data Science Methodology
Python for Data Science and AI
Databases and SQL for Data Science
Data Analysis with Python
Data Visualization with Python
Machine Learning with Python
Applied Data Science Capstone

4-Aug
Pandas:
Open-source Python library, mainly used for data manipulation and analysis.

The name Pandas comes from "panel data".

Panel Data means multi-dimensional data involving measurements over time.
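
A minimal sketch of what panel data can look like in Pandas, assuming a table indexed by entity and year. The entity names, years, and values are made up for illustration only.

```python
import pandas as pd

# Hypothetical measurements: two entities observed over two years.
records = pd.DataFrame({
    "entity": ["A", "A", "B", "B"],
    "year": [2019, 2020, 2019, 2020],
    "value": [1.0, 1.5, 2.0, 2.5],
})

# Index by (entity, year) so each row is one entity at one point in time.
panel = records.set_index(["entity", "year"])

# All observations for entity "A" across time.
a_over_time = panel.loc["A"]

# A single measurement: entity "B" in 2020.
b_2020 = panel.loc[("B", 2020), "value"]

print(a_over_time)
print(b_2020)
```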

Created by Wes McKinney in 2008.

Features of Pandas:

1. Series object and DataFrame: hold one-dimensional data in a Series and two-dimensional data in a DataFrame.
2. Handling of missing data: Pandas represents missing values as NaN.
3. Data alignment
4. Group-by functionality
5. Slicing, indexing, and subsetting
6. Merging and joining
7. Reshaping
8. Hierarchical labeling of axes
9. Robust input/output tools: loading files (flat files, CSV files, Excel, databases, ..)
10. Time-series-specific functionality
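
A few of the features listed above (input/output, missing data, group by, merging) can be sketched as follows. The CSV text, column names, and the region table are hypothetical, and filling missing values with 0 is only one possible policy.

```python
import io

import pandas as pd

# Hypothetical CSV content, made up for illustration only.
csv_text = (
    "city,month,sales\n"
    "Pune,Jan,10\n"
    "Pune,Feb,\n"
    "Delhi,Jan,7\n"
    "Delhi,Feb,5\n"
)

# Input/output tools: read_csv accepts a file path or a file-like object.
sales = pd.read_csv(io.StringIO(csv_text))

# Missing data: the empty field is loaded as NaN.
missing_count = int(sales["sales"].isna().sum())

# Fill the gap with 0 before aggregating (an assumption, not a rule).
filled = sales.fillna({"sales": 0})

# Group by: total sales per city.
totals = filled.groupby("city")["sales"].sum()

# Merging and joining: attach a second, equally hypothetical table.
regions = pd.DataFrame({"city": ["Pune", "Delhi"], "region": ["West", "North"]})
merged = filled.merge(regions, on="city", how="left")

print(missing_count)
print(totals)
print(merged)
```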

Series Object:

One-dimensional labeled array.
Can contain data of similar or mixed types.
Example:
import pandas as pd
data = [1, 2, 3, 4]
Series1 = pd.Series(data)
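
The point about similar or mixed types can be checked through the dtype attribute. This is a small sketch; the variable names are illustrative, and the exact integer dtype may depend on the platform.

```python
import pandas as pd

# All integers: pandas picks an integer dtype.
same_type = pd.Series([1, 2, 3, 4])

# Mixed values: pandas falls back to the generic "object" dtype.
mixed = pd.Series([1, "two", 3.0])

print(same_type.dtype)
print(mixed.dtype)
```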

Anaconda:
Install Anaconda.
Launch Anaconda Navigator.
Launch Jupyter Notebook.
New -> Python 3
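import numpy as np
import matplotlib.pyplot as plt

# Draw 30 random points with random colors and marker sizes
N = 30
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = np.pi * (20 * np.random.rand(N))**2  # marker area, radius 0 to 20 points

# alpha=0.5 makes overlapping markers semi-transparent
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()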
Pandas vs NumPy

Pandas:
Performs better than NumPy for 500K rows or more.
A Pandas Series object is more flexible, as you can define your own labeled index to index and access elements of an array.

NumPy:
Performs better for 50K rows or less.
Elements in NumPy arrays are accessed by their default integer position.

How to import:

import pandas as pd

What kind of data suits: tabular data, time series data, matrix data.

Data Set in Pandas:
One-dimensional: Series object
Multi-dimensional: DataFrame (2-dimensional), Panel (3-dimensional; removed from pandas in version 0.25, where a DataFrame with a MultiIndex is used instead).
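
The difference between positional access in NumPy and a user-defined labeled index in Pandas can be sketched like this. The labels "a", "b", "c" and the column names are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

values = [10, 20, 30]

# NumPy: elements are reached by integer position.
arr = np.array(values)
second_by_position = arr[1]

# Pandas: the same values with a user-defined label index.
s = pd.Series(values, index=["a", "b", "c"])
second_by_label = s["b"]    # label-based access
second_by_iloc = s.iloc[1]  # positional access is still available

# A DataFrame is the two-dimensional counterpart: labeled rows and columns.
df = pd.DataFrame({"x": values, "y": [1.5, 2.5, 3.5]}, index=["a", "b", "c"])
cell = df.loc["b", "y"]

print(second_by_position, second_by_label, second_by_iloc, cell)
```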


# Series object in Pandas
import pandas as pd

data = [1, 2, 3, 4]
S1 = pd.Series(data)
print(S1)
type(S1)

Output:
0    1
1    2
2    3
3    4
dtype: int64
pandas.core.series.Series
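
As a follow-up sketch, the same Series can be inspected and used in simple calculations. The name `doubled` is introduced here for illustration only.

```python
import pandas as pd

S1 = pd.Series([1, 2, 3, 4])

print(S1.index)    # the default integer index
print(S1.values)   # the underlying NumPy array
print(S1.iloc[0])  # first element by position
print(S1.sum())    # sum of all elements

# Arithmetic is applied element by element.
doubled = S1 * 2
print(doubled.tolist())
```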
