2:19 PM  Pandas - Jupyter Notebook
localhost:8888/notebooks/Downloads/Soumya Data Science/Pandas.ipynb

Pandas in Python

What is Pandas?

Pandas is an open-source library that provides high-performance data manipulation in Python. The name Pandas is derived from "panel data", an econometrics term for multidimensional data. It is used for data analysis in Python and was developed by Wes McKinney in 2008.

Data analysis requires a lot of processing, such as restructuring, cleaning, and merging. Several tools are available for fast data processing, such as NumPy, SciPy, and Pandas. We prefer Pandas because working with it is fast, simple, and more expressive than the other tools.

In [1]:
    # Import Pandas and check the version
    import pandas as pd
    print(pd.__version__)

    1.4.4

Python Pandas Data Structure:

Pandas provides two data structures for processing data: Series and DataFrame.

Series:

A Series is a one-dimensional array that is capable of storing various data types. The row labels of a Series are called the index. We can easily convert a list, tuple, or dictionary into a Series using pd.Series(). A Series cannot contain multiple columns.

In [2]:
    # Create a Series from a list, a NumPy array, and a dict
    # From List:
    print("Using List")
    mylst = ['orange', 'banana', 'mango', 'carry']
    lst = pd.Series(mylst)
    print(lst)
    print(type(lst))

    Using List
    0    orange
    1    banana
    2     mango
    3     carry
    dtype: object
    <class 'pandas.core.series.Series'>

In [3]:
    # From Numpy Array
    print("Using Numpy Array")
    import numpy as np
    myarr = np.arange(5)
    arr = pd.Series(myarr)
    print(arr)
    print(type(arr))

    Using Numpy Array
    0    0
    1    1
    2    2
    3    3
    4    4
    dtype: int32
    <class 'pandas.core.series.Series'>

In [4]:
    # From Dictionary
    print("Using Dictionary")
    mydict = {1: 'orange', 2: 'mango', 3: 'carry'}
    dct = pd.Series(mydict)
    print(dct)
    print(type(dct))

    Using Dictionary
    1    orange
    2     mango
    3     carry
    dtype: object
    <class 'pandas.core.series.Series'>

In [5]:
    # Convert the index of a Series into a column of a DataFrame
    df1 = dct.to_frame().reset_index()
    print(df1.head())

       index       0
    0      1  orange
    1      2   mango
    2      3   carry

In [6]:
    # Combine several Series to form a DataFrame
    df2 = pd.concat([lst, arr], axis=1)
    print(df2)

            0  1
    0  orange  0
    1  banana  1
    2   mango  2
    3   carry  3
    4     NaN  4

In [7]:
    # Combine several Series to form a DataFrame with named columns
    df3 = pd.DataFrame({'col1': lst, 'col2': arr})
    print(df3)

         col1  col2
    0  orange     0
    1  banana     1
    2   mango     2
    3   carry     3
    4     NaN     4

DataFrame:

A Pandas DataFrame is a widely used structure that works as a two-dimensional array with labeled axes (rows and columns). A DataFrame is a standard way to store data that has two different indexes, i.e., a row index and a column index.
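The two indexes described above can be illustrated with a minimal sketch. The sample values and the row labels "r1" to "r3" are hypothetical and chosen only for illustration; the calls used (`DataFrame.index`, `DataFrame.columns`, `.loc`, `.iloc`) are standard pandas accessors.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the two indexes
df = pd.DataFrame(
    {"ID": [101, 102, 103], "Department": ["B.Sc", "B.Tech", "M.Tech"]},
    index=["r1", "r2", "r3"],  # explicit row labels (the row index)
)

row_labels = df.index.tolist()       # labels along the row axis
column_labels = df.columns.tolist()  # labels along the column axis
print(row_labels)
print(column_labels)

# Select one cell by row label and column label
dept = df.loc["r2", "Department"]
print(dept)

# Select the same cell by integer position (second row, second column)
same = df.iloc[1, 1]
print(same)
```

Label-based access with `.loc` keeps working if rows are reordered, while `.iloc` depends on position, so the two are not interchangeable once the table is sorted or filtered.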

Create a DataFrame

We can create a DataFrame using:
* Dict
* List
* Numpy ndarrays
* Series

In [8]:
    # Create an empty DataFrame
    df = pd.DataFrame()
    print(df)

    Empty DataFrame
    Columns: []
    Index: []

In [9]:
    # Create a DataFrame using a list
    lst = ['orange', 'mango', 'carry', 'banana']
    df1 = pd.DataFrame(lst)
    print(df1)

            0
    0  orange
    1   mango
    2   carry
    3  banana

In [10]:
    # Create a DataFrame from a dict
    dct = {'ID': [101, 102, 103, 104],
           'Department': ['B.Sc', 'B.Tech', 'M.Tech', 'Phd']}
    df2 = pd.DataFrame(dct)
    print(df2)

        ID Department
    0  101       B.Sc
    1  102     B.Tech
    2  103     M.Tech
    3  104        Phd

In [36]:
    # Create a DataFrame using a dictionary
    dct1 = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            'Name': ['Raj', 'Ram', 'Ramya', 'Vidya', 'Vinay',
                     'Shanti', 'Bhagya', 'Arun', 'Avani', 'Shivu'],
            'email_id': ['raj@gmail.com', 'ram@gmail.com', 'ramya@gmail.com',
                         'vidya@gmail.com', 'vinay@gmail.com', 'shanti@gmail.com',
                         'bhagya@gmail.com', 'arun@gmail.com', 'avani@gmail.com',
                         'shivu@gmail.com'],
            'Expirence': [2, 5, 3, 4, 2, 2.5, 8, 9, 6, 7],
            'Salary': [25000, 30000, 15000, 50000, 42000,
                       23000, 20000, 35000, 40000, 25000],
            'Place': ['Bengalore', 'Hubli', 'Mysore', 'Pune', 'Bengalore',
                      'Pune', 'Hubli', 'Bengalore', 'Bengalore', 'Bengalore']}
    df_dct1 = pd.DataFrame(dct1)

In [37]:
    df_dct1

Out[37]:
       ID    Name          email_id  Expirence  Salary      Place
    0   1     Raj     raj@gmail.com        2.0   25000  Bengalore
    1   2     Ram     ram@gmail.com        5.0   30000      Hubli
    2   3   Ramya   ramya@gmail.com        3.0   15000     Mysore
    3   4   Vidya   vidya@gmail.com        4.0   50000       Pune
    4   5   Vinay   vinay@gmail.com        2.0   42000  Bengalore
    5   6  Shanti  shanti@gmail.com        2.5   23000       Pune
    6   7  Bhagya  bhagya@gmail.com        8.0   20000      Hubli
    7   8    Arun    arun@gmail.com        9.0   35000  Bengalore
    8   9   Avani   avani@gmail.com        6.0   40000  Bengalore
    9  10   Shivu   shivu@gmail.com        7.0   25000  Bengalore

In [11]:
    # Create a DataFrame from a dict of Series
    dct = {'one': pd.Series([1, 2, 3, 4, 5, 6, 7],
                            index=['a', 'b', 'c', 'd', 'e', 'f', 'g']),
           'two': pd.Series([11, 12, 13, 14, 15, 16, 17],
                            index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])}
    df3 = pd.DataFrame(dct)
    print(df3)

       one  two
    a    1   11
    b    2   12
    c    3   13
    d    4   14
    e    5   15
    f    6   16
    g    7   17

In [12]:
    # Add columns to the DataFrame
    df3['three'] = pd.Series([21, 22, 23, 24, 25, 26, 27],
                             index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
    df3['four'] = pd.Series([31, 32, 33, 34, 35, 36, 37],
                            index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
    # A new column can also be computed from existing columns
    df3['five'] = df3['one'] + df3['two']
    print(df3)

       one  two  three  four  five
    a    1   11     21    31    12
    b    2   12     22    32    14
    c    3   13     23    33    16
    d    4   14     24    34    18
    e    5   15     25    35    20
    f    6   16     26    36    22
    g    7   17     27    37    24

Importing Dataset:

In [13]:
    # We can import datasets in several ways:
    # Using an Excel file
    df_ex = pd.read_excel("Attribute DataSet.xlsx")

In [14]:
    # Print the first 5 rows of the dataset
    df_ex.head()

Out[14]:
         Dress_ID    Style    Price  Rating Size  Season NeckLine SleeveLength waiseline  ...
    0  1006032852     Sexy      Low     4.6    M  Summer   o-neck    sleevless    empire  ...
    1  1212192089   Casual      Low     0.0    L  Summer   o-neck        Petal   natural  ...
    2  1190380701  vintage     High     0.0    L  Automn   o-neck         full   natural  ...
    3   966005983    Brief  Average     4.6    L  Spring   o-neck         full   natural  ...
    4   876339541     cute      Low     4.5    M  Summer   o-neck    butterfly   natural  ...

In [15]:
    # Print the last 5 rows of the dataset
    df_ex.tail()

Out[15]:
          Dress_ID   Style    Price  Rating  Size  Season   NeckLine SleeveLength waiseline  ...
    495  713391965  Casual      Low     4.7     M  Spring     o-neck         full   natural  ...
    496  722565148    Sexy      Low     4.3  free  Summer     o-neck         full    empire  ...
    497  532874347  Casual  Average     4.7     M  Summer     v-neck         full    empire  ...
    498  655464934  Casual  Average     4.6     L  winter  boat-neck    sleevless    empire  ...
    499  919930954  Casual      Low     4.4  free  Summer     v-neck        short    empire  ...

In [16]:
    # Import a dataset from a CSV file
    df_csv = pd.read_csv("haberman.csv")

In [17]:
    # First 5 rows
    df_csv.head()

Out[17]:
       30  64   1  1.1
    0  30  62   3    1
    1  30  65   0    1
    2  31  59   2    1
    3  31  65   4    1
    4  33  58  10    1

In [18]:
    # Last 5 rows
    df_csv.tail()

Out[18]:
         30  64  1  1.1
    300  75  62  1    1
    301  76  67  0    1
    302  77  65  3    1
    303  78  65  1    2
    304  83  58  2    2

Because the file has no header row, read_csv uses the first data row (30, 64, 1, 1) as the column names.

In [19]:
    # Import a dataset from an HTML page
    df_html = pd.read_html('https://www.basketball-reference.com/leagues/NBA_2015_totals.html')

In [20]:
    # read_html returns a list of DataFrames; take the first table
    df_html[0]

Out[20]:
    Rk  Player  Pos  Age  Tm  G  GS  MP  FG  FGA  ...  FT%  ORB  DRB  TRB  ...
    [table rows are illegible in the source]
    675 rows × 30 columns

In [21]:
    # Import a dataset from JSON
    df_json = pd.read_json('https://api.github.com/repos/pandas-dev/pandas/issues')

In [22]:
    df_json['user'][0]

Out[22]:
    {'login': 'WillAyd',
     'id': 609873,
     'node_id': 'MDQ6VXNlcjYwOTg3Mw==',
     'avatar_url': 'https://avatars.githubusercontent.com/u/609873?...',
     'gravatar_id': '',
     'url': 'https://api.github.com/users/WillAyd',
     'html_url': 'https://github.com/WillAyd',
     'followers_url': 'https://api.github.com/users/WillAyd/followers',
     'following_url': 'https://api.github.com/users/WillAyd/following{/other_user}',
     'gists_url': 'https://api.github.com/users/WillAyd/gists{/gist_id}',
     'starred_url': 'https://api.github.com/users/WillAyd/starred{/owner}{/repo}',
     'subscriptions_url': 'https://api.github.com/users/WillAyd/subscriptions',
     'organizations_url': 'https://api.github.com/users/WillAyd/orgs',
     'repos_url': 'https://api.github.com/users/WillAyd/repos',
     'events_url': 'https://api.github.com/users/WillAyd/events{/privacy}',
     'received_events_url': 'https://api.github.com/users/WillAyd/received_events',
     'type': 'User',
     'site_admin': False}

In [23]:
    # Import a dataset and get insights from the data
    # Import dataset
    df_att = pd.read_excel("Attribute Dataset.xlsx")

In [24]:
    # First 5 rows
    df_att.head()

Out[24]:
         Dress_ID    Style    Price  Rating Size  Season NeckLine SleeveLength waiseline  ...
    0  1006032852     Sexy      Low     4.6    M  Summer   o-neck    sleevless    empire  ...
    1  1212192089   Casual      Low     0.0    L  Summer   o-neck        Petal   natural  ...
    2  1190380701  vintage     High     0.0    L  Automn   o-neck         full   natural  ...
    3   966005983    Brief  Average     4.6    L  Spring   o-neck         full   natural  ...
    4   876339541     cute      Low     4.5    M  Summer   o-neck    butterfly   natural  ...

In [25]:
    # Last 5 rows
    df_att.tail()

Out[25]:
          Dress_ID   Style    Price  Rating  Size  Season   NeckLine SleeveLength waiseline  ...
    495  713391965  Casual      Low     4.7     M  Spring     o-neck         full   natural  ...
    496  722565148    Sexy      Low     4.3  free  Summer     o-neck         full    empire  ...
    497  532874347  Casual  Average     4.7     M  Summer     v-neck         full    empire  ...
    498  655464934  Casual  Average     4.6     L  winter  boat-neck    sleevless    empire  ...
    499  919930954  Casual      Low     4.4  free  Summer     v-neck        short    empire  ...

In [26]:
    # Columns of the dataset
    df_att.columns

Out[26]:
    Index(['Dress_ID', 'Style', 'Price', 'Rating', 'Size', 'Season', 'NeckLine',
           'SleeveLength', 'waiseline', 'Material', 'FabricType', 'Decoration',
           'Pattern Type', 'Recommendation'],
          dtype='object')

In [27]:
    # Check the data types
    df_att.dtypes

Out[27]:
    Dress_ID            int64
    Style              object
    Price              object
    Rating            float64
    Size               object
    Season             object
    NeckLine           object
    SleeveLength       object
    waiseline          object
    Material           object
    FabricType         object
    Decoration         object
    Pattern Type       object
    Recommendation      int64
    dtype: object

In [28]:
    # Summary statistics of the numeric columns
    df_att.describe()

Out[28]:
               Dress_ID      Rating  Recommendation
    count  5.000000e+02  500.000000      500.000000
    mean   9.055417e+08    3.528600        0.420000
    std    1.736190e+08    2.005364        0.494053
    min    4.442620e+08    0.000000        0.000000
    25%    7.873164e+08    3.700000        0.000000
    50%    9.083296e+08    4.600000        0.000000
    75%    1.039534e+09    4.800000        1.000000
    max    1.253973e+09    5.000000        1.000000

In [29]:
    # Check for null values
    df_att.isnull()

Out[29]:
         Dress_ID  Style  Price  Rating   Size  Season  NeckLine  SleeveLength  waiseline  Material  ...
    0       False  False  False   False  False   False     False         False      False      True  ...
    1       False  False  False   False  False   False     False         False      False     False  ...
    2       False  False  False   False  False   False     False         False      False     False  ...
    3       False  False  False   False  False   False     False         False      False     False  ...
    4       False  False  False   False  False   False     False         False      False     False  ...
    ...       ...    ...    ...     ...    ...     ...       ...           ...        ...       ...  ...
    495     False  False  False   False  False   False     False         False      False     False  ...
    496     False  False  False   False  False   False     False         False      False     False  ...
    497     False  False  False   False  False   False     False         False      False     False  ...
    498     False  False  False   False  False   False     False         False      False     False  ...
    499     False  False  False   False  False   False     False         False      False     False  ...
    500 rows × 14 columns

In [30]:
    # Count the null values in each column
    df_att.isnull().sum()

Out[30]:
    Material          128
    FabricType        266
    Decoration        236
    Pattern Type      109
    dtype: int64
    [counts for the remaining columns are illegible in the source]

In [31]:
    # Check for duplicate rows
    df_att.duplicated()

Out[31]:
    0      False
    1      False
    2      False
    3      False
    4      False
           ...
    495    False
    496    False
    497    False
    498    False
    499    False
    Length: 500, dtype: bool
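The null and duplicate checks above return one value per cell or per row, which is hard to read on a 500-row table. The following is a minimal sketch of how those results might be condensed into counts and then acted on. The small DataFrame here is hypothetical and only stands in for the dress dataset; the fill value "unknown" is likewise an assumption for illustration, not a recommendation from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature table standing in for the dress dataset
df = pd.DataFrame({
    "Style": ["Sexy", "Casual", "Casual", "cute"],
    "Rating": [4.6, 0.0, 0.0, 4.5],
    "Material": [np.nan, "cotton", "cotton", np.nan],
})

# Missing values per column, and in total
null_counts = df.isnull().sum()
total_nulls = int(null_counts.sum())
print(null_counts)

# Number of rows that repeat an earlier row exactly
n_dupes = int(df.duplicated().sum())
print(n_dupes)

# Drop the repeated rows, keeping the first occurrence of each
deduped = df.drop_duplicates()

# Replace missing Material values with a placeholder (assumed value)
filled = df.fillna({"Material": "unknown"})
print(filled)
```

Both `drop_duplicates()` and `fillna()` return new DataFrames and leave the original untouched, so the raw data stays available for comparison.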
