
5_Sampling_Technique_in_Python http://localhost:8888/nbconvert/html/Test/5_Sampling_Technique_in_Python.ipynb?download...

Population
• The population is the set of all observations (individuals, objects, events, or procedures) and is usually very large and diverse.

Sample
• A sample is a subset of observations from the population that ideally is a true representation of the population.
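As a rough illustration of the two definitions above, the sketch below builds a synthetic population and draws a sample from it. The numbers (10,000 values with mean 30 and standard deviation 10, sample size 200) and all variable names are assumptions chosen for the example, not values from the Titanic dataset.

```python
import numpy as np

# Hypothetical population: 10,000 synthetic values (not real data)
rng = np.random.default_rng(seed=0)
population = rng.normal(loc=30, scale=10, size=10_000)

# A sample is a subset of the population: 200 observations drawn without replacement
sample = rng.choice(population, size=200, replace=False)

population_mean = population.mean()
sample_mean = sample.mean()
print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

If the sample is representative, its mean should land close to the population mean, although the two will rarely match exactly.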

Loading an example dataset from seaborn

In [1]: import seaborn as sns


df=sns.load_dataset('titanic')

1 of 8 02-03-2023, 14:35

1. Random Sampling from numpy library

In [2]: import numpy as np


# Randomly select 10 index labels from the DataFrame.
# replace=False samples without repetition; size sets the number of rows drawn.
sample_index = np.random.choice(df.index, replace=False, size=10)
df.loc[sample_index]

Out[2]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone

290 1 1 female 26.0 0 0 78.8500 S First woman False NaN Southampton yes True

261 1 3 male 3.0 4 2 31.3875 S Third child False NaN Southampton yes False

623 0 3 male 21.0 0 0 7.8542 S Third man True NaN Southampton no True

866 1 2 female 27.0 1 0 13.8583 C Second woman False NaN Cherbourg yes False

572 1 1 male 36.0 0 0 26.3875 S First man True E Southampton yes True

318 1 1 female 31.0 0 2 164.8667 S First woman False C Southampton yes False

199 0 2 female 24.0 0 0 13.0000 S Second woman False NaN Southampton no True

186 1 3 female NaN 1 0 15.5000 Q Third woman False NaN Queenstown yes False

565 0 3 male 24.0 2 0 24.1500 S Third man True NaN Southampton no False

696 0 3 male 44.0 0 0 8.0500 S Third man True NaN Southampton no True
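The draw above changes on every run. One way to make it repeatable is NumPy's seeded `Generator`, sketched below on a small stand-in DataFrame. The names `demo`, `picked`, and `numpy_sample` and the seed value are assumptions for illustration; with the real data, `demo` would be `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic DataFrame: 100 rows, default integer index
demo = pd.DataFrame({"value": np.arange(100) * 2})

# A seeded Generator returns the same draw each time the code is run
rng = np.random.default_rng(seed=42)
picked = rng.choice(demo.index.to_numpy(), size=10, replace=False)

# The drawn values are index labels, so .loc is the matching accessor
numpy_sample = demo.loc[picked]
print(numpy_sample)
```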


Random Sampling from random module

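In [3]: import random


# Generate 9 random row positions. randint includes both endpoints,
# so the valid range for iloc is 0 to len(df) - 1.
# Positions can repeat, so this samples with replacement.
select_index = []
for _ in range(9):
    select_index.append(random.randint(0, len(df) - 1))
df.iloc[select_index]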

Out[3]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone

556 1 1 female 48.0 1 0 39.6000 C First woman False A Cherbourg yes False

434 0 1 male 50.0 1 0 55.9000 S First man True E Southampton no False

289 1 3 female 22.0 0 0 7.7500 Q Third woman False NaN Queenstown yes True

289 1 3 female 22.0 0 0 7.7500 Q Third woman False NaN Queenstown yes True

876 0 3 male 20.0 0 0 9.8458 S Third man True NaN Southampton no True

200 0 3 male 28.0 0 0 9.5000 S Third man True NaN Southampton no True

418 0 2 male 30.0 0 0 13.0000 S Second man True NaN Southampton no True

64 0 1 male NaN 0 0 27.7208 C First man True NaN Cherbourg no True

659 0 1 male 58.0 0 2 113.2750 C First man True D Cherbourg no False
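Row 289 appears twice in the output above because independent `randint` calls can repeat a position. If repeats are not wanted, `random.sample` from the standard library draws distinct positions. The sketch below uses a small stand-in DataFrame; `demo`, `positions`, and the seed are assumptions for illustration.

```python
import random

import pandas as pd

# Hypothetical stand-in for the Titanic DataFrame
demo = pd.DataFrame({"value": range(100)})

random.seed(7)  # optional: fixes the draw so reruns give the same rows

# random.sample picks k distinct items, so no position can repeat
positions = random.sample(range(len(demo)), k=9)
random_module_sample = demo.iloc[positions]
print(random_module_sample)
```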


2. Systematic sampling: selecting observations from the head or tail of the population, or at fixed intervals

In [4]: df.head(5)

Out[4]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone

0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False

1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False

2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True

3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False

4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True

In [5]: df.tail(5)

Out[5]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone

886 0 2 male 27.0 0 0 13.00 S Second man True NaN Southampton no True

887 1 1 female 19.0 0 0 30.00 S First woman False B Southampton yes True

888 0 3 female NaN 1 2 23.45 S Third woman False NaN Southampton no False

889 1 1 male 26.0 0 0 30.00 C First man True C Cherbourg yes True

890 0 3 male 32.0 0 0 7.75 Q Third man True NaN Queenstown no True


In [13]: select_index = list(range(1, len(df), 50))
df.iloc[select_index]
# Select rows at a fixed interval of 50, starting from position 1

Out[13]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone

1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False

51 0 3 male 21.0 0 0 7.8000 S Third man True NaN Southampton no True

101 0 3 male NaN 0 0 7.8958 S Third man True NaN Southampton no True

151 1 1 female 22.0 1 0 66.6000 S First woman False C Southampton yes False

201 0 3 male NaN 8 2 69.5500 S Third man True NaN Southampton no False

251 0 3 female 29.0 1 1 10.4625 S Third woman False G Southampton no False

301 1 3 male NaN 2 0 23.2500 Q Third man True NaN Queenstown yes False

351 0 1 male NaN 0 0 35.0000 S First man True C Southampton no True

401 0 3 male 26.0 0 0 8.0500 S Third man True NaN Southampton no True

451 0 3 male NaN 1 0 19.9667 S Third man True NaN Southampton no False

501 0 3 female 21.0 0 0 7.7500 Q Third woman False NaN Queenstown no True

551 0 2 male 27.0 0 0 26.0000 S Second man True NaN Southampton no True

601 0 3 male NaN 0 0 7.8958 S Third man True NaN Southampton no True

651 1 2 female 18.0 0 1 23.0000 S Second woman False NaN Southampton yes False

701 1 1 male 35.0 0 0 26.2875 S First man True E Southampton yes True

751 1 3 male 6.0 0 1 12.4750 S Third child False E Southampton yes False

801 1 2 female 31.0 1 1 26.2500 S Second woman False NaN Southampton yes False

851 0 3 male 74.0 0 0 7.7750 S Third man True NaN Southampton no True
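The cell above always starts at position 1. A common variant of systematic sampling picks the starting position at random within the first interval and then steps by the interval. The sketch below is one way to do that, assuming a stand-in DataFrame with 891 rows (the row count implied by the `tail` output); `demo`, `interval`, and `start` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with 891 rows
demo = pd.DataFrame({"value": np.arange(891)})

interval = 50
rng = np.random.default_rng(seed=0)

# Random starting position in the half-open range [0, interval)
start = int(rng.integers(0, interval))

# Take every interval-th row from the random start onward
systematic_sample = demo.iloc[start::interval]
print(systematic_sample.index.tolist())
```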


3. Group-based (stratified) sampling: dividing the population into groups based on a chosen criterion and sampling from each group

In [7]: import pandas as pd


# Split the rows by embarkation town and concatenate the first 3 rows of each group
data = pd.DataFrame()
for town in df["embark_town"].unique():
    data = pd.concat([data, df[df["embark_town"] == town].head(3)])
display(data)

survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone

0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False

2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True

3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False

1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False

9 1 2 female 14.0 1 0 30.0708 C Second child False NaN Cherbourg yes False

19 1 3 female NaN 0 0 7.2250 C Third woman False NaN Cherbourg yes True

5 0 3 male NaN 0 0 8.4583 Q Third man True NaN Queenstown no True

16 0 3 male 2.0 4 1 29.1250 Q Third child False NaN Queenstown no False

22 1 3 female 15.0 0 0 8.0292 Q Third child False NaN Queenstown yes True
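Taking the first 3 rows of each group is deterministic. If a random draw within each group is preferred, `DataFrameGroupBy.sample` (available from pandas 1.1) can do it in one call. This is a sketch on made-up data, not the notebook's method; the `town` and `fare` columns and their values are hypothetical.

```python
import pandas as pd

# Hypothetical data: three groups of unequal size
demo = pd.DataFrame({
    "town": ["A"] * 6 + ["B"] * 5 + ["C"] * 4,
    "fare": range(15),
})

# Draw 3 random rows from each group; random_state makes the draw repeatable
stratified_sample = demo.groupby("town").sample(n=3, random_state=0)
print(stratified_sample)
```

Passing `frac` instead of `n` would draw the same proportion from every group, which keeps the group sizes in the sample proportional to those in the population.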


4. Sampling every n'th item.

In [8]: df.iloc[::5].head(10)
# The slice ::5 takes every 5th row by position; head(10) shows the first 10 of them

Out[8]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone

0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False

5 0 3 male NaN 0 0 8.4583 Q Third man True NaN Queenstown no True

10 1 3 female 4.0 1 1 16.7000 S Third child False G Southampton yes False

15 1 2 female 55.0 0 0 16.0000 S Second woman False NaN Southampton yes True

20 0 2 male 35.0 0 0 26.0000 S Second man True NaN Southampton no True

25 1 3 female 38.0 1 5 31.3875 S Third woman False NaN Southampton yes False

30 0 1 male 40.0 0 0 27.7208 C First man True NaN Cherbourg no True

35 0 1 male 42.0 1 0 52.0000 S First man True NaN Southampton no False

40 0 3 female 40.0 1 0 9.4750 S Third woman False NaN Southampton no False

45 0 3 male NaN 0 0 8.0500 S Third man True NaN Southampton no True


5. Reproducible Random Sample in Pandas

In [9]: df.sample(5,random_state=1)

Out[9]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone

862 1 1 female 48.0 0 0 25.9292 S First woman False D Southampton yes True

223 0 3 male NaN 0 0 7.8958 S Third man True NaN Southampton no True

84 1 2 female 17.0 0 0 10.5000 S Second woman False NaN Southampton yes True

680 0 3 female NaN 0 0 8.1375 Q Third woman False NaN Queenstown no True

535 1 2 female 7.0 0 2 26.2500 S Second child False NaN Southampton yes False

In [10]: df.sample(5,random_state=1)
#same random sample is produced even if we run the code multiple times

Out[10]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone

862 1 1 female 48.0 0 0 25.9292 S First woman False D Southampton yes True

223 0 3 male NaN 0 0 7.8958 S Third man True NaN Southampton no True

84 1 2 female 17.0 0 0 10.5000 S Second woman False NaN Southampton yes True

680 0 3 female NaN 0 0 8.1375 Q Third woman False NaN Queenstown no True

535 1 2 female 7.0 0 2 26.2500 S Second child False NaN Southampton yes False
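The reproducibility shown above can also be checked in code rather than by comparing the printed tables. The sketch below uses a small stand-in DataFrame (`demo` is a hypothetical name) and also shows the `frac` parameter of `DataFrame.sample`, which draws a fraction of the rows instead of a fixed count.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic DataFrame
demo = pd.DataFrame({"value": range(50)})

# The same random_state yields the same rows on every call
first = demo.sample(n=5, random_state=1)
second = demo.sample(n=5, random_state=1)
print(first.equals(second))

# frac draws a proportion of the rows: 10% of 50 rows is 5 rows
fraction = demo.sample(frac=0.1, random_state=1)
print(len(fraction))
```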
