
PYTHON FOR DATA SCIENCE

1. PANDAS DATAFRAME.SAMPLE()
2. PANDAS DATAFRAME.SHIFT()
1. Pandas DataFrame.sample()
The Pandas sample() function returns a random
selection of rows or columns from a DataFrame.
When we want to build a model from a very large
dataset, we often start by randomly choosing a
smaller sample of the data, and sample() does
exactly that.
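As a rough illustration of the idea above, the sketch below draws a 10% sample from a larger DataFrame. The dataset, its column names (feature, label) and its size are made up for this example and are not part of the original material.

```python
import numpy as np
import pandas as pd

# Hypothetical "large" dataset: 10,000 rows of synthetic values (assumption)
rng = np.random.default_rng(0)
big = pd.DataFrame({
    "feature": rng.normal(size=10_000),
    "label": rng.integers(0, 2, size=10_000),
})

# Draw a 10% random sample of the rows; random_state makes it repeatable
subset = big.sample(frac=0.1, random_state=42)

# Rows that were not sampled can be recovered by dropping the sampled index
rest = big.drop(subset.index)

print(len(subset), len(rest))
```

Because the sample is drawn without replacement by default, no row appears in both subset and rest.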
SYNTAX:

DataFrame.sample(n=None, frac=None,
replace=False, weights=None,
random_state=None, axis=None)
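Parameters:
n (int, optional): number of items to return. Cannot be
used together with frac; defaults to 1 if frac is also None.
frac (float, optional): fraction of the items along the
chosen axis to return. Cannot be used together with n.
replace (bool): whether to sample with
replacement or not (default False).
weights (str or array-like, optional): column name or array
of weights that sets the selection probability of each row.
By default every row is equally likely.
random_state (int or RandomState, optional): seed for the
random number generator, used to make results reproducible.
axis (int or str, optional): axis to sample from (0 or 'index'
for rows, 1 or 'columns' for columns).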
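The following sketch shows how several of these parameters behave in practice. The DataFrame df and its columns a, b, c are invented for illustration; only the sample() parameters themselves come from the list above.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data)
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

# random_state: the same seed gives the same sample each time
first = df.sample(n=2, random_state=7)
second = df.sample(n=2, random_state=7)
print(first.equals(second))

# axis=1: sample columns instead of rows
two_cols = df.sample(n=2, axis=1, random_state=7)
print(two_cols.shape)

# replace=True: rows may repeat, so n can exceed the number of rows
oversampled = df.sample(n=6, replace=True, random_state=7)
print(len(oversampled))

# weights: a row with weight 0 is never selected,
# so only the first row can be drawn here
only_first = df.sample(n=1, weights=[1, 0, 0, 0])
print(only_first.index[0])

# n and frac cannot be combined; pandas raises a ValueError
try:
    df.sample(n=2, frac=0.5)
except ValueError as err:
    print("ValueError:", err)
```

Without replace=True, asking for more rows than the DataFrame contains also raises a ValueError.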
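Example 1:
import pandas as pd
# Small DataFrame with named rows
info = pd.DataFrame({'data1': [2, 4, 8, 0],
                     'data2': [2, 0, 0, 0],
                     'data3': [10, 2, 1, 8]},
                    index=['John', 'Parker', 'Smith', 'William'])
print(info)
# Three random values from the 'data1' column (returns a Series)
print(info['data1'].sample(n=3, random_state=1))
# Half of the rows, drawn with replacement
print(info.sample(frac=0.5, replace=True, random_state=1))
# Two rows, with selection probability weighted by the 'data3' column
print(info.sample(n=2, weights='data3', random_state=1))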
Output:
