Unit 3
❖Line chart
A line chart is used to illustrate the relationship between two or more continuous variables.
Eg: ●A time series of stock price information, plotted as a line using the matplotlib package
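A minimal sketch of such a time-series line chart is given here. The prices are synthetic (a seeded random walk), not real stock data, and the names dates, prices and stock_line.png are placeholders chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily "stock prices": a seeded random walk, NOT real market data
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-02", periods=60, freq="B")  # 60 business days
prices = pd.Series(100 + rng.normal(0, 1, size=60).cumsum(), index=dates)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(prices.index, prices.values, label="price")
ax.set_xlabel("Date")
ax.set_ylabel("Price")
ax.set_title("Synthetic stock price over time")
ax.legend()
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
fig.savefig("stock_line.png")  # hypothetical output file name
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() instead of saving the figure.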
❖Histogram
✔ Histogram plots are widely used in statistical analysis to depict the distribution of a continuous variable.
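As a rough illustration, the sketch here draws a histogram of a synthetic continuous variable. The data is randomly generated, and the variable name heights, the bin count of 20 and the file name histogram.png are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic continuous variable: 1,000 draws from a normal distribution
rng = np.random.default_rng(42)
heights = rng.normal(loc=170, scale=10, size=1000)

fig, ax = plt.subplots()
# hist() splits the value range into 20 equal-width bins and counts the values in each
counts, bin_edges, _ = ax.hist(heights, bins=20, edgecolor="black")
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of a synthetic continuous variable")
fig.savefig("histogram.png")  # hypothetical output file name
```

Changing bins trades detail for smoothness: more bins show finer structure but make the counts in each bin noisier.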
2. Consider merging using an inner join, which is the default merge type and keeps only the intersection of
the keys.
df = pd.merge(left1, right1, left_on='key', right_index=True)
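The call above can be tried on small frames. In this sketch the contents of left1 and right1 are made up for illustration, and the outer join is added only as a contrast to the default inner join.

```python
import pandas as pd

# Hypothetical frames; only the names left1/right1 and the 'key' column come from the notes
left1 = pd.DataFrame({"key": ["apple", "ball", "apple", "apple", "ball", "cat"],
                      "value": range(6)})
right1 = pd.DataFrame({"group_val": [33.4, 5.0]}, index=["apple", "ball"])

# Inner join (the default): 'cat' has no match in right1's index, so that row is dropped
inner = pd.merge(left1, right1, left_on="key", right_index=True)

# Outer join, for contrast: 'cat' is kept and its group_val is NaN
outer = pd.merge(left1, right1, left_on="key", right_index=True, how="outer")

print(inner)
print(outer)
```

Here left_on='key' matches the values of the key column in left1 against the row index of right1, which right_index=True selects as the join key.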
• EDA often requires rearranging data consistently using hierarchical indexing. The two basic
actions are stacking, which rotates columns into rows, and unstacking, which rotates rows into columns.
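A small sketch of the two actions is given here. The frame and its labels (measure, city) are hypothetical and exist only to show the round trip.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x3 frame with named row and column labels
df = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=pd.Index(["humidity", "rainfall"], name="measure"),
    columns=pd.Index(["Bergen", "Oslo", "Trondheim"], name="city"),
)

# stack(): the column labels become the inner level of a hierarchical row index
stacked = df.stack()
print(stacked)

# unstack(): the inner row level is rotated back into columns
restored = stacked.unstack()
print(restored)
```

The stacked result is a Series with a two-level index (measure, city), and unstacking it gives back the original layout.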
The groupby() function can group on several categories at once when given a list of
column names. The rows are grouped by the first column, then by the second within it, and so on,
giving one group for each distinct combination of values.
double_grouping = df.groupby(["body-style","drive-wheels"])
double_grouping.first()
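To see what the double grouping returns, the sketch here runs it on a tiny made-up frame. The column names follow the notes; the rows and prices are invented for illustration.

```python
import pandas as pd

# Hypothetical miniature of the automobile data; the values are made up
df = pd.DataFrame({
    "body-style": ["sedan", "sedan", "hatchback", "sedan", "hatchback"],
    "drive-wheels": ["fwd", "rwd", "fwd", "fwd", "fwd"],
    "price": [9000, 16500, 7000, 9500, 7500],
})

double_grouping = df.groupby(["body-style", "drive-wheels"])

# One group per distinct (body-style, drive-wheels) pair
print(double_grouping.size())

# first() returns the first non-null entry of each column per group,
# indexed by both grouping keys
print(double_grouping.first())
```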
Data aggregation
Aggregation is a mathematical operation that reduces a set of values to a single summary value. In pandas it is
applied across one or more columns using the DataFrame.aggregate() function. Some of the most frequently
used aggregations are as follows:
●sum: Returns the sum of the values for the requested axis
●min: Returns the minimum of the values for the requested axis
●max: Returns the maximum of the values for the requested axis
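The aggregations listed above can be applied in one call. This is a minimal sketch on a made-up frame; the column names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical numeric columns; the values are made up
df = pd.DataFrame({"length": [168.8, 171.2, 176.6],
                   "height": [48.8, 52.4, 54.3]})

# Apply several aggregations to every column at once (axis=0 is the default)
summary = df.aggregate(["sum", "min", "max"])
print(summary)

# axis=1 aggregates across the columns of each row instead
row_max = df.aggregate("max", axis=1)
print(row_max)
```

The result of the first call has one row per aggregation and one column per original column.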
# creating a dataframe
sales_data = pd.DataFrame({
'customer_id':[3005, 3001, 3002, 3009,
3005, 3007,
3002, 3004, 3009, 3008,
3003, 3002],
# Group the data frame df by body-style and drive-wheels and extract stats from each group
# (aggregations are passed as strings; passing the builtins min/max triggers warnings in recent pandas)
df.groupby(
    ["body-style", "drive-wheels"]
).agg(
    {
        'height': 'min',   # minimum height of car in each group
        'length': 'max',   # maximum length of car in each group
        'price': 'mean',   # average price of car in each group
    }
)
pd.crosstab(df["make"], df["body-style"])
# rename the columns and row index for better understanding of crosstab
pd.crosstab(
    [df["make"], df["num-of-doors"]],
    [df["body-style"], df["drive-wheels"]],
    rownames=['Auto Manufacturer', "Doors"],
    colnames=['Body Style', "Drive Type"],
    margins=True, margins_name="Total Made",
).head()
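A cross-tabulation counts how often each combination of values occurs. The sketch here shows this on a tiny made-up frame; the column names follow the notes, while the rows are invented for illustration.

```python
import pandas as pd

# Hypothetical miniature of the automobile data; the values are made up
df = pd.DataFrame({
    "make": ["audi", "audi", "bmw", "bmw", "bmw"],
    "body-style": ["sedan", "wagon", "sedan", "sedan", "sedan"],
})

# Rows are makes, columns are body styles, cells are counts;
# margins=True adds row and column totals under the given name
table = pd.crosstab(df["make"], df["body-style"],
                    margins=True, margins_name="Total Made")
print(table)
```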