You are on page 1of 13

Data Wrangling:

Join, Combine,
and Reshape
8.1 Hierarchical Indexing
Library

Science
Mystery Historical
Fiction

Mys_Book1 SF_Book1 Hist_Book1


Year: 2010, Author: A Year: 2012, Author: C Year: 2016, Author: E

Mys_Book2 SF_Book2 Hist_Book1


Year: 2010, Author: B Year: 2009, Author: D Year: 2018, Author: E
This function generates random numbers from a standard normal The first level is represented by the labels 'a', 'b',
distribution, also known as the standard Gaussian distribution or a 'c', and 'd'. These labels act as the first level of
normal distribution with a mean of 0 and a standard deviation of 1. indexing. You can think of them as categories or
groups.

The MultiIndex is specified


within the index parameter
as a list of lists. It has two
levels

The second level is represented by the numbers 1, 2, and 3. These


numbers act as the second level of indexing and are associated with
the first-level labels. You can think of them as subcategories or
specific identifiers within each category.
Reshaping
Hierarchical indexing plays an important role in reshaping data and group
based operations

This method reshapes a DataFrame by moving one or more levels of the index to become columns,
effectively creating a more traditional tabular format. This can be useful when we want to make
data more accessible for analysis or visualization.

4
It means that this array contains values from 0 to 11 and is
reshaped into a 4x3 array. The second level of the
row MultiIndex
consists of numbers 1,
2, 1, and 2. This means
there are
For the rows:
subcategories within
The first level of the row
each category (1 and 2
MultiIndex consists of
within 'a', and 1 and 2
labels 'a', 'a', 'b', and 'b'.
within 'b').
This means there are two
categories ('a' and 'b’).

The first level of the column


The second level of the MultiIndex consists of 'Ohio',
column MultiIndex consists of 'Ohio', and 'Colorado'. This
'Green', 'Red', and 'Green'. represents different regions or
This represents different states.
colors.
Indexing with a DataFrame’s columns
It’s not unusual to want to use one or more columns from a
DataFrame as the row index; alternatively, you may wish to
move the row index into the DataFrame’s columns.

set_index() is commonly used to set multiple columns as levels of a hierarchical index. This allows you to
create a multi-level index where you can represent and organize data hierarchically.
8.2 Combining and Merging Datasets
Data contained in pandas objects can be combined together in a number of ways:
• pandas.merge connects rows in DataFrames based on one or more keys. This will be
familiar to users of SQL or other relational databases, as it implements database join
operations.
• pandas.concat concatenates or “stacks” together objects along an axis.
Merging on Index
• It refers to the process of using one or more DataFrame indices as the key(s) for
merging two or more DataFrames.
• Unlike traditional column-based merges, where you specify the column name(s)
as the key(s), merging on index means you use the row labels (index values) to
perform the merge.

• DataFrame has a convenient join instance for merging by index. It can also be
used to combine together many DataFrame objects having the same or similar
indexes but non-overlapping columns.
Concatenating Along an Axis
In the context of pandas objects such as Series and DataFrame, having labeled axes enable you to
further generalize array concatenation. In particular, you have a num ‐ ber of additional things to think
about:

• If the objects are indexed differently on the other axes, should we combine the distinct elements in
these axes or use only the shared values (the intersection)?
• Do the concatenated chunks of data need to be identifiable in the resulting object?
• Does the “concatenation axis” contain data that needs to be preserved? In many cases, the default
integer labels in a DataFrame are best discarded during concatenation.

By default concat works along axis=0, producing another Series. If you pass axis=1, the result will
instead be a DataFrame (axis=1 is the columns):
8.3 Reshaping and Pivoting
Thank You

You might also like