December 4, 2019

Aside from using .rename to merge two DataFrames that don’t match, we can use the keywords left_on and right_on
pd.merge(orders, customers, left_on = ‘customer_id’, right_on = ‘id’)

If we do this, we will end up with two columns named id, (id_x and id_y)

We can make this more usable by using the keyword suffixes
pd.merge(orders, customers, left_on = ‘customer_id’, right_on = ‘id’, suffixes = [‘_order’, ‘_customer’])

The type of merge where we only include matching rows is called an inner merge.

If we want to merge tables with losing rows that are missing from one of the tables, we can use an outer join.

An outer join would include all rows from both tables, even if they don’t match. Any missing values are filled with None or NaN (which stands for Not a Number).
pd.merge(company_a, company_b, how = ‘outer’)

A left merge includes all rows from the first (left) table, but only rows from the second (right) table that match the first table:
pd.merge(table1, table2, how=’left’)

REVIEW

Creating a DataFrame made by matching the common columns of two DataFrames is called a merge.

We can specify which columns should be matches by using the keyword arguments left_on and right_on.

We can combine DataFrames whose rows don’t all match using left, right, outer merges, and the how argument.

We can stack or concatenate DataFrames with the same columns using pd.concat.

Previous
Previous

December 14, 2019

Next
Next

December 2, 2019