November 22, 2019

When you have data in a CSV file, you can load it into a pandas DataFrame using read_csv():
import pandas as pd
df = pd.read_csv('new_file.csv')

We can also save a DataFrame to a CSV file using to_csv(). It is a method of the DataFrame itself, not a function in the pd namespace:
df.to_csv('new_file.csv')
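To see both calls working together, here is a minimal round-trip sketch. The sample data is made up, and an in-memory io.StringIO buffer stands in for a file on disk so the example leaves nothing behind; index=False is an optional choice that keeps the row labels out of the output.

```python
import io
import pandas as pd

# Hypothetical sample data, made up for illustration.
df = pd.DataFrame({"name": ["Levi", "Christina"], "age": [31, 27]})

# to_csv() is called on the DataFrame; index=False leaves out the row labels.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read it back as if it were a file on disk.
buffer.seek(0)
loaded = pd.read_csv(buffer)
print(loaded)
```

With a real file, you would pass a path such as 'new_file.csv' in place of the buffer.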

If it’s a small DataFrame, you can display it by typing print(df).

The method head() gives you the first 5 rows of a DataFrame. If you want to see more rows, you can pass the positional argument n. For example, df.head(10) would show you the first 10 rows.

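The method df.info() prints a summary of the DataFrame: each column's name, its number of non-null values, and its data type. For numeric statistics such as the mean and standard deviation, use df.describe() instead.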
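A short sketch of these inspection methods, using a made-up DataFrame with eight rows and one missing value so the non-null count in info() has something to show:

```python
import pandas as pd

# Hypothetical data: 8 rows, with one missing age.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "age": [31, 27, None, 45, 52, 38, 29, 60],
})

first_five = df.head()    # no argument: the first 5 rows
first_three = df.head(3)  # n=3: the first 3 rows

print(first_three)
df.info()  # prints column names, non-null counts, and dtypes
```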

Select a column as if you were selecting a value from a dictionary using a key. For example, use df['age'] to select the column age.

When we select a single column, the result is called a Series.

To select two or more columns from a DataFrame, we use a list of the column names. For example, new_df = orders[['last_name', 'email']]. Make sure you use a double set of brackets: the outer pair does the selecting and the inner pair is the list.
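The difference between single and double brackets is easiest to see by checking the type of each result. This is a small sketch; the orders data below is invented for the example.

```python
import pandas as pd

# Hypothetical orders table, made up for illustration.
orders = pd.DataFrame({
    "last_name": ["Smith", "Jones"],
    "email": ["smith@example.com", "jones@example.com"],
    "age": [40, 35],
})

ages = orders["age"]                     # single brackets: a Series
new_df = orders[["last_name", "email"]]  # list of names: a DataFrame
one_col_df = orders[["age"]]             # list with one name: still a DataFrame

print(type(ages).__name__)
print(type(new_df).__name__)
print(type(one_col_df).__name__)
```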

Select the rows where a logical condition is True:
select_request = inventory[(inventory.location == "Brooklyn") & (inventory.product_type == "seeds")]

Create a new column using a lambda function:
inventory['in_stock'] = inventory.apply(lambda row: True if row.quantity > 0 else False, axis=1)

Use axis=1 to run the function across a row instead of down a column.
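Here is a runnable sketch of the row-wise lambda on a made-up inventory table. It also shows a plain column comparison that gives the same result without apply; that second form is an alternative I am adding for comparison, not something the lambda approach requires.

```python
import pandas as pd

# Hypothetical inventory table, made up for illustration.
inventory = pd.DataFrame({
    "product_type": ["seeds", "seeds", "planter"],
    "location": ["Brooklyn", "Queens", "Brooklyn"],
    "quantity": [4, 0, 10],
})

# axis=1 passes each row to the lambda, so row.quantity is a single value.
inventory["in_stock"] = inventory.apply(lambda row: row.quantity > 0, axis=1)

# Alternative: compare the whole column at once.
inventory["in_stock_vectorized"] = inventory["quantity"] > 0

print(inventory)
```

For a simple comparison like this one, the column-wise form is usually shorter and faster; apply with axis=1 earns its place when the logic needs several values from the same row.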

You can select a subset of a DataFrame by using logical statements:
df[df.MyColumnName == desired_column_value]

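When combining conditions in pandas, | means “or” and & means “and”. Wrap each condition in its own parentheses, because | and & bind more tightly than comparisons such as == and >.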

We could use the isin() method to check that df.name is one of a list of values:
df[df.name.isin(['Levi', 'Christina', 'Logan'])]
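The filters above can be tried out on a small table. This is a sketch with invented names and ages; it uses bracket notation (df["age"]) for the columns, which behaves the same as the attribute style here.

```python
import pandas as pd

# Hypothetical people table, made up for illustration.
df = pd.DataFrame({
    "name": ["Levi", "Christina", "Logan", "Maya"],
    "age": [31, 27, 45, 19],
})

# A single condition.
over_30 = df[df["age"] > 30]

# "and": both conditions must hold. Note the parentheses around each one.
both = df[(df["age"] > 20) & (df["age"] < 40)]

# "or": either condition may hold.
either = df[(df["age"] < 20) | (df["age"] > 40)]

# isin(): the value must appear in the given list.
named = df[df["name"].isin(["Levi", "Christina", "Logan"])]

print(both)
print(either)
```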

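When we select a subset of a DataFrame using logic, we end up with non-consecutive indices. This is inelegant, and it is easy to trip over: the index labels no longer match the row positions, so label-based selection with loc and position-based selection with iloc can return different rows for the same number.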

If we use the command df.reset_index(), we get a new DataFrame with a new set of indices.

Note that old indices have been moved into a new column called ‘index’. Unless you need those values for something special, it’s probably good to use the keyword drop=True so you don’t end up with an extra column.
df.reset_index(drop=True)

Using reset_index() will return a new DataFrame, but we usually want to modify our existing DataFrame. If we use the keyword inplace=True, we can just modify our existing DataFrame.
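The three variants of reset_index() can be compared side by side. This is a sketch on made-up data; the .copy() call is my own addition so that the in-place change clearly applies to an independent DataFrame rather than to a slice of the original.

```python
import pandas as pd

# Hypothetical people table, made up for illustration.
df = pd.DataFrame({
    "name": ["Levi", "Christina", "Logan", "Maya"],
    "age": [31, 27, 45, 19],
})

# Filtering keeps the original labels, so the index is no longer consecutive.
subset = df[df["age"] > 30].copy()
original_labels = list(subset.index)

# Default: returns a new DataFrame, old labels moved into an 'index' column.
with_old = subset.reset_index()

# drop=True: returns a new DataFrame, old labels discarded.
clean = subset.reset_index(drop=True)

# inplace=True: modifies subset itself and returns None.
subset.reset_index(drop=True, inplace=True)

print(with_old)
print(subset)
```

Reassigning the result, as in subset = subset.reset_index(drop=True), has the same effect as inplace=True and is a common alternative.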
