How to handle Data, and Images(18) Pandas Usage
More in depth of some operations that Pandas offers and their usage
Lesson Notes in .ipynb file
How to handle Data, and Images(18) - Pandas Usage
Topics
- Data Frame Masking
- Data Frame Individual Operation(1)
- Data Frame Individual Operation(2)
- Data Frames Grouping(1)
- Data Frames Grouping(2)
- Data Frames Grouping(3)
- Data Frames Grouping(4)
- Data Frames Grouping(5)
- Data Frame’s Multiplexing
- Pivot Table
- Summary
Data Frame Masking
1
2
3
4
5
6
7
8
9
10
11
12
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(1,10,(2,2)), index=[0,1], columns=['a', 'b'])
print(df,'\n')
# from column 'a', return boolean value if it's less or equal to 5
print(df['a'] <= 5, '\n')
#from column 'a' less or equal to 5, and from column 'b', less or equal to 8
print(df.query('a <= 5 and b <= 8'))
Output:
1
2
3
4
5
6
7
8
9
10
a b
0 2 1
1 6 1
0 True
1 False
Name: a, dtype: bool
a b
0 2 1
Data Frame Individual Operation(1)
- pd.DataFrame.apply: Apply a function along an axis of the DataFrame.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
import pandas as pd
df = pd.DataFrame([[1, 2, 3, 4], [1, 2, 3, 4]], index=[0, 1], columns=["A", "B", "C", "D"])
print(df, '\n')
# add 1 to all values
df = df.apply(lambda x: x + 1)
print(df, '\n')
# adds 1 to x
def addOne(x):
return x + 1
df = df.apply(addOne)
print(df, '\n')
Output:
1
2
3
4
5
6
7
8
9
10
11
A B C D
0 1 2 3 4
1 1 2 3 4
A B C D
0 2 3 4 5
1 2 3 4 5
A B C D
0 3 4 5 6
1 3 4 5 6
Data Frame Individual Operation(2)
- pd.DataFrame.replace: Replace values given in to_replace with value.
1
2
3
4
5
6
7
8
9
10
11
12
import pandas as pd
df = pd.DataFrame([
['Apple', 'Apple', 'Carrot', 'Banana'],
['Durian', 'Banana', 'Apple', 'Carrot']],
index=[0, 1],
columns=["A", "B", "C", "D"])
print(df, '\n')
# replace 'Apple' with 'Airport'
df = df.replace({"Apple": "Airport"})
print(df)
Output:
1
2
3
4
5
6
7
A B C D
0 Apple Apple Carrot Banana
1 Durian Banana Apple Carrot
A B C D
0 Airport Airport Carrot Banana
1 Durian Banana Airport Carrot
Data Frames Grouping(1)
- pd.DataFrame.groupby: Group DataFrame using a mapper or by a Series of columns.
1
2
3
4
5
6
7
8
9
10
11
12
import pandas as pd
df = pd.DataFrame([
['Apple', 7, 'Fruit'],
['Banana', 3, 'Fruit'],
['Beef', 5, 'Meal'],
['Kimchi', 4, 'Meal']],
columns=["Name", "Frequency", "Type"])
print(df, '\n')
# group DataFrame by Type by summing 'Frequency'
print(df.groupby(['Type'])['Frequency'].sum())
Output:
1
2
3
4
5
6
7
8
9
10
Name Frequency Type
0 Apple 7 Fruit
1 Banana 3 Fruit
2 Beef 5 Meal
3 Kimchi 4 Meal
Type
Fruit 10
Meal 9
Name: Frequency, dtype: int64
Data Frames Grouping(2)
1
2
3
4
5
6
7
8
9
10
11
12
13
import numpy as np
import pandas as pd
df = pd.DataFrame([
['Apple', 7, 5, 'Fruit'],
['Banana', 3, 6, 'Fruit'],
['Beef', 5, 2, 'Meal'],
['Kimchi', 4, 8, 'Meal']],
columns=["Name", "Frequency", "Importance", "Type"])
print(df,'\n')
# after grouping by 'Type' get min, max, average of 'Frequency' and 'Importance'
print(df.groupby(["Type"]).aggregate({'Frequency': ['min', 'max', np.average], 'Importance': ['min', 'max', np.average]}))
Output:
1
2
3
4
5
6
7
8
9
10
11
Name Frequency Importance Type
0 Apple 7 5 Fruit
1 Banana 3 6 Fruit
2 Beef 5 2 Meal
3 Kimchi 4 8 Meal
Frequency Importance
min max average min max average
Type
Fruit 3 7 5.0 5 6 5.5
Meal 4 5 4.5 2 8 5.0
Data Frames Grouping(3)
- pd.DataFrame.filter: Subset the dataframe rows or columns according to the specified index labels.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
import pandas as pd
df = pd.DataFrame([
['Apple', 7, 5, 'Fruit'],
['Banana', 3, 6, 'Fruit'],
['Beef', 5, 2, 'Meal'],
['Kimchi', 4, 8, 'Meal']],
columns=["Name", "Frequency", "Importance", "Type"])
def my_filter(data):
return data["Frequency"].mean() >= 5
print(df, '\n')
# After grouping by 'Type' return only the ones where 'Frequency' mean is bigger or equal to 5
df = df.groupby("Type").filter(my_filter)
print(df)
Output:
1
2
3
4
5
6
7
8
9
Name Frequency Importance Type
0 Apple 7 5 Fruit
1 Banana 3 6 Fruit
2 Beef 5 2 Meal
3 Kimchi 4 8 Meal
Name Frequency Importance Type
0 Apple 7 5 Fruit
1 Banana 3 6 Fruit
Data Frames Grouping(4)
- pd.DataFrameGroupBy.get_group: Construct DataFrame from group with provided name
1
2
3
4
5
6
7
8
9
10
11
12
import pandas as pd
df = pd.DataFrame([
['Apple', 7, 5, 'Fruit'],
['Banana', 3, 6, 'Fruit'],
['Beef', 5, 2, 'Meal'],
['Kimchi', 4, 8, 'Meal']],
columns=["Name", "Frequency", "Importance", "Type"])
# after grouping 'Type', get_groups with 'Fruit' as Type
df = df.groupby("Type").get_group("Fruit")
print(df)
Output:
1
2
3
Name Frequency Importance Type
0 Apple 7 5 Fruit
1 Banana 3 6 Fruit
Data Frames Grouping(5)
1
2
3
4
5
6
7
8
9
10
11
12
import pandas as pd
df = pd.DataFrame([
['Apple', 7, 5, 'Fruit'],
['Banana', 3, 6, 'Fruit'],
['Beef', 5, 2, 'Meal'],
['Kimchi', 4, 8, 'Meal']],
columns=["Name", "Frequency", "Importance", "Type"])
# create a column 'Gap' where Gap = Frequency - Frequency.mean() from Type, then reset index (since we're using .apply)
df["Gap"] = df.groupby("Type")["Frequency"].apply(lambda x: x - x.mean()).reset_index(drop = True)
print(df)
Output:
1
2
3
4
5
Name Frequency Importance Type Gap
0 Apple 7 5 Fruit 2.0
1 Banana 3 6 Fruit -2.0
2 Beef 5 2 Meal 0.5
3 Kimchi 4 8 Meal -0.5
Data Frame’s Multiplexing
1
2
3
4
5
6
7
8
9
10
11
12
import numpy as np
import pandas as pd
df = pd.DataFrame(
np.random.randint(1, 10, (4, 4)),
index=[['1st', '1st', '2nd', '2nd'], ['attk', 'defend', 'attk', 'defend']],
columns=['1', '2', '3', '4']
)
print(df, '\n')
# get column '1', '2', from '2nd' as a row
print(df[["1", "2"]].loc["2nd"])
Output:
1
2
3
4
5
6
7
8
9
1 2 3 4
1st attk 5 8 6 6
defend 2 1 8 7
2nd attk 3 7 2 5
defend 8 8 2 8
1 2
attk 3 7
defend 8 8
Pivot Table
- pandas.pivot_table: Create a spreadsheet-style pivot table as a DataFrame.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
import numpy as np
import pandas as pd
df = pd.DataFrame([
['Apple', 7, 5, 'Fruit'],
['Banana', 3, 6, 'Fruit'],
['Coconut', 2, 6, 'Fruit'],
['Rice', 8, 2, 'Meal'],
['Beef', 5, 2, 'Meal'],
['Kimchi', 4, 8, 'Meal']],
columns=["Name", "Frequency", "Importance", "Type"])
print(df, '\n')
# from 'Importance', make a new Data Frame with column as 'Type' and get the max value from 'Frequency'
df = df.pivot_table(
index="Importance", columns="Type", values="Frequency",
aggfunc='max'
)
print(df)
Output:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
Name Frequency Importance Type
0 Apple 7 5 Fruit
1 Banana 3 6 Fruit
2 Coconut 2 6 Fruit
3 Rice 8 2 Meal
4 Beef 5 2 Meal
5 Kimchi 4 8 Meal
Type Fruit Meal
Importance
2 NaN 8.0
5 7.0 NaN
6 3.0 NaN
8 NaN 4.0
Summary
- pd.DataFrame.apply: Apply a function along an axis of the DataFrame.
- pd.DataFrame.replace: Replace values given in to_replace with value.
- pd.DataFrame.groupby: Group DataFrame using a mapper or by a Series of columns.
- pd.DataFrame.filter: Subset the dataframe rows or columns according to the specified index labels.
- pd.DataFrameGroupBy.get_group: Construct DataFrame from group with provided name
- pandas.pivot_table: Create a spreadsheet-style pivot table as a DataFrame.
This post is licensed under CC BY 4.0 by the author.