How to handle Data, and Images(17) Pandas Operation and Functions
Pandas Library's operation and functions
Lesson Notes in .ipynb file
Topics
- Dealing with Null values
- Series Operation
- Data Frame’s math operation
- Data Frame’s Sum
- Data Frame’s sorting function
- Summary
Dealing with Null values
- pd.DataFrame.notnull: Detect existing (non-missing) values.
- pd.DataFrame.isnull: Detect missing values.
- pd.DataFrame.fillna: Fill NA/NaN values using the specified method.
import pandas as pd
import numpy as np

price_dict = {
    'Apple': 1.5,
    'Banana': 2,
    'Carrot': np.nan,
    'Durian': 5
}
amount_dict = {
    'Apple': 3,
    'Banana': 2,
    'Carrot': 3,
    'Durian': 4
}

# convert dictionary into Pandas' Series
price = pd.Series(price_dict)
amount = pd.Series(amount_dict)

# Merge two Series into Data Frame
summary = pd.DataFrame({
    'Price(Dollar)': price,
    'Amount': amount
})
print(summary)
print(summary.notnull())
print(summary.isnull())

# fill the NaN value with 'No Data'
summary['Price(Dollar)'] = summary['Price(Dollar)'].fillna('No Data')
print(summary)
Output:
Price(Dollar) Amount
Apple 1.5 3
Banana 2.0 2
Carrot NaN 3
Durian 5.0 4
Price(Dollar) Amount
Apple True True
Banana True True
Carrot False True
Durian True True
Price(Dollar) Amount
Apple False False
Banana False False
Carrot True False
Durian False False
Price(Dollar) Amount
Apple 1.5 3
Banana 2.0 2
Carrot No Data 3
Durian 5.0 4
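Filling with a string such as 'No Data' turns the column into a generic object column, so numeric operations on it are no longer straightforward. As a minimal sketch, assuming the column should stay numeric, the missing price could instead be filled with a number such as 0 or the mean of the known prices. The names `filled_zero` and `filled_mean` are illustrative and not part of the lesson.

```python
import numpy as np
import pandas as pd

price = pd.Series({'Apple': 1.5, 'Banana': 2, 'Carrot': np.nan, 'Durian': 5})

# Filling with a number keeps the float64 dtype
filled_zero = price.fillna(0)

# mean() skips NaN, so this fills Carrot with the average of the known prices
filled_mean = price.fillna(price.mean())

print(filled_zero.dtype)                # float64
print(filled_mean['Carrot'])
print(price.fillna('No Data').dtype)    # object
```

Which fill value is appropriate depends on the data: 0 changes the totals, while the mean changes nothing about the average but invents a price that was never observed.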
Series Operation
import pandas as pd
arr1 = pd.Series([1,2,3], index=['A', 'B', 'C'])
arr2 = pd.Series([4,5,6], index=['B', 'C', 'D'])
# add values with matching index labels; a label missing on one side counts as 0
arr = arr1.add(arr2, fill_value=0)
print(arr)
Output:
A 1.0
B 6.0
C 8.0
D 6.0
dtype: float64
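For comparison, here is a small sketch of what the plain `+` operator does with the same two Series. Labels that exist on only one side ('A' and 'D') become NaN, which is why `fill_value=0` is passed to `add` in the example above. The names `plain` and `filled` are illustrative.

```python
import pandas as pd

arr1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
arr2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])

plain = arr1 + arr2                     # no fill value: unmatched labels give NaN
filled = arr1.add(arr2, fill_value=0)   # unmatched labels are treated as 0

print(plain)
print(filled)
```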
Data Frame’s math operation
| | Score1 | Score2 | | | Score1 | | | Score1 | Score2 |
|---|---|---|---|---|---|---|---|---|---|
| Tom | 8 | 5 | + | Tom | 5 | = | Tom | 13 | 5 |
| Bob | 9 | 7 | + | Bob | 7 | = | Bob | 16 | 7 |
| | | | + | Charles | 8 | = | Charles | 8 | Null |
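The table above can be reproduced with a short sketch. The scores are taken from the table; the variable names `left`, `right`, and `total` are illustrative, and `fill_value=0` is assumed because it yields the totals shown (a cell missing on one side counts as 0, while a cell missing on both sides stays Null/NaN).

```python
import pandas as pd

# Values taken from the table above; the variable names are illustrative
left = pd.DataFrame({'Score1': [8, 9], 'Score2': [5, 7]}, index=['Tom', 'Bob'])
right = pd.DataFrame({'Score1': [5, 7, 8]}, index=['Tom', 'Bob', 'Charles'])

# Rows and columns are aligned by label before adding
total = left.add(right, fill_value=0)
print(total)
```

Charles has no Score2 in either Data Frame, so that cell remains NaN even with a fill value.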
import pandas as pd
arr1 = pd.DataFrame([[1,2], [3,4]], index=['a', 'b'])
arr2 = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]], index = ['b', 'c', 'd'])
print(arr1,'\n')
print(arr2,'\n')
arr = arr1.add(arr2, fill_value=0)
print(arr)
Output:
0 1
a 1 2
b 3 4
0 1 2
b 1 2 3
c 4 5 6
d 7 8 9
0 1 2
a 1.0 2.0 NaN
b 4.0 6.0 3.0
c 4.0 5.0 6.0
d 7.0 8.0 9.0
Data Frame’s Sum
- pd.DataFrame.sum: Return the sum of the values over the requested axis.
import pandas as pd
arr1 = pd.DataFrame([[1,2], [3,4]], index=['a', 'b'])
arr2 = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]], index = ['b', 'c', 'd'])
arr = arr1.add(arr2, fill_value=0)
print(arr, '\n')
print('sum of column1:', arr[1].sum(), '\n')
print(arr.sum())
Output:
0 1 2
a 1.0 2.0 NaN
b 4.0 6.0 3.0
c 4.0 5.0 6.0
d 7.0 8.0 9.0
sum of column1: 21.0
0 16.0
1 21.0
2 18.0
dtype: float64
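The output above sums each column and silently skips the NaN in column 2. As a minimal sketch of the "requested axis" part, `axis=1` sums across each row instead, and `skipna=False` makes a missing value propagate into the result. The Data Frame below is rebuilt by hand from the printed output, and the names `row_sums` and `strict_col_sums` are illustrative.

```python
import numpy as np
import pandas as pd

# Same values as the printed Data Frame above
arr = pd.DataFrame(
    [[1.0, 2.0, np.nan], [4.0, 6.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    index=['a', 'b', 'c', 'd'],
)

row_sums = arr.sum(axis=1)                 # one total per row
strict_col_sums = arr.sum(skipna=False)    # a NaN in a column makes its sum NaN

print(row_sums)
print(strict_col_sums)
```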
Data Frame’s sorting function
- pd.DataFrame.sort_values: Sort by the values along either axis.
import pandas as pd
import numpy as np

price_dict = {
    'Apple': 1.5,
    'Banana': 2,
    'Carrot': 1,
    'Durian': 5
}
amount_dict = {
    'Apple': 3,
    'Banana': 2,
    'Carrot': 3,
    'Durian': 4
}

# convert dictionary into Pandas' Series
price = pd.Series(price_dict)
amount = pd.Series(amount_dict)

# Merge two Series into Data Frame
summary = pd.DataFrame({
    'Price(Dollar)': price,
    'Amount': amount
})
print(summary)

# sort rows by price, highest first
summary = summary.sort_values('Price(Dollar)', ascending=False)
print(summary)
Output:
Price(Dollar) Amount
Apple 1.5 3
Banana 2.0 2
Carrot 1.0 3
Durian 5.0 4
Price(Dollar) Amount
Durian 5.0 4
Banana 2.0 2
Apple 1.5 3
Carrot 1.0 3
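`sort_values` also accepts a list of columns, which helps when the first column has ties (Apple and Carrot both have an Amount of 3). The following is a small sketch using the same data; the name `by_amount` and the choice of tie-breaking column are illustrative.

```python
import pandas as pd

summary = pd.DataFrame(
    {'Price(Dollar)': [1.5, 2.0, 1.0, 5.0], 'Amount': [3, 2, 3, 4]},
    index=['Apple', 'Banana', 'Carrot', 'Durian'],
)

# Sort by Amount (largest first), then by price (smallest first) to break ties
by_amount = summary.sort_values(['Amount', 'Price(Dollar)'], ascending=[False, True])
print(by_amount)

# sort_values returns a new Data Frame; the original row order is unchanged
print(summary.index.tolist())
```

Because the result is a new object, the lesson's code assigns it back to `summary` to keep the sorted order.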
Summary
- pd.DataFrame.notnull: Detect existing (non-missing) values.
- pd.DataFrame.isnull: Detect missing values.
- pd.DataFrame.fillna: Fill NA/NaN values using the specified method.
- pd.DataFrame.sum: Return the sum of the values over the requested axis.
- pd.DataFrame.sort_values: Sort by the values along either axis.
This post is licensed under CC BY 4.0 by the author.