Pandas¶
Pandas is a data manipulation library in python, used for data analysis and cleaning.
It provides two primary structures:
- Series: one-dimension array like object.
- DataFrames: two-dimensional, size-mutable tabular data structure.
Install packages¶
You can use both pip or astral/uv.
Import packages¶
Usage¶
Series¶
Create series from List¶
Series:
0 1
1 2
2 3
3 4
4 5
dtype: int64
Create Series from Dict¶
Series:
a 1
b 2
dtype: int64
Working with Series indexes¶
Series:
a 10
b 20
c 30
dtype: int64
Data frames¶
Loading Data¶
Data frame from a dictionary of lists¶
Name | Age | |
---|---|---|
0 | Pedro | 33 |
1 | James | 27 |
2 | John | 52 |
Data frame from a list of dicts¶
Name | Age | |
---|---|---|
0 | Pedro | 33 |
1 | James | 27 |
2 | John | 52 |
Data frame from json¶
From json
string
Name | Age | |
---|---|---|
0 | Pedro | 33 |
Convert DataFrame back to json
'{"Name":{"0":"Pedro"},"Age":{"0":33}}'
'{"0":{"Name":"Pedro","Age":33}}'
'[{"Name":"Pedro","Age":33}]'
Dataframe attributes¶
Name object Age int64 dtype: object
Dataframe methods¶
DataFrame.head¶
Name | Age | |
---|---|---|
0 | Pedro | 33 |
1 | James | 27 |
DataFrame.info
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 3 non-null object
1 Age 3 non-null int64
dtypes: int64(1), object(1)
memory usage: 180.0+ bytes
DataFrame.describe¶
Age | |
---|---|
count | 3.000000 |
mean | 37.333333 |
std | 13.051181 |
min | 27.000000 |
25% | 30.000000 |
50% | 33.000000 |
75% | 42.500000 |
max | 52.000000 |
DataFrame.tail¶
Name | Age | |
---|---|---|
1 | James | 27 |
2 | John | 52 |
Renaming columns¶
Name | Age | |
---|---|---|
0 | Pedro | 33 |
1 | James | 27 |
2 | John | 52 |
Accessing a Series¶
0 Pedro
1 James
2 John
Name: Name, dtype: object
Accessing an row index¶
Name Pedro Age 33 Name: 0, dtype: object
Accessing a particular value¶
np.int64(27)
np.int64(52)
Dataframe manipulation¶
Name | Age | |
---|---|---|
0 | Pedro | 33 |
1 | James | 27 |
Adding a new column¶
Name | Age | City | |
---|---|---|---|
0 | Pedro | 33 | New York |
1 | James | 27 | Florida |
2 | John | 52 | Los Angeles |
Removing a column¶
Name | Age | |
---|---|---|
0 | Pedro | 33 |
1 | James | 27 |
2 | John | 52 |
Removing a row¶
Name | Age | City | |
---|---|---|---|
0 | Pedro | 33 | New York |
2 | John | 52 | Los Angeles |
Modify column value¶
Name | Age | City | |
---|---|---|---|
0 | Pedro | 34 | New York |
1 | James | 28 | Florida |
2 | John | 53 | Los Angeles |
Handling missing values¶
Name False Age True dtype: bool
Name 0 Age 1 dtype: int64
Name | Age | |
---|---|---|
1 | James | NaN |
Name | Age | |
---|---|---|
0 | Pedro | 33.0 |
1 | James | 0.0 |
2 | John | 52.0 |
Name | Age | Age_fill_NA | |
---|---|---|---|
0 | Pedro | 33.0 | 33.0 |
1 | James | NaN | 42.5 |
2 | John | 52.0 | 52.0 |
Data manipulation¶
Casting¶
Name | Age | |
---|---|---|
0 | Pedro | 33.0 |
1 | James | 27.0 |
2 | John | 52.0 |
Casting a DataFrame Series to integer
Name | Age | |
---|---|---|
0 | Pedro | 33 |
1 | James | 27 |
2 | John | 52 |
Applying function on column¶
Applying a lambda function to a DataFrame Series
Name | Age | BirthYear | |
---|---|---|---|
0 | Pedro | 33 | 1992 |
1 | James | 27 | 1998 |
2 | John | 52 | 1973 |
Aggregation¶
Make a mock dataset
order_id | product | region | price | quantity | order_date | |
---|---|---|---|---|---|---|
0 | 1 | Laptop | North | 655 | 5 | 2023-05-11 |
1 | 2 | Keyboard | East | 892 | 9 | 2023-05-11 |
2 | 3 | Phone | South | 1092 | 7 | 2023-06-23 |
3 | 4 | Keyboard | West | 1864 | 5 | 2023-05-08 |
4 | 5 | Phone | East | 1287 | 8 | 2023-06-08 |
5 | 6 | Mouse | North | 1417 | 1 | 2023-02-09 |
6 | 7 | Keyboard | West | 1299 | 2 | 2023-08-18 |
7 | 8 | Monitor | South | 282 | 4 | 2023-07-23 |
8 | 9 | Laptop | East | 236 | 8 | 2023-01-04 |
9 | 10 | Laptop | East | 844 | 1 | 2023-06-11 |
Group by a column and sum
product | quantity | |
---|---|---|
0 | Keyboard | 16 |
1 | Laptop | 14 |
2 | Monitor | 4 |
3 | Mouse | 1 |
4 | Phone | 15 |
Group by multiple columns and sum
region | product | quantity | |
---|---|---|---|
0 | East | Keyboard | 9 |
1 | East | Laptop | 9 |
2 | East | Phone | 8 |
3 | North | Laptop | 5 |
4 | North | Mouse | 1 |
5 | South | Monitor | 4 |
6 | South | Phone | 7 |
7 | West | Keyboard | 7 |
Aggregate multiple functions
product | mean | sum | count | |
---|---|---|---|---|
0 | Keyboard | 5.333333 | 16 | 3 |
1 | Laptop | 4.666667 | 14 | 3 |
2 | Monitor | 4.000000 | 4 | 1 |
3 | Mouse | 1.000000 | 1 | 1 |
4 | Phone | 7.500000 | 15 | 2 |
Merging and Joining DataFrames¶
Merging inner
\(A \cap B\)
Key | Value_x | Value_y | |
---|---|---|---|
0 | A | 1 | 4 |
1 | B | 2 | 5 |
Merging outer
\(A \cup B\)
Key | Value_x | Value_y | |
---|---|---|---|
0 | A | 1.0 | 4.0 |
1 | B | 2.0 | 5.0 |
2 | C | 3.0 | NaN |
3 | D | NaN | 6.0 |
Merging left
Key | Value_x | Value_y | |
---|---|---|---|
0 | A | 1 | 4.0 |
1 | B | 2 | 5.0 |
2 | C | 3 | NaN |
Merging right
Key | Value_x | Value_y | |
---|---|---|---|
0 | A | 1.0 | 4 |
1 | B | 2.0 | 5 |
2 | D | NaN | 6 |