The fillna method in Pandas is a powerful tool for handling missing values in a DataFrame. However, there are times when users may find that fillna doesn’t seem to work as expected. This blog post explores the reasons behind this issue, provides solutions, and offers best practices for using fillna effectively.
The fillna method in Pandas is used to fill missing values (NaN) in a DataFrame or a Series. It can replace these missing values with a scalar value, a dictionary, a Series, or another DataFrame. The method can operate on rows or columns, and it can also propagate non-null values forward or backward.
The basic syntax of fillna is as follows:
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
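value: The value to use for filling missing values. It can be a scalar, a dictionary, a Series, or a DataFrame.
method: The method to use for filling missing values. It can be 'ffill' (forward fill), 'bfill' (backward fill), or None. This parameter is deprecated since pandas 2.1; call DataFrame.ffill() or DataFrame.bfill() instead.
axis: The axis along which to fill missing values. It can be 0 or 'index' (rows), or 1 or 'columns' (columns).
inplace: If True, fill the missing values in the original DataFrame and return None. If False, return a new DataFrame with the missing values filled.
limit: With a fill method, the maximum number of consecutive missing values to fill. With a fill value, the maximum number of missing values to fill along the axis.
downcast: A dictionary mapping items to the dtypes to downcast to, where possible. This parameter is deprecated since pandas 2.2.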
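To make the limit parameter more concrete, here is a minimal sketch on a hypothetical Series containing a run of three missing values. The sample data and variable names are illustrative assumptions, not taken from the original examples.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three consecutive missing values.
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# Without a limit, the whole run is forward filled.
filled_all = s.ffill()

# With limit=1, only the first missing value of the run is forward filled.
filled_limited = s.ffill(limit=1)

# With a fill value, limit caps the total number of values that get filled.
filled_value_limited = s.fillna(0, limit=1)

print(filled_all.tolist())
print(filled_limited.tolist())
print(filled_value_limited.tolist())
```

If a fill appears to "stop early", a limit argument left over from earlier code is one thing worth checking.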
Here are some common ways to use fillna:
import pandas as pd
import numpy as np
# Create a sample DataFrame
data = {'A': [1, np.nan, 3], 'B': [4, 5, np.nan]}
df = pd.DataFrame(data)
# Fill missing values with 0
filled_df = df.fillna(0)
print(filled_df)
In this example, all missing values in the DataFrame are filled with 0.
# Fill missing values in column 'A' with 10 and in column 'B' with 20
fill_dict = {'A': 10, 'B': 20}
filled_df = df.fillna(fill_dict)
print(filled_df)
Here, we use a dictionary to specify different fill values for different columns.
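Since the value argument also accepts a Series, a common variation is to fill each column with its own mean. The sketch below assumes numeric columns and reuses the same sample data; column_means is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, np.nan]})

# df.mean() returns a Series indexed by column name, so fillna matches
# each mean to its column. Missing values are skipped when averaging.
column_means = df.mean()
filled_df = df.fillna(column_means)
print(filled_df)
```

The Series index has to match the column names. Columns without a matching entry are left unfilled.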
# Forward fill missing values
# (fillna(method='ffill') is deprecated since pandas 2.1; use ffill() instead)
filled_df = df.ffill()
print(filled_df)
Forward fill replaces each missing value with the previous non-null value in the same column.
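One situation where a forward fill can look like it "didn't work" is a missing value in the first row, which has no previous value to copy. The following sketch, using hypothetical sample data, compares forward fill, backward fill, and the two chained together.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value at the start and at the end of 'A'.
df = pd.DataFrame({'A': [np.nan, 2, np.nan], 'B': [4, np.nan, 6]})

# Forward fill: the leading NaN in 'A' has no previous value, so it stays.
forward = df.ffill()

# Backward fill: the trailing NaN in 'A' has no next value, so it stays.
backward = df.bfill()

# Chaining both directions fills the leading and trailing gaps.
combined = df.ffill().bfill()

print(forward)
print(backward)
print(combined)
```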
Why fillna Doesn’t Work
The fillna method returns a new DataFrame by default. If you don’t assign the result to a variable, the original DataFrame remains unchanged.
import pandas as pd
import numpy as np
data = {'A': [1, np.nan, 3], 'B': [4, 5, np.nan]}
df = pd.DataFrame(data)
# This won't change the original DataFrame
df.fillna(0)
print(df)
# This will change the DataFrame
df = df.fillna(0)
print(df)
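A quick way to confirm whether a fill actually took effect is to count the remaining missing values before and after. This is a small sketch; missing_before and missing_after are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, np.nan]})

# The result is discarded, so df still contains its missing values.
df.fillna(0)
missing_before = int(df.isna().sum().sum())

# Assigning the result back replaces df with the filled copy.
df = df.fillna(0)
missing_after = int(df.isna().sum().sum())

print(missing_before, missing_after)
```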
Using inplace Incorrectly
If you set inplace=True, the fillna method modifies the original DataFrame in place and returns None, so you shouldn’t assign the result to a variable.
import pandas as pd
import numpy as np
data = {'A': [1, np.nan, 3], 'B': [4, 5, np.nan]}
df = pd.DataFrame(data)
# Correct way to use inplace
df.fillna(0, inplace=True)
print(df)
# Incorrect way
df = df.fillna(0, inplace=True) # This will assign None to df
print(df)
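A related case is filling a single column. Calling fillna with inplace=True on a selected column, such as df['A'], may operate on a temporary object, and depending on the pandas version and copy-on-write settings the parent DataFrame may not be updated. The sketch below shows two patterns that avoid inplace entirely; the fill values 0 and -1 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, np.nan]})

# Option 1: fill the selected column and assign it back to the DataFrame.
df['A'] = df['A'].fillna(0)

# Option 2: pass a dictionary so that only the named column is filled.
df = df.fillna({'B': -1})

print(df)
```

Both patterns make the assignment explicit, so the result does not depend on whether the selection was a view or a copy.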
The fillna method only fills values that pandas recognizes as missing, such as NaN, None, and NaT. If your DataFrame marks missing data with placeholder strings (e.g., 'nan' or 'None'), you need to convert them to NaN first.
import pandas as pd
import numpy as np
data = {'A': [1, 'nan', 3], 'B': [4, 5, 'nan']}
df = pd.DataFrame(data)
# Convert 'nan' to NaN
df = df.replace('nan', np.nan)
# Now fill the missing values
filled_df = df.fillna(0)
print(filled_df)
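When the affected columns are meant to be numeric, an alternative to replace is pd.to_numeric with errors='coerce', which turns anything that cannot be parsed as a number into NaN and restores a numeric dtype. The sketch below assumes every column should end up numeric, which may not hold for your data.

```python
import pandas as pd

# The 'nan' strings force both columns to the object dtype.
df = pd.DataFrame({'A': [1, 'nan', 3], 'B': [4, 5, 'nan']})
print(df.dtypes)

# Coerce each column to numeric; unparseable entries become NaN.
cleaned = df.apply(pd.to_numeric, errors='coerce')
print(cleaned.dtypes)

# The placeholders are now real missing values, so fillna can fill them.
filled_df = cleaned.fillna(0)
print(filled_df)
```

Checking df.dtypes first is a useful habit: an unexpected object dtype on a numeric column often signals placeholder strings.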
The following complete example combines the steps above:
import pandas as pd
import numpy as np
# Create a sample DataFrame with non-NaN missing values
data = {'A': [1, 'nan', 3], 'B': [4, 5, 'nan']}
df = pd.DataFrame(data)
# Convert 'nan' to NaN
df = df.replace('nan', np.nan)
# Fill missing values with a scalar value
filled_df = df.fillna(0)
print("Filled with scalar value:")
print(filled_df)
# Fill missing values with a dictionary
fill_dict = {'A': 10, 'B': 20}
filled_df = df.fillna(fill_dict)
print("\nFilled with a dictionary:")
print(filled_df)
# Forward fill missing values
filled_df = df.ffill()
print("\nForward filled:")
print(filled_df)
Use inplace=True correctly. Otherwise, assign the result of fillna to a new variable.
The fillna method in Pandas is a versatile tool for handling missing values. However, it’s important to understand its behavior and potential pitfalls. By following the best practices and being aware of the common reasons why fillna may not work, you can effectively use this method to clean and preprocess your data.
Q1: Why doesn’t my DataFrame change after calling fillna?
A1: The fillna method returns a new DataFrame by default. You need to assign the result to a variable or use inplace=True to modify the original DataFrame.
Q2: Can fillna handle non-NaN missing values?
A2: No. fillna only fills values that pandas recognizes as missing, such as NaN, None, and NaT. You need to convert placeholder strings (e.g., 'nan', 'None') to NaN before using fillna.
Q3: What is the difference between inplace=True and inplace=False?
A3: If inplace=True, the fillna method modifies the original DataFrame in place and returns None. If inplace=False, it returns a new DataFrame with the missing values filled, leaving the original DataFrame unchanged.