Sorting is the process of arranging data in a particular order, such as ascending or descending. In Pandas, we can sort a DataFrame based on one or more columns. Sorting helps in quickly identifying patterns, such as the highest or lowest values in a dataset.
Filtering involves selecting a subset of data that meets certain criteria. We can use logical conditions to filter rows in a DataFrame. For example, we can keep only the rows where a particular column has a value greater than a certain number.
We can use the sort_values() method to sort a DataFrame by a single column. Here is an example:
import pandas as pd
# Create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 22, 30, 27]
}
df = pd.DataFrame(data)
# Sort the DataFrame by the 'Age' column in ascending order
sorted_df = df.sort_values(by='Age')
print(sorted_df)
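In this code, we first create a DataFrame with two columns: ‘Name’ and ‘Age’. Then we use the sort_values() method to sort the DataFrame by the ‘Age’ column in ascending order, which is the default. The method returns a new, sorted DataFrame and leaves the original unchanged.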
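Descending order, mentioned at the start of this section, works the same way. The following is a minimal sketch using the same sample data; the variable name oldest_first is only an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 22, 30, 27],
})

# ascending=False reverses the order, so the highest age comes first
oldest_first = df.sort_values(by='Age', ascending=False)
print(oldest_first)

# The original DataFrame keeps its row order because a new object was returned
print(df['Name'].tolist())  # ['Alice', 'Bob', 'Charlie', 'David']
```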
We can also sort by multiple columns. The following example sorts the DataFrame first by ‘Age’ and then by ‘Name’:
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 22, 30, 27]
}
df = pd.DataFrame(data)
# Sort the DataFrame by 'Age' and then by 'Name'
sorted_df = df.sort_values(by=['Age', 'Name'])
print(sorted_df)
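Because every age in the sample data is unique, the second sort key never gets to break a tie. The sketch below uses a slightly different, made-up dataset with repeated ages (the extra row and the changed ages are assumptions for illustration) and passes a list to ascending so that each column gets its own direction.

```python
import pandas as pd

# Hypothetical data with duplicate ages so the second key matters
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 22, 30, 25, 22],
})

# Age descending, then Name ascending within each group of equal ages
sorted_df = df.sort_values(by=['Age', 'Name'], ascending=[False, True])
print(sorted_df)
print(sorted_df['Name'].tolist())  # ['Charlie', 'Alice', 'David', 'Bob', 'Eve']
```

The ascending list must have the same length as the by list, one flag per column.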
We can filter rows based on a single condition. For example, to keep only the rows where the ‘Age’ is greater than 25:
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 22, 30, 27]
}
df = pd.DataFrame(data)
# Filter rows where 'Age' is greater than 25
filtered_df = df[df['Age'] > 25]
print(filtered_df)
In this code, we use the boolean expression df['Age'] > 25 inside the indexing operator [] to filter the DataFrame.
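It can help to look at the boolean expression on its own. The sketch below stores it in a variable (the name mask is just a convention, not something pandas requires) to show that it is a Series of True/False values, one per row, and that filtering keeps the rows marked True.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 22, 30, 27],
})

# The comparison is evaluated element-wise and yields a boolean Series
mask = df['Age'] > 25
print(mask.tolist())  # [False, False, True, True]

# Indexing with the mask keeps only rows where the mask is True
filtered_df = df[mask]
print(filtered_df)

# The surviving rows keep their original index labels
print(filtered_df.index.tolist())  # [2, 3]
```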
We can combine multiple conditions using logical operators such as & (and) and | (or). The following example filters rows where the ‘Age’ is greater than 25 and the ‘Name’ starts with ‘C’:
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 22, 30, 27]
}
df = pd.DataFrame(data)
# Filter rows where 'Age' > 25 and 'Name' starts with 'C'
filtered_df = df[(df['Age'] > 25) & (df['Name'].str.startswith('C'))]
print(filtered_df)
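The | operator works the same way as &, and ~ negates a condition. Here is a short sketch of both on the same sample data; the variable names either and not_over_25 are illustrative. Each condition is wrapped in parentheses because & and | bind more tightly than comparison operators such as > in Python.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 22, 30, 27],
})

# OR: rows where 'Age' is below 25 or 'Name' starts with 'C'
either = df[(df['Age'] < 25) | (df['Name'].str.startswith('C'))]
print(either['Name'].tolist())  # ['Bob', 'Charlie']

# NOT: rows where 'Age' is not greater than 25
not_over_25 = df[~(df['Age'] > 25)]
print(not_over_25['Name'].tolist())  # ['Alice', 'Bob']
```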
When dealing with large datasets, sorting and filtering can be memory-intensive. To avoid holding on to a second DataFrame, we can pass inplace=True to the sort_values() method, which updates the existing DataFrame instead of returning a new one. Note that pandas may still build a sorted copy internally, so this does not guarantee lower peak memory use:
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 22, 30, 27]
}
df = pd.DataFrame(data)
# Sort the DataFrame in-place
df.sort_values(by='Age', inplace=True)
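One detail that often trips people up: with inplace=True the call returns None, so its result should not be assigned back to a variable. The sketch below contrasts the in-place form with plain reassignment, which ends up with the same data; the names df_inplace and df_reassigned are illustrative.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 22, 30, 27],
}

# In-place: the DataFrame itself is modified and the call returns None
df_inplace = pd.DataFrame(data)
result = df_inplace.sort_values(by='Age', inplace=True)
print(result)  # None
print(df_inplace['Age'].tolist())  # [22, 25, 27, 30]

# Reassignment: bind the name to the sorted DataFrame that is returned
df_reassigned = pd.DataFrame(data)
df_reassigned = df_reassigned.sort_values(by='Age')
print(df_reassigned['Age'].tolist())  # [22, 25, 27, 30]
```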
We can combine sorting and filtering operations. For example, first filter the data and then sort the filtered data:
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 22, 30, 27]
}
df = pd.DataFrame(data)
# Filter rows where 'Age' > 25
filtered_df = df[df['Age'] > 25]
# Sort the filtered DataFrame by 'Age'
sorted_filtered_df = filtered_df.sort_values(by='Age')
print(sorted_filtered_df)
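The same filter-then-sort steps can also be written as a single chained expression. This is a sketch of one possible style, not the only way to do it; the reset_index(drop=True) call at the end is an optional addition that renumbers the rows from 0 after filtering and sorting.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 22, 30, 27],
})

# Filter first so that only the matching rows need to be sorted
sorted_filtered_df = (
    df[df['Age'] > 25]
    .sort_values(by='Age', ascending=False)
    .reset_index(drop=True)  # discard the old index labels
)
print(sorted_filtered_df)
print(sorted_filtered_df['Name'].tolist())  # ['Charlie', 'David']
```

Filtering before sorting means the sort only has to handle the rows that survive the filter, which matters more as the dataset grows.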
Instead of generic names such as df1 and df2, use names that describe the data, such as filtered_df or sorted_df.
Sorting and filtering data are essential tasks in data analysis, and Pandas provides powerful and flexible methods to perform these operations. By understanding the fundamental concepts, usage methods, common practices, and best practices, you can efficiently sort and filter your data using Pandas. Whether you are dealing with small or large datasets, Pandas can help you extract the relevant information and gain insights from your data.