Series and DataFrame, which allow users to efficiently handle and analyze structured data. Whether you’re working with small datasets for personal projects or large-scale enterprise data, mastering Pandas can significantly enhance your data analysis capabilities.
A Series in Pandas is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). It is similar to a column in a spreadsheet or a one-dimensional array.
import numpy as np
import pandas as pd
# Create a Series; np.nan marks a missing value
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)
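Every Series also carries an index of labels, which defaults to 0, 1, 2, and so on. Because np.nan is a float, a Series that contains it is stored as float64. The minimal sketch below gives the Series explicit string labels and contrasts label-based access (.loc) with position-based access (.iloc). The fruit names and prices are made up for illustration.

```python
import pandas as pd

# A Series with explicit string labels instead of the default 0..n-1 index
prices = pd.Series([2.5, 3.0, 4.25], index=['apple', 'banana', 'cherry'])

# Label-based access uses .loc; position-based access uses .iloc
print(prices.loc['banana'])  # 3.0
print(prices.iloc[0])        # 2.5

# Arithmetic is applied element-wise and the labels are kept
print(prices * 2)
```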
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or a SQL table.
import pandas as pd
import numpy as np
# Create a DataFrame from a dict that maps column names to lists of values
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Paris'],
}
df = pd.DataFrame(data)
print(df)
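Since each column of a DataFrame can hold a different type, it is often worth checking how Pandas stored the data. A short sketch using the standard inspection attributes dtypes, shape, and columns on the same sample data:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Paris'],
}
df = pd.DataFrame(data)

# Each column has its own dtype (Age is an integer column)
print(df.dtypes)
# (number of rows, number of columns)
print(df.shape)
# Column labels in order
print(list(df.columns))
# Selecting a single column returns a Series
print(type(df['Age']))
```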
Pandas can read data from many file formats, including CSV, Excel, and JSON.
# Read a CSV file
df = pd.read_csv('data.csv')
print(df.head())
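The snippet above assumes a data.csv file exists on disk. To try the same call without one, read_csv also accepts any file-like object, so the sketch below parses an in-memory string (its contents are made up for illustration) and then writes the result back out with to_csv. Other formats have their own readers, such as pd.read_excel and pd.read_json.

```python
import io
import pandas as pd

# Stand-in for a file on disk; the contents are illustrative only
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,London
Charlie,35,Paris
"""

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())

# With no path argument, to_csv returns the CSV text as a string
out = df.to_csv(index=False)
print(out)
```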
You can select specific columns or rows, or filter rows based on a condition.
# Select a single column
ages = df['Age']
print(ages)
# Filter data based on a condition
filtered_df = df[df['Age'] > 30]
print(filtered_df)
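Beyond single columns and single conditions, a few other selection patterns come up constantly: passing a list of column names, selecting rows by position with .iloc or by label with .loc, and combining conditions. A minimal sketch on the sample data; note that conditions are combined with & and | (not Python's and/or), and each one needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Paris'],
})

# Several columns at once: pass a list of names
subset = df[['Name', 'City']]

# Rows by position with .iloc; rows and columns by label/condition with .loc
first_row = df.iloc[0]
names_over_25 = df.loc[df['Age'] > 25, 'Name']

# Combine conditions with | (or) and & (and), parenthesizing each one
young_or_london = df[(df['Age'] < 30) | (df['City'] == 'London')]
print(young_or_london)
```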
You can also modify a DataFrame, for example by adding new columns or changing existing values.
# Add a new column
df['NewColumn'] = df['Age'] * 2
print(df)
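Adding a column is only one kind of modification. The sketch below, on made-up sample data, updates values for the rows that match a condition with .loc, then uses assign, rename, and drop, which return new DataFrames by default and leave the original untouched.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# Change values in place, only for rows matching the condition
df.loc[df['Name'] == 'Bob', 'Age'] = 31

# assign() returns a new DataFrame with the extra column; df is unchanged
df2 = df.assign(AgeInMonths=df['Age'] * 12)

# rename() and drop() also return new DataFrames by default
df3 = df2.rename(columns={'Age': 'Years'}).drop(columns=['AgeInMonths'])
print(df3)
```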
Missing values are common in real-world datasets. Pandas provides methods to detect them and to fill or drop them.
# Check for missing values
print(df.isnull().sum())
# Fill missing values with a specific value
df = df.fillna(0)
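Filling every column with 0 is not always appropriate, for instance in a text column or where 0 would distort an average. A sketch of two alternatives on a small made-up DataFrame: dropping incomplete rows with dropna, or passing fillna a dict that maps each column to its own fill value (here, the column mean for Age and a placeholder label for City).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [25, np.nan, 35],
    'City': ['New York', None, 'Paris'],
})

# Count of missing values per column
print(df.isnull().sum())

# Option 1: drop every row that contains a missing value
dropped = df.dropna()

# Option 2: fill each column with a value suited to it
filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})
print(filled)
```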
You can group data based on one or more columns and perform aggregation operations.
# Group by a column and calculate the mean
grouped = df.groupby('City')['Age'].mean()
print(grouped)
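A group-by is not limited to a single statistic. The sketch below uses a small, made-up sales table to compute several aggregations per group with agg, and then shows named aggregation, where each keyword argument becomes an output column built from a (column, function) pair.

```python
import pandas as pd

# Made-up sales records for illustration
sales = pd.DataFrame({
    'City': ['London', 'Paris', 'London', 'Paris', 'London'],
    'Amount': [100, 200, 150, 50, 250],
})

# Several aggregations of one column per group
summary = sales.groupby('City')['Amount'].agg(['sum', 'mean', 'count'])
print(summary)

# Named aggregation: output column name = (input column, function)
named = sales.groupby('City').agg(total=('Amount', 'sum'), orders=('Amount', 'count'))
print(named)
```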
You can combine multiple DataFrames using methods like merge and join.
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})
merged = pd.merge(df1, df2, on='key')
print(merged)
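By default, pd.merge performs an inner join, so the result above keeps only the keys present in both frames (B and C). The how parameter changes that, and DataFrame.join matches on the index rather than on a column. A sketch using the same two frames:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})

# Default how='inner': only keys found in both frames (B, C)
inner = pd.merge(df1, df2, on='key')

# how='left': every row of df1 is kept; unmatched value2 becomes NaN
left = pd.merge(df1, df2, on='key', how='left')

# how='outer': keys from either frame (A, B, C, D)
outer = pd.merge(df1, df2, on='key', how='outer')

# join() aligns on the index, so move 'key' into the index first
joined = df1.set_index('key').join(df2.set_index('key'), how='inner')
print(left)
```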
Use meaningful variable names and add comments to your code.
# This code reads a CSV file and prints the first few rows
data = pd.read_csv('data.csv')
print(data.head())
Use vectorized operations instead of loops whenever possible.
# Vectorized: adds 10 to every value in the column in a single operation
df['NewColumn'] = df['Age'] + 10
# Prefer this over a Python for-loop that processes one row at a time
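To make the contrast concrete, the sketch below computes the same column twice, once with an explicit Python loop and once with a vectorized expression, and checks that the results match. It also uses numpy.where to show that simple conditional logic can usually be vectorized too; the 'junior'/'senior' labels and the threshold of 30 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 35]})

# Loop version: correct, but runs in Python one value at a time
looped = []
for age in df['Age']:
    looped.append(age + 10)
df['LoopColumn'] = looped

# Vectorized version: one expression over the whole column
df['NewColumn'] = df['Age'] + 10

print(df['LoopColumn'].equals(df['NewColumn']))  # True

# Conditional logic without a loop: pick a label per row
df['Group'] = np.where(df['Age'] >= 30, 'senior', 'junior')
print(df)
```

On a three-row frame the difference is negligible, but the vectorized form avoids per-row Python overhead, which is what makes it preferable on large datasets.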
Python Pandas is a powerful library that simplifies data analysis tasks. By understanding its core data structures, common operations, and best practices, you can analyze and manipulate data efficiently. Whether you’re a beginner or an experienced data analyst, Pandas provides the tools you need to handle diverse datasets and gain valuable insights.