Pandas is an open-source Python library built on top of NumPy. It provides high-performance, easy-to-use data structures and data analysis tools. The two main data structures in Pandas are Series and DataFrame. A Series is a one-dimensional labeled array capable of holding any data type, while a DataFrame is a two-dimensional labeled data structure with columns of potentially different types.
import pandas as pd
import numpy as np
# Creating a Series
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print("Series:")
print(s)
# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print("\nDataFrame:")
print(df)
# Reading a CSV file (assumes example.csv exists in the working directory)
df = pd.read_csv('example.csv')
# Selecting columns
subset = df[['Name', 'Age']]
# Filtering rows
filtered_df = df[df['Age'] > 30]
# Adding a new column
df['Age_in_10_years'] = df['Age'] + 10
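Since example.csv is not included with this article, the same read, select, filter, and add-column steps can be tried on in-memory text instead. The following is a minimal sketch under that assumption: the CSV contents and the names csv_text, people, and older are made up for illustration, and io.StringIO simply stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for example.csv
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# read_csv accepts any file-like object, not only a path
people = pd.read_csv(io.StringIO(csv_text))

# Selecting columns: a list of names returns a new DataFrame
subset = people[['Name', 'Age']]

# Filtering rows: the boolean mask keeps rows where the condition holds
older = people[people['Age'] > 30]

# Adding a new column: the arithmetic is applied to the whole column at once
people['Age_in_10_years'] = people['Age'] + 10

print(subset)
print(older)
print(people)
```

Swapping io.StringIO(csv_text) for a real path should leave the remaining lines unchanged, provided the file has the same column names.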
import matplotlib.pyplot as plt
import seaborn as sns
# Using Matplotlib to create a simple line plot
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Line Plot')
plt.show()
# Using Seaborn to create a scatter plot (assumes df has numeric 'Age' and 'Income' columns)
sns.scatterplot(x='Age', y='Income', data=df)
plt.show()
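# Handling missing values (two alternative strategies; after dropna() there is nothing left to fill)
df = df.dropna()  # Option 1: drop rows with missing values
df = df.fillna(df.mean(numeric_only=True))  # Option 2: fill numeric gaps with column means (numeric_only skips text columns)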
# Removing duplicates
df = df.drop_duplicates()
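To see how the cleaning steps above differ in effect, it can help to run them on a tiny frame where the result is easy to check by eye. This is only an illustrative sketch: the data and the names raw, dropped, filled, and deduped are hypothetical, and each strategy is applied to the same starting frame so the outcomes can be compared side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing ages
raw = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Bob'],
    'Age': [25.0, np.nan, 35.0, np.nan],
})

# Strategy 1: discard every row that contains a missing value
dropped = raw.dropna()

# Strategy 2: keep all rows and fill numeric gaps with the column mean.
# numeric_only=True restricts the mean to numeric columns such as 'Age'.
filled = raw.fillna(raw.mean(numeric_only=True))

# After filling, the two 'Bob' rows are identical, so one is removed
deduped = filled.drop_duplicates()

print(dropped)
print(filled)
print(deduped)
```

Dropping is the simpler choice when only a few rows are affected, while filling preserves row count at the cost of introducing estimated values.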
# Grouping by a column and calculating the mean
grouped = df.groupby('City')['Age'].mean()
print(grouped)
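The grouping step above computes a single statistic per group. When several statistics are wanted at once, the agg method on the grouped column accepts a list of function names. The sketch below assumes a small made-up frame; the names people and summary are hypothetical.

```python
import pandas as pd

# Hypothetical data: two people per city
people = pd.DataFrame({
    'City': ['New York', 'Chicago', 'New York', 'Chicago'],
    'Age': [25, 30, 35, 40],
})

# One row per city, one column per requested statistic
summary = people.groupby('City')['Age'].agg(['mean', 'min', 'max'])

print(summary)
```

The result is a DataFrame indexed by City, so an individual value can be looked up with summary.loc['Chicago', 'mean'].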
Instead of using a for loop to add two columns in a DataFrame, use df['col1'] + df['col2'].
For larger workloads, consider dask, which is a parallel computing library that can scale Pandas operations.
Instead of a generic variable name like df1, use a name like customer_data.
The PyData ecosystem, with Pandas at its core, provides a rich set of tools for data analysis, manipulation, and visualization. By understanding the fundamental concepts, learning the usage methods, following common practices, and implementing best practices, you can efficiently use these tools to handle complex data-related tasks. Whether you are a beginner or an experienced data scientist, the PyData ecosystem offers a wide range of capabilities to meet your needs.
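As a closing illustration of the vectorization advice above, the sketch below adds two columns both ways so the results can be compared. The frame, its values, and the names frame, loop_total, and total are hypothetical; only the col1 and col2 column names come from the text.

```python
import pandas as pd

# Hypothetical data using the column names from the text
frame = pd.DataFrame({'col1': [1, 2, 3], 'col2': [10, 20, 30]})

# Loop version: one Python-level addition per row
loop_total = []
for a, b in zip(frame['col1'], frame['col2']):
    loop_total.append(a + b)

# Vectorized version: a single expression over whole columns
frame['total'] = frame['col1'] + frame['col2']

# Both approaches should agree on the values
print(loop_total)
print(frame['total'].tolist())
```

The two versions produce the same numbers; the vectorized form is shorter and generally faster on large frames because the work happens in compiled NumPy code rather than in the Python interpreter.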