A DataFrame consists of three main components: rows, columns, and values. Each column has a name (label), and each row has an index label. The values can have different data types, such as integers, floats, and strings.
The index in a DataFrame can be either a simple integer index or a custom index. It is used to identify and access rows in the DataFrame.
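As a minimal sketch of the difference between the default integer index and a custom one, the example below builds a small DataFrame with string labels. The data and the names `people`, `default_index`, `bob_age`, and `first_age` are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
people = pd.DataFrame(
    {"Age": [25, 30, 35]},
    index=["alice", "bob", "charlie"],  # custom string labels instead of 0, 1, 2
)

# Without an explicit index, pandas assigns integer labels starting at 0
default_index = pd.DataFrame({"Age": [25, 30, 35]}).index

bob_age = people.loc["bob", "Age"]  # label-based lookup through the custom index
first_age = people.iloc[0, 0]       # position-based lookup, independent of the labels

print(list(default_index))
print(list(people.index))
print(bob_age, first_age)
```

Here `.loc` looks rows up by label while `.iloc` looks them up by position, so the same row can be reached either way.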
Each column in a DataFrame is a Series. Columns can have different data types, and each one can be accessed and manipulated independently.
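To illustrate that a column behaves as its own Series with its own data type, here is a short sketch. The DataFrame `df_example` and its values are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df_example = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "Height": [1.65, 1.80],
})

age_column = df_example["Age"]
print(type(age_column))   # selecting one column returns a pandas Series
print(df_example.dtypes)  # each column reports its own dtype

# Updating one column leaves the other columns unchanged
df_example["Age"] = df_example["Age"] + 1
print(df_example)
```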
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
data = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago']
]
columns = ['Name', 'Age', 'City']
df = pd.DataFrame(data, columns=columns)
print(df)
# Select a single column
ages = df['Age']
print(ages)
# Select multiple columns
name_age = df[['Name', 'Age']]
print(name_age)
# Select a single row by its index label
first_row = df.loc[0]
print(first_row)
# Select a range of rows by label (.loc slices include both endpoints)
rows_1_to_2 = df.loc[1:2]
print(rows_1_to_2)
# Select rows where age is greater than 30
above_30 = df[df['Age'] > 30]
print(above_30)
# Add a new column
df['Country'] = ['USA', 'USA', 'USA']
print(df)
# Remove a column
df = df.drop('Country', axis=1)
print(df)
# Sort the DataFrame by age in ascending order
sorted_df = df.sort_values(by='Age')
print(sorted_df)
import numpy as np
# Create a DataFrame with missing values
data = {
    'Name': ['Alice', 'Bob', np.nan],
    'Age': [25, np.nan, 35]
}
df = pd.DataFrame(data)
# Check for missing values
print(df.isnull())
# Drop rows with missing values
df = df.dropna()
print(df)
data = {
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40]
}
df = pd.DataFrame(data)
# Group by category and calculate the sum
grouped = df.groupby('Category').sum()
print(grouped)
When the values fit in the smaller range, use np.int8 or np.int16 instead of np.int64 to reduce memory usage.
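The effect of a smaller integer type on memory can be sketched as follows. This is an illustrative example under stated assumptions, not a benchmark: the data and the names `values`, `small`, `bytes_int64`, and `bytes_int8` are hypothetical, and the range check is one possible safeguard before downcasting.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 small integers stored as 64-bit values
values = pd.DataFrame({"Value": np.arange(100, dtype=np.int64)})
bytes_int64 = values["Value"].memory_usage(index=False)

# Downcast only when every value fits in the target type's range
info = np.iinfo(np.int8)
if values["Value"].between(info.min, info.max).all():
    small = values["Value"].astype(np.int8)
else:
    small = values["Value"]  # keep the original dtype if values would overflow

bytes_int8 = small.memory_usage(index=False)
print(bytes_int64, bytes_int8)
```

Each int64 value takes 8 bytes and each int8 value takes 1 byte, so the column's data shrinks by a factor of eight here. `pd.to_numeric(series, downcast="integer")` is an alternative that picks the smallest integer type that fits.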
DataFrames in Pandas are a versatile and powerful data structure that can handle a wide range of data analysis and manipulation tasks. By understanding the fundamental concepts, learning how to create, select, and manipulate data, and following common and best practices, you can efficiently work with data using Pandas DataFrames. Whether you are dealing with small datasets or large-scale data, Pandas DataFrames provide a flexible and intuitive way to analyze and transform your data.