A Pandas DataFrame is a two - dimensional size - mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). It is similar to a spreadsheet or a SQL table. Each column in a DataFrame can have a different data type (e.g., integer, float, string).
import pandas as pd
# Create a simple DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35]
}
df = pd.DataFrame(data)
print(df)
In this example, we create a DataFrame with two columns Name
and Age
.
You can create a DataFrame from a one - dimensional list. In this case, each element of the list will become a row in a single - column DataFrame.
import pandas as pd
my_list = [10, 20, 30, 40]
df = pd.DataFrame(my_list)
print(df)
When using a two - dimensional list, each inner list represents a row in the DataFrame.
import pandas as pd
my_2d_list = [['Alice', 25], ['Bob', 30], ['Charlie', 35]]
df = pd.DataFrame(my_2d_list, columns=['Name', 'Age'])
print(df)
A common way to create a DataFrame is by using a dictionary where the keys are the column names and the values are lists of data.
import pandas as pd
data = {
'Fruits': ['Apple', 'Banana', 'Cherry'],
'Quantity': [5, 10, 15]
}
df = pd.DataFrame(data)
print(df)
You can also create a DataFrame from a list of dictionaries. Each dictionary in the list represents a row in the DataFrame.
import pandas as pd
data = [
{'Name': 'Alice', 'Age': 25},
{'Name': 'Bob', 'Age': 30},
{'Name': 'Charlie', 'Age': 35}
]
df = pd.DataFrame(data)
print(df)
Multi - index DataFrames allow you to have hierarchical indexing on rows or columns.
import pandas as pd
# Create a multi - index DataFrame
arrays = [
['bar', 'bar', 'baz', 'baz'],
['one', 'two', 'one', 'two']
]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], index=index, columns=['A', 'B'])
print(df)
Before creating a DataFrame, ensure that your data is clean and consistent. For example, if you are creating a DataFrame with numerical columns, make sure all values in those columns can be converted to numbers.
Use descriptive column names. This makes it easier to understand the data and perform operations on the DataFrame later.
If you are dealing with large datasets, consider the data types of your columns. Using appropriate data types (e.g., int8
instead of int64
if possible) can significantly reduce memory usage.
import pandas as pd
data = {
'SmallNumbers': [1, 2, 3, 4],
'BigNumbers': [1000, 2000, 3000, 4000]
}
df = pd.DataFrame(data)
df['SmallNumbers'] = df['SmallNumbers'].astype('int8')
print(df.info())
Creating Pandas DataFrames from scratch is a crucial skill for data analysis in Python. Whether you are working with simple lists or more complex multi - index structures, Pandas provides flexible ways to build custom DataFrames. By following common and best practices, you can ensure that your DataFrames are clean, efficient, and easy to work with.