🐼 Creating a Pandas DataFrame
A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled rows and columns, used to manipulate and analyze data. This tutorial shows several ways to create one.
1. Importing the Pandas library
Before creating a DataFrame, you need to import the pandas library. By convention, it is imported under the pd alias.
import pandas as pd
2. Creating a DataFrame from a Dictionary
One common way to create a DataFrame is from a dictionary of equal-length lists or NumPy arrays.
data = {
'name': ['John', 'Anna', 'Peter', 'Linda'],
'age': [28, 24, 35, 32],
'city': ['New York', 'Paris', 'Berlin', 'London']
}
df = pd.DataFrame(data)
df
 | name | age | city |
---|---|---|---|
0 | John | 28 | New York |
1 | Anna | 24 | Paris |
2 | Peter | 35 | Berlin |
3 | Linda | 32 | London |
The keys of the dictionary become the DataFrame column names.
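The "equal-length lists or NumPy arrays" requirement can be illustrated with a minimal sketch. The names `df_mixed` and `length_error` are illustrative and not part of the tutorial; the second half assumes only that pandas rejects columns of different lengths with a `ValueError`.

```python
import numpy as np
import pandas as pd

# Dictionary values may be lists or NumPy arrays, as long as the lengths match.
data = {
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': np.array([28, 24, 35, 32]),
}
df_mixed = pd.DataFrame(data)
print(df_mixed.columns.tolist())  # ['name', 'age']

# Columns of different lengths are rejected with a ValueError.
length_error = None
try:
    pd.DataFrame({'name': ['John', 'Anna'], 'age': [28]})
except ValueError as exc:
    length_error = exc
    print(f'ValueError: {exc}')
```

Checking lengths before building the DataFrame is usually easier than debugging the error afterwards.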
3. Creating a DataFrame from a List of Dictionaries
You can also create a DataFrame from a list of dictionaries.
data = [
{'name': 'John', 'age': 28, 'city': 'New York'},
{'name': 'Anna', 'age': 24, 'city': 'Paris'},
{'name': 'Peter', 'age': 35, 'city': 'Berlin'},
{'name': 'Linda', 'age': 32, 'city': 'London'}
]
df = pd.DataFrame(data)
df
 | name | age | city |
---|---|---|---|
0 | John | 28 | New York |
1 | Anna | 24 | Paris |
2 | Peter | 35 | Berlin |
3 | Linda | 32 | London |
Each item in the list corresponds to a row in the DataFrame.
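Because each dictionary is treated as one row, the dictionaries do not all need the same keys. The sketch below uses a hypothetical `records` list in which one entry lacks a 'city' key, and shows that pandas fills the gap with a missing value (NaN).

```python
import pandas as pd

# Hypothetical records: the second dictionary has no 'city' key.
records = [
    {'name': 'John', 'age': 28, 'city': 'New York'},
    {'name': 'Anna', 'age': 24},
]
df_records = pd.DataFrame(records)

# Columns are the union of all keys; missing entries become NaN.
print(df_records.columns.tolist())         # ['name', 'age', 'city']
print(df_records['city'].isna().tolist())  # [False, True]
```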
4. Creating a DataFrame from a List of Lists
You can also create a DataFrame from a list of lists, where each sublist is a row in the DataFrame.
data = [
['John', 28, 'New York'],
['Anna', 24, 'Paris'],
['Peter', 35, 'Berlin'],
['Linda', 32, 'London']
]
df = pd.DataFrame(data, columns=['name', 'age', 'city'])
df
 | name | age | city |
---|---|---|---|
0 | John | 28 | New York |
1 | Anna | 24 | Paris |
2 | Peter | 35 | Berlin |
3 | Linda | 32 | London |
Here, the columns parameter is used to specify the column names.
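To see why the columns parameter matters, the following sketch compares the same rows with and without it. The variable names `rows`, `df_default`, and `df_named` are illustrative only.

```python
import pandas as pd

rows = [
    ['John', 28, 'New York'],
    ['Anna', 24, 'Paris'],
]

# Without `columns`, pandas labels the columns with integers starting at 0.
df_default = pd.DataFrame(rows)
print(df_default.columns.tolist())  # [0, 1, 2]

# With `columns`, each position in a sublist gets a descriptive name.
df_named = pd.DataFrame(rows, columns=['name', 'age', 'city'])
print(df_named.columns.tolist())    # ['name', 'age', 'city']
```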
5. Creating a DataFrame from a NumPy Array
Numerical data is often stored in a NumPy array. You can pass a two-dimensional array directly to pd.DataFrame, where each row of the array becomes a row of the DataFrame.
import numpy as np
data = np.array([[1, 2, 3], [4, 5, 6]])
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
df
 | A | B | C |
---|---|---|---|
0 | 1 | 2 | 3 |
1 | 4 | 5 | 6 |
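As a small variation on the NumPy example, row labels can be supplied through the index parameter in the same way that column labels are supplied through columns. The labels 'row1' and 'row2' and the names `arr` and `df_arr` are made up for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6]])

# `index` labels the rows; `columns` labels the columns.
df_arr = pd.DataFrame(arr, columns=['A', 'B', 'C'], index=['row1', 'row2'])

print(df_arr.shape)             # (2, 3)
print(df_arr.loc['row2', 'B'])  # 5

# Every column inherits the dtype of the source array.
print((df_arr.dtypes == arr.dtype).all())  # True
```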
6. Creating a DataFrame from a CSV File
6.1 Write DataFrame to CSV
df.to_csv('data.csv', index=False)
6.2 Read CSV to DataFrame
df = pd.read_csv('data.csv')
df
 | A | B | C |
---|---|---|---|
0 | 1 | 2 | 3 |
1 | 4 | 5 | 6 |
Replace 'data.csv' with the path to the CSV file you want to read or write. If the file is in the current working directory (usually the directory of your Python script or Jupyter notebook), the file name alone is enough. If it is located elsewhere, provide a relative or absolute path to the file.
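The write-then-read round trip can be sketched end to end as follows. To avoid leaving files behind, this version writes to a temporary directory and builds the path with pathlib; both choices are assumptions of this example, not part of the tutorial. It also shows what index=False changes: without it, the row index is written as an extra unnamed column.

```python
import tempfile
from pathlib import Path

import pandas as pd

df_out = pd.DataFrame({'A': [1, 4], 'B': [2, 5], 'C': [3, 6]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a full path instead of relying on the working directory.
    csv_path = Path(tmp_dir) / 'data.csv'

    # index=False: only the data columns are written.
    df_out.to_csv(csv_path, index=False)
    df_in = pd.read_csv(csv_path)

    # Default (index=True): the row index is written as a first column.
    df_out.to_csv(csv_path)
    df_with_index = pd.read_csv(csv_path)

print(df_in.columns.tolist())          # ['A', 'B', 'C']
print(df_with_index.columns.tolist())  # ['Unnamed: 0', 'A', 'B', 'C']
```

If a file was already saved with its index, passing index_col=0 to pd.read_csv restores that column as the index instead of loading it as data.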
This tutorial has covered several ways to create a DataFrame, but these are by no means the only ways. Other methods exist, such as creating a DataFrame from a SQL query, but those are beyond the scope of this basic tutorial. The method you choose will depend on the structure and format of your data.