🐼 Pandas Data Indexing
Indexing is a technique in pandas that allows you to access or modify the data in your DataFrame or Series. This tutorial will walk you through the basics of indexing in pandas.
1. Importing the Pandas library
First, import the pandas library, conventionally under the alias pd.
import pandas as pd
2. Creating a DataFrame
Let's create a simple DataFrame for this tutorial:
data = {
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London']
}
df = pd.DataFrame(data)
df
| name | age | city |
---|---|---|---|
0 | John | 28 | New York |
1 | Anna | 24 | Paris |
2 | Peter | 35 | Berlin |
3 | Linda | 32 | London |
3. Column Indexing
You can index a column in a DataFrame using its column label with a syntax similar to dictionary key indexing:
df['name']
0     John
1     Anna
2    Peter
3    Linda
Name: name, dtype: object
You can also use the . (dot) operator. It is more convenient, but it only works when the column name is a valid Python identifier (no spaces or special characters) and does not clash with an existing DataFrame attribute or method:
df.name
0     John
1     Anna
2    Peter
3    Linda
Name: name, dtype: object
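To illustrate the two limits of dot access mentioned above, here is a minimal sketch. The frame `people` and its column names `first name` and `count` are hypothetical, chosen only because one contains a space and the other shares its name with the `DataFrame.count` method.

```python
import pandas as pd

# Hypothetical frame: 'first name' has a space, 'count' shadows a DataFrame method
people = pd.DataFrame({
    'first name': ['John', 'Anna'],
    'count': [3, 5],
})

# Bracket indexing works for any column label
first_names = people['first name']
counts = people['count']
print(list(counts))            # [3, 5]

# Dot access finds the DataFrame.count method, not the 'count' column
print(callable(people.count))  # True
```

Bracket indexing is therefore the safer default in reusable code; dot access is mostly a convenience for interactive work.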
4. Row Indexing
For row indexing, you use the loc and iloc indexers.
loc is a label-based selection method: you pass the labels of the rows or columns you want to select. When you pass a slice, loc includes the end label, unlike iloc.
df.loc[0] # returns the row labeled 0 (here, the first row)
name        John
age           28
city    New York
Name: 0, dtype: object
iloc is an integer-position-based selection method: you pass integer positions to select specific rows or columns. When you pass a slice, iloc excludes the end position, unlike loc.
df.iloc[0] # returns the first row
name        John
age           28
city    New York
Name: 0, dtype: object
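Note that the two commands above return the same result because the DataFrame uses the default integer index, so each row's label equals its position. If the DataFrame used string labels for its index, the two would no longer be interchangeable: loc would expect those labels, while iloc would still take integer positions.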
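The difference in slice endpoints, and the way loc and iloc diverge once the index holds strings, can be sketched as follows. The names `people` and `relabeled` are illustrative and not part of the tutorial's own code.

```python
import pandas as pd

# Illustrative frame with the default integer index
people = pd.DataFrame({
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
})

label_slice = people.loc[0:2]      # labels 0, 1 and 2: end label included
position_slice = people.iloc[0:2]  # positions 0 and 1: end position excluded
print(len(label_slice), len(position_slice))  # 3 2

# With string labels, loc takes labels while iloc still takes positions
relabeled = people.set_index('name')
print(relabeled.loc['Anna', 'age'])  # 24, looked up by label
print(relabeled.iloc[1, 0])          # 24, looked up by position
```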
5. Multi-axis Indexing
You can use loc and iloc to index both rows and columns at the same time. The syntax is df.loc[row_indexer, col_indexer], where each indexer can be a single label, a list of labels, or a slice.
df.loc[0, 'name'] # returns the name of the first person
'John'
With iloc, you'd use integer positions for both the row and column:
df.iloc[0, 0] # returns the name of the first person
'John'
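Both indexers also accept slices and lists on either axis, not only single values. The sketch below selects the same block of cells twice, once by label and once by position; the variable names are illustrative.

```python
import pandas as pd

people = pd.DataFrame({
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London'],
})

# Rows labeled 1 through 2 (end included), two columns chosen by label
subset_by_label = people.loc[1:2, ['name', 'city']]

# The same cells by position: rows 1 and 2 (end excluded), columns 0 and 2
subset_by_position = people.iloc[1:3, [0, 2]]

print(subset_by_label.equals(subset_by_position))  # True
```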
6. Boolean Indexing
You can also use a condition to index data. This is very useful when you want to filter rows that satisfy a certain condition:
df[df.age > 30] # returns people older than 30
| name | age | city |
---|---|---|---|
2 | Peter | 35 | Berlin |
3 | Linda | 32 | London |
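Conditions can be combined, and a boolean mask can also be passed to loc together with a column selection. A minimal sketch, using an illustrative copy of the tutorial's data named `people`:

```python
import pandas as pd

people = pd.DataFrame({
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London'],
})

# Wrap each condition in parentheses; combine with & (and), | (or), ~ (not)
over_25_not_berlin = people[(people['age'] > 25) & (people['city'] != 'Berlin')]
print(list(over_25_not_berlin['name']))  # ['John', 'Linda']

# A boolean mask selects rows in loc; the second argument selects columns
names_over_30 = people.loc[people['age'] > 30, 'name']
print(list(names_over_30))               # ['Peter', 'Linda']
```

The parentheses matter because & and | bind more tightly than comparison operators in Python.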
7. Setting Indices
DataFrame has a set_index method, which takes a column name (or a list of column names) to use as the index of the DataFrame:
df.set_index('name', inplace=True)
df
| age | city |
---|---|---|
name | | |
John | 28 | New York |
Anna | 24 | Paris |
Peter | 35 | Berlin |
Linda | 32 | London |
Now you can use loc with people's names:
df.loc['John']
age           28
city    New York
Name: John, dtype: object
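As an alternative to inplace=True, set_index can be called without it, in which case it returns a new DataFrame and leaves the original untouched; reset_index reverses the operation. A short sketch with illustrative variable names:

```python
import pandas as pd

people = pd.DataFrame({
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London'],
})

# Without inplace=True, set_index returns a new frame; people is unchanged
indexed = people.set_index('name')
print('name' in people.columns)     # True
print(indexed.loc['Anna', 'city'])  # Paris

# reset_index moves 'name' back into an ordinary column
restored = indexed.reset_index()
print(list(restored.columns))       # ['name', 'age', 'city']
```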
This concludes our brief tutorial on the basics of pandas indexing. With these indexing techniques, you can effectively manipulate and access data in your DataFrame.