
DataFrame: how loc and iloc differ in Python

The difference between pandas loc and iloc: how to use each correctly

The difference between pandas loc and iloc lies in how they access data in a DataFrame. loc accesses rows and columns by label (name). For example:

import pandas as pd

# Sample data for illustration; the default integer index gives row labels 0, 1, 2
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Access a row by its label
row = df.loc[1]
print(row)

# Access an element by row label and column label
element = df.loc[2, 'B']
print(element)

iloc accesses rows and columns by integer position, counting from zero. For example:

# Access a row by its integer position
row = df.iloc[1]
print(row)

# Access an element by row position and column position
element = df.iloc[2, 1]
print(element)

The key difference is that loc uses labels (names), while iloc uses integer positions.

Detailed answer

Pandas loc and iloc: what is the difference?

Pandas is a powerful data analysis library that provides convenient tools for manipulating and processing tabular data. Its two main indexers for accessing data are loc and iloc. This article looks at how they differ and how to use each one correctly.

The loc indexer

loc accesses data by the labels of rows and columns. It selects data by label rather than by position. Consider an example:

import pandas as pd

# Sample data for illustration: three columns, three rows
data = {'Name': ['Alice', 'Bob', 'Carol'],
        'Age': [25, 30, 35],
        'City': ['Moscow', 'Kazan', 'Omsk']}
df = pd.DataFrame(data)
print(df.loc[0])  # Prints the row labelled 0 (the first row)

In this example we created a DataFrame with three columns (Name, Age, City) and three rows. With loc we access the row whose index label is 0, which is the first row of the DataFrame.

The iloc indexer

iloc, unlike loc, accesses data by position (integer offsets counted from zero). It selects data by where it sits in the DataFrame. Consider an example:

import pandas as pd

# Sample data for illustration: three columns, three rows
data = {'Name': ['Alice', 'Bob', 'Carol'],
        'Age': [25, 30, 35],
        'City': ['Moscow', 'Kazan', 'Omsk']}
df = pd.DataFrame(data)
print(df.iloc[0])  # Prints the row at position 0 (the first row)

Here we again created a DataFrame with three columns and three rows. With iloc we access the row at position 0, that is, the first row of the DataFrame.

Differences between loc and iloc

The main difference between loc and iloc is how they select data. With loc we pass row and column labels; with iloc we pass integer positions. A second difference concerns slices: a loc slice is a range of labels and includes its end label, while an iloc slice is a range of positions and excludes its end position. Consider an example:

import pandas as pd

# Sample data for illustration: three columns, three rows
data = {'Name': ['Alice', 'Bob', 'Carol'],
        'Age': [25, 30, 35],
        'City': ['Moscow', 'Kazan', 'Omsk']}
df = pd.DataFrame(data)
print(df.loc[0:1])   # Prints the first two rows (labels 0 and 1)
print(df.iloc[0:1])  # Prints only the first row (position 0)

In the first call we used loc with the label range 0 to 1, and it returned the first two rows of the DataFrame. In the second call we used iloc with the position range 0 to 1, and it returned only the first row.

Conclusion

This article covered the difference between loc and iloc in pandas. loc accesses data by label, and iloc accesses data by integer position. Each has its own behavior and use cases, and knowing both helps you manipulate data in pandas effectively.
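The examples above use the default index, where labels and positions happen to coincide. A minimal sketch with a hypothetical DataFrame whose labels (10, 20, 30) differ from the positions may make the distinction easier to see; the names and values are illustrative assumptions, not data from the article.

```python
import pandas as pd

# Hypothetical data: the index labels deliberately differ from row positions.
df = pd.DataFrame(
    {"Name": ["Anna", "Boris", "Clara"], "Age": [28, 35, 41]},
    index=[10, 20, 30],
)

by_label = df.loc[10]        # the row whose label is 10
by_position = df.iloc[0]     # the row at position 0 (the same row here)

label_slice = df.loc[10:20]  # label slice: both endpoints are included
pos_slice = df.iloc[0:1]     # position slice: the end position is excluded

print(by_label.equals(by_position))
print(len(label_slice), len(pos_slice))
```

With this index, df.loc[0] would raise a KeyError because no row carries the label 0, while df.iloc[0] still returns the first row.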

Pandas loc vs. iloc: what is the difference?

When it comes to selecting rows and columns of a pandas DataFrame, loc and iloc are two commonly used indexers.

Here is the subtle difference between the two:

  • loc selects rows and columns with specific labels
  • iloc selects rows and columns at specific integer positions

The following examples show how to use each one in practice.

Example 1: How to use loc in pandas

Suppose we have the following pandas DataFrame:

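import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [5, 7, 7, 9, 12, 9, 9, 4],
                   'assists': [11, 8, 10, 6, 6, 5, 9, 12]},
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])

#view DataFrame
df

  team  points  assists
A    A       5       11
B    A       7        8
C    A       7       10
D    A       9        6
E    B      12        6
F    B       9        5
G    B       9        9
H    B       4       12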

We can use loc to select specific rows of the DataFrame by their index labels:

#select rows with index labels 'E' and 'F'
df.loc[['E', 'F']]

  team  points  assists
E    B      12        6
F    B       9        5

We can use loc to select specific rows and specific columns of the DataFrame by their labels:

#select 'E' and 'F' rows and 'team' and 'assists' columns
df.loc[['E', 'F'], ['team', 'assists']]

  team  assists
E    B        6
F    B        5

We can use loc with the : argument to select ranges of rows and columns by their labels:

#select rows from 'E' onward and columns up to and including 'assists'
df.loc['E':, :'assists']

  team  points  assists
E    B      12        6
F    B       9        5
G    B       9        9
H    B       4       12

Example 2: How to use iloc in pandas

Suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [5, 7, 7, 9, 12, 9, 9, 4],
                   'assists': [11, 8, 10, 6, 6, 5, 9, 12]},
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])

#view DataFrame
df

  team  points  assists
A    A       5       11
B    A       7        8
C    A       7       10
D    A       9        6
E    B      12        6
F    B       9        5
G    B       9        9
H    B       4       12

We can use iloc to select specific rows of the DataFrame by their integer position:

#select rows in index positions 4 through 6 (not including 6)
df.iloc[4:6]

  team  points  assists
E    B      12        6
F    B       9        5

We can use iloc to select specific rows and specific columns of the DataFrame by their integer positions:

#select rows in range 4 through 6 and columns in range 0 through 2
df.iloc[4:6, 0:2]

  team  points
E    B      12
F    B       9

We can use iloc with the : argument to select ranges of rows and columns by their positions:

#select rows from 4 through end of rows and columns up to third column
df.iloc[4:, :3]

  team  points  assists
E    B      12        6
F    B       9        5
G    B       9        9
H    B       4       12
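As a quick cross-check of the two examples, the sketch below rebuilds the same DataFrame and compares a label-based selection with the matching position-based one. The variable names are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "team": ["A", "A", "A", "A", "B", "B", "B", "B"],
        "points": [5, 7, 7, 9, 12, 9, 9, 4],
        "assists": [11, 8, 10, 6, 6, 5, 9, 12],
    },
    index=list("ABCDEFGH"),
)

# Label slices include both endpoints ...
by_label = df.loc["E":"F", "team":"points"]
# ... while position slices stop before the end position.
by_position = df.iloc[4:6, 0:2]

same = by_label.equals(by_position)
print(same)
```

The label slice "E":"F" and the position slice 4:6 pick the same two rows only because the end of the position slice is one past the last row wanted.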


Pandas loc vs iloc


  1. Select Particular Value From DataFrame Specifying Index and Column Label Using .loc() Method
  2. Select Particular Columns From the DataFrame Using the .loc() Method
  3. Filter Rows by Applying Condition to Columns Using .loc() Method
  4. Filter Rows With Indices Using iloc
  5. Filter Particular Rows and Columns From the DataFrame
  6. Filter Range of Rows and Columns From DataFrame Using iloc
  7. Pandas loc vs iloc

This tutorial explains how we can filter data from a Pandas DataFrame using loc and iloc in Python. To filter entries from the DataFrame using iloc we use the integer index for rows and columns, and to filter entries from the DataFrame using loc , we use row and column names.

To demonstrate data filtering using loc , we will use the DataFrame described in the following example.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print(student_df)

Output:

        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Select Particular Value From DataFrame Specifying Index and Column Label Using .loc() Method

We can pass an index label and column label as an argument to the .loc() method to extract the value corresponding to the given index and column label.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("The Grade of student with Roll No. 504 is:")
value = student_df.loc[504, "Grade"]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

The Grade of student with Roll No. 504 is:
A-

It selects the value in the DataFrame with index label as 504 and column label Grade . The first argument to the .loc() method represents the index name, while the second argument refers to the column name.

Select Particular Columns From the DataFrame Using the .loc() Method

We can also filter the required columns from the DataFrame using the .loc() method. We pass the list of required column names as a second argument to the .loc() method to filter specified columns.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("The name and age of students in the DataFrame are:")
value = student_df.loc[:, ["Name", "Age"]]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

The name and age of students in the DataFrame are:
        Name  Age
501    Alice   17
502   Steven   20
503  Neesham   18
504    Chris   21
505    Alice   15

The first argument to .loc is : , which denotes all the rows in the DataFrame. Similarly, we pass ["Name", "Age"] as the second argument, which selects only the Name and Age columns from the DataFrame.

Filter Rows by Applying Condition to Columns Using .loc() Method

We can also filter rows satisfying the specified condition for column values using the .loc() method.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Students with Grade A are:")
value = student_df.loc[student_df.Grade == "A"]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Students with Grade A are:
      Name  Age      City Grade
501  Alice   17  New York     A
505  Alice   15    Austin     A

It selects all the students in the DataFrame with grade A .

Filter Rows With Indices Using iloc

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("2nd and 3rd rows in the DataFrame:")
filtered_rows = student_df.iloc[[1, 2]]
print(filtered_rows)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

2nd and 3rd rows in the DataFrame:
        Name  Age      City Grade
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+

It filters the second and third rows from the DataFrame.

We pass the rows’ integer index as an argument to the iloc method to filter rows from the DataFrame. Here, the integer index for the second and third rows are 1 and 2 respectively, as the index starts from 0 .

Filter Particular Rows and Columns From the DataFrame

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame:")
filtered_values = student_df.iloc[[1, 2, 3], [0, 3]]
print(filtered_values)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame:
        Name Grade
502   Steven    B-
503  Neesham    B+
504    Chris    A-

It filters the first and last column i.e. Name and Grade of the second, third and fourth row from the DataFrame. We pass the list with integer indices of the row as the first argument and the list with integer indices of the column as the second argument to the iloc method.

Filter Range of Rows and Columns From DataFrame Using iloc

To filter the range of rows and columns, we can use list slicing and pass the slices for each row and column as an argument to the iloc method.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame:")
filtered_values = student_df.iloc[1:4, 0:2]
print(filtered_values)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame:
        Name  Age
502   Steven   20
503  Neesham   18
504    Chris   21

It selects the second, third and fourth rows and the first and second columns from the DataFrame. 1:4 represents the rows with an index ranging from 1 to 3 and 4 is exclusive in the range. Similarly, 0:2 represents columns with an index ranging from 0 to 1 .

Pandas loc vs iloc

To select rows and columns from the DataFrame using loc , we pass the labels of the rows and columns we want to keep. To select them using iloc , we pass their integer positions instead. Both are indexers used with square brackets rather than called like methods.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame using loc:")
loc_filtered_values = student_df.loc[[502, 503, 504], ["Name", "Age"]]
print(loc_filtered_values)
print("")
print("Filtered values from the DataFrame using iloc:")
iloc_filtered_values = student_df.iloc[[1, 2, 3], [0, 3]]
print(iloc_filtered_values)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame using loc:
        Name  Age
502   Steven   20
503  Neesham   18
504    Chris   21

Filtered values from the DataFrame using iloc:
        Name Grade
502   Steven    B-
503  Neesham    B+
504    Chris    A-

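It shows how to select the same rows from the DataFrame using loc and iloc . Note that the column selections differ in this example: loc picks Name and Age, while iloc picks positions 0 and 3 (Name and Grade). Passing [0, 1] as the column positions to iloc would match the loc selection exactly.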
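To get exactly the same values from both indexers, one option is to translate the labels into positions first. The sketch below assumes the same student_df as above and uses the get_indexer method of the row index and the column index; the variable names are illustrative.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# Translate labels into integer positions.
row_pos = student_df.index.get_indexer([502, 503, 504])      # positions 1, 2, 3
col_pos = student_df.columns.get_indexer(["Name", "Age"])    # positions 0, 1

via_loc = student_df.loc[[502, 503, 504], ["Name", "Age"]]
via_iloc = student_df.iloc[row_pos, col_pos]

print(via_loc.equals(via_iloc))
```

This keeps the label list as the single source of truth, so the two selections cannot drift apart the way hard-coded positions can.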

Suraj Joshi is a backend software engineer at Matrice.ai.


How are iloc and loc different?

Can someone explain how these two methods of slicing are different? I’ve seen the docs and I’ve seen previous similar questions (1, 2), but I still find myself unable to understand how they are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing. For example, say we want to get the first five rows of a DataFrame . How is it that these two work?

df.loc[:5]
df.iloc[:5]

Can someone present cases where the distinction in uses are clearer? Once upon a time, I also wanted to know how these two functions differed from df.ix[:5] but ix has been removed from pandas 1.0, so I don’t care anymore.

asked Jul 23, 2015 at 16:34
Note that ix is now planned for deprecation: github.com/pandas-dev/pandas/issues/14218
Commented Dec 20, 2016 at 17:57

7 Answers

Label vs. Location

The main distinction between the two methods is:

  • loc gets rows (and/or columns) with particular labels.
  • iloc gets rows (and/or columns) at integer locations.

To demonstrate, consider a series s of characters with a non-monotonic integer index:

>>> s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
>>> s
49    a
48    b
47    c
0     d
1     e
2     f

>>> s.loc[0]    # value at index label 0
'd'

>>> s.iloc[0]   # value at index location 0
'a'

>>> s.loc[0:1]  # rows at index labels between 0 and 1 (inclusive)
0    d
1    e

>>> s.iloc[0:1] # rows at index location between 0 and 1 (exclusive)
49    a

Here are some of the differences/similarities between s.loc and s.iloc when passed various objects:

Object | Description | s.loc[object] | s.iloc[object]
0 | single item | Value at index label 0 (the string 'd') | Value at index location 0 (the string 'a')
0:1 | slice | Two rows (labels 0 and 1) | One row (first row at location 0)
1:47 | slice with out-of-bounds end | Zero rows (empty Series) | Five rows (location 1 onwards)
1:47:-1 | slice with negative step | Three rows (labels 1 back to 47) | Zero rows (empty Series)
[2, 0] | integer list | Two rows with given labels | Two rows with given locations
s > 'e' | Bool series (indicating which values have the property) | One row (containing 'f') | NotImplementedError
(s > 'e').values | Bool array | One row (containing 'f') | Same as loc
999 | int object not in index | KeyError | IndexError (out of bounds)
-1 | int object not in index | KeyError | Returns last value in s
lambda x: x.index[3] | callable applied to series (here returning the index item at position 3) | s.loc[s.index[3]] | s.iloc[s.index[3]]
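Several rows of the table can be reproduced with a short script. This is only a sketch: the helper outcome() is a hypothetical name introduced here, and it reports either the selected values or the name of the exception raised.

```python
import pandas as pd

s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])


def outcome(indexer, key):
    """Return the selected values (as a list for a Series), or the exception name."""
    try:
        result = indexer[key]
    except (KeyError, IndexError) as exc:
        return type(exc).__name__
    return list(result) if isinstance(result, pd.Series) else result


# Keys taken from the table above (scalar, slices, list, missing and negative ints).
keys = [0, slice(0, 1), slice(1, 47), slice(1, 47, -1), [2, 0], 999, -1]
for key in keys:
    print(f"{key!r:>22}  loc={outcome(s.loc, key)!r}  iloc={outcome(s.iloc, key)!r}")
```

The boolean and callable rows are left out to keep the sketch small.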

loc's label-querying capabilities extend well beyond integer indexes, and it's worth highlighting a couple of additional examples.

Here’s a Series where the index contains string objects:

>>> s2 = pd.Series(s.index, index=s.values)
>>> s2
a    49
b    48
c    47
d     0
e     1
f     2

Since loc is label-based, it can fetch the first value in the Series using s2.loc['a'] . It can also slice with non-integer objects:

>>> s2.loc['c':'e']  # all rows lying between 'c' and 'e' (inclusive)
c    47
d     0
e     1

For DateTime indexes, we don’t need to pass the exact date/time to fetch by label. For example:

>>> s3 = pd.Series(list('abcde'), pd.date_range('now', periods=5, freq='M'))
>>> s3
2021-01-31 16:41:31.879768    a
2021-02-28 16:41:31.879768    b
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
2021-05-31 16:41:31.879768    e

Then to fetch the row(s) for March/April 2021 we only need:

>>> s3.loc['2021-03':'2021-04']
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
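Because date_range('now', ...) depends on when it is run, the output above cannot be reproduced exactly. A minimal reproducible sketch, assuming fixed month-end dates in place of the 'now'-based index, shows the same partial-string slicing:

```python
import pandas as pd

# Fixed dates (an assumption for reproducibility) instead of date_range('now', ...).
idx = pd.to_datetime(
    ["2021-01-31", "2021-02-28", "2021-03-31", "2021-04-30", "2021-05-31"]
)
s3 = pd.Series(list("abcde"), index=idx)

# Partial date strings act as labels covering the whole month.
march_april = s3.loc["2021-03":"2021-04"]
print(march_april)
```

The slice endpoints name months rather than exact timestamps, and loc still includes every row that falls inside either month.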

Rows and Columns

loc and iloc work the same way with DataFrames as they do with Series. It’s useful to note that both methods can address columns and rows together.

When given a tuple, the first element is used to index the rows and, if it exists, the second element is used to index the columns.

Consider the DataFrame defined below:

>>> import numpy as np
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),
...                   index=list('abcde'),
...                   columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24

Then for example:

>>> df.loc['c': , :'z']  # rows 'c' and onwards AND columns up to 'z'
    x   y   z
c  10  11  12
d  15  16  17
e  20  21  22

>>> df.iloc[:, 3]        # all rows, but only the column at index location 3
a     3
b     8
c    13
d    18
e    23

Sometimes we want to mix label and positional indexing methods for the rows and columns, somehow combining the capabilities of loc and iloc .

For example, consider the following DataFrame. How best to slice the rows up to and including ‘c’ and take the first four columns?

>>> import numpy as np
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),
...                   index=list('abcde'),
...                   columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24

We can achieve this result using iloc and the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a   0   1   2   3
b   5   6   7   8
c  10  11  12  13

get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.

answered Jul 23, 2015 at 16:59
Alex Riley

Great explanation! One related question I’ve always had is what relation, if any, loc, iloc and ix have with SettingWithCopy warnings? There is some documentation but to be honest I’m still a little confused pandas.pydata.org/pandas-docs/stable/…

Commented Jul 23, 2015 at 18:36

@measureallthethings: loc , iloc and ix might still trigger the warning if they are chained together. Using the example DataFrame in the linked docs, dfmi.loc[:, 'one'].loc[:, 'second'] triggers the warning just like dfmi['one']['second'] because a copy of data (rather than a view) might be returned by the first indexing operation.
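A minimal sketch of the usual way around chained indexing, using a hypothetical two-column frame rather than the dfmi example from the linked docs: put the row selection and the column label into one loc call, so the assignment targets the original object.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"one": [1, 2, 3], "two": [10, 20, 30]})

# One indexing operation: row condition and column label together.
df.loc[df["one"] > 1, "two"] = 0

print(df)
```

A chained form such as df[df["one"] > 1]["two"] = 0 assigns into an intermediate object, and whether the original frame changes (or a warning is emitted) depends on the pandas version.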

Commented Jul 23, 2015 at 18:56

What do you use if you want to look up a DateIndex with a Date, or something like df.ix[date, 'Cash'] ?

Commented Apr 29, 2016 at 8:51

@cjm2671: both loc and ix should work in that case. For example, df.loc['2016-04-29', 'Cash'] will return all rows with that particular date from the 'Cash' column. (You can be as specific as you like when retrieving indexes with strings, e.g. '2016-01' will select all datetimes falling in January 2016, and '2016-01-02 11' will select datetimes on January 2 2016 with time 11.)

Commented Apr 29, 2016 at 9:18

In case you want to update this answer at some point, there are suggestions here for how to use loc/iloc instead of ix github.com/pandas-dev/pandas/issues/14218

Commented Dec 20, 2016 at 18:00

iloc works based on integer positioning. So no matter what your row labels are, you can always, e.g., get the first row by doing

df.iloc[0] 

or the last five rows by doing

df.iloc[-5:] 

You can also use it on the columns. This retrieves the 3rd column:

df.iloc[:, 2] # the : in the first position indicates all rows 

You can combine them to get intersections of rows and columns:

df.iloc[:3, :3] # The upper-left 3 X 3 entries (assuming df has 3+ rows and columns) 

On the other hand, .loc uses named indices. Let’s set up a data frame with strings as row and column labels:

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name']) 

Then we can get the first row by

df.loc['a'] # equivalent to df.iloc[0] 

and the second two rows of the ‘date’ column by

df.loc['b':, 'date'] # equivalent to df.iloc[1:, 1] 

and so on. Now, it’s probably worth pointing out that the default row and column indices for a DataFrame are integers from 0 and in this case iloc and loc would work in the same way. This is why your three examples are equivalent. If you had a non-numeric index such as strings or datetimes, df.loc[:5] would raise an error.
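Even with the default integer index the two slices in the question are not quite interchangeable, because the label slice includes its endpoint. A small sketch with a hypothetical ten-row frame:

```python
import pandas as pd

# Hypothetical frame with the default index 0..9.
df = pd.DataFrame({"value": range(10)})

n_loc = len(df.loc[:5])    # labels 0 through 5, endpoint included
n_iloc = len(df.iloc[:5])  # positions 0 through 4, endpoint excluded

print(n_loc, n_iloc)
```

So df.loc[:5] returns six rows here, while df.iloc[:5] returns five.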

Also, you can do column retrieval just by using the data frame’s __getitem__ :

df['time'] # equivalent to df.loc[:, 'time'] 

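Now suppose you want to mix position and named indexing, that is, indexing by position on rows and by name on columns (to clarify, I mean select from our data frame, rather than creating a data frame with strings in the row index and integers in the column index). This is where .ix used to come in. It was removed in pandas 1.0, so in current versions convert one side to match the other, for example by looking up the column position with get_loc:

df.iloc[:2, df.columns.get_loc('time')] # the first two rows of the 'time' column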

I think it’s also worth mentioning that you can pass boolean vectors to the loc method as well. For example:

b = [True, False, True]
df.loc[b]

Will return the 1st and 3rd rows of df . This is equivalent to df[b] for selection, but it can also be used for assigning via boolean vectors:

df.loc[b, 'name'] = 'Mary', 'John' 

answered Jul 23, 2015 at 17:17
JoeCondron
Is df.iloc[:, :] equivalent to all rows and columns?
Commented May 3, 2017 at 10:03

It is, as would be df.loc[:, :] . It can be used to re-assign the values of the entire DataFrame or create a view of it.

Commented May 3, 2017 at 20:45

hi, do you know why loc and iloc take parameters in between the square parenthesis [ ] and not as a normal method in between classical parenthesis ( ) ?

Commented Jun 10, 2020 at 17:27

@MarineGalantin because they are indicating indexing and slicing operations, not standard methods. You are selecting subsets of data.

Commented Mar 14, 2022 at 15:59

@skan If I consider 'a' as a column, then df['a'] will return the entire column and df.loc['a'] will throw an error, because you have to provide a row label as the first value. If I consider 'a' as a row, then df['a'] will throw an error and df.loc['a'] will return the entire row.

Commented Jan 7, 2023 at 4:08

In my opinion, the accepted answer is confusing, since it uses a DataFrame with only missing values. I also do not like the term position-based for .iloc and instead, prefer integer location as it is much more descriptive and exactly what .iloc stands for. The key word is INTEGER — .iloc needs INTEGERS.

See my extremely detailed blog series on subset selection for more

.ix is deprecated and ambiguous and should never be used

Because .ix is deprecated we will only focus on the differences between .loc and .iloc .

Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each index. Let’s take a look at a sample DataFrame:

df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                   'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height': [165, 70, 120, 80, 180, 172, 150],
                   'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

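           age  color    food  height  score state
Jane        30   blue   Steak     165    4.6    NY
Nick         2  green    Lamb      70    8.3    TX
Aaron       12    red   Mango     120    9.0    FL
Penelope     4  white   Apple      80    3.3    AL
Dean        32   gray  Cheese     180    1.8    AK
Christina   33  black   Melon     172    9.5    TX
Cornelia    69    red   Beans     150    2.2    TX

The words in the header row and in the leftmost column are the labels. The labels age , color , food , height , score and state are used for the columns. The other labels, Jane , Nick , Aaron , Penelope , Dean , Christina , Cornelia are used for the index.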

The primary ways to select particular rows in a DataFrame are with the .loc and .iloc indexers. Each of these indexers can also be used to simultaneously select columns but it is easier to just focus on rows for now. Also, each of the indexers use a set of brackets that immediately follow their name to make their selections.

.loc selects data only by labels

We will first talk about the .loc indexer which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead, default to just the integers from 0 to n-1, where n is the length of the DataFrame.

There are three different inputs you can use for .loc

  • A string
  • A list of strings
  • Slice notation using strings as the start and stop values

Selecting a single row with .loc with a string

To select a single row of data, place the index label inside of the brackets following .loc .

df.loc['Penelope'] 

This returns the row of data as a Series

age           4
color     white
food      Apple
height       80
score       3.3
state        AL
Name: Penelope, dtype: object

Selecting multiple rows with .loc with a list of strings

df.loc[['Cornelia', 'Jane', 'Dean']] 

This returns a DataFrame with the rows in the order specified in the list:

          age color    food  height  score state
Cornelia   69   red   Beans     150    2.2    TX
Jane       30  blue   Steak     165    4.6    NY
Dean       32  gray  Cheese     180    1.8    AK

Selecting multiple rows with .loc with slice notation

Slice notation is defined by a start, stop and step values. When slicing by label, pandas includes the stop value in the return. The following slices from Aaron to Dean, inclusive. Its step size is not explicitly defined but defaulted to 1.

df.loc['Aaron':'Dean'] 

          age  color    food  height  score state
Aaron      12    red   Mango     120    9.0    FL
Penelope    4  white   Apple      80    3.3    AL
Dean       32   gray  Cheese     180    1.8    AK

Complex slices can be taken in the same manner as Python lists.
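A few such slices, sketched on a cut-down version of the sample DataFrame (only the age column is kept here to stay short; that reduction is an assumption of this example):

```python
import pandas as pd

names = ['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia']
df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69]}, index=names)

every_other = df.loc['Aaron':'Christina':2]  # step of 2 between the two labels
from_dean = df.loc['Dean':]                  # open-ended: 'Dean' to the last row
up_to_nick = df.loc[:'Nick']                 # from the first row through 'Nick'
backwards = df.loc['Dean':'Aaron':-1]        # negative step walks upward

for name, part in [('every_other', every_other), ('from_dean', from_dean),
                   ('up_to_nick', up_to_nick), ('backwards', backwards)]:
    print(name, list(part.index))
```

As with the plain 'Aaron':'Dean' slice, the stop label is part of the result whenever the step lands on it.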

.iloc selects data only by integer location

Let’s now turn to .iloc . Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left beginning at 0.

There are three different inputs you can use for .iloc

  • An integer
  • A list of integers
  • Slice notation using integers as the start and stop values

Selecting a single row with .iloc with an integer

df.iloc[4] 

This returns the 5th row (integer location 4) as a Series

age           32
color       gray
food      Cheese
height       180
score        1.8
state         AK
Name: Dean, dtype: object

Selecting multiple rows with .iloc with a list of integers

df.iloc[[2, -2]] 

This returns a DataFrame of the third and second to last rows:

           age  color   food  height  score state
Aaron       12    red  Mango     120    9.0    FL
Christina   33  black  Melon     172    9.5    TX

Selecting multiple rows with .iloc with slice notation

df.iloc[:5:3] 

          age  color   food  height  score state
Jane       30   blue  Steak     165    4.6    NY
Penelope    4  white  Apple      80    3.3    AL

Simultaneous selection of rows and columns with .loc and .iloc

One excellent ability of both .loc/.iloc is their ability to select both rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows. We simply need to separate the row and column selection with a comma.

For example, we can select rows Jane, and Dean with just the columns height, score and state like this:

df.loc[['Jane', 'Dean'], 'height':] 

      height  score state
Jane     165    4.6    NY
Dean     180    1.8    AK

This uses a list of labels for the rows and slice notation for the columns

We can naturally do similar operations with .iloc using only integers.

df.iloc[[1,4], 2]

Nick      Lamb
Dean    Cheese
Name: food, dtype: object

Simultaneous selection with labels and integer location

.ix was used to make selections simultaneously with labels and integer location which was useful but confusing and ambiguous at times and thankfully it has been deprecated. In the event that you need to make a selection with a mix of labels and integer locations, you will have to make both your selections labels or integer locations.

For instance, if we want to select rows Nick and Cornelia along with columns 2 and 4, we could use .loc by converting the integers to labels with the following:

col_names = df.columns[[2, 4]]
df.loc[['Nick', 'Cornelia'], col_names]

Or alternatively, convert the index labels to integers with the get_loc index method.

labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]

Boolean Selection

The .loc indexer can also do boolean selection. For instance, if we are interested in finding all the rows where age is above 30 and returning just the food and score columns, we can do the following:

df.loc[df['age'] > 30, ['food', 'score']] 

You can replicate this with .iloc but you cannot pass it a boolean series. You must convert the boolean Series into a numpy array like this:

df.iloc[(df['age'] > 30).values, [2, 4]] 
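The two boolean selections can be checked against each other. The sketch below rebuilds the sample DataFrame and uses to_numpy() (equivalent to .values for this purpose) to turn the boolean Series into an array for .iloc; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'age': [30, 2, 12, 4, 32, 33, 69],
        'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
        'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
        'height': [165, 70, 120, 80, 180, 172, 150],
        'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
        'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX'],
    },
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'],
)

mask = df['age'] > 30                          # boolean Series aligned on the index

via_loc = df.loc[mask, ['food', 'score']]      # labels for the columns
via_iloc = df.iloc[mask.to_numpy(), [2, 4]]    # plain array plus column positions

print(via_loc.equals(via_iloc))
print(list(via_loc.index))
```

Both calls return the rows for Dean, Christina and Cornelia with the food and score columns.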

Selecting all rows

It is possible to use .loc/.iloc for just column selection. You can select all the rows by using a colon like this:

df.loc[:, 'color':'score':2] 

           color  height
Jane        blue     165
Nick       green      70
Aaron        red     120
Penelope   white      80
Dean        gray     180
Christina  black     172
Cornelia     red     150

The indexing operator, [] , can select rows and columns too but not simultaneously.

Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.

df['food']

Jane          Steak
Nick           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object

Using a list selects multiple columns

df[['food', 'score']] 

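             food  score
Jane        Steak    4.6
Nick         Lamb    8.3
Aaron       Mango    9.0
Penelope    Apple    3.3
Dean       Cheese    1.8
Christina   Melon    9.5
Cornelia    Beans    2.2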

What people are less familiar with, is that, when slice notation is used, then selection happens by row labels or by integer location. This is very confusing and something that I almost never use but it does work.

df['Penelope':'Christina'] # slice rows by label 

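           age  color    food  height  score state
Penelope     4  white   Apple      80    3.3    AL
Dean        32   gray  Cheese     180    1.8    AK
Christina   33  black   Melon     172    9.5    TX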

df[2:6:2] # slice rows by integer location 

       age color    food  height  score state
Aaron   12   red   Mango     120    9.0    FL
Dean    32  gray  Cheese     180    1.8    AK

The explicitness of .loc/.iloc for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously.

df[3:5, 'color']

TypeError: unhashable type: 'slice'
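What the failing expression was reaching for can be written with either indexer. A sketch on a cut-down version of the sample DataFrame (two columns only, an assumption made to keep it short):

```python
import pandas as pd

df = pd.DataFrame(
    {
        'age': [30, 2, 12, 4, 32, 33, 69],
        'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
    },
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'],
)

# Labels throughout: turn the row positions into labels first.
by_label = df.loc[df.index[3:5], 'color']

# Positions throughout: turn the column label into a position first.
by_position = df.iloc[3:5, df.columns.get_loc('color')]

print(by_label.equals(by_position))
print(list(by_label))
```

Either form selects rows and a column in a single indexing call, which the plain [] operator cannot do.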
