
[pandas] loc[...] = value returns SettingWithCopyWarning

by 단창 2020. 2. 5.

https://github.com/pandas-dev/pandas/issues/17476

 


 

Q >

Code Sample

# My code
df.loc[0, 'column_name'] = 'foo bar'

Problem description

This code in pandas 0.20.3 raises SettingWithCopyWarning and suggests:

"Try using .loc[row_indexer,col_indexer] = value instead".

I am already doing so, so this looks like a small bug.

 

 

 

A> 

The issue here is that you're slicing your dataframe first with .loc in line 4, then attempting to assign values to that slice.

df_c = df.loc[df.encountry == country, :]

Pandas isn't 100% sure whether you want to assign values to just your df_c slice or have them propagate all the way back up to the original df. To avoid this, when you first assign df_c, tell pandas that it is its own dataframe (and not a slice) by using

df_c = df.loc[df.encountry == country, :].copy()

Doing this will make the warning go away. I'll tack on a brief example to help explain the above, since I've noticed a lot of users get confused by this aspect of pandas.

Example with made up data

>>> import pandas as pd
>>> df = pd.DataFrame({'A':[1,2,3,4,5], 'B':list('QQQCC')})
>>> df
   A  B
0  1  Q
1  2  Q
2  3  Q
3  4  C
4  5  C
>>> df.loc[df['B'] == 'Q', 'new_col'] = 'hello'
>>> df
   A  B new_col
0  1  Q   hello
1  2  Q   hello
2  3  Q   hello
3  4  C     NaN
4  5  C     NaN

So the above works as we expect! Now let's try an example that mirrors what you attempted to do with your data.

>>> df = pd.DataFrame({'A':[1,2,3,4,5], 'B':list('QQQCC')})
>>> df_q = df.loc[df['B'] == 'Q']
>>> df_q
   A  B
0  1  Q
1  2  Q
2  3  Q
>>> df_q.loc[df['A'] < 3, 'new_col'] = 'hello'
/Users/riddellcd/anaconda/lib/python3.6/site-packages/pandas/core/indexing.py:337: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[key] = _infer_fill_value(value)
>>> df_q
   A  B new_col
0  1  Q   hello
1  2  Q   hello
2  3  Q     NaN

Looks like we hit the same warning! But it changed df_q as we expected! This is because df_q is a slice of df, so even though we're using .loc[] on df_q, pandas is warning us that it won't propagate the changes up to df. To avoid this, we need to be more explicit and say that df_q is its own dataframe, separate from df.

Let's start back from df_q but use .copy() this time.
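To make the "won't propagate" point concrete, here is a minimal sketch using the same made-up data. It only checks which frame ended up with the new column. Whether the warning is actually printed depends on the pandas version, so treat that part as an assumption.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': list('QQQCC')})

# Selecting rows with a boolean mask gives df_q its own data,
# but older pandas versions still remember that it came from df.
df_q = df.loc[df['B'] == 'Q']

# This is the assignment that may emit SettingWithCopyWarning.
df_q.loc[df_q['A'] < 3, 'new_col'] = 'hello'

print('new_col' in df_q.columns)  # did the write land on df_q?
print('new_col' in df.columns)    # did it reach the original df?
```

The write shows up on df_q only. The warning is pandas pointing out that df was left alone, in case that is not what you intended.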

>>> df_q = df.loc[df['B'] == 'Q'].copy()
>>> df_q
   A  B
0  1  Q
1  2  Q
2  3  Q

Let's try to reassign our value now!

>>> df_q.loc[df['A'] < 3, 'new_col'] = 'hello'
>>> df_q
   A  B new_col
0  1  Q   hello
1  2  Q   hello
2  3  Q     NaN

This works without a warning because we've told pandas that df_q is separate from df.
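If you would rather check this in a script than by eye, the sketch below records any warnings raised during the assignment and looks for SettingWithCopyWarning by class name. Matching on the name is my own choice here, made so the check does not depend on where a particular pandas version exposes that class.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': list('QQQCC')})

# .copy() makes df_q an explicit, independent dataframe.
df_q = df.loc[df['B'] == 'Q'].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # record everything, filter nothing
    df_q.loc[df_q['A'] < 3, 'new_col'] = 'hello'

# Keep only the warnings this post is about.
copy_warnings = [
    w for w in caught
    if w.category.__name__ == 'SettingWithCopyWarning'
]
print(len(copy_warnings))
```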

If you do in fact want these changes to df_c to propagate up to df, that's another point entirely, and I will answer if you want.
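For readers who do want the change on the original df, one common approach (a sketch, not necessarily what the answer's author had in mind) is to skip the intermediate slice, combine both conditions into one mask, and assign through a single .loc call on df itself. The variable name mask is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': list('QQQCC')})

# Both conditions in one boolean mask, evaluated against df directly.
mask = (df['B'] == 'Q') & (df['A'] < 3)

# A single .loc assignment on df: there is no intermediate slice,
# so nothing is ambiguous about which frame gets modified.
df.loc[mask, 'new_col'] = 'hello'

print(df)
```

Rows that do not match the mask get NaN in new_col, just as in the first example above.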
