【Question title】: UserWarning: Pandas doesn't allow columns to be created via a new attribute name
【Posted】: 2022-08-25 05:52:19
【Question description】:

I am stuck with my pandas script.

I am working with two CSV files (one input file and one output file). I want to take all the rows of two columns, perform a calculation on them, and copy the result to another DataFrame (the output file).

The columns are as follows :

'lat', 'long','PHCount', 'latOffset_1', 'longOffset_1','PH_Lat_1', 'PH_Long_1', 'latOffset_2', 'longOffset_2', 'PH_Lat_2', 'PH_Long_2', 'latOffset_3', 'longOffset_3','PH_Lat_3', 'PH_Long_3',  'latOffset_4', 'longOffset_4','PH_Lat_4', 'PH_Long_4'.

I want to take the columns 'lat' and 'latOffset_1', do a calculation, and put the result in another column ('PH_Lat_1') that I have already created.

My function is :

def calculate_latoffset(latoffset):  #Calculating Lat offset.
    a=(df2['lat']-(2*latoffset))
    return a

The main code :

for i in range(1,5):
        print(i)
        a='PH_lat_%d' % i 
        print (a)
        b='latOffset_%d' % i
        print (b)
        df2.a = df2.apply(lambda x: calculate_latoffset(x[b]), axis=1)

Since the column names differ only by their suffix (1, 2, 3, 4), I want to call the function calculate_latoffset and compute all the rows of all the columns (PH_Lat_1, PH_Lat_2, PH_Lat_3, PH_Lat_4) in one go.

When I run the code above, I get this error:

basic_conversion.py:46: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
  df2.a = df2.apply(lambda x: calculate_latoffset(x[b]), axis=1)

Is this possible? Any help would be appreciated.

【Question comments】:

Tags: python pandas dataframe indexing lambda


【Solution 1】:

Simply use df2[a] instead of df2.a (no quotes, since a is a variable that holds the column name).
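
A minimal sketch of the difference between indexing with the variable and indexing with a quoted literal. The sample values are made up and stand in for the asker's CSV data:

```python
import pandas as pd

# Hypothetical sample data; the real df2 comes from the asker's CSV file.
df2 = pd.DataFrame({"lat": [10.0, 20.0], "latOffset_1": [1.0, 2.0]})

a = "PH_Lat_1"  # the variable holds the target column name

# Bracket indexing with the variable creates a column named 'PH_Lat_1'.
df2[a] = df2["lat"] - 2 * df2["latOffset_1"]

# By contrast, a quoted 'a' creates a column literally named 'a'.
df2["a"] = 0

print(list(df2.columns))
```

Attribute access has no way to use the value of a variable as the name, which is why the bracket form is needed inside the loop.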

【Comments】:

    【Solution 2】:

    This is a warning, not an error, so your code will still run, but probably not the way you intended.

    1. Short answer: to create a new column in a DataFrame, never use attribute access; the correct way is to use either [] or .loc indexing:

      >>> df
         a  b
      0  7  6
      1  5  8
      >>> df['c'] = df.a + df.b 
      >>> # OR
      >>> df.loc[:, 'c'] = df.a + df.b
      >>> df # c is a newly added column
         a  b   c
      0  7  6  13
      1  5  8  13
      

      More explanation: Series and DataFrame are the core classes and data structures in pandas, and of course they are Python classes too, so there are some minor distinctions in attribute access between a pandas DataFrame and ordinary Python objects. This is well documented and easy to understand. Just a few points to note:

      1. In Python, users may dynamically add data attributes of their own to an instance object using attribute access.

        >>> class Dog(object):
        ...     pass
        >>> dog = Dog()
        >>> vars(dog)
        {}
        >>> superdog = Dog()
        >>> vars(superdog)
        {}
        >>> dog.legs = 'I can run.'
        >>> superdog.wings = 'I can fly.'
        >>> vars(dog)
        {'legs': 'I can run.'}
        >>> vars(superdog)
        {'wings': 'I can fly.'}
        
      2. In pandas, index and column are closely related to the data structure; you may access an index on a Series, or a column on a DataFrame, as an attribute.

        >>> import pandas as pd
        >>> import numpy as np
        >>> data = np.random.randint(low=0, high=10, size=(2,2))
        >>> df = pd.DataFrame(data, columns=['a', 'b'])
        >>> df
           a  b
        0  7  6
        1  5  8
        >>> vars(df)
        {'_is_copy': None, 
         '_data': BlockManager
            Items: Index(['a', 'b'], dtype='object')
            Axis 1: RangeIndex(start=0, stop=2, step=1)
            IntBlock: slice(0, 2, 1), 2 x 2, dtype: int64,
         '_item_cache': {}}
        
      3. But pandas attribute access is mainly a convenience for reading from and modifying an existing element of a Series or column of a DataFrame.

        >>> df.a
        0    7
        1    5
        Name: a, dtype: int64
        >>> df.b = [1, 1]
        >>> df
           a  b
        0  7  1
        1  5  1
        
      4. And the convenience is a tradeoff against full functionality. E.g. you can create a DataFrame object with column names ['space bar', '1', 'loc', 'min', 'index'], but you can't access them as attributes, because they are either not valid Python identifiers ('1', 'space bar') or conflict with an existing attribute or method name ('loc', 'min', 'index').

        >>> data = np.random.randint(0, 10, size=(2, 5))
        >>> df_special_col_names = pd.DataFrame(data, columns=['space bar', '1', 'loc', 'min', 'index'])
        >>> df_special_col_names
           space bar  1  loc  min  index
        0          4  4    4    8      9
        1          3  0    1    2      3
        
      5. In these cases, .loc, .iloc and [] indexing are the defined way to fully access and operate on the index and columns of Series and DataFrame objects.

        >>> df_special_col_names['space bar']
        0    4
        1    3
        Name: space bar, dtype: int64
        >>> df_special_col_names.loc[:, 'min']
        0    8
        1    2
        Name: min, dtype: int64
        >>> df_special_col_names.iloc[:, 1]
        0    4
        1    0
        Name: 1, dtype: int64
        
      6. As to the topic, creating a new column for a DataFrame: as you can see, df.c = df.a + df.b just creates a new attribute alongside the core data structure, so starting from version 0.21.0 this behavior raises a UserWarning (silent no more).

        >>> df
           a  b
        0  7  1
        1  5  1
        >>> df.c = df.a + df.b
        __main__:1: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
        >>> df['d'] = df.a + df.b
        >>> df
           a  b  d
        0  7  1  8
        1  5  1  6
        >>> df.c
        0    8
        1    6
        dtype: int64
        >>> vars(df)
        {'_is_copy': None, 
         '_data': 
            BlockManager
            Items: Index(['a', 'b', 'd'], dtype='object')
            Axis 1: RangeIndex(start=0, stop=2, step=1)
            IntBlock: slice(0, 2, 1), 2 x 2, dtype: int64
            IntBlock: slice(2, 3, 1), 1 x 2, dtype: int64, 
         '_item_cache': {},
         'c': 0    8
              1    6
              dtype: int64}
        
      7. Finally, back to the Short answer.
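
The points in this answer can be checked with a short script. This is a minimal sketch, assuming a reasonably recent pandas version; the frame reuses the illustrative values from this answer, and the name attr_warned is made up for the example:

```python
import warnings
import pandas as pd

df = pd.DataFrame({"a": [7, 5], "b": [1, 1]})

# Record warnings so the UserWarning can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df.c = df.a + df.b  # attribute assignment: no column is created

# True if pandas complained about the new attribute name.
attr_warned = any("new attribute name" in str(w.message) for w in caught)

df["d"] = df.a + df.b  # bracket assignment: a real column

print(attr_warned)
print(list(df.columns))
print("c" in df.columns, "d" in df.columns)
```

Checking membership in df.columns is a quick way to tell a real column from a plain Python attribute that happens to hold a Series.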

    【Comments】:

      【Solution 3】:

      The solution I can think of is to use .loc to get the column. You can try df.loc[:, a] instead of df.a. pandas DataFrame columns cannot be created with dot (attribute) notation, which avoids potential conflicts with the DataFrame's own attributes. Hope this helps.
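
To illustrate the kind of conflict meant here, the following sketch uses a hypothetical frame whose column names ('count' and 'shape') deliberately collide with existing DataFrame attributes:

```python
import pandas as pd

# Hypothetical frame whose column names collide with DataFrame attributes.
df = pd.DataFrame({"count": [3, 4], "shape": [1, 2]})

# Dot access resolves to the DataFrame's own attributes, not the columns.
print(callable(df.count))  # True: this is the DataFrame.count method
print(df.shape)            # (2, 2): the shape attribute, not the column

# Bracket and .loc indexing always reach the columns.
print(df["count"].tolist())         # [3, 4]
print(df.loc[:, "shape"].tolist())  # [1, 2]
```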

      【Comments】:

        【Solution 4】:

        Although the other answers are likely much better solutions, I figured that it does no harm to just ignore the warning and move on.

        import warnings
        warnings.filterwarnings("ignore","Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access", UserWarning)
        

        Using the code above, the script will simply disregard the warning and move on.
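
If the warning is to be silenced at all, it may be safer to limit the filter to the statement that triggers it. A minimal sketch with the standard library's warnings.catch_warnings, using a made-up frame; it also shows that hiding the warning does not turn the attribute into a column:

```python
import warnings
import pandas as pd

df = pd.DataFrame({"a": [7, 5], "b": [1, 1]})

with warnings.catch_warnings():
    # The message argument is a regex matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore",
        message="Pandas doesn't allow columns to be created",
        category=UserWarning,
    )
    df.c = df.a + df.b  # no warning is shown inside this block

# The warning is hidden, but 'c' is still only an attribute, not a column.
print("c" in df.columns)
```

Outside the with block the previous warning filters are restored, so other UserWarnings stay visible.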

        【Comments】:

          【Solution 5】:

          In df2.apply(lambda x: calculate_latoffset(x[b]), axis=1), each row's call returns a whole Series (df2['lat'] minus a scalar), so you are creating a multi-column DataFrame and then trying to assign it to a single field. Using df2[a] = calculate_latoffset(df2[b]) instead should deliver the desired output.
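
Put together, the loop from the question might look like the sketch below. The sample values are made up, and the target columns use the 'PH_Lat_' spelling from the question's column list (the question's loop builds 'PH_lat_%d' with a lowercase l, which would name different columns):

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's input CSV.
df2 = pd.DataFrame({
    "lat": [10.0, 20.0],
    "latOffset_1": [1.0, 2.0],
    "latOffset_2": [0.5, 0.5],
    "latOffset_3": [0.0, 1.0],
    "latOffset_4": [2.0, 3.0],
})

def calculate_latoffset(latoffset):
    """Return lat - 2 * latoffset for every row (vectorized over the Series)."""
    return df2["lat"] - 2 * latoffset

for i in range(1, 5):
    target = f"PH_Lat_{i}"     # column to fill
    source = f"latOffset_{i}"  # column to read
    df2[target] = calculate_latoffset(df2[source])

print(df2[["PH_Lat_1", "PH_Lat_2", "PH_Lat_3", "PH_Lat_4"]])
```

Passing the whole column to the function lets pandas compute every row at once, so no row-wise apply is needed.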

          【Comments】:
