【Question title】: Normalize multiple columns of list/tuple data
【Posted】: 2020-12-13 11:13:20
【Question】:

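I have a DataFrame with multiple columns of tuple data. I'm trying to normalize the data inside the tuple in each row of each column. Here is an example with lists, but the same idea should apply to tuples: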

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 10), columns=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
df['arr1'] = df[['a', 'b', 'c', 'd', 'e']].values.tolist()
df['arr2'] = df[['f', 'g', 'h', 'i', 'j']].values.tolist()

If I want to standardize each row's list for a couple of columns, I do this:

from sklearn import preprocessing

df['arr1'] = [preprocessing.scale(row) for row in df['arr1']]
df['arr2'] = [preprocessing.scale(row) for row in df['arr2']]
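
For reference, my understanding is that `preprocessing.scale` applied to a single 1-D row subtracts that row's mean and divides by its population standard deviation. A minimal sketch checking this against plain NumPy, where the values in `row` are made up purely for illustration:

```python
import numpy as np
from sklearn import preprocessing

# Made-up values standing in for the list stored in one cell.
row = [1.0, 2.0, 3.0, 4.0, 5.0]

scaled = preprocessing.scale(row)

# The same computation by hand: subtract the mean, then divide by the
# population standard deviation (NumPy's default, ddof=0).
values = np.asarray(row)
manual = (values - values.mean()) / values.std()

print(np.allclose(scaled, manual))
```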

However, my original dataset has about 100 such columns, so I obviously don't want to standardize each one by hand. How can I loop over all the columns?

【Comments】:

    Tags: python pandas list tuples normalization


    【Solution 1】:

    You can iterate over the DataFrame's columns and process each one like this:

    for col in df.columns:
        df[col] = [preprocessing.scale(row) for row in df[col]]
    

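    Of course, this only works if you want to process every column in the DataFrame (and every column holds list or tuple data). If you only want a subset, you can either build a list of column names first or drop the other columns.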

    # Here's an example where you manually specify the columns
    cols_to_process = ["arr1", "arr2"]
    
    for col in cols_to_process:
        df[col] = [preprocessing.scale(row) for row in df[col]]
    
    
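    # Here's an example where you drop the unwanted columns first
    # (in the question's df, the scalar columns a..j)
    cols_to_drop = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
    df = df.drop(columns=cols_to_drop)
    
    # Every remaining column holds lists, so loop over all of them
    for col in df.columns:
        df[col] = [preprocessing.scale(row) for row in df[col]]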
    
    
    # Or, if you didn't want to actually drop the columns
    # from the original DataFrame you could do it like this
    # (iterating over a DataFrame yields its column names):
    cols_to_drop = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
    for col in df.drop(columns=cols_to_drop):
        df[col] = [preprocessing.scale(row) for row in df[col]]
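
With around 100 columns, typing the names out by hand may be impractical. One possible alternative is to detect the list/tuple columns automatically. The sketch below assumes a column qualifies when every cell in it is a list or a tuple; `holds_sequences` is a hypothetical helper written for this example, not part of pandas or scikit-learn, and the seeded generator is used only to make the example repeatable.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Rebuild the question's example frame (seeded only for repeatability).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 10)), columns=list("abcdefghij"))
df["arr1"] = df[["a", "b", "c", "d", "e"]].values.tolist()
df["arr2"] = df[["f", "g", "h", "i", "j"]].values.tolist()


def holds_sequences(series):
    """Hypothetical helper: True if every cell is a list or a tuple."""
    return bool(series.map(lambda cell: isinstance(cell, (list, tuple))).all())


# Collect the names of the columns whose cells are all lists/tuples.
list_cols = [col for col in df.columns if holds_sequences(df[col])]

for col in list_cols:
    # Standardize each row's values to zero mean and unit variance.
    df[col] = [preprocessing.scale(row) for row in df[col]]

print(list_cols)
```

The scalar columns `a` through `j` are left untouched, so nothing needs to be dropped. If some list columns should be skipped, filtering `list_cols` before the loop would be one way to handle that.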
    

    【Comments】:
