【Question title】: Apply binning with different bin size on all dataframe columns
【Posted】: 2020-03-06 14:53:15
【Question description】:

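I have a very large df with many columns. I am looking for the most efficient way to bin all of the columns, each with a different bin size, and create a new df from the result. Here is an example that bins only a single column: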

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,20,size=(5, 4)), columns=list('ABCD'))
newDF = pd.cut(df.A, 2, precision=0)
newDF 
0    (9.0, 18.0]
1    (-0.0, 9.0]
2    (-0.0, 9.0]
3    (-0.0, 9.0]
4    (9.0, 18.0]
Name: A, dtype: category
Categories (2, interval[float64]): [(-0.0, 9.0] < (9.0, 18.0]]

【Question comments】:

    Tags: python pandas dataframe binning


    【Solution 1】:

    To process each column separately, so that each column gets bin edges computed from its own range, use DataFrame.apply:

    df = pd.DataFrame(np.random.randint(0,20,size=(5, 4)), columns=list('ABCD'))
    newDF = df.apply(lambda x: pd.cut(x, 2, precision=0))
    print (newDF)
                A            B             C             D
    0  (2.0, 4.0]  (8.0, 15.0]   (7.0, 13.0]  (12.0, 18.0]
    1  (2.0, 4.0]  (8.0, 15.0]   (7.0, 13.0]  (12.0, 18.0]
    2  (4.0, 7.0]  (8.0, 15.0]  (13.0, 19.0]  (12.0, 18.0]
    3  (4.0, 7.0]  (8.0, 15.0]   (7.0, 13.0]   (5.0, 12.0]
    4  (4.0, 7.0]   (1.0, 8.0]   (7.0, 13.0]   (5.0, 12.0]
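
The example above uses the same bin count (2) for every column, while the question asks for a different bin size per column. One possible way to do that is to look up the bin count by column name inside the lambda. This is only a sketch: the `bins_per_column` dict and its values are hypothetical and not part of the original answer, and the data is seeded here only to make the run repeatable.

```python
import numpy as np
import pandas as pd

# Seeded random data so the example is repeatable (assumption, not from the answer)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 20, size=(5, 4)), columns=list("ABCD"))

# Hypothetical mapping: number of equal-width bins wanted for each column
bins_per_column = {"A": 2, "B": 3, "C": 4, "D": 2}

# apply passes each column as a Series; x.name is that column's label,
# so it can be used to pick the bin count for the column
newDF = df.apply(lambda x: pd.cut(x, bins_per_column[x.name], precision=0))
print(newDF)
```

Since pd.cut also accepts a sequence of bin edges for its bins argument, the dict values could instead be lists of edges such as [0, 5, 10, 20]; values that fall outside the given edges become NaN.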
    

    To bin all columns with the same bin edges, use DataFrame.stack to get a Series with a MultiIndex, apply cut to it, and reshape the result back with Series.unstack:

    newDF = pd.cut(df.stack(), 2, precision=0).unstack()
    print (newDF)
                  A             B             C             D
    0  (10.0, 19.0]  (10.0, 19.0]  (10.0, 19.0]  (-0.0, 10.0]
    1  (10.0, 19.0]  (10.0, 19.0]  (-0.0, 10.0]  (-0.0, 10.0]
    2  (-0.0, 10.0]  (10.0, 19.0]  (-0.0, 10.0]  (-0.0, 10.0]
    3  (-0.0, 10.0]  (-0.0, 10.0]  (10.0, 19.0]  (-0.0, 10.0]
    4  (10.0, 19.0]  (10.0, 19.0]  (-0.0, 10.0]  (-0.0, 10.0]
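
An alternative to stack/unstack, shown only as a sketch, is to compute the shared bin edges once from the overall minimum and maximum and pass them to pd.cut for every column. The `edges` variable and the seeded data are assumptions for illustration. Unlike an integer bin count, explicit edges are not widened automatically, so include_lowest=True is needed to keep the overall minimum inside the first bin.

```python
import numpy as np
import pandas as pd

# Seeded random data so the example is repeatable (assumption, not from the answer)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 20, size=(5, 4)), columns=list("ABCD"))

# Three edges give two equal-width bins spanning the whole frame
values = df.to_numpy()
edges = np.linspace(values.min(), values.max(), 3)

# Every column is cut with the same edges; include_lowest=True keeps
# the overall minimum in the first bin instead of turning it into NaN
newDF = df.apply(lambda x: pd.cut(x, edges, include_lowest=True))
print(newDF)
```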
    
