【Question title】: How to convert generated data into a pandas DataFrame
【Posted】: 2021-07-19 19:09:48
【Question description】:
from sklearn.datasets import make_classification
df = make_classification(n_samples=10000, n_features=9, n_classes=1, random_state=18,
                         class_sep=2, n_informative=4)

After creating the data, I get a tuple. I then convert the tuple to a pandas DataFrame:

import pandas as pd
df = pd.DataFrame(df, columns=["1","2","3","4","5","6","7","8","9"])

So I have 9 features (columns), but when I pass 9 column names, it says:

ValueError: Shape of passed values is (2, 1), indices imply (2, 9)

Basically, I want to generate data and convert it into a pandas DataFrame, but I can't get it to work.
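The `(2, ...)` in the error hints at what is going on: the object passed to `pd.DataFrame` has length 2, not 9. A minimal check, reusing the question's parameters, that inspects what `make_classification` actually returns:

```python
from sklearn.datasets import make_classification

# Same call as in the question; the return value is a tuple, not a single array
result = make_classification(n_samples=10000, n_features=9, n_classes=1,
                             random_state=18, class_sep=2, n_informative=4)

print(type(result), len(result))  # a tuple with 2 elements
X, y = result                     # features first, labels second
print(X.shape)                    # (10000, 9): one column per feature
print(y.shape)                    # (10000,): one label per sample
```

Only the first element has the 9 columns the `columns=` list describes, which is why passing the whole tuple fails.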

【Question discussion】:

    Tags: python pandas dataframe machine-learning scikit-learn


    【Solution 1】:

    The first entry of the tuple holds the feature data and the second holds the class labels. So if you want a pandas DataFrame of the feature data, use pd.DataFrame(df[0], columns=["1","2","3","4","5","6","7","8","9"]).
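A sketch of the same fix that unpacks the tuple into two names instead of indexing it with `[0]`, and builds the column names `"1"` to `"9"` from the array's width rather than typing them out. The names `X`, `y`, `columns`, and `features` are illustrative choices, not part of the original answer:

```python
import pandas as pd
from sklearn.datasets import make_classification

# Unpack the (features, labels) tuple directly
X, y = make_classification(n_samples=10000, n_features=9, n_classes=1,
                           random_state=18, class_sep=2, n_informative=4)

# Generate the column names "1".."9" from the number of feature columns
columns = [str(i) for i in range(1, X.shape[1] + 1)]
features = pd.DataFrame(X, columns=columns)

print(features.shape)  # (10000, 9)
```

Unpacking makes it explicit which array is which, and deriving the names from `X.shape[1]` keeps them in sync if `n_features` changes.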

    【Discussion】:

      【Solution 2】:

      make_classification returns a tuple containing two NumPy arrays. Just use the first element of the tuple.

      See the return type in the Sklearn documentation.

      import pandas as pd
      pd.DataFrame(df[0])  # df is the (features, labels) tuple from the question
      

      Result:

                   0         1         2  ...         6         7         8
      0     1.223113 -1.962002 -0.288322  ... -2.152126  1.563291  2.790191
      1    -0.239416 -3.782512 -1.587514  ... -0.519075  1.218147 -0.543413
      2    -1.275076 -1.354999 -1.030673  ... -0.866303  1.915653  2.526826
      3    -0.516765 -2.098868 -1.034506  ...  0.470277  1.917153  0.849975
      4    -0.893197 -2.489030  1.012410  ...  3.562431  2.806255 -2.825570
      ...        ...       ...       ...  ...       ...       ...       ...
      9995 -1.665167 -1.106121 -0.381195  ...  0.543236  2.406625  2.216029
      9996 -0.783265 -1.405607  0.257606  ... -0.251951  2.167685  2.461260
      9997  2.341676 -3.382589 -0.120150  ...  0.066099  2.453412 -0.758382
      9998 -0.662257 -1.531187 -0.709562  ...  0.156203  2.495238  2.452315
      9999 -0.756892 -4.895147 -0.385215  ...  0.898117  2.624591 -2.188389
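The frame above contains only the features. If the labels from the second array are also wanted, one option is to attach them as an extra column. This is a sketch; the column name `"target"` is a hypothetical choice:

```python
import pandas as pd
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10000, n_features=9, n_classes=1,
                           random_state=18, class_sep=2, n_informative=4)

frame = pd.DataFrame(X)   # feature columns labelled 0..8, as in the result above
frame["target"] = y       # hypothetical column name for the class labels
print(frame.shape)        # (10000, 10)
```

Both arrays have 10000 rows, so the labels line up with the default integer index row by row.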
      

      Also: the import and the usage don't match:

      !!! from sklearn.datasets import make_regression
      !!! df = make_classification(…)
      

      【Discussion】:

        【Solution 3】:
        df = make_classification(n_samples=10000, n_features=9, n_classes=1, random_state=18, class_sep=2, n_informative=4)
        

        This line returns a tuple whose first entry holds the feature values, or "X", and whose second entry holds the target values.

        So to turn it into a pandas DataFrame, you have to index into it like this:

        df = pd.DataFrame(df[0], columns=["1","2","3","4","5","6","7","8","9"])
        

        Full code:

        from sklearn.datasets import make_classification
        import pandas as pd
        
        df = make_classification(n_samples=10000, n_features=9, n_classes=1, random_state=18,
                                 class_sep=2, n_informative=4)
        
        df = pd.DataFrame(df[0], columns=["1","2","3","4","5","6","7","8","9"])
        print(df)
        

        Output:

        【Discussion】:
