【Question title】: Extract index values of a multi-index dataframe as a simple list in python
【Posted】: 2021-04-16 01:01:56
【Question description】:

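I extracted the index values from a pandas dataframe and want to add them as a column to a new dataframe. But python throws an error indicating that the extracted index has the same (rows x columns) structure as the dataframe it was extracted from.

How can I extract the index values of a dataframe as a simple list that can be used like an ordinary list?

The error: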

ValueError: Shape of passed values is (10, 1), indices imply (10, 10)

What I tried:

## 1
pd.DataFrame(subset_df.index, subset_df[var], percentiles, percentiles_main)

## 2
ix = subset_df.index.get_level_values('College').tolist()
pd.DataFrame(ix, subset_df[var], percentiles, percentiles_main)

## 3
ix =  [i for i in subset_df.index.get_level_values('College')]
pd.DataFrame(ix, subset_df[var], percentiles, percentiles_main)

## 4
ix =  [i for i in subset_df.index.get_level_values('College').values]

## 5
ix =  [i for i in subset_df.index.get_level_values('College').values.tolist()]

## 6
ix =  subset_df.index.get_level_values('College').to_numpy()

## 7
ix = [i for i in subset_df.index.get_level_values('College').array]

## 8
pd.DataFrame(pd.IndexSlice[ix], percentiles, percentiles_main)

## 9
import operator
index = subset_df.index.tolist()
desired_index = list(set(map(operator.itemgetter(1), index)))
pd.DataFrame(desired_index, ptiles, ptiles_main)

All of the above approaches give the same ValueError.

To reproduce the problem:

import numpy as np
import pandas as pd

# Import data
url = "https://statlearning.com/College.csv"
dfo = pd.read_csv(url)
dfo.head(1)

# Add college names as 2nd index
df = dfo.set_index('Unnamed: 0', append=True)
df.rename_axis(index=['SN', 'College'], inplace=True)

# Created a subset of dataframe
subset_df = df.sort_values(by='Top10perc', axis=0, ascending=False)[0:10]
subset_df

# Calculation of percentiles
from scipy.stats import percentileofscore as prtl
ptiles_main = [round(prtl(df['Top10perc'],i,'weak'),2) for i in subset_df['Top10perc']]
ptiles = [round(prtl(df['Grad.Rate'],i,'weak'),2) for i in subset_df['Grad.Rate']]

# Creating a new dataframe with college names and percentiles
## this is where I'm getting ValueError
pd.DataFrame(subset_df.index.get_level_values('College').values.tolist(), ptiles, ptiles_main)
#> ValueError: Shape of passed values is (10, 1), indices imply (10, 10)

# this is the output without trying to add index
pd.DataFrame(ptiles, ptiles_main)
#             0
# 100.00  94.98
# 99.87   76.06
# 99.87   99.87
# 99.87   98.58
# 99.49   97.30
# 99.49   98.58
# 99.49   99.87
# 99.10   61.39
# 98.97   97.94
# 98.97   97.30

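Desired output:

My question has 2 parts:
(the more important part)
1) How can I extract the index values of a dataframe as a simple list that can be used in all the ways an ordinary list can?

(the less important part)
2) How can I add the college names to ptile_df?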

【Question discussion】:

  • Can you post some data and the expected output?
  • I have posted the data and the desired output.

Tags: python pandas list dataframe indexing


【Solution 1】:

The error arises from the way you are trying to create the dataframe. Try one of these approaches instead:

# Option 1: build the frame from a dict that maps column names to values
pd.DataFrame({'College':subset_df.index.get_level_values('College').tolist(),
              'Grad.Rate':subset_df['Grad.Rate'].values,
              'Percentile':ptiles, 'Percentile_main':ptiles_main})

# Option 2: concatenate one Series per column side by side, then name the columns
ptile_df = pd.concat([pd.Series(subset_df.index.get_level_values('College')),
           pd.Series(subset_df['Grad.Rate'].values), pd.Series(ptiles),
           pd.Series(ptiles_main)], axis=1)
ptile_df.columns = ['College','Grad.Rate','Percentile','Percentile_main']
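
One plausible reading of the error, assuming the standard pandas constructor order `DataFrame(data, index, columns)`: when three lists are passed positionally, the second is taken as the row index and the third as the column labels, so pandas expects a 10 x 10 block of data but receives a single column. The sketch below reproduces this with small made-up stand-ins for the question's lists (the names `A`, `B`, `C` are hypothetical).

```python
import pandas as pd

# Hypothetical stand-ins for the college names and the two percentile lists
names = ["A", "B", "C"]
ptiles = [94.98, 76.06, 99.87]
ptiles_main = [100.0, 99.87, 99.49]

# Positional arguments are read as DataFrame(data, index, columns), so this
# asks for a 3 x 3 frame while supplying only one column of data.
try:
    pd.DataFrame(names, ptiles, ptiles_main)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# A dict maps each list to its own named column instead.
ptile_df = pd.DataFrame(
    {"College": names, "Percentile": ptiles, "Percentile_main": ptiles_main}
)
print(ptile_df.shape)
```

This would also explain why every variant in the question fails in the same way: the extracted index was already a plain list, and only the constructor call needed to change.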

【Discussion】:

    【Solution 2】:

    If your dataset looks like this:

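    import numpy as np
    import pandas as pd

    # Two index levels: an outer one ("bar", "baz", ...) and an inner one ("one", "two")
    arrays = [np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
              np.array(["one", "two", "one", "two", "one", "two", "one", "two"])]
    df = pd.Series(np.random.randn(8), index=arrays)  # a Series, despite the name df
    print(df)
    # Example output (the values are random and differ between runs):
    # bar  one    1.421473
    #      two    0.298886
    # baz  one    1.538157
    #      two   -0.229495
    # foo  one    2.686094
    #      two    1.177376
    # qux  one    1.550625
    #      two   -0.142154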
    

    If you want to get the first index level as a list, you can do the following:

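    import operator

    # The full index as a list of (outer, inner) tuples
    index = df.index.tolist()
    print(index)
    # [('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('baz', 'two'), ('foo', 'one'), ('foo', 'two'), ('qux', 'one'), ('qux', 'two')]

    # Take the first element of each tuple and de-duplicate with a set
    desired_index = list(set(map(operator.itemgetter(0), index)))
    print(desired_index)
    # e.g. ['qux', 'baz', 'foo', 'bar'] (a set has no guaranteed order)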
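
Going through a `set` drops duplicates and does not keep the original row order, which may matter if the list is meant to line up with other columns. A minimal alternative sketch, assuming the index levels are named (the names `first` and `second` are made up for this example), uses `get_level_values`, which keeps one entry per row, with `unique()` when de-duplication is wanted:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]],
    names=["first", "second"],  # hypothetical level names
)
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# One entry per row, in row order, as a plain Python list
all_values = s.index.get_level_values("first").tolist()
print(all_values)     # ['bar', 'bar', 'baz', 'baz']

# De-duplicated, in order of first appearance
unique_values = s.index.get_level_values("first").unique().tolist()
print(unique_values)  # ['bar', 'baz']
```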
    

    【Discussion】:

    • This approach also gives the same error, ValueError: Shape of passed values is (10, 1), indices imply (10, 10). My code:
        import operator
        index = subset_df.index.tolist()
        desired_index = list(set(map(operator.itemgetter(1), index)))
        pd.DataFrame(desired_index, ptiles, ptiles_main)

    【Solution 3】:

    Another approach:

    ptile_df = pd.DataFrame(
        np.column_stack([subset_df.index.get_level_values('College').tolist(), 
                         subset_df['Grad.Rate'], ptiles, ptiles_main]))
    ptile_df.columns = ['College', 'Grad.Rate', 'Percentile', 'Percentile_Main']
    ptile_df
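
One caveat with stacking, shown here as a small sketch on made-up data: `np.column_stack` builds a single array with one dtype, so mixing names with numbers is expected to turn the numeric columns into strings. If numeric columns are needed afterwards, they can be converted back, for example with `pd.to_numeric`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the college names and graduation rates
names = ["A", "B"]
rates = [90, 85]

# A single array holds one dtype, so the numbers are stored as strings here
stacked = np.column_stack([names, rates])
print(stacked.dtype)

raw_df = pd.DataFrame(stacked, columns=["College", "Grad.Rate"])
was_numeric = pd.api.types.is_numeric_dtype(raw_df["Grad.Rate"])

# Convert the column back to numbers before doing arithmetic on it
fixed_df = raw_df.copy()
fixed_df["Grad.Rate"] = pd.to_numeric(fixed_df["Grad.Rate"])
is_numeric = pd.api.types.is_numeric_dtype(fixed_df["Grad.Rate"])
print(was_numeric, is_numeric)
```

Building the frame from a dict of columns avoids this conversion step because each column keeps its own dtype.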
    

    【Discussion】:

      【Solution 4】:

      Is this what you are trying to do?

      print(df)
      print('______________')
      
      index_list = list(range(len(df)))
      df["index"] = index_list
      print(df)
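
If the aim is to turn an existing index level into a column, rather than to add a positional counter, `reset_index` is one option. This is only a sketch on a tiny made-up frame that mirrors the question's `SN` and `College` level names:

```python
import pandas as pd

# Hypothetical two-row frame with the same index level names as the question
index = pd.MultiIndex.from_tuples([(0, "A"), (1, "B")], names=["SN", "College"])
df = pd.DataFrame({"Grad.Rate": [90, 85]}, index=index)

# Move only the 'College' level out of the index and into a regular column
flat = df.reset_index(level="College")
print(flat.columns.tolist())  # ['College', 'Grad.Rate']
print(flat.index.name)        # SN
```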
      

      【Discussion】:

      • This is not what I am looking for.