How to select a particular dataframe from a list of dataframes in Python equivalent to R? Answers

【Question title】: How to select a particular dataframe from a list of dataframes in Python equivalent to R?
【Posted】: 2018-01-19 14:03:19
【Question description】:

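I have a list of dataframes in R, and I select a particular dataframe from it like this:
x = listOfdf$df1$df2$df3
Now I'm struggling to find an equivalent way in Python. In other words, what is the syntax for selecting a particular DataFrame from a list of DataFrames in pandas?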

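【Question discussion】:

Do you want a list of dfs? Wouldn't a dictionary of dataframes be better?
Please clarify.. what have you tried so far?
Actually, I'm converting R code to Python. I ran into selecting a particular df from a list of dfs in R (as in the example in the question) and I'm trying to do the same in Python. Is that possible in Python? I need an equivalent way. In Python, selecting a particular column from a df is done with df['colname'] (versus df$colname in R), so I'm looking for something similar.

Tags: r python-3.x list pandas dataframe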

【Solution 1】:

Found a solution for selecting a particular dataframe / dataframe column from a list of dataframes.
In R: x = listOfdf$df1$df2$df3
In Python: x = listOfdf['df1']['df2']['df3']
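A minimal sketch of what that chained lookup assumes: `listOfdf` has to be a dict (here a nested one), not a plain Python list, because a list only accepts integer positions. The structure and values below are hypothetical and only mirror the names used in the question.

```python
import pandas as pd

# Hypothetical nested structure mirroring R's listOfdf$df1$df2$df3:
# a dict holding a dict holding a DataFrame that has a column named 'df3'.
inner = pd.DataFrame({'df3': [1, 2, 3], 'other': [4, 5, 6]})
listOfdf = {'df1': {'df2': inner}}

# Each [] step looks up one name, much like each $ step in R.
x = listOfdf['df1']['df2']['df3']
print(x.tolist())
```

The last step returns a pandas Series because it selects a single column from a DataFrame.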

Thank you :)

【Discussion】:

Actually, in R you can do the same thing with single or double brackets.

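【Solution 2】:

I see you've answered your own question, and that's cool. However, as jezrael hinted in his comment, you should really consider using a dictionary. Coming from R, that might sound a bit scary (I've been there myself, and now I prefer Python in most respects), but it's worth the effort.

First of all, a dictionary is a way of mapping keys (like names) to values or variables. You use curly braces { } to build a dictionary and square brackets [ ] to index it.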
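As a quick illustration of braces for building and brackets for indexing, here is a small sketch with plain values. The names and numbers are made up for the example.

```python
# Build a dictionary with curly braces: each key maps to a value.
ages = {'alice': 31, 'bob': 27}

# Index it with square brackets and the key.
alice_age = ages['alice']

# Assigning to a key that does not exist yet adds a new entry.
ages['carol'] = 45

print(alice_age)
print(sorted(ages))
```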

Let's say you have two dataframes like this:

import numpy as np
import pandas as pd

np.random.seed(123)
# Reproducible input - Dataframe 1
rows = 10
df_1 = pd.DataFrame(np.random.randint(90,110,size=(rows, 2)), columns=list('AB'))
# pd.datetime is no longer available in current pandas;
# pd.date_range accepts the start date as a string directly
df_1.index = pd.date_range('2017-01-01', periods=rows, name='dates')

##%%

# Reproducible input - Dataframe 2
rows = 10
df_2 = pd.DataFrame(np.random.randint(10,20,size=(rows, 2)), columns=list('CD'))
df_2.index = pd.date_range('2017-01-01', periods=rows, name='dates')

With a limited number of dataframes, you can easily organize them in a dictionary like this:

myFrames = {'df_1': df_1,
            'df_2': df_2}

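Now you have a reference to each dataframe under a name, or key, that you defined yourself. You'll find a more detailed explanation here.

Here's how you use it: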

print(myFrames['df_1'])
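Since the question asked about a list, here is a rough comparison of both containers. A plain Python list is indexed by position, a dictionary by name. The frame names below are hypothetical.

```python
import pandas as pd

df_a = pd.DataFrame({'A': [1, 2]})
df_b = pd.DataFrame({'B': [3, 4]})

# A list only knows positions; a dict lets you pick your own keys.
frames_list = [df_a, df_b]
frames_dict = {'df_a': df_a, 'df_b': df_b}

second_by_position = frames_list[1]
second_by_name = frames_dict['df_b']

# Both lookups hand back the very same DataFrame object.
same_object = second_by_position is second_by_name
print(same_object)
```

The dictionary is usually the closer match to a named R list, because `frames_dict['df_b']` keeps working even if the order of the frames changes.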

You can also use that reference to make changes to one of your dataframes and add the result to your dictionary:

df_3 = myFrames['df_1']
df_3 = df_3*10
myFrames.update({'df_3': df_3})
print(myFrames)
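One detail worth checking for yourself: multiplying by 10 produces a new DataFrame, so the original entry in the dictionary stays untouched. The sketch below uses a small stand-in frame (an assumption, not the data above) to show that.

```python
import pandas as pd

frames = {'df_1': pd.DataFrame({'A': [1, 2, 3]})}

df_3 = frames['df_1']                    # same object as frames['df_1']
is_same_before = df_3 is frames['df_1']

df_3 = df_3 * 10                         # arithmetic returns a new DataFrame
is_same_after = df_3 is frames['df_1']

frames['df_3'] = df_3                    # same effect as frames.update({'df_3': df_3})

print(is_same_before, is_same_after)
print(frames['df_1']['A'].tolist())
print(frames['df_3']['A'].tolist())
```

In-place operations on `df_3` before the multiplication would have affected `frames['df_1']` as well, since both names pointed at one object at that stage.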

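Now let's say you have a whole bunch of dataframes that you'd like to organize the same way. You can make a list of the names of all available dataframes as described below. However, you should be aware that using eval() is generally not recommended, for several reasons.

Anyway, here we go: first you get a list of strings containing the names of all dataframes, like this: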

alldfs = [var for var in dir() if isinstance(eval(var), pd.core.frame.DataFrame)]

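If you have a lot going on at once, you most likely won't be interested in all of them. So let's say the names of all the dataframes you're particularly interested in start with 'df_'. You can isolate them like this: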

dfNames = []
for elem in alldfs:
    if elem.startswith('df_'):
        dfNames.append(elem)

Now you can use that list in combination with eval() to make a dictionary:

myFrames2 = {}
for dfName in dfNames:
    myFrames2[dfName] = eval(dfName)
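If you'd rather avoid eval(), one possible alternative is to read the objects straight from a name-to-object mapping such as globals(). This is only a sketch; the helper name `collect_frames` and its `prefix` parameter are made up for this example and are not part of pandas.

```python
import pandas as pd


def collect_frames(namespace, prefix='df_'):
    """Return {name: DataFrame} for matching names in a name->object mapping."""
    # list(...) takes a snapshot so the mapping may change while we build the result
    return {
        name: obj
        for name, obj in list(namespace.items())
        if name.startswith(prefix) and isinstance(obj, pd.DataFrame)
    }


df_1 = pd.DataFrame({'A': [1, 2]})
df_2 = pd.DataFrame({'C': [3, 4]})
df_note = 'right prefix, but a string and not a DataFrame'

# At module level, globals() maps every variable name to its object.
frames_by_name = collect_frames(globals())
print(sorted(frames_by_name))
```

Passing the namespace in as an argument also makes the helper usable with any ordinary dict, which keeps it easy to test.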

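Now you can loop through that dictionary and do something with each of the dataframes. For example, you can take the last column of each dataframe and use those values to build a new dataframe: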

j = 1
for key in myFrames2.keys():

    # Build the new column name for your brand new df
    colName = 'column_' + str(j)

    if j == 1:
        # First, make a new df by referencing the dictionary
        df_new = myFrames2[key]

        # Subset the last column and make sure it doesn't
        # turn into a pandas series instead of a dataframe in the process
        df_new = df_new.iloc[:,-1].to_frame()

        # Set the new column name
        df_new.columns = [colName]
    else:
        # df_new already exists, so you can add the last column
        # of each remaining dataframe under its new name
        df_new[colName] = myFrames2[key].iloc[:,-1]
    j = j + 1

print(df_new)
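For comparison, the same kind of result can probably be reached without the counter and the if/else branch by collecting the last columns and joining them with pd.concat. The small frames below are stand-ins for the ones in this answer, so treat this as a sketch rather than a drop-in replacement.

```python
import pandas as pd

idx = pd.date_range('2017-01-01', periods=3)
frames = {
    'df_1': pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=idx),
    'df_2': pd.DataFrame({'C': [7, 8, 9], 'D': [10, 11, 12]}, index=idx),
}

# Take the last column of each frame as a Series, then align them on the index.
last_cols = [df.iloc[:, -1] for df in frames.values()]
df_new = pd.concat(last_cols, axis=1)

# Rename to column_1, column_2, ... following the dictionary's insertion order.
df_new.columns = [f'column_{i}' for i in range(1, len(last_cols) + 1)]
print(df_new)
```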

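Hope you find this useful!

By the way... for your next question, please provide some reproducible code along with a few words about the solutions you've tried yourself. You can read more about how to ask a good question here.

Here's the whole thing for an easy copy and paste: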

#%%

# Imports
import pandas as pd
import numpy as np

np.random.seed(123)

# Reproducible input - Dataframe 1
rows = 10
df_1 = pd.DataFrame(np.random.randint(90,110,size=(rows, 2)), columns=list('AB'))
# pd.datetime is no longer available in current pandas;
# pd.date_range accepts the start date as a string directly
df_1.index = pd.date_range('2017-01-01', periods=rows, name='dates')

##%%

# Reproducible input - Dataframe 2
rows = 10
df_2 = pd.DataFrame(np.random.randint(10,20,size=(rows, 2)), columns=list('CD'))
df_2.index = pd.date_range('2017-01-01', periods=rows, name='dates')

print(df_1)
print(df_2)
##%%


# If you don't have that many dataframes, you can organize them in a dictionary like this:
myFrames = {'df_1': df_1,
            'df_2': df_2}  


# Now you can reference df_1 in that collection by using:
print(myFrames['df_1'])

# You can also use that reference to make changes to one of your dataframes,
# and add that to your dictionary
df_3 = myFrames['df_1']
df_3 = df_3*10
myFrames.update({'df_3': df_3})

# And now you have a happy little family of dataframes:
print(myFrames)
##%%

# Now let's say that you have a whole bunch of dataframes that you'd like to organize the same way.
# You can make a list of the names of all available dataframes like this:
alldfs = [var for var in dir() if isinstance(eval(var), pd.core.frame.DataFrame)]

##%%
# It's likely that you won't be interested in all of them if you've got a lot going on.
# Let's say that all your dataframes of interest start with 'df_'
# You get them like this:
dfNames = []
for elem in alldfs:
    if elem.startswith('df_'):
        dfNames.append(elem)

##%%
# Now you can use that list in combination with eval() to make a dictionary:
myFrames2 = {}
for dfName in dfNames:
    myFrames2[dfName] = eval(dfName)

##%%
# And now you can reference each dataframe by name in that new dictionary:
myFrames2['df_1']

##%%
# Loop through that dictionary and do something with each of them.

j = 1
for key in myFrames2.keys():

    # Build the new column name for your brand new df
    colName = 'column_' + str(j)

    if j == 1:
        # First, make a new df by referencing the dictionary
        df_new = myFrames2[key]

        # Subset the last column and make sure it doesn't
        # turn into a pandas series instead of a dataframe in the process
        df_new = df_new.iloc[:,-1].to_frame()

        # Set the new column name
        df_new.columns = [colName]
    else:
        # df_new already exists, so you can add the last column
        # of each remaining dataframe under its new name
        df_new[colName] = myFrames2[key].iloc[:,-1]
    j = j + 1

print(df_new)

【Discussion】: