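I see you've already answered your own question, and that's cool. However, as jezrael hinted in his comment, you should really consider using a dictionary. Coming from R, this may sound a bit intimidating (I've been there myself, and now I prefer Python in most respects), but it's worth the effort.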
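First of all, a dictionary is a way of mapping values or variables to keys (such as names). You build a dictionary with curly braces { } and index it with square brackets [ ].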
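A minimal sketch of those two operations with plain values (the names and numbers are purely illustrative, not from the question):

```python
# Curly braces build a dictionary that maps keys (names) to values
ages = {'alice': 31, 'bob': 27}

# Square brackets look a value up by its key
print(ages['alice'])  # 31

# Assigning to a key that doesn't exist yet adds a new entry
ages['carol'] = 45
print(sorted(ages))  # ['alice', 'bob', 'carol']
```

The values can be any Python object, including whole dataframes, which is what the rest of this answer relies on.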
Let's say you have two dataframes like these:
import numpy as np
import pandas as pd
np.random.seed(123)
# Reproducible input - Dataframe 1
rows = 10
df_1 = pd.DataFrame(np.random.randint(90,110,size=(rows, 2)), columns=list('AB'))
# pd.datetime is gone in recent pandas versions, so pass the start date as a string
datelist = pd.date_range('2017-01-01', periods=rows).tolist()
df_1['dates'] = datelist
df_1 = df_1.set_index(['dates'])
df_1.index = pd.to_datetime(df_1.index)
##%%
# Reproducible input - Dataframe 2
rows = 10
df_2 = pd.DataFrame(np.random.randint(10,20,size=(rows, 2)), columns=list('CD'))
datelist = pd.date_range('2017-01-01', periods=rows).tolist()
df_2['dates'] = datelist
df_2 = df_2.set_index(['dates'])
df_2.index = pd.to_datetime(df_2.index)
With a limited number of dataframes, you can easily organize them in a dictionary like this:
myFrames = {'df_1': df_1,
            'df_2': df_2}
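Now you have a reference to each of your dataframes, together with a name or key that you defined yourself. You'll find a more detailed explanation here.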
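If you create the dataframes in a loop anyway, you can skip the separate df_1, df_2 variables altogether and store each frame in the dictionary as it is built. This is a sketch assuming the same random input as above; the specs dictionary holding each frame's value range and column names is a hypothetical helper, not something from the question:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
rows = 10
# One shared date index, named 'dates' like the set_index() result above
dates = pd.date_range('2017-01-01', periods=rows, name='dates')

# Hypothetical spec per frame: (low, high) for randint, and the column names
specs = {'df_1': ((90, 110), list('AB')),
         'df_2': ((10, 20), list('CD'))}

myFrames = {}
for name, ((low, high), cols) in specs.items():
    data = np.random.randint(low, high, size=(rows, 2))
    # Each new frame goes straight into the dictionary under its name
    myFrames[name] = pd.DataFrame(data, columns=cols, index=dates)

print(myFrames['df_2'].head())
```

Since the random numbers are drawn in the same order as in the two-step version, this should produce the same values.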
Here's how you use it:
print(myFrames['df_1'])
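Besides looking up a single frame by key, you can loop over all names and frames at once with .items(). A small sketch with two tiny stand-in frames instead of the real df_1 and df_2:

```python
import pandas as pd

# Tiny stand-ins for df_1 and df_2
myFrames = {'df_1': pd.DataFrame({'A': [1, 2], 'B': [3, 4]}),
            'df_2': pd.DataFrame({'C': [5, 6], 'D': [7, 8], 'E': [9, 10]})}

# .items() yields (key, dataframe) pairs
shapes = {}
for name, frame in myFrames.items():
    shapes[name] = frame.shape
    print(name, frame.shape)

# .get() returns None instead of raising a KeyError for an unknown key
print(myFrames.get('df_99'))  # None
```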
You can also use that reference to derive a new dataframe from one of your existing ones and add it to your dictionary:
df_3 = myFrames['df_1']
df_3 = df_3*10
myFrames.update({'df_3': df_3})
print(myFrames)
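Note that df_3 = df_3*10 creates a brand-new dataframe, so df_1 itself is not modified. What the dictionary stores, though, are references to the frames rather than copies, so an in-place change made through the dictionary does show up in the original variable. A small sketch with a tiny stand-in frame:

```python
import pandas as pd

df_1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
myFrames = {'df_1': df_1}

# Multiplying builds a NEW dataframe; df_1 is left untouched
df_3 = myFrames['df_1'] * 10
myFrames['df_3'] = df_3  # same effect as myFrames.update({'df_3': df_3})

# The dictionary holds the very same object as the variable df_1
print(myFrames['df_1'] is df_1)  # True

# So an in-place change through the dictionary is visible via df_1 as well
myFrames['df_1']['B'] = 0
print(df_1['B'].tolist())  # [0, 0]

# Store an independent copy if you want to protect the original
myFrames['df_1_copy'] = df_1.copy()
```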
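Now let's say you have a whole bunch of dataframes that you'd like to organize the same way. You can make a list of the names of all available dataframes, as described below. However, you should be aware that using eval() is, for a number of reasons, generally not recommended.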
Anyway, here we go: first you get a list of strings containing the names of all your dataframes, like this:
alldfs = [var for var in dir() if isinstance(eval(var), pd.DataFrame)]
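If you'd rather avoid eval() altogether, globals() already gives you a dictionary that maps every module-level name to its object, so you can check the objects directly. This sketch assumes the dataframes live at module level (inside a function you would look in locals() instead); the frames are tiny stand-ins:

```python
import pandas as pd

df_1 = pd.DataFrame({'A': [1, 2]})
df_2 = pd.DataFrame({'C': [3, 4]})
not_a_frame = [1, 2, 3]

# globals() maps each module-level name to its object, so no eval() is needed;
# list() takes a snapshot of the (name, object) pairs before filtering
alldfs = [name for name, obj in list(globals().items())
          if isinstance(obj, pd.DataFrame)]
print(alldfs)
```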
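If you have a lot going on at the same time, you probably won't be interested in all of them. So let's say that the names of all the dataframes you're particularly interested in start with 'df_'. You can isolate them like this: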
dfNames = []
for elem in alldfs:
    # Keep only the names that start with 'df_'
    if elem.startswith('df_'):
        dfNames.append(elem)
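The same filter also fits in a single list comprehension. The list of names below is made up for illustration:

```python
# Hypothetical names, as the alldfs step might return them
alldfs = ['df_1', 'df_2', 'df_3', 'other_frame']

# Keep only the names that start with 'df_'
dfNames = [name for name in alldfs if name.startswith('df_')]
print(dfNames)  # ['df_1', 'df_2', 'df_3']
```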
Now you can use that list in combination with eval() to make a dictionary:
myFrames2 = {}
for dfName in dfNames:
    # Turn each name back into the dataframe it refers to
    myFrames2[dfName] = eval(dfName)
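Likewise, since every name in dfNames refers to a module-level variable, you can fetch each frame from globals() instead of calling eval(), and a dictionary comprehension builds the whole dictionary in one line. A sketch with tiny stand-in frames:

```python
import pandas as pd

df_1 = pd.DataFrame({'A': [1, 2]})
df_2 = pd.DataFrame({'C': [3, 4]})
dfNames = ['df_1', 'df_2']

# Look each name up in the module namespace rather than eval()-ing it
namespace = globals()
myFrames2 = {dfName: namespace[dfName] for dfName in dfNames}
print(myFrames2['df_2'] is df_2)  # True
```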
Now you can loop through that dictionary and do something with each of the dataframes.
For example, you can take the last column of each dataframe and combine those columns into a new dataframe:
j = 1
for key in myFrames2.keys():
    # Build the new column name for your brand new df
    colName = 'column_' + str(j)
    if j == 1:
        # First, make a new df by referencing the dictionary.
        # Subset the last column and make sure it doesn't
        # turn into a pandas Series instead of a dataframe in the process
        df_new = myFrames2[key].iloc[:, -1].to_frame()
        # Set the new column name
        df_new.columns = [colName]
    else:
        # df_new already exists, so you can add the last column
        # of each remaining dataframe under its new name
        df_new[colName] = myFrames2[key].iloc[:, -1]
    j = j + 1
print(df_new)
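The same result can be had without the j counter and the if/else: collect the last columns in a list and let pd.concat put them side by side, aligned on their shared date index. A sketch with small stand-in frames in place of myFrames2:

```python
import pandas as pd

# Small stand-ins for myFrames2, sharing one date index
dates = pd.date_range('2017-01-01', periods=3, name='dates')
myFrames2 = {
    'df_1': pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=dates),
    'df_2': pd.DataFrame({'C': [7, 8, 9], 'D': [10, 11, 12]}, index=dates),
}

# Take the last column (a Series) of every frame and glue them side by side;
# pd.concat with axis=1 aligns the pieces on their index
last_cols = [frame.iloc[:, -1] for frame in myFrames2.values()]
df_new = pd.concat(last_cols, axis=1)

# Rename to column_1, column_2, ... as in the loop above
df_new.columns = ['column_' + str(j) for j in range(1, len(last_cols) + 1)]
print(df_new)
```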
Hope you find this useful!
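By the way... for your next question, please provide some reproducible code along with a few words about the solutions you've tried yourself. You can read more about how to ask a good question here.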
Here's the whole thing for easy copy and paste:
#%%
# Imports
import pandas as pd
import numpy as np
np.random.seed(123)
# Reproducible input - Dataframe 1
rows = 10
df_1 = pd.DataFrame(np.random.randint(90,110,size=(rows, 2)), columns=list('AB'))
# pd.datetime is gone in recent pandas versions, so pass the start date as a string
datelist = pd.date_range('2017-01-01', periods=rows).tolist()
df_1['dates'] = datelist
df_1 = df_1.set_index(['dates'])
df_1.index = pd.to_datetime(df_1.index)
##%%
# Reproducible input - Dataframe 2
rows = 10
df_2 = pd.DataFrame(np.random.randint(10,20,size=(rows, 2)), columns=list('CD'))
datelist = pd.date_range('2017-01-01', periods=rows).tolist()
df_2['dates'] = datelist
df_2 = df_2.set_index(['dates'])
df_2.index = pd.to_datetime(df_2.index)
print(df_1)
print(df_2)
##%%
# If you don't have that many dataframes, you can organize them in a dictionary like this:
myFrames = {'df_1': df_1,
            'df_2': df_2}
# Now you can reference df_1 in that collection by using:
print(myFrames['df_1'])
# You can also use that reference to derive a new dataframe from one of your dataframes,
# and add that to your dictionary
df_3 = myFrames['df_1']
df_3 = df_3*10
myFrames.update({'df_3': df_3})
# And now you have a happy little family of dataframes:
print(myFrames)
##%%
# Now let's say that you have a whole bunch of dataframes that you'd like to organize the same way.
# You can make a list of the names of all available dataframes like this:
alldfs = [var for var in dir() if isinstance(eval(var), pd.DataFrame)]
##%%
# It's likely that you won't be interested in all of them if you've got a lot going on.
# Let's say that all your dataframes of interest start with 'df_'
# You get them like this:
dfNames = []
for elem in alldfs:
    if elem.startswith('df_'):
        dfNames.append(elem)
##%%
# Now you can use that list in combination with eval() to make a dictionary:
myFrames2 = {}
for dfName in dfNames:
    myFrames2[dfName] = eval(dfName)
##%%
# And now you can reference each dataframe by name in that new dictionary:
myFrames2['df_1']
##%%
# Loop through that dictionary and do something with each of them.
j = 1
for key in myFrames2.keys():
    # Build the new column name for your brand new df
    colName = 'column_' + str(j)
    if j == 1:
        # First, make a new df by referencing the dictionary.
        # Subset the last column and make sure it doesn't
        # turn into a pandas Series instead of a dataframe in the process
        df_new = myFrames2[key].iloc[:, -1].to_frame()
        # Set the new column name
        df_new.columns = [colName]
    else:
        # df_new already exists, so you can add the last column
        # of each remaining dataframe under its new name
        df_new[colName] = myFrames2[key].iloc[:, -1]
    j = j + 1
print(df_new)