【Posted】: 2021-11-06 14:19:14
【Question】:
I have loaded the following file into a pandas DataFrame:
https://www.fca.org.uk/publication/data/position-limits-contract-names-vpc.xlsx
I then converted the rows that are relevant to me into a dictionary of the form {principal: [spot, aggregate, set(product codes)]}, using the following code:
ifeu_dict = defaultdict(lambda: [0, 0, set()])
for (_, row) in df.iterrows():
    if row.loc["Venue MIC"] == "IFEU":
        ifeu_dict[row.loc["Principal Venue Product Code"]][2].add(row.loc["Venue Product Codes"])
        if type(row.loc["Spot month single limit#"]) == int:
            # no need for append as default is to create a dict
            ifeu_dict[row.loc["Principal Venue Product Code"]][0] = row.loc["Spot month single limit#"]
            ifeu_dict[row.loc["Principal Venue Product Code"]][1] = row.loc["Other month limit#"]
        if type(row.loc["Spot month single limit#"]) == str:
            try:
                val = int(str(row.loc["Spot month single limit#"]).split()[0].replace(",", ""))
                val_2 = int(str(row.loc["Other month limit#"]).split()[0].replace(",", ""))
                ifeu_dict[row.loc["Principal Venue Product Code"]][0] = val
                ifeu_dict[row.loc["Principal Venue Product Code"]][1] = val_2
            except ValueError:
                pass
However, this is really inefficient, so I have been trying to change the way I build this dictionary.
One attempt was the following:
ifeu_dict_2 = defaultdict(lambda: [0, 0, set()])
ifeu_mask = df["Venue MIC"] == "IFEU"
ifeu_df = df.loc[ifeu_mask]
spot_mask_int = ifeu_df["Spot month single limit#"].apply(type) == int
def spot_transform(x):
    try:
        return int(str(x).split()[0].replace(",", ""))
    except ValueError:
        return
ifeu_df["Spot month single limit#"] = ifeu_df.loc[~spot_mask_int, "Spot month single limit#"].apply(spot_transform)
ifeu_df["Other month limit#"] = ifeu_df.loc[~spot_mask_int, "Other month limit#"].apply(spot_transform)
spot_mask_int = ifeu_df["Spot month single limit#"].apply(type) == int
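A side note on the attempt above, offered as a sketch rather than a diagnosis of the real sheet: assigning a Series that covers only the non-int rows back to the whole column aligns on the index, so the rows that were already int come back as NaN. The toy data and the column name `limit` below are made up for illustration; `spot_transform` is the function from the question, with an explicit `return None`.

```python
import pandas as pd


def spot_transform(x):
    # Same parsing as in the question: first token, thousands commas removed.
    try:
        return int(str(x).split()[0].replace(",", ""))
    except ValueError:
        return None


# Hypothetical three-row column mixing ints and strings, as in the sheet.
values = [2500, "5,550 (24.9%)", "Aggregated with Principal"]

# Partial assignment: only the non-int rows are on the right-hand side,
# so row 0 has no matching index label and becomes NaN.
partial = pd.DataFrame({"limit": pd.Series(values, dtype=object)})
is_int = partial["limit"].apply(type) == int
partial["limit"] = partial.loc[~is_int, "limit"].apply(spot_transform)

# Whole-column assignment: every row goes through the same function.
whole = pd.DataFrame({"limit": pd.Series(values, dtype=object)})
whole["limit"] = whole["limit"].apply(spot_transform)

print(partial["limit"].isna().tolist())  # [True, False, True]
print(whole["limit"].isna().tolist())    # [False, False, True]
```

Since `str(2500)` parses back to 2500, running the transform over the whole column keeps the int rows and makes the type mask unnecessary for this step.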
Then I tried:
temp_df = [~spot_mask_int, ["Principal Venue Product Code", "Spot month single limit#", "Other month limit#"]]
ifeu_dict_2[temp_df.loc["Principal Venue Product Code"]][0] = temp_df.loc["Spot month single limit#"]
# this gives me AttributeError: 'list' object has no attribute 'loc'
or:
ifeu_dict_2[ifeu_df.loc[spot_mask_int, "Principal Venue Product Code"]][2].add(ifeu_df.loc["Venue Product Codes"])
ifeu_dict_2[ifeu_df.loc[spot_mask_int, "Principal Venue Product Code"]][0] = ifeu_df.loc[spot_mask_int, "Spot month single limit#"]
ifeu_dict_2[ifeu_df.loc[spot_mask_int, "Principal Venue Product Code"]][1] = ifeu_df.loc[spot_mask_int, "Other month limit#"]
# this gives me TypeError: 'Series' objects are mutable, thus they cannot be hashed
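For context on that TypeError: a dict key has to be a single hashable value, whereas `ifeu_df.loc[mask, column]` returns a whole Series. The sketch below reproduces the error on two made-up Series (`principals` and `spots` are hypothetical names) and shows one way to get a scalar key per row by pairing the columns element-wise. The exact wording of the error message differs between pandas versions, so it is only captured here, not compared.

```python
import pandas as pd

# Hypothetical stand-ins for two already-filtered columns.
principals = pd.Series(["ATW", "ULH"])
spots = pd.Series([5550, 2500])

error = None
limits = {}
try:
    limits[principals] = spots  # the whole Series is used as one key
except TypeError as exc:
    error = str(exc)

# zip walks both Series in step, yielding one (key, value) pair per row.
limits = dict(zip(principals, spots))
print(limits)
```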
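I have been stuck on this for a long time and do not know how to proceed. Any help would be much appreciated, whether it is an answer or a useful link! (For links: I am new to coding, so examples help me.)
If you want to play with the df: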
Index(['Commodity Derivative Name\n(including associated contracts)',
'Venue MIC', 'Name of Trading Venue', 'Venue Product Codes',
'Principal Venue Product Code', 'Spot month single limit#',
'Other month limit#', 'Conversion Factor', 'Unit of measurement',
'Definition of spot month'],
dtype='object')
API2 Rotterdam Coal Average Price Options (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,RCA,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
Gasoil Diff - Gasoil 50ppm FOB Rotterdam Barges vs Low Sulphur Gasoil 1st Line Future,IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ULH,ULH,2500,2500,nan,Lots,Calendar Month
Marine Fuel 0.5% FOB Rotterdam Barges (Platts) Future,IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,MF3,MF3,2500,2500,nan,Lots,Calendar Month
API2 Rotterdam Coal (supporting Cal 1x Options),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ATC,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal (supporting Qtr 1x Options),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ATQ,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Cal 1x Options (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ATD,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Early (122 days) Single Expiry Option (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,RDE,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Early (214 days) Single Expiry Option (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,RDF,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Early (305 days) Single Expiry Option (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,RDG,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Futures,IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ATW,ATW,"5,550 (24.9%)","38,800 (20.5%)",nan,Lots,Calendar Month
API2 Rotterdam Coal Options (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,RCO,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Qtr 1x Options (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ATH,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
An entry in the finished dictionary should look like this:
ATW = [5550, 38800, {'ATH', 'ATC', 'RDF', 'ATQ', 'RCA', 'ATD', 'RCO', 'RDG', 'RDE', 'ATW'}]
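One loop-free way to arrive at entries of this shape is sketched below. It is a sketch under assumptions, not a drop-in replacement: the small frame is invented from the sample rows, `parse_limit`, `spot` and `other` are hypothetical names, and taking the `max` per group assumes that only one row per principal code (the principal contract itself) carries a numeric limit, which differs from the original loop where the last parseable row wins.

```python
import pandas as pd

# Hypothetical miniature of the sheet; column names follow the question.
df = pd.DataFrame(
    {
        "Venue MIC": ["IFEU", "IFEU", "IFEU", "IFEU", "XXXX"],
        "Venue Product Codes": ["RCA", "ATW", "ATC", "ULH", "ZZZ"],
        "Principal Venue Product Code": ["ATW", "ATW", "ATW", "ULH", "ZZZ"],
        "Spot month single limit#": [
            "Aggregated with Principal", "5,550 (24.9%)",
            "Aggregated with Principal", 2500, 10,
        ],
        "Other month limit#": [
            "Aggregated with Principal", "38,800 (20.5%)",
            "Aggregated with Principal", 2500, 10,
        ],
    }
)


def parse_limit(value):
    """Return the leading integer of a limit cell, or 0 if there is none."""
    try:
        return int(str(value).split()[0].replace(",", ""))
    except (ValueError, IndexError):
        return 0


key = "Principal Venue Product Code"
ifeu = df.loc[df["Venue MIC"] == "IFEU"].copy()  # copy so new columns are safe
ifeu["spot"] = ifeu["Spot month single limit#"].map(parse_limit)
ifeu["other"] = ifeu["Other month limit#"].map(parse_limit)

# Assumption: the numeric limit appears on one row per group, so max picks it.
limits = ifeu.groupby(key)[["spot", "other"]].max()
codes = ifeu.groupby(key)["Venue Product Codes"].apply(set)

ifeu_dict = {
    principal: [
        int(limits.at[principal, "spot"]),
        int(limits.at[principal, "other"]),
        codes.loc[principal],
    ]
    for principal in limits.index
}
print(ifeu_dict["ATW"])
```

The per-cell parsing still calls a Python function for every row, so whether this is faster than `iterrows` on the real file would need to be measured.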
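【Comments】:
-
What makes you say this is inefficient and that you need to do it differently? Does it take longer to run than your requirements allow?
-
@scign That is basically what my manager said... he would prefer that I not use iterrows, since there are no dependencies between the rows. It also genuinely takes longer in terms of how long the script needs to run.
-
Convert the df to a numpy array and iterate over that; you will need to work out the indices of the columns you use beforehand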
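A minimal sketch of what the last comment seems to suggest, assuming the same column names as in the question. The small frame and the helper `to_int` are made up for illustration; the column positions are looked up once with `df.columns.get_loc` and the rows are then read from the plain array returned by `df.to_numpy()`.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical miniature of the sheet; column names follow the question.
df = pd.DataFrame(
    {
        "Venue MIC": ["IFEU", "IFEU", "IFEU", "XXXX"],
        "Venue Product Codes": ["RCA", "ATW", "ULH", "ZZZ"],
        "Principal Venue Product Code": ["ATW", "ATW", "ULH", "ZZZ"],
        "Spot month single limit#": [
            "Aggregated with Principal", "5,550 (24.9%)", 2500, 10,
        ],
        "Other month limit#": [
            "Aggregated with Principal", "38,800 (20.5%)", 2500, 10,
        ],
    }
)


def to_int(value):
    """Return the leading integer of a cell, or None if it has none."""
    try:
        return int(str(value).split()[0].replace(",", ""))
    except (ValueError, IndexError):
        return None


# Look the column positions up once, outside the loop.
mic_i = df.columns.get_loc("Venue MIC")
code_i = df.columns.get_loc("Venue Product Codes")
principal_i = df.columns.get_loc("Principal Venue Product Code")
spot_i = df.columns.get_loc("Spot month single limit#")
other_i = df.columns.get_loc("Other month limit#")

by_principal = defaultdict(lambda: [0, 0, set()])
for row in df.to_numpy():  # each row is a 1-D object array
    if row[mic_i] != "IFEU":
        continue
    entry = by_principal[row[principal_i]]
    entry[2].add(row[code_i])
    spot, other = to_int(row[spot_i]), to_int(row[other_i])
    if spot is not None and other is not None:
        entry[0], entry[1] = spot, other

print(dict(by_principal))
```

This is still a Python-level loop; it only avoids building a Series object for every row, which is where much of the `iterrows` overhead comes from.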
Tags: python python-3.x pandas performance dictionary