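【Posted】: 2022-01-18 07:41:57
【Question】:
I am processing data with Featuretools, and after checking the results I found that a count of distinct day() values might work better than a plain count(). I am new to ft, though, and can't find a way to generate such features. Any ideas?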
import pandas as pd
import featuretools as ft

# Sample event log: ID1 is the parent key, ID2 the unique row key.
a = pd.DataFrame({'ID1': ['A01', 'A01', 'A02', 'A02', 'A02'],
                  'ID2': ['B02', 'B03', 'B04', 'B05', 'B06'],
                  'f1': [1, 1, 2, 2, 2],
                  'f2': [9, 1, 2, 3, 4],
                  'f3': ['click', 'end', 'start', 'click', 'end'],
                  'mytime': pd.to_datetime(['2021-01-20 14:44:00', '2021-01-18 12:30:04',
                                            '2021-01-13 11:33:31', '2021-01-15 18:31:19',
                                            '2021-01-19 21:09:32'])})

# Build the EntitySet (entity-based API from Featuretools releases before 1.0).
es = ft.EntitySet(id='test1')
es.entity_from_dataframe(entity_id='a',
                         dataframe=a,
                         index='ID2',
                         time_index='mytime')

# Split out a parent entity 'b' keyed by ID1, carrying f1 along.
es.normalize_entity(base_entity_id='a',
                    new_entity_id='b',
                    index='ID1',
                    additional_variables=['f1'])

# Run DFS on 'b' with count as the only aggregation and day as the only transform.
feature_matrix, feature_names = ft.dfs(entityset=es,
                                       target_entity='b',
                                       max_depth=6,
                                       verbose=1,
                                       n_jobs=-1,
                                       chunk_size=100,
                                       agg_primitives=['count'],
                                       trans_primitives=['day'])
feature_matrix
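
To make the target feature concrete, here is a minimal plain-pandas sketch of "count distinct day() per ID1" next to the plain row count, using the same sample data. It is not Featuretools output. The names `day_of_month`, `distinct_days`, and `row_counts` are illustrative only, and "day" is assumed to mean day of month, as the `day` primitive's name suggests. In Featuretools itself, the analogous feature would presumably come from adding the `num_unique` aggregation primitive next to `count`, but that is an assumption the post does not confirm.

```python
import pandas as pd

# Same sample data as in the question (only the columns needed here).
a = pd.DataFrame({
    'ID1': ['A01', 'A01', 'A02', 'A02', 'A02'],
    'mytime': pd.to_datetime(['2021-01-20 14:44:00', '2021-01-18 12:30:04',
                              '2021-01-13 11:33:31', '2021-01-15 18:31:19',
                              '2021-01-19 21:09:32']),
})

# Day of month per row (assumption: this is what "day()" refers to).
day_of_month = a['mytime'].dt.day

# Distinct days per ID1 versus the plain row count per ID1.
distinct_days = day_of_month.groupby(a['ID1']).nunique()
row_counts = a.groupby('ID1').size()

print(distinct_days.to_dict())
print(row_counts.to_dict())
```

In this sample every row falls on a different day, so the two results coincide; they would only diverge once an ID1 has several events on the same day.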
【Discussion】: