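[Posted]: 2020-06-03 17:35:52
[Question]:
I have run into the following situation:
My input data contains several elements that each carry a predecessor ID and a successor ID, but not their own ID.
See the table below: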
+----------+-------------+-------------+
| Element  | NextBlockID | PrevBlockID |
+----------+-------------+-------------+
| Block623 | c308002017  | 2a08003017  |
+----------+-------------+-------------+
| Block625 | 0           | c308002017  |
+----------+-------------+-------------+
| Block622 | 2808003017  | c208002017  |
+----------+-------------+-------------+
| Block620 | c208002017  | 0           |
+----------+-------------+-------------+
| Block621 | 2a08003017  | be08003017  |
+----------+-------------+-------------+
| Block624 | 2908002017  | 2808003017  |
+----------+-------------+-------------+
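Now I want to determine each element's own ID and add it as a new column.
What I am doing right now is finding the first element, the one whose PrevBlockID is zero. Then I look up where my first element's NextBlockID matches another element's PrevBlockID and add it to a list holding all even element IDs (Block622, Block624, ...), until no further NextBlockID matches.
Then I look at the element with NextBlockID = 0. This is the last element. Wherever its PrevBlockID matches another element's NextBlockID, I obtain the odd element IDs (Block623, Block621) one by one.
So afterwards I would like an output table like this: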
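A minimal sketch of the two passes described above, using dictionary lookups so that each step is a constant-time lookup instead of a DataFrame filter. It rests on several assumptions: the names `_walk` and `determine_element_ids` are made up for illustration, "0" marks a missing neighbor, the blocks form one unbranched chain, and the chain has an even number of elements (as in the sample). With an odd number of elements the two passes never read the IDs of every position, so the sketch raises an error instead.

```python
def _walk(lookup, missing, limit):
    """Follow lookup[key] -> (element, other_id), starting from `missing`.

    Returns the visited elements and the IDs read along the way.
    """
    elements, ids = [], []
    key = missing
    while key in lookup:
        element, key = lookup[key]
        # Reaching `missing` again means an odd-length chain; exceeding
        # `limit` steps means the data loops back on itself.
        if key == missing or len(elements) >= limit:
            raise ValueError("chain is odd-length or malformed")
        elements.append(element)
        ids.append(key)
    return elements, ids


def determine_element_ids(rows, missing="0"):
    """Return {element: own_id} for rows of (element, next_id, prev_id).

    Hypothetical helper: assumes a single unbranched chain with an even
    number of elements.
    """
    by_prev = {prev_id: (element, next_id) for element, next_id, prev_id in rows}
    by_next = {next_id: (element, prev_id) for element, next_id, prev_id in rows}

    # Forward pass: visits chain positions 0, 2, 4, ... and reads the IDs
    # of positions 1, 3, 5, ... from their NextBlockID values.
    even_elements, odd_ids = _walk(by_prev, missing, len(rows))
    # Backward pass: visits positions n-1, n-3, ... and reads the IDs of
    # positions n-2, n-4, ... from their PrevBlockID values.
    odd_elements, even_ids = _walk(by_next, missing, len(rows))

    # Both passes run in opposite directions, so one side is reversed
    # before elements and IDs are paired up by chain position.
    ids = dict(zip(even_elements, reversed(even_ids)))
    ids.update(zip(reversed(odd_elements), odd_ids))
    if len(ids) != len(rows):
        raise ValueError("could not determine an ID for every element")
    return ids


rows = [
    ("Block623", "c308002017", "2a08003017"),
    ("Block625", "0", "c308002017"),
    ("Block622", "2808003017", "c208002017"),
    ("Block620", "c208002017", "0"),
    ("Block621", "2a08003017", "be08003017"),
    ("Block624", "2908002017", "2808003017"),
]
ids = determine_element_ids(rows)
print(ids["Block620"])  # be08003017
```

If the table lives in a DataFrame `df` whose ID columns were read as strings, the rows could be taken from `df[["Element", "NextBlockID", "PrevBlockID"]].itertuples(index=False, name=None)` and the result attached with `df["ElementID"] = df["Element"].map(ids)`. The column names here follow the table, not the CSV.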
+----------+-------------+-------------+------------+
| Element  | NextBlockID | PrevBlockID | ElementID  |
+----------+-------------+-------------+------------+
| Block623 | c308002017  | 2a08003017  | 2808003017 |
+----------+-------------+-------------+------------+
| Block625 | 0           | c308002017  | 2908002017 |
+----------+-------------+-------------+------------+
| Block622 | 2808003017  | c208002017  | 2a08003017 |
+----------+-------------+-------------+------------+
| Block620 | c208002017  | 0           | be08003017 |
+----------+-------------+-------------+------------+
| Block621 | 2a08003017  | be08003017  | c208002017 |
+----------+-------------+-------------+------------+
| Block624 | 2908002017  | 2808003017  | c308002017 |
+----------+-------------+-------------+------------+
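The output table above can be checked for internal consistency. If a row's ElementID is right, the row that its NextBlockID points to must list it as PrevBlockID, and the row that its PrevBlockID points to must list it as NextBlockID. A small sketch of that check follows; the function name `is_consistent` is made up for illustration, and "0" is assumed to mean "no neighbor".

```python
def is_consistent(table, missing="0"):
    """Check rows of (element, next_id, prev_id, own_id) against each other."""
    by_id = {own_id: (next_id, prev_id) for _, next_id, prev_id, own_id in table}
    for _, next_id, prev_id, own_id in table:
        if next_id != missing:
            # The successor must name this row as its predecessor.
            successor = by_id.get(next_id)
            if successor is None or successor[1] != own_id:
                return False
        if prev_id != missing:
            # The predecessor must name this row as its successor.
            predecessor = by_id.get(prev_id)
            if predecessor is None or predecessor[0] != own_id:
                return False
    return True


# Rows copied from the output table: Element, NextBlockID, PrevBlockID, ElementID
table = [
    ("Block623", "c308002017", "2a08003017", "2808003017"),
    ("Block625", "0", "c308002017", "2908002017"),
    ("Block622", "2808003017", "c208002017", "2a08003017"),
    ("Block620", "c208002017", "0", "be08003017"),
    ("Block621", "2a08003017", "be08003017", "c208002017"),
    ("Block624", "2908002017", "2808003017", "c308002017"),
]
print(is_consistent(table))  # True
```

A check like this only confirms that the links agree with each other; it does not prove that the input itself was complete.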
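My input data is stored in a pandas DataFrame. Is there a smarter/faster solution than iterating over the values one by one?
Update, June 14, 2020, 2:07 PM: Sorry for the confusion; here is the code I have so far: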
import pandas as pd

f1 = 'DF_DetermineBlockID.csv'
df = pd.read_csv(f1, sep=';')
Ids = pd.Series([], dtype=object)
df = df.sort_values("PrevBlock")  # sorted to get 0 value in first pos
df.index = pd.RangeIndex(len(df.index))

# Forward pass: start at the first element and follow NextBlock -> PrevBlock matches.
# pd.concat replaces Series.append, which was removed in pandas 2.0.
successor = df[df.index == 0].squeeze()["NextBlock"]
Ids = pd.concat([Ids, pd.Series(successor, index=[1])])
a = 1
PrevBlockFound = not df[df["PrevBlock"] == successor].empty
while PrevBlockFound:
    a += 2
    successor = df[df["PrevBlock"] == successor].squeeze()["NextBlock"]
    Ids = pd.concat([Ids, pd.Series(successor, index=[a])])
    PrevBlockFound = not df[df["PrevBlock"] == successor].empty

# Backward pass: start at the last element and follow PrevBlock -> NextBlock matches.
predecessor = df[df["NextBlock"] == "0"].squeeze()["PrevBlock"]
a -= 1
Ids = pd.concat([Ids, pd.Series(predecessor, index=[a])])
NextBlockFound = not df[df["NextBlock"] == predecessor].empty
while NextBlockFound:
    a -= 2
    predecessor = df[df["NextBlock"] == predecessor].squeeze()["PrevBlock"]
    Ids = pd.concat([Ids, pd.Series(predecessor, index=[a])])
    NextBlockFound = not df[df["NextBlock"] == predecessor].empty

# Attach the collected IDs to the frame by row position.
df = pd.merge(df, Ids.rename('BlockID'), left_index=True, right_index=True)
[Comments]:
Tags: python pandas list dataframe