Neat way to determine element id when only predecessor and successor ids are known - Python answers

Question title: Neat way to determine element id when only predecessor and successor ids are known - Python
Posted: 2020-06-03 17:35:52
Question:

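I have the following situation: my input data contains several elements that each carry a predecessor ID and a successor ID, but not their own ID, as shown in the table below ("0" means there is no predecessor or successor):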

+----------+-------------+-------------+
| Element  | NextBlockID | PrevBlockID |
+----------+-------------+-------------+
| Block623 | c308002017  | 2a08003017  |
+----------+-------------+-------------+
| Block625 | 0           | c308002017  |
+----------+-------------+-------------+
| Block622 | 2808003017  | c208002017  |
+----------+-------------+-------------+
| Block620 | c208002017  | 0           |
+----------+-------------+-------------+
| Block621 | 2a08003017  | be08003017  |
+----------+-------------+-------------+
| Block624 | 2908002017  | 2808003017  |
+----------+-------------+-------------+

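Now I want to determine the element ID of each element and add it as a new column.
What I'm doing now: I first determine the first element, the one whose PrevBlockID is zero. Then I look for the element whose PrevBlockID matches the NextBlockID of my first element, add it to a list of all the even element IDs (Block622, Block624, ...), and repeat until no NextBlockID matches anymore.
Then I look at the element with NextBlockID = 0, which is the last element. Wherever its PrevBlockID matches a NextBlockID, I get the odd element IDs (Block623, Block621) one by one.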

So afterwards I want an output table like this:

+----------+-------------+-------------+------------+
| Element  | NextBlockID | PrevBlockID | ElementID  |
+----------+-------------+-------------+------------+
| Block623 | c308002017  | 2a08003017  | 2808003017 |
+----------+-------------+-------------+------------+
| Block625 | 0           | c308002017  | 2908002017 |
+----------+-------------+-------------+------------+
| Block622 | 2808003017  | c208002017  | 2a08003017 |
+----------+-------------+-------------+------------+
| Block620 | c208002017  | 0           | be08003017 |
+----------+-------------+-------------+------------+
| Block621 | 2a08003017  | be08003017  | c208002017 |
+----------+-------------+-------------+------------+
| Block624 | 2908002017  | 2808003017  | c308002017 |
+----------+-------------+-------------+------------+
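The desired output is a doubly linked list: each block's ElementID is the PrevBlockID of the block after it and the NextBlockID of the block before it. A small check, not part of the original question, that rebuilds the desired table as a DataFrame (assuming the IDs are kept as strings and "0" means "no neighbour") and follows NextBlockID from the head to the tail:

```python
import pandas as pd

# Desired output from the table above; IDs kept as strings, "0" = no neighbour.
expected = pd.DataFrame({
    "Element": ["Block623", "Block625", "Block622", "Block620", "Block621", "Block624"],
    "NextBlockID": ["c308002017", "0", "2808003017", "c208002017", "2a08003017", "2908002017"],
    "PrevBlockID": ["2a08003017", "c308002017", "c208002017", "0", "be08003017", "2808003017"],
    "ElementID": ["2808003017", "2908002017", "2a08003017", "be08003017", "c208002017", "c308002017"],
})

# Every non-zero neighbour reference is some block's own ElementID.
refs = set(expected["NextBlockID"]) | set(expected["PrevBlockID"])
assert refs - {"0"} == set(expected["ElementID"])

# Follow NextBlockID from the head (PrevBlockID == "0") until the tail ("0").
by_id = expected.set_index("ElementID")
current = expected.loc[expected["PrevBlockID"] == "0", "ElementID"].iloc[0]
order = []
while current != "0":
    order.append(by_id.at[current, "Element"])
    current = by_id.at[current, "NextBlockID"]
print(order)  # ['Block620', 'Block621', 'Block622', 'Block623', 'Block624', 'Block625']
```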

My input data is stored in a pandas DataFrame. Is there a smarter/faster solution than iterating over the values one by one?

Update, June 14, 2020, 2:07 PM: Sorry for the confusion, here is the code I have so far:

import pandas as pd

f1 = 'DF_DetermineBlockID.csv'
# dtype=str keeps the IDs (and the "0" end marker) as strings
df = pd.read_csv(f1, sep=';', dtype=str)
df = df.sort_values("PrevBlock")  # sorted to get 0 value in first pos
df.index = pd.RangeIndex(len(df.index))

# Chain position -> block ID. Series.append was removed in pandas 2.0,
# so collect into a dict and build the Series once at the end.
ids = {}

# Forward pass from the head: collects the IDs at chain positions 1, 3, 5, ...
successor = df.loc[0, "NextBlock"]
a = 1
ids[a] = successor
PrevBlockFound = not df[df["PrevBlock"] == successor].empty

while PrevBlockFound:
    a += 2
    successor = df[df["PrevBlock"] == successor].squeeze()["NextBlock"]
    ids[a] = successor
    PrevBlockFound = not df[df["PrevBlock"] == successor].empty

# Backward pass from the tail: collects the IDs at chain positions ..., 2, 0
predecessor = df[df["NextBlock"] == "0"].squeeze()["PrevBlock"]
a -= 1
ids[a] = predecessor
NextBlockFound = not df[df["NextBlock"] == predecessor].empty

while NextBlockFound:
    a -= 2
    predecessor = df[df["NextBlock"] == predecessor].squeeze()["PrevBlock"]
    ids[a] = predecessor
    NextBlockFound = not df[df["NextBlock"] == predecessor].empty

# The merge matches chain positions to row positions, so it only lines up if
# sorting by PrevBlock put the rows in chain order. The loops also assume an
# even number of blocks: with an odd count the "0" end marker is matched again
# and they never terminate.
df = pd.merge(df, pd.Series(ids, name='BlockID'), left_index=True, right_index=True)

Comments on the question:

Tags: python pandas list dataframe

Solution 1:

It's a bit hard to figure out the logic you're looking for, but this code produces the output you expect:

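df = df.sort_values("Element")  # relies on the Element names being in chain order
df["ElementID"] = df.PrevBlockID.shift(-1)  # own ID = the next block's PrevBlockID
# the last block has no successor, so its ID is the previous block's NextBlockID
df.loc[df.ElementID.isna(), "ElementID"] = df.NextBlockID.shift()
df.sort_index(inplace=True)  # restore the original row order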

The output is:

    Element NextBlockID PrevBlockID   ElementID
0  Block623  c308002017  2a08003017  2808003017
1  Block625           0  c308002017  2908002017
2  Block622  2808003017  c208002017  2a08003017
3  Block620  c208002017           0  be08003017
4  Block621  2a08003017  be08003017  c208002017
5  Block624  2908002017  2808003017  c308002017

Comments:

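Does this do what you want?
Hi, thanks for your input. The shift method is very handy here, and it's exactly what I was looking for. It works and creates the output I want, but unfortunately I can't rely on the Element column being in the right order: it can contain arbitrary strings. My data excerpt was probably a bad example. Is there a way to do this without the Element column?
If you iterate over things, you must have some order. What order do you want to use? In other words, what are the "predecessor and successor" you mention in the title of your post? More generally, it would be great if you could explain the logic of what you're trying to do.
Hi Roy, I edited my initial post and put my code there. I don't need any particular order afterwards. What I get is external data with information about blocks that are linked to each other: every block follows another block and has a predecessor (PrevBlockID) and a successor (NextBlockID). The blocks themselves can have any name. The information I'm missing is the block's own ID (my column ElementID in the example), and that is what I'm looking for a way to get. I found one, but it doesn't seem very efficient to me, considering that the DataFrame will later have about 100k rows.
Are the IDs (ElementID, NextBlockID or PrevBlockID) in lexicographic order, i.e. do they carry any meaning? Or is it just a doubly linked list whose names have no meaning?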
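What the asker describes in the comments (every block has a predecessor and a successor, names are arbitrary) is a doubly linked list, so the IDs can be recovered without relying on the Element names. The following is a minimal sketch, not taken from the thread, assuming a single chain of at least two blocks, IDs stored as strings, and "0" marking the missing neighbour of the head and the tail; `assign_element_ids` is a hypothetical helper name. The head's own ID is the one value that appears in PrevBlockID but never in NextBlockID. From any block, the successor is the row whose PrevBlockID equals this block's ID, and the successor's own ID is this block's NextBlockID. A dict keyed on PrevBlockID makes each step a constant-time lookup, so the walk is linear in the number of rows, whereas filtering the whole DataFrame at every step (as the loops in the question do) grows quadratically.

```python
import pandas as pd


def assign_element_ids(df, prev_col="PrevBlockID", next_col="NextBlockID", end="0"):
    """Return a copy of df with an ElementID column recovered from the neighbour IDs.

    Hypothetical helper. Assumes a single chain of at least two blocks, string IDs,
    and `end` marking the missing neighbour of the head and the tail.
    """
    prev_ids = df[prev_col]
    next_ids = df[next_col]

    # Nothing points forward to the head, so its own ID shows up as some
    # block's PrevBlockID but never as a NextBlockID.
    candidates = set(prev_ids) - set(next_ids) - {end}
    heads = prev_ids[prev_ids == end].index
    if len(candidates) != 1 or len(heads) != 1:
        raise ValueError("expected exactly one chain with one head")

    # PrevBlockID -> row label, so each step below is an O(1) lookup.
    row_by_prev = dict(zip(prev_ids, df.index))

    ids = {}
    row, own_id = heads[0], candidates.pop()
    for _ in range(len(df)):  # bounded, so a malformed cycle cannot loop forever
        ids[row] = own_id
        next_id = next_ids.at[row]
        if next_id == end:
            break
        # The successor's PrevBlockID is our ID; its own ID is our NextBlockID.
        row, own_id = row_by_prev[own_id], next_id

    # Aligns on the row labels, so the original row order is kept.
    return df.assign(ElementID=pd.Series(ids, dtype=object))


df = pd.DataFrame({
    "Element": ["Block623", "Block625", "Block622", "Block620", "Block621", "Block624"],
    "NextBlockID": ["c308002017", "0", "2808003017", "c208002017", "2a08003017", "2908002017"],
    "PrevBlockID": ["2a08003017", "c308002017", "c208002017", "0", "be08003017", "2808003017"],
})
print(assign_element_ids(df))
```

On the sample data this reproduces the ElementID column of the desired output table, in the original row order. If the data is read from a CSV, passing dtype=str to pd.read_csv keeps the IDs and the "0" marker as strings.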