Join elements by iterating through the data: answers

【Question title】: Join elements by iterating through the data
【Posted】: 2020-06-23 14:28:17
【Question】:

I have some data in a table:

ID A B VALUE         EXPECTED RESULT
1  1 2 5               GROUP1
2  2 3 5               GROUP1
3  3 4 6               GROUP2
4  3 5 5               GROUP1
5  6 4 5               GROUP3

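What I want to do is iterate through the data (thousands of rows) and create a common field so that I can easily join the data (A -> start node, B -> end node, VALUE -> order ... the data form a chain in which only neighbours share a common A or B).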
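One way to hold this data in a structured form, as a sketch only: each row is treated as a segment running from node A to node B, and the `Edge` type, its field names and the `rows_by_node` index are hypothetical names chosen for illustration, not anything from the question.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """One table row: a segment running from start node `a` to end node `b`."""

    id: int
    a: int  # column A (start node)
    b: int  # column B (end node)
    value: int  # column VALUE (the order attribute)


# The sample rows from the table above; EXPECTED RESULT is what we want to compute.
ROWS = [
    Edge(1, 1, 2, 5),
    Edge(2, 2, 3, 5),
    Edge(3, 3, 4, 6),
    Edge(4, 3, 5, 5),
    Edge(5, 6, 4, 5),
]

# Index node -> ids of the rows that touch it, so a row's potential
# neighbours can be looked up directly instead of scanning every row.
rows_by_node: dict[int, list[int]] = {}
for row in ROWS:
    for node in (row.a, row.b):
        rows_by_node.setdefault(node, []).append(row.id)
```

Because only chain neighbours share a node, each node should appear in only a few rows, which is what keeps such lookups cheap on thousands of rows.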

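Joining rules:

1. All elements in a group have the same VALUE.
2. The A of element one equals the B of element two (or vice versa, but not A = A' or B = B').
3. The hardest one: all sequential data that form a series of intersecting nodes are assigned to the same group.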
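Rules 1 and 2 are a pairwise test, while rule 3 is the transitive part that pairwise checks alone cannot give. A minimal sketch of the pairwise test, assuming rows are held as hypothetical `Edge` records (ID, A, B, VALUE):

```python
from typing import NamedTuple


class Edge(NamedTuple):
    id: int
    a: int  # start node
    b: int  # end node
    value: int  # order attribute


def can_join(e1: Edge, e2: Edge) -> bool:
    """Rules 1 and 2: equal VALUE, and the A of one edge is the B of the other.

    Sharing only a start node (A = A') or only an end node (B = B') does not count.
    """
    if e1.value != e2.value:
        return False
    return e1.a == e2.b or e1.b == e2.a


# Rule 3 is transitive: [1 1 2 5] reaches [4 3 5 5] only via [2 2 3 5].
first = Edge(1, 1, 2, 5)
second = Edge(2, 2, 3, 5)
fourth = Edge(4, 3, 5, 5)
print(can_join(first, second), can_join(second, fourth), can_join(first, fourth))
# True True False
```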

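i.e. the first element [1 1 2 5] must be joined with [2 2 3 5], and then with [4 3 5 5].

Any idea how to do this robustly when iterating over a large amount of data? I have trouble with rule 3; the other rules are easy to apply. With limited data I have had some success, but the result depends on the order in which I start checking the data, so it does not work for large datasets. I can use arcpy (preferably) or even Python, R or Matlab to solve this. I have tried arcpy without success, so I am looking at alternatives.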
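The order dependence goes away if the groups are treated as connected components and built with a union-find (disjoint-set) structure. This is a swapped-in technique, not something from the question; a minimal plain-Python sketch, assuming the rows have already been read (for example through a cursor) into hypothetical `Edge` records:

```python
from typing import NamedTuple


class Edge(NamedTuple):
    id: int
    a: int  # start node
    b: int  # end node
    value: int  # order attribute


def group_edges(edges: list[Edge]) -> dict[int, int]:
    """Return {edge id: group number}; groups are numbered by smallest id."""
    parent = {e.id: e.id for e in edges}

    def find(x: int) -> int:
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x: int, y: int) -> None:
        rx, ry = find(x), find(y)
        if rx != ry:
            # Keep the smaller id as root so the result does not depend on row order.
            parent[max(rx, ry)] = min(rx, ry)

    # Index edges by (VALUE, start node): an edge's B then finds every edge
    # whose A matches it with one dict lookup (rules 1 and 2).
    by_start: dict[tuple[int, int], list[int]] = {}
    for e in edges:
        by_start.setdefault((e.value, e.a), []).append(e.id)
    for e in edges:
        for other in by_start.get((e.value, e.b), []):
            union(e.id, other)  # merging sets makes joins transitive (rule 3)

    roots = sorted({find(e.id) for e in edges})
    number = {root: n for n, root in enumerate(roots, start=1)}
    return {e.id: number[find(e.id)] for e in edges}


ROWS = [Edge(1, 1, 2, 5), Edge(2, 2, 3, 5), Edge(3, 3, 4, 6),
        Edge(4, 3, 5, 5), Edge(5, 6, 4, 5)]
print(group_edges(ROWS))  # {1: 1, 2: 1, 3: 2, 4: 1, 5: 3}
```

Each row is visited a fixed number of times plus near-constant union-find work, so thousands of rows should not be a problem; the resulting dict could then be written back to a field with an update cursor.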

In ArcPy, this code works, but only up to a point (for large features with many segments I get 3-4 groups instead of 1):

import arcpy

TheShapefile = "c:/Temp/temp.shp"

# Create the "new" field if it does not exist yet.
# LONG rather than SHORT: a SHORT field cannot hold FIDs above 32767.
if "new" not in [fld.name for fld in arcpy.ListFields(TheShapefile)]:
    arcpy.AddField_management(TheShapefile, "new", "LONG")
# Start with every segment labelled by its own FID.
arcpy.CalculateField_management(TheShapefile, "new", "!FID!", "PYTHON_9.3")

fields = ["FID", "NODE_A", "NODE_B", "ORDER_", "new"]
with arcpy.da.SearchCursor(TheShapefile, fields) as TheSearch:
    for SearchRow in TheSearch:
        # Label to spread: the row's current "new" value
        # (its own FID unless it has already been relabelled).
        Outer_FID = SearchRow[4]
        Outer_NODEA = SearchRow[1]
        Outer_NODEB = SearchRow[2]
        Outer_ORDER = SearchRow[3]
        # Single pass: copy the label onto every segment that joins this one
        # (same ORDER_, and the A of one equals the B of the other).
        with arcpy.da.UpdateCursor(TheShapefile, fields) as TheUpdate:
            for UpdateRow in TheUpdate:
                Inner_NODEA = UpdateRow[1]
                Inner_NODEB = UpdateRow[2]
                Inner_ORDER = UpdateRow[3]
                if Inner_ORDER == Outer_ORDER and (
                    Inner_NODEA == Outer_NODEB or Inner_NODEB == Outer_NODEA
                ):
                    UpdateRow[4] = Outer_FID
                    TheUpdate.updateRow(UpdateRow)
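The 3-4 groups instead of 1 likely come from the single pass: when a segment is relabelled after it has already passed its label on, the change never reaches segments further along the chain. A minimal plain-Python sketch of the same label-copying idea, with a hypothetical `Edge` record standing in for a row (fields mirroring NODE_A, NODE_B, ORDER_), repeats the sweep until nothing changes:

```python
from typing import NamedTuple


class Edge(NamedTuple):
    id: int
    a: int  # NODE_A
    b: int  # NODE_B
    value: int  # ORDER_


def propagate_labels(edges: list[Edge]) -> dict[int, int]:
    """Give every edge the smallest id among the edges it is chained to."""
    label = {e.id: e.id for e in edges}  # like "new" initialised from FID
    changed = True
    while changed:  # keep sweeping until a full pass changes nothing
        changed = False
        for e1 in edges:
            for e2 in edges:
                joined = e1.value == e2.value and (e1.a == e2.b or e1.b == e2.a)
                if joined and label[e1.id] != label[e2.id]:
                    smaller = min(label[e1.id], label[e2.id])
                    label[e1.id] = label[e2.id] = smaller
                    changed = True
    return label


ROWS = [Edge(1, 1, 2, 5), Edge(2, 2, 3, 5), Edge(3, 3, 4, 6),
        Edge(4, 3, 5, 5), Edge(5, 6, 4, 5)]
print(propagate_labels(ROWS))  # {1: 1, 2: 1, 3: 3, 4: 1, 5: 5}
```

Labels only ever decrease, so the loop terminates, and the final labels do not depend on row order. The nested loops make each sweep quadratic, so on thousands of rows this is mainly illustrative; an indexed approach such as union-find scales better.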

Some of the data is also available in shapefile form and dbf form.

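【Discussion】:

R, Python or Matlab??
Preferably Python, but I can use other languages to solve this; I just need it solved. (Yes, I already asked about Python and got no reply after waiting for days, so maybe it is hard to do in Python?)
Please show us what you have tried so far. And please provide the data in some structured form rather than a human-readable table: how are we supposed to know what you use to store it?
I don't think it deserves a downvote, but I can't think of a simple, fast way to do this. Maybe you should make an honest attempt to show your previous research and help others with the first steps of the program.
This looks like some kind of graph problem. However, I don't understand why node 5 is not connected to group 1, since it has value 5 and points to node 3, which is already in the group. Is this due to the ordering? Is it not allowed to "back up" to a previous node?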
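Following the graph reading: a sketch, assuming rows as plain (ID, A, B, VALUE) tuples and a hypothetical `join_graph` helper, that builds the adjacency list implied by rules 1 and 2. The groups are then the connected components of this graph.

```python
def join_graph(rows):
    """Adjacency list of the join graph: row ID -> IDs of rows it can join.

    rows are (ID, A, B, VALUE) tuples; two rows are adjacent when they have the
    same VALUE and the A of one equals the B of the other.
    """
    adjacency = {row_id: [] for row_id, _, _, _ in rows}
    for id1, a1, b1, v1 in rows:
        for id2, a2, b2, v2 in rows:
            if id1 != id2 and v1 == v2 and (a1 == b2 or b1 == a2):
                adjacency[id1].append(id2)
    return adjacency


ROWS = [(1, 1, 2, 5), (2, 2, 3, 5), (3, 3, 4, 6), (4, 3, 5, 5), (5, 6, 4, 5)]
print(join_graph(ROWS))  # {1: [2], 2: [1, 4], 3: [], 4: [2], 5: []}
```

For the sample, row 5 (A = 6, B = 4) has no neighbours: its only shared node, 4, belongs to row 3, whose VALUE is 6, so under these rules it forms its own group, matching GROUP3 in the expected result.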

Tags: python r matlab gis

【Solution 1】:

Using Matlab:

A = [1 1 2 5
     2 2 3 5
     3 3 4 6
     4 3 5 5
     5 6 4 5];   % columns: ID, A, B, VALUE

%% Initialization
% row indices of A that belong to the group being built
ind = 1;
% number of rows found in the previous pass
len = length(ind);
% group number for each ID
g   = [];
% group counter
c   = 1;

% Start the small algorithm
while 1
    % Find every row with the same VALUE that shares a node (A or B) with a row already in the group
    % (the & between an Nx2 and an Nx1 array relies on implicit expansion, available since R2016b)
    ind = find(any(ismember(A(:,2:3),A(ind,2:3)) & A(:,4) == A(ind(end),4),2));

    % If no new row was found, the group is complete
    if length(ind) == len
        % group assignment, indexed by the ID column
        g(A(ind,1)) = c;
        c = c+1;
        % delete the rows already assigned
        A(ind,:) = [];
        % stop when no rows are left
        if isempty(A)
            break
        end
        % restart from the first remaining row for the next group
        ind = 1;
    end
    len = length(ind);
end

g
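Since the question prefers Python, a rough plain-Python translation of the same grow-until-stable idea may help, assuming the rows are (ID, A, B, VALUE) tuples; `grow_groups` is a hypothetical name. Like the Matlab code, it joins rows that share any node, not only A = B', which relies on the question's statement that only chain neighbours share a node.

```python
def grow_groups(rows):
    """Python port of the Matlab loop.

    rows are (ID, A, B, VALUE) tuples. Starting from the first remaining row,
    keep adding rows with the same VALUE that share a node (A or B) with the
    group until a pass finds nothing new; then number the group, drop its
    rows, and start again.
    """
    remaining = list(rows)
    group = {}
    counter = 1
    while remaining:
        members = [remaining[0]]
        while True:
            nodes = {n for _, a, b, _ in members for n in (a, b)}
            value = members[-1][3]
            found = [r for r in remaining
                     if r[3] == value and (r[1] in nodes or r[2] in nodes)]
            if len(found) == len(members):  # nothing new joined
                break
            members = found
        for r in members:
            group[r[0]] = counter
        counter += 1
        done = {r[0] for r in members}
        remaining = [r for r in remaining if r[0] not in done]
    return group


ROWS = [(1, 1, 2, 5), (2, 2, 3, 5), (3, 3, 4, 6), (4, 3, 5, 5), (5, 6, 4, 5)]
groups = grow_groups(ROWS)
print([groups[i] for i in sorted(groups)])  # [1, 1, 2, 1, 3]
```

Each growth pass rescans all remaining rows, so the run time grows quickly with the number of rows; an index from node to rows would avoid the rescans.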

This is the output:

g =

   1   1   2   1   3

as expected.

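【Discussion】:

Thanks! I'm running the code on the full dataset now. Fingers crossed it works!
Tested! It took a while, but everything works like a charm. Thank you very much!
You're welcome! Yes, I haven't really optimized this algorithm. If you are curious, and if all your nodes have geometry, you can also create multipoint features (each containing the 2 nodes) and use the DBSCAN algorithm with a search distance of 0 to build your clusters. arcpy has an implementation of this algorithm, and it is amazingly fast.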