【Question title】: Python: build hierarchical JSON from CSV
【Posted】: 2020-07-16 10:51:38
【Question】:

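I want to build a JSON file from a CSV to represent the hierarchical relationships in my data. The relationships are parent/child: a child can have one or more parents, and a parent can have one or more children. A child can also have children of its own, so multiple levels are possible. I think a dendrogram in D3 could be a good visualization.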

My CSV source file contains thousands of rows like these:

parent         | children       | date
---------------------------------------------
830010000C0419 | 830010000C1205 | 1993/09/15
830010000C0947 | 830010000C1205 | 1993/09/15
830010000C0948 | 830010000C1205 | 1993/09/15
830010000B0854 | 830010000B1196 | 1994/03/11
830010000B0854 | 830010000B1197 | 1994/03/11
830010000B0721 | 830010000B1343 | 1988/12/05
830010000B1343 | 830010000B1344 | 1988/12/05
830010000B0721 | 830010000B1345 | 1988/12/05
830010000B1345 | 830010000B1344 | 1986/12/05
...

I would like to generate a JSON file with this structure:

var treeData = [
  {
    "name": "Root",
    "parent": "null",
    "children": [
      {
        "name": "830010000B0854",
        "parent": "Top Level",
        "children": [
          {
            "name": "830010000B1196",
            "parent": "830010000B0854"
          },
          {
            "name": "830010000B1197",
            "parent": "830010000B0854"
          }
        ]
      },
      {
        "name": "830010000B0721",
        "parent": "Top Level",
        "children": [
          {
            "name": "830010000B1343",
            "parent": "830010000B0721",
            "children": [
                {
                "name": "830010000B1344",
                "parent": "830010000B1343"
                }
            ]
          }
        ]
      },
      {
        "name": "830010000C0419",
        "parent": "Top Level",
        "children": [
          {
            "name": "830010000C1205",
            "parent": "830010000C0419"
          }
        ]
      },
      {
        "name": "830010000C0947",
        "parent": "Top Level",
        "children": [
          {
            "name": "830010000C1205",
            "parent": "830010000C0947"
          }
        ]
      },
      {
        "name": "830010000C0948",
        "parent": "Top Level",
        "children": [
          {
            "name": "830010000C1205",
            "parent": "830010000C0948"
          }
        ]
      }
    ]
  }
];

Note that in this example I cannot represent a child that has several parents; that probably requires a more complex dendrogram.

How can I build this structure with Python?

【Comments】:

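  • Well, if a child can have at most one parent, a hierarchical tree makes sense and you can use your JSON example. But if a child can have several parents, you end up with a non-hierarchical graph: your JSON structure will only be usable if you duplicate subtrees, and only if you have no cycles...
  • OK, do you know if there is a better way to represent this? I found this D3 dendrogram, but maybe other libraries could help.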
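The distinction drawn in the first comment can be checked on the data before choosing an output format. Below is a minimal sketch, assuming the file has already been reduced to (parent, child) pairs; EDGES and classify are illustrative names, not part of the original thread. It lists the nodes that have more than one parent and runs a depth-first search for cycles:

```python
from collections import defaultdict

# (parent, child) pairs taken from the sample rows in the question
EDGES = [
    ("830010000C0419", "830010000C1205"),
    ("830010000C0947", "830010000C1205"),
    ("830010000C0948", "830010000C1205"),
    ("830010000B0854", "830010000B1196"),
    ("830010000B0854", "830010000B1197"),
    ("830010000B0721", "830010000B1343"),
    ("830010000B1343", "830010000B1344"),
    ("830010000B0721", "830010000B1345"),
    ("830010000B1345", "830010000B1344"),
]


def classify(edges):
    """Return (sorted nodes with more than one parent, True if there is a cycle)."""
    parents = defaultdict(set)
    children = defaultdict(list)
    for p, c in edges:
        parents[c].add(p)
        children[p].append(c)
    multi_parent = sorted(n for n, ps in parents.items() if len(ps) > 1)

    # Iterative DFS: 0 = unvisited, 1 = on the current path, 2 = finished
    state = defaultdict(int)
    for start in list(children):
        if state[start]:
            continue
        state[start] = 1
        stack = [(start, iter(children[start]))]
        while stack:
            node, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:              # all children explored
                state[node] = 2
                stack.pop()
            elif state[nxt] == 1:        # back edge to the current path: cycle
                return multi_parent, True
            elif state[nxt] == 0:        # not visited yet: go deeper
                state[nxt] = 1
                stack.append((nxt, iter(children[nxt])))
    return multi_parent, False


multi, has_cycle = classify(EDGES)
print(multi)       # ['830010000B1344', '830010000C1205']
print(has_cycle)   # False
```

With these sample rows, two nodes have several parents and there is no cycle, so the data forms a directed acyclic graph rather than a tree: a nested JSON tree is only possible by copying the shared subtrees.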

Tags: python csv d3.js dendrogram


【Solution 1】:

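I would first build a dictionary of nodes, where the key is the node name and the value is a tuple containing a list of parents and a list of children. To make building the tree easier, I would also keep a set of all top-level nodes (those without a parent).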

From that dictionary, you can recursively build JSON-like data, which can then be turned into a real JSON string.

But since what you have shown is not in CSV format, I use re.split to parse the input:

import io
import pprint
import re

# First the data
t = '''parent         | children       | date
---------------------------------------------
830010000C0419 | 830010000C1205 | 1993/09/15
830010000C0947 | 830010000C1205 | 1993/09/15
830010000C0948 | 830010000C1205 | 1993/09/15
830010000B0854 | 830010000B1196 | 1994/03/11
830010000B0854 | 830010000B1197 | 1994/03/11
830010000B0721 | 830010000B1343 | 1988/12/05
830010000B1343 | 830010000B1344 | 1988/12/05
'''

rx = re.compile(r'\s*\|\s*')   # "|" separator with optional surrounding blanks

# nodes maps a name to a tuple (list of parents, list of children);
# nodes[None] is the set of top-level names (nodes that have no parent)
nodes = {None: set()}
with io.StringIO(t) as fd:
    _ = next(fd)              # skip the header line
    _ = next(fd)              # skip the dashed separator line
    for linenum, line in enumerate(fd, 3):   # physical line numbers after the 2 skipped lines
        if not line.strip():  # ignore blank lines
            continue
        p, c = rx.split(line.strip())[:2]   # parse a line: parent, child
        if p == c:            # a node cannot be its own parent
            raise ValueError(f'Same node as parent and child {p} at line {linenum}')
        # process the nodes
        if c not in nodes:
            nodes[c] = ([], [])
        elif c in nodes[None]:   # c was a top-level parent so far, but now has a parent
            nodes[None].remove(c)
        if p not in nodes:
            nodes[p] = ([], [c])
            nodes[None].add(p)
        else:
            nodes[p][1].append(c)
        nodes[c][0].append(p)


def subtree(node, nodes, parent=None, path=()):
    """Builds a dict with the subtree of a node.
        node is a node name, nodes the dict, parent is the parent name,
        path holds the ancestors of node on the current branch, to break cycles
    """
    if node in path:      # node is its own ancestor: break the cycle here
        return {'name': node, 'parent': parent, 'children': '...'}
    path = path + (node,)  # per-branch copy, so a node shared by two branches is not cut off
    return {'name': node, 'parent': parent, 'children':
            [subtree(c, nodes, node, path) for c in nodes[node][1]]}

# We can now build the json data
js = {node: subtree(node, nodes) for node in nodes[None]}

pprint.pprint(js)

It gives:

{'830010000B0721': {'children': [{'children': [{'children': [],
                                                'name': '830010000B1344',
                                                'parent': '830010000B1343'}],
                                  'name': '830010000B1343',
                                  'parent': '830010000B0721'}],
                    'name': '830010000B0721',
                    'parent': None},
 '830010000B0854': {'children': [{'children': [],
                                  'name': '830010000B1196',
                                  'parent': '830010000B0854'},
                                 {'children': [],
                                  'name': '830010000B1197',
                                  'parent': '830010000B0854'}],
                    'name': '830010000B0854',
                    'parent': None},
 '830010000C0419': {'children': [{'children': [],
                                  'name': '830010000C1205',
                                  'parent': '830010000C0419'}],
                    'name': '830010000C0419',
                    'parent': None},
 '830010000C0947': {'children': [{'children': [],
                                  'name': '830010000C1205',
                                  'parent': '830010000C0947'}],
                    'name': '830010000C0947',
                    'parent': None},
 '830010000C0948': {'children': [{'children': [],
                                  'name': '830010000C1205',
                                  'parent': '830010000C0948'}],
                    'name': '830010000C0948',
                    'parent': None}}
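The parsing above relies on the pipe-separated layout shown in the question. If the real file is an ordinary comma-separated CSV with a header row such as parent,children,date (an assumption; the question only shows a formatted preview), the same nodes dictionary could be filled with the standard csv module instead. build_nodes is a hypothetical helper name:

```python
import csv
import io


def build_nodes(fd):
    """Build {name: ([parents], [children]), None: {top-level names}} from a CSV file object.

    Assumes a header row containing at least the columns "parent" and "children".
    """
    nodes = {None: set()}
    for row in csv.DictReader(fd):
        p, c = row["parent"].strip(), row["children"].strip()
        if p == c:                      # a node cannot be its own parent
            raise ValueError(f"Same node as parent and child: {p}")
        if c not in nodes:
            nodes[c] = ([], [])
        nodes[None].discard(c)          # a node that has a parent is not top-level
        if p not in nodes:              # first time p is seen, and not as a child
            nodes[p] = ([], [])
            nodes[None].add(p)
        nodes[p][1].append(c)
        nodes[c][0].append(p)
    return nodes


sample = io.StringIO(
    "parent,children,date\n"
    "830010000B0721,830010000B1343,1988/12/05\n"
    "830010000B1343,830010000B1344,1988/12/05\n"
)
nodes = build_nodes(sample)
print(sorted(nodes[None]))   # ['830010000B0721']
```

For a real file, open it with open(path, newline="") and pass the file object to build_nodes; the subtree function above works unchanged on the result.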

【Comments】:

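  • Thanks for this example. Why can a node key be its own child? Take 830010000B0721 (the first one), for example.
  • @GeoGyro I used pprint to get nicer formatting. parent=830010000B0721 is not in the 830010000B0721 node itself but in its child, the node named 830010000B1343.

【Solution 2】: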

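Here is a first idea. Note that it is not complete: you need to add some form of recursion/iteration to descend into the child nodes, but I think the logic should be very similar.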

import pandas as pd

df = pd.read_csv('data.csv')   # hypothetical file name; columns parent, children, date

all_children = set(df.children)   # every node that appears as somebody's child

def get_children(parent_name):
    children = df.loc[df.parent == parent_name, 'children']
    return [{"name": name, "parent": parent_name} for name in children]

def get_node_representation(parent_name):
    if parent_name not in all_children:
        # never listed as a child: it sits directly under the root
        parent = "Top Level"
    else:
        # Your logic here
        parent = "null"
    return {"name": parent_name, "parent": parent, "children": get_children(parent_name)}

# this assumes all children are also parents, which is of course not necessarily true,
# so you want some kind of recursion/iteration calling get_node_representation on the children
all_nodes = [get_node_representation(node) for node in df.parent.unique()]
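The recursion this answer leaves open might look like the sketch below. It is not the answerer's code: it assumes the rows are available as a plain list of (parent, child) pairs (for example list(zip(df.parent, df.children)) from the DataFrame above), and the names EDGES and build_tree_data are hypothetical. All top-level nodes are wrapped in a single "Root" node and get "parent": "Top Level", as in the question's example, and a child with several parents is copied under each of them:

```python
import json
from collections import defaultdict

# (parent, child) pairs; a subset of the question's sample rows
EDGES = [
    ("830010000B0854", "830010000B1196"),
    ("830010000B0854", "830010000B1197"),
    ("830010000B0721", "830010000B1343"),
    ("830010000B1343", "830010000B1344"),
    ("830010000C0419", "830010000C1205"),
    ("830010000C0947", "830010000C1205"),
]


def build_tree_data(edges):
    """Return the question's treeData list: one "Root" node wrapping all top-level nodes."""
    children = defaultdict(list)
    has_parent = set()
    for p, c in edges:
        children[p].append(c)
        has_parent.add(c)
    # every node name, in first-seen order
    order = list(dict.fromkeys(n for pair in edges for n in pair))

    def node(name, parent, path):
        d = {"name": name, "parent": parent}
        if name in path:            # name is its own ancestor: stop instead of looping forever
            return d
        kids = [node(c, name, path | {name}) for c in children.get(name, [])]
        if kids:                    # leaves get no "children" key, as in the question
            d["children"] = kids
        return d

    roots = [n for n in order if n not in has_parent]
    return [{"name": "Root", "parent": "null",
             "children": [node(r, "Top Level", frozenset()) for r in roots]}]


print(json.dumps(build_tree_data(EDGES), indent=2))
```

The path argument only stops genuine cycles (a node reappearing among its own ancestors); a node shared by two branches, such as 830010000C1205 here, is still copied into both.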

【Comments】:

  • What is df here?
  • Sorry. It is a pandas DataFrame of the CSV file you provided; it's just a more convenient way to handle CSV data. You can create it simply with read_csv().

【Solution 3】:

I found this method, which allows many-to-many relationships between parents and children.

Here is a demo with my data:

var width = 800,
    height = 800,
    boxWidth = 150,
    boxHeight = 20,
    gap = {
        width: 150,
        height: 12
    },
    margin = {
        top: 16,
        right: 16,
        bottom: 16,
        left: 16
    },
    svg;
    
var data = {
    "Nodes": [
    
    // Level 0
    {
            "lvl": 0,
            "name": "830010000C0419"
        },
        {
            "lvl": 0,
            "name": "830010000C0947"
        },
        {
            "lvl": 0,
            "name": "830010000C0948"
        },
        {
            "lvl": 0,
            "name": "830010000B0854"
        },
        {
            "lvl": 0,
            "name": "830010000B0721"
        },
        
    // Level 1
        
        {
            "lvl": 1,
            "name": "830010000C1205"
        },
        {
            "lvl": 1,
            "name": "830010000B1196"
        },
        {
            "lvl": 1,
            "name": "830010000B1197"
        },
        {
            "lvl": 1,
            "name": "830010000B1343"
        },
        {
            "lvl": 1,
            "name": "830010000B1345"
        },
        
    // Level 2
        {
            "lvl": 2,
            "name": "830010000B1344"
        }
        
    ],
    "links": [
        {
            "source": "830010000C0419",
            "target": "830010000C1205"
        },
        {
            "source": "830010000C0947",
            "target": "830010000C1205"
        },
        {
            "source": "830010000C0948",
            "target": "830010000C1205"
        },
        {
            "source": "830010000B0854",
            "target": "830010000B1196"
        },
        {
            "source": "830010000B0854",
            "target": "830010000B1197"
        },
        {
            "source": "830010000B0721",
            "target": "830010000B1343"
        },
        {
            "source": "830010000B1343",
            "target": "830010000B1344"
        },
        {
            "source": "830010000B0721",
            "target": "830010000B1345"
        },      
        {
        
            "source": "830010000B1345",
            "target": "830010000B1344"
        }
    ]
};

// test layout
var Nodes = [];
var links = [];
var lvlCount = 0;

var diagonal = d3.svg.diagonal()
    .projection(function (d) {
        "use strict";
        return [d.y, d.x];
    });

function find(text) {
    "use strict";
    var i;
    for (i = 0; i < Nodes.length; i += 1) {
        if (Nodes[i].name === text) {
            return Nodes[i];
        }
    }
    return null;
}

function mouse_action(val, stat, direction) {
    "use strict";
    d3.select("#" + val.id).classed("active", stat);
    
    links.forEach(function (d) {
        if (direction == "root") {
            if (d.source.id === val.id) {
                d3.select("#" + d.id).classed("activelink", stat); // change link color
                d3.select("#" + d.id).classed("link", !stat); // change link color
                if (d.target.lvl < val.lvl)
                    mouse_action(d.target, stat, "left");
                else if (d.target.lvl > val.lvl)
                    mouse_action(d.target, stat, "right");
            }
            if (d.target.id === val.id) {
                d3.select("#" + d.id).classed("activelink", stat); // change link color
                d3.select("#" + d.id).classed("link", !stat); // change link color
                if (direction == "root") {
                    if(d.source.lvl < val.lvl)
                        mouse_action(d.source, stat, "left");
                    else if (d.source.lvl > val.lvl)
                        mouse_action(d.source, stat, "right");
                }
            }
        }else if (direction == "left") {
            if (d.source.id === val.id && d.target.lvl < val.lvl) {
                d3.select("#" + d.id).classed("activelink", stat); // change link color
                d3.select("#" + d.id).classed("link", !stat); // change link color

                mouse_action(d.target, stat, direction);
            }
            if (d.target.id === val.id && d.source.lvl < val.lvl) {
                d3.select("#" + d.id).classed("activelink", stat); // change link color
                d3.select("#" + d.id).classed("link", !stat); // change link color
                mouse_action(d.source, stat, direction);
            }
        }else if (direction == "right") {
            if (d.source.id === val.id && d.target.lvl > val.lvl) {
                d3.select("#" + d.id).classed("activelink", stat); // change link color
                d3.select("#" + d.id).classed("link", !stat); // change link color
                mouse_action(d.target, stat, direction);
            }
            if (d.target.id === val.id && d.source.lvl > val.lvl) {
                d3.select("#" + d.id).classed("activelink", stat); // change link color
                d3.select("#" + d.id).classed("link", !stat); // change link color
                mouse_action(d.source, stat, direction);
            }
        }
    });
}

function unvisite_links() {
    "use strict";
    links.forEach(function (d) {
        d.visited = false;
    });
}

function renderRelationshipGraph(data) {
    "use strict";
    var count = [];

    data.Nodes.forEach(function (d) {
        count[d.lvl] = 0;
    });
    lvlCount = count.length;

    data.Nodes.forEach(function (d, i) {
        d.x = margin.left + d.lvl * (boxWidth + gap.width);
        d.y = margin.top + (boxHeight + gap.height) * count[d.lvl];
        d.id = "n" + i;
        count[d.lvl] += 1;
        Nodes.push(d);
    });

    data.links.forEach(function (d) {
        links.push({
            source: find(d.source),
            target: find(d.target),
            id: "l" + find(d.source).id + find(d.target).id
        });
    });
    unvisite_links();

    svg.append("g")
        .attr("class", "nodes");

    var node = svg.select(".nodes")
        .selectAll("g")
        .data(Nodes)
        .enter()
        .append("g")
        .attr("class", "unit");

    node.append("rect")
        .attr("x", function (d) { return d.x; })
        .attr("y", function (d) { return d.y; })
        .attr("id", function (d) { return d.id; })
        .attr("width", boxWidth)
        .attr("height", boxHeight)
        .attr("class", "node")
        .attr("rx", 6)
        .attr("ry", 6)
        .on("mouseover", function () {
            mouse_action(d3.select(this).datum(), true, "root");
            unvisite_links();
        })
        .on("mouseout", function () {
            mouse_action(d3.select(this).datum(), false, "root");
            unvisite_links();
        });

    node.append("text")
        .attr("class", "label")
        .attr("x", function (d) { return d.x + 14; })
        .attr("y", function (d) { return d.y + 15; })
        .text(function (d) { return d.name; });

    links.forEach(function (li) {
        svg.append("path", "g")
            .attr("class", "link")
            .attr("id", li.id)
            .attr("d", function () {
                var oTarget = {
                    x: li.target.y + 0.5 * boxHeight,
                    y: li.target.x
                };
                var oSource = {
                    x: li.source.y + 0.5 * boxHeight,
                    y: li.source.x
                };
                
                if (oSource.y < oTarget.y) {
                    oSource.y += boxWidth;
                } else {
                    oTarget.y += boxWidth;
                }
                return diagonal({
                    source: oSource,
                    target: oTarget
                });
            });
    });
}

svg = d3.select("#tree").append("svg")
    .attr("width", width)
    .attr("height", height)
    .append("g");

renderRelationshipGraph(data);

rect {
  fill: #CCC;
  cursor: pointer;
}
.active {
  fill: orange;
  stroke: orange;
}
.activelink {
  fill: none;
  stroke: orange;
  stroke-width: 2.5px;
}
.label {
  fill: white;
  font-family: sans-serif;
  pointer-events: none;
}
.link {
  fill: none;
  stroke: #ccc;
  stroke-width: 2.5px;
}

<script src="https://d3js.org/d3.v3.min.js"></script>
<div id="tree"></div>

I need a script that generates the nodes and links structure.
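A minimal Python sketch that produces this structure from the parent/child pairs, under two assumptions: the renderer above reads exactly the keys "Nodes" (capital N) and "links", and a node's "lvl" should be the length of its longest chain of ancestors, so that every child is drawn to the right of all of its parents. nodes_and_links is a hypothetical name. Levels are computed with Kahn's topological sort, which also detects cycles:

```python
import json
from collections import defaultdict, deque

# (parent, child) pairs from the question's sample rows
EDGES = [
    ("830010000C0419", "830010000C1205"),
    ("830010000C0947", "830010000C1205"),
    ("830010000C0948", "830010000C1205"),
    ("830010000B0854", "830010000B1196"),
    ("830010000B0854", "830010000B1197"),
    ("830010000B0721", "830010000B1343"),
    ("830010000B1343", "830010000B1344"),
    ("830010000B0721", "830010000B1345"),
    ("830010000B1345", "830010000B1344"),
]


def nodes_and_links(edges):
    """Build {"Nodes": [{"lvl", "name"}, ...], "links": [{"source", "target"}, ...]}.

    lvl = length of the longest chain of ancestors above a node.
    Raises ValueError if the parent/child relations contain a cycle.
    """
    edges = list(dict.fromkeys(edges))          # drop duplicate pairs, keep order
    names = list(dict.fromkeys(n for pair in edges for n in pair))
    children = defaultdict(list)
    indegree = defaultdict(int)                 # number of parents per node
    for p, c in edges:
        children[p].append(c)
        indegree[c] += 1

    # Kahn's algorithm: a node is processed once all its parents are done,
    # so its level is final when it leaves the queue
    lvl = {n: 0 for n in names}
    queue = deque(n for n in names if indegree[n] == 0)
    processed = 0
    while queue:
        n = queue.popleft()
        processed += 1
        for c in children[n]:
            lvl[c] = max(lvl[c], lvl[n] + 1)
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    if processed != len(names):
        raise ValueError("cycle in parent/child relations")

    # sorted() is stable: within a level, nodes keep their first-seen order
    nodes = sorted(({"lvl": lvl[n], "name": n} for n in names), key=lambda d: d["lvl"])
    links = [{"source": p, "target": c} for p, c in edges]
    return {"Nodes": nodes, "links": links}


print(json.dumps(nodes_and_links(EDGES), indent=4))
```

With the nine sample rows this yields the same levels, node order and link order as the hand-written data object in the demo, so the printed JSON could replace that literal.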

【Comments】:
