【Question title】: Save DOM tree into a graph database: Connect related nodes
【Posted】: 2021-12-31 17:40:12
【Question】:

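I am inserting hierarchical data from a DOM tree into a graph database, but I cannot get the parent ID, which I need in order to create the relationship between a child and its parent.

The code below traverses the DOM nodes, inserts each tag, and fetches the ID of the node just inserted. I need the IDs of both the child and the parent to create the relationship between them.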

from lxml import html
import age  # from AgensGraph
from age.gen.ageParser import *

GRAPH_NAME = "demo_graph"
DSN = "host=localhost port=5432 dbname=demodb user=userdemo password=demo234"

ag = age.connect(graph=GRAPH_NAME, dsn=DSN)
tree = html.parse("demo.html")
for element in tree.getiterator():
    if parent := element.getparent():
        parent = None
        cursor = ag.execCypher("CREATE (t:node {name: %s}) RETURN t", params=(element.tag,))
        b = [x[0].id for x in cursor]  # get last inserted ID
        print(b[0])
        # Match child node 'c' and parent node 'p', then connect c to p (the id of p is unknown)
        ag.execCypher("MATCH (c:node), (p:node) WHERE c.id = %s AND p.id = %s CREATE (c)-[r:connects]->(p)")

Here is the demo file, demo.html:

<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8"/>
    <title>Document</title>
  </head>
  <body>
    <ul class="menu">
      <div class="itm">home</div>
      <div class="itm">About us</div>
      <div class="itm">Contact us</div>
    </ul>
    <div id="idone" class="classone">
      <li class="item1">First</li>
      <li class="item2">Second</li>
      <li class="item3">Third</li>
      <div id="innerone"><h1>This Title</h1></div>
      <div id="innertwo"><h2>Subheads</h2></div>      
    </div>
    <div id="second" class="below">
      <div class="inner">
        <h1>welcome</h1>
        <h1>another</h1>
        <h2>third</h2>
      </div>
    </div>
  </body>
</html>

Here is the extracted DOM tree:

tag: head attrib: None parent: html
tag: meta attrib: ('charset', 'UTF-8') parent: head
tag: title attrib: None parent: head
tag: body attrib: None parent: html
tag: h1 attrib: None parent: div
tag: h1 attrib: None parent: div
tag: h2 attrib: None parent: div
/tmp/ipykernel_27254/2858024143.py:4: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
  if parent := element.getparent():

【Question discussion】:

    Tags: python cypher lxml hierarchical-data agens-graph


    【Solution 1】:

    A CREATE statement takes effect only after the session is committed. You should call commit() after execCypher(...):

    cursor = ag.execCypher("CREATE (t:node {name: %s}) RETURN t", params=(element.tag,))  # params must be a tuple
    b = [x[0].id for x in cursor]
    ag.commit()
    

    Try the following code:

    ag = age.connect(graph=GRAPH_NAME, dsn=DSN)
    tree = html.parse("demo.html")
    for element in tree.iter():
        # Explicit None test avoids lxml's FutureWarning about element truthiness
        if (parent := element.getparent()) is not None:
            # params must be a tuple, hence the trailing comma
            cursor = ag.execCypher("CREATE (t:node {name: %s}) RETURN t", params=(element.tag,))
            b = [x[0].id for x in cursor]  # get last inserted ID
            ag.commit()  # commit so the CREATE takes effect
            print(b[0])
            # Match child node 'c' and parent node 'p', then connect c to p (the id of p is still unknown here)
            ag.execCypher("MATCH (c:node), (p:node) WHERE c.id = %s AND p.id = %s CREATE (c)-[r:connects]->(p)")
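
The loop above still leaves the parent id unresolved. One possible way to close that gap, offered as a sketch and not as the answerer's method, is to remember the id returned for each element as it is inserted. Because the traversal visits elements in document order, a parent is always inserted before its children, so its id is already known when the child is created. To keep the example runnable without a database or lxml, it uses the standard library's `xml.etree.ElementTree` and two hypothetical callables, `create_node` and `create_edge`, which stand in for the `ag.execCypher(...)` calls; the in-memory `nodes` and `edges` containers and the shortened `DEMO` markup are likewise assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import count


def insert_tree(root, create_node, create_edge):
    """Insert every element under root and link each child to its parent."""
    ids = {}  # element -> id returned by create_node
    # ElementTree has no getparent(), so build a child -> parent map first.
    # With lxml, element.getparent() could be used instead.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for element in root.iter():  # document order: parents before children
        ids[element] = create_node(element.tag)
        parent = parent_of.get(element)
        if parent is not None:  # the root element has no parent
            create_edge(ids[element], ids[parent])
    return ids


# In-memory stand-ins for the graph database (hypothetical, for illustration).
nodes = {}   # node id -> tag name
edges = []   # (child id, parent id) pairs
_next_id = count(1)


def create_node(tag):
    # With AGE this could run the CREATE ... RETURN t statement
    # and return the id of the created vertex.
    node_id = next(_next_id)
    nodes[node_id] = tag
    return node_id


def create_edge(child_id, parent_id):
    # With AGE this could run a MATCH ... CREATE statement that receives
    # both ids as parameters.
    edges.append((child_id, parent_id))


# A shortened, well-formed version of demo.html.
DEMO = (
    "<html><head><title>Document</title></head>"
    "<body><div><h1>welcome</h1></div></body></html>"
)

root = ET.fromstring(DEMO)
insert_tree(root, create_node, create_edge)
named_edges = [(nodes[c], nodes[p]) for c, p in edges]
print(named_edges)
```

When wiring this to AGE, `create_edge` would presumably pass both ids through `params=(child_id, parent_id)`. It likely also needs to compare the internal vertex id with `id(c) = %s AND id(p) = %s`, since `c.id` in the original query refers to a property named `id`. Both points are assumptions about the driver and should be checked against its documentation.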
    

    【Discussion】:
