【Question Title】: Loading data to neo4j from XML using py2neo
【Posted】: 2016-05-04 08:38:14
【Question】:

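I'm trying to load data from an XML file into a neo4j database using py2neo.

This Python script works fine, but it is too slow, because I add the nodes first and then the relationships, with two exception handlers along the way. On top of that, the XML file is around 200MB.

Is there a faster way to perform this task?

XML file: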

<Persons>
    <person>
        <id>XA123</id>
        <first_name>Adam</first_name>
        <last_name>John</last_name>
        <phone>01-12322222</phone>
    </person>
    <person>
        <id>XA7777</id>
        <first_name>Anna</first_name>
        <last_name>Watson</last_name>
        <relationship>
            <type>Friends</type>
            <to>XA123</to>
        </relationship>
    </person>
</Persons>

Python script:

#!/usr/bin/python3

from xml.dom import minidom
from py2neo import Graph, Node, Relationship, authenticate


# Register credentials before the first request to the server
authenticate("localhost:7474", "neo4j", "admin")
graph = Graph("http://localhost:7474/db/data/")

xml_file = open("data.xml")
xml_doc = minidom.parse(xml_file)
persons = xml_doc.getElementsByTagName('person')

# Adding Nodes
for person in persons:
    ID_ = person.getElementsByTagName('id')[0].firstChild.data
    fName = person.getElementsByTagName('first_name')[0].firstChild.data
    lName = person.getElementsByTagName('last_name')[0].firstChild.data

    # not every person has phone number
    try:
        phone = person.getElementsByTagName('phone')[0].firstChild.data
    except IndexError:
        phone = "None"

    label = "Person"
    # fName holds <first_name> and lName holds <last_name>
    node = Node(label, ID=ID_, FirstName=fName, LastName=lName, Phone=phone)
    graph.create(node)


# Adding Relationships
for person in persons:
    ID_ = person.getElementsByTagName('id')[0].firstChild.data

    label = "Person"
    node1 = graph.find_one(label, property_key="ID", property_value=ID_)

    # relationships
    try:
        has_relations = person.getElementsByTagName('relationship')
        for relation in has_relations:
            node2 = graph.find_one(label,
                                   property_key="ID",
                                   property_value=relation.getElementsByTagName('to')[0].firstChild.data)

            relationship = Relationship(node1,
                                        relation.getElementsByTagName('type')[0].firstChild.data, node2)
            graph.create(relationship)
    except IndexError:
        continue
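
With a file of around 200MB, part of the cost may come from `minidom` building the whole document in memory and from parsing each `<person>` twice. The following is only a sketch, not part of the original post: it assumes the element names from the sample XML above and uses the standard library's `xml.etree.ElementTree.iterparse` to read each person once. The function name `iter_persons` and the dictionary layout are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_persons(source):
    """Yield one dict per <person> element while the file is being read."""
    # Assumes <person> elements are never nested inside one another.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "person":
            continue
        yield {
            "ID": elem.findtext("id"),
            "FirstName": elem.findtext("first_name"),
            "LastName": elem.findtext("last_name"),
            # findtext returns None when <phone> is missing
            "Phone": elem.findtext("phone"),
            "relations": [
                (rel.findtext("type"), rel.findtext("to"))
                for rel in elem.findall("relationship")
            ],
        }
        # Free this person's children; the emptied element itself
        # stays attached to the root.
        elem.clear()


sample = b"""<Persons>
    <person>
        <id>XA123</id>
        <first_name>Adam</first_name>
        <last_name>John</last_name>
        <phone>01-12322222</phone>
    </person>
    <person>
        <id>XA7777</id>
        <first_name>Anna</first_name>
        <last_name>Watson</last_name>
        <relationship>
            <type>Friends</type>
            <to>XA123</to>
        </relationship>
    </person>
</Persons>"""

people = list(iter_persons(io.BytesIO(sample)))
```

Because a missing `<phone>` comes back as `None`, the first `try`/`except IndexError` is no longer needed, and the relationship list can be collected in the same pass and written after all nodes exist.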

【Comments】:

    Tags: python xml neo4j py2neo


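    【Answer 1】:

    Adding a uniqueness constraint on the property for that label significantly reduced the time needed to load the data into neo4j. A uniqueness constraint is backed by an index, so each lookup of a Person by ID no longer has to scan every node with that label.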

    graph.cypher.execute("CREATE CONSTRAINT ON (n:Person) ASSERT n.ID IS UNIQUE")
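
Once the constraint exists, a further reduction in round trips may be possible by sending nodes in batches instead of one `graph.create` call each. This sketch goes beyond the answer above and rests on several assumptions: the server supports `UNWIND`, it accepts the older `{rows}` parameter syntax (newer servers expect `$rows`), and `run` is a hypothetical callable taking `(statement, parameters)`. The names `chunks`, `load_nodes` and `NODE_STATEMENT` are illustrative only.

```python
def chunks(rows, size):
    """Split a list into consecutive slices of at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Older parameter syntax ({rows}); newer Neo4j servers expect $rows instead.
NODE_STATEMENT = (
    "UNWIND {rows} AS row "
    "MERGE (p:Person {ID: row.ID}) "
    "SET p.FirstName = row.FirstName, "
    "p.LastName = row.LastName, "
    "p.Phone = row.Phone"
)


def load_nodes(run, rows, batch_size=1000):
    """Send rows in batches through `run` and return the number of batches."""
    sent = 0
    for batch in chunks(rows, batch_size):
        run(NODE_STATEMENT, {"rows": batch})
        sent += 1
    return sent


# Stand-in for a real database call: record what would have been sent.
calls = []
rows = [
    {"ID": "XA%d" % i, "FirstName": "F", "LastName": "L", "Phone": None}
    for i in range(2500)
]
batches = load_nodes(lambda stmt, params: calls.append((stmt, params)), rows)
```

With py2neo 2.x, `run` could presumably be `graph.cypher.execute`, assuming it accepts a parameter dict as its second argument. Relationship types cannot be passed as Cypher parameters, so relationships would need one statement per type.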
    

    【Discussion】:
