【Question title】: Neo4j SDN 4 GraphId performance vs Index
【Posted】: 2017-05-26 19:55:47
【Question】:

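In my Neo4j/SDN 4 application, all of my Cypher queries are based on the internal Neo4j IDs.

This is a problem because I can't rely on these IDs in my web application URLs. Neo4j can reuse them, so at some point in the future a different node may well turn up under the same ID.

I tried to reimplement this logic following this solution: Using the graph to control unique id generation, but I noticed that query performance degraded.

In theory, should a Cypher query based on a property annotated with @Index(unique = true, primary = true)

for example: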

@Index(unique = true, primary = true)
private Long uid;

entity.uid = {someId}

have the same performance as a Cypher query based on the internal Neo4j ID:

id(entity) = {someId} 

UPDATE

This is the :schema output:

Indexes
   ON :BaseEntity(uid) ONLINE
   ON :Characteristic(lowerName) ONLINE
   ON :CharacteristicGroup(lowerName) ONLINE
   ON :Criterion(lowerName) ONLINE
   ON :CriterionGroup(lowerName) ONLINE
   ON :Decision(lowerName) ONLINE
   ON :FlagType(name) ONLINE (for uniqueness constraint)
   ON :HAS_VALUE_ON(value) ONLINE
   ON :HistoryValue(originalValue) ONLINE
   ON :Permission(code) ONLINE (for uniqueness constraint)
   ON :Role(name) ONLINE (for uniqueness constraint)
   ON :User(email) ONLINE (for uniqueness constraint)
   ON :User(username) ONLINE (for uniqueness constraint)
   ON :Value(value) ONLINE

Constraints
   ON ( flagtype:FlagType ) ASSERT flagtype.name IS UNIQUE
   ON ( permission:Permission ) ASSERT permission.code IS UNIQUE
   ON ( role:Role ) ASSERT role.name IS UNIQUE
   ON ( user:User ) ASSERT user.email IS UNIQUE
   ON ( user:User ) ASSERT user.username IS UNIQUE

As you can see, I have an index on :BaseEntity(uid).

BaseEntity is the base class of my entity hierarchy, for example:

@NodeEntity
public abstract class BaseEntity {

    @GraphId
    private Long id;

    @Index(unique = false)
    private Long uid;

    private Date createDate;

    private Date updateDate;

...

}

@NodeEntity
public class Commentable extends BaseEntity {
...
}

@NodeEntity
public class Decision extends Commentable {

    private String name;

}

When I look up, for example, (d:Decision) WHERE d.uid = {uid}, will this uid index be used?

PROFILE results: internal ID vs. indexed property

Query based on internal IDs

PROFILE MATCH (parentD)-[:CONTAINS]->(childD:Decision) 
WHERE id(parentD) = 1474333 
MATCH (childD)-[relationshipValueRel1475199:HAS_VALUE_ON]-(filterCharacteristic1475199) 
WHERE id(filterCharacteristic1475199) = 1475199 
WITH relationshipValueRel1475199, childD 
WHERE  ([1, 19][0] <= relationshipValueRel1475199.value <=  [1, 19][1] )  
WITH childD  
MATCH (childD)-[relationshipValueRel1474358:HAS_VALUE_ON]-(filterCharacteristic1474358) 
WHERE id(filterCharacteristic1474358) = 1474358 
WITH relationshipValueRel1474358, childD 
WHERE  (ANY (id IN ['Compact'] WHERE id IN relationshipValueRel1474358.value ))  
WITH childD  
MATCH (childD)-[relationshipValueRel1475193:HAS_VALUE_ON]-(filterCharacteristic1475193) 
WHERE id(filterCharacteristic1475193) = 1475193 
WITH relationshipValueRel1475193, childD 
WHERE  (ANY (id IN ['16:9', '3:2', '4:3', '1:1'] 
WHERE id IN relationshipValueRel1475193.value ))  
WITH childD  
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c) 
WHERE id(c) IN [1474342, 1474343, 1474340, 1474339, 1474336, 1474352, 1474353, 1474350, 1474351, 1474348, 1474346, 1474344] 
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes 
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User)  
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes  
ORDER BY  weight DESC 
SKIP 0 LIMIT 10 
RETURN ru, u, childD AS decision, weight, totalVotes, 
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) | {entityId: id(entity),  types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups, 
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD) | {criterionId: id(c1),  weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria, 
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD)  WHERE NOT ((ch1)<-[:DEPENDS_ON]-())  | {characteristicId: id(ch1),  value: v1.value, totalHistoryValues: toInt(v1.totalHistoryValues), description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics

Profile output:

Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 350554 total db hits in 238 ms.

Query based on the indexed property uid

PROFILE MATCH (parentD)-[:CONTAINS]->(childD:Decision) 
WHERE parentD.uid = 61 
MATCH (childD)-[relationshipValueRel1475199:HAS_VALUE_ON]-(filterCharacteristic1475199) 
WHERE filterCharacteristic1475199.uid = 15 
WITH relationshipValueRel1475199, childD 
WHERE  ([1, 19][0] <= relationshipValueRel1475199.value <=  [1, 19][1] )  
WITH childD  
MATCH (childD)-[relationshipValueRel1474358:HAS_VALUE_ON]-(filterCharacteristic1474358) 
WHERE filterCharacteristic1474358.uid = 10 
WITH relationshipValueRel1474358, childD 
WHERE  (ANY (id IN ['Compact'] WHERE id IN relationshipValueRel1474358.value ))  
WITH childD  
MATCH (childD)-[relationshipValueRel1475193:HAS_VALUE_ON]-(filterCharacteristic1475193) 
WHERE filterCharacteristic1475193.uid = 14 
WITH relationshipValueRel1475193, childD 
WHERE  (ANY (id IN ['16:9', '3:2', '4:3', '1:1'] 
WHERE id IN relationshipValueRel1475193.value ))  
WITH childD  
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c) 
WHERE c.uid IN [26, 27, 24, 23, 20, 36, 37, 34, 35, 32, 30, 28] 
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes 
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User)  
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes  
ORDER BY  weight DESC 
SKIP 0 LIMIT 10 
RETURN ru, u, childD AS decision, weight, totalVotes, 
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) | {entityId: id(entity),  types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups, 
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD) | {criterionId: id(c1),  weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria, 
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD)  WHERE NOT ((ch1)<-[:DEPENDS_ON]-())  | {characteristicId: id(ch1),  value: v1.value, totalHistoryValues: toInt(v1.totalHistoryValues), description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics

Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 671326 total db hits in 426 ms.

Is there any way to improve the performance of the uid-based query?

【Question comments】:

    Tags: neo4j cypher spring-data-neo4j-4


    【Answer 1】:

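    You are right not to use Neo4j internal ids in web URLs, because they can be reused after nodes are deleted, etc.

    From a performance point of view, the internal id is as fast as it gets: it is effectively an offset into the file holding the node/relationship records. (You may have noticed that nodes and relationships use 2 separate id sequences, so you can have a node with id=x and a relationship with the same id=x.)

    Any use of an index has to be slower, because the database first does an index lookup, obtains the internal id, and only then reads the node record.

    For the vast majority of applications, however, the performance difference is negligible, probably much smaller than network latency or general OGM overhead.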
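To make the two access paths concrete, here is a deliberately simplified toy model in Java. It is not how Neo4j's storage engine is implemented; the class `LookupSketch` and its methods are hypothetical names invented for illustration. It only shows that a lookup by uid is the direct read plus one extra index step.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: NOT Neo4j's storage engine. It just contrasts the two access paths.
final class LookupSketch {
    // Node "store": the internal id is the array position, like a record offset.
    private final String[] nodeRecords;
    // Property "index": maps an application-level uid to an internal id.
    private final Map<Long, Integer> uidIndex = new HashMap<>();

    LookupSketch(int capacity) {
        nodeRecords = new String[capacity];
    }

    void put(int internalId, long uid, String payload) {
        nodeRecords[internalId] = payload;
        uidIndex.put(uid, internalId);
    }

    // One step: jump straight to the record.
    String byInternalId(int internalId) {
        return nodeRecords[internalId];
    }

    // Two steps: index lookup first, then the same record read as above.
    String byUid(long uid) {
        Integer internalId = uidIndex.get(uid);
        return internalId == null ? null : nodeRecords[internalId];
    }
}
```

Both methods end with the same record read; the uid path merely pays for the map lookup first, which is why the gap is usually small when the index is actually used.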

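    If you see a significant difference:

    • Verify that the index exists in the database (e.g. with :schema in the Neo4j browser).
    • Turn on logging and verify that your queries contain the correct labels (set the info level for org.neo4j.ogm).
    • If the index exists and the queries contain the correct labels, check the query plan with PROFILE.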
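The first check can also be done mechanically. The sketch below parses text in the shape of the :schema output shown in the question and reports whether an ONLINE index exists for a given label and property. The class name `SchemaIndexCheck` and the regular expression are assumptions based only on that sample output, not on any documented format. As far as I know, a schema index belongs to the label it was created on, so it is worth checking the exact label your query uses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the line format is inferred from the :schema sample in the question.
final class SchemaIndexCheck {
    // Matches index lines such as "   ON :BaseEntity(uid) ONLINE".
    // Constraint lines ("ON ( user:User ) ASSERT ...") do not match.
    private static final Pattern INDEX_LINE =
            Pattern.compile("^\\s*ON\\s+:(\\w+)\\((\\w+)\\)\\s+ONLINE.*$");

    static boolean hasOnlineIndex(String schemaOutput, String label, String property) {
        for (String line : schemaOutput.split("\\R")) {
            Matcher m = INDEX_LINE.matcher(line);
            if (m.matches() && m.group(1).equals(label) && m.group(2).equals(property)) {
                return true;
            }
        }
        return false;
    }
}
```

Run against the schema in the question, this would find an index for BaseEntity/uid but none for Decision/uid, which is a hint to look closely at the labels in the PROFILE output.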

    UPDATE

    Yes, the index will be used for a query like:

    MATCH (d:Decision) WHERE d.uid = {uid} ...
    

    which should be generated by

    session.load(Decision.class, uid)
    

    if your index is primary, or by findByUid on the DecisionRepository.

    Note that the index might not be used when the WHERE clause appears in the middle of a query:

    ...
    WITH x
    MATCH (x)-[...]-(d) WHERE d.uid = {uid} ...
    

    That depends on the query plan, which you should investigate with PROFILE.
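One thing to try while profiling: in the uid-based query from the question, parentD and the filterCharacteristic nodes carry no label, so the planner has no label to pair with the :BaseEntity(uid) index. The sketch below builds a lookup clause that names the indexed label explicitly, optionally with a USING INDEX planner hint. It assumes every entity node really carries the BaseEntity label (verify this in your data), and `UidQuerySketch` with its methods is a hypothetical helper, not part of SDN or the OGM.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical helper for building a label-anchored uid lookup clause.
final class UidQuerySketch {
    // Names the indexed label in the pattern so the planner can consider the index.
    static String anchoredMatch(String variable, String indexedLabel, boolean withHint) {
        StringBuilder cypher = new StringBuilder();
        cypher.append("MATCH (").append(variable).append(':').append(indexedLabel).append(')');
        if (withHint) {
            // Optional planner hint; assumes an index on indexedLabel(uid) exists.
            cypher.append(" USING INDEX ").append(variable)
                  .append(':').append(indexedLabel).append("(uid)");
        }
        cypher.append(" WHERE ").append(variable).append(".uid = {uid}");
        return cypher.toString();
    }

    // Parameter map to pass along with the query text.
    static Map<String, Object> params(long uid) {
        return Collections.<String, Object>singletonMap("uid", uid);
    }
}
```

For example, `UidQuerySketch.anchoredMatch("parentD", "BaseEntity", false)` yields `MATCH (parentD:BaseEntity) WHERE parentD.uid = {uid}`. Whether this actually reduces db hits for the full query still has to be confirmed with PROFILE.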

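    【Comments】:

    • Thanks for your answer. I'm now trying to find an approach for refactoring my system to avoid the ID reuse problem, and I see the following architecture: in my web URLs I will use a surrogate uid. Where an id does not need to appear in a web URL, I will use the internal Neo4j id. So the surrogate uuid will be used only for web URLs, and everywhere else on the client side I will use the internal Neo4j ID. Does that make sense?
    • Having two ways of accessing an entity by id may complicate things unnecessarily. I would use only the custom uuid. As I said, indexes are fast, and the difference between an internal id lookup and an index lookup will be an order of magnitude smaller than network latency or general OGM overhead.
    • I use different ids very heavily within a single query (stackoverflow.com/questions/43824894/…), so the performance degradation of the pure uid-based approach is noticeable to the naked eye.
    • I have updated my question with the output of the :schema command and my SDN entity hierarchy. Could you take a look?
    • I have added PROFILE information for internal id vs. indexed uid. The difference is 238 vs. 426 ms. Is there any way to improve the performance of the uid-based query?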