In Neo4j, can one find all nodes whose relationships are a superset of another node's relationships? (Answers)

【Question title】: In Neo4j, can one find all nodes whose relationships are a superset of another node's relationships?
【Posted】: 2017-03-01 06:11:50
【Question description】:

Given the following contrived database:

CREATE (a:Content {id:'A'}),
  (b:Content {id:'B'}),
  (c:Content {id:'C'}),
  (d:Content {id:'D'}),
  (ab:Container {id:'AB'}),
  (ab2:Container {id:'AB2'}),
  (abc:Container {id:'ABC'}),
  (abcd:Container {id:'ABCD'}),
  ((ab)-[:CONTAINS]->(a)),
  ((ab)-[:CONTAINS]->(b)),
  ((ab2)-[:CONTAINS]->(a)),
  ((ab2)-[:CONTAINS]->(b)),
  ((abc)-[:CONTAINS]->(a)),
  ((abc)-[:CONTAINS]->(b)),
  ((abc)-[:CONTAINS]->(c)),
  ((abcd)-[:CONTAINS]->(a)),
  ((abcd)-[:CONTAINS]->(b)),
  ((abcd)-[:CONTAINS]->(c)),
  ((abcd)-[:CONTAINS]->(d))

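Is there a query that can detect all pairs of Container nodes where one CONTAINS either a superset of, or the same Content nodes as, another Container node?

For my example database, I would want the query to return: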

(ABCD) is a superset of (ABC), (AB), and (AB2)
(ABC) is a superset of (AB), and (AB2)
(AB) and (AB2) contain the same nodes

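If Cypher is not suited to this but another query language is, or if Neo4j is not suited to this but another database is, I would also appreciate your input on that.

Answer query performance (as of 2017-02-28T21:56Z)

I don't have enough experience with Neo4j or graph database queries to analyze the performance of the answers, and I haven't yet built my large dataset for a more meaningful comparison, but I thought I would run each one with the PROFILE command and list the db hit costs. I've omitted timing data because I couldn't get consistent or meaningful timings with such a small dataset.

stdob--: 129 total db hits
Dave Bennett: 46 total db hits
InverseFalcon: 27 total db hits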
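
For reference, a minimal sketch of how db hit figures like these can be obtained. PROFILE executes the query it prefixes and reports the db hits of each operator in the execution plan. The query below is only an illustration against the sample database, not one of the answers, and the aliases `container` and `contentCount` are names chosen for this example.

```cypher
// PROFILE runs the query and shows the execution plan,
// including the db hits incurred by each operator.
PROFILE
MATCH (c:Container)-[:CONTAINS]->(ct:Content)
RETURN c.id AS container, count(ct) AS contentCount
ORDER BY container
```

Db hit totals depend on the data and on the Neo4j version, so figures from a dataset this small are only a rough guide.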

【Question comments】:

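Both Dave Bennett's and stdob--'s answers appear to give me the results I asked for, thank you. I've upvoted both, and I'll accept one once I've tried them on a larger dataset, since I have to choose one.
Roughly how many Container nodes are in the large dataset?
I haven't assembled it yet (that will take some doing, and now that I know I have workable tools for the later computation, it's next on my agenda). However, 70,000 containers seems like a realistic estimate. Each container holds anywhere from a few to a few hundred contents, but the average is probably around 30.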

Tags: neo4j cypher

【Solution 1】:

// Get contents for each container
MATCH (SS:Container)-[:CONTAINS]->(CT:Content)
      WITH SS, 
           collect(distinct CT) as CTS
// Get every other container (not equal to SS)
MATCH (T:Container) 
      WHERE T <> SS
// For each container get their content
MATCH (T)-[:CONTAINS]->(CT:Content)
      // Test whether all of T's contents are also in SS's contents
      WITH SS, 
      CTS, 
      T, 
      ALL(ct in collect(distinct CT) WHERE ct in CTS) as test 
      WHERE test = true
RETURN SS, collect(T)

【Comments】:

【Solution 2】:

This is a first attempt. I'm sure it could use some refinement, but it should get you going.

// find the containers and their contents
match (n:Container)-[:CONTAINS]->(c:Content)

// group the contents per container
with n as container, collect(c.id) as contents

// combine the containers and their contents
with collect(container{.id, contents: contents}) as containers

// loop through the list of containers
with containers, size(containers) as container_size
unwind range(0, container_size -1) as i
unwind range(0, container_size -1) as j

// for each container pair compare the contents
with containers, i, j
where i <> j
and all(content IN containers[j].contents WHERE content in containers[i].contents)
with containers[i].id as superset, containers[j].id as subset
return superset, collect(subset) as subsets

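【Comments】:

【Solution 3】:

After getting the containers and their collected contents, the approach I'd use is to filter which containers get compared with each other by content count, then run apoc.coll.containsAll() from APOC Procedures to keep only the superset/equal-set pairs. Finally, you can compare the content counts to determine whether a pair is a superset or an equal set, then collect.

Something like this: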

match (con:Container)-[:CONTAINS]->(content)
with con, collect(content) as contents
with collect({con:con, contents:contents, size:size(contents)}) as all
unwind all as first
unwind all as second
with first, second
where first <> second and first.size >= second.size
with first, second
where apoc.coll.containsAll(first.contents, second.contents)
with first, 
 case when first.size = second.size and id(first.con) < id(second.con) then second end as same, 
 case when first.size > second.size then second end as superset
with first.con as container, collect(same.con) as sameAs, collect(superset.con) as supersetOf
where size(sameAs) > 0 or size(supersetOf) > 0
return container, sameAs, supersetOf
order by size(supersetOf) desc, size(sameAs) desc
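
As a small illustration of the check the query above relies on: apoc.coll.containsAll(coll, values) returns true when every element of the second list is present in the first. The sketch below assumes the APOC plugin is installed; the literal lists and the aliases `supersetOrSame` and `notSuperset` are made up for the example.

```cypher
// Requires the APOC plugin.
// The first call checks that ['A','B','C'] contains every element of ['A','B'];
// the second call checks the reverse direction, which fails because 'C' is missing.
RETURN apoc.coll.containsAll(['A','B','C'], ['A','B']) AS supersetOrSame,
       apoc.coll.containsAll(['A','B'], ['A','B','C']) AS notSuperset
```

Because the query only compares pairs where first.size >= second.size, a true result combined with equal sizes means the two containers hold the same contents, and a true result with a larger first.size means a strict superset.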

【Comments】: