[Question title]: How to do complex subqueries using Cypher queries in Neo4j
[Posted]: 2016-06-19 23:39:19
[Question description]:

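I am working on a project involving a flight dataset. I have a data frame in the following format: it contains the flight number, carrier name, origin, destination, and the carrier delay, weather delay, NAS delay, security delay, and late aircraft delay (all in minutes).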

FL_NUM  CARRIER ORIGIN  DEST    carr_del    weather_del   nas_del   sec_del   aircraft_del

   1     AA      JFK    LAX        0             0            0         0     0
   1     AS      DCA    SEA        0             0            0         0     0
   1     B6      JFK    FLL        12            0            12        0     0
   1     HA      LAX    HNL        405           0            5         0     0
   1     VX      SFO    DCA        24            20           50        0     0
   1     WN      ATL    MDW         0             0            0        0     0
   1     WN      DAL    HOU         27            0            0        0     0

I created the following relationships in Neo4j using a Cypher query:

   MERGE (origin:origin_airport {name: row.ORIGIN})
   MERGE (destination:dest_airport {name: row.DEST})
   MERGE (carrier:Carrier {name: row.UNIQUE_CARRIER})
   MERGE (flight:Flight {name: row.FL_NUM})
   MERGE (flight)-[:from {flnum: row.FL_NUM}]->(origin)
   MERGE (flight)-[:to {flnum: row.FL_NUM}]->(destination)
   MERGE (flight)-[:operated_by {carrier: row.UNIQUE_CARRIER}]->(carrier)
   MERGE (origin)-[r:delayed_by]->(destination)
   SET  r.carr_delay=row.carr_delay, r.weather_delay=row.weather_delay, 
   r.nas_delay=row.nas_delay, r.sec_delay=row.sec_delay,
   r.aircraft_delay=row.aircraft_delay
   MERGE (flight)-[r1:delayed_by]->(origin)
   SET  r1.carr_delay=row.carr_delay, r1.weather_delay=row.weather_delay, 
   r1.nas_delay=row.nas_delay, r1.sec_delay=row.sec_delay,
   r1.aircraft_delay=row.aircraft_delay
   ")

The relationships are:

1) Flight number linked to origin airport(ORIGIN)
2) Flight number linked to destination airport(DEST)
3) Flight number linked to Unique carrier
4) Origin airport linked by delay to destination airport.
Delay parameter holds the value of carrier delay, weather delay, nas delay,   
security and late aircraft delay
5) Flight linked by delay to origin airport
Here again, delay parameter holds the value of carrier delay, weather delay,  
nas delay, security and late aircraft delay

What I want to answer here is: for the top 10 carriers, which type of delay is the leading one?

I am using the following code to get the top 10 carriers by number of associated flights.

MATCH (f:Flight)-[:operated_by]->(c:Carrier)
WITH c, COUNT(f) AS flights
RETURN c.name,flights
ORDER BY flights DESC
LIMIT 10

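I need to take the next step and find the largest delay associated with each carrier. The delay values are given in minutes; my query needs to work out which delay value is highest and return the name of that delay for the given carrier.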

In the sample, for HA, carr_del has the highest value, so the output should look like this:

  Carrier        Cause of delay
   HA                Carrier delay
   VX                nas delay

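Is it possible to achieve this with a Cypher query in Neo4j, or do I need to change the relationship structure?

If the result above is too complex, is it possible to get the top carriers for one specific delay type, for example carrier delay? Every carrier has a value for carrier delay, and the query should return the carriers ranked by the highest value. I know it starts something like the following, but I don't know how to finish it.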

    MATCH (c)<-[:operated_by]-(:Flight)-[r1:DELAYED_BY]

Can someone help me?

[Comments]:

    Tags: r neo4j cypher


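    [Solution 1]:

    1) I think your model has errors: you store redundant data, and you lose the information about which carrier operated a particular flight. It should look like this: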

    MERGE (carrier:Carrier {name: row.UNIQUE_CARRIER})
    MERGE (flight:Flight {name: row.FL_NUM})
    MERGE (destination:Airport {name: row.DEST})
    MERGE (origin:Airport {name: row.ORIGIN})
    MERGE (origin)-[:from]->(flight)-[:to]->(destination)
    MERGE (flight)-[:flight_details]->
    // Stores the details of this flight as operated by a specific carrier
          (:FlightByCarrierDetails {
            name: 'Detail of ' + flight.name + ' by ' + carrier.name, 
            carr_del: row.carr_del, weather_del: row.weather_del, 
            nas_del: row.nas_del, sec_del: row.sec_del, aircraft_del: row.aircraft_del})
          -[:operated_by]->(carrier)
    

    2) Then your first query becomes:

    MATCH (f:Flight)
          -[:flight_details]->(:FlightByCarrierDetails)
          -[:operated_by]->(c:Carrier)
    RETURN c.name as `Carrier name`, COUNT(f) AS flights
    ORDER BY flights DESC LIMIT 10
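
To make the counting step concrete, here is a minimal Python sketch of what this query computes, run on the CARRIER column of the question's sample rows rather than against a Neo4j database. `SAMPLE_CARRIERS` and `top_carriers` are hypothetical names introduced only for this illustration, and the sketch assumes each sample row ends up as one matched path in the graph.

```python
from collections import Counter

# CARRIER column of the sample rows in the question (one entry per flight record).
SAMPLE_CARRIERS = ["AA", "AS", "B6", "HA", "VX", "WN", "WN"]


def top_carriers(carriers, limit=10):
    """Mirror of: RETURN c.name, COUNT(f) AS flights ORDER BY flights DESC LIMIT 10."""
    # Counter groups by carrier name, like the implicit grouping on c.name in Cypher;
    # most_common(limit) sorts by count descending and keeps the first `limit` entries.
    return Counter(carriers).most_common(limit)


if __name__ == "__main__":
    for name, flights in top_carriers(SAMPLE_CARRIERS):
        print(name, flights)
```

On the sample, WN comes first with two records and every other carrier has one. The order among tied carriers carries no meaning here.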
    

    3) The query to find the most frequent cause of delay is:

    MATCH (f:Flight)
          -[:flight_details]->(d:FlightByCarrierDetails)
          -[:operated_by]->(c:Carrier)
    WITH c,
         // total delay minutes per cause, summed over the carrier's flights
         {carr: SUM(d.carr_del), weather: SUM(d.weather_del), 
          nas: SUM(d.nas_del), sec: SUM(d.sec_del), 
          aircraft: SUM(d.aircraft_del)} as rD
    UNWIND [rD.carr, rD.weather, rD.nas, rD.sec, rD.aircraft] as delay
    WITH c, rD, max(delay) as mD
    RETURN c.name as `Carrier name`,  
           REDUCE ( acc=0, r in keys(rD) | acc + rD[r] ) as `Total delay`,
           FILTER(r in keys(rD) WHERE rD[r]>=mD) as `Cause of delay`
    ORDER BY `Total delay` DESC
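
The same steps can be traced outside the database. Below is a minimal Python sketch of the logic of this query (sum per carrier, take the maximum, then REDUCE and FILTER over the map), applied to the sample rows from the question. `ROWS`, `CAUSES`, and `delay_summary` are hypothetical names for this illustration only; it is not a replacement for the Cypher, just a way to see what each clause contributes.

```python
from collections import defaultdict
from functools import reduce

CAUSES = ("carr", "weather", "nas", "sec", "aircraft")

# (carrier, carr_del, weather_del, nas_del, sec_del, aircraft_del), from the question's sample.
ROWS = [
    ("AA", 0, 0, 0, 0, 0),
    ("AS", 0, 0, 0, 0, 0),
    ("B6", 12, 0, 12, 0, 0),
    ("HA", 405, 0, 5, 0, 0),
    ("VX", 24, 20, 50, 0, 0),
    ("WN", 0, 0, 0, 0, 0),
    ("WN", 27, 0, 0, 0, 0),
]


def delay_summary(rows):
    # WITH c, {carr: SUM(...), ...} AS rD  ->  one map of summed delays per carrier
    sums = defaultdict(lambda: dict.fromkeys(CAUSES, 0))
    for carrier, *delays in rows:
        for cause, minutes in zip(CAUSES, delays):
            sums[carrier][cause] += minutes

    result = []
    for carrier, rd in sums.items():
        # UNWIND [...] AS delay / WITH c, rD, max(delay) AS mD
        md = max(rd.values())
        # REDUCE(acc=0, r IN keys(rD) | acc + rD[r])  ->  total delay
        total = reduce(lambda acc, r: acc + rd[r], rd, 0)
        # FILTER(r IN keys(rD) WHERE rD[r] >= mD)  ->  every cause that reaches the maximum
        causes = [r for r in rd if rd[r] >= md]
        result.append((carrier, total, causes))

    # ORDER BY `Total delay` DESC
    return sorted(result, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for carrier, total, causes in delay_summary(ROWS):
        print(carrier, total, causes)
```

Two things show up on the sample data: a tie returns more than one cause (B6 has equal carrier and NAS delay), and a carrier with no delay at all lists every cause, because every value equals the maximum of 0. If that is not wanted, a condition such as requiring the maximum to be greater than 0 would be one way to exclude those carriers.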
    

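    [Comments]:

    • This is a great answer. It worked without any problems. Thank you very much!
    • I'm trying to learn from the code. Could you explain how the following two steps work? REDUCE ( acc=0, r in keys(rD) | acc + rD[r] ) as `Total delay`, FILTER(r in keys(rD) WHERE rD[r]>=mD) as `Cause of delay`
    • REDUCE computes the sum of all the delays; FILTER returns the delay causes whose value is greater than or equal to the maximum delay.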