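【Posted】: 2016-06-19 23:39:19
【Question】:
I am working on a project involving a flight dataset. I have a data frame in the following format: it holds the flight number, carrier name, origin, destination, and the carrier delay, weather delay, NAS delay, security delay, and late-aircraft delay (all in minutes).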
FL_NUM  CARRIER  ORIGIN  DEST  carr_del  weather_del  nas_del  sec_del  aircraft_del
1       AA       JFK     LAX   0         0            0        0        0
1       AS       DCA     SEA   0         0            0        0        0
1       B6       JFK     FLL   12        0            12       0        0
1       HA       LAX     HNL   405       0            5        0        0
1       VX       SFO     DCA   24        20           50       0        0
1       WN       ATL     MDW   0         0            0        0        0
1       WN       DAL     HOU   27        0            0        0        0
I created the following relationships in Neo4j with a Cypher query:
MERGE (origin:origin_airport {name: row.ORIGIN})
MERGE (destination:dest_airport {name: row.DEST})
MERGE (carrier:Carrier {name: row.UNIQUE_CARRIER})
MERGE (flight:Flight {name: row.FL_NUM})
MERGE (flight)-[:from {flnum: row.FL_NUM}]->(origin)
MERGE (flight)-[:to {flnum: row.FL_NUM}]->(destination)
MERGE (flight)-[:operated_by {carrier: row.UNIQUE_CARRIER}]->(carrier)
MERGE (origin)-[r:delayed_by]->(destination)
SET r.carr_delay=row.carr_delay, r.weather_delay=row.weather_delay,
r.nas_delay=row.nas_delay, r.sec_delay=row.sec_delay,
r.aircraft_delay=row.aircraft_delay
MERGE (flight)-[r1:delayed_by]->(origin)
SET r1.carr_delay=row.carr_delay, r1.weather_delay=row.weather_delay,
r1.nas_delay=row.nas_delay, r1.sec_delay=row.sec_delay,
r1.aircraft_delay=row.aircraft_delay
")
The relationships are:
1) Flight number linked to origin airport (ORIGIN)
2) Flight number linked to destination airport (DEST)
3) Flight number linked to unique carrier
4) Origin airport linked by delay to destination airport.
   The delay properties hold the carrier, weather, NAS, security, and late-aircraft delay values.
5) Flight linked by delay to origin airport.
   Here again, the delay properties hold the carrier, weather, NAS, security, and late-aircraft delay values.
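The question I want to answer is: for the top 10 carriers, what is the leading type of delay?
I am using the following query to get the top 10 carriers by number of flights.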
MATCH (f:Flight)-[:operated_by]->(c:Carrier)
WITH c, COUNT(f) AS flights
RETURN c.name,flights
ORDER BY flights DESC
LIMIT 10
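I need to take this one step further and find the largest delay associated with each carrier. The delay values are stored in minutes; the query needs to work out which delay type has the highest value and return the name of that delay type for each carrier.
For example, in the sample data HA has its highest value in carr_del, so the output should look like this: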
Carrier  Cause of delay
HA       Carrier delay
VX       nas delay
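To make the expected result concrete, here is a rough sketch of the kind of query I have in mind. It is not a tested solution. It assumes the delay minutes are read from the `delayed_by` relationship between a flight and its origin airport, using the property names set in the load query above. It also assumes each `Flight` node belongs to exactly one carrier, which merging on `FL_NUM` alone does not guarantee. `toInteger` is used because values loaded with `LOAD CSV` arrive as strings.

```cypher
// Sketch only: total each delay type per carrier for the 10 busiest carriers.
// Assumes one carrier per Flight node and delay properties on delayed_by.
MATCH (c:Carrier)<-[:operated_by]-(f:Flight)-[d:delayed_by]->(:origin_airport)
WITH c,
     count(DISTINCT f)                 AS flights,
     sum(toInteger(d.carr_delay))      AS carrier_min,
     sum(toInteger(d.weather_delay))   AS weather_min,
     sum(toInteger(d.nas_delay))       AS nas_min,
     sum(toInteger(d.sec_delay))       AS security_min,
     sum(toInteger(d.aircraft_delay))  AS aircraft_min
ORDER BY flights DESC
LIMIT 10
// Turn the five totals into rows so they can be sorted per carrier.
UNWIND [
  {cause: 'Carrier delay',       minutes: carrier_min},
  {cause: 'Weather delay',       minutes: weather_min},
  {cause: 'NAS delay',           minutes: nas_min},
  {cause: 'Security delay',      minutes: security_min},
  {cause: 'Late aircraft delay', minutes: aircraft_min}
] AS delay
WITH c, flights, delay
ORDER BY delay.minutes DESC
// Keep only the largest total for each carrier.
WITH c, flights, collect(delay)[0] AS top
RETURN c.name AS carrier, flights, top.cause AS cause_of_delay, top.minutes AS minutes
ORDER BY flights DESC
```

If a carrier has no delays at all (AA in the sample), every total is 0 and the cause picked here would be arbitrary.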
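Is it possible to do this with a Cypher query in Neo4j, or do I need to change the relationship structure?
If the above is too complicated, is it at least possible to get the top carriers for one specific delay type, for example carrier delay? Every carrier has a carrier delay value, and the query should return the carriers ranked by the highest value. I know it starts something like the following, but I don't know how to finish it.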
MATCH (c)<-[:operated_by]-(:Flight)-[r1:DELAYED_BY]
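For the simpler case, I imagine the finished query would look roughly like the sketch below. It is untested and makes the same assumptions: the delay minutes are properties on the flight-to-origin `delayed_by` relationship, and each `Flight` node maps to a single carrier. Relationship types are case-sensitive in Cypher, so the sketch uses `delayed_by` as it was created rather than `DELAYED_BY`.

```cypher
// Sketch only: rank carriers by total carrier-delay minutes.
MATCH (c:Carrier)<-[:operated_by]-(:Flight)-[d:delayed_by]->(:origin_airport)
RETURN c.name AS carrier,
       sum(toInteger(d.carr_delay)) AS carrier_delay_min
ORDER BY carrier_delay_min DESC
LIMIT 10
```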
Can someone help me with this?
【Comments】: