【Question title】: How to join in Spark GraphX given multiple vertex types
【Posted】: 2017-03-23 18:22:21
【Question】:

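I am fairly new to Spark GraphX. Basically, my graph has:

  1. 2 vertex types: person and car
  2. edges that describe who owns which car

For every person vertex in the graph, I want to traverse the edges and collect the list of cars that person owns.

For example: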

person1 -> [car1, car2]
person2 -> [car3]

【Comments】:

    Tags: scala apache-spark spark-graphx


    【Solution 1】:

    You can achieve this with a bit of Spark SQL, using the DataFrame API.

    Suppose you have the following graph:

    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD
    
    // Create an RDD for the vertices: (vertex id, name) for both cars and persons
    val v: RDD[(VertexId, String)] =
      sc.parallelize(Array((1L, "car1"), (2L, "car2"),
                           (3L, "car3"), (4L, "person1"), (5L, "person2")))
    // Create an RDD for the edges: person -> car (the Int attribute is not used)
    val e: RDD[Edge[Int]] =
      sc.parallelize(Array(Edge(4L, 1L, 1), Edge(4L, 2L, 1),
                           Edge(5L, 1L, 1)))
    
    val graph = Graph(v, e)
    

    Now extract the edges and vertices into DataFrames:

    // toDF needs spark.implicits._ in scope (spark-shell imports it automatically)
    val vDf = graph.vertices.toDF("vId", "vName")
    val eDf = graph.edges.toDF("person", "car", "attr")
    

    Then transform the data into the desired output:

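    import org.apache.spark.sql.functions.collect_set
    
    eDf.drop("attr")
      // Replace the person id with the person's name
      .join(vDf, 'person === 'vId).drop("vId", "person").withColumnRenamed("vName", "person")
      // Look up each car's name, then collect the car names per person
      .join(vDf, 'car === 'vId).drop("car", "vId")
      .groupBy("person")
      .agg(collect_set('vName)).toDF("person", "car")
      .show()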
    
    
    +-------+------------+
    | person|         car|
    +-------+------------+
    |person2|      [car1]|
    |person1|[car2, car1]|
    +-------+------------+
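    
    For comparison, the same result can probably be obtained without leaving GraphX. The following is only a sketch, assuming the same graph as above (edges point from person to car, and the vertex attribute is the name). The object name `CarsPerPerson` and the helper `carsPerPerson` are made up for illustration; `aggregateMessages` is the GraphX operator that lets each edge send a message to one of its endpoints.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object CarsPerPerson {
  // Hypothetical helper (not part of GraphX): for every vertex that has
  // outgoing edges, collect the names of the vertices those edges point to.
  // Assumes edges are directed person -> car.
  def carsPerPerson(graph: Graph[String, Int]): RDD[(String, Set[String])] = {
    // Each edge sends its destination's name (the car) to its source (the person);
    // messages that arrive at the same person are merged by set union.
    val carsByPersonId = graph.aggregateMessages[Set[String]](
      ctx => ctx.sendToSrc(Set(ctx.dstAttr)),
      (a, b) => a ++ b
    )
    // Join back to the vertices to replace the person's id with the person's name.
    graph.vertices.join(carsByPersonId).map { case (_, (name, cars)) => (name, cars) }
  }

  def main(args: Array[String]): Unit = {
    // Local context for the sketch; in spark-shell, reuse the existing `sc` instead.
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("cars-per-person"))
    try {
      val v: RDD[(VertexId, String)] = sc.parallelize(Seq(
        (1L, "car1"), (2L, "car2"), (3L, "car3"), (4L, "person1"), (5L, "person2")))
      val e: RDD[Edge[Int]] = sc.parallelize(Seq(
        Edge(4L, 1L, 1), Edge(4L, 2L, 1), Edge(5L, 1L, 1)))

      carsPerPerson(Graph(v, e)).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

    With the edges above, person1 maps to car1 and car2 and person2 maps to car1, which agrees with the table; the order of the printed rows is not guaranteed. Vertices that receive no message (here, the cars) do not appear in the result, so no explicit filter on vertex type is needed as long as only persons have outgoing edges.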
    

    【Comments】:
