【Question Title】: Spark Scala: a for loop nested inside another for loop
【Posted】: 2020-04-23 14:16:03
【Question】:

I have a collection named count that holds the following data:

   (Focus,37)
   (Test,26)

My code is as follows.

         for (i <- count ) {
             for(x <- i {
                 if(x == "Focus"){
                     Focus_cnt=i(x) }
                 else if(x == "Test"){
                     Test_cnt=i(x) }
                 else {
                     pass
                 }
             }
         }

The error I am facing is at: for(x

Is there a better way to get these counts in Spark Scala?

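【Comments】:

  • What does this have to do with Spark? Also, can you fix your code?
  • If your count is a collection of Tuples, as shown, why do you expect the for (x <- i) loop to work?
  • Can you post your complete code here?
  • I need suggestions on the code.
  • @Srinivas, the code is already there.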
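
The point raised in the comments can be illustrated with a small sketch. A tuple is not a collection, so it cannot be used as the generator of an inner for loop; its fields can be destructured directly instead. The example below assumes that count is a local Seq[(String, Int)]. The object name TupleCounts and the helpers viaLoop and viaMap are hypothetical names chosen for illustration only.

```scala
object TupleCounts {
  // Assumption: `count` is a local collection of (status, count) tuples.
  val count: Seq[(String, Int)] = Seq(("Focus", 37), ("Test", 26))

  // Option 1: destructure each tuple in the for generator,
  // so no inner loop over the tuple is needed.
  def viaLoop(pairs: Seq[(String, Int)]): (Int, Int) = {
    var focusCnt = 0
    var testCnt = 0
    for ((status, n) <- pairs) {
      if (status == "Focus") focusCnt = n
      else if (status == "Test") testCnt = n
      // any other status is ignored
    }
    (focusCnt, testCnt)
  }

  // Option 2: turn the pairs into a Map and look the keys up,
  // falling back to 0 when a status is absent.
  def viaMap(pairs: Seq[(String, Int)]): (Int, Int) = {
    val byStatus = pairs.toMap
    (byStatus.getOrElse("Focus", 0), byStatus.getOrElse("Test", 0))
  }

  def main(args: Array[String]): Unit = {
    println(viaLoop(count))
    println(viaMap(count))
  }
}
```

The Map version avoids mutable variables and is probably the simpler choice when only a few known keys are needed.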

Tags: scala apache-spark


【Solution 1】:

Could you check this? Let me know if I have misunderstood. Here I apply a filter to the grouped result and read out the count.

scala> Seq(("fa","fb","fc",5,"fe","ff","Focus"),("fba","fbb","fbc",16,"bd","be","Focus"),("fba","fbb","fbc",54,"bd","be","Focus"),("fca","fcb","fcc",135,"fcd","fef","Focus"),("a","b","c",5,"e","f","Test"),("aa","ba","ca",56,"ea","fa","Test"),("ab","cb","cc",35,"de","df","Test")).toDF("a","b","c","d","e","f","status")
res29: org.apache.spark.sql.DataFrame = [a: string, b: string ... 5 more fields]

scala> val df = Seq(("fa","fb","fc",5,"fe","ff","Focus"),("fba","fbb","fbc",16,"bd","be","Focus"),("fba","fbb","fbc",54,"bd","be","Focus"),("fca","fcb","fcc",135,"fcd","fef","Focus"),("a","b","c",5,"e","f","Test"),("aa","ba","ca",56,"ea","fa","Test"),("ab","cb","cc",35,"de","df","Test")).toDF("a","b","c","d","e","f","status")
df: org.apache.spark.sql.DataFrame = [a: string, b: string ... 5 more fields]

scala> val newDF = df.groupBy("status").agg(count("status").as("count"))
newDF: org.apache.spark.sql.DataFrame = [status: string, count: bigint]

scala> val focus_cnt = newDF.filter($"status" === "Focus").select("count").map(_.getAs[Long](0)).head
focus_cnt: Long = 4

scala> val test_cnt  = newDF.filter($"status" === "Test").select("count").map(_.getAs[Long](0)).head
test_cnt: Long = 3

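【Comments】:

  • Your understanding is correct, but I cannot convert to a DataFrame, because I extract the counts from a DataFrame and write them to a list.
  • Why do you want to write to a list? Can't you do the same thing with a Spark DataFrame? Or can you show the schema of the full DataFrame?
  • The schema is as follows:
      root
       |-- a: string (nullable = true)
       |-- b: string (nullable = true)
       |-- c: string (nullable = true)
       |-- d: long (nullable = true)
       |-- e: string (nullable = true)
       |-- f: string (nullable = true)
       |-- status: string (nullable = true)
    I take only the status and count from the DataFrame and write them to separate variables.
  • I have updated the answer above according to the given schema. Can you check now?
  • .head is not recognized, so I removed it. After that I get an error at map(_.getAs[Long](0)); the error is x$1: .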
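
One possible way around the map(...).head step discussed above is to collect the small grouped result to the driver and build an ordinary Map from the rows, which avoids calling map on the DataFrame itself. This is only a sketch under assumptions: it uses the SparkSession API (Spark 2.x or later), a local master, and a reduced two-column version of the sample data. The object name StatusCounts and the helper countsByStatus are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

object StatusCounts {
  // Groups by status and collects the (status, count) rows into a driver-side Map.
  def countsByStatus(spark: SparkSession): Map[String, Long] = {
    import spark.implicits._ // required for toDF on a local Seq

    // Assumption: reduced sample with only the `d` and `status` columns.
    val df = Seq(
      (5L, "Focus"), (16L, "Focus"), (54L, "Focus"), (135L, "Focus"),
      (5L, "Test"), (56L, "Test"), (35L, "Test")
    ).toDF("d", "status")

    df.groupBy("status")
      .agg(count("status").as("count"))
      .collect()                                        // Array[Row], one row per status
      .map(row => row.getString(0) -> row.getLong(1))   // plain Scala map on the array
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("status-counts")
      .getOrCreate()
    try {
      val counts = countsByStatus(spark)
      val focusCnt = counts.getOrElse("Focus", 0L)
      val testCnt = counts.getOrElse("Test", 0L)
      println(s"Focus=$focusCnt, Test=$testCnt")
    } finally {
      spark.stop()
    }
  }
}
```

Collecting is reasonable here because the grouped result has one row per distinct status; it would not be appropriate for a large result set.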