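【Question title】: How do I extract each word from a text file in Scala
【Posted】: 2020-01-08 16:25:45
【Question description】:

I am very new to Scala. I have a text file containing a single line of words separated by semicolons (;). I want to extract each word, remove the surrounding whitespace, convert everything to lowercase, and then access each word by its index. Here is how I approached it: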

// newListUpper2.txt contains: (Bed;  chairs;spoon; CARPET;curtains )
val file = sc.textFile("newListUpper2.txt")
val lower = file.map(x=>x.toLowerCase)
val result = lower.flatMap(x=>x.trim.split(";"))
result.collect.foreach(println)

Here is a copy of the REPL session from when I ran the code:

    scala> val file = sc.textFile("newListUpper2.txt")
    file: org.apache.spark.rdd.RDD[String] = newListUpper2.txt MapPartitionsRDD[5] at textFile at <console>:24
    scala> val lower = file.map(x=>x.toLowerCase)
    lower: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[6] at map at <console>:26
    scala> val result = lower.flatMap(x=>x.trim.split(";"))
    result: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[7] at flatMap at <console>:28
    scala> result.collect.foreach(println)
    bed
     chairs
    spoon
     carpet
    curtains
    scala> result(0)
    <console>:31: error: org.apache.spark.rdd.RDD[String] does not take parameters
           result(0)

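The result is not trimmed, and passing an index as an argument to get the word at that position produces an error. If I pass each word's index as an argument, I expect the following results:

result(0) = bed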
result(1) = chairs
result(2) = spoon
result(3) = carpet
result(4) = curtains

What am I doing wrong?

【Comments】:

    Tags: scala apache-spark indexing text-files


    【Solution 1】:
    // newListUpper2.txt contains: (Bed;  chairs;spoon; CARPET;curtains )
    val file = sc.textFile("newListUpper2.txt")
    val lower = file.map(x=>x.toLowerCase)
    // Here x is the whole line `bed;  chairs;spoon; carpet;curtains`, so x.trim has no effect on the
    // individual words: trim only strips whitespace at the start and end of the string.
    val result = lower.flatMap(x=>x.trim.split(";"))
    result.collect.foreach(println)
    

    Try this instead, splitting first and then trimming each piece:

    val result = lower.flatMap(x=>x.split(";").map(x=>x.trim))
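
The difference between trimming before and after the split can be checked without Spark. The following is a minimal sketch on a plain string; the object name `SplitThenTrim` and the helper `words` are made up for illustration, and the sample line is the file content quoted in the question.

```scala
object SplitThenTrim {
  // Split first, then trim and lower-case each piece individually.
  def words(line: String): Array[String] =
    line.split(";").map(_.trim.toLowerCase)

  def main(args: Array[String]): Unit = {
    val line = "Bed;  chairs;spoon; CARPET;curtains "

    // Trimming the whole line only removes the trailing space after "curtains".
    val trimmedFirst = line.trim.toLowerCase.split(";")
    println(trimmedFirst.mkString("[", "|", "]")) // [bed|  chairs|spoon| carpet|curtains]

    // Trimming each piece removes the spaces around every word.
    val cleaned = words(line)
    println(cleaned.mkString("[", "|", "]"))      // [bed|chairs|spoon|carpet|curtains]
    println(cleaned(3))                           // carpet
  }
}
```

Because `words` returns a local `Array[String]`, indexing with `cleaned(3)` works here; the same expression on an RDD does not compile.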
    

    【Comments】:

      【Solution 2】:

      1) Issue 1

      scala> result(0)
      <console>:31: error: org.apache.spark.rdd.RDD[String] does not take parameters
      

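      result is an RDD, which cannot take an index argument like this. Instead, you can print its first elements with result.take(10).foreach(println) (show(10,false) is only available on a DataFrame or Dataset, not on an RDD).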
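
If positional access is needed while staying in Spark, two common options are to collect the RDD into a local array, or to pair every element with its position using `zipWithIndex` and look the position up. The sketch below assumes Spark is on the classpath and runs in local mode; the object name `IndexIntoRdd` and the helper `wordAt` are hypothetical, and `parallelize` stands in for `sc.textFile("newListUpper2.txt")` so the example does not depend on a file.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object IndexIntoRdd {
  // Hypothetical helper: pair each element with its position, then look the position up.
  def wordAt(words: RDD[String], index: Long): Option[String] =
    words.zipWithIndex().map { case (word, i) => (i, word) }.lookup(index).headOption

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `sc` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("index-into-rdd").getOrCreate()
    val sc = spark.sparkContext

    // Stand-in for sc.textFile(...): one line of semicolon-separated words.
    val file = sc.parallelize(Seq("Bed;  chairs;spoon; CARPET;curtains "))
    val result = file.flatMap(_.split(";")).map(_.trim.toLowerCase)

    // Option 1: for small data, bring everything to the driver and index the local Array.
    val local: Array[String] = result.collect()
    println(local(0))

    // Option 2: keep the data distributed and fetch a single position.
    println(wordAt(result, 3L))

    spark.stop()
  }
}
```

`collect()` copies the whole RDD to the driver, so it only suits data that fits in memory there; the `zipWithIndex` route avoids that at the cost of an extra Spark job per lookup.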

      2) Issue 2 - to get something like result(0) = bed, result(1) = chairs, ...

      scala> var result = scala.io.Source.fromFile("/path/to/File").getLines().flatMap(x=>x.split(";").map(x=>x.trim)).toList
      result: List[String] = List(Bed, chairs, spoon, CARPET, curtains)
      
      scala> result(0)
      res21: String = Bed
      
      scala> result(1)
      res22: String = chairs
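
Note that the list above keeps the original casing (Bed, CARPET), while the question asks for lowercase words. A minimal sketch of the same plain-Scala approach with lower-casing added and the file handle closed after reading follows; the object name `WordsFromFile`, the helper `readWords`, and the temporary sample file are assumptions made so the example runs on its own.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.io.Source

object WordsFromFile {
  // Reads every line of `path`, splits on ';', trims and lower-cases each piece,
  // and drops empty pieces. The Source is closed even if reading fails.
  def readWords(path: String): List[String] = {
    val source = Source.fromFile(path, "UTF-8")
    try {
      source.getLines()
        .flatMap(_.split(";"))
        .map(_.trim.toLowerCase)
        .filter(_.nonEmpty)
        .toList // forces the lazy iterator before the source is closed
    } finally {
      source.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample file, written to a temp location for illustration.
    val tmp: Path = Files.createTempFile("newListUpper2", ".txt")
    Files.write(tmp, "Bed;  chairs;spoon; CARPET;curtains ".getBytes(StandardCharsets.UTF_8))

    val result = readWords(tmp.toString)
    println(result(0)) // bed
    println(result(3)) // carpet

    Files.delete(tmp)
  }
}
```

This reads the file on a single machine, which is reasonable for a one-line file like the one in the question; Spark is only needed when the input is too large for that.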
      

      【Comments】:
