【发布时间】:2016-10-24 08:30:06
【问题描述】:
RDD 数据将被转换为数据帧。但我无法这样做。 ToDf 不工作,我也尝试使用数组 RDD 到数据帧。请给我建议。这个程序用于使用 scala 和 spark 解析示例 excel
import java.io.{File, FileInputStream}
import org.apache.poi.xssf.usermodel.XSSFCell
import org.apache.poi.xssf.usermodel.{XSSFSheet, XSSFWorkbook}
import org.apache.poi.ss.usermodel.Cell._
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.types.{ StructType, StructField, StringType, IntegerType };
object excel
{
def main(args: Array[String]) =
{
val sc = new SparkContext(new SparkConf().setAppName("Excel Parsing").setMaster("local[*]"))
val file = new FileInputStream(new File("test.xlsx"))
val wb = new XSSFWorkbook(file)
val sheet = wb.getSheetAt(0)
val rowIterator = sheet.iterator()
val builder = StringBuilder.newBuilder
var column = ""
while (rowIterator.hasNext())
{
val row = rowIterator.next();
val cellIterator = row.cellIterator();
while (cellIterator.hasNext())
{
val cell = cellIterator.next();
cell.getCellType match {
case CELL_TYPE_NUMERIC ⇒builder.append(cell.getNumericCellValue + ",")
case CELL_TYPE_BOOLEAN ⇒ builder.append(cell.getBooleanCellValue + ",")
case CELL_TYPE_STRING ⇒ builder.append(cell.getStringCellValue + ",")
case CELL_TYPE_BLANK ⇒ builder.append(",")
}
}
column = builder.toString()
println(column)
builder.setLength(0)
}
val data= sc.parallelize(column)
println(data)
}
}
【问题讨论】:
-
您在此处列出的代码中没有使用 Spark 做任何事情...
-
哦,我没有在底部看到它。无论如何,@Shivansh Srivastava 已经给你答案了。
标签: excel apache-spark dataframe rdd