You can add an index with row_number():
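import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell

val myDataframe = sc.parallelize(List("a", "b", "c", "d")).toDF("value")
// A window without partitionBy moves all rows into a single partition
val withIndex = myDataframe.select(row_number().over(Window.orderBy('value)).cast("INT").as("index"), '*)
// Iterate over withIndex, not myDataframe: only withIndex has the Int index in column 0
withIndex.foreach { row =>
  for (i <- 0 until row.length) {
    val rowNum = row.getInt(0) // 1-based row index from row_number()
    val colNum = i             // 0-based column index
  }
}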
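If you want to save the DataFrame to an Excel file, however, you have to collect the data to the driver first,
and then convert it to an array of arrays (a 2D array):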
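// Join all columns into one comma-separated string per row, collect to the driver, then split again
// (needs org.apache.spark.sql.functions.concat_ws and spark.implicits._ in scope)
val list: Array[Array[String]] = withIndex
  .select(concat_ws(",", withIndex.columns.map(withIndex(_)): _*))
  .map(s => s.getString(0))
  .collect()
  .map(_.split(","))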
// Print every cell together with its zero-based row and column position
for (elem <- list.indices) {
  for (elem2 <- list(elem).indices) {
    println(s"${list(elem)(elem2)}, row:$elem, col:$elem2")
  }
}
Output:
1, row:0, col:0
a, row:0, col:1
2, row:1, col:0
b, row:1, col:1
3, row:2, col:0
c, row:2, col:1
4, row:3, col:0
d, row:3, col:1
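One caveat with joining the columns on "," and splitting them again: a value that itself contains a comma is split into extra cells, and String.split drops trailing empty cells by default. A minimal sketch of the problem and one possible workaround, written in Java because Scala's split is the same java.lang.String method. The class SplitCells, the helper splitRow, and the choice of the \u0001 control character as separator are assumptions for illustration, not part of the original answer.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitCells {
    // Assumed separator: a control character unlikely to occur in the data
    static final String SEP = "\u0001";

    // Hypothetical helper: split one joined row back into its cells
    static String[] splitRow(String joined) {
        // Pattern.quote treats SEP literally; limit -1 keeps trailing empty cells
        return joined.split(Pattern.quote(SEP), -1);
    }

    public static void main(String[] args) {
        // The value "a,b" is one cell, but splitting on "," yields three
        System.out.println("1,a,b".split(",").length);                  // prints 3
        // With the dedicated separator the row keeps its two cells
        System.out.println(splitRow("1" + SEP + "a,b").length);         // prints 2
        // A trailing empty cell survives because of the -1 limit
        System.out.println(Arrays.toString(splitRow("2" + SEP + "")));  // prints [2, ]
    }
}
```

On the Spark side this would mean passing the same separator to concat_ws instead of ",".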
I don't know how Apache POI works in Scala, but in Java it should look something like this:
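import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.*;

// Open an existing workbook (excelFilePath must point to it) and add a new sheet
FileInputStream inputStream = new FileInputStream(new File(excelFilePath));
Workbook workbook = WorkbookFactory.create(inputStream);
Sheet newSheet = workbook.createSheet("spark");

// your data from DataFrame
Object[][] bookComments = {
    {"1", "a"},
    {"2", "b"},
    {"3", "c"},
    {"4", "d"},
};

int rowCount = 0;
for (Object[] aBook : bookComments) {
    // Post-increment so the data starts at row 0 instead of leaving the first row empty
    Row row = newSheet.createRow(rowCount++);
    int columnCount = 0;
    for (Object field : aBook) {
        // Same for columns: start at column 0
        Cell cell = row.createCell(columnCount++);
        if (field instanceof String) {
            cell.setCellValue((String) field);
        } else if (field instanceof Integer) {
            cell.setCellValue((Integer) field);
        }
    }
}

// Write the modified workbook to a new file, then release all resources
FileOutputStream outputStream = new FileOutputStream("JavaBooks.xlsx");
workbook.write(outputStream);
workbook.close();
outputStream.close();
inputStream.close();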
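The hard-coded bookComments array above contains only strings, so the instanceof Integer branch is never taken. If the index column should end up as a numeric cell, the collected strings can be converted back to typed values first. A minimal sketch using only the standard library; the class CellValues and the helpers parseCell and toTypedGrid are hypothetical names, and it assumes that every integer-looking string should become an Integer.

```java
import java.util.Arrays;

public class CellValues {
    // Hypothetical helper: integer-looking strings become Integer, everything else stays a String
    static Object parseCell(String s) {
        try {
            return Integer.valueOf(s.trim());
        } catch (NumberFormatException e) {
            return s;
        }
    }

    // Hypothetical helper: convert the collected 2D string array cell by cell
    static Object[][] toTypedGrid(String[][] rows) {
        Object[][] out = new Object[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            out[r] = new Object[rows[r].length];
            for (int c = 0; c < rows[r].length; c++) {
                out[r][c] = parseCell(rows[r][c]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same data as the collected DataFrame: index column plus value column
        String[][] collected = {{"1", "a"}, {"2", "b"}, {"3", "c"}, {"4", "d"}};
        Object[][] bookComments = toTypedGrid(collected);
        System.out.println(Arrays.deepToString(bookComments)); // prints [[1, a], [2, b], [3, c], [4, d]]
    }
}
```

The resulting Object[][] can be used in place of the literal bookComments array in the loop above.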