【发布时间】:2017-03-23 12:05:02
【问题描述】:
我是 Scala 和 Spark 的新手,并尝试在我找到的一些示例上进行构建。本质上,我正在尝试从数据框中调用一个函数,以使用 Google API 从邮政编码中获取状态。 我有代码单独工作但不能一起工作;( 这是一段代码不起作用...
Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type Unit is not supported
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:716)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:654)
at org.apache.spark.sql.functions$.udf(functions.scala:2837)
at MovieRatings$.getstate(MovieRatings.scala:51)
at MovieRatings$$anonfun$4.apply(MovieRatings.scala:48)
at MovieRatings$$anonfun$4.apply(MovieRatings.scala:47)...
Line 51 starts with def getstate = udf {(zipcode:String)...
...
代码:
userDF.createOrReplaceTempView("Users")
// SQL statements can be run by using the sql methods provided by Spark
val zipcodesDF = spark.sql("SELECT distinct zipcode, zipcode as state FROM Users")
// zipcodesDF.map(zipcodes => "zipcode: " + zipcodes.getAs[String]("zipcode") + getstate(zipcodes.getAs[String]("zipcode"))).show()
val colNames = zipcodesDF.columns
val cols = colNames.map(cName => zipcodesDF.col(cName))
val theColumn = zipcodesDF("state")
val mappedCols = cols.map(c =>
if (c.toString() == theColumn.toString()) getstate(c).as("transformed") else c)
val newDF = zipcodesDF.select(mappedCols:_*).show()
}
def getstate = udf {(zipcode:String) => {
val url = "http://maps.googleapis.com/maps/api/geocode/json?address="+zipcode
val result = scala.io.Source.fromURL(url).mkString
val address = parse(result)
val shortnames = for {
JObject(address_components) <- address
JField("short_name", short_name) <- address_components
} yield short_name
val state = shortnames(3)
//return state.toString()
val stater = state.toString()
}
}
【问题讨论】:
-
您的
UDF没有返回任何内容,因为您注释掉了return state.toString()部分。
标签: scala function apache-spark dataframe