[Posted]: 2021-06-16 23:48:56
[Question]:
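I have put together some custom classes that I would like to use as native data types in SparkSQL. I see that UDTs were only recently opened up to the public, but they are hard to figure out. Is there a way to do this?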
Example:
case class IPv4(ipAddress: String) {
  // IPv4 address converted to a number
  val addrL: Long = IPv4ToLong(ipAddress)
}

// Will read in a bunch of random IPs in the form {"ipAddress": "60.80.39.27"}
val IPv4DF: DataFrame = spark.read.json(path)
IPv4DF.createOrReplaceTempView("IPv4")

// Desired usage: filter on a field of the custom type
spark.sql(
  """SELECT *
    |FROM IPv4
    |WHERE ipAddress.addrL > 100000""".stripMargin
)
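The snippet calls an `IPv4ToLong` helper that is not shown. For anyone trying to reproduce the question, here is a minimal sketch of what it might look like, assuming a dotted-quad string such as "60.80.39.27" and a big-endian packing of the four octets into a `Long`. The enclosing object name `IPv4Util` is hypothetical, and the validation rules are my assumption rather than part of the original code.

```scala
object IPv4Util {
  // Hypothetical helper (not from the original post): packs the four octets
  // of a dotted-quad IPv4 string into a Long, most significant octet first.
  def IPv4ToLong(ip: String): Long = {
    val parts = ip.trim.split("\\.")
    require(parts.length == 4, s"Not a dotted-quad IPv4 address: $ip")
    parts.foldLeft(0L) { (acc, part) =>
      val octet = part.toInt // throws NumberFormatException on non-numeric input
      require(octet >= 0 && octet <= 255, s"Octet out of range in: $ip")
      (acc << 8) | octet
    }
  }
}
```

A plain function like this could probably also be exposed to SQL with `spark.udf.register`, so the filter becomes a function call on the string column instead of a field access on a custom type. That would sidestep the UDT question rather than answer it.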
[Discussion]:
Tags: scala apache-spark serialization apache-spark-sql user-defined-types