【Question title】: Problem joining two DataFrames in Spark SQL
【Posted】: 2020-08-04 14:39:23
【Question】:

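I have two DataFrames. The first has 10 columns (street, state (with values such as CA, US), etc.) and the second has two columns (state and the state's full name). I want to join the two DataFrames on state, with the state column replaced by the full name.

I tried: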

tranDF.join(stateDF,tranDF("state")===stateDF("state"),"inner").show(false)

The columns I need are:

street city state_NM beds ...etc

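I want a column from stateDF to replace the state column in tranDF. Can anyone help?

【Comments】:

  • Some suggestions for posting. 1) A question should include clear sample data (you can post a sample CSV or DataFrame); without it, it is hard to give a reproducible answer. 2) Wording and sentences should be clear and show your intent. 3) Avoid typos. 4) Don't rush to post; review your question as if you were someone else reading it. With these, you will get better answers to your question. Keep in mind...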

Tags: pandas apache-spark apache-spark-sql


【Solution 1】:

Check whether the following code works for you:

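from pyspark.sql.functions import col
# PySpark: alias both sides so the two state columns can be told apart, then drop both.
joinDF = (tranDF.alias("a")
          .join(stateDF.alias("b"), col("a.state") == col("b.state"), how="inner")
          .drop(col("a.state")).drop(col("b.state")))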

【Comments】:

  • I'm not sure whether a keyword such as how= exists in the Spark API.
  • It is available in PySpark.
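
For readers on the Scala API, where the join type is passed as a plain string argument rather than a how= keyword, a rough counterpart might look like the sketch below. It is not the answerer's code: it swaps the aliases for the Seq("state") form of join, which keeps a single state column that can then be dropped. The object name, the sample rows, and the assumption that the full-name column is already called state_NM are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinUsingSketch {
  // Join on the shared "state" column, then drop it so only the full name remains.
  // Assumes stateDF has exactly the columns "state" and "state_NM" (hypothetical names).
  def replaceState(tranDF: DataFrame, stateDF: DataFrame): DataFrame =
    tranDF.join(stateDF, Seq("state"), "inner").drop("state")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JoinUsingSketch")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows; the asker's DataFrame has more columns.
    val tranDF = Seq(
      ("2110 WEST WALNUT", "Rogers", "AR"),
      ("1303 SOUTH MAIN", "Sikeston", "MO")
    ).toDF("street", "city", "state")
    val stateDF = Seq(("AR", "Arkansas"), ("MO", "Missouri")).toDF("state", "state_NM")

    replaceState(tranDF, stateDF).show(false)
    spark.stop()
  }
}
```

With this form the remaining tranDF columns come first and state_NM is appended at the end.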
【Solution 2】:

The approach below should work...


  trandf.join(statedf,trandf("state")===statedf("state"),"inner")
    .selectExpr("trans.street", "trans.city", "state.statefullname", "trans.type")  
    .show(false)

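Explanation: each DataFrame is given an alias ('trans' and 'state'). After the inner join, select only the columns you actually need, using select or selectExpr as in the full example below.


A complete Scala example with Walmart data: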

package examples

import org.apache.log4j.Level
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

object JoinDemo extends App {
  val logger = org.apache.log4j.Logger.getLogger("org")
  logger.setLevel(Level.WARN)

  val spark = SparkSession.builder().appName("JoinDemo").master("local").getOrCreate()

  import spark.implicits._

  val mycsvdata = """
    |"statefullname","state"
    |"Alabama","AL"
    |"Alaska","AK"
    |"Arizona","AZ"
    |"Arkansas","AR"
    |"California","CA"
    |"Colorado","CO"
    |"Connecticut","CT"
    |"Delaware","DE"
    |"District of Columbia","DC"
    |"Florida","FL"
    |"Georgia","GA"
    |"Hawaii","HI"
    |"Idaho","ID"
    |"Illinois","IL"
    |"Indiana","IN"
    |"Iowa","IA"
    |"Kansas","KS"
    |"Kentucky","KY"
    |"Louisiana","LA"
    |"Maine","ME"
    |"Montana","MT"
    |"Nebraska","NE"
    |"Nevada","NV"
    |"New Hampshire","NH"
    |"New Jersey","NJ"
    |"New Mexico","NM"
    |"New York","NY"
    |"North Carolina","NC"
    |"North Dakota","ND"
    |"Ohio","OH"
    |"Oklahoma","OK"
    |"Oregon","OR"
    |"Maryland","MD"
    |"Massachusetts","MA"
    |"Michigan","MI"
    |"Minnesota","MN"
    |"Mississippi","MS"
    |"Missouri","MO"
    |"Pennsylvania","PA"
    |"Rhode Island","RI"
    |"South Carolina","SC"
    |"South Dakota","SD"
    |"Tennessee","TN"
    |"Texas","TX"
    |"Utah","UT"
    |"Vermont","VT"
    |"Virginia","VA"
    |"Washington","WA"
    |"West Virginia","WV"
    |"Wisconsin","WI"
    |"Wyoming","WY"
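  """.stripMargin.split("\n").toList.toDS
  // split("\n") is used instead of .lines, which resolves to java.lang.String.lines on JDK 11+
  val mycsvdata1 =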
  """
    |"opendate","street","city","state","long","lat","type"
    |1962-03-01,"5801 SW Regional Airport Blvd","Bentonville","AR",-94.239816,36.350885,"DistributionCenter"
    |1962-07-01,"2110 WEST WALNUT","Rogers","AR",-94.07141,36.342235,"SuperCenter"
    |1964-08-01,"1417 HWY 62/65 N","Harrison","AR",-93.09345,36.236984,"SuperCenter"
    |1965-08-01,"2901 HWY 412 EAST","Siloam Springs","AR",-94.50208,36.179905,"SuperCenter"
    |1967-10-01,"3801 CAMP ROBINSON RD.","North Little Rock","AR",-92.30229,34.813269,"Wal-MartStore"
    |1967-10-01,"1621 NORTH BUSINESS 9","Morrilton","AR",-92.75858,35.156491,"SuperCenter"
    |1968-03-01,"1303 SOUTH MAIN","Sikeston","MO",-89.58355,36.891163,"SuperCenter"
    |1968-03-01,"65 WAL-MART DRIVE","Mountain Home","AR",-92.35781,36.329026,"SuperCenter"
    |1968-07-01,"2020 SOUTH MUSKOGEE","Tahlequah","OK",-94.97185,35.923658,"SuperCenter"
    |1968-07-01,"1500 LYNN RIGGS BLVD","Claremore","OK",-95.61192,36.327143,"SuperCenter"
    |1968-11-01,"2705 GRAND AVE","Carthage","MO",-94.31164,37.168985,"SuperCenter"
    |1969-04-01,"1800 S JEFFERSON","Lebanon","MO",-92.64733,37.678528,"SuperCenter"
    |1969-04-01,"2214 FAYETTEVILLE RD","Van Buren","AR",-94.34581,35.456536,"SuperCenter"
    |1969-05-01,"1310 PREACHER RD/HGWY 160","West Plains","MO",-91.87408,36.719145,"SuperCenter"
    |1969-05-01,"3200 LUSK DRIVE","Neosho","MO",-94.39016,36.86429,"SuperCenter"
    |1969-11-01,"2500 MALCOLM ST/HWY 67 NORTH","Newport","AR",-91.24695,35.586065,"Wal-MartStore"
    |1970-03-01,"185 ST ROBERT BLVD","St. Robert","MO",-92.135741,37.827415,"SuperCenter"
    |1970-10-01,"1712 EAST OHIO","Clinton","MO",-93.76042,38.364214,"SuperCenter"
    |1970-10-01,"4901 SO. MILL ROAD","Pryor","OK",-95.30295,36.294174,"SuperCenter"
    |1970-11-01,"1201 N SERVICE ROAD EAST","Ruston","LA",-92.64696,32.52476,"SuperCenter"
    |1970-11-01,"3450 S. 4TH TRAFFICWAY","Leavenworth","KS",-94.93555,39.298776,"Wal-MartStore"
    |1971-02-01,"4820 SO. CLARK ST","Mexico","MO",-91.88404,39.179316,"SuperCenter"
    |1971-02-01,"1101 HWY 32 WEST","Salem","MO",-91.51423,37.630896,"SuperCenter"
    |1971-04-01,"2000 JOHN HARDEN DR","Jacksonville","AR",-92.12244,34.879419,"SuperCenter"
    |1971-05-01,"2415 N.W. MAIN ST","Miami","OK",-94.87142,36.880746,"SuperCenter"
    |1971-06-01,"3108 N BROADWAY","Poteau","OK",-94.61829,35.052793,"SuperCenter"
    |1971-06-01,"2050 WEST HWY 76","Branson","MO",-93.25668,36.64417,"Wal-MartStore"
    |1971-06-01,"1710 SO. 4TH ST","Nashville","AR",-93.85214,33.985613,"SuperCenter"
    |1971-08-01,"724 STADIUM WEST BLVD","Jefferson City","MO",-92.25329,38.568287,"SuperCenter"
    |1971-09-01,"701 WALTON DRIVE","Farmington","MO",-90.41404,37.779206,"SuperCenter"
    |1971-10-01,"101 EAST BLUEMONT AVENUE","Manhattan","KS",-96.56932,39.184986,"SuperCenter"
    |1971-11-01,"2025 BUS. HWY 60 WEST","Dexter","MO",-89.97428,36.784453,"SuperCenter"
    |1971-11-01,"2250 LINCOLN AVENUE","Nevada","MO",-94.35075,37.838563,"SuperCenter"
    |1971-11-01,"2802 WEST KINGS HIGHWAY","Paragould","AR",-90.5102,36.065711,"SuperCenter"
    |1971-11-01,"1301 HWY 24 EAST","Moberly","MO",-92.4344,39.420353,"SuperCenter"
    |1971-12-09,"1907 SE WASHINGTON ST.","Idabel","OK",-94.83154,33.883578,"SuperCenter"
    |1972-02-01,"1802 SOUTH BUSINESS HWY 54","Eldon","MO",-92.58395,38.311355,"Wal-MartStore"
    |1972-03-01,"2400 SOUTH MAIN","Fort Scott","KS",-94.73389,37.823295,"Wal-MartStore"
    |1972-05-01,"1155 HWY 65 NORTH","Conway","AR",-92.43401,35.075467,"SuperCenter"
    |1972-05-01,"4000 GREEN COUNTRY RD","Bartlesville","OK",-95.92404,36.733398,"SuperCenter"
  """.stripMargin.split("\n").toList.toDS
  val trandf: DataFrame = spark.read.option("header", true)
    .option("sep", ",")
    .option("inferSchema", true)
    .csv(mycsvdata1).as("trans")

  val statedf: DataFrame = spark.read.option("header", true)
    .option("sep", ",")
    .option("inferSchema", true)
    .csv(mycsvdata).as("state")

  trandf.join(statedf,trandf("state")===statedf("state"),"inner")
    .selectExpr("trans.street", "trans.city", "state.statefullname", "trans.type") // keep only the columns you need; the full name comes from statedf
    .show(false)

}

Result:

+--------------------------+--------------+-------------+-------------+
|street                    |city          |statefullname|type         |
+--------------------------+--------------+-------------+-------------+
|1201 N SERVICE ROAD EAST  |Ruston        |Louisiana    |SuperCenter  |
|1303 SOUTH MAIN           |Sikeston      |Missouri     |SuperCenter  |
|2705 GRAND AVE            |Carthage      |Missouri     |SuperCenter  |
|1800 S JEFFERSON          |Lebanon       |Missouri     |SuperCenter  |
|1310 PREACHER RD/HGWY 160 |West Plains   |Missouri     |SuperCenter  |
|3200 LUSK DRIVE           |Neosho        |Missouri     |SuperCenter  |
|185 ST ROBERT BLVD        |St. Robert    |Missouri     |SuperCenter  |
|1712 EAST OHIO            |Clinton       |Missouri     |SuperCenter  |
|4820 SO. CLARK ST         |Mexico        |Missouri     |SuperCenter  |
|1101 HWY 32 WEST          |Salem         |Missouri     |SuperCenter  |
|2050 WEST HWY 76          |Branson       |Missouri     |Wal-MartStore|
|724 STADIUM WEST BLVD     |Jefferson City|Missouri     |SuperCenter  |
|701 WALTON DRIVE          |Farmington    |Missouri     |SuperCenter  |
|2025 BUS. HWY 60 WEST     |Dexter        |Missouri     |SuperCenter  |
|2250 LINCOLN AVENUE       |Nevada        |Missouri     |SuperCenter  |
|1301 HWY 24 EAST          |Moberly       |Missouri     |SuperCenter  |
|1802 SOUTH BUSINESS HWY 54|Eldon         |Missouri     |Wal-MartStore|
|3450 S. 4TH TRAFFICWAY    |Leavenworth   |Kansas       |Wal-MartStore|
|101 EAST BLUEMONT AVENUE  |Manhattan     |Kansas       |SuperCenter  |
|2400 SOUTH MAIN           |Fort Scott    |Kansas       |Wal-MartStore|
+--------------------------+--------------+-------------+-------------+
only showing top 20 rows
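
The question asks for every column of the first DataFrame with state replaced by a column named state_NM. One way to sketch that, assuming the column names used in this answer (state, statefullname), is to build the projection from tranDF.columns and swap in the full name where state appears. The object and method names are hypothetical, and the sample rows are a small subset of the data above.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}

object ReplaceStateSketch {
  // Keep every tranDF column in its original position, except "state",
  // which is replaced by stateDF's "statefullname" renamed to "state_NM".
  def withStateName(tranDF: DataFrame, stateDF: DataFrame): DataFrame = {
    val projected: Array[Column] = tranDF.columns.map {
      case "state" => stateDF("statefullname").as("state_NM")
      case other   => tranDF(other)
    }
    tranDF
      .join(stateDF, tranDF("state") === stateDF("state"), "inner")
      .select(projected: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReplaceStateSketch")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._

    val tranDF = Seq(
      ("1201 N SERVICE ROAD EAST", "Ruston", "LA", "SuperCenter"),
      ("2050 WEST HWY 76", "Branson", "MO", "Wal-MartStore")
    ).toDF("street", "city", "state", "type")
    val stateDF = Seq(("Louisiana", "LA"), ("Missouri", "MO")).toDF("statefullname", "state")

    withStateName(tranDF, stateDF).show(false)
    spark.stop()
  }
}
```

An inner join drops rows whose state has no match in stateDF; passing "left" as the join type instead would keep them with a null state_NM.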

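【Comments】:

  • Thanks. Spark SQL commands do not work in my shell: for example, spark.sql("select * from table") fails, but joinDf.select("name").show works, and I don't know why. Could you explain that as well?
  • @uppalaadarsh Please proofread your text before submitting; it is really hard to read. Also, if you want to use SQL in Spark, you must first register the DataFrames as SQL tables. Otherwise, how would Spark know which of all the DataFrames the name table refers to?
  • If this works for you, please consider accepting the answer as the owner and voting it up.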
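
As a sketch of the registration step mentioned in the comment above: createOrReplaceTempView gives a DataFrame a name that spark.sql can resolve. The view names trans and states, the object name, and the sample rows are assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SqlViewSketch {
  // Register both DataFrames as temporary views, then express the join in SQL.
  def joinWithSql(spark: SparkSession, tranDF: DataFrame, stateDF: DataFrame): DataFrame = {
    tranDF.createOrReplaceTempView("trans")   // hypothetical view name
    stateDF.createOrReplaceTempView("states") // hypothetical view name
    spark.sql(
      """SELECT t.street, t.city, s.statefullname AS state_NM
        |FROM trans t
        |JOIN states s ON t.state = s.state""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SqlViewSketch")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._

    val tranDF = Seq(
      ("2400 SOUTH MAIN", "Fort Scott", "KS"),
      ("1712 EAST OHIO", "Clinton", "MO")
    ).toDF("street", "city", "state")
    val stateDF = Seq(("Kansas", "KS"), ("Missouri", "MO")).toDF("statefullname", "state")

    joinWithSql(spark, tranDF, stateDF).show(false)
    spark.stop()
  }
}
```

Temporary views live only for the lifetime of the SparkSession that created them, so they need to be registered again in each new session.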