【Question title】:How to run PySpark with the Snowflake JDBC connection driver in AWS Glue
【Posted】:2020-10-17 18:13:27
【Question description】:
I am trying to run the code below in AWS Glue:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from py4j.java_gateway import java_import
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

## @params: [JOB_NAME, URL, ACCOUNT, WAREHOUSE, DB, SCHEMA, USERNAME, PASSWORD]
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'URL', 'ACCOUNT', 'WAREHOUSE', 'DB', 'SCHEMA', 'USERNAME', 'PASSWORD'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
java_import(spark._jvm, "net.snowflake.spark.snowflake")

## uj = sc._jvm.net.snowflake.spark.snowflake
spark._jvm.net.snowflake.spark.snowflake.SnowflakeConnectorUtils.enablePushdownSession(spark._jvm.org.apache.spark.sql.SparkSession.builder().getOrCreate())

options = {
"sfURL" : args['URL'],
"sfAccount" : args['ACCOUNT'],
"sfUser" : args['USERNAME'],
"sfPassword" : args['PASSWORD'],
"sfDatabase" : args['DB'],
"sfSchema" : args['SCHEMA'],
"sfWarehouse" : args['WAREHOUSE'],
}

df = spark.read \
  .format("snowflake") \
  .options(**options) \
  .option("dbtable", "STORE") \
  .load()

display(df)

## Perform any kind of transformations on your data and save as a new Data Frame: “df1”
##df1 = [Insert any filter, transformation, etc]

## Write the Data Frame contents back to Snowflake in a new table
##df1.write.format(SNOWFLAKE_SOURCE_NAME).options(**sfOptions).option("dbtable", "[new_table_name]").mode("overwrite").save()
job.commit()

and I get the following error:

Traceback (most recent call last):
  File "/tmp/spark_snowflake", line 35, in <module>
    .option("dbtable", "STORE") \
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 172, in load
    return self._df(self._jreader.load())
  File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o78.load.
: java.lang.ClassNotFoundException: Failed to find data source: snowflake. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: snowflake.DefaultSource
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
    at

【Comments】:

    Tags: python apache-spark pyspark snowflake-task aws-glue-spark


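    【Solution 1】:

    The error message says "java.lang.ClassNotFoundException: Failed to find data source: snowflake". Did you use the proper jars and pass them to Glue when you created the job? Here are some examples: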

    Running custom Java class in PySpark
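
    One way to pass the jars is the Glue job parameter `--extra-jars`, which takes a comma-separated list of S3 paths. The sketch below is only an illustration under assumptions: the bucket, the jar file names, and the helper names `build_default_arguments` and `create_job` are hypothetical, and the jar versions have to match the Spark version of your Glue job. The `create_job` helper is defined but not called, because it needs AWS credentials.

```python
# Sketch: build the DefaultArguments for a Glue job so that the Snowflake
# jars are added to the Spark classpath. All S3 paths are hypothetical.


def build_default_arguments(jar_paths, job_params):
    """Return a DefaultArguments dict for a Glue job.

    jar_paths  -- S3 URIs of the spark-snowflake connector and snowflake-jdbc jars
    job_params -- script parameters, e.g. {"URL": "...", "DB": "..."}
    """
    if not jar_paths:
        raise ValueError("at least one jar path is required")
    # Glue expects --extra-jars as one comma-separated string of full S3 paths.
    arguments = {"--extra-jars": ",".join(jar_paths)}
    # getResolvedOptions reads script parameters that were passed as "--NAME".
    for name, value in job_params.items():
        arguments["--" + name] = value
    return arguments


def create_job(job_name, role_arn, script_location, default_arguments):
    """Create the Glue job with boto3 (requires AWS credentials; not called here)."""
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue")
    return glue.create_job(
        Name=job_name,
        Role=role_arn,
        Command={"Name": "glueetl", "ScriptLocation": script_location},
        DefaultArguments=default_arguments,
    )


example_arguments = build_default_arguments(
    [
        "s3://my-bucket/jars/spark-snowflake.jar",  # hypothetical path
        "s3://my-bucket/jars/snowflake-jdbc.jar",   # hypothetical path
    ],
    {"URL": "myaccount.snowflakecomputing.com", "DB": "MYDB"},
)
print(example_arguments["--extra-jars"])
```

    The same value can be entered in the Glue console when you edit the job. Once both jars are on the classpath, passing the fully qualified source name from the question (`SNOWFLAKE_SOURCE_NAME`, i.e. `net.snowflake.spark.snowflake`) to `.format(...)` is the safer choice, since the short name "snowflake" may not be registered by every connector version.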

    【Discussion】:
