【Question Title】: How to convert a DataFrame to a DynamicFrame
【Posted】: 2021-11-10 14:35:48
【Question】:

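I'm new to AWS Glue and I'm trying to run some transformations with PySpark. My ETL job runs successfully, but I'm looking for another way to convert a DataFrame to a DynamicFrame.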

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

glueContext = GlueContext(SparkContext.getOrCreate())

# load data from crawler
students = glueContext.create_dynamic_frame.from_catalog(database="example_db", table_name="samp_csv")

# move data into a new variable for transformation
students_trans = students

# convert DynamicFrame (students_trans) to a Spark DataFrame
students_ = students_trans.toDF()

# run transformation: rename columns / drop columns
students_1 = (
    students_.withColumnRenamed("state", "County")
    .withColumnRenamed("capital", "cap")
    .drop("municipal", "metropolitan")
)
# students_1.printSchema()

#convert df back to dynamicframe
from awsglue.dynamicframe import DynamicFrame

students_trans = students_trans.fromDF(students_1, glueContext, "students_trans")

#load into s3 bucket
glueContext.write_dynamic_frame.from_options(frame = students_trans,
              connection_type = "s3",
              connection_options = {"path": "s3://kingb/target/"},
              format = "csv")

【Comments】:

    Tags: pyspark aws-glue


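    【Solution 1】:
    from awsglue.dynamicframe import DynamicFrame

    # fromDF is a classmethod: call it on the class and pass the GlueContext
    # created in the question (glueContext), not an undefined self._glue_context
    students_trans = DynamicFrame.fromDF(students_1, glueContext, "df")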
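If the conversion is needed in several places, it could be wrapped in a small helper. The following is only a sketch: `to_dynamic_frame` is a hypothetical name, not part of `awsglue`, and the default frame name is an assumption taken from the question's variable names. The import is done inside the function so the module can still be loaded where `awsglue` is not installed.

```python
def to_dynamic_frame(df, glue_context, name="students_trans"):
    """Wrap a Spark DataFrame in a Glue DynamicFrame.

    Hypothetical helper for illustration; not part of the awsglue library.
    """
    # Lazy import: awsglue is normally only available inside a Glue runtime.
    from awsglue.dynamicframe import DynamicFrame

    # fromDF is a classmethod, so it is called on the class itself
    # rather than on an existing DynamicFrame instance.
    return DynamicFrame.fromDF(df, glue_context, name)
```

With the variables from the question, the call would be `students_trans = to_dynamic_frame(students_1, glueContext)`.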
    

    【Discussion】:
