【Question title】: AWSGLUE python package - ls cannot access dir
【Posted】: 2021-03-05 11:29:37
【Question】:

I am trying to install the awsglue package locally on my machine for development purposes (Windows + Git Bash).

https://github.com/awslabs/aws-glue-libs/tree/glue-1.0

https://support.wharton.upenn.edu/help/glue-debugging

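The Spark directory and the py4j file named in the errors below do exist, but the script still reports them as missing.

This is the directory I run the sh script from: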

user@machine xxxx64~/Desktop/lm_aws_glue/aws-glue-libs-glue-1.0/bin
$ ./glue-setup.sh
ls: cannot access 'C:\Spark\spark-3.1.1-bin-hadoop2.7/python/lib/py4j-*-src.zip': No such file or directory
rm: cannot remove 'PyGlue.zip': No such file or directory
./glue-setup.sh: line 14: zip: command not found

Output of ls:

$ ls -l
total 7
-rwxr-xr-x 1 n1543781 1049089 135 May  5  2020 gluepyspark*
-rwxr-xr-x 1 n1543781 1049089 114 May  5  2020 gluepytest*
-rwxr-xr-x 1 n1543781 1049089 953 Mar  5 11:10 glue-setup.sh*
-rwxr-xr-x 1 n1543781 1049089 170 May  5  2020 gluesparksubmit*

【Comments】:

  • Are you using Linux or Windows?
  • Windows + Git Bash
  • This doesn't answer your question, but if you are open to using a Docker image for local Glue development, see this AWS blog

Tags: python etl aws-glue


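【Solution 1】:

The original setup script works after a few adjustments to its paths. The missing zip command still needs a separate workaround.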
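For the zip step, one possible workaround is sketched here. It assumes a Python interpreter is on the PATH in Git Bash and falls back to Python's built-in `python -m zipfile -c` command when `zip` is not installed. The function name `make_zip` is hypothetical and is not part of aws-glue-libs.

```shell
#!/usr/bin/env bash

# make_zip ARCHIVE SRC_DIR
# Hypothetical helper (not part of aws-glue-libs): build ARCHIVE from SRC_DIR
# with `zip` when it exists, otherwise with Python's zipfile command line.
make_zip() {
  local archive="$1" src="$2" py
  # -f keeps rm quiet when the archive does not exist yet
  rm -f "$archive"
  if command -v zip >/dev/null 2>&1; then
    zip -qr "$archive" "$src"
    return
  fi
  # Assumption: python3 or python is installed; try each until one succeeds
  for py in python3 python; do
    if command -v "$py" >/dev/null 2>&1 &&
       "$py" -m zipfile -c "$archive" "$src" >/dev/null 2>&1; then
      return 0
    fi
  done
  echo "make_zip: neither zip nor a working Python was found" >&2
  return 1
}
```

Under these assumptions, `make_zip PyGlue.zip awsglue` could stand in for the `rm` and `zip -r` lines of the setup script. The adjusted setup script follows.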

#!/usr/bin/env bash

#original code
#ROOT_DIR="$(cd $(dirname "$0")/..; pwd)"
#cd $ROOT_DIR

#re-written
ROOT_DIR="$(cd /c/aws-glue-libs; pwd)" 
cd $ROOT_DIR

SPARK_CONF_DIR=$ROOT_DIR/conf
GLUE_JARS_DIR=$ROOT_DIR/jarsv1

#original code
#PYTHONPATH="$SPARK_HOME/python/:$PYTHONPATH"
#PYTHONPATH=`ls $SPARK_HOME/python/lib/py4j-*-src.zip`:"$PYTHONPATH"

#re-written
PYTHONPATH="/c/Spark/spark-3.1.1-bin-hadoop2.7/python/:$PYTHONPATH"
PYTHONPATH=`ls /c/Spark/spark-3.1.1-bin-hadoop2.7/python/lib/py4j-*-src.zip`:"$PYTHONPATH"

# Generate the zip archive for glue python modules
rm -f PyGlue.zip
zip -r PyGlue.zip awsglue
GLUE_PY_FILES="$ROOT_DIR/PyGlue.zip"
export PYTHONPATH="$GLUE_PY_FILES:$PYTHONPATH"

# Run mvn copy-dependencies target to get the Glue dependencies locally
#mvn -f $ROOT_DIR/pom.xml -DoutputDirectory=$ROOT_DIR/jarsv1 dependency:copy-dependencies

export SPARK_CONF_DIR=${ROOT_DIR}/conf
mkdir -p "$SPARK_CONF_DIR"
rm -f "$SPARK_CONF_DIR/spark-defaults.conf"
# Generate spark-defaults.conf
echo "spark.driver.extraClassPath $GLUE_JARS_DIR/*" >> $SPARK_CONF_DIR/spark-defaults.conf
echo "spark.executor.extraClassPath $GLUE_JARS_DIR/*" >> $SPARK_CONF_DIR/spark-defaults.conf

# Restore present working directory
cd -
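
The rewritten lines above hard-code the Spark location in POSIX form (`/c/Spark/...`). The error in the question shows the `py4j-*-src.zip` pattern reaching `ls` unexpanded, which suggests the glob does not match when `SPARK_HOME` is a Windows-style path with backslashes. A minimal sketch of an alternative converts `SPARK_HOME` instead of hard-coding it. It assumes `cygpath` is available, as it usually is in Git Bash, and falls back to a plain string rewrite otherwise. The names `to_posix_path` and `find_py4j_zip` are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn C:\Spark\x into /c/Spark/x.
# POSIX-style paths are returned unchanged by the fallback branch.
to_posix_path() {
  local p drive
  if command -v cygpath >/dev/null 2>&1; then
    cygpath -u "$1"
    return
  fi
  # Fallback: replace backslashes with forward slashes
  p="$(printf '%s' "$1" | tr '\\' '/')"
  case "$p" in
    [A-Za-z]:/*)
      # Lowercase the drive letter and drop the "X:" prefix
      drive="$(printf '%s' "$p" | cut -c1 | tr 'A-Z' 'a-z')"
      p="/$drive${p#??}"
      ;;
  esac
  printf '%s\n' "$p"
}

# Hypothetical helper: print the first py4j source zip under a Spark home,
# or return 1 when the pattern matches nothing.
find_py4j_zip() {
  local spark_home candidate
  spark_home="$(to_posix_path "$1")"
  for candidate in "$spark_home"/python/lib/py4j-*-src.zip; do
    if [ -e "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

With these helpers, the py4j line could be written as `PYTHONPATH="$(find_py4j_zip "$SPARK_HOME"):$PYTHONPATH"`, so the script would not need editing when the Spark version changes.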

【Discussion】:
