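【Posted】: 2018-10-31 10:03:10
【Question】:
I need to be able to run Spark on my local machine so that it can read Azure wasb:// and adl:// URLs, but I can't get it to work. Here is a stripped-down example:
Maven pom.xml (a brand-new pom with only the dependencies set):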
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-azure-datalake</artifactId>
        <version>3.1.0</version>
    </dependency>
    <dependency>
        <groupId>com.microsoft.azure</groupId>
        <artifactId>azure-storage</artifactId>
        <version>6.0.0</version>
    </dependency>
    <dependency>
        <groupId>com.microsoft.azure</groupId>
        <artifactId>azure-data-lake-store-sdk</artifactId>
        <version>2.2.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-azure</artifactId>
        <version>3.1.0</version>
    </dependency>
    <dependency>
        <groupId>com.microsoft.azure</groupId>
        <artifactId>azure-storage</artifactId>
        <version>7.0.0</version>
    </dependency>
</dependencies>
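The pom above pins hadoop-common to 2.8.0 while hadoop-azure and hadoop-azure-datalake are 3.1.0, and it declares azure-storage twice with different versions. Hadoop 2.x resolves a scheme such as wasb either from an `fs.<scheme>.impl` property or from `META-INF/services/org.apache.hadoop.fs.FileSystem` entries loaded through `java.util.ServiceLoader`, so one way to narrow the error down is to check what actually ends up on the runtime classpath. A minimal diagnostic sketch using only the JDK; the class name `ClasspathCheck` is hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical diagnostic class: checks whether the wasb/adl FileSystem classes
// can be loaded and which jars register Hadoop FileSystems via ServiceLoader.
class ClasspathCheck {

    // FileSystem implementations shipped in hadoop-azure and hadoop-azure-datalake.
    static final String WASB_IMPL = "org.apache.hadoop.fs.azure.NativeAzureFileSystem";
    static final String ADL_IMPL = "org.apache.hadoop.fs.adl.AdlFileSystem";

    // ServiceLoader registration file that Hadoop's FileSystem scans for schemes.
    static final String SERVICES_FILE = "META-INF/services/org.apache.hadoop.fs.FileSystem";

    // True if the class and its superclasses can be loaded (the class is not
    // initialized). A missing or incompatible dependency shows up as
    // ClassNotFoundException or a LinkageError such as NoClassDefFoundError.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Every copy of the services file visible to this class loader; each URL
    // names the jar that contributes it.
    static List<URL> fileSystemServiceFiles() {
        List<URL> urls = new ArrayList<>();
        try {
            Enumeration<URL> found = ClasspathCheck.class.getClassLoader().getResources(SERVICES_FILE);
            while (found.hasMoreElements()) {
                urls.add(found.nextElement());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(WASB_IMPL + " loadable: " + isLoadable(WASB_IMPL));
        System.out.println(ADL_IMPL + " loadable: " + isLoadable(ADL_IMPL));
        for (URL url : fileSystemServiceFiles()) {
            System.out.println("FileSystem services file: " + url);
        }
    }
}
```

Running it with the same classpath as `App` (for example from the IDE) shows whether the classes resolve; a `false` for a class whose jar is present suggests a missing or mismatched transitive dependency rather than a missing jar.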
Java code (it doesn't have to be Java; Scala is fine too):
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class App {
    public static void main(String[] args) {
        // Run Spark in-process on the local machine
        SparkConf config = new SparkConf()
                .setMaster("local")
                .setAppName("app");
        SparkSession spark = SparkSession.builder().config(config).getOrCreate();

        // Either read on its own fails with "No FileSystem for scheme" (wasb / adl)
        spark.read().parquet("wasb://container@host/path");
        spark.read().parquet("adl://host/path");

        spark.stop();
    }
}
No matter what I try, I get:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: wasb
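The same happens for adl. Every piece of documentation I can find either just says to add the azure-storage dependency, which I have already done, or says to use HDInsight.
Any ideas?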
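One commonly tried approach is to map the schemes to their FileSystem classes explicitly and pass the credentials as `spark.hadoop.*` properties, which Spark copies into the Hadoop `Configuration`. The sketch below uses only the JDK and just builds that property map. The class and method names, the storage account, account key, and the Azure AD client ID, secret, and token endpoint are placeholders (assumptions), and the `fs.adl.oauth2.*` names are those of the Hadoop 3.x ADL connector. It does not address the hadoop-common 2.8.0 versus hadoop-azure 3.1.0 mismatch in the pom, which may still need aligning.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds spark.hadoop.* properties for wasb:// and adl://.
// All argument values are placeholders supplied by the caller.
class AzureHadoopProps {

    static final String PREFIX = "spark.hadoop.";

    static Map<String, String> sparkHadoopProps(String storageAccount, String accountKey,
                                                String clientId, String clientSecret,
                                                String tokenEndpoint) {
        Map<String, String> p = new LinkedHashMap<>();
        // wasb:// is served by NativeAzureFileSystem from hadoop-azure;
        // the account key authenticates against <account>.blob.core.windows.net.
        p.put(PREFIX + "fs.wasb.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
        p.put(PREFIX + "fs.azure.account.key." + storageAccount + ".blob.core.windows.net", accountKey);
        // adl:// is served by AdlFileSystem from hadoop-azure-datalake,
        // authenticated here with an Azure AD service principal (client credentials).
        p.put(PREFIX + "fs.adl.impl", "org.apache.hadoop.fs.adl.AdlFileSystem");
        p.put(PREFIX + "fs.adl.oauth2.access.token.provider.type", "ClientCredential");
        p.put(PREFIX + "fs.adl.oauth2.client.id", clientId);
        p.put(PREFIX + "fs.adl.oauth2.credential", clientSecret);
        p.put(PREFIX + "fs.adl.oauth2.refresh.url", tokenEndpoint);
        return p;
    }

    public static void main(String[] args) {
        Map<String, String> props = sparkHadoopProps("myaccount", "<account-key>",
                "<client-id>", "<client-secret>",
                "https://login.microsoftonline.com/<tenant-id>/oauth2/token");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Each entry would then be applied to the question's `SparkConf` with `config.set(key, value)` before the `SparkSession` is created.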
【Discussion】:
Tags: azure apache-spark azure-blob-storage azure-data-lake