[Posted]: 2021-02-21 15:52:11
[Question]:
Using the following code (source: https://docs.microsoft.com/en-us/azure/databricks/kb/python/hdfs-to-read-files):
# Runs in a Databricks notebook, where `sc` is the notebook's SparkContext
URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem

# Register the storage account key in the Hadoop configuration
conf = sc._jsc.hadoopConfiguration()
conf.set(
    "fs.azure.account.key.<account-name>.blob.core.windows.net",
    "<account-access-key>")

# Resolve the file system for the wasbs:// URI and open the file
fs = Path('wasbs://<container-name>@<account-name>.blob.core.windows.net/<file-path>/').getFileSystem(conf)
istream = fs.open(Path('wasbs://<container-name>@<account-name>.blob.core.windows.net/<file-path>/'))

# Wrap the Hadoop input stream in a Java BufferedReader to read text lines
reader = sc._gateway.jvm.java.io.BufferedReader(sc._jvm.java.io.InputStreamReader(istream))
while True:
    thisLine = reader.readLine()  # Java null (Python None) at end of file
    if thisLine is not None:
        print(thisLine)
    else:
        break
istream.close()
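Instead of printing each line, the same loop can collect the lines into an in-memory text buffer that pandas can parse. This is a minimal sketch, assuming the file is delimited text (e.g. CSV) small enough to fit in driver memory; `reader_to_text_buffer` and `reader_to_dataframe` are hypothetical helper names. It relies on `readLine()` returning each line without its terminator and Java null (seen as `None` through Py4J) at end of stream.

```python
import io


def reader_to_text_buffer(reader):
    """Drain a java.io.BufferedReader (accessed through Py4J) into an io.StringIO.

    readLine() drops the line terminator, so the lines are re-joined with "\n".
    The caller remains responsible for closing the underlying stream.
    """
    lines = []
    while True:
        line = reader.readLine()
        if line is None:  # Java null: end of stream
            break
        lines.append(line)
    return io.StringIO("\n".join(lines))


def reader_to_dataframe(reader, **read_csv_kwargs):
    """Parse the reader's contents with pandas.read_csv (assumes CSV-like text)."""
    import pandas as pd  # imported lazily so the buffer helper works without pandas

    return pd.read_csv(reader_to_text_buffer(reader), **read_csv_kwargs)
```

In the question's code this would take the place of the `while` loop, e.g. `df = reader_to_dataframe(reader)` followed by `istream.close()`. Any `pandas.read_csv` option (such as `sep=";"`) can be passed through as a keyword argument.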
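This gives me reader, an object of type java.io.BufferedReader, which I would like to load with pandas, geopandas, or another library instead of reading and printing it line by line as in the example.
Can you help me?
Thanks, Lucas
[Discussion]: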
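Line-by-line reading only suits text files; formats that geopandas typically reads (for example GeoPackage) are binary and need the raw bytes. One option, sketched here under the assumption that the file is a single-file format and fits on the driver's local disk, is to copy it out of Blob storage with Hadoop's `FileSystem.copyToLocalFile` and pass the local path to geopandas or pandas. The helper name `copy_to_local` and the `/tmp` path are hypothetical, and the account key is assumed to be already set in the Hadoop configuration as in the question's code.

```python
def copy_to_local(sc, remote_uri, local_path):
    """Copy a file from a Hadoop-compatible URI (e.g. wasbs://...) to the driver's local disk.

    sc is a live SparkContext; the storage account key is assumed to be set
    already in sc._jsc.hadoopConfiguration().
    """
    Path = sc._jvm.org.apache.hadoop.fs.Path
    src = Path(remote_uri)
    fs = src.getFileSystem(sc._jsc.hadoopConfiguration())
    # The "file://" scheme makes explicit that the destination is the local file system
    fs.copyToLocalFile(src, Path("file://" + local_path))
    return local_path


# Hypothetical usage in a Databricks notebook (not executed here):
#   import geopandas as gpd
#   local = copy_to_local(
#       sc,
#       "wasbs://<container-name>@<account-name>.blob.core.windows.net/<file-path>",
#       "/tmp/data.gpkg",
#   )
#   gdf = gpd.read_file(local)  # or pandas.read_csv(local) for CSV data
```

Copying to disk trades a little I/O for compatibility: any library that accepts a file path can then read the data, without going through a Java reader at all.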