【Posted】: 2020-06-10 01:49:44
【Question】:
I cannot read a snappy-compressed parquet file through pyarrow on Windows.
import dask.dataframe as dd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(15, 4)), columns=list('ABCD'))
dd_df = dd.from_pandas(df, npartitions=1)
dd_df.to_parquet("my_df.snappy.parquet", engine="pyarrow", compression="snappy")
dd_df_copy = dd.read_parquet("my_df.snappy.parquet", engine="pyarrow")
dd_df_copy.compute() #<--- This is where it crashes
I have reproduced this in a clean Anaconda environment with Python 3.8. After creating the environment, I ran pip install "dask[complete]" and pip install pyarrow
The error is:
Problem signature:
Problem Event Name: APPCRASH
Application Name: python.exe
Application Version: 3.8.3150.1013
Application Timestamp: 5ed53446
Fault Module Name: arrow.dll
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 5ebd3029
Exception Code: c000001d
Exception Offset: 00000000007abfc7
OS Version: 6.3.9600.2.0.0.16.7
Locale ID: 1033
Additional Information 1: d8e4
Additional Information 2: d8e42c04b828d96accf490cd13472bea
Additional Information 3: aebe
Additional Information 4: aebe917bfb5c1b58e884baa1f9c3d3d2
A similar crash occurs when I install with conda install -c conda-forge dask pyarrow instead:
Problem signature:
Problem Event Name: APPCRASH
Application Name: python.exe
Application Version: 3.8.3150.1013
Application Timestamp: 5ed53446
Fault Module Name: arrow.dll
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 5ecf56ac
Exception Code: c000001d
Exception Offset: 0000000000521587
OS Version: 6.3.9600.2.0.0.16.7
Locale ID: 1033
Additional Information 1: e863
Additional Information 2: e8638a01b9fb70505b0604ef9b98f3c6
Additional Information 3: 1e47
Additional Information 4: 1e47c852f479606e071f3ea8f80878a1
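Both signatures above report Exception Code c000001d, which is the NTSTATUS value STATUS_ILLEGAL_INSTRUCTION, inside arrow.dll. An illegal-instruction fault often means the binary executed a CPU instruction the processor does not support, though the logs alone do not prove that here. The following is a minimal sketch for reading such a signature programmatically. The helper names (parse_signature, describe_exception, module_build_time) are hypothetical, the NTSTATUS table is a small hand-picked subset, and treating the timestamp fields as Unix epoch seconds is an assumption.

```python
from datetime import datetime, timezone

# Hand-picked subset of NTSTATUS codes; not an exhaustive table.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def parse_signature(text):
    """Turn the 'Key: value' lines of a problem signature into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():  # skip headers such as 'Problem signature:'
            fields[key.strip()] = value.strip()
    return fields


def describe_exception(code_hex):
    """Map a hex exception code to a known NTSTATUS name, if listed above."""
    code = int(code_hex, 16)
    return NTSTATUS_NAMES.get(code, "unknown (0x%08X)" % code)


def module_build_time(timestamp_hex):
    """Assumption: the field holds a link timestamp in Unix epoch seconds."""
    return datetime.fromtimestamp(int(timestamp_hex, 16), tz=timezone.utc)


# Fields copied from the first crash log in the question.
report = """\
Problem signature:
Problem Event Name: APPCRASH
Fault Module Name: arrow.dll
Fault Module Timestamp: 5ebd3029
Exception Code: c000001d
"""

sig = parse_signature(report)
print(sig["Fault Module Name"], describe_exception(sig["Exception Code"]))
print(module_build_time(sig["Fault Module Timestamp"]).isoformat())
```

Under the same assumption, comparing the Fault Module Timestamp values of the two logs (5ebd3029 and 5ecf56ac) shows that the pip and conda-forge installs shipped different builds of arrow.dll, yet both fail with the same exception code.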
【Discussion】:
-
If you are in an Anaconda environment, could you try installing the packages with conda instead of pip? That might solve the problem.
-
I will redo it to make sure. I only tried pip after conda failed and I found Google results suggesting that pyarrow installed through conda was problematic. Stay tuned.
-
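Ran conda install dask pyarrow in a brand-new environment. Got the following crash log: ``` Problem signature: Problem Event Name: APPCRASH Application Name: python.exe Application Version: 3.8.3150.1013 Application Timestamp: 5ed53446 Fault Module Name: arrow.dll Fault Module Version: 0.0.0.0 Fault Module Timestamp: 5ecf56ac Exception Code: c000001d Exception Offset: 0000000000521587 OS Version: 6.3.9600.2.0.0.16.7 Locale ID: 1033 ```
-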
Would you mind trying the conda install with -c conda-forge?
-
Updated the question with the crash log I got from that.
Tags: python dask parquet pyarrow