Fast reading of a less-structured ASCII data file in numpy: answers

【Question title】: fast read less structure ascii data file in numpy
【Posted】: 2014-04-06 05:08:06
【Question】:

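I want to read a data grid (a 3D array of floats) from an .xsf file. (The format documentation is at http://www.xcrysden.org/doc/XSF.html, see the BEGIN_BLOCK_DATAGRID_3D block.)

The problem is that the data is laid out in 5 columns, and if the number of elements Nx*Ny*Nz is not divisible by 5, the last line can have any length. For this reason I cannot use numpy.genfromtxt() or numpy.loadtxt().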
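The failure is easy to reproduce. The sketch below feeds a small ragged five-column block (values made up for illustration, not a real .xsf file) to numpy.loadtxt(), which raises ValueError as soon as a row has a different number of columns than the rows before it:

```python
import io
import numpy as np

# Illustrative ragged input: two full rows of 5 values, then a short last row.
ragged = io.StringIO(
    "0.000 1.000 2.000 5.196 8.000\n"
    "1.000 1.414 2.236 5.292 8.062\n"
    "1.000 1.414\n"
)

try:
    np.loadtxt(ragged)
    loadtxt_failed = False
except ValueError as err:
    # loadtxt expects every row to have the same number of columns.
    loadtxt_failed = True
    print("loadtxt failed:", err)
```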

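I wrote a routine that does solve the problem, but it is very slow (probably because it uses tight loops). The files I want to read are large (>200 MB; 200x200x200 = 8000000 ASCII numbers).

Is there a really fast way to read this unfriendly format into an ndarray in python / numpy?

The xsf datagrid looks like this (example with shape=(3,3,3)):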

BEGIN_BLOCK_DATAGRID_3D
 BEGIN_DATAGRID_3D_this_is_3Dgrid          
 3  3  3         # number of elements Nx Ny Nz                     
 0.0 0.0 0.0     # grid origin in real space                     
 1.0 0.0 0.0     # grid size in real space                    
 0.0 1.0 0.0                               
 0.0 0.0 1.0                          
   0.000  1.000  2.000  5.196  8.000   # data in 5 columns     
   1.000  1.414  2.236  5.292  8.062        
   2.000  2.236  2.828  5.568  8.246        
   3.000  3.162  3.606  6.000  8.544        
   4.000  4.123  4.472  6.557  8.944                   
   1.000  1.414                       # this is the problem
  END_DATAGRID_3D                      
 END_BLOCK_DATAGRID_3D

【Comments】:

What does your existing function output for this data?
I'm not sure I understand your question. The resulting 3D array is just: [[[0.000, 1.000, 2.000], [5.196, 8.000, 1.000], [1.414, 2.236, 5.292]], [[8.062, 2.000, 2.236], [2.828, 5.568, 8.246], [3.000, 3.162, 3.606]], [[6.000, 8.544, 4.000], [4.123, 4.472, 6.557], [8.944, 1.000, 1.414]]]

Tags: python numpy io

【Solution 1】:

I got something working using Pandas and Numpy. Pandas fills in nan values for the missing data.

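import pandas as pd
import numpy as np

# skiprows=7 drops the header lines, skipfooter=2 drops the two END_ lines;
# skipfooter is only supported by the Python parser engine.
df = pd.read_csv("xyz.data", header=None, delimiter=r'\s+', dtype=float,
                 skiprows=7, skipfooter=2, engine='python')
data = df.values.flatten()
# Drop the nan padding that Pandas added to the short last row.
data = data[~np.isnan(data)]
# Integer division: reshape needs integer dimensions.
result = data.reshape((data.size // 3, 3))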

Output

>>> result
array([[ 0.   ,  1.   ,  2.   ],
       [ 5.196,  8.   ,  1.   ],
       [ 1.414,  2.236,  5.292],
       [ 8.062,  2.   ,  2.236],
       [ 2.828,  5.568,  8.246],
       [ 3.   ,  3.162,  3.606],
       [ 6.   ,  8.544,  4.   ],
       [ 4.123,  4.472,  6.557],
       [ 8.944,  1.   ,  1.414]])
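
For comparison, here is a NumPy-only sketch of a different technique (not the Pandas approach above): split the text on whitespace, ignore line boundaries entirely, and reshape using the Nx Ny Nz values from the header. The helper name read_datagrid_3d is hypothetical, the sample is the example from the question, and the C-order reshape is an assumption chosen to match the 3D array given in the question's comments; a real .xsf file may need a different order argument.

```python
import io
import numpy as np

# The example block from the question, including its inline "#" annotations.
SAMPLE = """BEGIN_BLOCK_DATAGRID_3D
 BEGIN_DATAGRID_3D_this_is_3Dgrid
 3  3  3         # number of elements Nx Ny Nz
 0.0 0.0 0.0     # grid origin in real space
 1.0 0.0 0.0     # grid size in real space
 0.0 1.0 0.0
 0.0 0.0 1.0
   0.000  1.000  2.000  5.196  8.000   # data in 5 columns
   1.000  1.414  2.236  5.292  8.062
   2.000  2.236  2.828  5.568  8.246
   3.000  3.162  3.606  6.000  8.544
   4.000  4.123  4.472  6.557  8.944
   1.000  1.414                       # this is the problem
  END_DATAGRID_3D
 END_BLOCK_DATAGRID_3D
"""

def read_datagrid_3d(f):
    """Hypothetical helper: read the first DATAGRID_3D block from a text file object."""
    # Skip ahead to the BEGIN_DATAGRID_3D line.
    for line in f:
        if line.strip().startswith("BEGIN_DATAGRID_3D"):
            break
    # The next line holds Nx Ny Nz (anything after them is ignored).
    shape = tuple(int(tok) for tok in f.readline().split()[:3])
    # Skip the origin and the three spanning vectors.
    for _ in range(4):
        f.readline()
    count = shape[0] * shape[1] * shape[2]
    # Collect whitespace-separated tokens until the grid is complete,
    # so the length of the last data line does not matter.
    tokens = []
    for line in f:
        tokens.extend(line.split("#", 1)[0].split())
        if len(tokens) >= count:
            break
    flat = np.array(tokens[:count], dtype=float)
    # Assumption: C-order reshape, as in the array listed in the comments.
    return flat.reshape(shape)

grid = read_datagrid_3d(io.StringIO(SAMPLE))
print(grid.shape)
```

With a real file, the same helper would be called as read_datagrid_3d(open(path)), where path is whatever .xsf file is being read. The remaining loop runs once per line rather than once per number, and the string-to-float conversion happens in a single np.array call.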

【Comments】: