Importing a large point cloud data file into MATLAB: answers

【Question title】: Importing a large point cloud data file into MATLAB
【Posted】: 2014-01-17 19:41:23
【Question】:

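I am a new MATLAB user with next to no programming experience (I have a mechanical engineering background), so I apologise in advance if this is a simple question!

I am trying to import a large point cloud file (.pts file extension) into MATLAB for processing. I have been led to believe that the file contains a text header and 3 columns of integer data (x, y and z coordinates). I managed to open the first part of the file as a text file, and this is indeed the case.

I cannot import the file directly into MATLAB because it is too large (875 million points) and only 9000000 rows can be imported at a time, so I wrote the script below to import the file in 9000000x3 blocks and save each block as a MATLAB file (or another appropriate format).

Script: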

filename='pointcloud.pts';
fid = fopen(filename,'r');
frewind(fid);
header=fread(fid,8,'*char');
points=fread(fid,1,'*int32');
pointsinpass=9000000;
numofpasses=(points/pointsinpass)
counter = 1;

while counter <= numofpasses;

   clear block;

   block=zeros(pointsinpass,3);


    for p=1:pointsinpass;
      block(p,[1:3])=fread(fid, 1,'float');
    end;

    indx=counter;
    filename=sprintf('block%d',indx);
    save (filename), block;


    disp('Iteration')
    disp(counter)
    disp('complete')
    counter=counter+1;


end;
fclose(fid);

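The script runs fine for 5 iterations of the loop and imports 5 blocks of data. Then, when it tries to import the 6th block, I get the following error: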

Subscripted assignment dimension mismatch.

Error in LiDARread_attempt5 (line 22)
          block(p,[1:3])=fread(fid, 1,'float');

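I am not sure what is causing the error. I believe it has something to do with the size argument of the fread command, because I have tried various values, such as 3, which results in only one block being imported before the dimension mismatch error occurs.

Again, I apologise if I have missed something very basic. My understanding of programming techniques is very limited, as I was only introduced to them a few months ago.

【Comments】:

The data file itself may have an error in the 6th block. Why not use the debugger? (dbstop if error)

Tags: matlab point-clouds pts lidar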

【Answer 1】:

At some point fread() returns [] (an empty array).

I can show how to reproduce the error:

a = zeros(2,2)
a =
     0     0
     0     0
a(2,1:2) = []

Subscripted assignment dimension mismatch.

I suggest using textscan() instead of fread().
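
The same idea, reading the text file a fixed number of rows at a time and stopping cleanly once nothing is left, can be sketched as follows. This is only an illustration written in Python with the standard library, not the textscan() call itself. It assumes a one-line text header and whitespace-separated x, y and z values on each row; the function name `read_point_chunks` and its parameters are hypothetical. In MATLAB, the analogous step would be calling textscan() with a repeat count and checking for an empty result.

```python
from itertools import islice


def read_point_chunks(path, chunk_size, header_lines=1):
    """Yield lists of (x, y, z) tuples with at most chunk_size points each."""
    with open(path, "r") as f:
        # Skip the text header (assumed to be a fixed number of lines).
        for _ in range(header_lines):
            next(f, None)
        while True:
            lines = list(islice(f, chunk_size))
            if not lines:
                # End of file: stop here rather than trying to store an
                # empty read into a fixed-size block.
                break
            chunk = []
            for line in lines:
                fields = line.split()
                if len(fields) < 3:
                    continue  # skip blank or malformed rows
                chunk.append((float(fields[0]), float(fields[1]), float(fields[2])))
            if chunk:
                yield chunk
```

Because the loop ends when a read comes back empty, the last chunk is simply shorter than the others, and the number of passes does not have to be computed from a point count in the header.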

【Comments】:

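【Answer 2】:

MATLAB is a great tool, but I have found it difficult for big-data problems. Although it means a learning curve, may I suggest you look into Python? I switched from MATLAB to Python years ago and have not looked back much along the way.

Spyder (http://code.google.com/p/spyderlib/) is a powerful IDE that should provide a nice bridge for MATLAB users. Pythonxy for Windows (http://code.google.com/p/pythonxy/) will give you all the tools you need to work productively on that platform, but last I checked it only supported a 32-bit address space. If you need 64-bit support on Windows, cgohlke (https://stackoverflow.com/users/453463/cgohlke) provides great packages at http://www.lfd.uci.edu/~gohlke/pythonlibs/. On Linux, of course, all the necessary packages can be installed easily. You will want to use Python 2.7 in all cases for full compatibility with the required packages.

I don't know all the details of your problem, but the numpy memmap data structure might help. It lets you operate on huge arrays from disk without loading the whole array into main memory, and it handles the internals for you.

Basically, all you do is: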

## memmap example
import numpy as np

# Use mode 'w+' first to create the file; subsequent reads
# (and modifications) can use 'r+'.
fpr = np.memmap('MemmapOutput', dtype='float32', mode='w+', shape=(3000000, 4))
fpr[:] = np.random.rand(3000000, 4)  # slice-assign so the data goes into the mapped file, not a new array
del fpr  # flushes changes to disk and releases the map
fpr = np.memmap('MemmapOutput', dtype='float32', mode='r+', shape=(3000000, 4))
fpr[:] = np.random.rand(3000000, 4)  # overwrite the values; in general you might not need to modify the array, but it can be done
columnSums = fpr.sum(axis=0)  # per-column sums; numpy functions work directly on the memmap
del fpr  # best to release the map again when done processing
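
To memory-map the point cloud from the question, the text file would first have to be converted into a flat binary file. A minimal sketch of that step, using only the Python standard library and assuming a one-line header followed by whitespace-separated x, y and z values on each row, might look like this. The names `text_to_binary` and `read_point` are hypothetical, and the stdlib `mmap` module stands in for numpy's memmap so the example has no dependencies.

```python
import mmap
import struct
from array import array

# One point = three native-order 32-bit floats (12 bytes).
POINT = struct.Struct("3f")


def text_to_binary(text_path, bin_path, header_lines=1, chunk_size=100_000):
    """Convert 'x y z' text rows to a flat float32 file; return the point count."""
    count = 0
    with open(text_path, "r") as src, open(bin_path, "wb") as dst:
        for _ in range(header_lines):
            next(src, None)  # skip the assumed text header
        buf = array("f")
        for line in src:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or malformed rows
            buf.extend(float(v) for v in fields[:3])
            count += 1
            if len(buf) >= 3 * chunk_size:
                buf.tofile(dst)  # write a full chunk, then start a new buffer
                buf = array("f")
        if buf:
            buf.tofile(dst)  # write whatever is left over
    return count


def read_point(bin_path, index):
    """Return point number `index` as an (x, y, z) tuple via a memory map."""
    with open(bin_path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return POINT.unpack_from(mm, index * POINT.size)
```

Only one chunk of text is held in memory during the conversion. The resulting file should also be readable with np.memmap using dtype='float32' and shape=(count, 3), where count is the value returned by `text_to_binary`.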

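Please don't take this the wrong way. I am not trying to persuade you to abandon MATLAB, only to consider adding another tool to your toolset.

【Comments】: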