How to save/load tensors to speed up training?

【Question title】: How to save/load tensors to speed up training?
【Posted】: 2019-02-24 14:44:39
【Question description】:

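I'd like to know whether it is possible to save and load tensors in tensorflow.js, so that I don't have to recompute them for every batch. The problem is that my GPU is barely used, because it has to wait for the CPU to convert my arrays into tensors before training.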

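My workflow currently looks like this:

1. Load the data set (read from the hard disk into arrays) (1-2 seconds)
2. CPU converts the arrays into tensors (takes a long time)
3. GPU training (1 second or less)
4. Unload/clean up (5 seconds, also a bit too long)
5. Repeat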
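The steps above run strictly one after another, so the GPU idles while the CPU prepares the next batch. One way to overlap them in a single Node.js process is to start preparing batch i + 1 before awaiting training on batch i. This is a minimal sketch, assuming hypothetical `prepareBatch(i)` (steps 1-2) and `trainOn(batch, i)` (step 3, where cleanup such as `dispose()` would also go) functions. It only overlaps work that is genuinely asynchronous, such as reading files with `fs.promises`; synchronous CPU work like `tf.tensor` on a large nested array still blocks Node's single JavaScript thread.

```javascript
// Sketch: overlap preparation of batch i + 1 with training on batch i.
// prepareBatch(i) and trainOn(batch, i) are hypothetical placeholders for
// steps 1-2 (load + convert) and step 3 (training) of the workflow.
async function runPipeline(numBatches, prepareBatch, trainOn) {
  if (numBatches <= 0) return;
  let pending = prepareBatch(0); // start preparing the first batch
  for (let i = 0; i < numBatches; i++) {
    const batch = await pending; // wait until batch i is ready
    // Start batch i + 1 now, so its asynchronous parts (e.g. disk reads)
    // can proceed while batch i is being trained on.
    pending = i + 1 < numBatches ? prepareBatch(i + 1) : null;
    await trainOn(batch, i);
  }
}

// Usage with dummy stand-ins: a timer simulates reading and training.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const trained = [];
const done = runPipeline(
  3,
  async (i) => { await sleep(5); return { index: i }; },         // fake load/convert
  async (batch) => { await sleep(5); trained.push(batch.index); } // fake model.fit
);
```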

Edit: here is the relevant code, with comments marking the slow parts (long, heavy computation) and the lines that are not a problem:

async function learn_on(ep){

    for (var learn_ep = ep+1; learn_ep <= 1200; learn_ep++) {
        var batch_start = 0;

        var mini_batch_in = [];
        var mini_batch_out = [];

        var shuffle_arr=[];
        for(var i=0;i<in_tensor_sum.length;i++){
            shuffle_arr.push(i); // needs no time
        }

        shuffle_arr=F_shuffle_array(shuffle_arr); // needs no time

        // in_tensor_sum / out_tensor_sum is just a 2-dimensional array: [data set index][data points]
        for (var batch_num = batch_start; batch_num < in_tensor_sum.length; batch_num++) {

            mini_batch_in.push(in_tensor_sum[shuffle_arr[batch_num]]); // very fast also
            mini_batch_out.push(out_tensor_sum[shuffle_arr[batch_num]]);// very fast also

            if (batch_num + 1 == batch_start + 250 || batch_num == in_tensor_sum.length - 1) {
                // possible to import/export xs/ys?
                var xs = tf.tensor(mini_batch_in); // CPU-heavy computation here, TAKES A LONG TIME (9600 input units)
                var ys = tf.tensor(mini_batch_out); // CPU-heavy here too, but not as slow, because the output is small (just 400 units)

                // GPU ACCELERATION starts here: super fast, only one second! This rocks!
                await model.fit(xs, ys, {
                    epochs: 1, shuffle: true,
                    callbacks: {
                        onEpochEnd: async (epoch, log) => {
                            console.log(`${batch_num}:|Epoch ${learn_ep}: | set: ${batch_num / in_tensor_sum.length} | loss = ${log.loss}`);                          
                        },
                        onTrainEnd: async () => {

                        }
                    }
                });
                //avoid memory leaks START (ALSO TAKES a little time!!!!)
                await tf.tidy(() => {
                    tf.tensor([xs, ys]);
                    console.log('numTensors (inside tidy): ' + tf.memory().numTensors);
                });

                console.log('numTensors (outside tidy): ' + tf.memory().numTensors);
                xs.dispose();
                ys.dispose();
                console.log('numTensors (after dispose): ' + tf.memory().numTensors);

                batch_start = batch_num + 1;
                mini_batch_in = [];
                mini_batch_out = [];
                //avoid memory leaks END

            }


        }

    }
}
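Part of the cost of `tf.tensor(mini_batch_in)` on a nested JavaScript array likely comes from walking the nested arrays to infer the shape and flatten the values. A common alternative is to copy the rows into one flat `Float32Array` and pass it with an explicit shape, e.g. `tf.tensor2d(values, [rows, cols])`. The helper below is a minimal sketch of that packing step in plain JavaScript; `packRows` is a hypothetical name, it assumes every row has the same length, and the actual speedup has not been measured here.

```javascript
// Sketch: pack selected rows of a 2-D array (one data set per row) into a flat
// Float32Array plus an explicit shape. packRows is a hypothetical helper name.
function packRows(rows, indices) {
  if (indices.length === 0) throw new Error("indices must not be empty");
  const cols = rows[indices[0]].length; // assumes all rows share this length
  const values = new Float32Array(indices.length * cols);
  for (let r = 0; r < indices.length; r++) {
    const row = rows[indices[r]];
    if (row.length !== cols) {
      throw new Error(`row ${indices[r]} has length ${row.length}, expected ${cols}`);
    }
    values.set(row, r * cols); // copy the row into its slot in the flat buffer
  }
  return { values, shape: [indices.length, cols] };
}

// Usage: rows 2 and 0 of a tiny 3x2 data set, in that order.
const packed = packRows([[1, 2], [3, 4], [5, 6]], [2, 0]);
// packed.values holds 5, 6, 1, 2 and packed.shape is [2, 2].
// With tfjs-node loaded, this could then become:
//   const xs = tf.tensor2d(packed.values, packed.shape);
```

In the loop above, the mini-batch could be packed with something like `packRows(in_tensor_sum, shuffle_arr.slice(batch_start, batch_num + 1))` instead of pushing rows into `mini_batch_in`.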

Edit 2:

I have now tried to use 'tfjs-npy' to save and load tensors, but I get an error:

.
.
.
var xs = tf.tensor(mini_batch_in);
var ys = tf.tensor(mini_batch_out);

var fs = require('fs');
var tf_parser = require('tfjs-npy');


var writeTO = await tf_parser.serialize(ys);
fs.writeFileSync('/home/test/NetBeansProjects/ispeed_tensload/save_tensors/test.js', Buffer.from(writeTO)); // Buffer.from replaces the deprecated new Buffer()

var tensor_data = fs.readFileSync("/home/test/NetBeansProjects/ispeed_tensload/save_tensors/test.js");
var my_arrayBuffer = new Uint8Array(tensor_data).buffer;
var ys2 = await tf_parser.parse(my_arrayBuffer);


await model.fit(xs, ys2, {....

Error:

(node:26576) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'values' of undefined
    at NodeJSKernelBackend.getInputTensorIds (/home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-node/dist/nodejs_kernel_backend.js:142:26)
    at NodeJSKernelBackend.executeSingleOutput (/home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-node/dist/nodejs_kernel_backend.js:186:73)
    at NodeJSKernelBackend.gather (/home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-node/dist/nodejs_kernel_backend.js:965:21)
    at environment_1.ENV.engine.runKernel.$x (/home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-core/dist/ops/segment_ops.js:56:84)
    at /home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-core/dist/engine.js:129:26
    at Engine.scopedRun (/home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-core/dist/engine.js:101:23)
    at Engine.runKernel (/home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-core/dist/engine.js:127:14)
    at gather_ (/home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-core/dist/ops/segment_ops.js:56:38)
    at Object.gather (/home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-core/dist/ops/operation.js:23:29)
    at /home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-layers/dist/backend/tfjs_backend.js:275:20

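My guess is that the format produced by 'tfjs-npy' doesn't match, but I don't know. Another acceptable solution would be to run the tensor creation on multiple threads (an optimization in the C++ backend) while the GPU is training, to reduce the idle time to a minimum. But I don't know whether that is possible. Right now, tensor creation runs single-threaded in the Node.js process, and the performance is poor.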
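Independently of `tfjs-npy`, prepared batches could be stored as raw float32 bytes with a small shape header and read back with `fs`, which avoids converting nested arrays again on later runs. This is a minimal sketch; the file layout (a uint32 rank, then uint32 dimensions, then float32 data) and the names `saveFloat32` / `loadFloat32` are assumptions made for this example, not a tfjs format. To get the values out of an existing tensor, `await tensor.data()` returns a typed array that can be passed to `saveFloat32`.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical raw layout: [rank:uint32][dims:uint32 x rank][values:float32...]
// The header is little-endian; the values use the machine's byte order
// (little-endian on common x86/ARM systems), so files are not portable across endianness.
function saveFloat32(file, values, shape) {
  const count = shape.reduce((a, b) => a * b, 1);
  if (count !== values.length) throw new Error("shape does not match number of values");
  const header = Buffer.alloc(4 * (1 + shape.length));
  header.writeUInt32LE(shape.length, 0);
  shape.forEach((d, i) => header.writeUInt32LE(d, 4 * (i + 1)));
  const body = Buffer.from(values.buffer, values.byteOffset, values.byteLength);
  fs.writeFileSync(file, Buffer.concat([header, body]));
}

function loadFloat32(file) {
  const buf = fs.readFileSync(file);
  const rank = buf.readUInt32LE(0);
  const shape = [];
  for (let i = 0; i < rank; i++) shape.push(buf.readUInt32LE(4 * (i + 1)));
  const start = buf.byteOffset + 4 * (1 + rank);
  // slice() copies into a fresh ArrayBuffer, so the Float32Array view is aligned
  // even if the Buffer returned by readFileSync starts at an odd offset.
  const values = new Float32Array(buf.buffer.slice(start, buf.byteOffset + buf.length));
  return { values, shape };
}

// Usage: round-trip a 2x3 batch through a temporary file.
const file = path.join(os.tmpdir(), "batch_0.f32");
saveFloat32(file, new Float32Array([1, 2, 3, 4, 5, 6]), [2, 3]);
const loaded = loadFloat32(file);
// With tfjs-node: const t = tf.tensor(loaded.values, loaded.shape);
```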

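【Question discussion】:

Can you show your code?
Does this help?
In the steps you listed above, why do you repeat step 1? Why can't you read from the HDD just once?
Because it doesn't fit in RAM (Node.js doesn't allow arrays that are too large), I have to read the full data set step by step. But I would rather read fully prepared tensors. I think computed tensors need about 4-5 times the size of a normal array, but reading is faster than computing.
It would also help to use multiple threads for tensor creation, even while the GPU is training, to minimize the time the GPU sits idle.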

Tags: node.js performance pipeline tensorflow.js

【Solution 1】:

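The memory available to Node.js can be increased with the --max-old-space-size flag, as shown here. Neither Node.js nor tensorflow.js has a problem with that; the only limit may be the amount of memory you have. That might be the only reason to read the data back and forth.

That said, it is unclear what this is doing: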
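To check whether a flag like `--max-old-space-size` (value in MB, e.g. `node --max-old-space-size=8192 train.js`) actually took effect, the V8 heap limit can be read from Node's built-in `v8` module. A small sketch; `heapLimitMiB` is a hypothetical helper name. As far as I know, tensor data held by the tfjs-node native backend is allocated outside the V8 heap, so it may not count against this limit; the flag mainly matters for large plain JavaScript arrays.

```javascript
const v8 = require("v8");

// Report the current V8 heap limit in MiB. When Node is started with
// --max-old-space-size=8192, this value should reflect the larger limit.
function heapLimitMiB() {
  return v8.getHeapStatistics().heap_size_limit / (1024 * 1024);
}

console.log(`V8 heap limit: ${heapLimitMiB().toFixed(0)} MiB`);
```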

await tf.tidy(() => {
    tf.tensor([xs, ys]);
    console.log('numTensors (inside tidy): ' + tf.memory().numTensors);
});

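It is useless because:

- A tensor is created inside tidy and disposed of again right away.
- xs and ys are not array-like, so tf.tensor([xs, ys]) will create a tensor of 2 NaN values. It has no effect on the performance of the code.
- The tensors xs and ys are already disposed of effectively by xs.dispose() and ys.dispose().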

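【Comments】:

Without tf.tidy my 16 GB of RAM is full after a few loops, because without it tf.memory().numTensors keeps growing. The tensor creation process is really slow, so I can't use the GPU's full potential (it's just a GTX 1060 with 6 GB VRAM, not even a Titan ;) ). Using multiple threads in the tfjs-node backend would shorten the GPU idle time. But the (non-GPU / C++) backend doesn't run when I use the GPU backend. Only internal JavaScript/Node.js (just 1 thread!!!) is used to create tensors.
@user3776738, tf.tensor([xs, ys]); is created but never used. You can consider removing all the code inside tf.tidy. Loading data into memory and onto the GPU is not free: the more data, the longer it takes. But once it is loaded, you can use it however you like with whichever backend you are using.
@user3776738 How big are your tensors, and how long does loading take?
It turns out I was just super stupid: all I had to do was create the tensors one after another into an array and then train on that. I had some memory problems in the past, but maybe that was only because I was reading and concatenating everything, which killed my RAM. That's why I didn't consider doing this, since I thought the tensors were 8 or more times larger than the input arrays.
Happens to the best of us. Happy coding :)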
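The approach from the last comments (build every batch once, keep the results in an array, and reuse them in every epoch) can be sketched without tfjs as below. `buildBatch` stands in for the expensive conversion (in the real code, something like `tf.tensor2d(...)`) and `trainStep` for `model.fit`; both are hypothetical placeholders. The conversion cost is paid once instead of once per epoch, but this only works if all converted batches fit in memory at the same time, and, unlike the original loop, the samples grouped into each batch stay fixed; only the batch order is shuffled per epoch.

```javascript
// Fisher-Yates shuffle, in place.
function shuffleInPlace(arr) {
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
}

// Convert every batch once, then reuse the cached results for all epochs.
// buildBatch and trainStep are hypothetical stand-ins for tensor creation and model.fit.
async function trainCached(numBatches, epochs, buildBatch, trainStep) {
  const cache = [];
  for (let b = 0; b < numBatches; b++) cache.push(buildBatch(b)); // expensive, done once
  for (let epoch = 0; epoch < epochs; epoch++) {
    const order = shuffleInPlace([...cache.keys()]); // shuffle batch order only
    for (const b of order) await trainStep(cache[b], epoch);
  }
  return cache; // with real tensors, call dispose() on each entry when finished
}

// Usage: count how often each stage runs for 4 batches and 3 epochs.
let builds = 0;
let steps = 0;
const finished = trainCached(
  4,
  3,
  (b) => { builds++; return { id: b }; },
  async () => { steps++; }
);
```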