【Question title】: NodeJS - Streaming directly to Azure Data Lake
【Posted】: 2021-05-28 21:22:05
【Question】:

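I have the example below, in which I download a zip file using streaming. It works well.

But I have a challenge: I need to download this file and send it directly to Azure without saving it locally. Is that possible?

Here is the code: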

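// got v11 is CommonJS; got v12 and later are ESM-only and must be loaded with import
const got = require("got");
const { createWriteStream } = require("fs");
const stream = require("stream");
const { promisify } = require("util");
const pipeline = promisify(stream.pipeline);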

const url = "http://....../file.zip";
const fileName = "filedownloaded.zip";

const downloadStream = got.stream(url);
const fileWriterStream = createWriteStream(fileName);

downloadStream.on("downloadProgress", ({ transferred, total, percent }) => {
  const percentage = Math.round(percent * 100);
  console.error(`progress: ${transferred}/${total} (${percentage}%)`);
});

(async () => {
  try {
    await pipeline(downloadStream, fileWriterStream);
    console.log(`File downloaded to ${fileName}`);
  } catch (error) {
    console.error(`Something went wrong. ${error.message}`);
  }
})();

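Should I use a buffer for this? In other words, how can I send this file there? Has anyone done something like this?

Here is the code that creates a file system (container), a directory, and a file in Azure Data Lake: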
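A buffer is not strictly required: pipeline() accepts any Writable, so the fs write stream in the download code could in principle be swapped for a stream that forwards each chunk to Data Lake. The sketch below assumes a DataLakeFileClient (from @azure/storage-file-datalake, as used in the question's second snippet) on which create() has already been awaited; createDataLakeWritable is a hypothetical helper name, not an SDK function. It relies only on the append(data, offset, length) and flush(position) calls the question's code already uses, and tracks the byte offset so that every append starts exactly where the previous one ended.

```javascript
const { Writable } = require("stream");

// Hypothetical helper (not part of the Azure SDK): returns a Writable that
// forwards each chunk to a Data Lake file client with append(), then commits
// everything with flush() once the source stream has ended.
// Assumes `fileClient` is a DataLakeFileClient whose create() was awaited.
function createDataLakeWritable(fileClient) {
  let position = 0; // bytes appended so far; the next append starts here

  return new Writable({
    write(chunk, encoding, callback) {
      // Non-objectMode Writables receive Buffers, so chunk.length is in bytes.
      if (chunk.length === 0) return callback();
      fileClient
        .append(chunk, position, chunk.length)
        .then(() => {
          position += chunk.length; // keep the appends contiguous
          callback();
        }, callback);
    },
    final(callback) {
      // flush() takes the total length of the file after all appends.
      fileClient.flush(position).then(() => callback(), callback);
    },
  });
}
```

Under the same assumptions, the download code would call `await fileClient.create()` and then `pipeline(downloadStream, createDataLakeWritable(fileClient))` instead of piping into fileWriterStream, so nothing is written to local disk. Each chunk becomes its own append request, which can mean many small requests for a large file; recent versions of @azure/storage-file-datalake also offer `DataLakeFileClient.uploadStream(stream)` in Node.js, which handles buffering and offsets itself, so it is worth checking whether your installed version provides it.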

const { DataLakeServiceClient, StorageSharedKeyCredential } = require("@azure/storage-file-datalake");

// Load the .env file if it exists
require("dotenv").config();

const sharedKeyCredential = 
     new StorageSharedKeyCredential(process.env.ACCOUNT_NAME, process.env.ACCOUNT_KEY);
const datalakeServiceClient = new DataLakeServiceClient(
      `https://${process.env.ACCOUNT_NAME}.dfs.core.windows.net`, sharedKeyCredential);

async function CreateFileSystem(fileSystemName) {
  const fileSystemClient = datalakeServiceClient.getFileSystemClient(fileSystemName);
  const createResponse = await fileSystemClient.create(); 
  return {response: createResponse, container: fileSystemClient} 
}

async function CreateDirectory(fileSystemClient, directoryName) {
  const directoryClient = fileSystemClient.getDirectoryClient(directoryName);
  const result = await directoryClient.create();
  return result
}

async function DeleteDirectory(fileSystemClient, directoryName) {
  const directoryClient = fileSystemClient.getDirectoryClient(directoryName); 
  const result = await directoryClient.delete();
  return result
}

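async function UploadFile(fileSystemClient, from, fileName) {
  // Await the file contents. With the callback form of fs.readFile, append()
  // ran before the callback had filled in `content`, so "" was uploaded.
  const data = await require("fs").promises.readFile(from);
  const fileClient = fileSystemClient.getFileClient(fileName);
  await fileClient.create();
  // append() and flush() take byte offsets and lengths, so use the Buffer's
  // length rather than the length of a decoded string.
  await fileClient.append(data, 0, data.length);
  await fileClient.flush(data.length);
}

const main = async () => {
  const fileSystem = await CreateFileSystem("filesystemexample2");
  await CreateDirectory(fileSystem.container, "directoryexample2");
  await UploadFile(fileSystem.container, "mytestfile.txt", "directoryexample2/uploaded-file.txt");
};

console.log("Starting ...");
// Report failures instead of leaving an unhandled promise rejection
main().catch((error) => console.error(error));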
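A likely cause of the 0-byte file and the "not contiguous" error reported in the discussion: calling fs.readFile with a callback returns immediately, so a variable assigned inside that callback is still "" when the following `await fileClient.append(...)` runs. The sketch below uses only Node core modules, and its function names are hypothetical; it contrasts that pattern with awaiting fs.promises.readFile, which resolves to a Buffer whose .length is a byte count.

```javascript
const fs = require("fs");

// Mirrors the callback pattern: the callback runs later, after this function
// has already returned, so the caller sees the initial "".
function readWithCallback(path) {
  let content = "";
  fs.readFile(path, (err, data) => {
    if (err) throw err;
    content = data.toString();
  });
  return content; // still "" at this point
}

// Awaiting the promise-based API yields the contents as a Buffer, whose
// .length is the number of bytes that append()/flush() expect.
async function readWithAwait(path) {
  return fs.promises.readFile(path);
}
```

For a file containing "héllo", readWithCallback returns "", while readWithAwait resolves to a 6-byte Buffer even though the decoded string has only 5 characters.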

【Question comments】:

  • Hi, I tried using this code in UploadData, but I get an error. const fileClient = fileSystemClient.getFileClient("directoryexample2/uploaded-file.txt"); await fileClient.create(); await fileClient.append(content, 0, content.length); await fileClient.flush(content.length); Error: (node:86818) UnhandledPromiseRejectionWarning: RestError: The uploaded data is not contiguous or the position query parameter value is not equal to the length of the file after appending the uploaded data.

Tags: javascript node.js azure stream azure-data-lake


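【Answer 1】:

It looks like you are only reading the string representation of the file buffer and passing it to your UploadFile function, so you could try reading the remote file as text:

const content = await got(url).text();

Then call your upload logic directly: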

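const fileClient = fileSystemClient.getFileClient("directoryexample2/uploaded-file.txt");
await fileClient.create();
// append() and flush() expect byte counts; content.length counts UTF-16 code units
const length = Buffer.byteLength(content);
await fileClient.append(content, 0, length);
await fileClient.flush(length);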
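The length arguments of append() and flush() are byte counts, whereas .length on a JavaScript string counts UTF-16 code units. For a binary file such as a zip there is a second problem: .text() decodes the body as UTF-8 (got's default encoding), replacing invalid byte sequences, so the bytes uploaded are not the bytes downloaded. A minimal illustration using only Node's Buffer; the byte values are arbitrary sample data, not a real zip file:

```javascript
// Zip-style signature bytes ("PK\x03\x04") followed by two bytes that are not
// valid UTF-8 on their own. Arbitrary sample data for illustration.
const original = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0xfe]);

// Roughly what got(url).text() does with the response body: decode as UTF-8.
const text = original.toString("utf8");

console.log(original.length);          // bytes actually downloaded
console.log(text.length);              // UTF-16 code units in the decoded string
console.log(Buffer.byteLength(text));  // bytes sent if the string is uploaded
console.log(Buffer.from(text, "utf8").equals(original)); // did the data survive?
```

For binary content it is safer to keep the raw bytes: got(url).buffer() resolves to a Buffer, and passing that Buffer with its .length to append() and flush() keeps both the offsets and the file contents intact.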

【Answer comments】:

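  • I tried this, but it didn't work. I get the following error: (node:85967) UnhandledPromiseRejectionWarning: RestError: The uploaded data is not contiguous or the position query parameter value is not equal to the length of the file after appending the uploaded data. The file is created, but it is 0 bytes.
  • I'm trying to figure out what is going on; I'll let you know if I find anything.
  • @Danilo If possible, could you update the original post with what you tried based on my answer?
  • I did, by posting a new comment (I don't think I can edit the original post after posting it).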