【Question title】: How to write a pickle file to S3, as a result of a luigi Task?
【Posted】: 2018-06-28 11:37:51
【Question】:

I want to store a pickle file on S3 as the result of a luigi task. Below is the class that defines the task:

class CreateItemVocabulariesTask(luigi.Task):
    def __init__(self):
        self.client = S3Client(AwsConfig().aws_access_key_id,
                               AwsConfig().aws_secret_access_key)
        super().__init__()

    def requires(self):
        return [GetItem2VecDataTask()]

    def run(self):
        filename = 'item2vec_results.tsv'
        data = self.client.get('s3://{}/item2vec_results.tsv'.format(AwsConfig().item2vec_path),
                               filename)
        df = pd.read_csv(filename, sep='\t', encoding='latin1')
        unique_users = df['CustomerId'].unique()
        unique_items = df['ProductNumber'].unique()
        item_to_int, int_to_item = utils.create_lookup_tables(unique_items)
        user_to_int, int_to_user = utils.create_lookup_tables(unique_users)

        with self.output()[0].open('wb') as out_file:
            pickle.dump(item_to_int, out_file)
        with self.output()[1].open('wb') as out_file:
            pickle.dump(int_to_item, out_file)
        with self.output()[2].open('wb') as out_file:
            pickle.dump(user_to_int, out_file)
        with self.output()[3].open('wb') as out_file:
            pickle.dump(int_to_user, out_file)

    def output(self):
        files = [S3Target('s3://{}/item2int.pkl'.format(AwsConfig().item2vec_path), client=self.client),
                 S3Target('s3://{}/int2item.pkl'.format(AwsConfig().item2vec_path), client=self.client),
                 S3Target('s3://{}/user2int.pkl'.format(AwsConfig().item2vec_path), client=self.client),
                 S3Target('s3://{}/int2user.pkl'.format(AwsConfig().item2vec_path), client=self.client),]
        return files

When I run this task, I get the error ValueError: Unsupported open mode 'wb'. The items I am trying to dump into the pickle files are just Python dictionaries.

Full traceback:

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\luigi\worker.py", line 203, in run
    new_deps = self._run_get_new_deps()
  File "C:\Anaconda3\lib\site-packages\luigi\worker.py", line 140, in _run_get_new_deps
    task_gen = self.task.run()
  File "C:\Users\user\Documents\python workspace\pipeline.py", line 60, in run
    with self.output()[0].open('wb') as out_file:
  File "C:\Anaconda3\lib\site-packages\luigi\contrib\s3.py", line 714, in open
    raise ValueError("Unsupported open mode '%s'" % mode)
ValueError: Unsupported open mode 'wb'

【Comments】:

    Tags: amazon-s3 luigi


    【Solution 1】:

    This problem occurs only on Python 3.x. To use Python 3 and write a binary file or target (i.e. using 'wb' mode), simply set the format parameter of the S3Target to Nop, like this:

    S3Target('s3://path/to/file', client=self.client, format=luigi.format.Nop)

    Note that this is only a trick: it is not very intuitive, and it is not documented.
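As a rough illustration of the idea, here is a minimal sketch that runs without luigi or AWS credentials. `InMemoryTarget`, `dump_pickle` and `load_pickle` are hypothetical names invented for this example; `InMemoryTarget` only stands in for an `S3Target` created with `format=luigi.format.Nop`. The traceback in the question shows `S3Target.open` raising on the mode string `'wb'` itself, so the sketch assumes the mode passed to `open()` stays `'w'` and that the Nop format is what makes the stream accept bytes. That assumption should be checked against the installed luigi version.

```python
import contextlib
import io
import pickle


class InMemoryTarget:
    """Hypothetical stand-in for a target built with format=luigi.format.Nop.

    open('w') yields a raw binary stream with no text encoding on top.
    """

    def __init__(self):
        self.data = b""

    @contextlib.contextmanager
    def open(self, mode="r"):
        # Mirror the check seen in the traceback: only 'r' and 'w' are accepted.
        if mode not in ("r", "w"):
            raise ValueError("Unsupported open mode '%s'" % mode)
        buffer = io.BytesIO(self.data) if mode == "r" else io.BytesIO()
        yield buffer
        if mode == "w":
            # Keep what was written once the with-block finishes cleanly.
            self.data = buffer.getvalue()


def dump_pickle(obj, target):
    # Mode is 'w', not 'wb': the binary behaviour is assumed to come
    # from the target's format rather than from the mode string.
    with target.open("w") as out_file:
        pickle.dump(obj, out_file)


def load_pickle(target):
    with target.open("r") as in_file:
        return pickle.load(in_file)


# Made-up lookup table, similar in shape to item_to_int in the question.
item_to_int = {"A-100": 0, "B-200": 1}

target = InMemoryTarget()
dump_pickle(item_to_int, target)
restored = load_pickle(target)
print(restored == item_to_int)

# A text stream rejects pickle's bytes on Python 3, which is why a
# pass-through (binary) format is needed in the first place.
try:
    pickle.dump(item_to_int, io.StringIO())
except TypeError as exc:
    print("text stream rejected pickle output:", exc)
```

In the task from the question, the equivalent change would be to build each output as `S3Target(..., client=self.client, format=luigi.format.Nop)` and to call `self.output()[0].open('w')` before `pickle.dump`.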
