【Posted】: 2016-05-06 18:21:39
【Question】:
I am changing an HDFS directory structure. It currently looks like this:
.../customers/customers1/2016-05-16-10/lots_of_files1.csv
.../customers/customers2/2016-05-16-10/lots_of_files2.csv
.../customers/customers3/2016-05-16-10/lots_of_files1.csv
.../customers/customers4/2016-05-16-10/...
.../customers/customers5/2016-05-16-10/...
.../customers/customers6/2016-05-16-10/...
.../customers/customers7/2016-05-16-10/...
I want to get rid of the customers(1-7) level, so that the files end up like this:
.../customers/2016-05-16-10/lots_of_files1.csv
.../customers/2016-05-16-10/lots_of_files2.csv
.../customers/2016-05-16-10/lots_of_files1(1).csv
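I would like to use the snakebite Python HDFS library, but a number of edge cases come up:
1. The same date directory may appear more than once (under several customers).
2. The same CSV file name may appear more than once with different data, and every such file must still be moved.
What is the cleanest way to implement this?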
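To make the two edge cases concrete, here is a minimal sketch of the renaming logic only, with no HDFS calls. It assumes the layout is exactly `<root>/<customer>/<date>/<file>`. The function name `plan_moves`, its `existing` parameter and the `/data` prefix are made up for illustration (the real prefix is elided above). Files from every customer that share a date land in one date directory, and a clashing name gets a `(n)` suffix, as in `lots_of_files1(1).csv`.

```python
import posixpath


def plan_moves(source_paths, root, existing=()):
    """Return (src, dst) pairs that flatten <root>/<customer>/<date>/<file>
    into <root>/<date>/<file>.

    `existing` is an optional iterable of destination paths that are already
    taken. This is a hypothetical helper, not part of snakebite.
    """
    taken = set(existing)
    moves = []
    # Sort so the result is deterministic regardless of listing order.
    for src in sorted(source_paths):
        date_dir = posixpath.basename(posixpath.dirname(src))
        name = posixpath.basename(src)
        stem, ext = posixpath.splitext(name)
        dst = posixpath.join(root, date_dir, name)
        n = 0
        # Edge case 2: same file name, different data -> add a (n) suffix.
        while dst in taken:
            n += 1
            dst = posixpath.join(root, date_dir, "{}({}){}".format(stem, n, ext))
        taken.add(dst)
        moves.append((src, dst))
    return moves


if __name__ == "__main__":
    # "/data" is an assumed prefix used only for this example.
    sources = [
        "/data/customers/customers1/2016-05-16-10/lots_of_files1.csv",
        "/data/customers/customers2/2016-05-16-10/lots_of_files2.csv",
        "/data/customers/customers3/2016-05-16-10/lots_of_files1.csv",
    ]
    for src, dst in plan_moves(sources, "/data/customers"):
        print(src, "->", dst)
```

Each pair could then be handed to an HDFS client. With snakebite that would presumably mean creating the date directories with `Client.mkdir` and moving each file with `Client.rename`; both return generators, so the results have to be iterated for anything to happen. I have not tried this against a live cluster.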
【Discussion】:
Tags: python hadoop hdfs snakebite