[Posted]: 2019-02-03 19:25:48
[Question]:
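I want to split the values given as a range in a csv column, adding a new row for each number in the range while keeping the data in all other columns matched.
It is important that I can keep the data in the other columns (e.g. Job ID) for every number in an (x-y) range, so the resulting csv will obviously be much longer than the original.
I want the output csv to have a separate row for each number in ranges such as 26-29, 66-67, etc. So I want an output csv file in which, for example:
Job ID 21879 appears 4 times, once each for 26, 27, 28 and 29.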
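I want to do this before the following steps of my script run, but I am stuck.
The rest of the script splits the date values on (/), assigns the parts to new columns and concatenates them with the Page No field. It is this Page No field that I want to split into the individual numbers of the range shown.
The list this script produces contains only the required values from the Job ID column, plus the concatenated date and page fields in the second position. That part works fine; it is the final csv file in which I need each number of a given range represented as a single number.
Thanks for any help with splitting these value ranges while keeping the other data fields.
A subset of my input data is as follows: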
Job ID Job summary Link Locality Received Job status Asset Date Page No
21879 Addition Documents Link CBD 15/06/2018 Completed Water 28/06/2018 26-29
21878 Addition Documents Link CBD 28/06/2018 Completed Water
21877 Addition Documents Link CBD 28/06/2018 Completed Water
21876 Addition Documents Link CBD 28/06/2018 Completed Water
21875 Addition Documents Link CBD 28/06/2018 Completed Water
21874 Addition Documents Link CBD 28/06/2018 Completed Water 26/07/2018 42-43
21873 Addition Documents Link CBD 27/06/2018 Completed Water 26/07/2018
21872 Addition Documents Link CBD 27/06/2018 Completed Water 26/07/2018 66-67
21871 Addition Documents Link CBD 27/06/2018 Completed Water 26/07/2018 07-08
21870 Addition Documents Link CBD 27/06/2018 Completed Water 28/06/2018 59
21869 Addition Documents Link CBD 27/06/2018 Completed Water 28/06/2018 58
21868 Addition Documents Link CBD 26/06/2018 Completed Water
21867 Addition Documents Link CBD 26/06/2018 Completed Water
My desired output is:
Job ID Job summary Link Locality Received Job status Asset Date Page No
21879 Addition Documents Link CBD 15/06/2018 Completed Water 28/06/2018 26
21879 Addition Documents Link CBD 15/06/2018 Completed Water 28/06/2018 27
21879 Addition Documents Link CBD 15/06/2018 Completed Water 28/06/2018 28
21879 Addition Documents Link CBD 15/06/2018 Completed Water 28/06/2018 29
21878 Addition Documents Link CBD 28/06/2018 Completed Water
21877 Addition Documents Link CBD 28/06/2018 Completed Water
21876 Addition Documents Link CBD 28/06/2018 Completed Water
21875 Addition Documents Link CBD 28/06/2018 Completed Water
21874 Addition Documents Link CBD 28/06/2018 Completed Water 26/07/2018 42
21874 Addition Documents Link CBD 28/06/2018 Completed Water 26/07/2018 43
21873 Addition Documents Link CBD 27/06/2018 Completed Water 26/07/2018
21872 Addition Documents Link CBD 27/06/2018 Completed Water 26/07/2018 66
21872 Addition Documents Link CBD 27/06/2018 Completed Water 26/07/2018 67
21871 Addition Documents Link CBD 27/06/2018 Completed Water 26/07/2018 07
21871 Addition Documents Link CBD 27/06/2018 Completed Water 26/07/2018 08
21870 Addition Documents Link CBD 27/06/2018 Completed Water 28/06/2018 59
21869 Addition Documents Link CBD 27/06/2018 Completed Water 28/06/2018 58
21868 Addition Documents Link CBD 26/06/2018 Completed Water
21867 Addition Documents Link CBD 26/06/2018 Completed Water
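The row expansion shown above could be sketched roughly as follows. This is only a minimal sketch, assuming Python 3, a comma-delimited file, and a Page No cell present (possibly empty) on every row. The helper names `expand_pages` and `expand_rows` and the inline sample are made up for illustration; they are not part of the script in this question.

```python
import csv
import io


def expand_pages(page_field):
    """Return one page string per number in an 'x-y' range.

    A single page ('59') or an empty field ('') comes back unchanged in a
    one-item list, so every input row yields at least one output row.
    """
    text = page_field.strip()
    if "-" not in text:
        return [text]
    start, end = text.split("-", 1)
    width = len(start.strip())  # keep zero padding such as '07'
    return [str(n).zfill(width) for n in range(int(start), int(end) + 1)]


def expand_rows(rows, page_index):
    """Yield a copy of each row for every page in its Page No field."""
    for row in rows:
        for page in expand_pages(row[page_index]):
            yield row[:page_index] + [page] + row[page_index + 1:]


# Hypothetical comma-separated sample modelled on the input data above.
sample = (
    "Job ID,Job summary,Link,Locality,Received,Job status,Asset,Date,Page No\n"
    "21879,Addition,Documents Link,CBD,15/06/2018,Completed,Water,28/06/2018,26-29\n"
    "21878,Addition,Documents Link,CBD,28/06/2018,Completed,Water,,\n"
    "21871,Addition,Documents Link,CBD,27/06/2018,Completed,Water,26/07/2018,07-08\n"
    "21870,Addition,Documents Link,CBD,27/06/2018,Completed,Water,28/06/2018,59\n"
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)
page_index = header.index("Page No")
expanded = list(expand_rows(reader, page_index))

# Write the expanded rows back out as csv text.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(expanded)
print(out.getvalue())
```

With real files, the two `io.StringIO` objects would be replaced by `open(..., newline='')` handles. All columns other than Page No are copied unchanged, which is what keeps Job ID and the rest matched to each page number.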
The current script is:
import os
import csv

# Four passes over the data, each writing a new temp file. Every pass adds one
# header name and appends the '/'-separated parts of row[4] to each data row.
with open('CSV_File.csv','r') as csvinput:
    with open('temp__spreadsheet_cache_1.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            if row[7] == "Date":
                writer.writerow(row+["day"])
            else:
                writer.writerow(row+row[4].split('/'))

with open('temp__spreadsheet_cache_1.csv','r') as csvinput:
    with open('temp__spreadsheet_cache_2.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            if row[7] == "Date":
                writer.writerow(row+["month"])
            else:
                writer.writerow(row+row[4].split('/'))

with open('temp__spreadsheet_cache_2.csv','r') as csvinput:
    with open('temp__spreadsheet_cache_3.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            if row[7] == "Date":
                writer.writerow(row+["year"])
            else:
                writer.writerow(row+row[4].split('/'))

with open('temp__spreadsheet_cache_3.csv','r') as csvinput:
    with open('temp__spreadsheet_cache_4.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            if row[7] == "Date":
                writer.writerow(row+["Concatenation"])
            else:
                writer.writerow(row+row[4].split('/'))
#---Using Current output (temp__spreadsheet_cache_4.csv) to create new list--
blank = []
with open(r'temp__spreadsheet_cache_4.csv', 'r') as NEW_CSV:
    csvReader = csv.reader(NEW_CSV, delimiter=',', quotechar='"')
    header = next(csvReader)  # next() works on both Python 2.6+ and 3
    JobIndex = header.index("Job ID")
    PageIndex = header.index("Page No")
    DayIndex = header.index("day")
    MonthIndex = header.index("month")
    YearIndex = header.index("year")
    Summary = header.index("Job summary")
    StatusIndex = header.index("Job status")
    class_1 = header.index("Asset")
    for row in csvReader:
        Page = row[PageIndex]
        Day = row[DayIndex]
        Month = row[MonthIndex]
        Year = row[YearIndex]
        JobID = row[JobIndex]
        To_be_overridden_concat = row[PageIndex]
        Type = row[Summary]
        Status = row[StatusIndex]
        waterclass = row[class_1]
        if waterclass == 'Water':
            blank.append([JobID,Day,Month,Year,Page,To_be_overridden_concat])
str(blank)
# Strip leading zeros from day, month, year and page.
for column in blank:
    column[1] = column[1].lstrip('0')
    column[2] = column[2].lstrip('0')
    column[3] = column[3].lstrip('0')
    column[4] = column[4].lstrip('0')
# Strip leading whitespace.
for column in blank:
    column[0] = column[0].lstrip()
    column[1] = column[1].lstrip()
    column[2] = column[2].lstrip()
    column[3] = column[3].lstrip()
    column[4] = column[4].lstrip()
# Strip trailing whitespace, then build the day+month+year+page concatenation.
for column in blank:
    column[0] = column[0].rstrip()
    column[1] = column[1].rstrip()
    column[2] = column[2].rstrip()
    column[3] = column[3].rstrip()
    column[4] = column[4].rstrip()
    column[5] = column[1]+column[2]+column[3]+column[4]
##os.remove("temp__spreadsheet_cache_4.csv")
os.remove("temp__spreadsheet_cache_3.csv")
os.remove("temp__spreadsheet_cache_2.csv")
os.remove("temp__spreadsheet_cache_1.csv")
# Keep only the Job ID and the concatenation in each row.
for row in blank:
    del row[1:5]
print(blank[0:10])
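For comparison, the date splitting and concatenation above could probably be done in a single pass without the temporary files. The following is only a sketch under stated assumptions: Python 3, comma-delimited input, and Page No already reduced to a single number per row. The function name `job_keys` and its `date_column` and `asset` parameters are invented for this example. The date is read from the Received column because the script above splits `row[4]`; pass `date_column="Date"` if that is the column actually intended. It also strips whitespace before leading zeros, which is the reverse of the order used in the script.

```python
import csv
import io


def job_keys(lines, date_column="Received", asset="Water"):
    """Return [job_id, key] pairs, where key is day + month + year + page.

    Leading zeros and surrounding whitespace are removed from each part
    before joining. Only rows whose Asset equals `asset` are kept.
    """
    result = []
    for row in csv.DictReader(lines):
        if row["Asset"] != asset:
            continue
        parts = row[date_column].split("/") + [row["Page No"]]
        key = "".join((part or "").strip().lstrip("0") for part in parts)
        result.append([row["Job ID"].strip(), key])
    return result


# Hypothetical sample; the 'Sewer' row is made up to show the Asset filter.
sample = (
    "Job ID,Job summary,Link,Locality,Received,Job status,Asset,Date,Page No\n"
    "21870,Addition,Documents Link,CBD,27/06/2018,Completed,Water,28/06/2018,59\n"
    "21871,Addition,Documents Link,CBD,27/06/2018,Completed,Water,26/07/2018,07\n"
    "21000,Addition,Documents Link,CBD,27/06/2018,Completed,Sewer,26/07/2018,12\n"
)

keys = job_keys(io.StringIO(sample))
print(keys)
```

Using `csv.DictReader` means columns are looked up by header name, so the separate `header.index(...)` calls are not needed.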
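[Discussion]:
-
Could you share the real content/structure of your input? The fact that you create the csv reader without specifying a delimiter suggests that your input uses commas as the delimiter, rather than the whitespace the example above suggests. Specifically, I would like to know whether there are blank cells when no date and page number are given, i.e. whether such a row has two trailing commas.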
-
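Also, apart from the duplicated rows, your desired sample output looks structurally identical to your input, yet your code drops many of the original columns while adding some new ones. So which of the two is the output you want? And also... :) the if waterclass == 'Water' line as posted is missing a colon at the end, which brings me to the question: do you only want to do this when Asset is Water in your input?
-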
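Sorry, I am new to the forum. Apparently you have to have posted a certain number of times before you can add an image/screenshot of the data, but the delimiter is on line 38. I am a bit unsure how to do this without adding a screenshot, so I tried to adjust things just to give a visual representation of the data in the csv file. The Excel data was simply copied into the .py file here. Blank cells are fine, and there are no problems with them. The part I am stuck on actually comes before the code provided; I was just trying to give context by adding what has been done so far. Right! Copy error, it should be ....'Water': Thanks in advance!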
-
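If the code provided is not directly related to your problem, it is better not to post it. It only confuses people, as it confused me. You should include only what is necessary to understand and answer your question. Have you made any attempt at getting from the sample input to the sample output?
-
Copying/pasting the sample data directly (from a text editor) is better than trying to format it for the question. Use the edit button to make any changes that improve your question.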
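-
On the delimiter and blank-cell question raised earlier in this discussion, one quick check is to count the fields `csv.reader` finds on each line for a candidate delimiter. This is a small sketch, assuming Python 3; `field_counts` is a made-up helper and the sample text is hypothetical, written as the input would look if it were comma-delimited with two trailing commas on rows without Date and Page No.

```python
import csv
import io


def field_counts(text, delimiter=","):
    """Return the number of fields csv.reader finds on each line of text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [len(row) for row in reader]


# Hypothetical sample: the last row has empty Date and Page No cells.
sample = (
    "Job ID,Job summary,Link,Locality,Received,Job status,Asset,Date,Page No\n"
    "21879,Addition,Documents Link,CBD,15/06/2018,Completed,Water,28/06/2018,26-29\n"
    "21878,Addition,Documents Link,CBD,28/06/2018,Completed,Water,,\n"
)

comma_counts = field_counts(sample, ",")
tab_counts = field_counts(sample, "\t")
print(comma_counts)
print(tab_counts)
```

If every line gives the same count as the header for one delimiter, and a count of 1 for the others, that delimiter is very likely the right one and blank cells are present as empty fields.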