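【Posted】: 2017-01-11 22:28:13
【Question】:
I'm really stuck. My task is to filter a 5,000-record CSV by date to find a specific date range, sort the result in ascending order, and then take the fields from a different column to build a sentence. I've been able to filter and sort the dates successfully, but my problem now is that I don't know how to get the word that corresponds to each row. The code is below: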
#!/usr/bin/python3
import csv
import time
def finder():
    with open('sample_data.csv', encoding="utf8") as csvfile:
        reader = csv.DictReader(csvfile)
        r = [] # This will hold our ID numbers for rows
        c = [] # This will hold our initial dates that are filtered out from the main csv
        l = [] # This will hold our sorted dates from c
        w = [] # This will hold our words
        sentence = '' #This will be our sentence
        # Filter out created_at dates we don't care about
        def filterDates():
            for row in reader:
                createdOn = float(row['created_at'])
                d = time.strftime('%Y-%m-%d', time.localtime(createdOn)) # Converts dates
                if d < '2014-06-22':
                    pass
                else:
                    c.append(d)
        filterDates()
        def sort(c):
            for i in c:
                if i > '2014-06-22' and i < '2014-07-22':
                    l.append(i)
                    l.sort(reverse=False)
                else:
                    pass
        sort(c)
        def findWords(l):
            for row in reader:
                words = row['word']
                for x in range(l):
                    print(words[0])
        findWords(l)
finder()
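I know this code is probably sloppy and all over the place. I took this on as a challenge for a job application and thought I could finish it easily, but evidently my Python isn't good enough. I haven't worked with CSV files in Python before. I'll say right away that I no longer plan to apply for the job, but it will drive me crazy if I can't figure this out. I've already spent hours trying different things; my problem is how to get the rows with the right dates and pull out their words.
Any suggestions and help are appreciated! I need to figure this out for my own sanity.
Thanks, RDD
Data sample: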
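One likely reason the first attempt prints nothing: a csv.DictReader is a one-pass iterator, so after filterDates() has looped over it, the later `for row in reader` in findWords() sees no rows at all. (If that loop body did run, `range(l)` would raise a TypeError, because `l` is a list rather than an integer.) The sketch below shows the one-pass behaviour; the two-row inline CSV is a hypothetical stand-in for sample_data.csv, assuming a comma-separated file.

```python
import csv
import io

# Hypothetical two-row stand-in for sample_data.csv (comma-separated is an assumption).
data = "id,created_at,word\n1,1400030214,software\n2,1368791435,tangible\n"

reader = csv.DictReader(io.StringIO(data))

first_pass = [row["word"] for row in reader]   # consumes every row
second_pass = [row["word"] for row in reader]  # nothing left to read

print(first_pass)   # ['software', 'tangible']
print(second_pass)  # []
```

Reading the date and the word from the same row in a single pass avoids the problem without having to rewind or reopen the file.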
id created_at first_name last_name email gender company currency word drug_brand drug_name drug_company pill_color frequency token keywords
1 1309380645 Stephanie Franklin sfranklin0@sakura.ne.jp Female Latz IDR transitional SUNLEYA Age minimizing sun care AVOBENZONE, OCTINOXATE, OCTISALATE, OXYBENZONE C.F.E.B. Sisley Maroon Yearly ______T______h__e________ _______N__e__z_____p______e_____________d______i______a_____n__ _____h__i__v__e___-_____m___i____n__d__ _____________f ________c_______h__a__________s_.__ _Z________a_____l_____g________o__._ est risus auctor sed tristique in
2 1237178109 Michelle Fowler mfowler1@oracle.com Female Skipstorm EUR flexibility Medulla Arnica Medulla Arnica Uriel Pharmacy Inc. Yellow Once _____ morbi vestibulum velit id
3 1303585711 Betty Barnes bbarnes2@howstuffworks.com Female Skibox IDR workforce Rash Relief Zinc Oxide Dimethicone Touchless Care Concepts LLC Purple Monthly ___ ac est lacinia
4 1231175716 Jerry Rogers jrogers3@canalblog.com Male Cogibox IDR content-based up and up acid controller complete Famotidine, Calcium Carbonate, Magnesium Hydroxide Target Corporation Maroon Daily NIL augue a suscipit nulla elit
5 1236709011 Harry Garrett hgarrett4@mlb.com Male Yotz RUB coherent Vistaril HYDROXYZINE PAMOATE Pfizer Laboratories Div Pfizer Inc Orange Never �_nb_l_ _u___ __olop __ __oq_l _n _unp_p__u_ _od___ po_sn__ op p_s '__l_ _u__s_d_p_ _n_____suo_ '____ __s _olop _nsd_ ___o_ morbi ut odio cras
6 1400030214 Lori Martin lmartin5@apache.org Female Aivee EUR software Fluorouracil Fluorouracil Taro Pharmaceutical Industries Ltd. Pink Daily _ dui vel sem
7 1368791435 Joe Turner jturner6@elpais.com Male Mycat IRR tangible Sulfacetamide Sodium Sulfacetamide Sodium Paddock Laboratories, LLC Aquamarine Often 1;DROP TABLE users nulla facilisi cras non velit
8 1394919241 Ruth Bryant rbryant7@dell.com Female Browsecat IDR incremental Pollens - Trees, Mesquite, Prosopis juliflora Mesquite, Prosopis juliflora Jubilant HollisterStier LLC Aquamarine Weekly ___________ et magnis dis
9 1352948920 Cynthia Lopez clopez8@gov.uk Female Twitterbeat USD Up-sized Ideal Flawless Octinoxate, Titanium Dioxide Avon Products, Inc Red Daily (_�_�___ ___) purus eu magna
10 1319910259 Phillip Ross pross9@ehow.com Male Buzzshare VEF data-warehouse Serotonin Serotonin BioActive Nutritional Orange Weekly __ vel sem
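Given the created_at and word columns shown in the sample above, one way to keep each word attached to its date is to store them together as a pair while filtering, sort the pairs once, and only then pull out the words. This is a minimal sketch, not the required solution: the function name words_in_range and the inline rows are made up for illustration, the file is assumed to be comma-separated, and dates are converted in UTC (the question's code uses local time) so the result does not depend on the machine's time zone.

```python
import csv
import io
from datetime import datetime, timezone


def words_in_range(rows, start, end):
    """Return the words whose created_at day lies in [start, end], oldest first."""
    matches = []
    for row in rows:
        ts = float(row["created_at"])
        # Assumption: UTC conversion instead of time.localtime().
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        if start <= day <= end:
            matches.append((ts, row["word"]))  # keep date and word together
    matches.sort()  # tuples sort by timestamp first
    return [word for _, word in matches]


# Made-up rows for illustration only.
sample = (
    "id,created_at,word\n"
    "1,1404993600,world\n"     # 2014-07-10 UTC
    "2,1400030214,software\n"  # 2014-05-14 UTC, outside the range
    "3,1403697600,hello\n"     # 2014-06-25 UTC
)
rows = csv.DictReader(io.StringIO(sample))
sentence = " ".join(words_in_range(rows, "2014-06-22", "2014-07-22"))
print(sentence)  # hello world
```

Sorting once after the loop also avoids re-sorting the list every time a row is appended.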
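OK, after some tweaking with a lot of help from Westley White, I was able to get this working! I condensed it into a single nested function, and it does what it's supposed to do! The code is below: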
#!/usr/bin/python3
import csv
import time
def finder():
    with open('sample_data.csv', 'r', encoding='latin-1') as csvfile:
        reader = csv.DictReader(csvfile)
        def dates(reader):
            # Set up variables
            date_range = []
            sentence = []
            # Initiate iteration through CSV
            for row in reader:
                createdOn = float(row['created_at'])
                words = str(row['word'])
                d = time.strftime('%Y-%m-%d', time.localtime(createdOn)) # Converts dates
                if d >= '2014-06-22' and d <= '2014-07-22':
                    date_range.append(d)
                    date_range.sort()
                    for word in words:
                        if d in date_range:
                            sentence.append(word)
            print(sentence)
        dates(reader)
finder()
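There's just one problem left. When `sentence` is appended to, it receives one character at a time. I don't know how to keep the letters together as the words from the CSV column without lumping everything into one string. Any ideas?
Thanks!
【Comments】: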
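The character-by-character behaviour comes from `for word in words`: `words` is a single string, and iterating over a string yields its characters. A small sketch of the difference, using a word from the data sample (the list names are illustrative only):

```python
word = "software"  # one cell from the 'word' column

by_char = []
for ch in word:        # iterating a str yields one character at a time
    by_char.append(ch)

by_word = []
by_word.append(word)   # appending the string itself keeps the word whole

print(by_char)  # ['s', 'o', 'f', 't', 'w', 'a', 'r', 'e']
print(by_word)  # ['software']

# Joining whole words with a space builds the sentence.
joined = " ".join(by_word + ["tangible"])
print(joined)   # software tangible
```

In the code above, that would mean dropping the inner loop and appending `words` directly.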
-
Why the nested function definitions here? You don't need a closure...
-
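Is there a way to attach a file? If so, I'd attach the whole CSV. This is just sample data. Also, what do you mean by not needing a closure? Sorry, my Python isn't great. I'm only just getting back into it after a break.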
-
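Honestly, you only need the header and the first few rows. As for closures, I'm asking why you define functions inside a function and then simply call them inside that same function... it doesn't make sense. You won't be able to reuse those functions.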
-
Added a data sample. So should I define the functions in the main function, and then call them outside the finder function, before calling finder()?
-
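What is the "main function"? You should define functions where they are useful to you. Calling them in finder seems reasonable. Defining them in finder does not.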
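The structure suggested in these comments could look roughly like the sketch below: helpers defined at module level, and finder only calling them. The helper names to_day and in_range are hypothetical, finder takes already-parsed rows here rather than opening the file, and UTC is assumed for the date conversion.

```python
from datetime import datetime, timezone


def to_day(created_at):
    """Convert a Unix timestamp (string or number) to 'YYYY-MM-DD' in UTC."""
    moment = datetime.fromtimestamp(float(created_at), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")


def in_range(day, start="2014-06-22", end="2014-07-22"):
    """ISO-formatted dates compare correctly as plain strings."""
    return start <= day <= end


def finder(rows):
    """Calls the module-level helpers instead of defining them inside itself."""
    return [row["word"] for row in rows if in_range(to_day(row["created_at"]))]


# Made-up rows for illustration only.
rows = [
    {"created_at": "1403697600", "word": "hello"},     # 2014-06-25 UTC
    {"created_at": "1400030214", "word": "software"},  # 2014-05-14 UTC
]
print(finder(rows))        # ['hello']
print(to_day(1404993600))  # 2014-07-10
```

Because to_day and in_range live at module level, they can be reused or tested on their own, which is not possible when they are defined inside finder.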
Tags: python python-3.x