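Posted: 2020-07-30 06:02:19
Question:
I want to prepare a long list of data for a task. I have put together code that does the task for a single instance, but now I want it to run over a whole list. Below is what I have tried.
A single instance for testing: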
sentences = ['if the stimulus bill had become hamstrung by a filibuster threat or recalcitrant conservadems']
antecedents = ['bill had become hamstrung by']
The real use case is two columns of a pandas DataFrame, which I have converted to lists:
f = tra_df['sentence'].tolist()
b = tra_df['antecedent'].tolist()
Code for the single instance:
import re

results = []
ous = 1
ayx = ' '.join([str(elem) for elem in antecedents])
ayxx = ayx.split(" ")
antlabels = []
for i in range(len(ayxx)):
    antlabels.append(ous)
lab = ' '.join([str(elem) for elem in antlabels])
# Build the regex string required
rx = '({})'.format('|'.join(re.escape(el) for el in antecedents))
# Generator to yield replaced sentences
it = (re.sub(rx, lab, sentence) for sentence in sentences)
# Build list of paired new sentences and old to filter out where not the same
results = [new_sentence for old_sentence, new_sentence in zip(sentences, it) if old_sentence != new_sentence]
# Replace every token that is not '1' with '0'
nw_results = ' '.join([str(elem) for elem in results])
ew_results = nw_results.split(" ")
new_results = ['0' if i != '1' else i for i in ew_results]
labels = [int(e) for e in new_results]
labels
This is the result I get:
[0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
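For comparison, here is a minimal sketch that computes the same word-level labels without the regex round-trip: split both strings on whitespace and mark every position where the antecedent's word sequence occurs. This is a swapped-in technique, not the code from the question; the function name token_labels is hypothetical, and it assumes an antecedent always consists of whole words.

```python
def token_labels(sentence, antecedent):
    """Return one label per whitespace-separated word of `sentence`:
    1 if the word is part of an occurrence of `antecedent`, else 0.
    (Hypothetical helper; assumes the antecedent is made of whole words.)"""
    words = sentence.split()
    target = antecedent.split()
    n = len(target)
    labels = [0] * len(words)
    # Slide a window of n words over the sentence and mark every exact match.
    for start in range(len(words) - n + 1):
        if n and words[start:start + n] == target:
            labels[start:start + n] = [1] * n
    return labels


sentence = 'if the stimulus bill had become hamstrung by a filibuster threat or recalcitrant conservadems'
antecedent = 'bill had become hamstrung by'
print(token_labels(sentence, antecedent))  # [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
```

Because it compares words directly, a word that is literally "1" in the original sentence can no longer be mistaken for one of the inserted placeholders.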
Slightly modified code for the large list:
import re
import time

for sentences, antecedents in zip(f, b):
    gobels = []
    #def format_labels(antecedents,sentences):
    results = []
    #lab =[]
    ous = 1
    ayx = ' '.join([str(elem) for elem in antecedents])
    ayxx = ayx.split(" ")
    antlabels = []
    for i in range(len(ayxx)):
        antlabels.append(ous)
    lab = ' '.join([str(elem) for elem in antlabels])
    # Build the regex string required
    rx = '({})'.format('|'.join(re.escape(el) for el in antecedents))
    # Generator to yield replaced sentences
    it = (re.sub(rx, lab, sentence) for sentence in sentences)
    # Build list of paired new sentences and old to filter out where not the same
    results = [new_sentence for old_sentence, new_sentence in zip(sentences, it) if old_sentence != new_sentence]
    nw_results = ' '.join([str(elem) for elem in results])
    ew_results = nw_results.split(" ")
    new_results = ['0' if i != '1' else i for i in ew_results]
    labels = [int(e) for e in new_results]
    t2 = time.time()
    gobels.append(labels)
Now, instead of lists of 0s and 1s, I get a long list containing only 1s.
What could be going wrong?
[[1, 1, 1, 1, 1, 1, 1, 1, 1, ........]
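A likely cause: in the single-instance test, sentences and antecedents are lists holding one string each, but inside for sentences, antecedents in zip(f, b) each name is bound to one plain string. Iterating over a string yields single characters, so the regex becomes an alternation of characters, nearly every character of the sentence matches it and is replaced by the run of 1s, and the labels come out as all 1s. Separately, gobels = [] sits inside the loop, so it is reset on every row and only the last row survives. The sketch below wraps the question's single-instance logic (using != for the string comparison) in a hypothetical helper labels_for, reproduces the all-1s output, and shows one possible fix, assuming each DataFrame row holds exactly one sentence and one antecedent.

```python
import re


def labels_for(sentences, antecedents):
    """The question's single-instance logic as a function (hypothetical name).
    It expects LISTS of strings, exactly like the single-instance test."""
    # Count the antecedent "words" and build the '1 1 ... 1' placeholder.
    n_words = len(' '.join(str(elem) for elem in antecedents).split(" "))
    lab = ' '.join(['1'] * n_words)
    rx = '({})'.format('|'.join(re.escape(el) for el in antecedents))
    it = (re.sub(rx, lab, sentence) for sentence in sentences)
    # Keep only sentences that actually changed, then map tokens to 0/1.
    results = [new for old, new in zip(sentences, it) if old != new]
    tokens = ' '.join(str(elem) for elem in results).split(" ")
    return [1 if tok == '1' else 0 for tok in tokens]


f = ['if the stimulus bill had become hamstrung by a filibuster threat or recalcitrant conservadems']
b = ['bill had become hamstrung by']

# What the loop in the question effectively does: both arguments are plain str,
# so the function iterates over characters and the output is all 1s.
buggy = labels_for(f[0], b[0])

# Possible fix: wrap each row in a one-element list, and create gobels once,
# before the loop, so that every row's labels are kept.
gobels = []
for sentence, antecedent in zip(f, b):
    gobels.append(labels_for([sentence], [antecedent]))

print(gobels)  # [[0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]]
```

Turning the loop body into a function that takes one sentence and one antecedent (as the commented-out def format_labels line suggests) makes the expected argument types explicit and keeps the accumulator outside the loop.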
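Discussion:
- What is ous in antlabels.append(ous)?
- ous is a placeholder variable: I want as many ous (1s) as there are words to be replaced.
- Right, it should be antlabels.append(bag), but it didn't throw an error because I had already assigned it a value on the previous line.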
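Regarding the ous placeholder: when the antecedent is a single string (as it is inside the zip loop), the run of 1s can be built directly from its word count. Both helper names below are hypothetical; the second one reproduces the question's ayx/ayxx counting step to show that it counts words for a list but characters (plus empty gaps) for a bare string.

```python
def ones_placeholder(antecedent):
    """Return '1 1 ... 1' with one '1' per whitespace-separated word of a single antecedent string."""
    return ' '.join(['1'] * len(antecedent.split()))


def question_word_count(antecedents):
    """The question's counting step (ayx / ayxx); assumes a LIST of antecedent strings."""
    return len(' '.join(str(elem) for elem in antecedents).split(" "))


antecedent = 'bill had become hamstrung by'
print(ones_placeholder(antecedent))       # 1 1 1 1 1
print(question_word_count([antecedent]))  # 5: a list, so words are counted
print(question_word_count(antecedent))    # 32: a str, so characters and gaps are counted
```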
Tags: python arrays regex pandas list