【Question title】:Modifying code to process a long list of strings
【Posted】:2020-07-30 06:02:19
【Question description】:

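I want to prepare a long list of data for a task. I have managed to put together code that does the job on a single instance, but now I want it to run over a list. Below is what I have tried.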

A single instance used for testing.....

sentences = ['if the stimulus bill had become hamstrung by a filibuster threat or recalcitrant conservadems']
antecedents = ['bill had become hamstrung by']

The actual use case is two columns of a pandas DataFrame, which I have converted to lists

f = tra_df['sentence'].tolist()
b = tra_df['antecedent'].tolist()

Code for the single case....

import re

results = []

ous = 1  # placeholder label, one per antecedent word
ayx = ' '.join([str(elem) for elem in antecedents])
ayxx = ayx.split(" ")
antlabels = []
for i in range(len(ayxx)):
    antlabels.append(ous)
    lab = ' '.join([str(elem) for elem in antlabels])

# Build the regex string required
rx = '({})'.format('|'.join(re.escape(el) for el in antecedents))
# Generator to yield replaced sentences
it = (re.sub(rx, lab, sentence) for sentence in sentences)
# Build list of paired new sentences and old to filter out where not the same
results = [new_sentence for old_sentence, new_sentence in zip(sentences, it) if old_sentence != new_sentence]

# Replace every token other than '1' with '0'
nw_results = ' '.join([str(elem) for elem in results])
ew_results = nw_results.split(" ")
new_results = ['0' if i != '1' else i for i in ew_results]  # compare by value, not identity
labels = [int(e) for e in new_results]

labels

This is the result I get

[0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

The code, slightly modified for the large list

for sentences, antecedents in zip(f, b):
    gobels = []
    #def format_labels(antecedents,sentences):
    results =[]
    #lab =[]
    ous = 1
    ayx = ' '.join([str(elem) for elem in antecedents])
    ayxx = ayx.split(" ")
    antlabels = []    
    for i in range(len(ayxx)):
        antlabels.append(ous)
        lab = ' '.join([str(elem) for elem in antlabels])



     # Build the regex string required
    rx = '({})'.format('|'.join(re.escape(el)for el in antecedents))
     # Generator to yield replaced sentences
    it = (re.sub(rx, lab, sentence)for sentence in sentences)
     # Build list of paired new sentences and old to filter out where not the same
    results = ([new_sentence for old_sentence, new_sentence in zip(sentences, it) if old_sentence != new_sentence])

    nw_results = ' '.join([str(elem) for elem in results])
    ew_results= nw_results.split(" ")
    new_results = ['0' if i != '1' else i for i in ew_results]
    labels =([int(e) for e in new_results]) 

    t2 = time.time()
    gobels.append(labels)

Now, instead of a list of 0s and 1s, I get a long list containing only 1s.....

What could be going wrong?

[[1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
 ........]

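【Question comments】:

  • What is ous, as in antlabels.append(ous)?
  • ous is a placeholder variable; I want as many ous (1) as there are words to be replaced.
  • Right, it should be antlabels.append(bag), but it did not throw an error because I had already assigned it a value on an earlier line.

Tags: python arrays regex pandas list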


【Solution 1】:

Something like this might scale better. There may also be a more Pythonic way to do it.

a = '1 2 3 4 5'
b = '3 4 6'

a = a.split()
b = b.split()

for idx, val in enumerate(b):
    try:
        a[a.index(val)] = True
    except ValueError:
        pass

for idx, val in enumerate(a):
    if val is not True:
        a[idx] = False

print([1.0 if i else 0.0 for i in a])
# [0.0, 0.0, 1.0, 1.0, 0.0]
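
As background for the all-1s output in the question: inside `for sentences, antecedents in zip(f, b)`, each name appears to be bound to a single string rather than a list, so every `for el in antecedents` and `for sentence in sentences` walks over characters. In addition, `gobels = []` is re-created on every iteration, so only the last pair would survive. The following is a minimal sketch of the character-iteration effect, reusing the example strings from the question; the variable names `pieces`, `rx_chars` and `rx_phrase` are illustrative only.

```python
import re

# Example strings taken from the question.
sentence = 'if the stimulus bill had become hamstrung by a filibuster threat or recalcitrant conservadems'
antecedent = 'bill had become hamstrung by'

# Iterating over a str yields single characters, not whole strings.
pieces = [el for el in antecedent]
print(pieces[:4])  # ['b', 'i', 'l', 'l']

# The pattern built inside the loop is therefore an alternation of characters,
# so it matches any single character that occurs in the antecedent.
rx_chars = '({})'.format('|'.join(re.escape(el) for el in antecedent))
print(bool(re.fullmatch(rx_chars, 'b')))  # True

# Wrapping the string in a one-element list restores the whole-phrase pattern.
rx_phrase = '({})'.format('|'.join(re.escape(el) for el in [antecedent]))
print(re.sub(rx_phrase, '1 1 1 1 1', sentence))
# if the stimulus 1 1 1 1 1 a filibuster threat or recalcitrant conservadems
```

Under that reading, passing `[sentences]` and `[antecedents]` to the existing logic and moving `gobels = []` above the loop should bring back the mixed 0/1 output.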

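【Comments】:

  • This works well, but I want to pass two lists of strings through it. a=['1 2 3 4 5', '6 7 8 9 10', '11 12 13 14 15'], b=['.....', '......', '...'],
  • This method looks simple, but it fails when parts of string b also occur elsewhere in string a.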
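
Both comments could be addressed by matching the antecedent as one contiguous run of tokens and applying that to each sentence/antecedent pair. The following is a minimal sketch, not taken from the original answer, assuming whitespace tokenisation and that only the first occurrence of the antecedent should be labelled; `label_tokens` and `label_all` are hypothetical helper names.

```python
def label_tokens(sentence, antecedent):
    """Return one 0/1 label per whitespace token of `sentence`.

    Tokens in the first contiguous occurrence of `antecedent` get 1,
    every other token gets 0.
    """
    tokens = sentence.split()
    target = antecedent.split()
    labels = [0] * len(tokens)
    n = len(target)
    if n == 0:
        return labels
    # Slide a window of len(target) tokens across the sentence.
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == target:
            labels[start:start + n] = [1] * n
            break  # assumption: label only the first occurrence
    return labels


def label_all(sentences, antecedents):
    """Apply label_tokens to paired lists of sentence and antecedent strings."""
    return [label_tokens(s, a) for s, a in zip(sentences, antecedents)]


sentences = ['if the stimulus bill had become hamstrung by a filibuster threat or recalcitrant conservadems']
antecedents = ['bill had become hamstrung by']
print(label_all(sentences, antecedents))
# [[0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]]
```

The lists `f` and `b` from the question can be passed directly as `label_all(f, b)`. Unlike the loop in the answer above, a partial overlap such as `label_tokens('1 2 3 4 5', '3 4 6')` yields all zeros, because the whole phrase has to match in order.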