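【Question title】: Filter out descriptions that aren't in the component list
【Posted】: 2019-08-15 23:19:22
【Question】:

I have two lists: one contains many components, and the other contains components together with their descriptions. I need a way to filter out all the useless information while keeping the description list in the same order as the component list.

I tried a list comprehension, but it did not give me the result I expected.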

lst = [] 
for i in range (len(components)):
   lst.append([x for x in description if components[i] in x])

Here is a shortened version of the two variables:

components = ['INVALID' , 'R100' , 'R101' , 'C100' , 'R100' , 'R100']
description = [
'  30_F "30_F";',
'  POWER_IN1 Supply   2 At     5 Volts, 0.8 Amps;',
'  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
'  R101          100     5     5 f PN"66151002538" "CH-WID_ 100R -5-RR 0603 (B)";',
'  C100          100n    10    10 f PN"10210616" "CFCAP X7R S 100nF 50V (T)";',
'  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
'  R100       CLOSED PN"10057609" "RES S 5mOhm 1% 2512_H6_1 (T)"      VERSION 12046547;']

The output I expect is:

'  INVALID    No description'
'  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";'
'  R101       100     5     5 f PN"66151002538" "CH-WID_ 100R -5-RR 0603 (B)";'
'  C100       100n    10    10 f PN"10210616" "CFCAP X7R S 100nF 50V (T)";'
'  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";'
'  R100       CLOSED PN"10057609" "RES S 5mOhm 1% 2512_H6_1 (T)"      VERSION 12046547;'

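【Comments】:

  • Adding the actual result would help.
  • How do you determine that the last R100 entry should print the R100 CLOSED description, and the other entries the R100 OPEN description?

Tags: python list filter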


【Solution 1】:

Use the str.startswith function, an auxiliary list of already-seen positions, and Python's for/else construct:

import pprint

...  # your input data variables

seen_pos = []
res = []
for comp in components:
    for i, desc in enumerate(description):
        if i not in seen_pos and desc.strip().startswith(comp):
            seen_pos.append(i)
            res.append('{:<10}{}'.format(comp, desc.strip().replace(comp, '', 1).strip()))
            break
    else:
        res.append('{:<10}{}'.format(comp, 'No description'))

pprint.pprint(res, width=100)

Output:

['INVALID   No description',
 'R100      OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
 'R101      100     5     5 f PN"66151002538" "CH-WID_ 100R -5-RR 0603 (B)";',
 'C100      100n    10    10 f PN"10210616" "CFCAP X7R S 100nF 50V (T)";',
 'R100      OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
 'R100      CLOSED PN"10057609" "RES S 5mOhm 1% 2512_H6_1 (T)"      VERSION 12046547;']
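
The nested loop above rescans description for every component. As an alternative sketch (not part of the original answer), the description lines can be grouped once into per-component queues and then consumed in component order. This assumes the first whitespace-separated token of each description line is the component name; unlike str.startswith, it compares whole tokens, so a component named 'R10' would not pick up an 'R100' line. The names match_in_order and queues are hypothetical.

```python
from collections import defaultdict, deque


def match_in_order(components, description, missing='No description'):
    """Pair each component with its next unused description line."""
    # Group description lines by their first token, preserving list order.
    queues = defaultdict(deque)
    for line in description:
        parts = line.split(None, 1)
        if parts:  # skip blank lines
            rest = parts[1].strip() if len(parts) > 1 else ''
            queues[parts[0]].append(rest)

    result = []
    for comp in components:
        # .get avoids creating empty queues for unknown components.
        queue = queues.get(comp)
        rest = queue.popleft() if queue else missing
        result.append('{:<10}{}'.format(comp, rest))
    return result
```

Calling match_in_order(components, description) on the sample data from the question should give the same six lines as the output above.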

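【Comments】:

  • This works for the small example I provided, but it does not work when I try to incorporate it into my code. The output is exactly the same as the description list. Could this be caused by components that have no description? If so, how can it be solved?
  • @mark7378, first fix/elaborate your question so that it covers the actual case and people do not have to guess. Add a real sample and the expected result.
  • Sorry, the actual data is much larger, which is why I shortened it. I added an extra component, 'INVALID', and updated the expected result.
【Solution 2】: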
[x for x in description if x.split()[0] in components]
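
This keeps every description line whose first whitespace-separated token is in components, in the order of description. Two caveats: x.split()[0] raises IndexError on a blank line, and a component without a description (such as 'INVALID') simply does not appear in the result. A slightly more defensive sketch follows; the name filter_by_first_token is hypothetical.

```python
def filter_by_first_token(description, components):
    """Keep lines whose first token is a known component name."""
    wanted = set(components)  # a set gives fast membership tests
    kept = []
    for line in description:
        tokens = line.split()
        # Blank lines have no tokens; skip them instead of raising IndexError.
        if tokens and tokens[0] in wanted:
            kept.append(line)
    return kept
```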

【Comments】:

    【Solution 3】:

    A solution using re. It preserves the order defined in the components list:

    components = ['R100' , 'R101' , 'C100' , 'R100' , 'R100']
    description = [
    '  30_F "30_F";',
    '  POWER_IN1 Supply   2 At     5 Volts, 0.8 Amps;',
    '  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
    '  R101          100     5     5 f PN"66151002538" "CH-WID_ 100R -5-RR 0603 (B)";',
    '  C100          100n    10    10 f PN"10210616" "CFCAP X7R S 100nF 50V (T)";',
    '  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
    '  R100       CLOSED PN"10057609" "RES S 5mOhm 1% 2512_H6_1 (T)"      VERSION 12046547;']
    
    import re
    
    c = iter(components)
    
    filtered = []
    current = next(c)
    for line in description:
        if current and re.findall(r'^\s*{}\s*'.format(re.escape(current)), line):
            filtered.append(line)
            current = next(c, None)
    
    from pprint import pprint
    pprint(filtered, width=150)
    

    Prints:

    ['  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
     '  R101          100     5     5 f PN"66151002538" "CH-WID_ 100R -5-RR 0603 (B)";',
     '  C100          100n    10    10 f PN"10210616" "CFCAP X7R S 100nF 50V (T)";',
     '  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
     '  R100       CLOSED PN"10057609" "RES S 5mOhm 1% 2512_H6_1 (T)"      VERSION 12046547;']
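
Two details of the loop above are worth noting. If a component has no matching line (for example 'INVALID' from the updated question), current never advances and nothing after it is matched. Also, the trailing \s* in the pattern can match nothing, so a component named 'R10' would match a line starting with 'R100'. The following is a sketch of one way to handle both, assuming descriptions appear in the same relative order as components; the name filter_in_order and the placeholder text are assumptions, not part of the original answer.

```python
import re


def filter_in_order(components, description):
    """Forward-only scan: each component takes the next matching line."""
    filtered = []
    pos = 0  # index of the first description line not yet consumed
    for comp in components:
        # (?!\S) requires whitespace or end of line after the name,
        # so 'R10' does not match a line starting with 'R100'.
        pattern = re.compile(r'\s*{}(?!\S)'.format(re.escape(comp)))
        for i in range(pos, len(description)):
            if pattern.match(description[i]):
                filtered.append(description[i])
                pos = i + 1
                break
        else:
            # No line found from pos onward; add a placeholder, keep pos.
            filtered.append('  {:<10} No description'.format(comp))
    return filtered
```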
    

    【Comments】:

      【Solution 4】:

      Just use a simple list comprehension with basic filtering:

      >>> res = [d for d in description if d.strip().split(' ', 1)[0] in components]
      >>> pprint(res)
      ['  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
       '  R101          100     5     5 f PN"66151002538" "CH-WID_ 100R -5-RR 0603 (B)";',
       '  C100          100n    10    10 f PN"10210616" "CFCAP X7R S 100nF 50V (T)";',
       '  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
       '  R100       CLOSED PN"10057609" "RES S 5mOhm 1% 2512_H6_1 (T)"      VERSION 12046547;']
      

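      【Comments】:

        【Solution 5】:

        Update: the OP changed the question. Checking for 'INVALID' adds an extra layer of complexity that this answer does not cover.


        Iterate over the strings in description and add each one to a list if it contains any of the names in components. Note that c in d is a substring test, so it also matches a component name that appears anywhere else in the line.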

        comp_set = set(components)
        filtered = [d for d in description if any(c in d for c in comp_set)]
        
        for x in filtered:
            print(x)
        

        Output:

          R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";
          R101          100     5     5 f PN"66151002538" "CH-WID_ 100R -5-RR 0603 (B)";
          C100          100n    10    10 f PN"10210616" "CFCAP X7R S 100nF 50V (T)";
          R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";
          R100       CLOSED PN"10057609" "RES S 5mOhm 1% 2512_H6_1 (T)"      VERSION 12046547;
        

        【Comments】:
