【Question title】:How to filter strings using regex?
【Posted】:2021-12-10 10:04:13
【Question description】:

I have a list of strings that I need to filter in Python.

list=["पत्ता स नं Himanshu अष्टविनायक Address: sr no94/1B/1/2/3",
       "चाळ, जय foo boo, बस स्टोप जवळ, ashatvinayak chal, jay bhavani",
       "पिंपळे गुरव, पुणे, महाराष्ट्र, 411027 nagar, near bus stop, Pimple",
       "Gurav, Pune, Maharashtra,",
       "411027",
       "www"]

The desired output is:

list=["Address: sr no94/1B/1/2/3",
      "ashatvinayak chal, jay bhavani",
      "411027 nagar, near bus stop, Pimple",
      "Gurav, Pune, Maharashtra,",
      "411027",
      "www"]

My code:

import re

# Replace each run of disallowed characters with a space, then collapse whitespace.
regex = re.compile("[^a-zA-Z0-9!@#$&()\\-`.+,/\"]+")
for i in list:
    print(" ".join(regex.sub(' ', i).split()))

My output:

Himanshu Address sr no94/1B/1/2/3
, foo boo, , ashatvinayak chal, jay bhavani
, , , 411027 nagar, near bus stop, Pimple
Gurav, Pune, Maharashtra,
411027
www

If Himanshu appears between non-English characters (for example: पत्ता स नं Himanshu अष्टविनायक), I want to remove it.

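【Comments】:

  • regex.sub(' ', i) will not run as posted; it needs the input string argument.
  • Your code raises an error.
  • Sorry, I forgot to include the regex.
  • Can you confirm that you simply want to remove all text up to the last non-English (non-ASCII) letter? That would make the solution simpler.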
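
The simplification proposed in the last comment (drop everything up to and including the last non-ASCII character) could be sketched roughly as follows. This is only a minimal sketch, not code from the thread: the helper name `strip_through_last_non_ascii` is hypothetical, and it assumes that "non-English" means any character outside the ASCII range and that whitespace or commas directly after the cut should be dropped too.

```python
import re

# Greedy `.*` runs to the LAST non-ASCII character in the string;
# `[\s,]*` then swallows the separators that follow it.
# re.DOTALL lets `.` cross newlines in case an entry spans several lines.
_PREFIX = re.compile(r"^.*[^\x00-\x7F][\s,]*", re.DOTALL)


def strip_through_last_non_ascii(text):
    """Return the part of `text` after its last non-ASCII character.

    Strings without any non-ASCII character are returned unchanged.
    (Hypothetical helper; the name is not from the original thread.)
    """
    return _PREFIX.sub("", text, count=1)


samples = [
    "पत्ता स नं Himanshu अष्टविनायक Address: sr no94/1B/1/2/3",
    "चाळ, जय foo boo, बस स्टोप जवळ, ashatvinayak chal, jay bhavani",
    "पिंपळे गुरव, पुणे, महाराष्ट्र, 411027 nagar, near bus stop, Pimple",
    "Gurav, Pune, Maharashtra,",
    "411027",
    "www",
]

cleaned = [strip_through_last_non_ascii(s) for s in samples]
for line in cleaned:
    print(line)
```

Because the cut is made at the last non-ASCII character, ASCII words such as "Himanshu" or "foo boo" that sit between non-English words are removed along with everything before them.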

Tags: python regex


【Solution 1】:

Try this code:

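import re

# Named `strings` rather than `list` so the built-in is not shadowed.
strings = ["पत्ता स नं Himanshu अष्टविनायक Address: sr no94/1B/1/2/3",
           "चाळ, जय foo boo, बस स्टोप जवळ, ashatvinayak chal, jay bhavani",
           "पिंपळे गुरव, पुणे, महाराष्ट्र, 411027 nagar, near bus stop, Pimple",
           "पिं Gurav, Pune, Maharashtra,",
           "411027",
           "www"]
result = []
# Raw string, so \s and \- reach the regex engine unchanged.
# Matches the last run of disallowed characters plus trailing commas/spaces;
# the negative lookahead rejects any run that is followed by another one.
pattern = r'[^a-zA-Z0-9!@\s:#$&()\-`.+,/"]+[, ]*(?!.*[^a-zA-Z0-9!@\s:#$&()\-`.+,/"]+[, ]*)'
for s in strings:
    m = re.search(pattern, s)
    if m:
        # Keep only the text after the matched run.
        result.append(s[m.end():])
    else:
        # Nothing to strip: the string has no disallowed characters.
        result.append(s)
print(result)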

Output:
['Address: sr no94/1B/1/2/3', 'ashatvinayak chal, jay bhavani', '411027 nagar, near bus stop, Pimple', 'Gurav, Pune, Maharashtra,', '411027', 'www']
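
To see why the pattern selects only the last run of disallowed characters, it may help to spell it out with `re.VERBOSE`. The following is a sketch under assumptions rather than the answerer's code: the names `ALLOWED`, `LAST_RUN`, and `tail_after_last_run` are illustrative, the character class is copied from the answer, and the lookahead is shortened to `(?!.*[^...])`, which should behave the same because the trailing `+[, ]*` inside the lookahead cannot change whether it matches.

```python
import re

# Characters treated as "keepable" (copied from the answer's character class).
ALLOWED = r'a-zA-Z0-9!@\s:#$&()\-`.+,/"'

LAST_RUN = re.compile(
    rf"""
    [^{ALLOWED}]+         # a run of characters outside the allowed set
    [, ]*                 # separators trailing that run
    (?!.*[^{ALLOWED}])    # fail if any disallowed character appears later
    """,
    re.VERBOSE,
)


def tail_after_last_run(text):
    """Return the text after the last run of disallowed characters.

    Hypothetical helper: uses Match.end() instead of str.index(), so a
    repeated word earlier in the string cannot shift the cut position.
    """
    m = LAST_RUN.search(text)
    return text[m.end():] if m else text


print(tail_after_last_run("पत्ता स नं Himanshu अष्टविनायक Address: sr no94/1B/1/2/3"))
print(tail_after_last_run("पिं Gurav, Pune, Maharashtra,"))
print(tail_after_last_run("www"))
```

Every earlier run fails the lookahead because a later disallowed character still exists, so the first successful match is the final run, and slicing at `m.end()` leaves only the trailing English text.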

