Your ''.join() expression filters out anything non-ASCII. To replace those characters instead of dropping them, use a conditional expression:

return ''.join([i if ord(i) < 128 else ' ' for i in text])

This handles characters one by one, so every replaced character still becomes its own space.
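A minimal sketch contrasting filtering with replacing. The original join from the question isn't shown, so strip_non_ascii assumes it was a filter of the form ''.join(i for i in text if ord(i) < 128); both function names are hypothetical.

```python
def strip_non_ascii(text: str) -> str:
    # Hypothetical helper: drops every character outside 0x00-0x7F (filtering).
    return ''.join(ch for ch in text if ord(ch) < 128)


def replace_non_ascii(text: str) -> str:
    # Hypothetical helper: conditional expression, one space per non-ASCII char.
    return ''.join(ch if ord(ch) < 128 else ' ' for ch in text)


print(repr(strip_non_ascii("abc历史def")))    # 'abcdef'
print(repr(replace_non_ascii("abc历史def")))  # 'abc  def' (two spaces)
```

The two Chinese characters disappear entirely in the first version and become two separate spaces in the second.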

Your regular expression should just replace consecutive non-ASCII characters with a space:

import re
re.sub(r'[^\x00-\x7F]+', ' ', text)  # each run of non-ASCII chars -> one space
re.sub(r'[^\x00-\x7f]', ' ', text)   # each non-ASCII char -> its own space

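Note the + in the first pattern: it matches a run of one or more non-ASCII characters, so each run is replaced by a single space. The second pattern, without +, replaces each character with its own space.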
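To make the effect of the + concrete, here is a small comparison on a sample string (the variable names are illustrative only):

```python
import re

text = "abc历史def"

# With +: the two adjacent non-ASCII characters form one run -> one space.
one_space_per_run = re.sub(r'[^\x00-\x7F]+', ' ', text)

# Without +: each non-ASCII character is matched separately -> two spaces.
one_space_per_char = re.sub(r'[^\x00-\x7F]', ' ', text)

print(repr(one_space_per_run))   # 'abc def'
print(repr(one_space_per_char))  # 'abc  def'
```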


To check whether a string contains any non-ASCII characters (Chinese, for example):

import re

a = "ds  dl,;sd!@)~`09历史s"
regexp = re.compile(r'[^\x00-\x7f]')  # any character outside the ASCII range
if regexp.search(a):
    print('matched')
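The check can be wrapped in small helpers; has_non_ascii and non_ascii_chars below are hypothetical names. On Python 3.7 and later, the built-in str.isascii() answers the same yes/no question without a regex; that is a swapped-in alternative, not part of the original answer.

```python
import re
from typing import List

NON_ASCII = re.compile(r'[^\x00-\x7f]')


def has_non_ascii(s: str) -> bool:
    # Hypothetical helper: True if any character falls outside 0x00-0x7F.
    return NON_ASCII.search(s) is not None


def non_ascii_chars(s: str) -> List[str]:
    # Hypothetical helper: lists every offending character, in order.
    return NON_ASCII.findall(s)


a = "ds  dl,;sd!@)~`09历史s"
print(has_non_ascii(a))    # True
print(not a.isascii())     # True (str.isascii() requires Python 3.7+)
print(non_ascii_chars(a))  # ['历', '史']
```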
