[Question title]: Regex to access digits between two underscores
[Posted]: 2021-09-14 19:55:16
[Question]:

I'm trying to extract the digits between two underscores. For example, in the text below,

https://http-google-ghh.vault.com__929091__2.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__929090__1.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__1205024__1.0
https://http-google-ghh.vault.com__929090__1.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__1205024__1.0

I only need to get the numbers 929091, 929092, and so on.

I tried '_(.*)_', but I also get the underscores. I only need the numbers.
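A likely reason `_(.*)_` returns underscores, assuming it was run with Python's `re.findall` on lines like the ones above: each number is wrapped in a *double* underscore, and `.*` is greedy. The group starts right after the first `_` and backtracks only as far as the last `_`, so the inner underscores end up inside the capture. A minimal sketch on one sample line:

```python
import re

line = "https://http-google-ghh.vault.com__929091__2.0"

# Greedy ".*" consumes the rest of the line, then backtracks to the LAST "_".
# The group therefore spans from after the first "_" to before the last "_",
# which still includes one underscore on each side of the number.
greedy = re.findall(r'_(.*)_', line)
print(greedy)  # ['_929091_']

# Anchoring on both double underscores and making ".*?" lazy stops the
# group at the first "__" after the number.
lazy = re.findall(r'__(.*?)__', line)
print(lazy)  # ['929091']
```

The lazy variant only illustrates the greediness issue; it would also capture non-digits, which is why the patterns below restrict the group to digits.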

[Discussion]:

  • re.findall(r'_(\d+)_', text)
  • How about something like _(\d+)_ ? It matches one or more digits between underscores.
  • re.findall(r'(?
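Both the `_(\d+)_` suggestion and the answer below depend on how `re.findall` treats capturing groups: with no group it returns each whole match, underscores included; with exactly one group it returns only the text that group captured. A short sketch on two of the sample lines:

```python
import re

text = """https://http-google-ghh.vault.com__929091__2.0
https://http-google-ghh.vault.com__1205024__1.0"""

# No capturing group: findall returns the entire match, delimiters included.
whole = re.findall(r'_\d+_', text)
print(whole)  # ['_929091_', '_1205024_']

# One capturing group: findall returns only what the group captured.
groups = re.findall(r'_(\d+)_', text)
print(groups)  # ['929091', '1205024']
```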

Tags: python regex pandas


[Solution 1]:

Use

re.findall(r'__([0-9]+)__', s)

regex proof

Explanation

--------------------------------------------------------------------------------
  __                       '__'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  __                       '__'
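If you need one value per line rather than one flat list, here is a minimal sketch using `re.search` with the same pattern. `extract_id` is a hypothetical helper name, and the third input line is a made-up non-matching example. It also shows the distinction in the explanation: `group(0)` is the whole match including the `__` delimiters, while `group(1)` (the `\1` capture) holds only the digits.

```python
import re

# Same pattern as the answer; the "__" delimiters sit outside group 1.
PATTERN = re.compile(r'__([0-9]+)__')

def extract_id(line):
    """Hypothetical helper: digits between the double underscores, or None."""
    m = PATTERN.search(line)
    return m.group(1) if m else None

m = PATTERN.search("https://http-google-ghh.vault.com__929091__2.0")
whole_match = m.group(0)  # whole match, delimiters included
captured = m.group(1)     # group 1 only
print(whole_match, captured)  # __929091__ 929091

lines = [
    "https://http-google-ghh.vault.com__929091__2.0",
    "https://http-google-ghh.vault.com__1205024__1.0",
    "https://example.invalid/no-id-here",  # hypothetical line with no match
]
ids = [extract_id(line) for line in lines]
print(ids)  # ['929091', '1205024', None]
```

Returning `None` for a non-matching line keeps the output aligned with the input lines, which `re.findall` on the whole text does not do.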

Python code

import re
s = r"""https://http-google-ghh.vault.com__929091__2.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__929090__1.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__1205024__1.0
https://http-google-ghh.vault.com__929090__1.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__1205024__1.0"""
print(re.findall(r'__([0-9]+)__', s))

Result: ['929091', '929092', '929090', '929092', '1205024', '929090', '929092', '1205024']

[Discussion]:

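  • It also prints the underscores. I only need the numbers. Is there a way to get just the numbers without the underscores?
  • @IttyBit Code added. There are no underscores in the output. Where are you seeing underscores?
  • Thank you very much. I was seeing the underscores when running it in Redshift.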