【Question title】: Python re.findall between two strings while excluding strings
【Posted】: 2017-03-19 16:12:24
【Question】:

My current code is as follows:

idk = {"id":30511879634,"title":"3.5y","option1":"3.5y","option2":null,"option3":null,"sku":"","requires_shipping":true,"taxable":true,"featured_image":{"id":18778730002,"product_id":8876555346,"position":1,"created_at":"2017-02-15T15:51:03-05:00","updated_at":"2017-02-15T15:51:37-05:00","src":"https:\/\/cdn.shopify.com\/s\/files\/1\/1527\/4931\/products\/AJ6_HEIRESS_PRODUCT.jpg?v=1487191897","variant_ids":[30511879634,30511879698,30511879762,30511879826,30511879890,30511879954,30511880018,30511880082]},"available":false,"name":"Air Jordan 6 Retro Premium GG 'Heiress' - 3.5y","public_title":"3.5y","options":["3.5y"],"price":16000,"weight":1361,"compare_at_price":null,"inventory_quantity":0,"inventory_management":"shopify","inventory_policy":"deny","barcode":""},{"id":30511879698,"title":"4y","option1":"4y","option2":null,"option3":null,"sku":"","requires_shipping":true,"taxable":true,"featured_image":{"id":18778730002,"product_id":8876555346,"position":1,"created_at":"2017-02-15T15:51:03-05:00","updated_at":"2017-02-15T15:51:37-05:00","src":"https:\/\/cdn.shopify.com\/s\/files\/1\/1527\/4931\/products\/AJ6_HEIRESS_PRODUCT.jpg?v=1487191897","variant_ids":[30511879634,30511879698,30511879762,30511879826,30511879890,30511879954,30511880018,30511880082]},"available":true,"name":"Air Jordan 6 Retro Premium GG 'Heiress' - 4y","public_title":"4y","options":["4y"],"price":16000,"weight":1361,"compare_at_price":null,"inventory_quantity":1,"inventory_management":"shopify","inventory_policy":"deny","barcode":""},
variants = re.findall(r'"id":(.*?),"title"', idk)

This returns ['30511879634', '18778730002,"product_id":8876555346,"position":1,"created_at":"2017-02-15T15:51:03-05:00","updated_at":"2017-02-15T15:51:37-05:00","src":"https:\\/\\/cdn.shopify.com\\/s\\/files\\/1\\/1527\\/4931\\/products\\/AJ6_HEIRESS_PRODUCT.jpg?v=1487191897","variant_ids":[30511879634,30511879698,30511879762,30511879826,30511879890,30511879954,30511880018,30511880082]},"available":false,"name":"Air Jordan 6 Retro Premium GG \'Heiress\' - 3.5y","public_title":"3.5y","options":["3.5y"],"price":16000,"weight":1361,"compare_at_price":null,"inventory_quantity":0,"inventory_management":"shopify","inventory_policy":"deny","barcode":""},{"id":30511879698']

But I want it to return ['30511879634', '30511879698']

I know I can do variants = re.findall(r'"id":[^"product_id"].,"title"', idk), but that returns ['"id":30511879634,"title"', '"id":30511879698,"title"']

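I tried variants = re.findall(r'"id":[^"product_id"](.*?),"title"', idk), but that doesn't work. Is there any way to return only the numbers while making sure the second id (18778730002) is not included in the list, so that I get just 30511879634 and 30511879698?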

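【Comments】:

  • Well, first of all, this is hard to read; please post only the relevant part.
  • Any particular reason you're not treating this as a regular dictionary? It looks like a standard Python dict.
  • This looks like a JSON response; why not use the appropriate module?
  • @ZdaR My apologies... I didn't want to leave anything out in case it was needed for a solution... The actual text is much larger; I tried to shorten it to the point where a repeating pattern is visible.
  • @idjaw The text is much larger than what I posted, and I reuse it to look for other patterns as well... With that in mind, re.findall seemed appropriate to me.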

Tags: python regex python-2.7


【Solution 1】:

You can use this regex ...

(?<=\"id\":)\d+(?=,\"title\")

regex demo / explanation

python (demo)

import re

idk = """{"id":30511879634,"title":"3.5y","option1":"3.5y","option2":null,"option3":null,"sku":"","requires_shipping":true,"taxable":true,"featured_image":{"id":18778730002,"product_id":8876555346,"position":1,"created_at":"2017-02-15T15:51:03-05:00","updated_at":"2017-02-15T15:51:37-05:00","src":"https:\/\/cdn.shopify.com\/s\/files\/1\/1527\/4931\/products\/AJ6_HEIRESS_PRODUCT.jpg?v=1487191897","variant_ids":[30511879634,30511879698,30511879762,30511879826,30511879890,30511879954,30511880018,30511880082]},"available":false,"name":"Air Jordan 6 Retro Premium GG 'Heiress' - 3.5y","public_title":"3.5y","options":["3.5y"],"price":16000,"weight":1361,"compare_at_price":null,"inventory_quantity":0,"inventory_management":"shopify","inventory_policy":"deny","barcode":""},{"id":30511879698,"title":"4y","option1":"4y","option2":null,"option3":null,"sku":"","requires_shipping":true,"taxable":true,"featured_image":{"id":18778730002,"product_id":8876555346,"position":1,"created_at":"2017-02-15T15:51:03-05:00","updated_at":"2017-02-15T15:51:37-05:00","src":"https:\/\/cdn.shopify.com\/s\/files\/1\/1527\/4931\/products\/AJ6_HEIRESS_PRODUCT.jpg?v=1487191897","variant_ids":[30511879634,30511879698,30511879762,30511879826,30511879890,30511879954,30511880018,30511880082]},"available":true,"name":"Air Jordan 6 Retro Premium GG 'Heiress' - 4y","public_title":"4y","options":["4y"],"price":16000,"weight":1361,"compare_at_price":null,"inventory_quantity":1,"inventory_management":"shopify","inventory_policy":"deny","barcode":""}"""
variants = re.findall(r"(?<=\"id\":)\d+(?=,\"title\")", idk)
print(variants) #-> ['30511879634', '30511879698']
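
For comparison, here is a minimal sketch of the same idea written with a capture group instead of lookarounds. It is not part of the original answer. The `sample` string is a shortened, hypothetical stand-in for the data in the question. The sketch also shows why the `[^"product_id"]` attempt does not behave as intended: square brackets form a character class that matches a single character, not a string to exclude.

```python
import re

# Shortened, hypothetical sample with the same shape as the data in the question.
sample = (
    '{"id":30511879634,"title":"3.5y",'
    '"featured_image":{"id":18778730002,"product_id":8876555346},'
    '"available":false},'
    '{"id":30511879698,"title":"4y",'
    '"featured_image":{"id":18778730002,"product_id":8876555346},'
    '"available":true}'
)

# \d+ can only consume digits, so a match cannot run past the nested
# "id":18778730002, which is followed by ,"product_id" rather than ,"title".
variants = re.findall(r'"id":(\d+),"title"', sample)
print(variants)  # ['30511879634', '30511879698']

# [^"product_id"] matches ONE character that is none of: " p r o d u c t _ i
# It does not mean "anything except the string product_id".
class_demo = re.findall(r'[^"product_id"]', 'product_id=7')
print(class_demo)  # ['=', '7']
```

With one capture group, `re.findall` returns only the captured text, so the result should be the same list the lookaround version produces.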

【Comments】:

  • This looks hacky. Why not use a JSON parser?
  • @Jan coz the OP didn't ask for one :P
  • @siam That's why I like your answer ;)
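
As a rough illustration of the JSON-parser route raised in the comments, the sketch below assumes the fragment can be isolated from the larger text and is a comma-separated run of JSON objects, as in the question. The `sample` string is a shortened, hypothetical stand-in, and wrapping it in square brackets is an assumption about the data's shape, not something the original poster confirmed.

```python
import json

# Shortened, hypothetical sample with the same shape as the data in the question,
# including the trailing comma seen in the original snippet.
sample = (
    '{"id":30511879634,"title":"3.5y",'
    '"featured_image":{"id":18778730002,"product_id":8876555346},'
    '"available":false},'
    '{"id":30511879698,"title":"4y",'
    '"featured_image":{"id":18778730002,"product_id":8876555346},'
    '"available":true},'
)

# Drop the trailing comma and wrap the objects in [] to form a valid JSON array.
records = json.loads("[" + sample.rstrip(",") + "]")

# Only the top-level "id" of each object is read, so the nested
# featured_image id never enters the result.
variant_ids = [str(record["id"]) for record in records]
print(variant_ids)  # ['30511879634', '30511879698']
```

This avoids pattern matching entirely, at the cost of needing the fragment to be well-formed JSON; if the surrounding text is not, the regex approach above may still be the more practical choice.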