【Question title】: REGEX reformatting
【Posted】: 2014-09-25 02:11:23
【Question description】:

I am trying to reformat a JSON file and remove most of its contents. Here is the original JSON file.

       "2597401":[{"jobID":"2597401",
                 "account":"TG-CCR120014",
                 "user":"charngda",
                 "pkgT":{"pgi/7.2-  5":{"libA":["libpgc.so"],
                 "flavor":["default"]}},          
                 "startEpoch":"1338497979",
                 "runTime":"1022",
                 "execType":"user:binary",              
                 "exec":"ft.D.64",
                 "numNodes":"4",
                 "sha1":"5a79879235aa31b6a46e73b43879428e2a175db5",
                 "execEpoch":1336766742,
                 "execModify":"Fri May 11 15:05:42 2012",
                 "startTime":"Thu May 31 15:59:39 2012",
                 "numCores":"64",
                 "sizeT":{"bss":"1881400168","text":"239574","data":"22504"}},  
                 {"jobID":"2597401",
                 "account":"TG-CCR120014",
                 "user":"charngda",
                 "pkgT":{"pgi/7.2-5":{"libA":["libpgc.so"],
                 "flavor":["default"]}},
                 "startEpoch":"1338497946",
                 "runTime":"33"  "execType":"user:binary",
                 "exec":"cg.C.64",
                 "numNodes":"4",
                 "sha1":"caf415e011e28b7e4e5b050fb61cbf71a62a9789",
                 "execEpoch":1336766735,
                "execModify":"Fri May 11 15:05:35 2012",
                "startTime":"Thu May 31 15:59:06 2012",
                "numCores":"64",
                "sizeT":{"bss":"29630984","text":"225749","data":"20360"}},
                {"jobID":"2597401",
                "account":"TG-CCR120014",
                "user":"charngda",
                "pkgT":{"pgi/7.2-5":  {"libA":["libpgc.so"],
                "flavor":["default"]}},
                "startEpoch":"1338500447",
                "runTime":"145",
                "execType":"user:binary",
                "exec":"mg.D.64",
                "numNodes":"4",
                "sha1":"173de32e1514ad097b1c051ec49c4eb240f2001f",
                "execEpoch":1336766756,
                "execModify":"Fri May 11 15:05:56 2012",
                "startTime":"Thu May 31 16:40:47 2012",
                "numCores":"64",
                "sizeT":{"bss":"456954120","text":"426186","data":"22184"}},{"jobID":"2597401",
                "account":"TG-CCR120014",
                "user":"charngda",
                "pkgT":{"pgi/7.2-5":{"libA":["libpgc.so"],
                "flavor":["default"]}},
                "startEpoch":"1338499002",
                "runTime":"1444",
                "execType":"user:binary",
                "exec":"lu.D.64",
                "numNodes":"4",
                "sha1":"c6dc16d25c2f23d2a3321d4feed16ab7e10c2cc1",
                "execEpoch":1336766748,
                "execModify":"Fri May 11 15:05:48 2012",
                "startTime":"Thu May 31 16:16:42 2012",
                "numCores":"64",
                "sizeT":{"bss":"199850984","text":"474218","data":"27064"}}],

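For each JobID, I only want to keep the "exec" fields and the JobID. How can I build a regular expression that discards the rest of the data? Ideally, I would like the following: JobID exec1 exec2 exec3
Is there a way to do this?

Thanks in advance.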

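【Question comments】:

  • Do you mean {"2597401": [{"JobID": 2597401, "exec": "ft.D.64"}]}?
  • Sort of. The leading number is the JobID, so ideally I would like something like this: 2597401 ft.D.64 cg,C,64 mg.D.64 lu.d.64. The same job has multiple execs, so I want the jobID and the execs.
  • Use a JSON library to read your JSON, manipulate it, and save it back. Unlike your code, that JSON library has already been written, tested, and debugged. Regular expressions are not a magic wand to wave at every problem that happens to involve text.
  • @amber4478 Something like what?
  • 2597401 ft.D.64 cg,C,64 mg.D.64 lu.d.64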
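
The JSON-library approach suggested in the comments can be sketched as follows. This is only an illustration, not the commenter's code: it assumes the real file is a syntactically valid JSON object that maps each job ID to a list of records, and `SAMPLE` and `summarize_jobs` are hypothetical names made up for the example.

```python
import json

# Hypothetical, syntactically valid excerpt shaped like the data in the
# question: a top-level object mapping each job ID to a list of run records.
SAMPLE = """
{
  "2597401": [
    {"jobID": "2597401", "user": "charngda", "exec": "ft.D.64", "numNodes": "4"},
    {"jobID": "2597401", "user": "charngda", "exec": "cg.C.64", "numNodes": "4"},
    {"jobID": "2597401", "user": "charngda", "exec": "mg.D.64", "numNodes": "4"},
    {"jobID": "2597401", "user": "charngda", "exec": "lu.D.64", "numNodes": "4"}
  ]
}
"""


def summarize_jobs(text):
    """Return one 'jobID exec1 exec2 ...' line per job in the JSON text."""
    jobs = json.loads(text)  # raises json.JSONDecodeError on invalid input
    lines = []
    for job_id, records in jobs.items():
        # Keep only the "exec" value of each record, in file order.
        execs = [rec["exec"] for rec in records if "exec" in rec]
        lines.append(" ".join([job_id] + execs))
    return lines


for line in summarize_jobs(SAMPLE):
    print(line)
```

A strict parser like this one rejects malformed input, so it would only work on the posted data once its syntax errors are repaired.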

Tags: regex json


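【Solution 1】:

Since you did not specify your regex engine, I will assume for this answer that it supports the PCRE features used below: the (*SKIP) verb, subroutine calls such as (?2), and possessive quantifiers.

Given the JSON format, you can use this regex to match the unwanted key-value pairs and replace them with nothing: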

/(,\s*(*SKIP))?+("(?!jobID"|exec)[^"]+"\s*+:\s*+("[^"]*"|{(?2)?+(?>,\s*(?2))*}|\[(?3)?+(?>,\s*(?3))*\]))(?(1)|,?)/g

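Here is what you get after applying the regex replacement. Because the lookahead (?!jobID"|exec) spares every key that begins with exec, the execType, execEpoch, and execModify pairs are kept as well: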

       "2597401":[{"jobID":"2597401",
                 "execType":"user:binary",              
                 "exec":"ft.D.64",
                 "execEpoch":1336766742,
                 "execModify":"Fri May 11 15:05:42 2012"},  
                 {"jobID":"2597401"  "execType":"user:binary",
                 "exec":"cg.C.64",
                 "execEpoch":1336766735,
                "execModify":"Fri May 11 15:05:35 2012"},
                {"jobID":"2597401",
                "execType":"user:binary",
                "exec":"mg.D.64",
                "execEpoch":1336766756,
                "execModify":"Fri May 11 15:05:56 2012"},{"jobID":"2597401",
                "execType":"user:binary",
                "exec":"lu.D.64",
                "execEpoch":1336766748,
                "execModify":"Fri May 11 15:05:48 2012"}],

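As you can see, the resulting string is invalid at "jobID":"2597401"  "execType":"user:binary", where a comma is missing. That comes from a syntax error in the data you provided: there is no comma after "runTime":"33".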
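
Because of that missing comma, a strict JSON parser would reject the posted data. If repairing the file is not an option, a tolerant extraction can be sketched with a much simpler regex. This is a swapped-in technique, not the pattern from this answer: it uses Python's `re` module, assumes every value of interest is a plain quoted string, and assumes `jobID` appears before `exec` in each record. `BROKEN`, `PAIR`, and `execs_by_job` are hypothetical names.

```python
import re

# Abridged from the question's data, keeping the missing comma after
# "runTime":"33" that makes the input invalid JSON.
BROKEN = '''"2597401":[{"jobID":"2597401",
  "runTime":"1022",
  "execType":"user:binary",
  "exec":"ft.D.64"},
  {"jobID":"2597401",
  "runTime":"33"  "execType":"user:binary",
  "exec":"cg.C.64"}],'''

# Match only the exact keys "jobID" and "exec"; the closing quote after
# the key name keeps "execType" and similar keys from matching.
PAIR = re.compile(r'"(jobID|exec)"\s*:\s*"([^"]*)"')


def execs_by_job(text):
    """Group exec values under the most recently seen jobID."""
    result = {}
    current = None
    for key, value in PAIR.findall(text):
        if key == "jobID":
            current = value
            result.setdefault(current, [])
        elif current is not None:
            result[current].append(value)
    return result


for job_id, execs in execs_by_job(BROKEN).items():
    print(job_id, *execs)
```

The trade-off is that this ignores JSON structure entirely, so it would also pick up a matching key nested somewhere unexpected.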

With explanation:

/(,\s*(*SKIP))?+
# Optionally match a leading comma and whitespace,
# possessively (no backtracking into this group).
# If the comma matched, (*SKIP) marks this position:
# should the rest of the pattern fail, the next attempt
# starts here instead of one character further on.

# Key - Value pairs not worthy of keeping.
(
  "(?!jobID"|exec)[^"]+" # Check if we like this key.
  \s*+:\s*+ # The colon, advance whitespaces.
  ( # Match the value, recursing into nested objects and arrays.
    "[^"]*"
      # String literals, boring.
    | {(?2)?+(?>,\s*(?2))*}
      # Or: An object storing some key-value pairs
      # we don't care about.
    | \[(?3)?+(?>,\s*(?3))*\]
      # Or: An array storing some values
      # we don't care about.
  )
)
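(?(1)|,?)
# Balance the comma: if the leading comma (group 1) was consumed,
# match nothing more; otherwise consume an optional trailing comma,
# so the result string is still valid JSON.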
/gx
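
The key test at the heart of the pattern can be tried in isolation. The sketch below uses Python's `re` module, which is an assumption of the example rather than the engine this answer targets: `re` has no (*SKIP) verb or (?2)-style subroutine calls, so only the negative lookahead is reproduced, wrapped in a capture group for inspection. `UNWANTED_KEY` and `record` are hypothetical names.

```python
import re

# The key test from the pattern above: a quoted name that is neither
# exactly jobID nor anything beginning with exec, followed by a colon.
UNWANTED_KEY = re.compile(r'"((?!jobID"|exec)[^"]+)"\s*:')

# One flat record, shortened from the question's data.
record = (
    '{"jobID":"2597401","account":"TG-CCR120014",'
    '"execType":"user:binary","exec":"ft.D.64","numNodes":"4"}'
)

# Names of the keys the full pattern would remove from this record.
removed = UNWANTED_KEY.findall(record)
print(removed)
```

Only account and numNodes are reported. execType survives because the lookahead rejects every key that starts with exec, which is why the replaced output still contains execType, execEpoch, and execModify.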

Here is the regex demo.

【Discussion】:
