【Question title】: Python - Accessing elements within file containing text string
【Posted】: 2020-08-03 22:55:07
【Question】:

I have a file containing the following text:

{"Referer": "https://dashboard", "Accept-Language": "en-GB","CST": "46e01f86be307fd0509217271e8c8c3cdcb0e661ee12f872a473cdeb26ac060201111", "Sec-Fetch-Site": "same-origin", "Accept-Encoding": "gzip, deflate, br", "Content-Type": "application/json", "Sec-Fetch-Mode": "cors", "Sec-Fetch-Dest": "empty", "Cookie": "savedSR=null; personalisationTags=[]; _ga=GA1.2.982687521.1587128695; _gid=GA1.2.956366233.1587128695; AMCVS_434717FE52A6476F0A490D4C%40AdobeOrg=1; s_ecid=MCMID%7C12108296909723217653110223702033109972; optimizelyEndUserId=oeu1587128701566r0.7936576379275126; _gcl_au=1.1.250521668.1587128723; _gat_UA-53269626-3=1; _gali=loginbutton; sessionOpen=true; preferredAccountId=KY7KR; exitUrl=https://; exitPath=uk; deviceType=Desktop; deviceOs=Other; defaultDealingPlatform=PUREDEAL; client_id=a20fa8511e2a302574dddc5533444d0b; callerReqId=af712cb1fe891a75; X-SECURITY-TOKEN=9d2f5f8b93cad41d0b334fbff3590fb66fb787d96981be70ac27394f28d0791201111; REFRESH-TOKEN=eyJraWQiOiJDQVE4QU1JSUJDZ0tDQVFFQXFKdiIsInR5cCI6IkNMSUVOVF9SRUZSRVNIIiwiYWxnIjoiUlMyNTYiLCJ6aXAiOiJHWklQIn0.H4sIAAAAAAAAAFXMMQvCMBCG4f-S2eHu0ibXbuKik9C6OCbppQRqEFtBEP-7N7j4jQ8f79ukpUjdTpPpDQISth2T2ZnnKo8abqJ83A_DldmC8mG8qDROADO7KBZ8nqCFjtCTR-HEyaYpRRDnUAQps6fQeEWJ5EICBwSo01oJm-mxZY_E3oLCump-LHVeZCxzPVd91Zj_X_K6_8BjZ-HzBetQSCXFAAAA.O44FIhfHyu0-qO70yj36bvYfKhzCVdyqn44pP2VYeCdMcf877qejiwvTbpFqY9A9A8R5VoQHID5V38r9unmJyyt_Npz8b-yuGbmpSyFRy75Be8PtST8TwVYpCNgQF7Bxt5fG8z8G9p2ZU8J56V8zIjs1IuZQkq0G5qtFSJ3uQT3IRs-qPTnN8Fv50Ra2LowojLJDrfT7RHkA-MbFQGkheuVq7b8G15dZzFjlv2T6eSGzhesCAKvpAzAEiDkL25AG7quclI84w5zyltawR99KoRpL_JZvNXGbxIbNFjcPJvqqQI7vtAtylSvyCs76UKlSaF3cc61GaeRRkdYSU725lQ; D; ID=TD=DB400FA09853FBF32BD51785C67BDB2F2EEB2D75:CS=2; CST=46e01f86be307fd0509217271e8c8c3cdcb0e661ee12f872a473cdeb26ac060201111; 
ACCESS-TOKEN=eyJraWQiOiJDQVE4QU1JSUJDZ0tDQVFFQXFKdiIsInR5cCI6IkNMSUVOVF9BQ0NFU1MiLCJhbGciOiJSUzI1NiIsInppcCI6IkdaSVAifQ.H4sIAAAAAAAAAFXOsQrCQBAE0H_Z2mJ3z9xt0omNYiEkNiIWl8smBPQUE0EQ_91tLJzyMQzzhpjS7ZnnCaoT7I5hV8N5Aekyap63HVRASExFKQwLeE76yPGqxptVXR9FHBqvm4PJ0itSL75Vh6HvsMCSKXAglSTJpS61qN6TKnEvgeMyGGrLPib0yEgWWxvjDBUVEoglODSY7Bw0Yx4u2oxD3mdr5bb_b-nr_oMSHX6-T676f9oAAAA.dwqQl2p5IOSHKmrQfQgSAO1b3ua3p0M5i8iP2avlJQFc2JLRSw6lrC8W83ZXgtgxEKvrXPzut8mmuU-nWhU1sjOBObWUxBww2DixK-V7AC2BEyt5UKtC5JgSezbcyQeejOenlFWBOEEIeYUN4-yjySt_FRFzZ-iJoVGYw_o5wWL03dckv7c5jUlR30lEqby6M-wIhnkXjIMDItGUKOhdXgZOYFC-22ZEb43Cf2zbJPcAaLi9HNlBvC3G1VNpChVpxLKQt2fYbCdbxvAO1s2Kf2TAA67PFmb5oHj36H3ybnpo7czobaovk9jEs4quezVv_OYhkMiuz1O9chD7O-Vx3g; cpaEnabled=true....

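I want to access the CST and X-SECURITY-TOKEN values and save them to two separate variables. For some reason, in the file I have, CST is always followed by a comma separator, while X-SECURITY-TOKEN is always followed by a semicolon separator.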

To retrieve the text above, I use the following for loop:

searchfile = open("har.txt", "r")
for line in searchfile:
    print(line)
searchfile.close()

Do I need to convert line to a dictionary so that I can access CST and X-SECURITY-TOKEN, or should I just split the string? Please advise.

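【Comments】:

  • This is a JSON string. Use the json library; here is a tutorial.
  • Nobody has addressed the semicolon issue. Cookie data is separated by semicolons, in this case by a semicolon and a space. See the answer below for how to handle it. (Even without knowing that cookie data is semicolon-separated, the sample input makes it clear that X-SECURITY-TOKEN is not a key in the dict, so data['X-SECURITY-TOKEN'] raises a KeyError straight away.)

Tags: python json io har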


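【Solution 1】:

For a JSON-formatted file, it is best to let a library handle the reading for you.

json.load(file) returns a dict when the file contains a JSON object, as it does here. file must support the .read() method.

Reading a JSON file is straightforward: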

import json
with open('filepath.txt','r') as f:
    fileDict = json.load(f)

You can then access any key-value pair with dict[key], just as with an ordinary Python dictionary, for example:

print(fileDict["CST"])

See the docs for more details, for example on using json.loads(str) to convert a string (rather than a file) into a dictionary.
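
As a small illustration of json.loads, the sketch below parses a header string held in memory rather than a file. The sample string and its shortened values are made up for the example; only the key names come from the question.

```python
import json

# Hypothetical, shortened stand-in for the file contents in the question.
raw = (
    '{"Accept-Language": "en-GB", "CST": "46e01f86", '
    '"Cookie": "sessionOpen=true; X-SECURITY-TOKEN=9d2f5f8b"}'
)

headers = json.loads(raw)  # JSON object string -> Python dict
cst = headers["CST"]       # top-level keys are accessed like any dict key
print(cst)
```

Note that only top-level keys such as "CST" are available this way; values nested inside the "Cookie" string still need to be split out separately.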

【Comments】:

    【Solution 2】:

    Try using the json module to parse the JSON from the file:

    import json
    
    with open("har.txt", "r") as f:
        data = json.load(f)
        print(data["CST"]) # output: 46e01f86be307fd0509217271e8c8c3cdcb0e661ee12f872a473cdeb26ac060201111
    

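    【Comments】:

      【Solution 3】:

      Apart from loading/parsing the data with JSON in the first place, all the other answers skip the semicolon issue. That comes from the Cookie value. Cookie data is usually separated by semicolons, and X-SECURITY-TOKEN is one of those cookies.

      json.loads() parses a string representing a JSON object into a Python object. Then access each top-level key as usual, such as "CST", as others have pointed out.

      By the way, use with open()... for file management. It closes the file afterwards, even if an error occurs.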

      import json
      
      with open("har.txt", "r") as searchfile:
          for line in searchfile:
              data = json.loads(line)
              cst = data["CST"]  # your CST data
              # now for cookies, first get all the cookie data
              cookies = data["Cookie"].split("; ")  # split on semi-colons with a space
              """This gives you a list of strings in the format "X=Y"
              where 'X' is the cookie name and 'Y' is its value. Looks like this:
      
              ['savedSR=null', 'personalisationTags=[]', '_ga=GA1.2.982687521.1587128695', ...]
      
              Next, we split each of these on the '=' sign:
              """
              for cookie in cookies:
                  name, _, value = cookie.partition('=')  # split on the first '=' only; values may contain '='
                  if name == 'X-SECURITY-TOKEN':  # check the cookie name
                      security_token = value
                      break  # we found the token, skip the remaining cookies
              else:
                  # ^ if the for loop ends without reaching that 'break'
                  # note the `else` indent is NOT for the `if`
                  print('Security token not found')
                  security_token = None
      
              print(cst, security_token)
      
      # don't need a searchfile.close() at the end
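
The cookie handling above can also be folded into a small helper that turns the whole Cookie value into a dictionary, so that any cookie can be looked up by name. This is only a sketch: parse_cookie_header is a hypothetical helper name, the sample line uses shortened placeholder values, and it assumes the cookies are separated by semicolons as in the question's data.

```python
import json


def parse_cookie_header(cookie_header):
    """Return a dict mapping cookie names to values (hypothetical helper)."""
    cookies = {}
    for part in cookie_header.split(";"):
        part = part.strip()  # drop the space that follows each semicolon
        if not part:
            continue  # skip the empty piece left by a trailing semicolon
        # partition splits on the first '=' only, so values that
        # themselves contain '=' are kept intact
        name, _, value = part.partition("=")
        cookies[name] = value
    return cookies


# Shortened, made-up stand-in for one line of har.txt.
line = (
    '{"CST": "46e01f86", '
    '"Cookie": "sessionOpen=true; ID=TD=DB40:CS=2; X-SECURITY-TOKEN=9d2f5f8b; "}'
)

headers = json.loads(line)
cookies = parse_cookie_header(headers["Cookie"])

cst = headers["CST"]
security_token = cookies.get("X-SECURITY-TOKEN")  # None if the cookie is missing
print(cst, security_token)
```

Using a dictionary with .get() avoids the explicit loop and break, at the cost of parsing every cookie even when only one is needed.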
      

      【Comments】:
