【Question title】: Extract part of data from JSON file with python [duplicate]
【Posted】: 2015-03-28 21:11:30
【Question】:

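I have been trying to extract only certain data from a JSON file. I managed to decode the JSON and put the wanted data into a Python dictionary. When I print the dictionary it shows all the wanted data, but when I try to write the dictionary to a new file, only the last object gets written. One thing I can't understand is why, when I print the dict, I get multiple dict objects instead of the 1 I would expect.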

My code:

import json
input_file=open('json.json', 'r')
output_file=open('test.json', 'w')
json_decode=json.load(input_file)
for item in json_decode:
    my_dict={}
    my_dict['title']=item.get('labels').get('en').get('value')
    my_dict['description']=item.get('descriptions').get('en').get('value')
    my_dict['id']=item.get('id')
    print my_dict
back_json=json.dumps(my_dict, output_file)
output_file.write(back_json)
output_file.close() 

My json.json file:

[
{"type":"item","labels":{"en":{"language":"en","value":"George Washington"}},"descriptions":{"en":{"language":"en","value":"American politician, 1st president of the United States (in office from 1789 to 1797)"}},"id":"Q23"},
{"type":"item","aliases":{"en":[{"language":"en","value":"Douglas Noël Adams"},{"language":"en","value":"Douglas Noel Adams"}]},"labels":{"en":{"language":"en","value":"Douglas Adams"}},"descriptions":{"en":{"language":"en","value":"English writer and humorist"}},"id":"Q42"},
{"type":"item","aliases":{"en":[{"language":"en","value":"George Bush"},{"language":"en","value":"George Walker Bush"}]},"labels":{"en":{"language":"en","value":"George W. Bush"}},"descriptions":{"en":{"language":"en","value":"American politician, 43rd president of the United States from 2001 to 2009"}},"id":"Q207"},
{"type":"item","aliases":{"en":[{"language":"en","value":"Velázquez"},{"language":"en","value":"Diego Rodríguez de Silva y Velázquez"}]},"labels":{"en":{"language":"en","value":"Diego Velázquez"}},"descriptions":{"en":{"language":"en","value":"Spanish painter who was the leading artist in the court of King Philip IV"}},"id":"Q297"},
{"type":"item","labels":{"en":{"language":"en","value":"Eduardo Frei Ruiz-Tagle"}},"descriptions":{"en":{"language":"en","value":"Chilean politician and former President"}},"id":"Q326"}
]

Output of printing my_dict:

{'id': u'Q23', 'description': u'American politician, 1st president of the United States (in office from 1789 to 1797)', 'title': u'George Washington'}
{'id': u'Q42', 'description': u'English writer and humorist', 'title': u'Douglas Adams'}
{'id': u'Q207', 'description': u'American politician, 43rd president of the United States from 2001 to 2009', 'title': u'George W. Bush'}
{'id': u'Q297', 'description': u'Spanish painter who was the leading artist in the court of King Philip IV', 'title': u'Diego Vel\xe1zquez'}
{'id': u'Q326', 'description': u'Chilean politician and former President', 'title': u'Eduardo Frei Ruiz-Tagle'}

Output in the test.json file:

{"id": "Q326", "description": "Chilean politician and former President", "title": "Eduardo Frei Ruiz-Tagle"}

I would also like to know why the dict prints 'title': u'Diego Vel\xe1zquez', whereas if I print my_dict.values()[2] the name is written normally as Diego Velázquez.

Thanks a lot

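【Comments】:

  • u'Diego Vel\xe1zquez' is the Python representation of a Unicode string, in which \xe1 is the character á.
  • Second question: if you print the dict, you get the Python representation (repr) of the strings, whereas printing a string gives you its "normal" representation (str). See "repr" and "str" (satyajit.ranjeev.in/2012/03/14/python-repr-str.html) for more information.

Tags: python json dictionary extract decode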


【Solution 1】:

When you do this:

for item in json_decode:

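you are iterating over every object in the decoded JSON array (one per line of your file).

On each pass through the loop you overwrite the my_dict variable, which is why only one line ends up in your output.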
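The overwriting can be seen in isolation with a minimal sketch. It assumes Python 3 and uses a made-up three-item list (`items` is not from the question) in place of the decoded file:

```python
# Stand-in for json_decode; these three items are illustrative only.
items = [{"id": "Q23"}, {"id": "Q42"}, {"id": "Q326"}]

for item in items:
    my_dict = {}                 # a brand-new dict is bound to the name on every pass
    my_dict["id"] = item["id"]

# After the loop, my_dict refers only to the dict built in the final pass.
print(my_dict)
```

Anything done with `my_dict` after the loop, such as serialising it, therefore sees only the last object.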

Once the file is loaded, you can simply print out the json_decode variable to get what you want.

https://docs.python.org/3.3/library/json.html

【Discussion】:

    【Solution 2】:

    Your code creates a new dictionary object for each object:

    my_dict={}
    

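    Moreover, it overwrites the previous contents of the variable. The dictionary previously held in my_dict is no longer referenced and is discarded from memory.

    Try creating a list before the for loop and storing the results there.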

    result = []
    for item in json_decode:
        my_dict={}
        my_dict['title']=item.get('labels').get('en').get('value')
        my_dict['description']=item.get('descriptions').get('en').get('value')
        my_dict['id']=item.get('id')
        print(my_dict)
        result.append(my_dict)
    

    Finally, write the result to the output:

    back_json=json.dumps(result)
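
The snippet above stops at building the JSON string. A possible end-to-end version, sketched for Python 3, is shown here. The function names `extract_fields` and `convert` are hypothetical, and the `{}` defaults passed to `.get()` are an added assumption so that an object without an English label yields `None` instead of raising an error. Note that `json.dumps` only returns a string; `json.dump` is the call that writes to a file object.

```python
import json


def extract_fields(items):
    """Return one small dict per input object, keeping title, description and id."""
    result = []
    for item in items:
        result.append({
            # .get(key, {}) keeps the chain safe when a level is missing
            "title": item.get("labels", {}).get("en", {}).get("value"),
            "description": item.get("descriptions", {}).get("en", {}).get("value"),
            "id": item.get("id"),
        })
    return result


def convert(input_path, output_path):
    """Read a JSON array from input_path and write the reduced list to output_path."""
    with open(input_path, "r", encoding="utf-8") as input_file:
        json_decode = json.load(input_file)
    with open(output_path, "w", encoding="utf-8") as output_file:
        # json.dump writes straight to the file; the whole list goes out at once
        json.dump(extract_fields(json_decode), output_file,
                  ensure_ascii=False, indent=2)
```

With the file names from the question, this would be called as `convert('json.json', 'test.json')`. The `with` blocks close both files even if decoding fails.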
    

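    Printing a dictionary object is meant to help developers by showing the data types. In u'Diego Vel\xe1zquez', the leading u indicates a Unicode object (string). When you print the string itself, it is encoded for output according to the current language settings of your operating system.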
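
The difference between the two representations can be reproduced with a short sketch. It assumes Python 3, where strings are Unicode by default and there is no u prefix, so the built-in `ascii()` is used here to get an escaped form similar to what Python 2 showed when printing the dict. The `record` name is illustrative only.

```python
import json

title = "Diego Vel\xe1zquez"   # \xe1 is the escape sequence for the character á
record = {"title": title}

# Printing the string itself shows the readable form.
print(title)

# ascii() escapes non-ASCII characters, much like printing the dict did in Python 2.
print(ascii(record))

# json.dumps escapes non-ASCII characters by default ...
print(json.dumps(record))

# ... and keeps them as-is when ensure_ascii=False is passed.
print(json.dumps(record, ensure_ascii=False))
```

In all four cases the underlying string is the same; only the way it is displayed or serialised differs.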

    【Discussion】:
