【Question title】: Expand dataframe with several nested json
【Posted】: 2020-01-10 01:04:33
【Question description】:

I have a DataFrame obtained from web scraping that looks like this:

data = [{'StrategicResearchPriorities': {'data': [{'strategicAreaId': 0,
     'strategicAreaValue': 'Population',
     'strategicGoalId': 1,
     'strategicGoalValue': 'Social'}]},
  'ScienceAndResearchPriorities': {'data': [{'scienceAndResearchPriorityId': 'Health',
     'scienceAndResearchPriorityValue': 'Health',
     'practicalResearchChallengeId': 'XXX.',
     'practicalResearchChallengeValue': 'YYY'}]},
  'IndustrialTransformationPriorities': None,
  'FieldsOfResearch': '{"data":[{"guid":1557,"value":"200102 - Communication Technology and Digital Media Studies","code":200102,"percentage":"45"},{"guid":1499,"value":"180119 - Law and Society","code":180119,"percentage":"30"},{"guid":1381,"value":"160104 - Social and Cultural Anthropology","code":160104,"percentage":"15"},{"guid":1444,"value":"160808 - Sociology and Social Studies of Science and Technology","code":160808,"percentage":"10"}]}',
  'Title': 'X and Y',
  'AdminOrganisationStateName': 'A',
  'AdminOrganisation': 'B',
  'ProjectCode': '0000001',
  'ChiefInvestigators': [{'FamilyName': 'Surname1',
    'FirstName': 'Name1',
    'SecondName': None,
    'Title': 'Mr',
    'PersonOrdinal': 1},
   {'FamilyName': 'Surname2',
    'FirstName': 'Name2',
    'SecondName': 'SecondName2',
    'Title': 'Ms',
    'PersonOrdinal': 3},
   ],
  'OrganisationParticipantSummary': '{"data":[{"id":11111,"guid":"af4","name":"Institute","number":1,"roleName":"Administering Organisation","roleId":1,"inKind":true},{"id":22222,"guid":"af6","name":"University","number":2,"roleName":"Other","roleId":3,"inKind":true}]}',
  'Summary': 'Some text',
  'AnnouncedDate': '1900-06-10T14:46:54.57',
  'AllocatedNumbersCalendarYears': [1,
   2,
   1,
   5,],
  'UnnamedAwardSummary': {}},
 {'StrategicResearchPriorities': {'data': [{'strategicAreaId': 4,
     'strategicAreaValue': 'Productivity',
     'strategicGoalId': 11,
     'strategicGoalValue': 'Economy'}]},
  'ScienceAndResearchPriorities': {'data': [{'scienceAndResearchPriorityId': 'Manufacturing',
     'scienceAndResearchPriorityValue': 'Manufacturing',
     'practicalResearchChallengeId': 'Technologies.',
     'practicalResearchChallengeValue': 'Modern technologies.'}]},
  'IndustrialTransformationPriorities': None,
  'FieldsOfResearch': '{"data":[{"guid":222,"value":"010101 - Subject1","code":"020202","percentage":"50"},{"guid":555,"value":"020201 - Subject10","code":"020201","percentage":"50"}]}',
  'Title': 'A and B and C',
  'AdminOrganisationStateName': 'Org',
  'AdminOrganisation': 'Institute',
  'ProjectCode': 'XX100000',
  'ChiefInvestigators': [{'FamilyName': 'Surname3',
    'FirstName': 'Name3',
    'SecondName': None,
    'Title': 'Dr',
    'PersonOrdinal': 1},
   {'FamilyName': 'Surname4',
    'FirstName': 'Name4',
    'SecondName': 'SecondName4',
    'Title': 'Prof',
    'PersonOrdinal': 15}],
  'OrganisationParticipantSummary': '{"data":[{"id":10002,"guid":"ab3","name":"University","number":1,"roleName":"Owner","roleId":1,"inKind":true},{"id":50000,"guid":"2a7","name":"University2","number":2,"roleName":"Other","roleId":3,"inKind":true}]}',
  'Summary': 'Some text 2.',
  'AnnouncedDate': '1800-06-12T15:26:55.003',
  'AllocatedNumbersCalendarYears': [5,
   1,
   3,
   2,
   9,
   20,
   10],
  'UnnamedAwardSummary': {}},
 ]

I want to unpack all the different cells into one big DataFrame. I tried

json_normalize(data)

but the cells are read as strings. The problem is that fields such as 'StrategicResearchPriorities' hold another list (under 'data'), and I cannot access it.

PS: Sorry for the long array, but I thought it best to show all of it. It has actually been trimmed a lot.
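
To make the problem concrete: some fields ('FieldsOfResearch', 'OrganisationParticipantSummary') are JSON text stored as strings, others are dicts wrapping a list under 'data', and some are None or empty. The following is a minimal sketch of one possible way to handle this, not a definitive answer. It assumes 'ProjectCode' identifies each record, uses a shortened stand-in for the data above, and the helper names (`decode_json_strings`, `unwrap_data`, `expand`) are made up for illustration.

```python
import json
import pandas as pd

# Trimmed-down stand-in for one scraped record: field names come from the
# question, values are shortened for illustration.
records = [{
    "StrategicResearchPriorities": {"data": [{"strategicAreaId": 0,
                                              "strategicAreaValue": "Population"}]},
    "IndustrialTransformationPriorities": None,
    "FieldsOfResearch": '{"data":[{"guid":1557,"code":200102,"percentage":"45"},'
                        '{"guid":1499,"code":180119,"percentage":"30"}]}',
    "Title": "X and Y",
    "ProjectCode": "0000001",
    "ChiefInvestigators": [{"FamilyName": "Surname1", "PersonOrdinal": 1},
                           {"FamilyName": "Surname2", "PersonOrdinal": 3}],
}]


def decode_json_strings(record):
    """Return a copy of record with JSON-looking string values parsed."""
    decoded = {}
    for key, value in record.items():
        if isinstance(value, str) and value.lstrip().startswith(("{", "[")):
            try:
                value = json.loads(value)
            except json.JSONDecodeError:
                pass  # not valid JSON after all, keep the raw string
        decoded[key] = value
    return decoded


def unwrap_data(value):
    """Turn {'data': [...]} into the inner list; None, {} or scalars become []."""
    if isinstance(value, dict):
        return value.get("data", [])
    return value if isinstance(value, list) else []


def expand(records, field, key="ProjectCode"):
    """Build one row per item of the nested list in `field`, tagged with `key`."""
    rows = [{key: rec[key], **item}
            for rec in records
            for item in unwrap_data(rec.get(field))]
    return pd.DataFrame(rows)


clean = [decode_json_strings(rec) for rec in records]

# One table per nested field, each linked back to its project.
fields_of_research = expand(clean, "FieldsOfResearch")
investigators = expand(clean, "ChiefInvestigators")
priorities = expand(clean, "StrategicResearchPriorities")

# Plain (non-nested) columns, one row per project.
scalars = pd.DataFrame([{k: v for k, v in rec.items()
                         if not isinstance(v, (dict, list))} for rec in clean])

# Joining on the key gives a wide frame with one row per investigator.
merged = scalars.merge(investigators, on="ProjectCode",
                       suffixes=("", "_investigator"))
```

Keeping one table per nested field and merging on 'ProjectCode' only where needed avoids the row multiplication that comes from joining every nested list into a single frame. In the full data both the project and each investigator have a 'Title', which is why the merge passes `suffixes`.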

【Comments】:

    Tags: python json pandas


    【Solution 1】:

    【Discussion】:
