【Question title】: Accessing pandas dataframe columns that are not in the index
【Posted】: 2021-01-08 07:34:17
【Question】:

I am trying to extract some fields from the following nested JSON and write them to a separate CSV file:

{
  "AccountID": "00000000-0000-0000-0000-000000000000",
  "LocationID": "00000000-0000-0000-0000-000000000000",
  "CreatedBy": "string",
  "ModifiedBy": "string",
  "Created": "string",
  "Modified": "string",
  "LocationData": {
    "KeyFields": {},
    "DisplayPoint": {
      "Type": "Calculated",
      "Latitude": 0.0,
      "Longitude": 0.0,
      "VerificationType": "Client"
    },
    "BusinessStatus": "Open",
    "Status": "Active",
    "BusinessName": {
      "Name": "string",
      "LongName": "string",
      "Locale": "Not_set"
    },
    "BusinessDescription": {
      "Description": "string",
      "ShortDescription": "string",
      "LongDescription": "string"
    },
    "PrimaryAddress": {
      "AddressLine1": "string",
      "AddressLine2": "string",
      "AddressLine3": "string",
      "AddressLine4": "string",
      "AddressLine5": "string",
      "Neighborhood": "string",
      "Locality": "string",
      "Region": "string",
      "PostalCode": "string",
      "CountryCode": "string"
    },
    "PhoneNumbers": {
      "PrimaryPhoneNumber": "string",
      "Landline": "string",
      "Mobile": "string",
      "Fax": "string",
      "TollFree": "string"
    },
    "HoursOfOperationStructured": {
      "Su": {
        "Ranges": [
          {
            "StartTime": "string",
            "EndTime": "string"
          }
        ],
        "State": "Open",
        "AdditionalInfo": "string"
      },
      "Mo": {
        "Ranges": [
          {
            "StartTime": "string",
            "EndTime": "string"
          }
        ],
        "State": "Open",
        "AdditionalInfo": "string"
      },
      "Tu": {
        "Ranges": [
          {
            "StartTime": "string",
            "EndTime": "string"
          }
        ],
        "State": "Open",
        "AdditionalInfo": "string"
      },
      "We": {
        "Ranges": [
          {
            "StartTime": "string",
            "EndTime": "string"
          }
        ],
        "State": "Open",
        "AdditionalInfo": "string"
      },
      "Th": {
        "Ranges": [
          {
            "StartTime": "string",
            "EndTime": "string"
          }
        ],
        "State": "Open",
        "AdditionalInfo": "string"
      },
      "Fr": {
        "Ranges": [
          {
            "StartTime": "string",
            "EndTime": "string"
          }
        ],
        "State": "Open",
        "AdditionalInfo": "string"
      },
      "Sa": {
        "Ranges": [
          {
            "StartTime": "string",
            "EndTime": "string"
          }
        ],
        "State": "Open",
        "AdditionalInfo": "string"
      },
      "SpecialHours": [
        {
          "Date": "string",
          "Ranges": [
            {
              "StartTime": "string",
              "EndTime": "string"
            }
          ],
          "State": "Open",
          "AdditionalInfo": "string"
        }
      ]
    }
  }
}

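I can flatten the data with pandas and json_normalize, and then extract a field by referencing it by name, e.g. df['LocationData.PrimaryAddress.Locality']. This works for every field I need except the "StartTime" and "EndTime" values under "Ranges", which raise a KeyError.

When I try to extract the "StartTime" or "EndTime" of the range for any particular day by referencing it like this: df['LocationData.HoursOfOperationStructured.Su.Ranges.StartTime'], it returns: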

KeyError: "['LocationData.HoursOfOperationStructured.Su.Ranges.StartTime'] not in index"
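
For reference, the behavior can be reproduced with a minimal sketch. The record below is a hypothetical, trimmed-down version of the JSON above (not the real file), and the variable names are only illustrative:

```python
import pandas as pd

# Hypothetical, trimmed-down record with the same shape as the JSON above.
record = {
    "LocationID": "00000000-0000-0000-0000-000000000000",
    "LocationData": {
        "PrimaryAddress": {"Locality": "string"},
        "HoursOfOperationStructured": {
            "Su": {
                "Ranges": [{"StartTime": "string", "EndTime": "string"}],
                "State": "Open",
            }
        },
    },
}

df = pd.json_normalize(record)

# Nested dicts become dotted column names; the list under "Ranges" is kept as-is.
print(df.columns.tolist())

ranges_col = "LocationData.HoursOfOperationStructured.Su.Ranges"
print(type(df.loc[0, ranges_col]))  # the cell holds a Python list of dicts

# No ".Ranges.StartTime" column exists, which is why the lookup fails.
missing_col = ranges_col + ".StartTime"
print(missing_col in df.columns)
```

json_normalize flattens nested dictionaries but does not descend into lists, so the "Ranges" cell stays a list.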

How can I use pandas to access the "StartTime"/"EndTime" columns for every day in this file?

【Comments】:

    Tags: json python-3.x pandas dataframe


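    【Solution 1】:

    The df['LocationData.HoursOfOperationStructured.Su.Ranges'] column and all the other similar columns are "under-normalized": each cell contains a single-element list holding a dictionary with the keys "StartTime" and "EndTime". You can convert these columns of dictionaries into "real" columns in a loop and then join the result with the original dataframe: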

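    import pandas as pd

    # Columns produced by json_normalize that still hold a list of dicts
    ranges = df.columns[df.columns.str.match('.*Ranges.*')]
    # Take the first (only) dict in each list and expand it into prefixed columns
    missing = [df[r].str[0].apply(pd.Series)
                    .rename(columns={'StartTime' : f"{r}.StartTime",
                                     'EndTime'   : f"{r}.EndTime"})
               for r in ranges]
    df = df.join(pd.concat(missing, axis=1))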
    

    It's ugly, but it works.
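
The approach above keeps only the first entry of each "Ranges" list. If a day can hold more than one range, a long-format table may be easier to work with. What follows is only a sketch of that alternative, not part of the original answer: the record, the time values, and the names `prefix`, `range_cols`, and `hours` are all made up for illustration, and it relies on `DataFrame.explode` plus a second `pd.json_normalize` call.

```python
import pandas as pd

# Hypothetical record shaped like the JSON in the question; the time values
# are invented, and "Mo" has two ranges to show the multi-range case.
record = {
    "LocationID": "00000000-0000-0000-0000-000000000000",
    "LocationData": {
        "HoursOfOperationStructured": {
            "Su": {
                "Ranges": [{"StartTime": "10:00", "EndTime": "16:00"}],
                "State": "Open",
            },
            "Mo": {
                "Ranges": [
                    {"StartTime": "09:00", "EndTime": "12:00"},
                    {"StartTime": "13:00", "EndTime": "17:00"},
                ],
                "State": "Open",
            },
        }
    },
}

df = pd.json_normalize(record)
prefix = "LocationData.HoursOfOperationStructured."
range_cols = [c for c in df.columns
              if c.startswith(prefix) and c.endswith(".Ranges")]

frames = []
for col in range_cols:
    day = col[len(prefix):-len(".Ranges")]  # e.g. "Su"
    # One row per range; rows from empty lists become NaN and are dropped.
    exploded = df[["LocationID", col]].explode(col).dropna(subset=[col])
    if exploded.empty:
        continue
    # Each remaining cell is a dict, so it can be normalized into columns.
    times = pd.json_normalize(exploded[col].tolist())
    times.insert(0, "Day", day)
    times.insert(0, "LocationID", exploded["LocationID"].tolist())
    frames.append(times)

# One row per (location, day, range)
hours = pd.concat(frames, ignore_index=True)
print(hours)
```

The resulting `hours` frame has the columns LocationID, Day, StartTime, and EndTime, and could be written out with `hours.to_csv(...)` like any other dataframe.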

    【Discussion】:
