【Question title】: Issues with creating json and/or xml
【Posted】: 2022-01-10 14:38:24
【Question】:

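I need help writing some Python code. For a sentence I am given, I need to create a JSON or XML document that contains, for each word, its position/index in the sentence, whether all of its characters are alphabetic, and the word itself. My first idea was to store keys and values in a simple dictionary and then convert the dictionary to JSON: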

import json
data = {}
liste = [] # it's for storing all the words after splitting them by space
sentence="As its price tag has been slashed to $1.7trn over a decade, half as much as first pitched, the hunger—or squid—games between progressives and moderates have turned fiercer."

liste = sentence.split(" ")
for word,index in zip(liste,range(0,len(liste))):
    data[word.lower()] = {"alpha":word.lower().isalpha()}
    data[word.lower()]['Word'] = word.lower()
    data[word.lower()]['Number'] = index
json_data = json.dumps(data,ensure_ascii=False)
print(json_data)

This prints the following JSON:

{"as": {"alpha": true, "Word": "as", "Number": 15}, "its": {"alpha": true, "Word": "its", "Number": 1}, "price": {"alpha": true, "Word": "price", "Number": 2}, "tag": {"alpha": true, "Word": "tag", "Number": 3}, "has": {"alpha": true, "Word": "has", "Number": 4}, "been": {"alpha": true, "Word": "been", "Number": 5}, "slashed": {"alpha": true, "Word": "slashed", "Number": 6}, "to": {"alpha": true, "Word": "to", "Number": 7}, "$1.7trn": {"alpha": false, "Word": "$1.7trn", "Number": 8}, "over": {"alpha": true, "Word": "over", "Number": 9}, "a": {"alpha": true, "Word": "a", "Number": 10}, "decade,": {"alpha": false, "Word": "decade,", "Number": 11}, "half": {"alpha": true, "Word": "half", "Number": 12}, "much": {"alpha": true, "Word": "much", "Number":14}, "first": {"alpha": true, "Word": "first", "Number": 16}, "pitched,": {"alpha": false, "Word": "pitched,", "Number": 17}, "the": {"alpha": true, "Word": "the", "Number": 18}, "hunger—or": {"alpha": false, "Word": "hunger—or", "Number": 19}, "squid—games": {"alpha": false, "Word": "squid—games", "Number": 20}, "between": {"alpha": true, "Word": "between", "Number": 21}, "progressives": {"alpha": true, "Word": "progressives", "Number": 22}, "and": {"alpha": true, "Word": "and", "Number": 23}, "moderates": {"alpha": true, "Word": "moderates", "Number": 24}, "have": {"alpha": true, "Word": "have", "Number": 25}, "turned": {"alpha": true, "Word": "turned", "Number": 26}, "fiercer.": {"alpha": false, "Word": "fiercer.", "Number": 27}}

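As you can see, this JSON is not correct: some words are missing (the other two "as"). After some research on Stack Overflow, I think I am starting to understand why: if I understand correctly, neither a dictionary nor a JSON object can hold the same key more than once. The problem is that in most English sentences some words are repeated.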

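Example sentence in English: As its price tag has been slashed to $1.7trn over a decade, half as much as first pitched, the hunger—or squid—games between progressives and moderates have turned fiercer.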

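In this sentence the word "as" occurs 3 times, so I think that in my code the dictionary key is overwritten twice. Is my reasoning correct? If so, what can I do to solve this? Can I somehow get around the unique-key restriction of dictionaries and JSON? Which data structure should I use, and how do I get JSON or XML as output?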

【Comments】:

  • You could take a look at collections.defaultdict and collections.Counter
  • Thanks @oc11, that's what I was looking for!
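
The suggestion in the comment above could be sketched roughly as follows. This is only an illustration under assumptions: the shortened sentence and the variable names `positions` and `counts` are made up for the example and do not come from the question. A `defaultdict(list)` keeps every index at which a word occurs, and a `Counter` keeps how often it occurs, so a repeated word no longer overwrites its earlier entry.

```python
import json
from collections import Counter, defaultdict

# Shortened example sentence (an assumption, chosen so that "as" repeats).
sentence = "half as much as first pitched"
words = sentence.lower().split(" ")

# word -> list of every index at which the word occurs
positions = defaultdict(list)
for index, word in enumerate(words):
    positions[word].append(index)

# word -> number of occurrences
counts = Counter(words)

# defaultdict is a dict subclass, so json.dumps accepts it directly.
print(json.dumps(positions, ensure_ascii=False))
print(counts["as"])
```

Each key is still unique here, but its value is a list, so no occurrence is lost.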

Tags: python json python-3.x xml dictionary


【Solution 1】:

In JSON you cannot get around this restriction, but you can add a JSON attribute to each word that counts its occurrences:

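key = word.lower()  # compute the key once; these two lines replace the first line of the loop body
data[key] = {"alpha": key.isalpha(), "Word": key, "occurrences": data[key]["occurrences"] + 1 if key in data else 1}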

As a side note, I strongly suggest you factor repeated expressions out into a variable (here at least word.lower()).
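
If every occurrence has to appear in the output with its own position, one possible alternative (not part of the answer above) is to use a list of records instead of a dictionary keyed by word, since a list has no unique-key restriction. The sketch below is a minimal illustration under assumptions: the helper names `build_records` and `to_xml`, the XML element names `sentence` and `word`, and the shortened sentence are all hypothetical. It uses only the standard `json` and `xml.etree.ElementTree` modules.

```python
import json
import xml.etree.ElementTree as ET


def build_records(sentence):
    """Return one record per word, in order, so repeated words are all kept."""
    records = []
    for index, word in enumerate(sentence.split(" ")):
        key = word.lower()  # computed once, as the side note suggests
        records.append({"Number": index, "Word": key, "alpha": key.isalpha()})
    return records


def to_xml(records):
    """Serialize the records as <sentence><word ...>text</word>...</sentence>."""
    root = ET.Element("sentence")  # element names are assumptions
    for record in records:
        element = ET.SubElement(
            root,
            "word",
            Number=str(record["Number"]),
            alpha=str(record["alpha"]).lower(),
        )
        element.text = record["Word"]
    return ET.tostring(root, encoding="unicode")


# Shortened example sentence (an assumption, chosen so that "as" repeats).
records = build_records("half as much as first pitched,")
print(json.dumps(records, ensure_ascii=False))
print(to_xml(records))
```

The JSON output is then an array of objects rather than one object, so both "as" entries survive with their own index.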

【Comments】:
