【Question Title】: Convert MongoDB collection to JSON file and read from the same JSON file in Python
【Posted】: 2016-04-16 19:00:05
【Question Description】:

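I'm trying to convert a MongoDB collection to a JSON file and then load the data from that same JSON file into another MongoDB collection. The collection has about 60,000 rows. I wrote the following code: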

from pymongo import MongoClient
import json
from bson.json_util import dumps
from bson import json_util

with open("collections/review.json", "w") as f:
    l = list(reviews_collection.find())  
    json.dump(json.dumps(l,default=json_util.default),f,indent = 4)

# reviews_collection_bkp.remove()
reviews_collection_bkp.remove()
with open("collections/review.json") as dataset:
    for line in dataset:
            data = json.loads(line)
            reviews_collection_bkp.insert({
                 "reviewId": data["reviewId"],
                 "business": data["business"],
                 "text": data["text"],
                 "stars": data['stars'],
                 "votes":data["votes"]
             })
print reviews_collection_bkp.find().count() 

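reviews_collection is the collection I want to write to the JSON file review.json; later I want to read from that same file to insert the data into another MongoDB collection. But I think the code is not creating a valid JSON file, because reading the file back produces the following error: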

    "reviewId": data["reviewId"],
TypeError: string indices must be integers

Why is the generated JSON file not in the correct format?

Here is sample output of line and data:

"[{\"votes\": {\"funny\": 0, \"useful\": 0, \"cool\": 0}, \"business\": \"wqu7ILomIOPSduRwoWp4AQ\", \"text\": \"Went for breakfast on 6/16/14. We received very good service and meal came within a few minutes.Waitress could have smiled more but was friendly. \\nI had a Grand Slam... it was more than enough food. \\nMeal was very tasty... We will definitely go back. \\nIt is a popular Denny's.\", \"reviewId\": \"0GS3S7UsRGI4B7ziy4cd7Q\", \"stars\": 4, \"_id\": {\"$oid\": \"5711d16fe396f81fcb51dc73\"}},...]

[{"votes": {"funny": 0, "useful": 0, "cool": 0}, "business": "wqu7ILomIOPSduRwoWp4AQ", "text": "Went for breakfast on 6/16/14. We received very good service and meal came within a few minutes.Waitress could have smiled more but was friendly. \nI had a Grand Slam... it was more than enough food. \nMeal was very tasty... We will definitely go back. \nIt is a popular Denny's.", "reviewId": "0GS3S7UsRGI4B7ziy4cd7Q", "stars": 4, "_id": {"$oid": "5711d16fe396f81fcb51dc73"}}......]

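【Question Discussion】:

  • What do you see when you inspect the .json file? Your data is a string; that's what the error is telling you.
  • Please post a sample of line and data.

Tags: python json mongodb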
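The TypeError can be reproduced with the standard library alone. A minimal sketch, assuming a hypothetical list of dicts stands in for list(reviews_collection.find()): json.dumps already returns a JSON string, and passing that string to json.dump encodes it a second time, so the file holds a single JSON string literal rather than an array. Because the top-level value is a string, indent=4 has nothing to indent and the whole collection lands on one line, which is why the read loop sees one line. Decoding it with json.loads then yields a str, and indexing a str with "reviewId" raises the error above.

```python
import io
import json

# Hypothetical sample standing in for list(reviews_collection.find()).
docs = [{"reviewId": "0GS3S7UsRGI4B7ziy4cd7Q", "stars": 4}]

f = io.StringIO()
# The question's export: json.dumps() already returns a str,
# and json.dump() then encodes that str a second time.
json.dump(json.dumps(docs), f, indent=4)
raw = f.getvalue()
print(raw[:20])           # starts with a quote: the file holds one JSON string

data = json.loads(raw)    # removes only the outer layer of encoding
print(type(data).__name__)  # str

message = None
try:
    data["reviewId"]      # indexing a str with a str key
except TypeError as exc:
    message = str(exc)
print(message)
```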


【Solution 1】:

Are you sure every line of the file is valid JSON?

I think this is the right way to do it:

with open("collections/review.json") as dataset:
    # json.load reads from a file object; json.loads expects a string
    data = json.load(dataset)
    for doc in data:
        reviews_collection_bkp.insert({
            "reviewId": doc['reviewId'],
            ...
        })

If that doesn't work, try printing the generated JSON file to see how it needs to be decoded.
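A minimal sketch of a round trip that encodes only once, assuming the documents fit in memory (60,000 reviews likely do). The list itself goes to json.dump, and the whole file is read back with json.load, which takes the file object (json.loads expects a string). The helper names export_docs and import_docs are hypothetical. For real MongoDB documents, the default and object_hook parameters are where bson.json_util.default and bson.json_util.object_hook would be passed so that values such as ObjectId survive; the sketch leaves them as None so it runs with the standard library only.

```python
import json
import os
import tempfile


def export_docs(docs, path, default=None):
    """Write all documents as one JSON array, encoding exactly once."""
    with open(path, "w") as f:
        json.dump(docs, f, default=default, indent=4)


def import_docs(path, object_hook=None):
    """Read the whole file back; json.load takes the file object itself."""
    with open(path) as f:
        return json.load(f, object_hook=object_hook)


# Hypothetical sample standing in for list(reviews_collection.find()).
docs = [
    {"reviewId": "0GS3S7UsRGI4B7ziy4cd7Q", "business": "wqu7ILomIOPSduRwoWp4AQ",
     "stars": 4, "votes": {"funny": 0, "useful": 0, "cool": 0}},
]

path = os.path.join(tempfile.mkdtemp(), "review.json")
export_docs(docs, path)
loaded = import_docs(path)

for doc in loaded:  # each item is now a dict, not a character of a string
    print(doc["reviewId"], doc["stars"])
```

With pymongo 3.x or later, the loaded list could then be written in one call with reviews_collection_bkp.insert_many(loaded) instead of one insert per document.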

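【Discussion】:

  • The JSON file is too large to print. I've added sample line and data output to the question itself.
  • Try it this way: data[0]['reviewId']. Each line is a list whose first item is a dict; decode the line, then access the dictionary.

【Solution 2】: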

Since your data is a list of dictionaries, you need to iterate over it.

for line in dataset:
    data = json.loads(line)
    for doc in data:
        # Read fields from each dict (doc), not from the list (data)
        reviews_collection_bkp.insert({
            "reviewId": doc["reviewId"],
            "business": doc["business"],
            "text": doc["text"],
            "stars": doc["stars"],
            "votes": doc["votes"]
        })
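This loop assumes the file holds one JSON value per line. If line-by-line reading is the goal, the export has to produce that layout: one document per line, a format often called JSON Lines. A sketch under that assumption, using the standard library and a hypothetical sample; for real documents, default=json_util.default on the write side and object_hook=json_util.object_hook on the read side are where bson.json_util would plug in. json.dumps without indent keeps each document on a single line, because newlines inside strings are escaped as \n.

```python
import json
import os
import tempfile

# Hypothetical sample standing in for reviews_collection.find().
docs = [
    {"reviewId": "a1", "text": "Line one.\nLine two.", "stars": 4},
    {"reviewId": "b2", "text": "Short review.", "stars": 2},
]

path = os.path.join(tempfile.mkdtemp(), "review.jsonl")

# Write: one compact JSON document per line (no indent, so no inner line breaks).
with open(path, "w") as f:
    for doc in docs:
        f.write(json.dumps(doc) + "\n")

# Read: each line really is one JSON object, so json.loads(line) gives a dict.
restored = []
with open(path) as f:
    for line in f:
        if line.strip():  # skip a possible blank line
            restored.append(json.loads(line))

print(len(restored), restored[0]["reviewId"])
```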

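【Discussion】:

  • Still getting the same result.
  • json.loads(dataset) throws TypeError: expected string or buffer, and after switching to json.load() it throws the string indices error.
  • @triple.s Is your data of type str?
  • Shouldn't it be a list?
  • Yes, that's why I suspect the JSON file itself wasn't created in a valid format!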
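The suspicion in the last comment points at the cause: the file is valid JSON, but its top-level value is a string whose content is the real JSON array. A minimal sketch for salvaging a file that was already written with json.dump(json.dumps(...), f), assuming Python 3: decode once, and if the result is still a str, decode it again. The helper name load_possibly_double_encoded is hypothetical; re-exporting the collection with a single encoding step is the cleaner fix.

```python
import io
import json


def load_possibly_double_encoded(fp, object_hook=None):
    """Decode a JSON file, unwrapping one extra layer if the top level is a str."""
    value = json.load(fp, object_hook=object_hook)
    if isinstance(value, str):
        # The file held a JSON string literal; its content is the actual JSON.
        value = json.loads(value, object_hook=object_hook)
    return value


# Hypothetical sample, exported the way the question does it.
docs = [{"reviewId": "0GS3S7UsRGI4B7ziy4cd7Q", "stars": 4}]
broken = io.StringIO()
json.dump(json.dumps(docs), broken, indent=4)
broken.seek(0)

fixed = load_possibly_double_encoded(broken)
print(fixed[0]["reviewId"])
```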