【Question title】: Using PyMongo, I need to fetch the fields of another collection
【Posted】: 2016-10-09 18:20:44
【Question】:

I need to build a query with PyMongo that fetches data from two related collections in a MongoDB database.

Collection X has the fields UserId, Name, and EmailId:

[
  {
    "UserId" :    "941AB",
    "Name" :      "Alex Andresson",
    "EmailId" :   "alex@example.com"
  },
  {
    "UserId" :    "768CD",
    "Name" :      "Bryan Barnes",
    "EmailId" :   "bryan@example.com"
  }
]   

Collection Y has the fields UserId1, UserId2, and Rating:

[
  {
    "UserId1" :  "941AB",
    "UserId2" :  "768CD",
    "Rating" :   0.8
  }
]

I need to print the name and email ID of both UserId1 and UserId2 together with the rating, like this:

[
  {
    "UserId1" :    "941AB",
    "UserName1" :  "Alex Andresson",
    "UserEmail1" : "alex@example.com",
    "UserId2" :    "768CD",
    "UserName2" :  "Bryan Barnes",
    "UserEmail2" : "bryan@example.com",
    "Rating":      0.8
  }
]

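This means I need to fetch data from both collection Y and collection X. I am currently using PyMongo, but I have not been able to find a solution. Could anyone give me pseudocode for this, or a pointer on how to proceed?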

【Question comments】:

    Tags: mongodb python-2.7 mongodb-query jupyter-notebook pymongo-2.x


    【Solution 1】:

    You need to do the join manually, or use a library that can do it for you, perhaps mongoengine.

    Basically, you first find the ratings you are interested in, and then find the users that those ratings refer to.

    Example:

    #!/usr/bin/env python3
    
    import pymongo
    from random import randrange
    
    client = pymongo.MongoClient()
    db = client['test']
    
    # clean collections
    db['users'].drop()
    db['ratings'].drop()
    
    # insert data
    user_count = 100
    rating_count = 20
    
    db['users'].insert_many([
        {'UserId': i, 'Name': 'John', 'EmailId': i}
        for i in range(user_count)])
    
    db['ratings'].insert_many([
        {'UserId1': randrange(user_count), 'UserId2': randrange(user_count), 'Rating': i}
        for i in range(rating_count)])
    
    # don't forget the indexes
    db['users'].create_index('UserId')
    # but it would be better if we used _id as the UserId
    
    # if you want to make queries based on Rating value, then add also this index:
    db['ratings'].create_index('Rating')
    
    # now print ratings with users that have value 10+
    
    # simple approach:
    ratings = db['ratings'].find({'Rating': {'$gte': 10}})
    for rating in ratings:
        u1 = db['users'].find_one({'UserId': rating['UserId1']})
        u2 = db['users'].find_one({'UserId': rating['UserId2']})
        print('Rating between {} (UserId {:2}) and {} (UserId {:2}) is {:2}'.format(
            u1['Name'], u1['UserId'], u2['Name'], u2['UserId'], rating['Rating']))
    
    print('---')
    
    # optimized approach:
    ratings = list(db['ratings'].find({'Rating': {'$gte': 10}}))
    user_ids = {r['UserId1'] for r in ratings}
    user_ids |= {r['UserId2'] for r in ratings}
    users = db['users'].find({'UserId': {'$in': list(user_ids)}})
    users_by_id = {u['UserId']: u for u in users}
    for rating in ratings:
        u1 = users_by_id.get(rating['UserId1'])
        u2 = users_by_id.get(rating['UserId2'])
        print('Rating between {} (UserId {:2}) and {} (UserId {:2}) is {:2}'.format(
            u1['Name'], u1['UserId'], u2['Name'], u2['UserId'], rating['Rating']))
    

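    Note that the first approach issues one find for the ratings plus two finds for every rating, whereas the second approach issues only two finds in total (one for the ratings and one for all referenced users). If you access MongoDB over a network, this makes a huge performance difference.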
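    To get the exact output shape asked for in the question, the join step of the optimized approach can be sketched as a plain Python function. This is only a minimal sketch: the function name `join_ratings_with_users` is hypothetical, the in-memory lists stand in for the results of the two `find` calls, and returning `None` for a missing user is an assumption about how you want to handle dangling references.

```python
def join_ratings_with_users(ratings, users):
    """Merge rating documents with user documents into one flat document per rating."""
    # Index users by UserId once, so each lookup is a dict access and not a query.
    users_by_id = {u['UserId']: u for u in users}
    joined = []
    for rating in ratings:
        row = {}
        for n in ('1', '2'):
            user_id = rating['UserId' + n]
            # Assumption: a rating may reference a user that does not exist;
            # in that case the name and email fields are set to None.
            user = users_by_id.get(user_id, {})
            row['UserId' + n] = user_id
            row['UserName' + n] = user.get('Name')
            row['UserEmail' + n] = user.get('EmailId')
        row['Rating'] = rating['Rating']
        joined.append(row)
    return joined


# Sample documents from the question, standing in for the results of find().
users = [
    {'UserId': '941AB', 'Name': 'Alex Andresson', 'EmailId': 'alex@example.com'},
    {'UserId': '768CD', 'Name': 'Bryan Barnes', 'EmailId': 'bryan@example.com'},
]
ratings = [
    {'UserId1': '941AB', 'UserId2': '768CD', 'Rating': 0.8},
]

result = join_ratings_with_users(ratings, users)
print(result)
```

    With real data, `ratings` and `users` would be the list of rating documents and the cursor returned by the `$in` query in the optimized approach above.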

    If possible, I would recommend using _id instead of UserId in the users collection.
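    As a rough illustration of that suggestion, a user document could be reshaped before insertion so that the user ID becomes the `_id` field. MongoDB always keeps a unique index on `_id`, so the separate index on `UserId` would no longer be needed. The helper name `to_id_keyed` is hypothetical.

```python
def to_id_keyed(user):
    """Return a copy of a user document with UserId moved into the _id field."""
    doc = {key: value for key, value in user.items() if key != 'UserId'}
    doc['_id'] = user['UserId']
    return doc


user = {'UserId': '941AB', 'Name': 'Alex Andresson', 'EmailId': 'alex@example.com'}
converted = to_id_keyed(user)
print(converted)
```

    Lookups would then filter on `_id` instead of `UserId`.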

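    Of course, this particular use case would be much easier with an SQL database. If you are using MongoDB for performance and reads outnumber writes, consider caching the relevant user names in the rating documents.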
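    A minimal sketch of that caching idea: copy both user names into the rating document before it is written. The helper name `embed_user_names` is hypothetical, and the cached field names `UserName1` and `UserName2` are borrowed from the output format in the question.

```python
def embed_user_names(rating, users_by_id):
    """Return a copy of a rating document with both user names cached in it."""
    cached = dict(rating)  # shallow copy, so the input document is left untouched
    cached['UserName1'] = users_by_id[rating['UserId1']]['Name']
    cached['UserName2'] = users_by_id[rating['UserId2']]['Name']
    return cached


# Users keyed by UserId, as built in the optimized approach.
users_by_id = {
    '941AB': {'UserId': '941AB', 'Name': 'Alex Andresson'},
    '768CD': {'UserId': '768CD', 'Name': 'Bryan Barnes'},
}
rating = {'UserId1': '941AB', 'UserId2': '768CD', 'Rating': 0.8}

cached_rating = embed_user_names(rating, users_by_id)
print(cached_rating)
```

    The trade-off is that reads need only the ratings collection, but every cached name has to be updated whenever a user changes their name.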

    【Comments】:
