【Question title】: What's the best way to calculate similarity between rows in a table based on association?
【Posted】: 2010-06-13 01:15:22
【Question】:

Suppose each person has a set of favorite books.

So I have the following tables:

  • Person
  • Book
  • Association between Person and Book (the join table for the MxN relationship)

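I want to find the people who are similar to Person1 based on the overlap of their favorite books: the more books they have in common, the more similar they are.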
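Stated as code, this measure is just the size of the intersection of two people's favorite-book sets. A minimal sketch, assuming each person's favorites are already loaded as integer book IDs; `Overlap` and `CountInCommon` are hypothetical names used only for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

static class Overlap
{
    // Number of book IDs present in both favorite lists.
    // Intersect yields distinct elements, so a book listed twice is counted once.
    public static int CountInCommon(IEnumerable<int> booksA, IEnumerable<int> booksB)
    {
        return booksA.Intersect(booksB).Count();
    }
}
```

For example, favorites {1, 2, 3, 4, 5} and {1, 4} share 2 books.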

I don't have to solve this with SQL alone; I can also do part of it in code. I'm using SQL Server 2008 and C#.

What solution would you experts use?

【Question discussion】:

    Tags: c# .net sql sql-server orm


    【Answer 1】:

    This may not be the most efficient approach, but it is relatively simple:

    WITH SimilarBookPrefs(person_id, similar_person_id, booksInCommon) AS
    (
     SELECT p1.person_id, p2.person_id AS similar_person_id,
       /* Find the number of books p1 and p2 have in common */
       (SELECT COUNT(*) FROM PersonBook pb1
          JOIN PersonBook pb2 ON pb1.book_id = pb2.book_id
        WHERE pb1.person_id = p1.person_id AND pb2.person_id = p2.person_id) AS booksInCommon
       FROM Person p1 CROSS JOIN Person p2
       /* Don't pair a person with themselves, or they would always rank first */
      WHERE p1.person_id <> p2.person_id
    )
    

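    This gives you, for each person, every other person and the number of books they have in common.

    To get the most similar person, add the following (in the same query):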

    SELECT TOP 1 similar_person_id FROM SimilarBookPrefs
       WHERE person_id = <person_to_match>
       ORDER BY booksInCommon DESC;
    

    The first part doesn't have to be a CTE (i.e. WITH ...); it could be a view or even a derived table. It's written as a CTE here for brevity.
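Because the CTE cross-joins every person with every other person and runs the counting subquery for each pair, it can get slow as the Person table grows. Since the question allows doing part of the work in C#, one alternative (not part of the answer above) is an inverted index: group the association rows by book, then count each pair of people who appear under the same book. A minimal sketch, assuming the join table has been loaded into memory as `(PersonId, BookId)` tuples; `PairCounts`, `CountPairs` and `MostSimilar` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;

static class PairCounts
{
    // For every ordered pair (p1, p2) with p1 != p2 that shares at least one book,
    // returns the number of books in common. Pairs sharing nothing are omitted.
    public static Dictionary<(int, int), int> CountPairs(
        IEnumerable<(int PersonId, int BookId)> personBooks)
    {
        var counts = new Dictionary<(int, int), int>();
        // Inverted index: the people who like each book (Distinct guards against duplicate rows).
        foreach (var fans in personBooks.GroupBy(pb => pb.BookId, pb => pb.PersonId))
        {
            var people = fans.Distinct().ToList();
            foreach (var p1 in people)
            {
                foreach (var p2 in people)
                {
                    if (p1 == p2) continue; // skip self-pairs, like the WHERE in the CTE
                    counts.TryGetValue((p1, p2), out int n); // n is 0 if the pair is new
                    counts[(p1, p2)] = n + 1;
                }
            }
        }
        return counts;
    }

    // In-memory counterpart of "SELECT TOP 1 ... ORDER BY booksInCommon DESC".
    // Returns null when the person shares no books with anyone; ties are broken arbitrarily.
    public static int? MostSimilar(Dictionary<(int, int), int> counts, int personId)
    {
        return counts.Where(kv => kv.Key.Item1 == personId)
                     .OrderByDescending(kv => kv.Value)
                     .Select(kv => (int?)kv.Key.Item2)
                     .FirstOrDefault();
    }
}
```

The work per book grows with the square of that book's fan count, so this pays off when most books are liked by relatively few people.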

    【Discussion】:

      【Answer 2】:

      If I were doing this in C#, I would probably approach it like this:

      var query = from personBook in personBooks
                  where personBook.PersonId != basePersonId // ID of person to match
                  join bookbase in personBooks
                  on personBook.BookId equals bookbase.BookId
                  where bookbase.PersonId == basePersonId // ID of person to match
                  join person in persons 
                  on personBook.PersonId equals person.Id 
                  group person by person into bookgroup
                  select new
                  {
                      Person = bookgroup.Key, 
                      BooksInCommon = bookgroup.Count()
                  };
      

      This could be done through Entity Framework or LINQ to SQL, or translated directly into SQL.

      Full sample code:

      using System;
      using System.Collections.Generic;
      using System.Linq;

      class CommonBooks
      {
          static void Main()
          {
              List<Person> persons = new List<Person>()
              {
                  new Person(1, "Jane"), new Person(2, "Joan"), new Person(3, "Jim"), new Person(4, "John"), new Person(5, "Jill")
              };
      
              List<Book> books = new List<Book>()
              {
                  new Book(1), new Book(2), new Book(3), new Book(4), new Book(5)
              };
      
              List<PersonBook> personBooks = new List<PersonBook>()
              {
                  new PersonBook(1,1), new PersonBook(1,2), new PersonBook(1,3), new PersonBook(1,4), new PersonBook(1,5), 
                  new PersonBook(2,2), new PersonBook(2,3), new PersonBook(2,5), 
                  new PersonBook(3,2), new PersonBook(3,4), new PersonBook(3,5), 
                  new PersonBook(4,1), new PersonBook(4,4),
                  new PersonBook(5,1), new PersonBook(5,3), new PersonBook(5,5)
              };
      
              int basePersonId = 4; // person to match likeness
      
              var query = from personBook in personBooks
                          where personBook.PersonId != basePersonId
                          join bookbase in personBooks
                          on personBook.BookId equals bookbase.BookId
                          where bookbase.PersonId == basePersonId
                          join person in persons
                          on personBook.PersonId equals person.Id
                          group person by person into bookgroup
                          select new
                          {
                              Person = bookgroup.Key,
                              BooksInCommon = bookgroup.Count()
                          };
      
              foreach (var item in query)
              {
                  Console.WriteLine("{0}\t{1}", item.Person.Name, item.BooksInCommon);
              }
      
              Console.Read();
          }
      }
      
      class Person
      {
          public int Id { get; set; }
          public string Name { get; set; }
          public Person(int id, string name) { Id = id; Name = name; }
      }
      
      class Book
      {
          public int Id { get; set; }
          public Book(int id) { Id = id; }
      }
      
      class PersonBook
      {
          public int PersonId { get; set; }
          public int BookId { get; set; }
          public PersonBook(int personId, int bookId) { PersonId = personId; BookId = bookId; }
      }
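The query above returns the matches in no particular order. To mirror Answer 1's `TOP 1 ... ORDER BY booksInCommon DESC`, the groups can be sorted by their count so the most similar person comes first. A minimal sketch of that variation, using `(PersonId, BookId)` tuples instead of the `PersonBook` class so it compiles on its own, and grouping by person ID rather than by the `Person` object; `Ranking` and `RankBySharedBooks` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;

static class Ranking
{
    // Same self-join as the LINQ query above, but grouped by person ID and sorted
    // by books in common (descending), with ties broken by the lower person ID.
    public static List<(int PersonId, int BooksInCommon)> RankBySharedBooks(
        IEnumerable<(int PersonId, int BookId)> personBooks, int basePersonId)
    {
        var list = personBooks.ToList();
        var query = from other in list
                    where other.PersonId != basePersonId
                    join mine in list on other.BookId equals mine.BookId
                    where mine.PersonId == basePersonId
                    group other by other.PersonId into g
                    orderby g.Count() descending, g.Key
                    select (PersonId: g.Key, BooksInCommon: g.Count());
        return query.ToList();
    }
}
```

With the sample data and `basePersonId = 4` (John), this ranks Jane (2 books) first, then Jim and Jill (1 each). Joan shares no books with John and, as in the original query, does not appear at all; taking the first element gives the single most similar person.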
      

      【Discussion】:

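        【Answer 3】:

        The problem you describe is usually called "collaborative filtering", and it is solved with "recommender systems". Searching for either term should turn up plenty of useful information.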
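Raw overlap counts tend to favor people who simply have long favorite lists. A common normalization in collaborative filtering, which none of the answers above use, is the Jaccard index |A ∩ B| / |A ∪ B|. A minimal sketch, assuming each person's favorites are held in a set of book IDs; `Jaccard.Similarity` is a hypothetical helper:

```csharp
using System.Collections.Generic;

static class Jaccard
{
    // Jaccard index |A ∩ B| / |A ∪ B|: 0.0 means nothing shared, 1.0 means identical sets.
    // Returns 0.0 when both sets are empty to avoid dividing by zero.
    public static double Similarity(ISet<int> a, ISet<int> b)
    {
        if (a.Count == 0 && b.Count == 0) return 0.0;
        int common = 0;
        foreach (int book in a)
        {
            if (b.Contains(book)) common++;
        }
        int union = a.Count + b.Count - common; // |A ∪ B| = |A| + |B| - |A ∩ B|
        return (double)common / union;
    }
}
```

For John ({1, 4}) in Answer 2's sample data, Jane ({1, 2, 3, 4, 5}) scores 2/5 = 0.4, while Jim ({2, 4, 5}) and Jill ({1, 3, 5}) each score 1/4 = 0.25.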

        【Discussion】:
