【Question Title】: How does DISTINCT work when using JPA and Hibernate
【Posted】: 2010-11-23 16:32:11
【Question】:

Which column does DISTINCT work with in JPA, and is it possible to change it?

Here is an example JPA query using DISTINCT:

select DISTINCT c from Customer c

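That doesn't make much sense: which column is the distinct based on? Is it specified on the entity as an annotation? I couldn't find one.

I would like to specify the column to make the distinction on, something like: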

select DISTINCT(c.name) c from Customer c

I'm using MySQL and Hibernate.

【Comments】:

  • What specific role does @Id play in the lifecycle of an entity?

Tags: java jpa distinct


【Answer 1】:

You're close.

select DISTINCT(c.name) from Customer c

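【Comments】:

  • This will only return an array of that column. How can I return whole entities with this approach?
  • @cen - What you're asking for isn't logical. If I have two customers (id=1234, name="Joe Customer") and (id=2345, name="Joe Customer"), which one should such a query return? The result would be undefined. Now, you could force it with something like this (not quite sure how the syntax would work, but this should give the general idea): select c from Customer c where id in (select min(d.id) from Customer d group by d.name) ... but that depends on the situation, because you need to come up with a way of selecting one of the entities based on the attributes available to you.
  • @Jules - In this case you usually don't really care which one is returned, so any selection technique would do. I think MySQL even handles this scenario by default. I don't remember my exact use case from 2 years ago.
  • @Jules Is there any way to map the returned object array to the entity?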
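The "one entity per distinct name" idea from the comments (keep the row with the smallest id for each name) can be illustrated in plain Java. This is only an in-memory sketch of what the suggested subquery is meant to select, not JPQL and not anything Hibernate runs. The Customer class and the third sample customer are hypothetical stand-ins introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OnePerNameDemo {

    // Hypothetical stand-in for the Customer entity; only id and name matter here.
    static final class Customer {
        final long id;
        final String name;

        Customer(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Keeps, for every distinct name, the customer with the smallest id
    // (the same rule as "select min(d.id) ... group by d.name").
    static List<Customer> onePerName(List<Customer> customers) {
        Map<String, Customer> byName = new LinkedHashMap<>();
        for (Customer c : customers) {
            byName.merge(c.name, c, (kept, next) -> next.id < kept.id ? next : kept);
        }
        return new ArrayList<>(byName.values());
    }

    public static void main(String[] args) {
        List<Customer> result = onePerName(Arrays.asList(
            new Customer(2345, "Joe Customer"),
            new Customer(1234, "Joe Customer"),
            new Customer(3456, "Ann Customer"))); // made-up extra row
        for (Customer c : result) {
            System.out.println(c.id + " " + c.name);
        }
    }
}
```

The tie-breaking rule (smallest id) is arbitrary. Any deterministic rule works, which is the point made in the comment: the query has to say which of the duplicates it wants.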
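【Answer 2】:

Depending on the underlying JPQL or Criteria API query type, DISTINCT has two meanings in JPA.

Scalar queries

For scalar queries, which return a scalar projection, like the following query: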

List<Integer> publicationYears = entityManager
.createQuery(
    "select distinct year(p.createdOn) " +
    "from Post p " +
    "order by year(p.createdOn)", Integer.class)
.getResultList();

LOGGER.info("Publication years: {}", publicationYears);

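the DISTINCT keyword should be passed to the underlying SQL statement, since we want the DB engine to filter out duplicates before returning the result set: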

SELECT DISTINCT
    extract(YEAR FROM p.created_on) AS col_0_0_
FROM
    post p
ORDER BY
    extract(YEAR FROM p.created_on)

-- Publication years: [2016, 2018]
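As a rough in-memory analogy of that scalar query, the same "project, de-duplicate, order" steps can be written with Java streams. This is a sketch for intuition only: in the real query the database does this work, and the sample dates below are made up so that the years match the output shown above.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ScalarDistinctDemo {

    // Mirrors "select distinct year(createdOn) ... order by year(createdOn)"
    // over a list of in-memory dates.
    static List<Integer> distinctYears(List<LocalDate> createdOn) {
        return createdOn.stream()
            .map(LocalDate::getYear) // projection: year(p.createdOn)
            .distinct()              // DISTINCT
            .sorted()                // ORDER BY
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample values, not taken from the original data set.
        List<LocalDate> createdOn = Arrays.asList(
            LocalDate.of(2018, 5, 1),
            LocalDate.of(2016, 8, 30),
            LocalDate.of(2018, 11, 12),
            LocalDate.of(2016, 1, 2));
        System.out.println("Publication years: " + distinctYears(createdOn));
    }
}
```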

Entity queries

For entity queries, DISTINCT has a different meaning.

Without using DISTINCT, a query like the following one:

List<Post> posts = entityManager
.createQuery(
    "select p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

is going to JOIN the post and the post_comment tables like this:

SELECT p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1, 1]

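But the parent post records are duplicated in the result set, once for each associated post_comment row. For this reason, the List of Post entities will contain duplicate Post entity references.

To eliminate the duplicate Post entity references, we need to use DISTINCT: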

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();
 
LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

But then DISTINCT is also passed to the SQL query, and that's not desirable at all:

SELECT DISTINCT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'
 
-- Fetched the following Post entity identifiers: [1]

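Passing DISTINCT to the SQL query makes the execution plan run an extra Sort phase. That adds overhead without bringing any value, since the child PK column already makes every parent-child combination unique: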

Unique  (cost=23.71..23.72 rows=1 width=1068) (actual time=0.131..0.132 rows=2 loops=1)
  ->  Sort  (cost=23.71..23.71 rows=1 width=1068) (actual time=0.131..0.131 rows=2 loops=1)
        Sort Key: p.id, pc.id, p.created_on, pc.post_id, pc.review
        Sort Method: quicksort  Memory: 25kB
        ->  Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.054..0.058 rows=2 loops=1)
              Hash Cond: (pc.post_id = p.id)
              ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.010..0.010 rows=2 loops=1)
              ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.027..0.027 rows=1 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.017..0.018 rows=1 loops=1)
                          Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
                          Rows Removed by Filter: 3
Planning time: 0.227 ms
Execution time: 0.179 ms

Entity queries with HINT_PASS_DISTINCT_THROUGH

To eliminate the Sort phase from the execution plan, we need to use the HINT_PASS_DISTINCT_THROUGH JPA query hint:

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.setHint(QueryHints.HINT_PASS_DISTINCT_THROUGH, false)
.getResultList();
 
LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

Now, the SQL query will not contain DISTINCT, but the duplicate Post entity references are still going to be removed:

SELECT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'
 
-- Fetched the following Post entity identifiers: [1]

And the execution plan confirms that we no longer have an extra Sort phase this time:

Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.066..0.069 rows=2 loops=1)
  Hash Cond: (pc.post_id = p.id)
  ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.011..0.011 rows=2 loops=1)
  ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.041..0.041 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.036..0.037 rows=1 loops=1)
              Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
              Rows Removed by Filter: 3
Planning time: 1.184 ms
Execution time: 0.160 ms
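With the hint set to false, the duplicates have to be dropped on the Java side after the rows have been read. The following is a simplified model of that idea, not Hibernate's actual code: each joined row yields a reference to the same parent object, and an order-preserving identity-based set collapses the repeats. The Post and Row classes, the method names, and the sample reviews are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

class JoinFetchDuplicatesDemo {

    // Hypothetical stand-in for the Post entity.
    static final class Post {
        final long id;
        final List<String> comments = new ArrayList<>();

        Post(long id) {
            this.id = id;
        }
    }

    // One row of the post/post_comment join: parent id plus child review.
    static final class Row {
        final long postId;
        final String review;

        Row(long postId, String review) {
            this.postId = postId;
            this.review = review;
        }
    }

    // Produces one list element per row, reusing the same Post instance
    // whenever the parent id repeats. This is why the list holds duplicates.
    static List<Post> hydrate(List<Row> rows) {
        Map<Long, Post> postsById = new HashMap<>();
        List<Post> result = new ArrayList<>();
        for (Row row : rows) {
            Post post = postsById.computeIfAbsent(row.postId, Post::new);
            post.comments.add(row.review);
            result.add(post);
        }
        return result;
    }

    // In-memory de-duplication that keeps the first occurrence of each reference.
    // Post does not override equals/hashCode, so the set compares by identity.
    static List<Post> distinct(List<Post> posts) {
        return new ArrayList<>(new LinkedHashSet<>(posts));
    }

    public static void main(String[] args) {
        // Made-up rows: one post with two comments.
        List<Post> posts = hydrate(Arrays.asList(
            new Row(1L, "Excellent!"),
            new Row(1L, "Great!")));
        System.out.println("Without de-duplication: " + posts.size() + " references");
        System.out.println("With de-duplication: " + distinct(posts).size() + " reference(s)");
    }
}
```

Because the child rows are still needed to populate the comments collection, filtering them in SQL would not help here. Only the parent references in the returned list need collapsing.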

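【Comments】:

  • Bought it last week, haven't finished it yet though ;-) Probably the best IT book I've read
  • Thanks, very helpful answer!! After reading the article you mentioned here and the Spring Data JPA reference docs, I got this working in my Spring Data JPA repository by adding this annotation on top of the method: @QueryHints(@QueryHint(name = "hibernate.query.passDistinctThrough", value = "false"))
  • @dk7 This is exactly what I was looking for. Thanks!
  • But the planning time increased. Why is that?
  • @İsmailYavuz PASS_DISTINCT_THROUGH was implemented by HHH-10965 and has been available since Hibernate ORM 5.2.2. Spring Boot 1.5.9 is very old and uses Hibernate ORM 5.0.12. So, if you want to benefit from these awesome features, you need to upgrade your dependencies.
【Answer 3】: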
@Entity
@NamedQuery(name = "Customer.listUniqueNames", 
            query = "SELECT DISTINCT c.name FROM Customer c")
public class Customer {
        ...

        private String name;

        // Returns every distinct customer name via the named query declared above.
        public static List<String> listUniqueNames() {
             return getEntityManager().createNamedQuery(
                   "Customer.listUniqueNames", String.class)
                   .getResultList();
        }
}

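【Comments】:

【Answer 4】:

Update: Please see the top-voted answer.

My own answer is now obsolete. It is kept here only for historical reasons.


Distinct in HQL is usually needed in joins, not in simple examples like your own.

See also How do you create a Distinct query in HQL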

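【Comments】:

  • No offense, but how could this ever be accepted as the answer?
  • It was the only valid answer from 2009 to 2012
【Answer 5】:

I agree with kazanaki's answer, and it helped me. I wanted to select the whole entity, so I used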

     select DISTINCT(c) from Customer c
    

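In my case I have a many-to-many relationship, and I want to load the entities together with their collections in one query.

I used LEFT JOIN FETCH, and at the end I had to make the result distinct.

【Comments】:

【Answer 6】:

I would use JPA's constructor expression feature. See also the following answer:

JPQL Constructor Expression - org.hibernate.hql.ast.QuerySyntaxException:Table is not mapped

Following the example in the question, it would be something like this.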

      SELECT DISTINCT new com.mypackage.MyNameType(c.name) from Customer c
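For that constructor expression to work, the target class needs a constructor whose parameters line up with the selected values. The sketch below shows what such a class might look like, plus an in-memory analogue of the query's effect. It assumes MyNameType is a plain, non-entity class with a single String constructor. The class body and the distinctNames helper are hypothetical, and no JPA provider is involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ConstructorExpressionDemo {

    // Hypothetical DTO matching "new com.mypackage.MyNameType(c.name)":
    // one constructor argument per selected value, in the same order.
    static final class MyNameType {
        private final String name;

        MyNameType(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // In-memory analogue of the query: de-duplicate the selected names,
    // then wrap each remaining value in the DTO.
    static List<MyNameType> distinctNames(List<String> customerNames) {
        return customerNames.stream()
            .distinct()
            .map(MyNameType::new)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample names.
        List<MyNameType> names = distinctNames(
            Arrays.asList("Joe Customer", "Joe Customer", "Ann Customer"));
        for (MyNameType n : names) {
            System.out.println(n.getName());
        }
    }
}
```

Here DISTINCT applies to the selected column values rather than to the DTO instances, so the DTO does not need equals or hashCode for the de-duplication to work.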
      

【Comments】:
