【Question title】: trigram and ILIKE simultaneously
【Posted】: 2018-02-22 13:07:49
【Question】:

I have a column with a GIN index built using gin_trgm_ops.

I am running a similarity search for the term mad.

I get:

god-made
made
man
man-made
may

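But it misses some words, such as srimad.

I would like to select the top 5 matches for ILIKE '%mad%' / 'mad%', then the top 5 trigram matches, and combine the results.

After implementing the solution:

My SQL query and its EXPLAIN output: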

EXPLAIN (COSTS OFF)
(SELECT word_similarity('mad',word), word FROM articles_words WHERE word ILIKE '%mad%' ORDER BY word_similarity('mad',word) DESC LIMIT 10) 
UNION 
(SELECT word_similarity('mad',word),word FROM articles_words WHERE word_similarity('mad',word) > 0.4 ORDER BY word_similarity('mad',word) DESC, word LIMIT 10)

  "QUERY PLAN"
"HashAggregate"
"  Group Key: (word_similarity('mad'::text, articles_words.word)), articles_words.word"
"  ->  Append"
"        ->  Limit"
"              ->  Sort"
"                    Sort Key: (word_similarity('mad'::text, articles_words.word)) DESC"
"                    ->  Bitmap Heap Scan on articles_words"
"                          Recheck Cond: (word ~~* '%mad%'::text)"
"                          ->  Bitmap Index Scan on words_idx"
"                                Index Cond: (word ~~* '%mad%'::text)"
"        ->  Limit"
"              ->  Sort"
"                    Sort Key: (word_similarity('mad'::text, articles_words_1.word)) DESC, articles_words_1.word"
"                    ->  Seq Scan on articles_words articles_words_1"
"                          Filter: (word_similarity('mad'::text, word) > '0.40000000000000002'::double precision)"

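The plan above shows a Seq Scan for the second branch: the predicate is written as a function call compared against a constant, which a trigram index cannot serve. A minimal sketch of the operator form, which pg_trgm's index operator classes can support, might look like the following. It assumes the same articles_words table and a pg_trgm version that provides the <% operator and the pg_trgm.word_similarity_threshold setting; whether the planner actually picks the index depends on table size and statistics.

```sql
-- Assumption: pg_trgm is installed and articles_words.word carries a trigram index.
-- The default word-similarity threshold is 0.6; lower it to mirror the 0.4 cutoff.
SET pg_trgm.word_similarity_threshold = 0.4;

EXPLAIN (COSTS OFF)
SELECT word_similarity('mad', word) AS sim, word
FROM articles_words
WHERE 'mad' <% word          -- operator form of the word_similarity threshold test
ORDER BY sim DESC, word
LIMIT 10;
```

The filter is roughly equivalent to the original word_similarity('mad', word) > 0.4 condition, apart from how a value exactly at the threshold is treated.
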
I also have a question about UNION:

First part of the query:

(SELECT word_similarity('mad',word), word FROM articles_words WHERE word ILIKE '%mad%' ORDER BY word_similarity('mad',word) DESC LIMIT 10)

0.75 man-made
0.75 made
0.75 god-made
0.5 srimad-bhagavatam
0.5 srimad

Second part of the query:

(SELECT word_similarity('mad',word),word FROM articles_words WHERE word_similarity('mad',word) > 0.4 ORDER BY word_similarity('mad',word) DESC, word LIMIT 10)

0.75 god-made
0.75 made
0.75 man-made
0.5 anti-material
0.5 half-man
0.5 magistrate
0.5 maha
0.5 maha-mantra
0.5 mahaprabhu
0.5 maharaja

I want the result to be:

0.75 man-made
0.75 made
0.75 god-made
0.5 srimad-bhagavatam
0.5 srimad
0.5 anti-material
0.5 half-man
0.5 magistrate
0.5 maha
0.5 maha-mantra
0.5 mahaprabhu
0.5 maharaja

But I get the rows in this order:

0.75 god-made
0.5 maha
0.5 anti-material
0.5 mahaprabhu
0.5 maharaja
0.5 srimad
0.5 half-man
0.5 magistrate
0.5 srimad-bhagavatam
0.75 made
0.75 man-made
0.5 maha-mantra

【Discussion】:

    Tags: postgresql trigram


    【Solution 1】:

    You should use a GiST index instead.

    Given the following table:

    test=> TABLE trigram;
     id |   val    
    ----+----------
      1 | god-made
      2 | made
      3 | man
      5 | man-made
      4 | may
      6 | srimad
    ...
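
To reproduce the table above, a setup along these lines should work. This is only a sketch: the column types and the primary key are assumptions, and only the rows actually shown are inserted (the elided rows are left out).

```sql
-- pg_trgm provides the trigram operators and the gist_trgm_ops / gin_trgm_ops classes.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Assumed schema: the answer only shows the column names id and val.
CREATE TABLE trigram (
    id  integer PRIMARY KEY,
    val text NOT NULL
);

INSERT INTO trigram (id, val) VALUES
    (1, 'god-made'),
    (2, 'made'),
    (3, 'man'),
    (5, 'man-made'),
    (4, 'may'),
    (6, 'srimad');
```

With so few rows the planner will usually prefer a sequential scan; running SET enable_seqscan = off; in a test session is one way to see whether an index can be used at all.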
    

    You can create the index like this:

    CREATE INDEX ON trigram USING gist (val gist_trgm_ops);
    

    It can be used in a query like this:

    EXPLAIN (COSTS off)
    (SELECT id, val
     FROM trigram
     WHERE val ILIKE '%mad%'
     LIMIT 5)
    UNION
    (SELECT id, val
     FROM trigram
     ORDER BY val <-> 'mad'
     LIMIT 5);
                                      QUERY PLAN                                   
    -------------------------------------------------------------------------------
     HashAggregate
       Group Key: trigram.id, trigram.val
       ->  Append
             ->  Limit
                   ->  Index Scan using trigram_val_idx on trigram
                         Index Cond: (val ~~* '%mad%'::text)
             ->  Subquery Scan on "*SELECT* 2"
                   ->  Limit
                         ->  Index Scan using trigram_val_idx on trigram trigram_1
                               Order By: (val <-> 'mad'::text)
    (10 rows)
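
The second branch relies on the <-> operator, which pg_trgm defines as a distance equal to one minus similarity(). A short sketch against the same trigram table makes that relationship visible; the column aliases sim and dist are illustrative only.

```sql
-- Nearest-neighbour search: a smaller distance means a closer match.
SELECT id,
       val,
       similarity(val, 'mad') AS sim,
       val <-> 'mad'          AS dist   -- dist = 1 - sim
FROM trigram
ORDER BY val <-> 'mad'                  -- this ordering is what the GiST index can serve
LIMIT 5;
```

Note that the question ranks by word_similarity() rather than similarity(). pg_trgm also documents a <<-> operator (one minus word_similarity()), so ORDER BY 'mad' <<-> val may be the closer analogue, assuming the installed version provides it.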
    

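    【Comments】:

    • Why use GiST rather than GIN?
    • Because ORDER BY val <-> 'mad' cannot use a GIN index.
    • Also, in the UNION I want the order to be: first all items of the first set, then the items of the second set that are not duplicates.
    • You did not show the query or the EXPLAIN output. In my query, a GIN index cannot be used.
    • Huh? I don't understand your latest comment.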
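
On the ordering question raised in the comments: UNION removes duplicates (here through a HashAggregate) and gives no guarantee about row order, so the order has to be stated explicitly. One possible sketch, not taken from the thread, tags each branch with a source number, combines them with UNION ALL, keeps the first-branch copy of any duplicated word, and sorts at the outermost level. The names sim, src, both_branches, and deduped are illustrative.

```sql
SELECT sim, word
FROM (
    -- Keep one row per word; ORDER BY word, src makes the branch-1 copy win.
    SELECT DISTINCT ON (word) sim, word, src
    FROM (
        (SELECT word_similarity('mad', word) AS sim, word, 1 AS src
         FROM articles_words
         WHERE word ILIKE '%mad%'
         ORDER BY sim DESC
         LIMIT 10)
        UNION ALL
        (SELECT word_similarity('mad', word) AS sim, word, 2 AS src
         FROM articles_words
         WHERE word_similarity('mad', word) > 0.4
         ORDER BY sim DESC, word
         LIMIT 10)
    ) AS both_branches
    ORDER BY word, src
) AS deduped
-- ILIKE matches first, then the remaining trigram matches.
ORDER BY src, sim DESC, word;
```

Ties within a branch are broken alphabetically here, so rows with equal similarity may appear in a different order than in the desired listing in the question; add another sort key if a specific tie order matters.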