[Posted]: 2011-10-03 16:22:36
[Question]:
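My mind has been blank for the past two days...
I want to use Slovenian letters in Sphinx search: all the English letters plus č, ž, š (and ć, just in case).
I searched the web for a ready-made charset_table with the right characters, but found squat...
So I decided to build it myself, step by step...
Here is my index: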
index classifieds
{
source = classifieds_src
path = c:\Sphinx\data\classifieds
docinfo = extern
min_infix_len = 2
infix_fields = title,keywords,summary,text
expand_keywords = 1
enable_star = 1
charset_type = utf-8
charset_table = 0..9, a..z, _, A..Z->a..z,-, U+002C, \
U+010C->U+010D, U+0106->U+0107, U+0160->U+0161, U+017D->U+017E, \
U+010D->c,U+0107->c, U+0161->s, U+017E->z, \
U+010D, U+0107, U+0161, U+017E
}
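I mapped the uppercase Č, Ć, Š, Ž to their lowercase counterparts, added mappings from č to c, ć to c, š to s and ž to z, and finally added those four lowercase characters to the table on their own...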
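To make the intent of that table concrete, here is a minimal PHP sketch of the normalization the mappings are meant to produce: lowercase first, then fold č/ć/š/ž to c/c/s/z, so that "Čiška", "čiška" and "ciska" all end up as the same keyword. The function name foldSlovenian is hypothetical, it requires the mbstring extension, and it only mimics the intended result; it is not how Sphinx applies charset_table internally.

```php
<?php
// Hypothetical helper: mimics the folding the charset_table above is meant to do
// (uppercase -> lowercase, then č/ć/š/ž -> c/c/s/z). Not Sphinx's own implementation.
function foldSlovenian(string $word): string
{
    // mb_strtolower handles Č -> č, Ć -> ć, Š -> š, Ž -> ž for UTF-8 input
    $lower = mb_strtolower($word, 'UTF-8');

    // strtr with an array replaces whole (multi-byte) substrings, so UTF-8 is safe here
    return strtr($lower, ['č' => 'c', 'ć' => 'c', 'š' => 's', 'ž' => 'z']);
}

// Prints: miška -> miska, Čiška -> ciska, čiška -> ciska, miska -> miska
foreach (['miška', 'Čiška', 'čiška', 'miska'] as $w) {
    echo $w, ' -> ', foldSlovenian($w), PHP_EOL;
}
```

With this kind of folding, the test cases "miška" and "miska" would look up the same keyword, which is what the test list below expects.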
These are my classified titles:
t1: HP USB optična miška za prenosnik RH304
t2: Čiška PCplus MO-U033+F2 (optična, brezžična, PS/2)
t3: Miška Logitech optična Nano M235 siva
DB collation: utf8_general_ci
Table collation: utf8_general_ci
Title field collation: utf8_general_ci
Test cases:
$testcase = array(
"miška",
"mi*ka",
"Čiška",
"čiška",
"miska",
"usb prenosnik",
"prenosnik miska",
"miška usb"
);
//api settings:
$this->sphinx->SetArrayResult(true);
$this->sphinx->setLimits(0, 100);
$this->sphinx->setMatchMode(SPH_MATCH_EXTENDED2);
$this->sphinx->SetSortMode(SPH_SORT_RELEVANCE, '@weight DESC');
$this->sphinx->SetRankingMode(SPH_RANK_PROXIMITY_BM25);
$this->sphinx->SetFieldWeights(array("title"=>100, "keywords"=>80, "summary"=>60,
"text"=>20, "slug"=>10));
And finally, the test results.
Format: query (total / total_found), followed by the per-keyword statistics ("words"):
miška (0/0)
Array
(
[*miška*] => Array
(
[docs] => 0
[hits] => 0
)
[miška] => Array
(
[docs] => 0
[hits] => 0
)
)
mi*ka (0/0)
Array
(
[*mi*] => Array
(
[docs] => 3
[hits] => 4
)
[mi] => Array
(
[docs] => 1
[hits] => 1
)
[*2aka*] => Array
(
[docs] => 0
[hits] => 0
)
[2aka] => Array
(
[docs] => 0
[hits] => 0
)
)
Čiška (0/0)
Array
(
[*čiška*] => Array
(
[docs] => 0
[hits] => 0
)
[čiška] => Array
(
[docs] => 0
[hits] => 0
)
)
čiška (0/0)
Array
(
[*čiška*] => Array
(
[docs] => 0
[hits] => 0
)
[čiška] => Array
(
[docs] => 0
[hits] => 0
)
)
miska (0/0)
Array
(
[*miska*] => Array
(
[docs] => 0
[hits] => 0
)
[miska] => Array
(
[docs] => 0
[hits] => 0
)
)
usb prenosnik (1/1)
Array
(
[*usb*] => Array
(
[docs] => 1
[hits] => 1
)
[usb] => Array
(
[docs] => 1
[hits] => 1
)
[*prenosnik*] => Array
(
[docs] => 1
[hits] => 1
)
[prenosnik] => Array
(
[docs] => 1
[hits] => 1
)
)
prenosnik miska (0/0)
Array
(
[*prenosnik*] => Array
(
[docs] => 1
[hits] => 1
)
[prenosnik] => Array
(
[docs] => 1
[hits] => 1
)
[*miska*] => Array
(
[docs] => 0
[hits] => 0
)
[miska] => Array
(
[docs] => 0
[hits] => 0
)
)
miška usb (0/0)
Array
(
[*miška*] => Array
(
[docs] => 0
[hits] => 0
)
[miška] => Array
(
[docs] => 0
[hits] => 0
)
[*usb*] => Array
(
[docs] => 1
[hits] => 1
)
[usb] => Array
(
[docs] => 1
[hits] => 1
)
)
You can clearly see that I only get results for queries without Slovenian special characters.
Please, help me solve this.
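Results like these, where every query containing č/š/ž comes back empty while plain ASCII words match, are what one would expect if the indexer received the text in a different byte encoding from the UTF-8 the index is configured for (charset_type = utf-8). The PHP sketch below illustrates that under one stated assumption: that the indexer's MySQL connection fell back to MySQL's latin1, which MySQL implements as Windows-1252. It requires the mbstring extension and only compares bytes; it does not talk to MySQL or Sphinx.

```php
<?php
// Assumption: the connection used latin1, i.e. Windows-1252 (cp1252) in MySQL.
$utf8   = 'miška';                                              // as stored in the utf8 column
$cp1252 = mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8'); // as sent over a latin1 connection

echo bin2hex($utf8), PHP_EOL;   // 6d69c5a16b61 ("š" is the two bytes c5 a1)
echo bin2hex($cp1252), PHP_EOL; // 6d699a6b61   ("š" is the single byte 9a)

// The cp1252 bytes are not valid UTF-8, so they can never equal the UTF-8 query "miška"
var_dump(mb_check_encoding($cp1252, 'UTF-8')); // bool(false)
```

ASCII letters such as "usb" or "prenosnik" are encoded identically in both, which is consistent with those queries being the only ones that match.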
[Comments]:
-
OMG! I did it! Found the answer here: ryaneby.com/2009/11/21/unicode-and-sphinx.html
I needed to add sql_query_pre = SET CHARACTER_SET_RESULTS=utf8 and sql_query_pre = SET NAMES utf8 to my source definition... apparently the database doesn't connect through utf8 by default! Woohoo!
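For reference, the fix from that comment amounts to two extra lines in the source block. This is a sketch assuming a MySQL source named classifieds_src (the name referenced by the index definition in the question); the rest of the source settings are not shown in the question, so they are elided here rather than guessed.

```
source classifieds_src
{
    # ... existing type, sql_host, sql_user, sql_pass, sql_db, sql_query settings ...

    # Make the indexer's MySQL connection return UTF-8,
    # matching charset_type = utf-8 in the index definition.
    # (SET NAMES utf8 also sets character_set_results, so the first line is redundant but harmless.)
    sql_query_pre = SET CHARACTER_SET_RESULTS=utf8
    sql_query_pre = SET NAMES utf8
}
```

Since the change affects what gets stored in the index, the classifieds index has to be rebuilt with indexer before the Slovenian queries can match.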
-
I would, but it won't let me :S (answering my own question needs 100 reputation). Please post it yourself and I'll accept it.
Tags: php utf-8 sphinx character