【Question title】:How to apply search query for special characters like @ in Zend_Search_Lucene?
【Posted】:2012-05-29 09:24:51
【Question】:

With Zend_Search_Lucene, I build the index with the code below. I have also changed the default analyzer so that numeric values can be searched.

public function executeIndexIT() {

   $path = '/home/project/mgh/lib/';
   set_include_path(get_include_path() . PATH_SEPARATOR . $path);       
   require_once '/home/project/mgh/lib/Zend/Search/Lucene.php';

   Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive());

   $index = new Zend_Search_Lucene('/home/project/mgh/data/search_file/lucene.customer.index',true);

   $filenames1='/home/project/mgh/web/cvcollection/data8/ASBABranches10546.pdf';
   $filenames2='/home/project/mgh/web/cvcollection/data2/manoj_new10550.pdf';

   $fc1=htmlentities("'".$this->ConvertPDF($filenames1)."'");       
   $fc2=htmlentities("'".$this->ConvertPDF($filenames2)."'");

   $doc = new Zend_Search_Lucene_Document();
   $doc->addField(Zend_Search_Lucene_Field::unIndexed('URL', $filenames1));
   $doc->addField(Zend_Search_Lucene_Field::text('contents',$fc1));     
   $index->addDocument($doc);

   $doc = new Zend_Search_Lucene_Document();
   $doc->addField(Zend_Search_Lucene_Field::unIndexed('URL', $filenames2));
   $doc->addField(Zend_Search_Lucene_Field::text('contents',$fc2));     
   $index->addDocument($doc);

   $index->commit();
   exit;
}

After building the index, I search it with the following code:

public function executeSearchLucene() {

    $path = '/home/project/mgh/lib/';
    set_include_path(get_include_path() . PATH_SEPARATOR . $path);
    require_once('Zend/Search/Lucene.php');

    Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive());

    $hits = array();
    $txtSearch='@';
    try {
        $query = Zend_Search_Lucene_Search_QueryParser::parse($txtSearch);
    } catch (Zend_Search_Lucene_Search_QueryParserException $e) {
        echo "Query syntax error: " . $e->getMessage() . "\n";
    }

    $index = new Zend_Search_Lucene('/home/project/mgh/data/search_file/lucene.customer.index');

    //**added on 29 may**/      
    $results = $index->find($query);
    echo count($results);
    foreach ( $results as $result ) {
        echo "<pre>";
        var_dump($result->URL);
    }
    exit;
}

Here $fc2 contains several email addresses, and I need to search for them, but I get 0 hits.

How can I search for characters such as @ and ! with Zend_Search_Lucene?

【Question comments】:

    Tags: php zend-framework utf-8 special-characters zend-search-lucene


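    【Answer 1】:

    This only works with keyword fields, because they are not tokenized. So you need to make sure that the email address (or any other text containing special characters) is supplied as a separate piece of data, as in the example further down. You also cannot use the query parser, because it converts the input into a Zend_Search_Lucene_Search_Query_Preprocessing_Term object: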

    echo('<pre>');
    var_dump(Zend_Search_Lucene_Search_QueryParser::parse("*@*"));
    var_dump(Zend_Search_Lucene_Search_QueryParser::parse("@"));
    echo('</pre>');
    die();
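
To see why a `text` field in the question yields no hits for `@`, it can help to approximate what a letters-and-digits analyzer does to the input. The sketch below is only an assumption about the general behaviour (runs of ASCII letters and digits become lowercase tokens, everything else is treated as a separator), not the library's actual code, and `approximateTextNumTokens` is a hypothetical name.

```php
<?php
// Rough approximation (an assumption, not the library's implementation) of a
// case-insensitive letters-and-digits analyzer: each run of ASCII letters or
// digits becomes one lowercase token; any other character only separates tokens.
function approximateTextNumTokens(string $text): array
{
    preg_match_all('/[a-zA-Z0-9]+/', $text, $matches);
    return array_map('strtolower', $matches[0]);
}

// The '@' and '.' never appear in any token, so a term search for '@'
// has nothing in the index to match.
print_r(approximateTextNumTokens('Contact: Test@Test.com'));
```

Under this approximation the address is split into separate word tokens, which is consistent with the 0 hits reported in the question.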
    

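    According to the documentation, this kind of query object:

    is not actually involved in query execution

    So the working code looks like this: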

    // Build a fresh index at this path.
    $index = Zend_Search_Lucene::create('/tmp/index');
    
    // Keyword fields are indexed as a single untokenized term, so '@' and '!' survive.
    $doc1 = new Zend_Search_Lucene_Document();
    $doc1->addField(Zend_Search_Lucene_Field::text('title', 'Some Title Here'))
        ->addField(Zend_Search_Lucene_Field::keyword('content', 'test@test.com'));
    $index->addDocument($doc1);
    
    $doc2 = new Zend_Search_Lucene_Document();
    $doc2->addField(Zend_Search_Lucene_Field::text('title', 'Another title Here'))
        ->addField(Zend_Search_Lucene_Field::keyword('content', 'test!test.com'));
    $index->addDocument($doc2);
    
    $index->commit();
    
    // Allow patterns that begin with a wildcard (no literal prefix required).
    Zend_Search_Lucene_Search_Query_Wildcard::setMinPrefixLength(0);
    
    // Build the wildcard queries directly instead of going through the query parser.
    foreach (array('*@*', '*!*') as $pattern) {
        $term  = new Zend_Search_Lucene_Index_Term($pattern);
        $query = new Zend_Search_Lucene_Search_Query_Wildcard($term);
    
        $hits = $index->find($query);
        echo('<pre>');
        var_dump(count($hits));
        foreach ($hits as $hit) {
            var_dump($hit->title);
            var_dump($hit->content);
        }
        echo('</pre>');
    }
    
    die();
    

    Hope that makes it clear. The Zend Lucene implementation has a lot of limitations.
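
In the question, the addresses are buried inside text extracted from PDFs, so they would first have to be pulled out before they can be supplied as separate keyword data. A minimal sketch of that step, in plain PHP, might look like the following. `extractEmails` is a hypothetical helper, the pattern is deliberately simple and will miss some valid addresses, and lowercasing is a design choice made here because untokenized values are matched as stored.

```php
<?php
// Hypothetical helper (not part of Zend_Search_Lucene): collects e-mail-like
// strings from extracted text so each one can be indexed as an untokenized value.
// The pattern is intentionally simple and does not cover every valid address.
function extractEmails(string $text): array
{
    preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $text, $matches);
    // Lowercase and de-duplicate so later lookups behave predictably.
    return array_values(array_unique(array_map('strtolower', $matches[0])));
}

$sample = 'Reach Test@test.com or sales@example.org; test@test.com is preferred.';
print_r(extractEmails($sample));
```

Each returned address could then be passed to `Zend_Search_Lucene_Field::keyword()` in the same way as `'test@test.com'` in the answer's example. How the addresses are spread across documents and field names is left open here.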

    【Comments】:
