Lucene: Creating and Searching an Index

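What is Lucene?
Lucene is a full-text search engine toolkit that lets you quickly build a search engine of your own. It is a high-performance, full-featured text search library written entirely in Java, and it suits almost any application that needs full-text search, especially cross-platform ones.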

1. Creating an index

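import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

public class CreateIndex {

    @Test
    public void deom1() throws IOException {
        // 1. Open the directory where the index files will be written
        FSDirectory fsDirectory = FSDirectory.open(Paths.get("D:\\lucene\\deom"));
        // 2. Create the index writer, using the analyzer that tokenizes field text
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriter indexWriter = new IndexWriter(fsDirectory, new IndexWriterConfig(analyzer));
        // 3. Create the document
        Document document = new Document();
        // Field.Store.YES stores the field value in the index so it can be read back from a hit;
        // Field.Store.NO does not store it (the field is still indexed and searchable)
        document.add(new TextField("content", "眼前即是心上人", Field.Store.YES));
        indexWriter.addDocument(document);
        // 4. Commit the write
        indexWriter.commit();
        // 5. Release resources
        indexWriter.close();
    }
}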

2. Searching the index

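import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

public class SearchIndex {

    @Test
    public void test1() throws IOException {
        // 1. Prepare the search keyword. StandardAnalyzer indexes Chinese text
        //    one character at a time, so a single character matches as a term.
        String keyword = "心";
        // 2. Open the directory that holds the index
        FSDirectory fsDirectory = FSDirectory.open(Paths.get("D:\\lucene\\deom"));
        // 3. Create the index reader
        IndexReader indexReader = DirectoryReader.open(fsDirectory);
        // 4. Create the index searcher
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        // 5. Search the index
        // Build a term-based query; a TermQuery is not analyzed, so the term must match an indexed token exactly
        Query query = new TermQuery(new Term("content", keyword));
        // First argument: the query; second argument: return the top n matching documents
        // TopDocs wraps the search results
        TopDocs topDocs = indexSearcher.search(query, 10);
        System.out.println("Number of hits: " + topDocs.totalHits);
        // The matching documents
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc : scoreDocs) {
            System.out.println("Document score: " + scoreDoc.score);
            // The document's internal ID within the index
            int docID = scoreDoc.doc;
            System.out.println("docID: " + docID);
            Document document = indexReader.document(docID);
            System.out.println(document.get("content"));
        }
        // 6. Release resources
        indexReader.close();
    }
}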

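Data categories
Structured data: data with a fixed format or fixed length, e.g. database tables
Semi-structured data: data with some format but no fixed length, e.g. JSON, XML
Unstructured data: data with no fixed format or fixed length, e.g. Word, PDF, web pages, txt

Data retrieval
Structured data: SQL statements
Semi-structured data: convert it to structured data wherever possible, e.g. JSON
Unstructured data: for example, finding every file on drive D whose name contains the keyword java
    Sequential scan: check the files one by one for matches; it is slow, and the work done for one search cannot be reused by the next
    Full-text search: create an index first, then search the index (trading space for speed)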
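
The difference between the two approaches can be sketched in plain Java without Lucene. This is only a toy model under stated assumptions: the class InvertedIndexSketch and its methods are hypothetical names made up for this illustration, text is split on whitespace, and the map stands in for an index. It is not how Lucene actually stores or analyzes data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy class for illustration only; not Lucene's implementation.
class InvertedIndexSketch {

    // Assumption: lowercase the text and split on whitespace.
    static String[] tokenize(String text) {
        return text.trim().toLowerCase().split("\\s+");
    }

    // Sequential scan: every query re-reads every document.
    static List<Integer> sequentialScan(List<String> docs, String keyword) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : tokenize(docs.get(id))) {
                if (term.equals(keyword.toLowerCase())) {
                    hits.add(id);
                    break; // one hit per document is enough
                }
            }
        }
        return hits;
    }

    // Index creation: done once, costs extra space (term -> document IDs).
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : tokenize(docs.get(id))) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
                }
            }
        }
        return index;
    }

    // Index search: a single map lookup, reusable for every later query.
    static Set<Integer> search(Map<String, Set<Integer>> index, String keyword) {
        return index.getOrDefault(keyword.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "java lucene tutorial", "python notes", "Java in action");
        Map<String, Set<Integer>> index = buildIndex(docs);
        System.out.println(sequentialScan(docs, "java")); // prints [0, 2]
        System.out.println(search(index, "java"));        // prints [0, 2]
    }
}
```

Both calls return the same documents, but the scan walks all the text on every query, while the index is built once and then answers each query with a lookup. That is the space-for-speed trade described above.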

3. Lucene's retrieval process (figure from Lucene in Action)
