Building search with Lucene
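Full-text search with Lucene takes two steps:
1 Build the index
2 Construct queries and run them
The Lucene API makes both steps easy.
The sections below cover building the index and then searching it.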
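To make the two steps concrete before touching the Lucene API, here is a minimal sketch in plain Java. It is only a toy: the class ToyIndex and its methods are made up for illustration and say nothing about how Lucene actually stores or searches its index. It shows the general idea that indexing records which documents contain each term, so that a query becomes a lookup rather than a scan.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy illustration only: hypothetical class, not part of Lucene.
final class ToyIndex {
    // term -> ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private int nextId = 0;

    // Step 1, indexing: split the text into lower-cased terms and
    // record the document id under each term.
    int add(String text) {
        int id = nextId++;
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Step 2, querying: look the term up instead of scanning every document.
    List<Integer> search(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("Fred the cat");
        index.add("Bar the kitten");
        index.add("Fred and Bar");
        System.out.println(index.search("fred"));
        System.out.println(index.search("dog"));
    }
}
```

Lucene does far more than this (analysis, scoring, on-disk storage), but the split between "write the index once" and "query it many times" is the same.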
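I. Indexing a Cat instance
1: First you need an instance of Lucene's IndexWriter class. It is the entry point to Lucene's indexing functionality,
and it is what writes searchable data into an index.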
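import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class CatSearcher {
    String indexDir = "index"; // directory storing index files

    private IndexWriter openIndexWriter() throws IOException {
        Analyzer analyzer = new StandardAnalyzer();
        // false = append to an existing index; pass true to create a new one
        return new IndexWriter(indexDir, analyzer, false);
    }
}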
2: Once the index writer is available, build a Document that contains the Cat's fields. This document is what gets indexed.
public class CatSearcher {
    //...
    private Document buildDocument(Cat contact) {
        Document document = new Document();
        // Keyword: the id is indexed as a single exact term
        document.add(Field.Keyword("id",
                String.valueOf(contact.getId())));
        // Text: the name is run through the analyzer and indexed
        document.add(Field.Text("Name", contact.getName()));
        return document;
    }
}
<---------------->
Documents can use four field types: Field.Text, Field.UnIndexed, Field.Keyword,
and Field.UnStored. Which one to use depends on the content of the field.
<---------------->
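As a quick reference, the four factory methods differ in whether the value is stored, indexed, and tokenized. The sketch below models that as a plain Java enum. The enum ToyFieldType and its names are hypothetical and not part of Lucene; the flags reflect the behavior of the String-valued factory methods in the Lucene 1.x Field class as I understand it, so check the Javadoc for your version.

```java
// Hypothetical model for illustration; this enum is not part of Lucene.
enum ToyFieldType {
    // arguments: stored, indexed, tokenized
    KEYWORD(true, true, false),    // Field.Keyword: stored, indexed as one exact term
    TEXT(true, true, true),        // Field.Text(String, String): stored, analyzed, indexed
    UNINDEXED(true, false, false), // Field.UnIndexed: stored only, cannot be searched
    UNSTORED(false, true, true);   // Field.UnStored: searchable, original value not kept

    final boolean stored;
    final boolean indexed;
    final boolean tokenized;

    ToyFieldType(boolean stored, boolean indexed, boolean tokenized) {
        this.stored = stored;
        this.indexed = indexed;
        this.tokenized = tokenized;
    }

    public static void main(String[] args) {
        for (ToyFieldType type : values()) {
            System.out.println(type + ": stored=" + type.stored
                    + ", indexed=" + type.indexed
                    + ", tokenized=" + type.tokenized);
        }
    }
}
```

This is why the example uses Field.Keyword for "id" (an exact value that should not be split) and Field.Text for "Name" (free text that should be analyzed and also returned with the results).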
3. The last step in building the index is adding the document to it.
public class CatSearcher {
    //...
    private void index(Cat contact) throws IOException {
        IndexWriter writer = openIndexWriter();
        Document document = buildDocument(contact);
        writer.addDocument(document);
        writer.optimize();
        writer.close();
    }
}
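When adding many Documents in a batch, it is more efficient to call optimize() once, at the end of the batch.
optimize(): optimizes the index on disk so that searches run more efficiently.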
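To see why one optimize() at the end of a batch is cheaper, here is a toy cost model in plain Java. It assumes, as a simplification, that every added document starts in its own segment and that optimizing merges all segments into one by copying every document. The class ToySegmentedIndex and its methods are hypothetical. Real Lucene also merges segments on its own as documents are added, so the actual numbers will differ.

```java
import java.util.ArrayList;
import java.util.List;

// Toy cost model only: hypothetical class, not Lucene's implementation.
final class ToySegmentedIndex {
    private final List<List<String>> segments = new ArrayList<>();
    private long rewritten = 0; // documents copied during merges so far

    // Assumption: each added document starts out in its own small segment.
    void addDocument(String doc) {
        List<String> segment = new ArrayList<>();
        segment.add(doc);
        segments.add(segment);
    }

    // Merge all segments into one, counting every document that is copied.
    void optimize() {
        if (segments.size() <= 1) {
            return; // already a single segment, nothing to merge
        }
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            merged.addAll(segment);
        }
        rewritten += merged.size();
        segments.clear();
        segments.add(merged);
    }

    long documentsRewritten() {
        return rewritten;
    }

    // Pattern used by index(Cat): optimize after every single document.
    static long costWhenOptimizingEveryTime(int n) {
        ToySegmentedIndex index = new ToySegmentedIndex();
        for (int i = 0; i < n; i++) {
            index.addDocument("doc" + i);
            index.optimize();
        }
        return index.documentsRewritten();
    }

    // Batch pattern: add everything, then optimize once.
    static long costWhenOptimizingOnce(int n) {
        ToySegmentedIndex index = new ToySegmentedIndex();
        for (int i = 0; i < n; i++) {
            index.addDocument("doc" + i);
        }
        index.optimize();
        return index.documentsRewritten();
    }

    public static void main(String[] args) {
        System.out.println(costWhenOptimizingEveryTime(100));
        System.out.println(costWhenOptimizingOnce(100));
    }
}
```

In this model, optimizing after every document copies the whole index again each time, so the work grows roughly with the square of the batch size, while a single optimize at the end copies each document once.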
II. Searching documents
public class CatSearcher {
    //...
    public Hits search(String fieldname, String criteria)
            throws ParseException, IOException {
        // open IndexSearcher
        IndexSearcher searcher = new IndexSearcher(indexDir);
        try {
            Query query = buildQuery(fieldname, criteria);
            Hits hits = searcher.search(query);
            return hits;
        } finally {
            // Left commented out on purpose: Hits loads documents lazily,
            // so the searcher must stay open while the caller reads them.
            // searcher.close();
        }
    }
}
The Query object is produced by buildQuery():
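public class CatSearcher {
    //...
    private Query buildQuery(String fieldName, String criteria)
            throws ParseException {
        // Note: openIndexWriter() indexes with StandardAnalyzer. Terms stemmed
        // here may not match unstemmed terms in the index, so in practice use
        // the same analyzer for indexing and for querying.
        Analyzer analyzer = new CustomAnalyzer();
        QueryParser parser = new QueryParser(fieldName, analyzer);
        return parser.parse(criteria);
    }
}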
<--------------------->
With all that said, let's see how to use these classes to run a search:
CatSearcher searcher = new CatSearcher();
//perform search
Hits hits = searcher.search("Name", "Bar or fred");
if (hits.length() == 0) {
    // no results found
    System.out.println("No results found");
} else {
    // iterate over results
    for (int i = 0; i < hits.length(); i++) {
        Document document = hits.doc(i);
        System.out.println("--- Result " + i);
        System.out.println("Name: " + document.get("Name"));
        System.out.println("ID : " + document.get("id"));
        System.out.println("Score : " + hits.score(i));
    }
}
Finally, here is how to write a custom Analyzer class and how to use TokenFilters.
Again, an example:
public class CustomAnalyzer extends Analyzer {
    /**
     * Processes the input by first converting it to
     * lower case, then by eliminating stop words, and
     * finally by performing Porter stemming on it.
     *
     * @param fieldName the name of the field being analyzed (unused here)
     * @param reader the Reader that provides access to the input text
     * @return an instance of TokenStream
     */
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // split the input into runs of letters
        LetterTokenizer tokenizer = new LetterTokenizer(reader);
        TokenStream result = new LowerCaseFilter(tokenizer);
        result = new StopFilter(result, StopAnalyzer.ENGLISH_STOP_WORDS);
        result = new PorterStemFilter(result);
        return result;
    }
}
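CustomAnalyzer extends the abstract base class Analyzer and overrides the tokenStream()
method to tokenize text in a custom way. It uses three different TokenFilter classes: LowerCaseFilter
(lower-casing), StopFilter (stop-word removal), and PorterStemFilter (Porter stemming). Each plays a different role, and together they make your searches smarter.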
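The chaining idea can be tried out without Lucene. The sketch below is a plain-Java stand-in, assuming a short made-up stop-word list and a deliberately crude "stemmer" that only strips a trailing "s". It is not the Porter algorithm and not Lucene's TokenFilter API; the class ToyAnalyzer and its methods are hypothetical. It only shows how each stage consumes the output of the previous one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy filter chain for illustration; hypothetical class, not Lucene's API.
final class ToyAnalyzer {
    // Assumption: a short sample list, not Lucene's ENGLISH_STOP_WORDS.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "or", "the", "to"));

    // Stand-in for LetterTokenizer: split on anything that is not a letter.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Stand-in for LowerCaseFilter.
    static List<String> lowerCase(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            result.add(token.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    // Stand-in for StopFilter: drop tokens found in the stop-word set.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    // Crude stand-in for stemming: strips one trailing "s". NOT Porter stemming.
    static List<String> stripPlural(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            boolean plural = token.length() > 3
                    && token.endsWith("s") && !token.endsWith("ss");
            result.add(plural ? token.substring(0, token.length() - 1) : token);
        }
        return result;
    }

    // The chain, in the same order as CustomAnalyzer: tokenize, lower-case,
    // remove stop words, then stem.
    static List<String> analyze(String text) {
        return stripPlural(removeStopWords(lowerCase(tokenize(text))));
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Cats and the Dogs"));
        System.out.println(analyze("Bar or fred"));
    }
}
```

The order matters: lower-casing runs before stop-word removal so that "The" is recognized as the stop word "the". With this sample list, the lower-case "or" in "Bar or fred" is dropped as a stop word, leaving only the two names as terms.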
air_tuyh
2005-05-14 21:07:56
