java.lang.Object
  org.tgels.search.rankingalgorithm.RankingQuery
public class RankingQuery
RankingQuery is the RankingAlgorithm implementation. It uses Apache Lucene to
retrieve documents from the index, but scores and ranks them with RankingAlgorithm.
Two algorithms are available, SIMPLE and COMPLEX. SIMPLE is a very
fast algorithm and can return queries in under 100 ms on a 10m-document Wikipedia
index (complete index). It can also scale to 100m documents or possibly more.
COMPLEX is a more involved algorithm and is a little slower than SIMPLE, but it can
still return queries in under 100 ms on the same 10m-document Wikipedia index.
COMPLEX is more accurate and should give better rankings than SIMPLE.
RankingAlgorithm can be used in two modes, Document mode (the default) and
Product mode. The scoring changes with the mode. In Document mode,
documents are matched for relevancy, while in Product mode documents
are matched for term occurrence. Document mode is useful for matching text,
HTML, rich text (PDF/Word), books, FAQs, forum discussions, etc. Product
mode is useful for short text, as in retail/e-commerce product matching.
Programmatic:
rq.setMode(RankingQuery.MODE_DOCUMENT);
Property:
To change MODE, start the application with -Dmode=document,
or for product mode, -Dmode=product
You can also set the scan attribute to fast, medium, or full.
Fast is the default and the quickest, while full scan
is the most accurate but is also slow and uses
a lot of memory.
Programmatic:
rq.setAlgorithm(RankingQuery.ALGORITHM_COMPLEX);
Property:
To enable SIMPLE, start the application with -Dalgorithm=SIMPLE,
or for COMPLEX, -Dalgorithm=COMPLEX
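The property-based settings above can be pictured with a small sketch. This is not the RankingQuery implementation: the `RankingSettings` class, its enums, and the fallback behaviour for unknown values are assumptions made for illustration. Only the property names (`mode`, `scan`, `algorithm`) and the documented defaults (Document mode, fast scan) come from this page. Because no default algorithm is stated here, the sketch asks the caller to supply one.

```java
import java.util.Locale;

/** Hypothetical helper, not part of RankingAlgorithm: resolves -Dmode, -Dscan and -Dalgorithm. */
final class RankingSettings {
    // Stand-in enums; the real class exposes int constants whose values are not documented here.
    enum Mode { DOCUMENT, PRODUCT }
    enum Scan { FAST, MEDIUM, FULL }
    enum Algorithm { SIMPLE, COMPLEX }

    private RankingSettings() {}

    /** Parses a raw property value case-insensitively, returning the fallback when it is absent or unknown. */
    static <E extends Enum<E>> E parse(Class<E> type, String raw, E fallback) {
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        try {
            return Enum.valueOf(type, raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException unknownValue) {
            return fallback; // assumption: unrecognised values keep the default
        }
    }

    /** Document mode is the documented default. */
    static Mode mode() {
        return parse(Mode.class, System.getProperty("mode"), Mode.DOCUMENT);
    }

    /** Fast scan is the documented default. */
    static Scan scan() {
        return parse(Scan.class, System.getProperty("scan"), Scan.FAST);
    }

    /** No default algorithm is documented, so the caller chooses the fallback. */
    static Algorithm algorithm(Algorithm fallback) {
        return parse(Algorithm.class, System.getProperty("algorithm"), fallback);
    }
}
```

Started with `-Dmode=product -Dscan=full`, `RankingSettings.mode()` and `RankingSettings.scan()` would resolve to `PRODUCT` and `FULL`. With no properties set they fall back to the documented defaults.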
You will need Apache Lucene 3.x on the class path. RankingQuery needs a Lucene
IndexSearcher or IndexReader object, supplied at instantiation or passed to
search as in the examples below, because RankingQuery uses the IndexReader to
read the documents from the index.
See the examples below.
Example 1:
// index, field, searchterms and title are String variables supplied by the caller
RankingQuery rq = new RankingQuery();
IndexSearcher is = new IndexSearcher(FSDirectory.open(new File(index)));
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);
Query query = parser.parse(searchterms);
RankingHits rh = rq.search(query, is); // is = Lucene IndexSearcher object
System.out.println("num hits=" + rh.getNumHits() + "--no docs=" + is.maxDoc());
// print at most the first 10 hits
for (int i=0; i<rh.getNumHits() && i<10; i++) {
    System.out.println("i=" + i + "--" + rh.score(i) + "--docid=" + rh.docid(i) + "--doc=" + rh.doc(i).get(title) );
}
is.close();
Example 2: [ Much faster and uses very little memory, scales up to 100m docs ]
IndexReader reader = IndexReader.open(FSDirectory.open(new File(index)));
RankingQuery rq = new RankingQuery();
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);
Query query = parser.parse(searchterms);
TopScoreDocCollector tdc = TopScoreDocCollector.create(1000, true);
rq.search(query, null, reader, tdc); // reader = Lucene IndexReader object
int hits = tdc.getTotalHits();
ScoreDoc sda[] = null;
if (hits > 0) {
sda = tdc.topDocs().scoreDocs;
}
System.out.println("num hits=" + hits + "--no docs=" + reader.maxDoc());
for (int i=0; i<hits && i<10; i++) {
ScoreDoc sd = sda[i];
System.out.println("i=" + i + "--" + sd.score + "--docid=" + sd.doc + "--doc=" + reader.document(sd.doc).get(title) );
}
reader.close();
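One reason a collector-based search such as Example 2 can stay small in memory is that a top-N collector retains only the best N hits, however many documents match. Lucene's TopScoreDocCollector keeps a fixed-size priority queue for this purpose. The sketch below illustrates that idea with the standard library only. `TopNSketch` and `Hit` are hypothetical names, and nothing here describes how RankingQuery itself feeds the collector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative bounded top-N collector; a stand-in, not Lucene's or RankingAlgorithm's code. */
final class TopNSketch {
    /** Minimal scored hit: a document id plus its score. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int capacity;
    private final PriorityQueue<Hit> heap; // min-heap: the weakest retained hit sits on top
    private int totalHits;

    TopNSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
        this.heap = new PriorityQueue<>(capacity, (a, b) -> Float.compare(a.score, b.score));
    }

    /** Counts every hit, but stores it only if it belongs among the best 'capacity' seen so far. */
    void collect(int doc, float score) {
        totalHits++;
        if (heap.size() < capacity) {
            heap.add(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();                    // evict the current weakest hit
            heap.add(new Hit(doc, score));
        }
    }

    int getTotalHits() { return totalHits; }

    /** Returns the retained hits, best score first. */
    List<Hit> topHits() {
        List<Hit> sorted = new ArrayList<>(heap);
        sorted.sort((a, b) -> Float.compare(b.score, a.score));
        return sorted;
    }
}
```

Memory use is bounded by the capacity rather than by the number of matching documents, which is why the total hit count can exceed the number of hits actually returned.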
See Also:
RankingHits, RankingScore, TopScoreDocCollector

| Field Summary | |
|---|---|
| static int | ALGORITHM_COMPLEX |
| static int | ALGORITHM_SIMPLE |
| static int | AND |
| static int | AND_OR |
| static int | MODE_DOCUMENT |
| static int | MODE_PRODUCT |
| static int | OR |
| static int | SCAN_FAST |
| static int | SCAN_FULL |
| static int | SCAN_MEDIUM |

| Constructor Summary | |
|---|---|
| RankingQuery() | |
| RankingQuery(org.apache.lucene.index.IndexReader reader) | Constructor to create a RankingQuery object. |
| RankingQuery(org.apache.lucene.search.IndexSearcher is) | Constructor to create a RankingQuery object. |
| RankingQuery(java.lang.String indexPath) | Constructor to create a RankingQuery object. |

| Method Summary | | |
|---|---|---|
| void | addToLowerBoostSet(java.lang.String keywords) | Experimental, can change. |
| void | close() | Closes the IndexReader objects opened. |
| org.apache.lucene.document.Document | doc(int docid) | Similar to IndexSearcher doc(id); returns a Lucene Document object. |
| int | getAlgorithm() | |
| int | getAndOr() | |
| int | getMode() | |
| int | getScan() | |
| RankingHits | search(org.apache.lucene.search.Query query) | Search a Lucene index for terms in the query. |
| RankingHits | search(org.apache.lucene.search.Query query, org.apache.lucene.search.Filter filter, org.apache.lucene.index.IndexReader ir, org.apache.lucene.search.Collector collector) | Search a Lucene index for terms in the query. |
| RankingHits | search(org.apache.lucene.search.Query query, org.apache.lucene.index.IndexReader r) | Search a Lucene index for terms in the query. |
| RankingHits | search(org.apache.lucene.search.Query query, org.apache.lucene.search.IndexSearcher is) | Search a Lucene index for terms in the query. |
| RankingHits | search(java.lang.String field, java.lang.String searchTerms) | Search a Lucene index for terms in the query. |
| RankingHits | search(org.apache.lucene.search.Weight weight, org.apache.lucene.search.Filter filter, org.apache.lucene.index.IndexReader ir, org.apache.lucene.search.Collector collector) | Similar to Lucene search. |
| void | setAlgorithm(int type) | Set the algorithm, SIMPLE or COMPLEX. |
| void | setAndOr(int type) | Set the AND, OR, or AND_OR combination used to match results. |
| void | setMode(int type) | Set the mode, Document or Product. |
| void | setScan(int scan) | Used along with mode to control how a document is scanned. |

| Methods inherited from class java.lang.Object |
|---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final int ALGORITHM_COMPLEX
public static final int ALGORITHM_SIMPLE
public static final int MODE_PRODUCT
public static final int MODE_DOCUMENT
public static final int SCAN_FAST
public static final int SCAN_MEDIUM
public static final int SCAN_FULL
public static final int AND_OR
public static final int AND
public static final int OR
| Constructor Detail |
|---|
public RankingQuery(org.apache.lucene.index.IndexReader reader)
Parameters:
reader - Lucene IndexReader object.
public RankingQuery(org.apache.lucene.search.IndexSearcher is)
Parameters:
is - Lucene IndexSearcher object.
public RankingQuery(java.lang.String indexPath)
throws java.lang.Throwable
Parameters:
indexPath - path to a Lucene index.
Throws:
java.lang.Throwable
public RankingQuery()
| Method Detail |
|---|
public void setScan(int scan)
Programmatic:
rq.setScan(RankingQuery.SCAN_FAST);
Property:
To change SCAN, start the application with -Dscan=fast,
-Dscan=medium, or -Dscan=full
Parameters:
scan - Valid values are RankingQuery.SCAN_FAST, RankingQuery.SCAN_MEDIUM, RankingQuery.SCAN_FULL
public int getScan()
public void setMode(int type)
Programmatic:
rq.setMode(RankingQuery.MODE_DOCUMENT);
Property:
To change MODE, start the application with -Dmode=document,
or for product mode, -Dmode=product
Parameters:
type - Valid values are RankingQuery.MODE_DOCUMENT or RankingQuery.MODE_PRODUCT
public int getMode()
public void setAlgorithm(int type)
Programmatic:
rq.setAlgorithm(RankingQuery.ALGORITHM_COMPLEX);
Property:
To enable SIMPLE, start the application with -Dalgorithm=SIMPLE,
or for COMPLEX, -Dalgorithm=COMPLEX
Parameters:
type - Valid values are RankingQuery.ALGORITHM_SIMPLE or RankingQuery.ALGORITHM_COMPLEX
public int getAlgorithm()
public void setAndOr(int type)
Parameters:
type - Valid values are RankingQuery.AND, RankingQuery.AND_OR, or RankingQuery.OR. It can also
be set to any value between 0 and 100 as needed.
public int getAndOr()
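This page does not say how a setAndOr value between 0 and 100 is interpreted. One plausible reading, offered purely as an assumption, is a minimum percentage of query terms that a document must contain, with 100 behaving like AND and low values behaving like OR. The sketch below works through that arithmetic. `MatchThresholdSketch` and its methods are hypothetical and are not part of RankingQuery.

```java
/** Hypothetical illustration of a percentage-based term-match threshold; not RankingAlgorithm code. */
final class MatchThresholdSketch {
    private MatchThresholdSketch() {}

    /**
     * Number of query terms that must match for the given percentage, rounded up.
     * Assumption: at least one term must always match, so 0 acts like a plain OR.
     */
    static int requiredMatches(int termCount, int percent) {
        if (termCount < 0) {
            throw new IllegalArgumentException("termCount must not be negative");
        }
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        if (termCount == 0) {
            return 0;
        }
        int required = (termCount * percent + 99) / 100; // integer ceiling of termCount * percent / 100
        return Math.max(1, required);
    }

    /** True when a document matching 'matched' of 'termCount' query terms passes the threshold. */
    static boolean accepts(int matched, int termCount, int percent) {
        return termCount > 0 && matched >= requiredMatches(termCount, percent);
    }
}
```

Under this reading, a four-term query at 50 would need two matching terms, at 100 all four, and at 0 just one.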
public void close()
throws java.lang.Throwable
Throws:
java.lang.Throwable
See Also:
RankingQuery(String)
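Because close() is declared as throwing java.lang.Throwable, it cannot satisfy java.lang.AutoCloseable, whose close() permits only Exception, so try-with-resources is not available for a method with this signature. A try/finally block is the usual substitute. The sketch below shows the pattern with stand-in types. `CloseSketch`, `ThrowingCloseable`, and `Work` are hypothetical names introduced for illustration only.

```java
/** Illustrative try/finally wrapper for resources whose close() throws Throwable. */
final class CloseSketch {
    /** Hypothetical stand-in for an object exposing 'void close() throws Throwable'. */
    interface ThrowingCloseable {
        void close() throws Throwable;
    }

    /** A unit of work that may fail with any Throwable, as the search methods here are declared to. */
    interface Work<T> {
        T run() throws Throwable;
    }

    private CloseSketch() {}

    /** Runs the work and always closes the resource afterwards, whether or not the work failed. */
    static <T> T useAndClose(ThrowingCloseable resource, Work<T> work) throws Throwable {
        try {
            return work.run();
        } finally {
            resource.close();
        }
    }
}
```

With a RankingQuery created from an index path, the same shape would be a `try { ... rq.search(...) ... } finally { rq.close(); }` block around the search calls.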
public org.apache.lucene.document.Document doc(int docid)
throws java.lang.Throwable
Parameters:
docid - Lucene document id
Throws:
java.lang.Throwable
public RankingHits search(org.apache.lucene.search.Query query)
throws java.lang.Throwable
Parameters:
query - A Lucene query object
Throws:
java.lang.Throwable
See Also:
RankingHits
public RankingHits search(java.lang.String field,
java.lang.String searchTerms)
throws java.lang.Throwable
Example:
// title is the name of a stored field, supplied by the caller
RankingQuery rq = new RankingQuery("/lucene/index/perl");
RankingHits rh = rq.search("search_field", "text");
System.out.println("num hits=" + rh.getNumHits());
// print at most the first 10 hits
for (int i=0; i<rh.getNumHits() && i<10; i++) {
    System.out.println("i=" + i + "--" + rh.score(i) + "--docid=" + rh.docid(i) + "--doc=" + rh.doc(i).get(title) );
}
rq.close();
Parameters:
field - field to search
searchTerms - search terms
Throws:
java.lang.Throwable
See Also:
RankingHits
public RankingHits search(org.apache.lucene.search.Query query,
org.apache.lucene.search.IndexSearcher is)
throws java.lang.Throwable
Example 1:
RankingQuery rq = new RankingQuery();
RankingHits rh = rq.search(query, is); // is = Lucene IndexSearcher object
System.out.println("num hits=" + rh.getNumHits() + "--no docs=" + is.maxDoc());
for (int i=0; i<rh.getNumHits() && i<10; i++) {
    System.out.println("i=" + i + "--" + rh.score(i) + "--docid=" + rh.docid(i) + "--doc=" + rh.doc(i).get(title) );
}
Example 2: [ Much faster and uses very little memory, scales up to 100m docs ]
RankingQuery rq = new RankingQuery();
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);
Query query = parser.parse(searchterms);
TopScoreDocCollector tdc = TopScoreDocCollector.create(1000, true);
rq.search(query, null, indexreader, tdc); // indexreader = Lucene IndexReader object
int hits = tdc.getTotalHits();
// call topDocs() once; it drains the collector's queue
ScoreDoc[] sda = tdc.topDocs().scoreDocs;
System.out.println("num hits=" + hits + "--no docs=" + indexreader.maxDoc());
for (int i=0; i<sda.length && i<10; i++) {
    ScoreDoc sd = sda[i];
    System.out.println("i=" + i + "--" + sd.score + "--docid=" + sd.doc + "--doc=" + indexreader.document(sd.doc).get(title) );
}
Parameters:
query - Lucene query object
is - Lucene IndexSearcher object.
Throws:
java.lang.Throwable
See Also:
RankingHits, RankingScore
public RankingHits search(org.apache.lucene.search.Query query,
org.apache.lucene.index.IndexReader r)
throws java.lang.Throwable
Example:
RankingQuery rq = new RankingQuery();
RankingHits rh = rq.search(query, r); // r = Lucene IndexReader object
System.out.println("num hits=" + rh.getNumHits() + "--no docs=" + r.maxDoc());
for (int i=0; i<rh.getNumHits() && i<10; i++) {
    System.out.println("i=" + i + "--" + rh.score(i) + "--docid=" + rh.docid(i) + "--doc=" + rh.doc(i).get(title) );
}
Parameters:
query - Lucene query object
r - Lucene IndexReader object
Throws:
java.lang.Throwable
See Also:
RankingHits
public RankingHits search(org.apache.lucene.search.Query query,
org.apache.lucene.search.Filter filter,
org.apache.lucene.index.IndexReader ir,
org.apache.lucene.search.Collector collector)
throws java.lang.Throwable
Example 1:
RankingQuery rq = new RankingQuery();
RankingHits rh = rq.search(query, filter, ir, collector); // ir = Lucene IndexReader object
System.out.println("num hits=" + rh.getNumHits() + "--no docs=" + ir.maxDoc());
for (int i=0; i<rh.getNumHits() && i<10; i++) {
    System.out.println("i=" + i + "--" + rh.score(i) + "--docid=" + rh.docid(i) + "--doc=" + rh.doc(i).get(title) );
}
Example 2: [ Much faster and uses very little memory, scales up to 100m docs ]
IndexReader reader = IndexReader.open(FSDirectory.open(new File(index)));
RankingQuery rq = new RankingQuery();
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);
Query query = parser.parse(searchterms);
TopScoreDocCollector tdc = TopScoreDocCollector.create(1000, true);
rq.search(query, null, reader, tdc); // reader = Lucene IndexReader object
int hits = tdc.getTotalHits();
ScoreDoc sda[] = null;
if (hits > 0) {
sda = tdc.topDocs().scoreDocs;
}
System.out.println("num hits=" + hits + "--no docs=" + reader.maxDoc());
for (int i=0; i<hits && i<10; i++) {
ScoreDoc sd = sda[i];
System.out.println("i=" + i + "--" + sd.score + "--docid=" + sd.doc + "--doc=" + reader.document(sd.doc).get(title) );
}
reader.close();
Parameters:
query - Lucene query object
filter - Lucene filter object
ir - Lucene IndexReader object.
collector - collects the returned results
Throws:
java.lang.Throwable
See Also:
RankingHits, RankingScore
public RankingHits search(org.apache.lucene.search.Weight weight,
org.apache.lucene.search.Filter filter,
org.apache.lucene.index.IndexReader ir,
org.apache.lucene.search.Collector collector)
throws java.lang.Throwable
Example 1:
RankingQuery rq = new RankingQuery();
RankingHits rh = rq.search(weight, filter, ir, collector); // ir = Lucene IndexReader object
System.out.println("num hits=" + rh.getNumHits() + "--no docs=" + ir.maxDoc());
for (int i=0; i<rh.getNumHits() && i<10; i++) {
    System.out.println("i=" + i + "--" + rh.score(i) + "--docid=" + rh.docid(i) + "--doc=" + rh.doc(i).get(title) );
}
Example 2: [ Much faster and uses very little memory, scales up to 100m docs ]
IndexReader reader = IndexReader.open(FSDirectory.open(new File(index)));
RankingQuery rq = new RankingQuery();
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);
Query query = parser.parse(searchterms);
TopScoreDocCollector tdc = TopScoreDocCollector.create(1000, true);
rq.search(query, null, reader, tdc); // reader = Lucene IndexReader object; this call uses the Query overload
int hits = tdc.getTotalHits();
ScoreDoc sda[] = null;
if (hits > 0) {
sda = tdc.topDocs().scoreDocs;
}
System.out.println("num hits=" + hits + "--no docs=" + reader.maxDoc());
for (int i=0; i<hits && i<10; i++) {
ScoreDoc sd = sda[i];
System.out.println("i=" + i + "--" + sd.score + "--docid=" + sd.doc + "--doc=" + reader.document(sd.doc).get(title) );
}
reader.close();
Parameters:
weight - Lucene weight object
filter - Lucene filter object
ir - Lucene IndexReader object.
collector - collects the returned results
Throws:
java.lang.Throwable
See Also:
RankingHits, RankingScore
public void addToLowerBoostSet(java.lang.String keywords)
Experimental, can change.