java.lang.Object
  org.tgels.search.rankingalgorithm.RankingQuery
public class RankingQuery
RankingAlgorithm is a search library that uses a new scoring algorithm
to rank results accurately and relevantly. RankingAlgorithm is easy
to use because it reads an Apache Lucene index but ranks and scores
results on its own.
Three algorithms are available: SIMPLE, SIMPLE1 and COMPLEX. SIMPLE is a very
fast algorithm and can return queries in < 50ms on a 10m-document Wikipedia index (complete index).
It can also scale to 100m docs or possibly more. SIMPLE1 is the fastest algorithm
and may be used for autocomplete-style processing. COMPLEX is a more complex
algorithm, so it is a little slower than SIMPLE, but it can still return
queries in < 50ms on a 10m-document Wikipedia index (complete index). COMPLEX
is more accurate and should give the best rankings
compared to SIMPLE.
RankingAlgorithm can be used in two modes, Document mode (the default) and
Product mode. The scoring changes with the mode. In Document mode,
documents are matched for relevancy, while in Product mode documents
are matched for term occurrence. Document mode is useful for matching text,
HTML, rich text (PDF/Word), books, FAQs, forum discussions, etc. Product
mode is useful for short text, as in retail/e-commerce product matches.
Programmatic:
rq.setMode(RankingQuery.MODE_DOCUMENT);
Property:
To change MODE, start the application with -Dmode=document;
for product, -Dmode=product
You can also set a scan attribute to fast, medium or full
scan. Fast is the default and the fastest, while full scan
is the most accurate but also slow, and takes
a lot of memory.
The algorithm is selected as follows.
Programmatic:
rq.setAlgorithm(RankingQuery.ALGORITHM_COMPLEX);
Property:
To enable SIMPLE, start the application with -Dalgorithm=SIMPLE;
for SIMPLE1, -Dalgorithm=SIMPLE1;
for COMPLEX, -Dalgorithm=COMPLEX
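As an illustration of how the -Dmode and -Dalgorithm properties could be interpreted, here is a minimal sketch in plain Java. It is not the library's own code: the class name RankingProperties, the integer values of the constants, the value "product" for Product mode, the case-insensitive matching and the SIMPLE fallback are all assumptions made for the example. Only the property names, the algorithm values and the Document-mode default come from the text above.

```java
import java.util.Locale;

/** Hypothetical helper; not part of the RankingAlgorithm library. */
final class RankingProperties {
    // Stand-ins for the RankingQuery constants. The real numeric values are
    // not documented on this page, so these are assumptions.
    static final int MODE_DOCUMENT = 0;
    static final int MODE_PRODUCT = 1;
    static final int ALGORITHM_SIMPLE = 0;
    static final int ALGORITHM_SIMPLE1 = 1;
    static final int ALGORITHM_COMPLEX = 2;

    /** Maps a -Dmode value to a mode constant; Document mode is the documented default. */
    static int parseMode(String value) {
        // Assumption: Product mode is requested with the value "product".
        if (value != null && value.trim().toLowerCase(Locale.ROOT).equals("product")) {
            return MODE_PRODUCT;
        }
        return MODE_DOCUMENT;
    }

    /** Maps a -Dalgorithm value to an algorithm constant, or returns the fallback if unrecognized. */
    static int parseAlgorithm(String value, int fallback) {
        if (value == null) {
            return fallback;
        }
        switch (value.trim().toUpperCase(Locale.ROOT)) {
            case "SIMPLE":  return ALGORITHM_SIMPLE;
            case "SIMPLE1": return ALGORITHM_SIMPLE1;
            case "COMPLEX": return ALGORITHM_COMPLEX;
            default:        return fallback;
        }
    }

    public static void main(String[] args) {
        // Run with, for example: java -Dmode=product -Dalgorithm=COMPLEX RankingProperties
        int mode = parseMode(System.getProperty("mode"));
        // Assumption: SIMPLE is used when no algorithm is given; the page does not state the default.
        int algorithm = parseAlgorithm(System.getProperty("algorithm"), ALGORITHM_SIMPLE);
        System.out.println("mode=" + mode + " algorithm=" + algorithm);
    }
}
```

In an application, the parsed values would then be passed to rq.setMode(...) and rq.setAlgorithm(...) using the real RankingQuery constants.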
You will need Apache Lucene 3.x on the classpath. When a RankingQuery
is instantiated, a Lucene IndexSearcher or IndexReader object is needed,
because RankingQuery uses the IndexReader to read documents from the index.
See the examples below.
Example 1:
RankingQuery rq = new RankingQuery();
IndexSearcher is = new IndexSearcher(FSDirectory.open(new File(index))); // index = path to the Lucene index
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);
Query query = parser.parse(searchterms);
RankingHits rh = rq.search(query, is); // is = Lucene IndexSearcher object
System.out.println("num hits=" + rh.getNumHits() + "--no docs=" + is.maxDoc());
for (int i = 0; i < rh.getNumHits() && i < 10; i++) {
    System.out.println("i=" + i + "--" + rh.score(i) + "--docid=" + rh.docid(i) + "--doc=" + rh.doc(i).get(title));
}
Example 2:
IndexReader reader = IndexReader.open(FSDirectory.open(new File(index)));
RankingQuery rq = new RankingQuery();
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);
Query query = parser.parse(searchterms);
TopScoreDocCollector tdc = TopScoreDocCollector.create(1000, true);
rq.search(query, null, reader, tdc); // reader = Lucene IndexReader object
int hits = tdc.getTotalHits();
ScoreDoc[] sda = null;
if (hits > 0) {
    sda = tdc.topDocs().scoreDocs;
}
System.out.println("num hits=" + hits + "--no docs=" + reader.maxDoc());
for (int i = 0; i < hits && i < 10; i++) {
    ScoreDoc sd = sda[i];
    System.out.println("i=" + i + "--" + sd.score + "--docid=" + sd.doc + "--doc=" + reader.document(sd.doc).get(title));
}
reader.close();
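Both examples print at most ten results. With a collector, the total hit count can be larger than the number of entries actually collected (1000 in Example 2), and the result array is left null when nothing matched, so it can help to compute the loop bound in one place. The following is a minimal sketch in plain Java, not library code: TopHits and displayCount are hypothetical names, and a float[] stands in for the ScoreDoc array.

```java
/** Hypothetical helper; not part of RankingAlgorithm or Lucene. */
final class TopHits {
    /**
     * Returns how many entries are safe to print: never more than the total
     * hit count, the number of collected entries, or the display limit.
     */
    static int displayCount(int totalHits, float[] collectedScores, int limit) {
        if (collectedScores == null || totalHits <= 0 || limit <= 0) {
            return 0;
        }
        return Math.min(Math.min(totalHits, collectedScores.length), limit);
    }

    public static void main(String[] args) {
        // Stand-in for the scores of tdc.topDocs().scoreDocs.
        float[] scores = {3.2f, 2.9f, 1.4f};
        int n = displayCount(2500, scores, 10);
        for (int i = 0; i < n; i++) {
            System.out.println("i=" + i + "--" + scores[i]);
        }
    }
}
```

The same bound works whether the array is shorter than the hit count or was never assigned.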
See Also:
RankingHits, RankingScore, TopScoreDocCollector

| Field Summary | |
|---|---|
| static int | ALGORITHM_COMPLEX |
| static int | ALGORITHM_SIMPLE |
| static int | ALGORITHM_SIMPLE1 |
| static int | AND |
| static int | AND_OR |
| static boolean | debug |
| static int | MODE_DOCUMENT |
| static int | MODE_PRODUCT |
| static int | OR |
| static int | SCAN_FAST |
| static int | SCAN_FULL |
| static int | SCAN_MEDIUM |
| Constructor Summary | |
|---|---|
| RankingQuery() | |
| RankingQuery(org.apache.lucene.index.IndexReader reader) | Constructor to create a RankingQuery object. |
| RankingQuery(org.apache.lucene.search.IndexSearcher is) | Constructor to create a RankingQuery object. |
| RankingQuery(java.lang.String indexPath) | Constructor to create a RankingQuery object. |
| Method Summary | |
|---|---|
| void | addToLowerBoostSet(java.lang.String keywords) - Experimental; may change. |
| void | close() - Closes the IndexReader objects opened. |
| org.apache.lucene.document.Document | doc(int docid) - Similar to IndexSearcher doc(id); returns a Lucene Document object. |
| int | getAlgorithm() |
| int | getAndOr() |
| int | getMode() |
| int | getScan() |
| static void | log(java.lang.String s) |
| RankingHits | search(org.apache.lucene.search.Query query) - Search a Lucene index for terms in the query. |
| int | search(org.apache.lucene.search.Query query, org.apache.lucene.search.Filter filter, org.apache.lucene.index.IndexReader ir, org.apache.lucene.search.Collector collector) - Search a Lucene index for terms in the query. |
| RankingHits | search(org.apache.lucene.search.Query query, org.apache.lucene.search.Filter filter, org.apache.lucene.index.IndexReader ir, int docs) - Search a Lucene index for terms in the query. |
| RankingHits | search(org.apache.lucene.search.Query query, org.apache.lucene.index.IndexReader r) - Search a Lucene index for terms in the query. |
| RankingHits | search(org.apache.lucene.search.Query query, org.apache.lucene.search.IndexSearcher is) - Search a Lucene index for terms in the query. |
| RankingHits | search(java.lang.String field, java.lang.String searchTerms) - Search a Lucene index for terms in the query. |
| int | search(org.apache.lucene.search.Weight weight, org.apache.lucene.search.Filter filter, org.apache.lucene.index.IndexReader ir, org.apache.lucene.search.Collector collector) - Similar to Lucene search. |
| int | search(org.apache.lucene.search.Weight weight, org.apache.lucene.search.Filter filter, org.apache.lucene.index.IndexReader ir, org.apache.lucene.search.Collector collector, org.tgels.search.rankingalgorithm.Parameters parms) - Similar to Lucene search. |
| void | setAlgorithm(int type) - Set the algorithm: SIMPLE, SIMPLE1 or COMPLEX. |
| void | setAndOr(int type) - Set AND, OR or AND_OR combinations to get at the results. |
| void | setMode(int type) - Set the mode: Document or Product. |
| void | setScan(int scan) - Used along with the mode to control how a document is scanned. |
| Methods inherited from class java.lang.Object |
|---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static boolean debug
public static final int ALGORITHM_COMPLEX
public static final int ALGORITHM_SIMPLE
public static final int ALGORITHM_SIMPLE1
public static final int MODE_PRODUCT
public static final int MODE_DOCUMENT
public static final int SCAN_FAST
public static final int SCAN_MEDIUM
public static final int SCAN_FULL
public static final int AND_OR
public static final int AND
public static final int OR
| Constructor Detail |
|---|
public RankingQuery(org.apache.lucene.index.IndexReader reader)
Parameters:
reader - Lucene IndexReader object.
public RankingQuery(org.apache.lucene.search.IndexSearcher is)
Parameters:
is - Lucene IndexSearcher object.
public RankingQuery(java.lang.String indexPath)
    throws java.lang.Throwable
Parameters:
indexPath - path to a Lucene index.
Throws:
java.lang.Throwable
public RankingQuery()
| Method Detail |
|---|
public void setScan(int scan)
Programmatic:
rq.setScan(RankingQuery.SCAN_FAST);
Property:
To change SCAN, start the application with -Dscan=fast;
for medium, -Dscan=medium; for full, -Dscan=full
Parameters:
scan - Valid values are RankingQuery.SCAN_FAST, RankingQuery.SCAN_MEDIUM, RankingQuery.SCAN_FULL
public int getScan()
public void setMode(int type)
Programmatic:
rq.setMode(RankingQuery.MODE_DOCUMENT);
Property:
To change MODE, start the application with -Dmode=document;
for product, -Dmode=product
Parameters:
type - Valid values are RankingQuery.MODE_DOCUMENT or RankingQuery.MODE_PRODUCT
public int getMode()
public void setAlgorithm(int type)
Programmatic:
rq.setAlgorithm(RankingQuery.ALGORITHM_COMPLEX);
Property:
To enable SIMPLE, start the application with -Dalgorithm=SIMPLE;
for SIMPLE1, -Dalgorithm=SIMPLE1;
for COMPLEX, -Dalgorithm=COMPLEX
Parameters:
type - Valid values are RankingQuery.ALGORITHM_SIMPLE, RankingQuery.ALGORITHM_SIMPLE1 or RankingQuery.ALGORITHM_COMPLEX
public int getAlgorithm()
public void setAndOr(int type)
Parameters:
type - Valid values are RankingQuery.AND, RankingQuery.AND_OR or RankingQuery.OR. This can also
be set to any value between 0 and 100 as needed.
public int getAndOr()
public void close()
    throws java.lang.Throwable
Throws:
java.lang.Throwable
See Also:
RankingQuery(String)
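Because close() is declared to throw java.lang.Throwable, callers have to handle the broadest possible failure type when releasing the index. One common approach is to close in a finally block and log a failed close rather than rethrow it. The sketch below is plain Java under stated assumptions: ThrowingResource is a hypothetical stand-in for an object such as RankingQuery, and closeQuietly is an illustrative helper, not a library method.

```java
/** Hypothetical helper; not part of the RankingAlgorithm library. */
final class CloseQuietly {
    /** Stand-in for an object whose close() is declared to throw Throwable. */
    interface ThrowingResource {
        void close() throws Throwable;
    }

    /** Closes the resource and reports whether close() completed without throwing. */
    static boolean closeQuietly(ThrowingResource resource) {
        if (resource == null) {
            return true; // nothing to close
        }
        try {
            resource.close();
            return true;
        } catch (Throwable t) {
            // Logged instead of rethrown so that a failing close() does not
            // hide an earlier failure from the search itself. Note that this
            // also swallows Errors, which stricter code may prefer to rethrow.
            System.err.println("close failed: " + t);
            return false;
        }
    }

    public static void main(String[] args) {
        ThrowingResource ok = () -> { };
        ThrowingResource failing = () -> { throw new java.io.IOException("index locked"); };
        try {
            // ... run searches here ...
        } finally {
            System.out.println("ok closed cleanly: " + closeQuietly(ok));
            System.out.println("failing closed cleanly: " + closeQuietly(failing));
        }
    }
}
```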
public org.apache.lucene.document.Document doc(int docid)
    throws java.lang.Throwable
Parameters:
docid - Lucene document id
Throws:
java.lang.Throwable
public RankingHits search(org.apache.lucene.search.Query query)
    throws java.lang.Throwable
Parameters:
query - A Lucene query object
Throws:
java.lang.Throwable
See Also:
RankingHits
public RankingHits search(java.lang.String field,
    java.lang.String searchTerms)
    throws java.lang.Throwable
Example:
RankingQuery rq = new RankingQuery("/lucene/index/perl");
RankingHits rh = rq.search("search_field", "text");
System.out.println("num hits=" + rh.getNumHits());
for (int i = 0; i < rh.getNumHits() && i < 10; i++) {
    System.out.println("i=" + i + "--" + rh.score(i) + "--docid=" + rh.docid(i) + "--doc=" + rh.doc(i).get(title));
}
Parameters:
field - field to search
searchTerms - search terms
Throws:
java.lang.Throwable
See Also:
RankingHits
public RankingHits search(org.apache.lucene.search.Query query,
    org.apache.lucene.search.IndexSearcher is)
    throws java.lang.Throwable
Example 1:
RankingQuery rq = new RankingQuery();
RankingHits rh = rq.search(query, is); // is = Lucene IndexSearcher object
System.out.println("num hits=" + rh.getNumHits() + "--no docs=" + is.maxDoc());
for (int i = 0; i < rh.getNumHits() && i < 10; i++) {
    System.out.println("i=" + i + "--" + rh.score(i) + "--docid=" + rh.docid(i) + "--doc=" + rh.doc(i).get(title));
}
Example 2:
RankingQuery rq = new RankingQuery();
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);
Query query = parser.parse(searchterms);
TopScoreDocCollector tdc = TopScoreDocCollector.create(1000, true);
rq.search(query, null, indexreader, tdc); // indexreader = Lucene IndexReader object
int hits = tdc.getTotalHits();
System.out.println("num hits=" + hits + "--no docs=" + indexreader.maxDoc());
// Call topDocs() once; it drains the collector's queue.
ScoreDoc[] sda = tdc.topDocs().scoreDocs;
for (int i = 0; i < sda.length && i < 10; i++) {
    ScoreDoc sd = sda[i];
    System.out.println("i=" + i + "--" + sd.score + "--docid=" + sd.doc + "--doc=" + indexreader.document(sd.doc).get(title));
}
Parameters:
query - Lucene query object
is - Lucene IndexSearcher object
Throws:
java.lang.Throwable
See Also:
RankingHits, RankingScore
public RankingHits search(org.apache.lucene.search.Query query,
    org.apache.lucene.index.IndexReader r)
    throws java.lang.Throwable
Example:
RankingQuery rq = new RankingQuery();
RankingHits rh = rq.search(query, r); // r = Lucene IndexReader object
System.out.println("num hits=" + rh.getNumHits() + "--no docs=" + r.maxDoc());
for (int i = 0; i < rh.getNumHits() && i < 10; i++) {
    System.out.println("i=" + i + "--" + rh.score(i) + "--docid=" + rh.docid(i) + "--doc=" + rh.doc(i).get(title));
}
Parameters:
query - Lucene query object
r - Lucene IndexReader object
Throws:
java.lang.Throwable
See Also:
RankingHits
public int search(org.apache.lucene.search.Query query,
    org.apache.lucene.search.Filter filter,
    org.apache.lucene.index.IndexReader ir,
    org.apache.lucene.search.Collector collector)
    throws java.lang.Throwable
Example:
IndexReader reader = IndexReader.open(FSDirectory.open(new File(index)));
RankingQuery rq = new RankingQuery();
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);
Query query = parser.parse(searchterms);
TopScoreDocCollector tdc = TopScoreDocCollector.create(1000, true);
rq.search(query, null, reader, tdc); // reader = Lucene IndexReader object
int hits = tdc.getTotalHits();
ScoreDoc[] sda = null;
if (hits > 0) {
    sda = tdc.topDocs().scoreDocs;
}
System.out.println("num hits=" + hits + "--no docs=" + reader.maxDoc());
// sda stays null when there are no hits, so check it before reading its length.
for (int i = 0; sda != null && i < sda.length && i < 10; i++) {
    ScoreDoc sd = sda[i];
    System.out.println("i=" + i + "--" + sd.score + "--docid=" + sd.doc + "--doc=" + reader.document(sd.doc).get(title));
}
reader.close();
Parameters:
query - Lucene query object
filter - Lucene filter object
ir - Lucene IndexReader object
collector - collector for the returned results
Throws:
java.lang.Throwable
See Also:
RankingHits, RankingScore
public RankingHits search(org.apache.lucene.search.Query query,
    org.apache.lucene.search.Filter filter,
    org.apache.lucene.index.IndexReader ir,
    int docs)
    throws java.lang.Throwable
Example:
RankingQuery rq = new RankingQuery();
RankingHits rh = rq.search(query, filter, ir, 100);
System.out.println("num hits=" + rh.getNumHits() + "--no docs=" + ir.maxDoc());
for (int i = 0; i < rh.getNumHits() && i < 100; i++) {
    System.out.println("i=" + i + "--" + rh.score(i) + "--docid=" + rh.docid(i) + "--doc=" + rh.doc(i).get(title));
}
Parameters:
query - Lucene query object
filter - Lucene filter object
ir - Lucene IndexReader object
docs - number of top hits
Throws:
java.lang.Throwable
See Also:
RankingHits, RankingScore
public int search(org.apache.lucene.search.Weight weight,
    org.apache.lucene.search.Filter filter,
    org.apache.lucene.index.IndexReader ir,
    org.apache.lucene.search.Collector collector)
    throws java.lang.Throwable
Example:
IndexReader reader = IndexReader.open(FSDirectory.open(new File(index)));
RankingQuery rq = new RankingQuery();
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);
Query query = parser.parse(searchterms);
TopScoreDocCollector tdc = TopScoreDocCollector.create(1000, true);
rq.search(query, null, reader, tdc);
int hits = tdc.getTotalHits();
ScoreDoc[] sda = null;
if (hits > 0) {
    sda = tdc.topDocs().scoreDocs;
}
System.out.println("num hits=" + hits + "--no docs=" + reader.maxDoc());
// sda stays null when there are no hits, so check it before reading its length.
for (int i = 0; sda != null && i < sda.length && i < 10; i++) {
    ScoreDoc sd = sda[i];
    System.out.println("i=" + i + "--" + sd.score + "--docid=" + sd.doc + "--doc=" + reader.document(sd.doc).get(title));
}
reader.close();
Parameters:
weight - Lucene weight object
filter - Lucene filter object
ir - Lucene IndexReader object
collector - collector for the returned results
Throws:
java.lang.Throwable
public int search(org.apache.lucene.search.Weight weight,
    org.apache.lucene.search.Filter filter,
    org.apache.lucene.index.IndexReader ir,
    org.apache.lucene.search.Collector collector,
    org.tgels.search.rankingalgorithm.Parameters parms)
    throws java.lang.Throwable
Example:
IndexReader reader = IndexReader.open(FSDirectory.open(new File(index)));
RankingQuery rq = new RankingQuery();
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);
Query query = parser.parse(searchterms);
TopScoreDocCollector tdc = TopScoreDocCollector.create(1000, true);
Parameters parms = new Parameters(rq);
parms.algorithm = RankingQuery.ALGORITHM_SIMPLE;
rq.search(query, null, reader, tdc, parms);
int hits = tdc.getTotalHits();
ScoreDoc[] sda = null;
if (hits > 0) {
    sda = tdc.topDocs().scoreDocs;
    System.out.println("num hits=" + hits + "--no docs=" + reader.maxDoc());
    for (int i = 0; i < sda.length && i < 10; i++) {
        ScoreDoc sd = sda[i];
        System.out.println("i=" + i + "--" + sd.score + "--docid=" + sd.doc + "--doc=" + reader.document(sd.doc).get(title));
    }
}
// Search again with different per-search options.
parms.algorithm = RankingQuery.ALGORITHM_COMPLEX;
parms.mode = RankingQuery.MODE_PRODUCT;
rq.search(query, null, reader, tdc, parms); // reader = Lucene IndexReader object
reader.close();
Parameters:
weight - Lucene weight object
filter - Lucene filter object
ir - Lucene IndexReader object
collector - collector for the returned results
parms - list of options
Throws:
java.lang.Throwable
public void addToLowerBoostSet(java.lang.String keywords)
public static void log(java.lang.String s)