Case Study

Integrating RankingAlgorithm Search Engine with JAMWiki

Nagendra Nagarajayya
rankingalgorithm.tgels.com

Introduction

JAMWiki is a Java-based wiki:

  • Uses the same syntax and offers many of the features of MediaWiki
  • Setup is quick and easy - no external database is required!
  • Supports almost any application server running Java 5 or later.
  • Mature code base, with more than 3000 commits since the project started in 2006.

JAMWiki search uses the Lucene library as its default and only search engine. The task was to add the RankingAlgorithm library as an additional search engine option that the admin user could select from the admin configuration wiki page, and also to allow JAMWiki to start with RankingAlgorithm as the default search engine.

Details

Step 1: Integrating RankingAlgorithm was very easy, since only a few lines of code needed to change to replace Lucene as the default search library. As a first test that RankingAlgorithm worked with JAMWiki as a search engine, the findResults method in LuceneSearchEngine.java was modified as below:

/* Step 1: the Lucene searcher.search path was disabled with if (false) as below */
			if (false) {
				searcher.search(rewrittenQuery, collector);

				Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter("<span class=\"highlight\">", "</span>"), new SimpleHTMLEncoder(), new QueryScorer(rewrittenQuery));
				ScoreDoc[] hits = collector.topDocs().scoreDocs;
				for (int i = 0; i < hits.length && i < MAXIMUM_RESULTS_PER_SEARCH; i++) {
					int docId = hits[i].doc;
					Document doc = searcher.doc(docId);
					String summary = retrieveResultSummary(doc, highlighter, analyzer);
					SearchResultEntry result = new SearchResultEntry();
					result.setRanking(hits[i].score);
					result.setTopic(doc.get(ITYPE_TOPIC_PLAIN));
					result.setSummary(summary);
					results.add(result);
				}
			}else {
				Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter("<span class=\"highlight\">", "</span>"), new SimpleHTMLEncoder(), new QueryScorer(rewrittenQuery));
/* Step 2, the RankingAlgorithm code was added as below */
				try {
					RankingQuery rq = new RankingQuery();
					IndexSearcher is = (IndexSearcher)searcher;
					RankingHits hits = rq.search(rewrittenQuery, is);
					for (int i = 0; i < hits.length(); i++) {
						int docId = hits.docid(i);
						Document doc = searcher.doc(docId);
						String summary = retrieveResultSummary(doc, highlighter, analyzer);
						SearchResultEntry result = new SearchResultEntry();
						result.setRanking(hits.score(i));
						result.setTopic(doc.get(ITYPE_TOPIC_PLAIN));
						result.setSummary(summary);
						results.add(result);
					}
				} catch(Throwable t) {t.printStackTrace();}		
			}

The RankingAlgorithm library was added as a Maven build dependency in pom.xml with system scope:

		<dependency>
			<groupId>rankingalgorithm</groupId>
			<artifactId>rankingalgorithm</artifactId>
			<version>3.0</version>
			<scope>system</scope>
			<systemPath>/eneeds/fs/jamwiki/jamwiki_latest/rankingalgorithm-3.0.jar</systemPath>
		</dependency>

This was then compiled with the mvn package command (Maven is the build system for JAMWiki). The package was then deployed on the GlassFish application server, and JAMWiki was configured to use an existing wiki deployment on MySQL.

Step 2: Next, different searches were run to check that everything, including highlighting, worked with RankingAlgorithm as the default search engine. The wiki content used in the tests was the solr-ra wiki available at http://solr-ra.tgels.com/wiki. The same tests were also run with Lucene as the default search engine. The results below are for the query “comparison with google”: Fig 1 shows the results with the RankingAlgorithm Search Engine and Fig 2 shows the results with the Lucene Search Engine.

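Fig 1: search results with the RankingAlgorithm Search Engine (image: case study JAMWiki Fig1.png)

Fig 2: search results with the Lucene Search Engine (image: case study JAMWiki Fig2.png)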

Step 3: Once it was confirmed that search worked well, RankingAlgorithm was integrated properly by adding a new Java class, RankingAlgorithmSearchEngine.java, copied from LuceneSearchEngine.java. The findResults method was then changed as below to use the RankingAlgorithm library for search; everything else (creating an index, opening an index, etc.) was left as is and continues to use the Lucene library. (If the members of LuceneSearchEngine.java had all been protected or public, RankingAlgorithmSearchEngine could instead have been derived from LuceneSearchEngine, overriding only the findResults method.) RankingAlgorithm uses a reference to the opened IndexSearcher object to retrieve the Lucene IndexReader, which it uses to read the terms from the index. Once the terms are read, RankingAlgorithm does the scoring and ranking and passes the results back to JAMWiki. See the code below:

public List<SearchResultEntry> findResults(String virtualWiki, String text) {
		StandardAnalyzer analyzer = new StandardAnalyzer(USE_LUCENE_VERSION);
		List<SearchResultEntry> results = new ArrayList<SearchResultEntry>();
		logger.trace("search text: " + text);
		try {
			BooleanQuery query = new BooleanQuery();
			QueryParser qp;
			qp = new QueryParser(USE_LUCENE_VERSION, ITYPE_TOPIC, analyzer);
			query.add(qp.parse(text), Occur.SHOULD);
			qp = new QueryParser(USE_LUCENE_VERSION, ITYPE_CONTENT, analyzer);
			query.add(qp.parse(text), Occur.SHOULD);
			Searcher searcher = this.retrieveIndexSearcher(virtualWiki);
			// rewrite the query to expand it - required for wildcards to work with highlighter
			Query rewrittenQuery = searcher.rewrite(query);
			// actually perform the search
			TopScoreDocCollector collector = TopScoreDocCollector.create(MAXIMUM_RESULTS_PER_SEARCH, true);		
			Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter("<span class=\"highlight\">", "</span>"), new SimpleHTMLEncoder(), new QueryScorer(rewrittenQuery));
		/* RankingAlgorithm  code */
			try {
/* Step 1  Create RankingQuery object */
				RankingQuery rq = new RankingQuery();
/* Step 2  Change Searcher back to IndexSearcher */
				IndexSearcher is = (IndexSearcher)searcher;
/* Step 3  call search method with the query terms and IndexSearcher */
				RankingHits hits = rq.search(rewrittenQuery, is);
/* Step 4  Loop through the return results */
				for (int i = 0; i < hits.length(); i++) {
					int docId = hits.docid(i);
					Document doc = searcher.doc(docId);
					String summary = retrieveResultSummary(doc, highlighter, analyzer);
					SearchResultEntry result = new SearchResultEntry();
					result.setRanking(hits.score(i));
					result.setTopic(doc.get(ITYPE_TOPIC_PLAIN));
					result.setSummary(summary);
					results.add(result);
				}
			} catch(Throwable t) {t.printStackTrace();}
/* Step 5, that's it */
			
		} catch (Exception e) {
			logger.error("Exception while searching for " + text, e);
		}
		return results;
	}
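
The subclassing alternative mentioned in the parenthetical above can be illustrated with a minimal, self-contained sketch. It does not use the real JAMWiki or Lucene classes: BaseSearchEngine, RankingSearchEngineSketch, openIndex and the string results are all hypothetical stand-ins, chosen only to show that when the index-handling members are protected, a derived class needs to override nothing but findResults.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for LuceneSearchEngine. The assumption here is that
// its index-handling members are protected rather than private.
class BaseSearchEngine {
    // Index handling lives in the base class and is inherited unchanged.
    protected List<String> openIndex(String virtualWiki) {
        List<String> docs = new ArrayList<>();
        docs.add(virtualWiki + ":StartingPoints");
        docs.add(virtualWiki + ":Comparison");
        return docs;
    }

    // Default search path (stands in for the Lucene searcher.search call).
    public List<String> findResults(String virtualWiki, String text) {
        List<String> results = new ArrayList<>();
        for (String doc : openIndex(virtualWiki)) {
            results.add("lucene:" + doc);
        }
        return results;
    }
}

// Hypothetical stand-in for RankingAlgorithmSearchEngine: only findResults
// is overridden, everything else comes from the base class.
class RankingSearchEngineSketch extends BaseSearchEngine {
    @Override
    public List<String> findResults(String virtualWiki, String text) {
        List<String> results = new ArrayList<>();
        // Reuses the inherited index access; only the search path differs.
        for (String doc : openIndex(virtualWiki)) {
            results.add("ranking:" + doc);
        }
        return results;
    }
}

class SubclassDemo {
    public static void main(String[] args) {
        BaseSearchEngine engine = new RankingSearchEngineSketch();
        System.out.println(engine.findResults("en", "comparison with google"));
    }
}
```

With this shape the copied index code would not need to be maintained twice, at the cost of requiring a visibility change in the base class.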

Step 4: RankingAlgorithm was also added to the JAMWiki configuration XML file, to jamwiki.properties, and to the search.jsp and search-results.jsp files.

Change to search.jsp and search-results.jsp:

<% if (org.jamwiki.Environment.getValue(org.jamwiki.Environment.PROP_BASE_SEARCH_ENGINE).contains("RankingAlgorithm")) { %>
<font size="3"><b><a href="http://rankingalgorithm.tgels.com">RankingAlgorithm</a></b></font>
<% } else { %>
<a href="http://lucene.apache.org/java/"><img src="../images/lucene_green_100.gif" alt="Lucene" border="0" /></a>
<% } %>

(A more elegant way might be to add a method to the SearchEngine.java interface that returns the search engine's identifier and link, which would make this easier.)
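
One possible shape for that suggestion is sketched below. None of these names exist in JAMWiki: DescribedSearchEngine, getDisplayName, getHomepageUrl and SearchFooter are hypothetical, and the real SearchEngine interface is not reproduced here. The idea is only that each engine describes itself, so the JSP no longer needs to test the configured class name.

```java
// Hypothetical extension of the search engine contract: each engine
// reports its own display name and link.
interface DescribedSearchEngine {
    String getDisplayName();
    String getHomepageUrl();
}

class LuceneDescriptor implements DescribedSearchEngine {
    public String getDisplayName() { return "Lucene"; }
    public String getHomepageUrl() { return "http://lucene.apache.org/java/"; }
}

class RankingAlgorithmDescriptor implements DescribedSearchEngine {
    public String getDisplayName() { return "RankingAlgorithm"; }
    public String getHomepageUrl() { return "http://rankingalgorithm.tgels.com"; }
}

class SearchFooter {
    // Builds the footer link from whatever engine is configured,
    // with no per-engine branching.
    static String renderLink(DescribedSearchEngine engine) {
        return "<a href=\"" + engine.getHomepageUrl() + "\">"
                + engine.getDisplayName() + "</a>";
    }

    public static void main(String[] args) {
        System.out.println(renderLink(new RankingAlgorithmDescriptor()));
        System.out.println(renderLink(new LuceneDescriptor()));
    }
}
```

Adding a third engine would then require no change to search.jsp or search-results.jsp.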

Change to jamwiki-configuration.xml:

<search-engines>
	<search-engine>
		<name>RankingAlgorithm</name>
		<class>org.jamwiki.search.RankingAlgorithmSearchEngine</class>
		<key>admin.searchengine.rankingalgorithm</key>
	</search-engine>
	<search-engine>
		<name>Lucene</name>
		<class>org.jamwiki.search.LuceneSearchEngine</class>
		<key>admin.searchengine.lucene</key>
	</search-engine>
</search-engines>

The RankingAlgorithm search engine was also added to ApplicationResources.properties as below:

admin.searchengine.lucene=Lucene Search Engine
admin.searchengine.rankingalgorithm=RankingAlgorithm Search Engine

To make RankingAlgorithm the default search engine at deployment, the property PROP_BASE_SEARCH_ENGINE was set to SEARCH_ENGINE_RANKINGALGORITHM in Environment.java, as below:

this.defaults.setProperty(PROP_BASE_SEARCH_ENGINE, SearchEngine.SEARCH_ENGINE_RANKINGALGORITHM);

This causes the search-engine property to be set to RankingAlgorithm in the jamwiki.properties file generated during initial configuration (search-engine=org.jamwiki.search.RankingAlgorithmSearchEngine), making the RankingAlgorithm search engine the default.

A new constant, SEARCH_ENGINE_RANKINGALGORITHM, was added to SearchEngine.java as below:

public static final String SEARCH_ENGINE_RANKINGALGORITHM = "org.jamwiki.search.RankingAlgorithmSearchEngine";
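
How a default like this interacts with an explicit setting can be sketched with plain java.util.Properties. This is not JAMWiki's Environment class: SearchEngineDefaults and newEnvironment are hypothetical, the SEARCH_ENGINE_LUCENE constant name is assumed, and only the property key and the two class names are taken from the configuration shown above.

```java
import java.util.Properties;

// Hypothetical stand-in for the defaults handling in Environment.java.
class SearchEngineDefaults {
    static final String PROP_BASE_SEARCH_ENGINE = "search-engine";
    // Assumed constant name; the class name comes from jamwiki-configuration.xml.
    static final String SEARCH_ENGINE_LUCENE =
            "org.jamwiki.search.LuceneSearchEngine";
    static final String SEARCH_ENGINE_RANKINGALGORITHM =
            "org.jamwiki.search.RankingAlgorithmSearchEngine";

    // Returns properties whose fallback value for the search engine is
    // RankingAlgorithm; an explicit setting still takes precedence.
    static Properties newEnvironment() {
        Properties defaults = new Properties();
        defaults.setProperty(PROP_BASE_SEARCH_ENGINE, SEARCH_ENGINE_RANKINGALGORITHM);
        return new Properties(defaults);
    }

    public static void main(String[] args) {
        Properties env = newEnvironment();
        // Nothing configured yet, so the default applies.
        System.out.println(env.getProperty(PROP_BASE_SEARCH_ENGINE));
        // An admin switching engines overrides the default.
        env.setProperty(PROP_BASE_SEARCH_ENGINE, SEARCH_ENGINE_LUCENE);
        System.out.println(env.getProperty(PROP_BASE_SEARCH_ENGINE));
    }
}
```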

Step 5:

The Maven build process was then modified so that the rankingalgorithm artifact is resolved from the local repository and bundled into the war file. First, the rankingalgorithm dependency in pom.xml was changed from system scope to a normal repository lookup:

		<dependency>
			<groupId>rankingalgorithm</groupId>
			<artifactId>rankingalgorithm</artifactId>
			<version>3.0</version>
		</dependency>

The rankingalgorithm-3.0.jar was published to the local repository with the command:

 mvn install:install-file -DgroupId=rankingalgorithm -DartifactId=rankingalgorithm -Dversion=3.0 -Dpackaging=jar -Dfile=rankingalgorithm-3.0.jar

A runtime dependency was also added to jamwiki-war/pom.xml as below so that rankingalgorithm-3.0.jar is packaged with the war file:

		<dependency>
			<groupId>rankingalgorithm</groupId>
			<artifactId>rankingalgorithm</artifactId>
			<version>3.0</version>
			<scope>runtime</scope>
		</dependency>


Conclusion

It is very easy to add a new search engine to JAMWiki, especially RankingAlgorithm, since it uses Lucene for indexing while replacing the searching, scoring and ranking with its own. The indexes built with the Lucene Search Engine can still be used as is (no new indexes need to be built), the changes are minimal, and JAMWiki gains another search engine that is comparable to Google site search and much better than Lucene (see Perl index searches, and also Fig 1 and Fig 2). Lucene is a very good search engine, but RankingAlgorithm seems to do a much better job of ranking results accurately and relevantly for the search terms.