I have pushed a project to GitHub that shows how to create a generic highlighter with Apache Lucene.
The original Lucene Highlighter is too tightly coupled to snippet highlighting:
- It does not make it easy to highlight a whole text
- It handles only text, with a formatter strongly coupled to that text
I have modified the original Lucene Highlighter to allow highlighting of "anything". The highlighter uses a callback instead of a formatter, and its purpose is to find the matching terms, each with a score, in a whole text.
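
To make the callback idea concrete, here is a minimal sketch in plain Java. It is not the project's code and does not use the Lucene API: the names `HighlightCallback`, `onTerm` and `highlight` are hypothetical, and a simple regular expression stands in for a Lucene analyzer. It only illustrates the shape of the approach: the whole text is scanned, and each matching term is reported with its offsets and score, leaving the caller free to decide what "highlighting" means for XML, PDF, HTML and so on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CallbackHighlighterSketch {

    /** Hypothetical callback: receives each matched term instead of producing formatted text. */
    @FunctionalInterface
    public interface HighlightCallback {
        void onTerm(int startOffset, int endOffset, String term, float score);
    }

    // Assumption: a word is a letter followed by letters or digits.
    // A real implementation would rely on a Lucene analyzer instead.
    private static final Pattern WORD = Pattern.compile("\\p{L}[\\p{L}\\p{N}]*");

    /**
     * Scans the whole text (not just a snippet) and reports every word that
     * appears in termScores. Keys of termScores are expected to be lowercase.
     */
    public static void highlight(String text, Map<String, Float> termScores,
                                 HighlightCallback callback) {
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            Float score = termScores.get(m.group().toLowerCase(Locale.ROOT));
            if (score != null) {
                callback.onTerm(m.start(), m.end(), m.group(), score);
            }
        }
    }

    public static void main(String[] args) {
        String text = "Lucene makes search easy; lucene scales.";
        Map<String, Float> scores = Map.of("lucene", 1.0f, "search", 0.5f);

        // The callback decides what to do with a hit; here it just records it.
        List<String> hits = new ArrayList<>();
        highlight(text, scores, (start, end, term, score) ->
                hits.add(start + "-" + end + ":" + term + ":" + score));

        hits.forEach(System.out::println);
    }
}
```

Because the callback only receives offsets and scores, the same scan could drive an HTML wrapper, a PDF annotation, or an XML attribute without changing the matching code.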
I have used this code to highlight XML, PDF, HTML... with or without Solr.
Note: This project is an extract of a larger project with submodules.