Lucene Frequently Asked Questions

Welcome to the Lucene FAQ.

  1. GENERAL SECTION

    1. What is Lucene ?
    2. Where is the Lucene home page ?
    3. Who is Doug Cutting ?
    4. Where is the home site of this FAQ ?
    5. Where can I download Lucene ?
    6. Will Lucene work with my Java application ?
    7. Who maintains Lucene ?
    8. How much does it cost ?
    9. Is Lucene governed by the LGPL ?
    10. Where can I learn more about Lucene ?
    11. Are there any alternatives to Lucene ?
    12. What are Lucene's system requirements ?
    13. Do I need any third-party libraries to use Lucene ?
    14. Does Lucene have a web crawler ?
    15. Does Lucene store a full copy of the indexed documents ?
    16. I want to help, where do I sign ?

  2. INDEXING SECTION

    1. What is indexing ?
    2. Where is the index database stored ?
    3. How do I perform a simple indexing of a set of documents ?
    4. How can I update the indexing of a document or a set of documents ?
    5. How can I delete a document or set of documents from an index ?
    6. How can I add document(s) to the index ?
    7. What are Document objects used for ?
    8. What is a Field object and what are the possible field kinds ?
    9. How should I decide what field types to use ?
    10. Can I use Lucene to crawl my site or other sites on the Internet ?
    11. How can I extract the content of HTML pages ?
    12. How do I index other document types such as PDF and Word ?
    13. What are Analyzers and what are the differences between them ?
    14. Can I use the same Analyzer object again and again ?
    15. Why is it important to use the same analyzer type during indexing and search ?
    16. What Analyzer should I use ?
    17. Can I write my own custom analyzer ?
    18. I need an analyzer with functionality that is not supported by Lucene token filters, what should I do ?
    19. Does the order of the token filters used by an analyzer matter ?
    20. I want to use a token filter that uses a word dictionary, are there any special considerations ?
    21. Can my token filter generate multiple tokens for a single input token ?
    22. How can I find the effect of an analyzer on a given text ?
    23. What is PorterStemmer and what is it good for ?
    24. What is index optimization and when should I use it ?
    25. What are Segments ?
    26. How can I make 'pig' also match 'hog' ?
    27. What is the Stop Filter ?
    28. What about non-English and non-Latin languages ?
    29. How fast is Lucene indexing ?
    30. How can I perform a long indexing without affecting ongoing searches ?
    31. Is the Lucene index database platform independent ?
    32. When I recreate an index from scratch, do I have to delete the old index files ?

  3. SEARCHING SECTION

    1. What is searching ?
    2. What is a query ?
    3. How do I get a Query object to represent a given query ?
    4. What are 'terms' ?
    5. What is the syntax of queries parsed by the QueryParser class ?
    6. What is a term query ?
    7. What is a boost factor ?
    8. What is a phrase query ?
    9. What is a boolean query ?
    10. Can I use the same query object more than once ?
    11. How can I verify that the query structure I created is ok ?
    12. My documents have multiple fields, do I have to replicate a query for each of them ?
    13. What about partial matches and regular expressions ?
    14. Ok, I do have a Query object, how do I perform a search ?
    15. How do I get the hits and what information is available for each hit ?
    16. What is filtering and how is it performed ?
    17. How can I restrict the number of hits ?
    18. How can I specify the starting position of the hit list ?
    19. How can I get hits only from a specific date range ?
    20. Can I have multiple date values or multiple date fields ?
    21. Can I embed date values in regular text fields ?
    22. How can I implement paging in case the hit list is too long ?
    23. Are the search results deterministic ?
    24. How can I show a 'star' rating for each hit ?
    25. Are the searches case sensitive ?
    26. I searched for 'xyz' and could not find it, why ?
    27. How does Lucene handle numbers and special characters ?
    28. How can I search for 'any word', 'all words' or 'phrase' ?
    29. Will a search for 'cat' match 'cats' ?
    30. Can I define search aliases so 'dog' will also match 'pet' ?
    31. How does Lucene assign scores to hits ?
    32. Does the position of the matches in the text affect the scoring ?
    33. Does the length of a field affect its scoring ?
    34. How can I increase the score of certain document types ?
    35. How can I show excerpts with the hit results ? How about highlighting the matched words ?
    36. How can I show a cached version of the document with the matched words highlighted ?
    37. How can I perform a search on a subset of the documents ?
    38. How can I perform hierarchical searches ?
    39. How can I delete a subtree from a hierarchical index ?
    40. I have large documents, how can I reduce the overhead of reading them from the index ?
    41. Can I modify the index while performing ongoing searches ?
    42. Can I hide certain documents from a user (for security reasons) ?

  4. MISCELLANEOUS SECTION

    1. I have a great question for this FAQ, where should I submit it ?
    2. How can I help maintain this FAQ ?
    3. What are the copyright terms for using this FAQ ?




FAQ maintainer: tal@zapta.com. FAQ created with FAQmanager.