Article: Introduction to Text Indexing with Apache Jakarta Lucene
Subject: How about more "fuzzy" operators and proximity weighting
Date: 2003-03-06 16:27:26
From: anonymous2

Maybe a misspelling/typo operator, or maybe Soundex, or maybe a regex match.
I guess most operators would require that each token be reduced to some type of "hash code", and then that hash code would be stored in a separate field. Then the search would hash the query terms and check the hash-codes field.
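
To make the "hash code in a separate field" idea concrete, here is a rough sketch using Soundex as the hash. It is plain Python, not Lucene's API, and the names `soundex`, `build_index`, and `search` are made up for illustration. It assumes whitespace tokenization and the common American Soundex rules.

```python
from collections import defaultdict

# Letter-to-digit table for American Soundex.
_CODES = {}
for _digit, _letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                         "4": "L", "5": "MN", "6": "R"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Reduce a word to a 4-character Soundex code (the "hash code")."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        digit = _CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (code + "000")[:4]


def build_index(docs):
    """Map each token's Soundex code to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            key = soundex(token)
            if key:
                index[key].add(doc_id)
    return index


def search(index, query_term):
    """Hash the query term the same way and look it up in the code index."""
    return sorted(index.get(soundex(query_term), set()))


docs = {
    1: "Robert wrote the indexer",
    2: "Rupert fixed a typo",
    3: "nothing relevant here",
}
index = build_index(docs)
print(search(index, "Robbert"))  # the misspelled query still finds docs 1 and 2
```

The catch is that the same hash has to be applied at index time and at query time, which is why it needs its own field next to the original tokens.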
But some operators seem difficult to support if a source word cannot be mapped directly to a single "hash code". A regex match is one example.
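
For the regex case, the only approach I can think of is to skip hashing and test the pattern against every distinct term in the index. A minimal sketch, again in plain Python rather than Lucene's API, with a hypothetical `regex_search` helper and a toy term-to-documents map:

```python
import re


def regex_search(term_index, pattern):
    """Return ids of documents containing any term that fully matches pattern.

    There is no single hash code to look up here, so every distinct
    term in the index is tested against the pattern.
    """
    rx = re.compile(pattern)
    hits = set()
    for term, doc_ids in term_index.items():
        if rx.fullmatch(term):
            hits |= doc_ids
    return sorted(hits)


# Toy index: term -> set of document ids.
term_index = {
    "index": {1},
    "indexer": {1, 2},
    "indexing": {3},
    "search": {2},
}
print(regex_search(term_index, r"index(er|ing)"))
```

The cost grows with the number of distinct terms rather than the number of documents, which may or may not be acceptable depending on the size of the vocabulary.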
Also, it would be nice to boost documents where two matching words are closer together.
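
Something like the following is what I have in mind for the proximity boost. It is only a sketch: the scoring formula (1 plus the inverse of the smallest gap) is an arbitrary choice of mine, the function names are hypothetical, and it assumes two distinct query terms and whitespace tokenization.

```python
def min_distance(tokens, term_a, term_b):
    """Smallest gap in token positions between term_a and term_b, or None."""
    best = None
    last_a = last_b = None
    for i, tok in enumerate(tokens):
        if tok == term_a:
            last_a = i
        elif tok == term_b:
            last_b = i
        else:
            continue
        if last_a is not None and last_b is not None:
            gap = abs(last_a - last_b)
            if best is None or gap < best:
                best = gap
    return best


def proximity_score(text, term_a, term_b):
    """Score 0.0 unless both terms occur; higher when they sit closer together."""
    tokens = text.lower().split()
    gap = min_distance(tokens, term_a, term_b)
    if gap is None:
        return 0.0
    return 1.0 + 1.0 / gap  # adjacent terms get 2.0, distant terms approach 1.0


near = proximity_score("Lucene text indexing is fast", "lucene", "indexing")
far = proximity_score(
    "Lucene is a library and many other tools do indexing", "lucene", "indexing"
)
print(near > far)
```

This needs token positions to be available at search time, not just the fact that a term occurs in a document.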