I found that http://web.mit.edu/spotlight.html contains a few keywords that are multi-word strings: "Harlem Nights", "Kiss me, Kismet!", ...
Our HttpTokenizer keeps these strings intact, so each one is stored in the index as a single keyword.
Should we break them apart, since a search can already take multiple words? Or should we strictly treat them as single keywords?
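
One possible middle ground, sketched below in Python, is to index each keyword both as the whole phrase and as its individual words, so either kind of query can match. This is only an illustration of the idea, not how HttpTokenizer actually works: the `index_keywords` function, the lowercasing, and the word-splitting rule are all assumptions made for the example.

```python
import re
from collections import defaultdict


def index_keywords(index, url, keywords):
    """Add each keyword to the index as a whole phrase and as separate words.

    Hypothetical helper: `index` maps a lowercase term to the set of URLs
    whose keywords contain it.
    """
    for keyword in keywords:
        phrase = keyword.strip().lower()
        if not phrase:
            continue
        # Keep the phrase intact so an exact-phrase lookup still works.
        index[phrase].add(url)
        # Also index every alphanumeric word, dropping punctuation.
        for word in re.findall(r"[a-z0-9]+", phrase):
            index[word].add(url)
    return index


index = defaultdict(set)
index_keywords(
    index,
    "http://web.mit.edu/spotlight.html",
    ["Harlem Nights", "Kiss me, Kismet!"],
)
```

With this approach a search for "harlem" and a search for "harlem nights" would both find the page, at the cost of a somewhat larger index.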