Yes, I think Ying makes a good suggestion. Abstractly, what we are trying to build is a "reverse index". On the Web, if you have the URL of a web page, you can retrieve that page and get all of the keywords and links in it. So, a URL is a key in an index structure called the World-Wide Web.
But a search engine builds a reverse index: Given a keyword, return the URLs of all pages that contain that keyword. This is the "reverse" of the Web, and hence the term "reverse index".
So, fundamentally, you need to find a way to build a structure that can search for keywords very quickly, and to associate each keyword with the list of URLs of the pages that contain it.
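
To make the idea concrete, here is a minimal sketch in Python, assuming the pages have already been fetched and reduced to plain text. The names `build_reverse_index` and `lookup`, the example URLs, and the choice of a hash table keyed by keyword are all my own assumptions for illustration, not a prescribed design. (This structure is also commonly called an "inverted index".)

```python
from collections import defaultdict

def build_reverse_index(pages):
    """Map each keyword to the set of URLs whose text contains it.

    `pages` is assumed to be a dict of URL -> page text (hypothetical input).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        # Naive tokenization: lowercase and split on whitespace.
        for word in text.lower().split():
            index[word].add(url)
    return index

def lookup(index, keyword):
    """Return the sorted list of URLs containing `keyword` (empty if none)."""
    return sorted(index.get(keyword.lower(), set()))

# Made-up example pages.
pages = {
    "http://example.com/a": "cats and dogs",
    "http://example.com/b": "dogs and birds",
}
index = build_reverse_index(pages)
print(lookup(index, "dogs"))
```

A hash table gives fast exact-match lookups on average. A sorted structure or a trie would be a reasonable alternative if you also want prefix searches.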