Yes, we're working on the extra-credit info right now. If you or anyone else has ideas for improvements, feel free to post them. We will reply to say whether each idea qualifies for extra credit.
Here are just a few of the kinds of things you might consider, focused on part 1:
- Suppose you have a web page at http://www.foo.com, and in this web page there is a passage that goes like this:
...documentation is available on how to build an invisibility potion...
Normally, keywords such as "invisibility" and "potion" would be associated with www.foo.com. But if those words are the text of a link to www.bar.com, they should probably be associated with www.bar.com instead.
How would you have to change the structure of the search engine to do this?
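One possible direction, as a rough sketch rather than the expected answer: while parsing a page, keep track of whether you are inside a link, and credit the words of the link text to the link's target instead of the page being parsed. The Python below uses the standard library's `html.parser`. The class name `AnchorTextIndexer`, the keyword-to-URL-set index, and the sample markup (which assumes "invisibility potion" is the linked text) are all assumptions made for illustration, not part of the assignment code.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorTextIndexer(HTMLParser):
    """Hypothetical indexer: link text is credited to the link target,
    all other text to the page being parsed."""

    def __init__(self, page_url, index):
        super().__init__()
        self.page_url = page_url
        self.index = index        # keyword -> set of URLs
        self.link_targets = []    # hrefs of the <a> tags currently open

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Resolve relative links against the page URL; an <a> without
            # an href keeps its words on the current page.
            target = urljoin(self.page_url, href) if href else self.page_url
            self.link_targets.append(target)

    def handle_endtag(self, tag):
        if tag == "a" and self.link_targets:
            self.link_targets.pop()

    def handle_data(self, data):
        owner = self.link_targets[-1] if self.link_targets else self.page_url
        for word in data.lower().split():
            word = word.strip(".,;:!?\"'()")
            if word:
                self.index[word].add(owner)


index = defaultdict(set)
# Assumed markup for the sample passage; the real page may differ.
page = ('<p>...documentation is available on how to build an '
        '<a href="http://www.bar.com">invisibility potion</a>...</p>')
indexer = AnchorTextIndexer("http://www.foo.com", index)
indexer.feed(page)
indexer.close()
```

With this structure the index maps a keyword to a set of URLs, so a word can belong to the page it appears on, to a page that is linked with that word, or to both.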
- Special characters (e.g., foreign characters, accented characters, etc.) in a web document are indicated by special keywords that begin with an ampersand ("&") and end with a semicolon (";"). So, for example, a non-breaking space is indicated by "&nbsp;". At the moment, the ampersand is treated as though it were whitespace, not a special keyword delimiter. As a result, keywords such as nbsp are indexed, when they probably ought to be ignored.
How would you change Part 1 so that these special formatting keywords are ignored?
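One way to approach this, sketched under the assumption that Part 1 splits raw text into keywords somewhere: remove anything of the form `&name;` (or the numeric forms `&#233;` and `&#xE9;`) before splitting. The function name `tokenize` and the keyword pattern are hypothetical stand-ins for whatever your Part 1 actually does.

```python
import re

# Named (&nbsp;), decimal (&#233;) and hexadecimal (&#xE9;) references.
ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def tokenize(text):
    """Split text into lowercase keywords, dropping &...; references first."""
    without_entities = ENTITY.sub(" ", text)
    return re.findall(r"[a-z0-9]+", without_entities.lower())


words = tokenize("invisibility&nbsp;potion &amp; more")
```

Note that this simply throws the special characters away. A different design would decode them (Python's `html.unescape` does this) so that an accented letter stays part of its word; which behavior is better depends on how you want such words to be searchable.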
- One special kind of tag in web documents is the "META" tag. For example, a web page for a very interesting course on data structures and algorithms might include meta tags like the following:
These meta tags don't get displayed by the web browser --- they are present only to "help" search engines do better indexing. The current Part 1, however, completely ignores meta tags.
How might you improve Part 1 so that meta tags are taken into account by the search engine?
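As a starting point, here is a minimal sketch that pulls keywords out of meta tags with Python's standard `html.parser`. The class name `MetaKeywordParser`, the choice to read only the `keywords` and `description` meta tags, and the sample page are assumptions for illustration; the sample is made up and is not the markup from any real course page.

```python
import re
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Hypothetical parser that collects words from selected meta tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("keywords", "description"):
            content = attr.get("content", "").lower()
            self.keywords.extend(re.findall(r"[a-z0-9]+", content))


# Made-up sample page for demonstration only.
sample = """<html><head>
<meta name="keywords" content="data structures, algorithms">
<meta name="description" content="A course on data structures">
</head><body>Welcome</body></html>"""

parser = MetaKeywordParser()
parser.feed(sample)
parser.close()
```

The collected words could then be added to the index for the page's own URL, possibly with a different weight than words in the visible text, since page authors can put anything they like in a meta tag.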