Note: I am using the edition of http://web.mit.edu/ that promotes their "Human/Computer Conversation: HAL and Beyond" conference (white text on a black background, with yellow active links). Other editions of http://www.mit.edu/ won't work for my question, so please answer using this particular edition. I've archived it at www.andrew.cmu.edu/~richardc/web.mit.edu.htm just in case MIT does a switcheroo at http://www.mit.edu.
1. In the example at http://www-2.cs.cmu.edu/afs/andrew/course/15/211/www/hw5/writeup/webcrawler.html, is the first line of the output reading "http://www.mit.edu":
   - an (indirect) consequence of the URL "http://www.mit.edu" being passed in as an argument to java WebCrawler, or
   - a (direct) consequence of the page http://www.mit.edu listing itself, http://www.mit.edu, as its first hyperlink?
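For concreteness, here is roughly how I imagine the first case working. This is my own sketch, not the handout's WebCrawler, and the class name is made up; the point is just that if the seed URL is printed (or enqueued) before any page is fetched, then the first output line comes from the command-line argument rather than from any link on the page.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch -- not the actual handout code. If the crawler behaves
// like this, the first line of output is the seed URL from the command line,
// printed before any page has been fetched or parsed.
public class SeedFirstSketch {
    public static void main(String[] args) {
        String seed = args.length > 0 ? args[0] : "http://www.mit.edu";

        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        System.out.println(seed);   // a consequence of the argument,
                                    // not of any hyperlink found on the page itself

        // ...fetching and link extraction would only start after this point...
    }
}
```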
The answer to this question leads into my 2nd one:
2. Using http://www.mit.edu/ as an example and running "java WebCrawler http://www.mit.edu/ foo.save 10", will WebCrawler:
   - extract the first 10 hyperlinks off http://www.mit.edu/, display those, and stop there (unless there are fewer than 10 hyperlinks at http://www.mit.edu, in which case we'd have to open the page of the first hyperlink on http://www.mit.edu and extract the hyperlinks off that page), or
   - go to the first hyperlink, return the first hyperlink on that page, then go down and return the first hyperlink on that page, and so on until we reach 10 deep?
   I'm pretty sure this second method is wrong since we're doing a breadth-first traversal, but please clarify for me. (A sketch of the order I expect is below.)
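To show what I mean by the first (breadth-first) interpretation, here is a minimal sketch. It is not the handout's WebCrawler: I am assuming the "10" is a cap on how many pages/links get visited, and extractLinks is a placeholder I made up so the sketch compiles. The key point is that every link discovered on the seed page is queued before any of those pages is itself opened, which is the opposite of chaining first links depth-first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical BFS sketch -- not the handout's WebCrawler. It shows the order
// I *think* a breadth-first crawl visits pages, stopping once the limit is hit.
public class BfsOrderSketch {

    // Placeholder: the real crawler would fetch the page and pull out its
    // <a href=...> targets; here it returns nothing so the sketch compiles.
    static List<String> extractLinks(String url) {
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        String seed = args.length > 0 ? args[0] : "http://www.mit.edu/";
        int limit = 10;   // my assumption about what the "10" argument means

        Queue<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        frontier.add(seed);
        seen.add(seed);

        int visited = 0;
        while (!frontier.isEmpty() && visited < limit) {
            String url = frontier.remove();   // FIFO: oldest discovered page first
            System.out.println(url);
            visited++;

            for (String link : extractLinks(url)) {
                if (seen.add(link)) {         // only enqueue pages we haven't seen yet
                    frontier.add(link);
                }
            }
        }
        // A depth-first crawl would instead recurse into the first link of each
        // page immediately -- the behavior I'm guessing is the wrong one.
    }
}
```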