Great question.
First of all, it helps to recognize that highly popular web sites (i.e., web sites with very high in-degree) are often popular not just because they have great content, but also because they provide direct links to other very useful and popular web sites.
So, for example, if cnn.com were to add a direct link to your web site, that would say something important about your web site.
If you wanted to take this into account in your search engine, how would you do that? One approach, and in fact the approach recommended for this assignment, is to build a full graph data structure whose vertices represent the web pages that your WebSpider visits and whose edges represent the hyperlinks between those pages. Then, when you invoke the makeIndex() method, the first step can be to traverse the graph, computing the in-degree of each vertex. (We will spend all of this coming week on graph data structures and algorithms.)
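To make the in-degree computation concrete, here is a minimal sketch in Java. It assumes the graph is stored as an adjacency list (a map from each visited page's URL to the list of URLs it links to). The class name InDegreeSketch, the method computeInDegrees, and that representation are all hypothetical choices for illustration; they are not part of the assignment's required API, and your own graph class may look quite different.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the class name, method name, and adjacency-list
// representation are assumptions, not the assignment's API.
class InDegreeSketch {

    // graph maps each visited page URL to the list of URLs that page links to.
    // Returns a map from URL to the number of incoming links found in the graph.
    static Map<String, Integer> computeInDegrees(Map<String, List<String>> graph) {
        Map<String, Integer> inDegree = new HashMap<>();

        // Start every visited page at 0 so pages with no incoming links still appear.
        for (String page : graph.keySet()) {
            inDegree.put(page, 0);
        }

        // Each edge (source -> target) adds one to the target's in-degree.
        // Note: a page that links to the same target twice is counted twice here,
        // and link targets that were never visited also receive an entry.
        for (List<String> targets : graph.values()) {
            for (String target : targets) {
                inDegree.merge(target, 1, Integer::sum);
            }
        }
        return inDegree;
    }

    public static void main(String[] args) {
        // Tiny made-up graph: a links to b and c, b links to c, c links nowhere.
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a.html", List.of("b.html", "c.html"));
        graph.put("b.html", List.of("c.html"));
        graph.put("c.html", List.of());

        System.out.println(computeInDegrees(graph));
    }
}
```

This takes a single pass over all edges, so the work is proportional to the number of pages plus the number of hyperlinks. Whether repeated links from the same page should count once or several times is a design decision the sketch does not settle.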
But what if you didn't want to build an entire graph structure? What would you do in that case?