Discussion Board
Current Forum: Homework 5 - Parts 1 and 2
Date: Fri Nov 16 2001 8:55 pm
Author: Bortz, Andrew S. <abortz@andrew.cmu.edu>
Subject: Re: Queue size

You could use a heuristic: guess the average success rate of a fetch (taking both link reliability and server reliability into account), and from that work out how many pages you need to attempt in total to end up with the number of successfully crawled pages you actually want.

So, assuming an 80% success rate, you need 10 / 0.8 = 12.5, rounded up to 13 total pages, to get 10 successes. Let's assume we already visited 4 of them. Then you guess that we can stop adding to the queue once it reaches size 13 - 4 = 9. Obviously, if your guess is off on the low end, you run out of pages to try.
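A minimal sketch of that arithmetic in Java (the 80% rate and the page counts are just the example numbers above, and the class and method names are hypothetical, not anything the assignment prescribes):

public class QueueSizeHeuristic {
    // Total attempts needed is ceil(targetSuccesses / successRate);
    // subtract the pages already visited to get the remaining queue cap.
    public static int remainingQueueCap(int targetSuccesses,
                                        double successRate,
                                        int alreadyVisited) {
        int totalNeeded = (int) Math.ceil(targetSuccesses / successRate);
        return totalNeeded - alreadyVisited;
    }

    public static void main(String[] args) {
        // 10 successes at an 80% rate => ceil(12.5) = 13 total pages;
        // 4 already visited => stop adding once the queue holds 9.
        System.out.println(remainingQueueCap(10, 0.8, 4)); // prints 9
    }
}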

In practice, it takes far less memory to store a list of URLs than it does to store, say, the index, so this is not really a problem for us unless we try to crawl a HUGE number of pages.

In that case, you could use a class that stores and retrieves its contents from a file on the hard drive, or from a structured database, either locally or on another machine.
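As a rough sketch of such a class in Java (FileBackedUrlQueue and the one-URL-per-line file format are hypothetical, just one way you might do it; the same interface could wrap a database table instead):

import java.io.*;

// A FIFO queue of URLs backed by a plain file on disk, so the
// in-memory footprint stays tiny no matter how many URLs are waiting.
public class FileBackedUrlQueue {
    private final File file;
    private long readOffset = 0; // byte position of the next unread URL

    public FileBackedUrlQueue(File file) {
        this.file = file;
    }

    // Append a URL to the end of the file (the tail of the queue).
    public void enqueue(String url) throws IOException {
        try (FileWriter out = new FileWriter(file, true)) {
            out.write(url + "\n");
        }
    }

    // Read and return the next URL, or null when the queue is empty.
    public String dequeue() throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            in.seek(readOffset);
            String url = in.readLine();
            if (url != null) {
                readOffset = in.getFilePointer();
            }
            return url;
        }
    }
}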

Current Thread Detail:
Queue size          Scherer, Sebastian     Wed Nov 14 2001 2:11 pm
Re: Queue size      Park, Daesik           Fri Nov 16 2001 1:08 pm
Re: Queue size      Bortz, Andrew S.       Fri Nov 16 2001 8:55 pm
Re: Queue size      Lee, Peter             Sat Nov 17 2001 3:56 pm
