The way I did it was to keep a list of seen links in a vector: every time I dequeued something, I searched the vector for it, and if it wasn't already there I added it.
Even though this is O(N^2) overall, since we're only crawling 100 pages I think it's acceptable. Another way to do this, I guess, would be to use a binary tree, though I'm not sure that works, because links with different names may point to the same web page, in which case we'd end up double counting.
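A minimal sketch of the vector-based dedup check described above (the helper name `mark_seen` is just something I made up for illustration):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true if `url` has not been seen before, and records it.
// Linear scan through the vector: O(N) per lookup, so O(N^2) across
// the whole crawl -- fine for a cap of ~100 pages.
bool mark_seen(std::vector<std::string>& seen, const std::string& url) {
    if (std::find(seen.begin(), seen.end(), url) != seen.end())
        return false;  // already visited, skip it
    seen.push_back(url);
    return true;
}
```

In the crawl loop you'd call `mark_seen` on each dequeued URL and only fetch the page when it returns true.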