Discussion Board
Current Forum: Homework 5 General Forum
Date: Fri Nov 16 2001 8:50 am
Author: Ghosh, Debmallo S. <dsghosh@cmu.edu>
Subject: Re: sample output

What I've done to avoid this is add a protocol check to the WebSpider program, so that non-http: links never get added to the queue. The Java URL class has a little method called getProtocol() that is very useful for this. I'm sure this isn't the most elegant way to do it -- for example, it also throws away ftp: and https: links, but I don't expect many of those to turn up. It does seem to work in keeping all the javascript and mailto HREFs ...
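
For anyone who wants to try the same idea, here is a rough sketch of what such a check could look like. This is not the actual WebSpider code: the class name ProtocolFilter and the methods isHttpLink and enqueueIfHttp are made up for illustration, and it assumes the links have already been resolved to absolute URLs (a relative link like "index.html" would be rejected here). Only java.net.URL and its getProtocol() method come from the post above.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical helper class, not part of the assignment code.
class ProtocolFilter {

    // Returns true only when the link parses as a URL whose protocol is "http".
    // Links that the URL class cannot parse at all are treated as non-http.
    @SuppressWarnings("deprecation") // URL(String) is deprecated in recent JDKs but still available
    static boolean isHttpLink(String href) {
        try {
            URL url = new URL(href);
            // getProtocol() returns the scheme without the colon, e.g. "http"
            return "http".equalsIgnoreCase(url.getProtocol());
        } catch (MalformedURLException e) {
            // Unknown or missing protocol: skip the link
            return false;
        }
    }

    // Adds the link to the crawl queue only if it passes the protocol check.
    static void enqueueIfHttp(Queue<String> queue, String href) {
        if (isHttpLink(href)) {
            queue.add(href);
        }
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        String[] links = {
            "http://www.cmu.edu/",
            "mailto:someone@example.com",
            "javascript:void(0)",
            "https://www.cmu.edu/"
        };
        for (String link : links) {
            enqueueIfHttp(queue, link);
        }
        System.out.println(queue);
    }
}
```

Catching MalformedURLException matters because the URL constructor may refuse a scheme it has no handler for, so the check cannot rely on getProtocol() alone. As noted above, this deliberately drops https: and ftp: links too; accepting them would just mean comparing against more protocol names.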

Current Thread Detail:
sample output      Lee, Charles C.      Wed Nov 7 2001 7:49 pm       
Re: sample output +Attachment      Goodman, Brian J.      Thu Nov 8 2001 1:09 am       
Re: sample output      Liu, Limin Angela      Thu Nov 8 2001 9:18 am       
Re: sample output      Batra, Rohan      Sun Nov 11 2001 1:21 pm       
Re: sample output      Tanz, Ophir      Tue Nov 13 2001 9:42 pm       
Re: sample output      Ghosh, Debmallo S.      Fri Nov 16 2001 8:50 am       
Re: sample output +Attachment      Goodman, Brian J.      Mon Nov 12 2001 2:01 am       
Re: sample output      Katzhyman, Michael Zadok      Sat Nov 17 2001 4:22 pm       
Re: sample output      Goodman, Brian J.      Sat Nov 17 2001 5:38 pm       
Re: sample output      Turkaslan, Muhsine Tanyel      Sun Nov 18 2001 12:02 am       
Re: sample output      Goodman, Brian J.      Mon Nov 19 2001 9:12 pm       
