Current Forum: Homework 5 General Forum |
Date: Fri Nov 16 2001 8:50 am |
Author: Ghosh, Debmallo S. <dsghosh@cmu.edu> |
Subject: Re: sample output |
|
|
What I've done to avoid this is add a protocol check to the WebSpider program, so that non-http: links never get added to the queue. The Java URL class has a little method called getProtocol() that is very useful for this. I'm sure this isn't the most elegant way to do it; for example, it also skips ftp: and https: links entirely, but I don't expect many of those will turn up. It does seem to work in keeping all the javascript and mailto HREFs out of the queue ... |
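
In case it helps, here is a rough sketch of the kind of check I mean. The class name ProtocolFilter and the method isHttpLink are made up for this example and are not part of the assignment code; the only real API used is java.net.URL and its getProtocol() method. It assumes you have the page's own URL as a string so that relative links can be resolved against it, and it treats any HREF that Java cannot parse as a link to skip.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class, not part of the WebSpider assignment code.
class ProtocolFilter {

    // Returns true only if href, resolved against baseUrl, uses the http protocol.
    // Anything else (mailto:, https:, ftp:, unparseable links) returns false.
    static boolean isHttpLink(String baseUrl, String href) {
        try {
            URL base = new URL(baseUrl);
            // Resolving against the base lets relative links inherit its protocol.
            URL resolved = new URL(base, href);
            return "http".equalsIgnoreCase(resolved.getProtocol());
        } catch (MalformedURLException e) {
            // Protocols Java has no handler for end up here; skip them too.
            return false;
        }
    }

    public static void main(String[] args) {
        String page = "http://www.cmu.edu/index.html"; // example base URL only
        String[] hrefs = { "page.html", "mailto:someone@example.com", "https://www.cmu.edu/" };
        for (String href : hrefs) {
            // Only links that pass the check would be added to the spider's queue.
            System.out.println(href + " -> " + isHttpLink(page, href));
        }
    }
}
```

In the spider itself you would call something like this right before adding a link to the queue. If you want to follow https: links as well, the comparison is the one line to change.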
|