Discussion Board
Current Forum: Homework 5 - Parts 1 and 2
Date: Fri Nov 9 2001 11:45 pm
Author: Boonyatera, Sunya Paul <sunya@andrew.cmu.edu>
Subject: Re: Part I and II error caused by PageHref.java

If context has a trailing "/", the original code works. The problems occur only when that "/" is missing. To get around this, I basically used your method, except that I also check that String h does not begin with a "/". If context has no trailing "/" and h has no leading "/", I concatenate context + "/" + h. Otherwise, I use the code that was originally there.
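
A rough sketch of the check described above, in case it helps. The class name HrefJoin and the method join are made up for illustration, and since PageHref.java isn't shown in this thread, the fallback branch simply assumes the original code did a plain context + h:

```java
// Hypothetical helper; the real PageHref.java is not shown in this thread.
final class HrefJoin {
    private HrefJoin() {}

    // Joins a context URL and an href, inserting a "/" only when
    // neither side already supplies one.
    static String join(String context, String h) {
        boolean contextEndsWithSlash = context.endsWith("/");
        boolean hrefStartsWithSlash = h.startsWith("/");
        if (!contextEndsWithSlash && !hrefStartsWithSlash) {
            return context + "/" + h;
        }
        // Assumption: stands in for "the code that was originally there".
        return context + h;
    }
}
```

Under that assumption, join("http://www.cs.cmu.edu/~petel", "someotherpage.html") and the same call with a trailing "/" on context both give http://www.cs.cmu.edu/~petel/someotherpage.html. Plain concatenation still would not send an href that starts with "/" to the server root.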

Without the check on h, you could get some output like:

http://www.cs.cmu.edu/~petel//someotherpage.html.

Also (this I'm not as sure of), let's say a link like this is on the page:

<a href="/index.html">, which should point to http://www.cs.cmu.edu/index.html
Without the check, we would instead incorrectly spider http://www.cs.cmu.edu/~petel//index.html.
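
For comparison, here is a small sketch of how the standard java.net.URI class resolves these links. This is a different technique from the string concatenation discussed in this thread, not what the homework code does, and the class name HrefResolve is made up for illustration:

```java
import java.net.URI;

// Hypothetical helper showing standard URI resolution.
final class HrefResolve {
    private HrefResolve() {}

    // Resolves href against base using the generic URI resolution rules.
    static String resolve(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }
}
```

With base "http://www.cs.cmu.edu/~petel/", the href "/index.html" resolves to http://www.cs.cmu.edu/index.html, and "someotherpage.html" resolves to http://www.cs.cmu.edu/~petel/someotherpage.html. Note that without the trailing "/" on the base, the last path segment is treated as a file name and dropped, so "someotherpage.html" would resolve to http://www.cs.cmu.edu/someotherpage.html.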

None of this handles the case where the page sets a <base href> tag, though. For that, I think you'd have to modify the state table?

BTW, thanks for posting this. I never would've noticed this problem if I hadn't seen this post.

Current Thread Detail:
Part I and II error caused by PageHref...      Liu, Limin Angela      Wed Nov 7 2001 9:11 pm       
base URL      Liu, Limin Angela      Thu Nov 8 2001 9:37 am       
Re: Part I and II error caused by P...      Boonyatera, Sunya Paul      Fri Nov 9 2001 11:45 pm       
Re: Part I and II error caused b...      Liu, Limin Angela      Sat Nov 10 2001 1:11 pm       
