(0000020)
rlawley (reporter)
2004.08.16 00:30
The behaviour of the expand variable causes a major problem with crawlers, specifically msnbot. When MSN crawls a page, every link it sees in the index has the expand variable set. Those links are added to its queue of pages to crawl, so it comes back and fetches the page again with expand set. On that visit, all of the links have 2 values in expand, on the next visit 3, and so on. The crawler ends up consuming large amounts of bandwidth crawling the website endlessly, which could cause people's bandwidth limits to be reached. In the example I have seen, it is causing 180Mb a day of text-only pages to be transferred!
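
To illustrate how quickly this grows, here is a rough Python sketch of the crawl described above. It is only a model, not the actual code: the names `links_on_page` and `crawl` are made up for the example, and it assumes that each link on a page carries the current expand values plus one more node id, and that the crawler treats every distinct sequence of values as a separate URL.

```python
from collections import deque


def links_on_page(expand, node_ids):
    """Return the expand states linked from a page rendered with `expand`.

    Hypothetical model: each node that is not yet expanded gets a link
    carrying the current expand values plus that node's own id.
    """
    return [expand + (n,) for n in node_ids if n not in expand]


def crawl(node_ids, max_depth):
    """Breadth-first crawl; return the count of new distinct URLs per depth."""
    seen = {()}                   # the plain index page, no expand set
    queue = deque([((), 0)])
    per_depth = [0] * (max_depth + 1)
    per_depth[0] = 1
    while queue:
        expand, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in links_on_page(expand, node_ids):
            if nxt not in seen:   # a crawler only dedupes identical URLs
                seen.add(nxt)
                per_depth[depth + 1] += 1
                queue.append((nxt, depth + 1))
    return per_depth


# Four expandable nodes, followed three links deep.
print(crawl([1, 2, 3, 4], 3))  # [1, 4, 12, 24]
```

Even with only four nodes the crawler finds 41 distinct URLs for what is really one page, and in this model the count grows factorially with the number of nodes, which would be consistent with the bandwidth use reported above.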