Getting filesize before downloading the page

balanagireddy · 09-01-2004, 12:31 AM

Hi,
I am working on a search engine project and I run the crawler every day. When crawling pages from the net, I don't want to recrawl a page that was already crawled and has not changed since.
So, can anyone give me an idea of the best way to solve this problem?
One idea I have is to compare the current size of a page with the size of the copy I already crawled. If the size has not changed, I will skip the page; otherwise I will crawl it again.
But to implement this strategy, I need to know the page size before downloading the page.
So, can anyone tell me how to get the page size before crawling it?

Waiting for your suggestions,
M.Bala Nagi Reddy

ugenn · 09-01-2004, 12:48 AM

The size of an HTTP response body is reported in the Content-Length field of the response header. The field is optional, so not every server sends it.
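
To read Content-Length without downloading the page, one option is to send an HTTP HEAD request, which asks the server for the headers only. Below is a minimal Python sketch of that idea using the standard library. The function names remote_size and needs_recrawl, the stored size, and the URL are made up for illustration and are not from the original posts. Since the header is optional, the sketch treats a missing value as "unknown" and recrawls in that case.

```python
import urllib.request
from typing import Optional


def remote_size(url: str, timeout: float = 10.0) -> Optional[int]:
    """Send a HEAD request and return Content-Length, or None if it is absent."""
    # HEAD asks for the response headers only; the body is not transferred.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        value = response.headers.get("Content-Length")
    # The header is optional, and a malformed value is treated as unknown.
    if value is None or not value.strip().isdigit():
        return None
    return int(value)


def needs_recrawl(stored_size: Optional[int], current_size: Optional[int]) -> bool:
    """Decide whether to download the page again (hypothetical helper)."""
    # If either size is unknown, recrawl to be on the safe side.
    if stored_size is None or current_size is None:
        return True
    return stored_size != current_size


if __name__ == "__main__":
    # Hypothetical values: the size saved during the previous crawl and a sample URL.
    previous_size = 1256
    size_now = remote_size("http://example.com/")
    print("recrawl" if needs_recrawl(previous_size, size_now) else "skip")
```

Keep in mind that an unchanged size does not guarantee unchanged content, and many dynamically generated pages do not send Content-Length at all, so this check is only a rough filter.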