I have a question about how an HTTP web server partitions the data of a web page into multiple TCP sessions. Specifically, is there any identification in the TCP session data that web browsers can use to reconstruct the original web page? Obviously the TCP session data must be put in order, and none of it can be left out, for a complete reconstruction. Where would such TCP session identification be located: in the TCP packet header, in a TCP session header, in the HTTP data, or elsewhere? Thanks a lot for any comments!
If I understand the question, I think you've misunderstood a few things about what TCP does. TCP has nothing to do with HTTP, other than HTTP being a data protocol carried within TCP. A web page is fetched with multiple HTTP requests over one or more TCP connections: often a single connection is reused for multiple HTTP GETs, but at the same time multiple connections are used for speed. TCP implicitly ensures that the data passed over each connection is correctly rebuilt, in order, as a single entity. But a web page is usually made up of about 30 different things, so it's down to the HTML engine to assemble each piece of HTML, GIF, Flash and so on into a page, irrespective of the network transport.
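To illustrate the point about TCP handing the application one ordered byte stream, here is a minimal Python sketch. It is only an assumption-laden illustration, not how any particular browser works: the function name read_response is made up, the chunks stand in for whatever a socket read would return, and it only handles responses that carry a Content-Length header (no chunked encoding, no compression).

```python
def read_response(chunks):
    """Rebuild one HTTP response from an in-order stream of byte chunks.

    `chunks` is a hypothetical stand-in for successive socket reads.
    The chunk boundaries carry no meaning: only the HTTP headers say
    where the response body ends.
    """
    chunks = iter(chunks)
    buffer = b""

    # Keep reading until the blank line that ends the header block.
    while b"\r\n\r\n" not in buffer:
        buffer += next(chunks)

    head, _, body = buffer.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    status_line = lines[0]

    # Parse "Name: value" header lines into a dict with lowercase names.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()

    # Assumption: the server sent Content-Length.
    length = int(headers[b"content-length"])
    while len(body) < length:
        body += next(chunks)

    return status_line, headers, body[:length]


if __name__ == "__main__":
    raw = (b"HTTP/1.1 200 OK\r\n"
           b"Content-Type: text/html\r\n"
           b"Content-Length: 13\r\n"
           b"\r\n"
           b"<p>hello</p>\n")
    # Cut the response into arbitrary 7-byte pieces, as the network might.
    pieces = [raw[i:i + 7] for i in range(0, len(raw), 7)]
    print(read_response(pieces))
```

However the bytes are cut up on the way, the application ends up with the same status line, headers and body, and nothing in them refers to TCP packets or sessions.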
Yes, you have understood the question quite right! That is what I had imagined. So my ultimate goal is to find out the relationships among the received TCP sessions. There must be some sort of identity that web clients can use to reassemble these TCP sessions into a complete image of a web page. Maybe I should look into the design of web server software to find the tag that the server puts into the TCP session data.
Give me some hints if anyone happens to know. Many thanks!
Web clients and TCP sessions never meet up. The TCP/IP stack returns to the app a given number of pieces of HTTP data containing HTML, PNG, etc. The browser has no business caring about TCP itself once the data is back in its realm.
So the browser wants index.html. It asks its network stack for an HTTP GET of index.html; the stack goes away, opens a new TCP socket to the server, requests the data and passes it back to the browser side of the app. The browser then reads the HTML, sees an <img /> tag and requests that GIF file from the network stack, which opens another socket, or reuses the existing one, to request it, then passes it back again, and so it goes on. Please note the demarcation of it all: there are clear lines of responsibility within the app that keep different things very separate.
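As a rough sketch of the browser side of that split, the Python below scans a page's HTML and lists the extra URLs the browser would then ask its network stack to fetch. The names ResourceFinder and find_resources and the example.com URL are made up for illustration, and a real browser handles far more tags and attributes than this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect the URLs of sub-resources referenced by an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img />.
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            # Resolve relative references against the page's own URL.
            self.resources.append(urljoin(self.base_url, attrs["src"]))


def find_resources(base_url, html):
    """Return the follow-up URLs found in `html`, in document order."""
    finder = ResourceFinder(base_url)
    finder.feed(html)
    return finder.resources


if __name__ == "__main__":
    page = '<html><body><img src="logo.gif" /></body></html>'
    for url in find_resources("http://example.com/index.html", page):
        # Each of these becomes a separate HTTP GET, on a new or reused socket.
        print("would request:", url)
```

The link between index.html and logo.gif lives entirely in the HTML, which is why nothing in the TCP or HTTP headers needs to tie the transfers together.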
Last edited by acid_kewpie; 11-17-2008 at 03:53 AM.
Thank you very much, Chris! Very clear and to the point! Let me study the protocols and relevant techniques a little more to get a better understanding of how data is transferred and organized between web servers and client browsers. All in all, the way that data is transferred back and forth depends on the dynamic communication between them.
BTW, I am wondering whether there are any static HTTP protocol/header parsers that can extract data blocks of a given MIME type from a raw dump of HTTP communication data. I mean development tools and libraries. Many thanks again!
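For what it's worth, a small parser of this kind can be sketched with nothing but the Python standard library. The sketch below makes several assumptions: the dump contains only server responses laid back to back (one direction of one connection), every response carries a Content-Length header, and nothing is chunked or compressed. The function names are made up for illustration.

```python
from email.parser import BytesParser


def split_responses(dump):
    """Yield (status_line, headers, body) for each response in `dump`.

    Assumes back-to-back HTTP responses, each with Content-Length.
    """
    pos = 0
    while pos < len(dump):
        head_end = dump.index(b"\r\n\r\n", pos)
        status_line, _, header_block = dump[pos:head_end].partition(b"\r\n")
        # HTTP headers share the RFC 822 style "Name: value" syntax,
        # so the email parser can read them.
        headers = BytesParser().parsebytes(header_block, headersonly=True)
        length = int(headers.get("Content-Length", 0))
        body_start = head_end + 4
        yield (status_line.decode("latin-1"), headers,
               dump[body_start:body_start + length])
        pos = body_start + length


def extract_by_mime(dump, mime_type):
    """Return the bodies of all responses whose Content-Type matches."""
    return [body for _, headers, body in split_responses(dump)
            if headers.get_content_type() == mime_type]


if __name__ == "__main__":
    dump = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n"
            b"Content-Length: 12\r\n\r\n<p>hello</p>"
            b"HTTP/1.1 200 OK\r\nContent-Type: image/gif\r\n"
            b"Content-Length: 6\r\n\r\nGIF89a")
    print(extract_by_mime(dump, "image/gif"))
```

A dump taken straight off the wire would first need the TCP streams reassembled and separated per connection, which this sketch does not attempt.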