[SOLVED] packet handler sometimes causing soft locks
I've been working on an LKM packet handler, and I am experiencing soft locks when processing a large volume of packets, for example when downloading a 13MB file with the module enabled.
I am sure it has something to do with this function, __set_tcp_option(), but I am not sure what, because it works when processing low-volume traffic like a simple web page.
I thought the soft lock might be caused by a loop, but I cannot find one, or by a function that sleeps, but I cannot find any of those either.
The input I am using for testing is (skb, 31, 6, 32-bit int). The function can be used for setting TCP options with up to 64 bits of data.
I am hoping someone can see what would cause the soft locks.
Code:
/*
* This function will attempt to update,
* or add a tcp option. By passing the
* skb, option number, option length in bytes, and
* data into the tcp segment.
*/
static inline int __set_tcp_option(struct sk_buff *skb, unsigned int tcpoptnum,
unsigned int tcpoptlen, u_int64_t tcpoptdata){
struct tcphdr *tcph;
struct iphdr *iph;
__u16 tcplen;
__u8 i, optspace, addoff, spaceneeded, bytefield, count, *opt;
iph = ip_hdr(skb);
tcph = (struct tcphdr *)(skb_network_header(skb) + ip_hdrlen(skb));
opt = (__u8 *)tcph + sizeof(struct tcphdr);
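// Walk only the option bytes: doff*4 covers the fixed 20-byte header too,
// so a bound of doff*4 would run 20 bytes past the options into the payload.
// NOTE: optlen() must never return 0 here, or this loop cannot terminate.
for (i = 0; i < tcph->doff*4 - sizeof(struct tcphdr); i += optlen(opt, i)) {
if ((opt[i] == tcpoptnum) && (tcph->doff*4 - sizeof(struct tcphdr) - i >= tcpoptlen) &&
(opt[i+1] == tcpoptlen)) { // TCP Option was found.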
if (DEBUG_TCP_OPTIONS == TRUE){
printk(KERN_ALERT "INFO: TCP Option found in packet.\n");
}
count = tcpoptlen - 2; // get option length,
// and ignore header fields.
bytefield = 2; // the first data byte is always at i+2.
while (count > 0) {
count--;
if ((count) != 0) {
opt[i+bytefield] = (tcpoptdata >> 8 * count);
}
else {
opt[i+bytefield] = tcpoptdata & 0x00ff;
}
bytefield++;
}
// Get the TCP length.
tcplen = ntohs(iph->tot_len) - ip_hdrlen(skb);
if (DEBUG_TCP_OPTIONS == TRUE){
printk(KERN_ALERT "INFO: The TCP length is: %u.\n",
tcplen);
}
tcph->check = 0;
tcph->check = tcp_v4_check(tcph, tcplen,
iph->saddr,
iph->daddr,
csum_partial((char *)tcph, tcplen, 0));
iph->check = 0;
ip_send_check(iph); //ip checksum
return 0;
}
}
// TCP Option was not found!
if (DEBUG_TCP_OPTIONS == TRUE){
printk(KERN_ALERT "INFO: TCP Option was not found.\n");
}
optspace = 0;
if (tcph->doff > 5) {
if (DEBUG_TCP_OPTIONS == TRUE){
printk(KERN_ALERT "INFO: Defrag option space.\n");
}
// Remove TCPOPT_NOP, and find any free space in opt field.
for (i = 0; i < tcph->doff*4 - sizeof(struct tcphdr); i += optlen(opt, i)) {
// While TCP option space = TCPOPT_NOP.
while (opt[i] == TCPOPT_NOP) {
// Move options forward to use TCPOPT_NOP space.
memmove(opt + i,opt +i + 1,(((tcph->doff*4) - sizeof(struct tcphdr)) -i) -1);
opt[(((tcph->doff*4)- sizeof(struct tcphdr)) -1)] = 0;
}
// If TCP option = TCPOPT_EOL its the end of TCP options.
if (opt[i] == TCPOPT_EOL) {
if (DEBUG_TCP_OPTIONS == TRUE){
printk(KERN_ALERT "INFO: End of option space.\n");
}
optspace = ((tcph->doff*4) - sizeof(struct tcphdr)) -i;
break;
}
else { // No TCP option space available.
if (DEBUG_TCP_OPTIONS == TRUE){
printk(KERN_ALERT "INFO: No TCP option space available.\n");
}
optspace = 0;
}
}
}
addoff = 0;
spaceneeded = 0;
if (optspace < tcpoptlen) {
// Calculate the new space required to add this option.
spaceneeded = tcpoptlen - optspace;
// Calculate how many dwords need added (data offset must align on dwords)
addoff = spaceneeded/4;
// Odd number of bytes needed increase by one.
if (spaceneeded%4 != 0) {
addoff += 1;
}
}
if (DEBUG_TCP_OPTIONS == TRUE){
printk(KERN_ALERT "INFO: Optspace is: %u. Spaceneeded is: %u. Addoff is: %u. Option is: %u.\n",
optspace,
spaceneeded,
addoff,
tcpoptlen);
}
if (tcph->doff + addoff <= 15) {
if (addoff != 0) {
if (skb_tailroom(skb) < addoff*4) {
if (DEBUG_TCP_OPTIONS == TRUE){
printk(KERN_ALERT "ERROR: Need to expand SKB.\n");
}
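if (pskb_expand_head(skb, 0, addoff*4 - skb_tailroom(skb)+128,
GFP_ATOMIC)) {
if (DEBUG_TCP_OPTIONS == TRUE){
printk(KERN_ALERT "ERROR: Failed SKB expand head.\n");
}
return -1; // Only bail out when the expand actually failed.
}
// pskb_expand_head() may reallocate the header, so refresh every pointer into it.
iph = ip_hdr(skb);
tcph = (struct tcphdr *)(skb_network_header(skb) + ip_hdrlen(skb));
opt = (__u8 *)tcph + sizeof(struct tcphdr);
if (DEBUG_TCP_OPTIONS == TRUE){
printk(KERN_ALERT "INFO: Success SKB expand head.\n");
}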
}
if (DEBUG_TCP_OPTIONS == TRUE){
printk(KERN_ALERT "INFO: SKB head large enough.\n");
}
// Get old TCP length.
tcplen = ntohs(iph->tot_len) - ip_hdrlen(skb);
if (DEBUG_TCP_OPTIONS == TRUE){
printk(KERN_ALERT "INFO: Old TCP length is: %u.\n",
tcplen);
}
// Putting additional data in the tail.
skb_put(skb, addoff*4);
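// Moving tcp data back to new location. Only the old payload moves
// (old TCP length minus TCP header). skb->len already includes the bytes
// added by skb_put(), so using it here would copy past the new tail.
memmove(((opt + tcph->doff*4) - sizeof(struct tcphdr)) + addoff*4,
((opt + tcph->doff*4) - sizeof(struct tcphdr)),
tcplen - tcph->doff*4);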
// Zero space between old, and new data offsets.
for (i = 0; i < addoff*4; i++) {
opt[((tcph->doff*4)- sizeof(struct tcphdr))+i] = 0;
}
}
// Moving options back to make room for new tcp option.
memmove(opt + tcpoptlen,
opt,
(tcph->doff*4 - sizeof(struct tcphdr)) - optspace);
count = tcpoptlen - 2; // Get option length, and ignore header fields.
bytefield = 2; // The first data byte is always at i+2.
// Writing new option to freed space.
opt[0] = tcpoptnum;
opt[1] = tcpoptlen;
while (count > 0) { //Writing TCP option data.
count--;
if ((count) != 0) {
opt[bytefield] = (tcpoptdata >> 8 * count);
}
else {
opt[bytefield] = tcpoptdata & 0x00ff;
}
bytefield++;
}
if (addoff != 0) { // Fixing data offset.
tcph->doff += addoff;
}
// Fix packet length.
iph->tot_len = htons(ntohs(iph->tot_len) + addoff*4);
// Get new TCP length.
tcplen = ntohs(iph->tot_len) - ip_hdrlen(skb);
if (DEBUG_TCP_OPTIONS == TRUE){
printk(KERN_ALERT "INFO: New TCP length is: %u.\n",
tcplen);
}
tcph->check = 0;
tcph->check = tcp_v4_check(tcph, tcplen,
iph->saddr,
iph->daddr,
csum_partial((char *)tcph, tcplen, 0));
iph->check = 0;
ip_send_check(iph); //ip checksum
return 0;
}
else {
if (DEBUG_TCP_OPTIONS == TRUE){
printk(KERN_ALERT "ERROR: TCP Option space is full!\n");
}
return -1;
}
}
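A minimal user-space sketch of an option walk that cannot spin, for comparison with the loops above. It is an assumption-laden illustration, not the fix that was applied in this thread: optlen() is not shown in the post, so find_tcp_option() below is a hypothetical stand-in that treats a length byte below 2 (including 0) or a length running past the option area as malformed and stops, and every pass through the loop advances by at least one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL 0  /* end of option list (same value as TCPOPT_EOL) */
#define OPT_NOP 1  /* one-byte padding (same value as TCPOPT_NOP) */

/*
 * Hypothetical helper: return the offset of option `kind` inside the
 * option area `opt` (`optbytes` long), or -1 if the option is absent
 * or the option list is malformed.
 */
int find_tcp_option(const uint8_t *opt, size_t optbytes, uint8_t kind)
{
    size_t i = 0;

    assert(opt != NULL);
    while (i < optbytes) {
        uint8_t len;

        if (opt[i] == OPT_EOL)
            return -1;          /* end of list, nothing further to scan */
        if (opt[i] == OPT_NOP) {
            i++;                /* single-byte option, no length field */
            continue;
        }
        if (i + 1 >= optbytes)
            return -1;          /* length byte would be out of bounds */
        len = opt[i + 1];
        if (len < 2 || i + len > optbytes)
            return -1;          /* zero, short or overlong length: stop */
        if (opt[i] == kind)
            return (int)i;
        i += len;               /* always advances by at least 2 here */
    }
    return -1;
}
```

The size_t index also avoids the wrap-around that a __u8 counter could hit if it were ever advanced past 255.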
I am using the skb and tcp_header->check to read the checksum. When the data in the skb is modified using skb_put(), the TCP checksum has to be calculated again, since the data modification happens in an NF hook at L3. I therefore used csum_partial(), but it did not give me the right result. Can you please guide me on how to use the function correctly?
This probably would have been more appropriate for a new thread, but I grabbed this piece of code from my old NF hook that recalculated the checksum.
Use of functions when using a post-routing hook to capture and modify application payload (e.g. chat)
Thanks, yaplej, for replying so quickly. I was getting exhausted from writing on forums. Thanks a ton.
I have a query about how an NF module works. When the payload is modified in a post-routing netfilter hook, we need to calculate the correct TCP checksum and put it in tcphdr->check before finally returning NF_ACCEPT, so that the packet can take its course and reach the receiver's stack and then the application (for example, a chat session).
So is there a need to calculate the TCP pseudo-header checksum using csum_tcpudp_magic(), or does tcp_v4_check() automatically calculate the TCP checksum over all of the following:
"Pseudo header + TCP header (20 bytes, plus options) + application layer payload"?
I found that this function takes care of the pseudo header. But when I use it on a 2.6.35 kernel, compilation fails with "too many arguments":
file.c:318:1: warning: ISO C90 forbids mixed declarations and code
file.c:329:1: warning: passing argument 1 of ‘tcp_v4_check’ makes integer from pointer without a cast
.c:329:1: error: too many arguments to function ‘tcp_v4_check’
I am not changing anything other than the application payload, so as of now there is no change in the IP addresses (source or destination).
Can you please guide me on this?
Regards
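To illustrate the question above: the TCP checksum covers the pseudo-header, the TCP header with its options, and the payload. As far as I know, tcp_v4_check() is a thin wrapper around csum_tcpudp_magic(), which folds the pseudo-header into the partial sum passed in from csum_partial(), and the compiler errors above suggest that on 2.6.35 it no longer takes the tcphdr pointer as its first argument (check include/net/tcp.h for your kernel). What follows is a user-space sketch of the same arithmetic, not kernel code; tcp_checksum() and its helpers are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add `len` bytes to a running 32-bit sum as big-endian 16-bit words. */
uint32_t sum_words(const uint8_t *data, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;  /* pad odd byte with zero */
    return sum;
}

/* Fold the carries back in and return the one's complement. */
uint16_t fold_complement(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Hypothetical helper: checksum of a TCP segment (header, options and
 * payload) of `tcplen` bytes. Addresses are 4 bytes each in network
 * order. The checksum field (bytes 16 and 17 of `seg`) must be zero
 * when a fresh checksum is being computed.
 */
uint16_t tcp_checksum(const uint8_t *seg, size_t tcplen,
                      const uint8_t saddr[4], const uint8_t daddr[4])
{
    uint32_t sum = 0;

    assert(tcplen >= 20 && tcplen <= 0xffff);
    sum = sum_words(saddr, 4, sum);     /* pseudo-header: source address  */
    sum = sum_words(daddr, 4, sum);     /* pseudo-header: destination     */
    sum += 6;                           /* zero byte + protocol (TCP = 6) */
    sum += (uint32_t)tcplen;            /* pseudo-header: TCP length      */
    sum = sum_words(seg, tcplen, sum);  /* header + options + payload     */
    return fold_complement(sum);
}
```

Running the same function over a segment whose checksum field is already filled in returns 0 when the stored checksum is valid, which gives a quick self-check after modifying a payload. Note that the length fed in is the whole TCP segment, so a stale length after appending data gives a wrong result.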
Hi yaplej,
Can you explain line no. 327:
tcph = (struct tcphdr *)(skb_network_header((*skb)) + ip_hdrlen((*skb))); // access tcp header
What is the need for adding ip_hdrlen(*skb)? Is it because of the old implementation, where the hook received sk_buff **skb? In the recent kernels I have come across (at least 2.6.33 and above) it is sk_buff *skb, and the TCP header should be accessible with just (struct tcphdr *)(skb_network_header(skb)).
Can you please explain your logic for adding the second term?
Everything I wrote was for CentOS 5.5, so the kernel is a little older. A newer kernel probably does not need to adjust for the IP header offset manually. It's the only way I could get it to work correctly, though.
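On why the IP header length is added: the network header pointer refers to the start of the IP header, and the TCP header begins ihl*4 bytes after it. The kernel's tcp_hdr(skb) relies on the transport header offset instead, which, as far as I know, is not guaranteed to be set at every hook point on every kernel version. The sketch below shows the same offset arithmetic in user space on a raw packet buffer; find_tcp_header() and read_be16() are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given a buffer that starts at the IPv4 header,
 * return a pointer to the TCP header, or NULL if the packet is not
 * IPv4/TCP or is too short to hold a minimal TCP header.
 */
const uint8_t *find_tcp_header(const uint8_t *pkt, size_t len)
{
    size_t ihl_bytes;

    assert(pkt != NULL);
    if (len < 20 || (pkt[0] >> 4) != 4)
        return NULL;                          /* not IPv4 */
    ihl_bytes = (size_t)(pkt[0] & 0x0f) * 4;  /* IHL counts 32-bit words */
    if (ihl_bytes < 20 || len < ihl_bytes + 20)
        return NULL;                          /* bad IHL or truncated */
    if (pkt[9] != 6)
        return NULL;                          /* protocol is not TCP */
    return pkt + ihl_bytes;
}

/* Read a 16-bit big-endian (network order) field. */
uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

Reading the ports at the start of the IP header instead of ihl*4 bytes later yields the version/TOS and total-length fields, which would look like unrelated "port numbers".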
Wireshark shows a checksum with a "validation disabled" remark, but if the preference is switched on, it says the checksum is incorrect. The output is as follows (snapshot attached):
Checksum: 0x3b8d [validation disabled]
When the option (Edit->Preference->Protocol->TCP) "Validate the TCP checksum if possible" is checked, it gives:
Checksum: 0x3b8d [incorrect, should be 0xa86e (maybe caused by "TCP checksum offload"?)
However, when ip_summed is checked, it has the value 0, which is CHECKSUM_NONE, meaning that the hardware is not calculating the checksum. So what is actually happening? Why is Wireshark reporting the checksum as incorrect, and which value is finally the correct one?
So, using tcp_v4_check(), how do we make sure that when we modify the application payload at the post-routing stage in an L3 NF hook, the checksum the function calculates is the actual, correct one?
What about the functions tcp_v4_gso_send_check() and __tcp_v4_send_check()? They are for calculating the L4 checksum, of course, but why and in what scenarios are they to be used?
Thanks.
vineet
When I capture packets on my gateway's outgoing interface, it shows an ICMP redirect message. The eth0 interface therefore attempts retransmissions, as the gateway's eth1 interface gives the ICMP redirect message, and the data modified in the gateway does not reach the receiver. When the capture takes place in the ip_forward hook, the TCP source and destination ports (as seen in dmesg) are completely different from the port numbers seen in Wireshark. I think this could be a reason for the wrong TCP checksum calculation, resulting in TCP retransmission on the gateway's incoming interface. Can you please tell me how to handle the skb when the data changes while it passes through the gateway? What points have to be taken care of to prepare the skb when the IP is masqueraded?
So the best example I can give is this. Here I am creating an entire new packet from scratch, populating the required fields, then sending the packet.
If you don't calculate the packet length correctly when running the checksum, the checksum will not come out correctly. The checksum functions can also vary depending on the version of the kernel you're using.
Code:
skb = alloc_skb(sizeof(struct iphdr) + sizeof(struct tcphdr), GFP_ATOMIC); // Allocate a new sk_buff with room for the L3 and L4 headers.

if (skb == NULL){
    return;
}

skb->protocol = __constant_htons(ETH_P_IP); // This is an IP packet.
skb->pkt_type = PACKET_OUTGOING; // It's outgoing.
skb->ip_summed = CHECKSUM_NONE; // No checksum offload requested.
skb->nh.raw = skb->data; // Not sure if this is needed.

skb_reserve(skb, sizeof(struct iphdr) + sizeof(struct tcphdr)); // Reserve the space for the L3 and L4 headers.
tcph = (struct tcphdr *)skb_push(skb, sizeof(struct tcphdr)); // Set up pointer for the L4 header.
iph = (struct iphdr *)skb_push(skb, sizeof(struct iphdr)); // Set up pointer for the L3 header.

iph->ihl = 5; // IP header length in 32-bit words.
iph->version = 4; // IPv4.
iph->tos = 0; // No TOS.
iph->tot_len = htons(sizeof(struct iphdr) + sizeof(struct tcphdr)); // L3 + L4 header length.
iph->id = 0; // IP identification field.
iph->frag_off = 0; // No fragmenting.
iph->ttl = 64; // Set a TTL.
iph->protocol = IPPROTO_TCP; // TCP protocol.
iph->check = 0; // No IP checksum yet.
iph->saddr = saddr; // Source IP.
iph->daddr = daddr; // Dest IP.

tcph->check = 0; // No TCP checksum yet.
tcph->source = source; // Source TCP port.
tcph->dest = dest; // Destination TCP port.
tcph->seq = htonl(seq - 1); // Current SEQ minus one is used for TCP keepalives.
tcph->ack_seq = htonl(ack_seq - 1); // Ummm not sure yet.
tcph->res1 = 0; // Reserved bits.
tcph->doff = 5; // TCP data offset. At least 5 if there are no TCP options.
tcph->fin = 0; // FIN flag.
tcph->syn = 0; // SYN flag.
tcph->rst = 0; // RST flag.
tcph->psh = 0; // PSH flag.
tcph->ack = 1; // ACK flag.
tcph->urg = 0; // URG flag.
tcph->ece = 0; // ECE flag? It should be 0.
tcph->cwr = 0; // CWR flag? It should be 0.

ip_route_input(skb, daddr, saddr, iph->tos, netdevice); // Populate the skb->dst structure.
ip_send_check(iph); // Calculate the IP checksum.

skb->dev = skb->dst->dev; // Populate skb->dev or it won't send.

if (DEBUG_SESSION_CLEANUP == TRUE){
    printk(KERN_ALERT "SKB device index: %u.\n",
           skb->dev->ifindex);
}

if (DEBUG_SESSION_CLEANUP == TRUE){
    printk(KERN_ALERT "Route device index: %u.\n",
           skb->dst->dev->ifindex);
}

NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, skb->dst->dev, dst_output); // Send the packet.
Hi,
I am setting up a gateway, so I enabled IP forwarding and then IP masquerading in order to forward traffic from one private intranet to the one on the other side of the gateway. The skb reaches the other end when there is no tampering in the gateway. However, when the skb is modified in the gateway, it does not reach the receiver.
.............--------------------------------------------------..................
client A ----| gateway eth0 -- gateway eth1 | ---- client B
.............--------------------------------------------------.................
I tried capturing the packet for modification in a pre-routing hook on the gateway's eth0, so that the rest could be treated as a black box and routing and the other forwarding issues would be taken care of as usual. However, with a pre-routing hook the packet does not show up on the gateway's eth1 unless I also apply a forward hook. Therefore all modification of the skb is done after capture in the forward hook. In the forward hook the incoming interface is shown as eth0 and the outgoing one as eth1. So far so good. But the port numbers change: the source port is some different, random port number and the destination port is port 61. I was expecting the source and destination ports to match the ones shown in Wireshark.
The source IP address shown is that of client A and the destination IP is that of client B. I was wondering whether, in that case, the source IP in the forward hook should be that of eth1 and the port numbers should be the ones shown in the Wireshark capture, with the TCP checksum then recalculated after the data modification. But then again, changing port numbers that may be internal to masquerading (i.e. NAT) may also be an issue.
And when a packet is captured in the forward hook and then passed on to post-routing, how does the local-out hook play a role, as you suggested? I need a small bedtime story to understand it.
I think I have made a mess of this question. If you could understand the part above, can you please give me a clue about handling the hook, or about whatever else is needed to make the packet travel smoothly to the receiver's application layer? By the way, in the above setup the modified packet reaches the network layer of the receiver (client B), but not the application layer.
Looking at your diagram, you have a single gateway connected to two hosts. The gateway has IP routing enabled between the two host networks. So if you only want to capture traffic routed from one network to the other, i.e. packets in eth0 out eth1 and in eth1 out eth0, you should only use a forward hook.
If you're trying something else, like a bridge interface rather than a routed one, you would need to use either NF_IP_PRE_ROUTING or NF_IP_LOCAL_IN, because you're not actually forwarding packets.
The question is: what are you doing to the packets after you capture them?
If you don't watch the endianness you will end up with a seemingly random port number out of the hook. The two bytes are actually swapped: if you assign source = 2 on a little-endian machine, you think that's 0x0002, but it is written to the wire as 0x0200, so it ends up as 512 rather than 2. You need to use source = htons(2) to correct the issue.
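A small user-space sketch of the byte-order point, assuming nothing beyond the standard htons(). The function names are made up for the example, and store_port_unconverted_le() deliberately reproduces the mistake as it plays out on a little-endian host.

```c
#include <arpa/inet.h>  /* htons() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Correct: convert to network order before storing into the header. */
void store_port(uint8_t field[2], uint16_t port)
{
    uint16_t wire = htons(port);

    assert(field != NULL);
    memcpy(field, &wire, sizeof wire);
}

/* Value a receiver decodes from the two header bytes. */
uint16_t port_on_wire(const uint8_t field[2])
{
    return (uint16_t)((field[0] << 8) | field[1]);
}

/* The mistake on a little-endian host: the low byte lands first. */
void store_port_unconverted_le(uint8_t field[2], uint16_t port)
{
    field[0] = (uint8_t)(port & 0xff);
    field[1] = (uint8_t)(port >> 8);
}
```

With these helpers, storing port 2 through store_port() reads back as 2, while the unconverted variant reads back as 512. The same applies when printing: a port read from the header needs ntohs() before it is passed to printk().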
I want to add the gateway's signature to the passing data, either by replacing data bits or by appending some bits to the data passing by. So I need to modify the skb by adding data to its tail. A change in the data would of course change the TCP segment length and the IP total length, and with them the TCP checksum and the IP checksum, so I have to recalculate all of these. One way is to capture the incoming packet at the pre-routing hook, modify the payload, and let the Linux box do the forwarding job as it would have done had the packet not been tampered with. But this assumption of treating the next steps as a black box goes wrong, as the packet does not show up at the gateway's eth1 interface. Even in the pre-routing hook the port numbers are not the same as those reported by Wireshark. However, when I do this between only two machines, I get the exact port numbers on which my application is running when I cross-check with dmesg.
So now, when I activate a sniffer module to check what is actually going on, I realise that the source and destination ports are not the ones reported by Wireshark. The port numbers have changed, so that means other fields are being affected when IP forwarding is enabled on the Linux gateway, and obviously the checksum would not be the same. So it is confusing what has to be taken care of and what can be treated as a black box.
My idea is to capture the traffic of one network coming in on the gateway's eth0 interface, modify it, and let the modified data go out of the gateway's eth1 interface to the receiver (client B).
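As a sketch of the bookkeeping described above: when bytes are appended, the IP total length changes, so the IP header checksum has to follow. One option is the incremental update from RFC 1624 (equation 3), shown here in user space on a raw header buffer. The helper names are hypothetical, and in a kernel module simply zeroing iph->check and calling ip_send_check() as in the code earlier in the thread does the same job. The TCP checksum still has to be recomputed separately, because both the payload and the TCP length in the pseudo-header change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full Internet checksum over `len` bytes (len must be even here). */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(len % 2 == 0);
    for (i = 0; i < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Incremental update, RFC 1624 equation 3: HC' = ~(~HC + ~m + m').
 * One 16-bit field changes from old_val to new_val.
 */
uint16_t checksum_adjust(uint16_t old_check, uint16_t old_val,
                         uint16_t new_val)
{
    uint32_t sum = (uint16_t)~old_check;

    sum += (uint16_t)~old_val;
    sum += new_val;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical helper: grow the IPv4 total length by `extra` bytes. */
void ip_grow_total_length(uint8_t *iph, uint16_t extra)
{
    uint16_t old_len = (uint16_t)((iph[2] << 8) | iph[3]);
    uint16_t new_len = (uint16_t)(old_len + extra);
    uint16_t old_check = (uint16_t)((iph[10] << 8) | iph[11]);
    uint16_t new_check = checksum_adjust(old_check, old_len, new_len);

    iph[2] = (uint8_t)(new_len >> 8);      /* total length, network order */
    iph[3] = (uint8_t)(new_len & 0xff);
    iph[10] = (uint8_t)(new_check >> 8);   /* header checksum */
    iph[11] = (uint8_t)(new_check & 0xff);
}
```

A header with a valid checksum sums to zero under inet_checksum(), which makes it easy to confirm that the adjusted header is still consistent.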
Checks: when I apply a pre-routing hook at receiver B, the port numbers are still not the ones shown in Wireshark. However, when the ACK for the received packet is sent and captured in the post-routing hook, it shows the correct port values, except for doff,
which I am printing as "tcpheadrlength is %d ", ntohs((tcp_hdr-> doff) <<2); it shows a value of 0 in the pre-routing hook and 2048 in the post-routing hook.
NF hooks have hooked my brain. I hope I have clarified my problem sufficiently, along with some other test results. Can you please help again?