LinuxQuestions.org


seccentral 04-14-2012 04:37 AM

weird network problem, only arp works
 
Hello people, I'm having a small problem on a network here and frankly it's vexing.

So here we are:

LAN1=10.11.99.0/24
LAN2=10.11.221.0/24
Yes, I should have labeled the locations themselves, but please think of LAN1 and LAN2 as different parts of the map [the real map].

In LAN1 I have a D-Link DES-3526 switch (10.11.99.200) with a 1-port GBIC in its 25th port, which carries traffic over a single-channel fiber across town at 1 Gbit/s into LAN2. There it lands in a media converter and then runs over copper into hybrid gigabit port 25 [copper] of another D-Link DES-3526 switch, 10.11.221.200.

So it's something like this:

1Gbit link: dlink1----fiber----mc---dlink2
This link also carries 2 VLANs: 200 [tagged; this is the VLAN for LAN1] and 300 [untagged; this one is not important].

The problem is that one host in LAN2, a Linux box [CentOS release 5.5 (Final)], cannot communicate properly with LAN1. I have given it an IP address that belongs to LAN1 using vconfig, but I can only use ARP.
I.e.: arp-scan -I eth2.200 -l shows me a lot of hosts in LAN1.
But when I try to ping *anything* in LAN1, let's say 10.11.99.1, it's dead. And it really is dead, because I don't even see ICMP echo requests in tcpdump (on 10.11.99.1).
However, no other machines in LAN2 that I've also configured with interfaces (using VLAN tags) in LAN1 have this problem. Only this one. And I really don't understand it.

PS: It has a substantial firewall config file, and I thought maybe something messy in there was the cause, but I couldn't find anything. Even so, I disabled iptables on it for testing purposes. With iptables disabled I still can't ping those hosts, but I can arping them with no problem.
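
For reference, a minimal sketch of the setup and the two probes described in this post. The address 10.11.99.50/24 is a made-up placeholder (the post does not state the real one), the helper names run, vlan_up and probe are hypothetical, and eth2.200 assumes vconfig's default interface naming. With DRY_RUN=1 the commands are only printed; without it they need root.

```shell
# Echo the command when DRY_RUN=1, otherwise execute it (needs root).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create VLAN sub-interface <parent>.<vid> and give it an address.
# Assumes the default naming scheme, e.g. eth2 + 200 -> eth2.200.
vlan_up() {
    parent=$1 vid=$2 cidr=$3
    run vconfig add "$parent" "$vid"
    run ip addr add "$cidr" dev "$parent.$vid"
    run ip link set "$parent.$vid" up
}

# Probe the same target at layer 2 (ARP) and at layer 3 (ICMP echo).
probe() {
    iface=$1 target=$2
    run arping -I "$iface" -c 3 "$target"
    run ping -I "$iface" -c 3 "$target"
}

# Example (placeholder address, prints the commands only):
#   DRY_RUN=1; vlan_up eth2 200 10.11.99.50/24; probe eth2.200 10.11.99.1
```

If arping gets replies and ping does not, layer 2 is fine and the fault is somewhere above it, which is the symptom described here.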

btmiller 04-14-2012 10:22 PM

Sounds like perhaps a mismatch between your VLANs and IP subnets. A VLAN is a layer 2 creature, i.e. it controls which hosts are on the same (virtual) network segment. IP, however, is a layer 3 protocol. It sounds like (correct me if I am wrong) you have put the Linux box on LAN2 but given it a LAN1 IP address. Is LAN2 its own VLAN? Does it get carried across the fiber? You need to check the routing table on the Linux box and make sure it has the correct routing parameters for LAN1 (since you gave it a LAN1 IP, it should, but you really need to check the routing table to be sure). If ARP is working, it sounds like everything is OK from a layer 2 standpoint, but there is no layer 3 communication. It sounds to me like one of the hosts does not have the correct layer 3 routing information to talk to the others. Hopefully checking and adjusting the routing tables (netstat -rn plus the route command) will help clear up the confusion.
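
To illustrate the layer 3 side: whether a host talks to a destination directly or hands the packet to a gateway depends on whether the destination falls inside a directly connected subnet. A rough sketch of that check in plain shell arithmetic, with hypothetical helper names ip_to_int and ip_in_subnet (the kernel does this itself from the routing table; this is only to make the rule concrete):

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed (exit status 0) if address $1 lies inside CIDR network $2.
ip_in_subnet() (
    addr=$1
    net=${2%/*}     # part before the slash, e.g. 10.11.99.0
    bits=${2#*/}    # prefix length, e.g. 24
    # 4294967295 is 0xFFFFFFFF; shift it to build the netmask.
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    [ $(( $(ip_to_int "$addr") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
)
```

With the subnets from this thread, ip_in_subnet 10.11.99.1 10.11.99.0/24 succeeds, while ip_in_subnet 10.11.221.200 10.11.99.0/24 fails.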

BTW, your description of LAN1 and LAN2 is really confusing. It sounds like you have two VLANs, but LAN2 is not its own VLAN. What this really means is that you have two subnets sharing the same (virtual) layer 2 LAN. The way I've always seen this done is to map the VLANs to subnets so that they are coterminal. Then you would have some form of router that would interchange layer 3 traffic between the two subnets. What you're doing should work though (if you can believe it, I have an even more screwed up situation where multiple subnets share a layer 2 network segment).

Hope this helps. If it doesn't, another thing to check would be the MAC address tables (forwarding databases) on the switches themselves, to see if both D-Links are learning the correct ports to forward traffic for both endpoint MAC addresses.

seccentral 04-15-2012 02:12 AM

Hello and thank you for the reply.
Yeah... somehow I find it hard to describe my situation clearly, given the confusing network layout and the fact that I'm not a native English speaker.
The thing is, there are two physical locations. One is the office, the other holds some of our servers. The office is LAN1, and the servers are on a network of their own, which I named LAN2.
Further, there are more Linux boxes in LAN1 that receive LAN2 traffic VLAN-tagged, and it works fine for those.
I understand what you are saying, but I checked and didn't find any problem from a config point of view, neither on the switches nor on that Linux box.
LAN2 (the server hosting area) comes tagged and reaches the office site tagged.
The D-Link switch we have in the office does nothing but send tagged packets to this Linux box.
I.e. I have our own office network (LAN1) untagged on most ports
and the server network (LAN2) tagged on some ports, and this Linux box is connected to one of these ports.
On the same physical port I have multiple VLANs tagged, and among them is the one corresponding to the server network.
So on this Linux box (which also serves as a router and local DHCP server for our office employees), one interface carries the local office network untagged, and the other carries the following tagged: 2 ISP links and this server network. I really find it frustrating not knowing how to explain the situation better.
To make things a bit clearer: LAN1 (office network) is untagged, we are bringing LAN2 (server network) tagged into the office, and this Linux router cannot communicate with it. However, any other hosts that receive the LAN2 network can (hosts that are INSIDE the office, connected to the same office D-Link switch, receiving the same LAN2 tagged).

nikmit 04-16-2012 04:19 AM

Check your arp cache and routes on the problematic machine for anything out of order.
Code:

arp
ip route show
ip rule show

If all that looks the same as for working servers, dump icmp and see if packets are going out or not. I would do that for all interfaces on the server.
Code:

tcpdump -i ethX icmp and host {yourhostip}
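
One way to cover all interfaces without typing each name, as a sketch: the hypothetical helper icmp_capture_cmds below only prints one tcpdump command per interface it finds under /sys/class/net, so the lines can be reviewed and then run as root. The -n flag (skip name resolution) is an addition not in the command above.

```shell
# Print one tcpdump command per network interface found under a
# sysfs-style directory (default: /sys/class/net). Nothing is executed.
icmp_capture_cmds() {
    host=$1
    netdir=${2:-/sys/class/net}
    for path in "$netdir"/*; do
        # Each interface shows up as a directory (or a symlink to one);
        # skip anything else, including an unmatched glob.
        [ -d "$path" ] || continue
        printf 'tcpdump -n -i %s icmp and host %s\n' "${path##*/}" "$host"
    done
}

# Example: icmp_capture_cmds 10.11.99.1
```

Running the printed lines one at a time while pinging from another terminal shows on which interface, if any, the echo requests actually leave.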

seccentral 04-16-2012 06:20 AM

nikmit, I did that already with no resolution. However, I just thought I should update the kernel... you know, for no apparent reason.
Well, I'm not a kernel coder, so I wouldn't know what caused it, but after a kernel update and reboot everything just works...
So I am marking this as solved. It could be some kernel bug, as it was an older, custom-built kernel from some previous admin.


All times are GMT -5.