LinuxQuestions.org

ThingsofInternet 12-28-2016 04:29 PM

VxWorks 6.2 Kernel WIND ver. 2.8 Issues with losing MPLS Connectivity
 
Hello all. I'll be honest: I'm moving into the *nix world out of necessity. I work for a PBX brand that uses voice switches running VxWorks (I'll leave names out, as I'm not sure they advance the conversation). We're having a really odd issue: on some nights we lose application-layer and voice connectivity over the MPLS. What we know is that if we send a ping or run a traceroute, we regain application-layer and voice connectivity over the MPLS.

The topology is this:

Windstream Fiber MPLS
Dell Sonicwall Firewall
HP Procurve Aruba series PoE for the LAN

The MPLS is not firewalled; it is on a port (X2) configured as passthrough on the Sonicwall. The voice and data are on separate VLANs, and the VxWorks switches are 'untagged' for both. We learned that there are known issues with VxWorks and Sonicwalls issuing ICMP redirects when hops aren't available, so Windstream implemented BGP active route monitoring. That lets us point to the Cisco IAD as the gateway and not have to worry about the Sonicwall telling us that a hop isn't available and to do something else. The HP is "voice optimized".

What we know:
* We routinely wake up to connectivity issues between the switches over the MPLS, and it is not the same switch losing connectivity to the same switches each time.

* We can do something at layer 3 (traceroute/ping) to wake the link up, regain connectivity, and re-establish voice across the network.

* So far, we never lose connectivity to anything during business hours; we only have to manually re-establish connectivity between switches when we get in the next morning.

I think I know that the routing conversation only happens at boot time. I can then manually add routes that take effect, but after these events I find that my manually added routes have been removed.
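To pin down which routes actually disappear overnight, two saved copies of the routing table (one taken after adding the routes, one taken the next morning) could be compared with something like the sketch below. This is only an illustration under assumptions: it assumes the table can be captured as plain text with the destination in the first column and the gateway in the second, and the function names `parse_routes` and `missing_routes` are made up for this example. The real output format on the switches may differ.

```python
def parse_routes(dump: str) -> set:
    """Return (destination, gateway) pairs from a plain-text route table dump.

    Assumption: destination is the first whitespace-separated field and
    gateway is the second. Adjust the indexes for the real output format.
    """
    routes = set()
    for line in dump.splitlines():
        fields = line.split()
        # Keep only lines whose first field starts with a digit, which skips
        # blank lines, column headers, and separator rows.
        if len(fields) >= 2 and fields[0][0].isdigit():
            routes.add((fields[0], fields[1]))
    return routes


def missing_routes(before: str, after: str) -> list:
    """Return routes present in the 'before' dump but absent from 'after'."""
    return sorted(parse_routes(before) - parse_routes(after))
```

Feeding it the evening and morning dumps would list exactly the entries that were removed, which might show whether only the manually added routes vanish or others as well.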

We have an escalation ticket open with our PBX vendor and they are looking into the network, but they aren't moving that fast. I really need the answer to the 'why' question. Haaalp? Plz?

Nate

ThingsofInternet 12-28-2016 06:30 PM

**Message to the Mods** - I'm concerned that the network architecture is steering away folks who might be able to assist with simple checks of the interface and routing tables. Forgetting the topology, etc., I have a very simple question: assuming connectivity is lost, whether at the interface, the MPLS, or anything else, why is it not restored when the network becomes available again?

ThingsofInternet 01-09-2017 09:39 AM

Just to keep the conversation fresh: I noticed that I didn't include the additional info that many Windows, mail, and web servers never lose connectivity with each other. The only devices that seem to lose connectivity are the VxWorks devices.

We were given a next step of setting every switch port that the VxWorks devices connect to, as well as eth/0 on the VxWorks devices themselves, to 100 Mb/full duplex. So far the status is unchanged.

-Nate

jefro 01-10-2017 04:52 PM

Just a few notes here.

When you add a reply to your own thread instead of editing the original post, the thread drops off the zero-reply list. Consider editing the post instead.

If you feel that the VxWorks devices are going to sleep, then maybe contact the hardware vendor and/or VxWorks for more clues. My only guess is that the device is going into power save and not coming out of it correctly, but you'd have to run some tests on it. Wireshark may help, or maybe log into the switches and see what network tools they have. There might be a way to test with some keep-alive signal?
Not sure if you can get a command line in VxWorks or a console monitor to see its state. That would help.
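
A keep-alive test along those lines could look something like the sketch below, run from a Linux box on the same network. It is only an illustration: it assumes the Linux iputils `ping` (the `-c` and `-W` flags), and the function names and the host list are placeholders, not anything taken from the actual setup. If the overnight drops stop while it runs, that would point toward an idle timeout of some kind.

```python
import subprocess
import time


def ping_once(host, timeout_s=2):
    """Send one ICMP echo with the system ping; True if a reply came back."""
    # -c 1: send a single packet, -W: seconds to wait for the reply
    # (flags as provided by Linux iputils ping; other platforms differ).
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def keepalive_round(hosts, probe=ping_once):
    """Probe every host once and return {host: reachable}."""
    return {host: probe(host) for host in hosts}


def run_keepalive(hosts, interval_s=60, rounds=None,
                  probe=ping_once, sleep=time.sleep):
    """Repeat keepalive_round every interval_s seconds.

    rounds=None means run until interrupted.
    """
    done = 0
    while rounds is None or done < rounds:
        results = keepalive_round(hosts, probe)
        print(time.strftime("%Y-%m-%d %H:%M:%S"), results, flush=True)
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_s)
```

For example, `run_keepalive(["192.0.2.10", "192.0.2.11"], interval_s=60)` would ping two placeholder addresses once a minute; the real switch addresses would go in their place.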

ThingsofInternet 01-12-2017 08:03 AM

Thanks for the response Jefro!

I'm really new to forums in general, so thanks for the note about editing the original post. We have opened multiple tickets with the PBX vendor that sold us the hardware. I believe RMA'ing 40 or so of these devices when we don't know what the actual problem is would be a no-go, so we're kind of forced to work within the box we're given.

I found in another forum that this could possibly be TCP timers. There is a watchdog service that can time out, after which the interface has to be reset to get things working again. I have a send/receive test that I want to run while the link is down, but the timing needs to be right to catch it when it breaks, which can be anywhere from 1 to 4 in the morning. It's possible that I will actually have to babysit it, catch it in the act, and run the test. Any additional thoughts would be really appreciated!
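
One way to avoid babysitting might be a small watcher that records the time whenever a device changes between reachable and unreachable. The sketch below is only that, a sketch: it assumes each device accepts TCP connections on some known port (which port is an assumption to fill in), and the names `tcp_reachable`, `record_transitions`, and `watch` are made up for the example. One caveat: since a ping is enough to wake the link, the probe traffic itself may keep the fault from ever showing up, although that result would be informative too.

```python
import socket
import time


def tcp_reachable(host, port, timeout_s=3.0):
    """True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def record_transitions(state, results, now, log):
    """Append a log line for every target whose status differs from state.

    The first observation of a target always counts as a change, so the
    log starts with each target's initial status.
    """
    for target, up in results.items():
        if state.get(target) != up:
            log.append(f"{now} {target} {'UP' if up else 'DOWN'}")
            state[target] = up
    return log


def watch(targets, interval_s=30, rounds=None,
          probe=tcp_reachable, sleep=time.sleep):
    """Probe (host, port) targets repeatedly and print status changes.

    rounds=None means run until interrupted. Returns the transition log.
    """
    state, log, done = {}, [], 0
    while rounds is None or done < rounds:
        now = time.strftime("%Y-%m-%d %H:%M:%S")
        results = {f"{host}:{port}": probe(host, port) for host, port in targets}
        already_logged = len(log)
        record_transitions(state, results, now, log)
        for line in log[already_logged:]:
            print(line, flush=True)
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_s)
    return log
```

Left running overnight with output redirected to a file, it would give timestamps for each DOWN and UP event, which could then be lined up against whatever the switches or the carrier log around the same time.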


Thanks again for the reply!


All times are GMT -5.