Old 03-10-2011, 04:10 PM   #1
markfox
LQ Newbie
 
Registered: Jul 2010
Posts: 23

Rep: Reputation: 1
Underwhelming Linux routing/VLAN/bonding performance


Situation: Upgraded a school network from FastEthernet to GigabitEthernet. Broke the network up into VLANs. Discovered that the router (a Cisco 2821) could only route between the VLANs at around 400Mb/s. Tested out some layer-three switches; they work very nicely, but they're more than we need. So I started putting together a Linux router from some spare equipment we had.

Result: Underwhelmed. The machine has two Intel GigE interfaces. With the machine configured to route between two test VLANs, I get about 855Mb/s with a single interface (all VLANs trunked over that one interface). That's about what I'd expect, maybe a little low. With the two interfaces bonded, I get about the same.
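For context, the router side of the test is nothing exotic. With the vlan package installed, it amounts to a couple of tagged subinterfaces on the trunk in /etc/network/interfaces plus IP forwarding; roughly like this (the VLAN IDs and addresses here are just placeholders, not our real ones):

auto eth0.10
iface eth0.10 inet static
    address 192.168.10.1
    netmask 255.255.255.0

auto eth0.20
iface eth0.20 inet static
    address 192.168.20.1
    netmask 255.255.255.0

And in /etc/sysctl.conf:

net.ipv4.ip_forward=1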

For testing, I set up eight Windows machines, four on each VLAN. The Linux router is the only machine that can route between the two VLANs. I used Iperf to generate traffic and measure throughput between pairs of machines. Two machines on the same VLAN get about 300Mb/s between themselves. With the machines organized into cross-VLAN pairs, I get about 855Mb/s total throughput on a single interface and very slightly more with two interfaces bonded.
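For reference, the throughput numbers came from plain Iperf runs between hosts on opposite VLANs; something along these lines (the address is a placeholder, and the exact duration and stream count aren't important):

# on the receiving machine
iperf -s

# on the sending machine: 30-second test, four parallel TCP streams
iperf -c 192.168.10.5 -t 30 -P 4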

The Linux router has an Intel Xeon E5506 CPU running at 2.13GHz, and the GigE interfaces are onboard Intel parts. I would expect a large boost from adding the second interface. I've confirmed that bonding is working (by pulling either of the cables and watching everything continue to function).

Any ideas?
 
Old 03-10-2011, 05:50 PM   #2
markfox
LQ Newbie
 
Registered: Jul 2010
Posts: 23

Original Poster
Rep: Reputation: 1
Actually, I'm not sure that bonding is working at all.

When doing my tests, I kept track of the byte counts on both Ethernet interfaces. They both seemed to go up after each round of tests, so I thought it was working. However, I'm now noticing that bonding doesn't work if it is brought up with both interfaces connected. Configure it with only one interface connected, then plug in the other, and all seems well. But bringing it up with both connected is a no-go.
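For anyone following along, the per-slave byte counts are easy to eyeball with something like:

# per-interface RX/TX packet and byte counters
ip -s link show eth0
ip -s link show eth1

# or the raw counters straight from sysfs
cat /sys/class/net/eth0/statistics/tx_bytes
cat /sys/class/net/eth1/statistics/tx_bytes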

This has pulled the rug out from under my confidence in bonding.

Honestly, we've had trouble with bonding ever since upgrading beyond Ubuntu 9.04.

I'm going to keep exploring this.
 
Old 03-11-2011, 10:37 AM   #3
markfox
LQ Newbie
 
Registered: Jul 2010
Posts: 23

Original Poster
Rep: Reputation: 1
Some more details: Bonding is definitely behaving strangely. As mentioned before, if a single interface is physically plugged in when the bonding interface is brought up (ie. sudo /etc/init.d/networking start), it works. Plugging in the second interface at that point works as well. However, bringing up the bonding interface with both interfaces plugged in results in no network connectivity. Bringing the interface up by hand doesn't work either. Looking at /proc/net/bonding/bond0, everything appears fine with both links up, but there is no IP connectivity. Disconnecting one of the interfaces restores connectivity, and it stays up after the cable is plugged back in.

This is on Ubuntu 10.10. The Ethernet driver is e1000e and the switch is a D-Link DGS-1248T. I'm using balance-xor mode with the default layer 2 hash policy.
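For the record, the active mode and hash policy can be read back from /proc and sysfs, so it's easy to confirm what the driver actually loaded with:

cat /proc/net/bonding/bond0                        # mode, MII status, per-slave state
cat /sys/class/net/bond0/bonding/mode              # e.g. "balance-xor 2"
cat /sys/class/net/bond0/bonding/xmit_hash_policy  # e.g. "layer2 0"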

This worked fine under Ubuntu 9.04 for at least the last year. I'm baffled as to what has changed.

As luck would have it, I'm installing Debian on an identical system and will see what bonding does there.
 
Old 03-11-2011, 05:45 PM   #4
markfox
LQ Newbie
 
Registered: Jul 2010
Posts: 23

Original Poster
Rep: Reputation: 1
Same problem with Debian.

I'm still a little baffled as to what has changed. The switches we use (D-Link DGS-1248T) support what they call "trunking", but it isn't 802.3ad compatible; the link groups are configured as "static" instead. In any case, this worked fine for the last year.

I know for sure that this worked with the e1000 driver for Intel NICs. I'm reasonably sure that it worked with the e1000e driver as well, but under Ubuntu 9.04.

My thinking is that something changed in the drivers, either for the NICs or bonding, and that has caused a problem with the DGS-1248T's way of "trunking" links together. I'm going to grab a newer switch that supports 802.3ad next week and see if I can get that working.
 
Old 03-11-2011, 06:38 PM   #5
markfox
LQ Newbie
 
Registered: Jul 2010
Posts: 23

Original Poster
Rep: Reputation: 1
Did some more looking into this. Found one of our servers that has been chugging along with bonding working just fine. It is using the same driver (e1000e), kernel version 2.6.31-22, ifenslave version 1.1.0, and is running Ubuntu Server 9.10 (not 9.04 as I had said earlier).

In contrast, the servers that do not work are running kernel version 2.6.32-28, the same version of ifenslave (according to ifenslave --version), and Ubuntu Server 10.04.2 LTS.
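For anyone who wants to compare against their own systems, the relevant versions are easy to pull with something like:

uname -r                          # kernel version
modinfo e1000e | grep '^version'  # NIC driver version
ifenslave --version
head -1 /proc/net/bonding/bond0   # bonding driver version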

When I get a chance, perhaps this weekend, but more likely late next week, I'll try installing Ubuntu 9.04 on one of the server boxes that is giving me guff, just so I can rule out the hardware.
 
Old 03-11-2011, 06:39 PM   #6
agentbuzz
Member
 
Registered: Oct 2010
Location: Texas
Distribution: Debian, Ubuntu, CentOS, RHEL
Posts: 131

Rep: Reputation: 25
Cisco switches that do LACP

Do you have any old Catalyst switches lying around, even with two ports that work? I know 2950s and 2960s talk LACP.
 
Old 03-14-2011, 10:54 AM   #7
markfox
LQ Newbie
 
Registered: Jul 2010
Posts: 23

Original Poster
Rep: Reputation: 1
Quote:
Originally Posted by agentbuzz
Do you have any old Catalyst switches lying around, even with two ports that work? I know 2950s and 2960s talk LACP.
Thanks for the suggestion. Unfortunately we don't. But the switch I'm grabbing does do LACP. The thing that surprises me is that everything worked and now doesn't. I'll have some more time to focus on this on Wednesday.
 
Old 03-19-2011, 09:15 AM   #8
markfox
LQ Newbie
 
Registered: Jul 2010
Posts: 23

Original Poster
Rep: Reputation: 1
Spent some more time on this late in the week. No solution, but I was able to eliminate the switch as the culprit.

I used a slightly better switch (D-Link DGS-1210-48) that supports LACP, but I didn't have to mess with LACP. All I did was install Ubuntu 9.10 on a machine, configure bonding, and hook it up to the same ports that the 10.10 machine was using. It worked exactly as it should. So it's not the switch.

It could be the hardware on the 10.10 machine, so I will install 9.10 on it, just to rule that out.

I'm also working with a kind developer who works on Linux bonding-related stuff.
 
Old 03-21-2011, 10:39 AM   #9
markfox
LQ Newbie
 
Registered: Jul 2010
Posts: 23

Original Poster
Rep: Reputation: 1
Installed Ubuntu 9.10 on the same hardware that was failing under Ubuntu 10.10 and configured bonding. It works great. So the problem must be a change in software. My guess is that it is the bonding driver or the e1000e driver for the Intel network interfaces.

If anybody has bonding working under Ubuntu 10.10 or Debian Squeeze, I'd love to hear about it.
 
Old 03-22-2011, 06:05 PM   #10
markfox
LQ Newbie
 
Registered: Jul 2010
Posts: 23

Original Poster
Rep: Reputation: 1
After much fiddling, I may have got bonding working on Ubuntu 10.10 and Debian Squeeze machines.

I was able to get bonding working manually under Ubuntu 10.10 on a little PC Engines box (which has VIA Ethernet hardware, as opposed to our servers' Intel interfaces). This made me suspect problems in how I had set up /etc/network/interfaces. So I stripped away anything that wasn't strictly necessary and tried again, and it started to work. I'm still a little cautious, as it's only been working for an hour, but it does seem to survive reboots, having networking brought down (ie. sudo service networking stop) and back up, and having cables yanked. So far so good.

A little more twiddling to get layer3+4 transmit hashing working, followed by some performance testing, and this may be a solved problem.

For the curious, here are the relevant parts of my /etc/network/interfaces:

auto bond0
iface bond0 inet static
    address 192.168.48.4
    netmask 255.255.255.128
    gateway 192.168.48.1
    bond-slaves none
    bond-mode balance-xor
    bond-miimon 200

auto eth0
iface eth0 inet manual
    bond-master bond0

auto eth1
iface eth1 inet manual
    bond-master bond0


I also set /etc/modprobe.d/bonding.conf to the following (but it seems to ignore xmit_hash_policy):

alias bond0 bonding
options mode=balance-xor miimon=200 xmit_hash_policy=1
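Caveat: the usual modprobe.d syntax names the module on the options line (e.g. "options bonding ..."). If the line above is being skipped for that reason, it would explain the ignored hash policy, while mode and miimon still take effect because they are also set in /etc/network/interfaces. The conventional form, assuming layer3+4 is the intended policy, would be something like:

alias bond0 bonding
options bonding mode=balance-xor miimon=200 xmit_hash_policy=layer3+4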

Last edited by markfox; 03-22-2011 at 06:07 PM.
 
Old 02-11-2015, 01:45 AM   #11
slugman
Member
 
Registered: Jun 2010
Location: AZ
Distribution: Slackware
Posts: 106

Rep: Reputation: 1
markfox

Mark,

I am doing some similar testing in my environment and trying to get bonding to work. I too have a D-Link DGS-1248T. I am trying to get mode 4 (802.3ad link aggregation) to work, but am having limited success.

In your final working example, was this with the original D-Link DGS-1248T switch, or with the new D-Link DGS-1210-48?

Also, in your initial configuration, were you strictly using mode 2 (balance-xor) in the bonding driver? Did you ever experiment with using mode 4?

Also, I am curious: what exactly was your initial working configuration when you had bonding working? From what I gathered:
- Ubuntu 9.04
- Intel e1000e interfaces
- D-Link DGS 1248T Switch

I have two sets of systems that I am currently trying to test bonding between, although I am using Slackware 14.1, which still uses the deprecated ifenslave (the Linux bonding driver developers now recommend the sysfs interface). Most of the interfaces are Intel e1000e. I've thought about creating a new post, but I'd like to hear your thoughts before proceeding.
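For what it's worth, the sysfs route from the bonding docs looks roughly like this (interface names, address, and mode here are placeholders for my setup, and the slaves have to be down when they are added):

# load the driver without creating a default bond, then create one by hand
modprobe bonding max_bonds=0
echo +bond0 > /sys/class/net/bonding_masters

# mode and hash policy are set while the bond is still down
echo 802.3ad > /sys/class/net/bond0/bonding/mode
echo layer3+4 > /sys/class/net/bond0/bonding/xmit_hash_policy
echo 100 > /sys/class/net/bond0/bonding/miimon

# bring the bond up, then add the (downed) slaves
ip addr add 192.168.1.10/24 dev bond0
ip link set bond0 up
ip link set eth0 down
ip link set eth1 down
echo +eth0 > /sys/class/net/bond0/bonding/slaves
echo +eth1 > /sys/class/net/bond0/bonding/slaves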

Diego

Last edited by slugman; 02-11-2015 at 01:46 AM.
 
Old 02-11-2015, 04:13 AM   #12
slugman
Member
 
Registered: Jun 2010
Location: AZ
Distribution: Slackware
Posts: 106

Rep: Reputation: 1
Well, I just tried this approach using a mode other than 802.3ad (balance-rr, aka mode=0), which seems to be working in my case. Everything seems fine... except for some weird behavior: I can't seem to ssh between the two servers now. I may need to make this into another post.

EDIT: I resolved the ssh issue. I had hardcoded the HW MAC address to the same value in my init script (00:16:3c:aa:aa:aa), so I simply changed it to 00:16:3c:bb:aa:aa on my second system and that resolved the issue nicely.
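For anyone who hits the same thing, the runtime equivalent is just a one-liner (the bond may need to be down first, depending on the driver):

ip link set dev bond0 address 00:16:3c:bb:aa:aa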

Last edited by slugman; 02-11-2015 at 04:26 AM.
 
Old 02-12-2015, 11:17 PM   #13
markfox
LQ Newbie
 
Registered: Jul 2010
Posts: 23

Original Poster
Rep: Reputation: 1
Hi Slugman,

Those were my early, really my first, struggles with bonding. Shortly thereafter, I got it working with layer 3+4 hashing and 802.3ad (mode 4). It worked flawlessly on both the DGS-1248T and DGS-1210-48 switches.
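Roughly, that amounts to the /etc/network/interfaces from my earlier post with the mode changed and a hash-policy line added; something like this (using the ifenslave bond-* option names, so treat it as a sketch rather than a verbatim copy of our config):

auto bond0
iface bond0 inet static
    address 192.168.48.4
    netmask 255.255.255.128
    gateway 192.168.48.1
    bond-slaves none
    bond-mode 802.3ad
    bond-miimon 200
    bond-xmit-hash-policy layer3+4

auto eth0
iface eth0 inet manual
    bond-master bond0

auto eth1
iface eth1 inet manual
    bond-master bond0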

Yes, originally I was using Ubuntu 9.10. I have since used 10.04 and 12.04. There was a problem with 10.04 and 12.04 related to Upstart and having /usr on its own partition: the bonded interface had to be brought up late in the boot process (ie. from rc.local) as a workaround, and the problem increased the boot/reboot time substantially. I'm not sure whether Ubuntu 14.04 resolved that problem. Except for the one PC Engines ALIX machine used for testing, the interfaces on the production servers were all Intel. I also played with bonding under Arch Linux. The machines we were using were Supermicro rack-mounts that we were quite happy with and that were certainly a good value.

I ended up being disappointed with the performance of bonded interfaces on the D-Link switches. We only saw a performance increase by a factor of 1.6 (whether using 2, 3, or 4 links in the group), and that was when we had many machines hammering the server at once. (Our primary goal was to increase the speed of imaging a facility's computers.) With just two machines communicating over a bonded link, performance was the same as with a single Ethernet link, and worse unless using 802.3ad or layer 3+4 hashing; with layer 2 hashing everything between one pair of hosts lands on a single slave, and even with layer 3+4 hashing a single TCP stream still only uses one link. I was able to get some low-end and mid-range HP switches for testing, hoping that they would perform better, but was unable to find time to performance-test bonding in the 30-day window I had. Several people I talked to said that 802.3ad on decent HP switches got them very close to optimal performance, but I was never able to confirm that. On the other hand, the redundancy was great. It was wonderful to be able to move cables around on switches without knocking the servers off-line. We also had a port on a DGS-1248T fail and didn't notice a problem for quite some time (months). So from a reliability standpoint, bonding was great.

We also ran bonding on the same hardware under pfSense (which turns an ordinary PC into a fairly full-featured router, and is quite nice). In pfSense, bonding is called LAGG. That worked well, but had the same performance as Linux on the D-Link switches (ie. 1.6X).

We used bonding between multiple D-Link switches as well, thinking that a 4X group would give us 4X the performance between switches. Unfortunately, we saw the same 1.6X bottleneck.

I would be very interested if anyone found a bonding configuration for Linux, using a low-end GigE switch, that produced a bandwidth increase close to proportional to the number of links. I would be even more impressed if such a feat could be achieved with only two machines communicating (either via a switch or directly). We were exploring 10GigE and Infiniband, both of which are quite expensive compared to GigE. In our case, time and resources were very tight, so we never got anywhere with the project.


 
Old 02-15-2015, 11:35 AM   #14
slugman
Member
 
Registered: Jun 2010
Location: AZ
Distribution: Slackware
Posts: 106

Rep: Reputation: 1
Well, your skill in the ways of the Force is superior to mine.

I'm not exactly sure what to think. I had just finished rationalizing that the DGS-1248T didn't work with mode 4 (802.3ad) because it simply wasn't designed to support it. Then I discovered in the bonding driver docs that "trunking" is a manufacturer's homebrew analog to Cisco's EtherChannel, which is their proprietary implementation of link aggregation. The only difference between the two is that EtherChannel supports ISL and VTP, which is of course only relevant on Cisco networks, but that's it.

However, the pudding speaks for itself. If you got it to work, then I'm back to the drawing board, except this time I'll know it should be able to function.

You know, I'm curious: what method did you use to measure your benchmarks? I'd like to run my tests similarly.

Also, FYI: when I did enable mode 0 (balance-rr), I saw the throughput nearly double! A single 1G link transferred at 50MB/s, whereas the bonded link (dual 1G slaves) managed 94MB/s.

Also, just a thought: if your goal is to speed up image transfers, bonded interfaces are only part of the equation. If your imaging solution/software/script writes the image to disk as it transfers, you are chasing smoke, because the mechanical drives will always bottleneck the overall operation. Especially if you aggregate 2+ links, you end up with a queue of write operations that cannot keep up with the rate of data arriving over the bonded interface. In other words, the speed of the imaging process will not increase in proportion to the links you add to the network, precisely because of the time spent waiting to write the data to disk. Again, this depends on exactly how the imaging solution you are employing works. I could see you employing a script that uses netcat, NFS, and a ramdisk mounted via the tmpfs filesystem to alleviate this bottleneck.

That is why in my testing I wrote data to a ramdisk mounted via the tmpfs filesystem, to ensure my transfer rates would not be limited by my storage. An SSD would certainly help alleviate the storage bottleneck, although it doesn't come close to the speed of your RAM.
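A rough sketch of the kind of test I mean (paths, sizes, and the exact netcat flags are placeholders and vary by netcat variant):

# both ends: a RAM-backed scratch area keeps the disks out of the measurement
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=2G tmpfs /mnt/ramdisk

# receiver: catch the stream straight into RAM
nc -l -p 5001 > /mnt/ramdisk/received.img

# sender: build a 1GB test file in RAM and time pushing it across the bonded link
dd if=/dev/zero of=/mnt/ramdisk/test.img bs=1M count=1024
time nc 192.168.1.10 5001 < /mnt/ramdisk/test.img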

You can read more about it in my blog if you're interested; check out my latest post:

http://slugman01.blogspot.com

Last edited by slugman; 02-15-2015 at 07:20 PM.
 
Old 02-15-2015, 11:39 AM   #15
slugman
Member
 
Registered: Jun 2010
Location: AZ
Distribution: Slackware
Posts: 106

Rep: Reputation: 1
Also, a thought just occurred to me: can you tell me what hardware revision your DGS-1248T was? Mine is hardware revision A, but there was a newer revision B with newer firmware.

Then again, I know this was years ago... and I just realized you have most likely moved on from that assignment.
 
  

