LinuxQuestions.org > Forums > Linux Forums > Linux - Networking

09-03-2009, 12:10 AM   #1
jjinno
FC7: eth2 & eth3 come up as __tmp# 60% of the time


I have been troubleshooting this problem for hours now, and the best I can come up with is a workaround.

I am trying to understand why three out of five reboots result in eth2 and eth3 being named something like "__tmp509338517" (which makes them unusable by my software).

To give you an example, and an idea of the networking hardware involved:

From /var/log/messages (on a GOOD run)...
Code:
Sep  3 10:58:56 localhost kernel: Uniform CD-ROM driver Revision: 3.20
Sep  3 10:58:56 localhost kernel: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.8.5b (Feb 9, 2009)
Sep  3 10:58:56 localhost kernel: GSI 25 sharing vector 0x52 and IRQ 25
Sep  3 10:58:56 localhost kernel: ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 36 (level, low) -> IRQ 82
Sep  3 10:58:56 localhost kernel: eth0: Broadcom NetXtreme II BCM5716 1000Base-T (C0) PCI Express found at mem da000000, IRQ 82, node addr 00:24:e8:5e:d6:60
Sep  3 10:58:56 localhost kernel: GSI 26 sharing vector 0x5A and IRQ 26
Sep  3 10:58:56 localhost kernel: ACPI: PCI Interrupt 0000:01:00.1[B] -> GSI 48 (level, low) -> IRQ 90
Sep  3 10:58:56 localhost kernel: eth1: Broadcom NetXtreme II BCM5716 1000Base-T (C0) PCI Express found at mem dc000000, IRQ 90, node addr 00:24:e8:5e:d6:61
Sep  3 10:58:56 localhost kernel: e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k4
Sep  3 10:58:56 localhost kernel: e1000e: Copyright (c) 1999-2008 Intel Corporation.
Sep  3 10:58:56 localhost kernel: GSI 27 sharing vector 0x62 and IRQ 27
Sep  3 10:58:56 localhost kernel: ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 38 (level, low) -> IRQ 98
Sep  3 10:58:56 localhost kernel: eth2: (PCI Express:2.5GB/s:Width x4) 00:12:32:00:38:fa
Sep  3 10:58:56 localhost kernel: eth2: Intel(R) PRO/1000 Network Connection
Sep  3 10:58:56 localhost kernel: eth2: MAC: 0, PHY: 1, PBA No: 000000-000
Sep  3 10:58:56 localhost kernel: GSI 28 sharing vector 0x72 and IRQ 28
Sep  3 10:58:56 localhost kernel: ACPI: PCI Interrupt 0000:03:00.1[B] -> GSI 45 (level, low) -> IRQ 114
Sep  3 10:58:56 localhost kernel: eth3: (PCI Express:2.5GB/s:Width x4) 00:12:32:00:38:fb
Sep  3 10:58:56 localhost kernel: eth3: Intel(R) PRO/1000 Network Connection
Sep  3 10:58:56 localhost kernel: eth3: MAC: 0, PHY: 1, PBA No: 000000-000
Sep  3 10:58:56 localhost kernel: loop: loaded (max 8 devices)
Sep  3 10:58:56 localhost kernel: floppy0: no floppy controllers found
... and from a BAD boot ...
Code:
Sep  3 11:05:11 localhost kernel: Uniform CD-ROM driver Revision: 3.20
Sep  3 11:05:11 localhost kernel: e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k4
Sep  3 11:05:11 localhost kernel: e1000e: Copyright (c) 1999-2008 Intel Corporation.
Sep  3 11:05:11 localhost kernel: GSI 25 sharing vector 0x52 and IRQ 25
Sep  3 11:05:11 localhost kernel: ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 38 (level, low) -> IRQ 82
Sep  3 11:05:11 localhost kernel: eth0: (PCI Express:2.5GB/s:Width x4) 00:12:32:00:38:fa
Sep  3 11:05:11 localhost kernel: eth0: Intel(R) PRO/1000 Network Connection
Sep  3 11:05:11 localhost kernel: eth0: MAC: 0, PHY: 1, PBA No: 000000-000
Sep  3 11:05:11 localhost kernel: GSI 26 sharing vector 0x62 and IRQ 26
Sep  3 11:05:11 localhost kernel: ACPI: PCI Interrupt 0000:03:00.1[B] -> GSI 45 (level, low) -> IRQ 98
Sep  3 11:05:11 localhost kernel: eth1: (PCI Express:2.5GB/s:Width x4) 00:12:32:00:38:fb
Sep  3 11:05:11 localhost kernel: eth1: Intel(R) PRO/1000 Network Connection
Sep  3 11:05:11 localhost kernel: eth1: MAC: 0, PHY: 1, PBA No: 000000-000
Sep  3 11:05:11 localhost kernel: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.8.5b (Feb 9, 2009)
Sep  3 11:05:11 localhost kernel: GSI 27 sharing vector 0x72 and IRQ 27
Sep  3 11:05:11 localhost kernel: ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 36 (level, low) -> IRQ 114
Sep  3 11:05:11 localhost kernel: eth2: Broadcom NetXtreme II BCM5716 1000Base-T (C0) PCI Express found at mem da000000, IRQ 114, node addr 00:24:e8:5e:d6:60
Sep  3 11:05:11 localhost kernel: GSI 28 sharing vector 0x7A and IRQ 28
Sep  3 11:05:11 localhost kernel: ACPI: PCI Interrupt 0000:01:00.1[B] -> GSI 48 (level, low) -> IRQ 122
Sep  3 11:05:11 localhost kernel: eth3: Broadcom NetXtreme II BCM5716 1000Base-T (C0) PCI Express found at mem dc000000, IRQ 122, node addr 00:24:e8:5e:d6:61
Sep  3 11:05:11 localhost kernel: loop: loaded (max 8 devices)
Sep  3 11:05:11 localhost kernel: floppy0: no floppy controllers found
... and *even though* the BAD-run log clearly shows eth2 and eth3 being found, ifconfig reports ...
Code:
__tmp509338517 Link encap:Ethernet  HWaddr 00:12:32:00:38:FB
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:df3c0000-df3e0000

__tmp629584774 Link encap:Ethernet  HWaddr 00:12:32:00:38:FA
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:df380000-df3a0000

eth0      Link encap:Ethernet  HWaddr 00:24:E8:5E:D6:60
          inet addr:10.4.140.204  Bcast:10.4.140.255  Mask:255.255.255.0
          inet6 addr: fe80::224:e8ff:fe5e:d660/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11813 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1874 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3216790 (3.0 MiB)  TX bytes:444801 (434.3 KiB)
          Interrupt:114 Memory:da000000-da012800

eth1      Link encap:Ethernet  HWaddr 00:24:E8:5E:D6:61
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:122 Memory:dc000000-dc012800
I have come across various workarounds, mostly involving renaming the interface or statically defining the MAC address in the ifcfg-eth? file. Unfortunately, I'm a bit more curious than most, and I want to identify the actual problem.

From what I can tell, the only place an interface can get a name like "__tmp509338517" is the "rename_device" code (this assumption could be my first problem). The trouble is that I cannot find where, after modprobe (in rc.sysinit) and before ifup, this change can occur.

To make matters worse, I have not yet been able to follow the code path that makes the "put the MAC address in ifcfg-eth?" workaround work at all. It suggests either that every interface starts out with a __tmp-like name and is then successfully renamed to eth0/eth1, OR that a good eth2/eth3 gets renamed to __tmp *because* its ifcfg file did not statically define the MAC address...

Appreciate the help,
- J
 
09-03-2009, 05:19 PM   #2
jjinno
I have (through use of a superfluous number of echoes) determined that prior to calling "/sbin/start_udev" the only interface present is "lo". This is actually completely expected...

Narrowing my focus, I have tracked the interface recognition to the following (part of /sbin/start_udev):

Code:
        rm -f /dev/MODPROBE
        /sbin/udevcontrol env STARTUP=1
        /sbin/udevtrigger
        ret=$[$ret + $?]
        wait_for_queue $(getval udevtimeout $cmdline)
        ret=$[$ret + $?]
        test -e "$MCOLLECT" && /sbin/udevcontrol env MODPROBE_COLLECT=
        unset MCOLLECT
The wait_for_queue() function allows, from what I can tell, the devices to run through the udev rules in whatever order they are queued.

Code:
wait_for_queue() {
        local timeout=${1:-0}
        local ret=0
        if [ $timeout -gt 0 ]; then
            /sbin/udevsettle --timeout=$timeout
        else
            /sbin/udevsettle
        fi
        ret=$?
        if [ $ret -ne 0 ]; then
                echo -n "Wait timeout. Will continue in the background."
        fi
        return $ret;
}
Unfortunately, that leaves the little mystery of udevsettle, and why my Ethernet interfaces do not end up with the same names every time. I do notice, though (referring back to my original post), that the "/etc/udev/rules.d/60-net.rules" file specifically references "/lib/udev/rename_device"... although I am not (yet) an expert on udev rules...
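
One way to see which driver ended up behind which name once udevsettle returns is to read sysfs directly. The function below is a minimal sketch and not part of start_udev: it assumes the usual /sys/class/net/<name>/address and /sys/class/net/<name>/device/driver layout, and the name list_ifaces and its optional directory argument are hypothetical (the argument only exists so it can be pointed at a fake tree for testing).

```shell
#!/bin/bash
# Hypothetical diagnostic helper: print "name mac driver" for every network
# interface under a sysfs-style directory (default /sys/class/net) and flag
# any interface still carrying a __tmp* name.
list_ifaces() {
    local root=${1:-/sys/class/net}
    local dev name mac driver
    for dev in "$root"/*; do
        [ -e "$dev/address" ] || continue      # skip non-interface entries
        name=${dev##*/}
        mac=$(cat "$dev/address")
        driver=-                               # e.g. "lo" has no device/driver link
        if [ -L "$dev/device/driver" ]; then
            driver=$(basename "$(readlink "$dev/device/driver")")
        fi
        case $name in
            __tmp*) echo "$name $mac $driver UNNAMED" ;;
            *)      echo "$name $mac $driver" ;;
        esac
    done
}
```

On a bad boot, running `list_ifaces` right after start_udev would be expected to show the two Intel MACs (00:12:32:00:38:fa and 00:12:32:00:38:fb) under __tmp names with the e1000e driver, matching the ifconfig output in the first post.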
 
09-03-2009, 05:46 PM   #3
lazlow
Support was dropped for F7 quite a while ago (June 2008). You might be better off using a supported version (F10 and F11 right now). The versions of the software you are looking at MAY be better able to handle your issues.
 
09-03-2009, 07:36 PM   #4
John VV
Device management has IMPROVED A LOT since Fedora 7.
Fedora 7 has had NO updates in a long time.

Please install a supported OS like Fedora 11 (Fedora 10 will hit End of Life in a few months).
 
09-03-2009, 07:56 PM   #5
jjinno
Thanks for the input, but if I had the option to just upgrade, I would have skipped asking the question... but anyway.

In any case, I think I have determined the problem and, best of all, the reason WHY the solution (putting HWADDR lines in ifcfg files) is appropriate.

First a good run:
1 – The two interface cards are seen (their drivers load) in the order Broadcom, then Intel
2 – udev detects them, loads the drivers, and facilitates the creation of eth0/eth1 for Broadcom (because it got there first) and eth2/eth3 for Intel (because it got there second)
3 – because no eth2/eth3 existed when the Intel ports were detected, the names stay intact and everybody is happy

Now the bad run:
1 – The two interface cards are seen (their drivers load) in the order Intel, then Broadcom
2 – udev detects them, loads the drivers, and facilitates the creation of eth0/eth1 for Intel (because it got there first) and eth2/eth3 for Broadcom (because it got there second)
3 – for the Intel interfaces, the application “rename_device” cannot match the sysfs hardware address of “eth0” to the “eth0” MAC address listed in the ifcfg-eth0 file (note that the MAC address in that file actually belongs to the Broadcom device)
4 – treating the ifcfg-eth* files as law (which it does), “rename_device” promptly renames the conflicting device to “__tmp*”, allowing the first Broadcom interface to take its place as eth0
5 – repeat 3 & 4 for eth1: Intel eth1 becomes “__tmp*” and Broadcom takes its place as eth1
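
The bad-run sequence above can be replayed with a toy model. This is a hypothetical sketch, not the FC7 rename_device source: it only assumes the mechanism described in the steps (look up the port's MAC in the ifcfg-* HWADDR lines, move the port to the claimed name, and park whatever currently holds that name as __tmp<random>). The function names name_for_mac and simulate_boot are made up, the claimed name is taken from the ifcfg file name rather than a DEVICE= line, and bash 4 is needed for the associative array.

```shell
#!/bin/bash
# Hypothetical model only; NOT the real rename_device code.

# Print the interface name whose ifcfg-* file has HWADDR equal to $2
# (case-insensitive). Returns 1 if no file claims that MAC.
name_for_mac() {
    local dir=$1 mac f hw
    mac=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    for f in "$dir"/ifcfg-*; do
        [ -f "$f" ] || continue
        # First HWADDR= line, quotes stripped, lower-cased for comparison.
        hw=$(grep -i '^HWADDR=' "$f" | head -n 1 | cut -d= -f2 | tr -d '"' | tr 'A-F' 'a-f')
        if [ -n "$hw" ] && [ "$hw" = "$mac" ]; then
            echo "${f##*/ifcfg-}"   # simplification: name comes from the file name
            return 0
        fi
    done
    return 1
}

# simulate_boot IFCFG_DIR MAC...   (MACs in driver load order)
# Prints the final "name mac" pairs, sorted.
simulate_boot() {
    local dir=$1
    shift
    declare -A mac_of=()            # current interface name -> MAC
    local mac n=0 cur want
    # Phase 1: every port registers; the kernel hands out eth0, eth1, ...
    for mac in "$@"; do
        mac_of[eth$n]=$mac
        n=$((n + 1))
    done
    # Phase 2: one rename event per port, in the same order.
    for mac in "$@"; do
        for cur in "${!mac_of[@]}"; do            # find the port's current name
            [ "${mac_of[$cur]}" = "$mac" ] && break
        done
        want=$(name_for_mac "$dir" "$mac") || continue   # no HWADDR claim: keep name
        [ "$want" = "$cur" ] && continue
        if [ -n "${mac_of[$want]:-}" ]; then
            # Target name is taken: park its holder under a random __tmp name.
            # If the holder's own event already ran, it stays parked.
            mac_of[__tmp$RANDOM$RANDOM]=${mac_of[$want]}
        fi
        unset "mac_of[$cur]"
        mac_of[$want]=$mac
    done
    for cur in "${!mac_of[@]}"; do
        echo "$cur ${mac_of[$cur]}"
    done | sort
}
```

Fed the four MACs in the bad-run order (Intel first) with HWADDR lines only in ifcfg-eth0/eth1, the model ends with Broadcom on eth0/eth1 and both Intel ports stuck under __tmp names. With HWADDR lines in all four files, the same order ends with eth0/eth1 on Broadcom and eth2/eth3 on Intel.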

The issue (as far as I can see) is simply a race condition in the order the two cards are detected. Also, the “rename_device” code is far from robust (in FC7 at least) and does not attempt to give the “__tmp*” interfaces better names. From this point of view, putting the MAC addresses in the ifcfg-eth* files is actually the correct approach, because it simply says “regardless of the order in which your device arrives, you should be named X if you have MAC address Y”…
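
Adding the HWADDR lines by hand is easy, but the step can also be scripted. A minimal sketch, assuming it is run after a *good* boot (so the current names are the ones you want to keep) and assuming the standard Fedora /etc/sysconfig/network-scripts location; pin_hwaddr is a hypothetical helper, and it deliberately leaves any existing HWADDR line alone.

```shell
#!/bin/bash
# Hypothetical helper: for each ethN under a sysfs-style directory, append
# HWADDR=<current MAC> to the matching ifcfg-ethN file if it has none yet.
# Only meaningful when the interfaces currently have the correct names.
pin_hwaddr() {
    local sys=${1:-/sys/class/net} dir=${2:-/etc/sysconfig/network-scripts}
    local dev name cfg mac
    for dev in "$sys"/eth*; do
        [ -e "$dev/address" ] || continue
        name=${dev##*/}
        cfg=$dir/ifcfg-$name
        [ -f "$cfg" ] || continue                  # only touch existing config files
        grep -qi '^HWADDR=' "$cfg" && continue     # never overwrite an existing binding
        mac=$(tr 'a-f' 'A-F' < "$dev/address")     # upper case, as ifconfig prints it
        # Make sure the file ends with a newline before appending.
        [ -n "$(tail -c 1 "$cfg")" ] && echo >> "$cfg"
        echo "HWADDR=$mac" >> "$cfg"
        echo "$cfg: HWADDR=$mac"
    done
}
```

Running `pin_hwaddr` with no arguments on the real system would print one line per file it changed; reviewing the diff of the ifcfg files before the next reboot is still a good idea.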
 
09-03-2009, 08:31 PM   #6
lazlow
Why is upgrading out?
 
09-03-2009, 09:40 PM   #7
John VV
Consider the sheer number of changes from Fedora 7 to Fedora 11.

Even the Fedora devs state that a FRESH install is the BEST way.
Running preupgrade from yum to upgrade Fedora 10 to Fedora 11 does not always work,
so going from 7 to 11 will not work.

If you want to spend a few days straightening out a broken system, then go ahead and try. But don't expect it to run or boot.
 
09-04-2009, 02:16 AM   #8
jjinno
The upgrade is out... either because of a really long explanation or a really short one... let's try the short one.

This is part of a product that my company ships. It has a much newer kernel and mostly updated networking components like DHCP, but almost all the other tidbits are FC7 stock. A Dell hardware change alerted us to this quirk, and Dell has hard EOLs that do not necessarily follow other companies' software release schedules... Hey, c'est la vie!
 
09-04-2009, 07:40 AM   #9
lazlow
If you are running a current kernel (and networking stuff) on F7, you are not running F7. There is no way to predict what kind of strange interactions you have going on.

Any idea why they did not choose a long-term support distro (CentOS/RHEL come to mind) to base the product on? It would have greatly simplified your life.
 
09-04-2009, 12:12 PM   #10
jjinno
Without knowing the details, I can only speculate. I believe the newer products all standardize around RHEL, but this whole problem only surfaced when trying to support old code on new boxes anyway...
 
09-04-2009, 12:22 PM   #11
lazlow
Ok, when you say new boxes, do you mean hardware that came out after F7 or do you mean doing a fresh install on a different box?
 
09-04-2009, 03:21 PM   #12
John VV
I suspect that it is new hardware that came out after Fedora 7.

Your only answer is this:

if you stay with Fedora 7, then
do a full rewrite of the Fedora 7 code base to make it compatible with the new hardware.

This is the MAIN problem of the VERY POOR business idea of using Fedora in the first place WITHOUT upgrading it every 6 months to the new version and installing the hundreds of "updates" that every Fedora release has.

12-06-2009, 09:35 PM   #13
glitch1369
Thank You

Quote:
Originally Posted by jjinno
Now the bad run:
1 - The PCI bus sees the two interface cards in the order Intel, and then Broadcom
2 – udev detects, loads drivers, & facilitates the creation of eth0/eth1 for Intel (because it got there first) and eth2/eth3 for Broadcom (because it got there second)
3 – upon creation of the Intel interfaces, the application “rename_device” is unable to match a SysFS Hardware Address on “eth0” to the “eth0” MAC address listed in the ifcfg-eth0 file (Note that the MAC address in that file is actually for the Broadcom device)
3 – assuming that the ifcfg-eth* files are Law (which it does) the “rename_device” application promptly renames the conflicting device to “__tmp*” allowing the first Broadcom interface to take up position as eth0
4 – repeat 2 & 3 for eth1, changing Intel eth1 to “__tmp*” and allowing Broadcom to take its place as eth1

The issue (as far as I can see) is simply a hardware race condition. Also, the “rename_device” code is far from robust (in FC7 at least), and does not attempt to make better names for the “__tmp*” interfaces. From this point of view, the solution of putting the MAC addresses in the ifcfg-eth* files is actually the correct approach because it simply says “regardless of what order your device arrives in on the PCI bus, you should be named X if you have MAC address Y”…
I just wanted to thank you for your troubleshooting and follow-through. I think most folks don't go back to a place where they posted a question and document the solution they found. After reading your solution, I was able to track the problem down on my system here. Before reading your post, the only information I'd found related to wireless cards and reloading modules, something we don't have in our datacenter (wireless, that is).

With regard to those who promote the upgrade route, well, that's easy to do in a home or smaller environment, but when you're running a Production Spec machine, any upgrade involves QA and Development testing and then the upgrading itself. In your environment, you have customer machines to take into account as well.

In our environment, we're roughly 50/50 Solaris/Linux. Our Linux machines numbered 574 at last count, and there are probably a couple hundred machines in one of our new datacenters that aren't in production yet, so they aren't in that count. We don't have the manpower to devote to upgrading/testing the Dev/QA machines (194 of 'em), much less the existing Production servers and then many, many new builds. I'm only counting machines that my own department manages... we've got many systems in datacenters around the world. Just keeping the systems up to date with the latest versions of *everything* could be a full-time job for a not-too-small department, with on-hand employees around the world.

In a perfect world, we'd all be able to keep our firmware (something most don't think about until there's a problem) and software up to date, but when the manpower doesn't exist for a project of that scale, it's hard to convince Management to spend the funds until there's a major issue which *requires* an upgrade. Patching, yes. OS reloads, not as easy to do.

</SOAPBOX>

Thank you for your follow-through and thank you for the information.
 
  


All times are GMT -5.
