LinuxQuestions.org
AIX This forum is for the discussion of IBM AIX.
eserver and other IBM related questions are also on topic.

Old 11-14-2014, 11:59 AM   #1
Mark_667
Member
 
Registered: Aug 2005
Location: Manchester, England
Distribution: Ubuntu 20.04
Posts: 383

Rep: Reputation: 30
Hang at Reference Code 0518


I have a problem with the AIX61 VM: it became unreachable and I could no longer ping it. When I rebooted it, it got as far as Reference Code 0518 and stayed there.

I followed the below article and booted to a diagnostic mode. It checked the 2 file systems it found and they came back OK. When I tried to run a fsck myself it came back with an unknown command error so I had to just boot it back into normal mode.

Repairing File Systems with fsck in AIX (LED 517 or 518)
http://www-01.ibm.com/support/docvie...d=isg3T1000131

It's still hanging at 0518, any advice would be appreciated.
 
Old 11-14-2014, 06:24 PM   #2
wingnut64
Member
 
Registered: Sep 2004
Distribution: AIX, RHEL, Ubuntu
Posts: 51

Rep: Reputation: 23
Quote:
Originally Posted by Mark_667 View Post
It checked the 2 file systems it found and they came back OK. When I tried to run a fsck myself it came back with an unknown command error so I had to just boot it back into normal mode.
By that do you mean you ran 'fsck' by itself, it found issues with 2 filesystems but when you ran it immediately again it failed?

If /etc/filesystems (like /etc/fstab on Linux, but in a stanza format) is corrupted or misconfigured you can get LED 518; see step 10 for how you can access it. Note that if you do all of those steps you will overwrite it with a bare-bones version from the boot media, which will need to be fixed by hand later. If you edited it by hand, look for typos or anything out of place. You can try (after you back it up!) removing any recently created non-stock filesystems to see if one of those is causing the problem.
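
A rough sketch of that kind of typo hunt, assuming the usual stanza layout (a `name:` header followed by indented `attr = value` lines, with `*` starting a comment line). The helper name `check_stanzas` and the sample stanzas are made up for illustration. It only flags `dev` values that do not start with `/dev/`, so NFS stanzas (where `dev` is a remote path) will be flagged too; treat the output as a hint, not a verdict.

```shell
#!/bin/sh
# check_stanzas FILE: print each stanza whose dev value does not start with /dev/.
# Hypothetical helper; assumes "name:" headers and "attr = value" lines.
check_stanzas() {
    awk '
        /^\*/ { next }                      # skip comment lines
        /^[^ \t].*:[ \t]*$/ {               # stanza header such as "/usr:"
            name = $0
            sub(/:[ \t]*$/, "", name)
            next
        }
        $1 == "dev" {
            dev = $NF                       # last field is the value, with or without "="
            if (dev !~ /^\/dev\//) print name ": suspicious dev " dev
        }
    ' "$1"
}

# Try it on a scratch file, never on the live one.
sample=$(mktemp)
cat > "$sample" <<'EOF'
* made-up sample
/:
        dev             = /dev/hd4
        vfs             = jfs2
/scratch:
        dev             = scratchlv
        vfs             = jfs2
EOF
check_stanzas "$sample"
rm -f "$sample"
```

From the maintenance shell you would point it at a copy of the disk-based file, for example one taken after mounting /dev/hd4 on a temporary directory.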

Did you try anything related to recreating the JFS log? IIRC that command doesn't work with JFS2 filesystems, which you are probably (hopefully!) using if you're running TL9.

What kind of storage is rootvg on? Are you using VIOS?

A tip for working on the console: Instead of using the Java console window, you can SSH to the HMC and run 'vtmenu', then select the machine and LPAR and open a console that way. The advantage of this is that now you can use PuTTY's logging option (or an equivalent command under *nix) to capture the entire session output. Handy to have so you can keep it for later, post it or share with IBM.
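
On the *nix side, one way to get the same kind of capture is the `script` command, which records everything shown in the terminal to a file. A minimal sketch; the helper `console_log_name`, the LPAR name, and the HMC host name are placeholders, and `hscroot` is only assumed to be the HMC login you use.

```shell
#!/bin/sh
# console_log_name LPAR: build a timestamped log file name for one console session.
# Hypothetical helper, purely for keeping captures organised.
console_log_name() {
    echo "console-$1-$(date +%Y%m%d-%H%M%S).log"
}

echo "would log to: $(console_log_name mylpar)"

# Interactive use (not run here; host and LPAR names are placeholders):
#   script "$(console_log_name mylpar)"   # start a recorded shell
#   ssh hscroot@hmc.example.com           # then run vtmenu and pick the LPAR
#   exit                                  # leave the recorded shell when done
```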
 
Old 11-15-2014, 02:10 PM   #3
Mark_667
Member
 
Registered: Aug 2005
Location: Manchester, England
Distribution: Ubuntu 20.04
Posts: 383

Original Poster
Rep: Reputation: 30
Quote:
By that do you mean you ran 'fsck' by itself, it found issues with 2 filesystems but when you ran it immediately again it failed?
No, fsck ran automatically without prompting and came back clean on the two file systems it scanned. It was only when I invoked it myself from a prompt, at step 8, that I got a command not found error.

I stopped at step 9 because it was talking about formatting things, but step 10 looks non-destructive and worth a shot.

Quote:
Did you try anything related to recreating the JFS log?
Nope, I'm new to AIX and didn't even know you could.

Quote:
What kind of storage is rootvg on? Are you using VIOS?
It's on local SAS disks (which makes me wonder about why it's complaining about not being able to mount over a network). No idea about VIOS I'm afraid.
Quote:
518 Remote mount of the root and /usr filesystems failed during network boot.
Quote:
A tip for working on the console: Instead of using the Java console window, you can SSH to the HMC and run 'vtmenu', then select the machine and LPAR and open a console that way. The advantage of this is that now you can use PuTTY's logging option (or an equivalent command under *nix) to capture the entire session output.
Thanks for the tip, that'd definitely make it easier to work with.
 
Old 11-15-2014, 03:48 PM   #4
wingnut64
Member
 
Registered: Sep 2004
Distribution: AIX, RHEL, Ubuntu
Posts: 51

Rep: Reputation: 23
Quote:
Originally Posted by Mark_667 View Post
I stopped at step 9 because it was talking about formatting things, but step 10 looks non-destructive and worth a shot.

Nope, I'm new to AIX and didn't even know you could.
That step is for re-creating the filesystem journal, which is OK because it only journals data while the filesystem is online. If it isn't mounted there is no data of interest in it.

Quote:
It's on local SAS disks (which makes me wonder about why it's complaining about not being able to mount over a network). No idea about VIOS I'm afraid.
If it is using internal SAS disks, that eliminates a lot of the more exotic boot problems. The 518 description is somewhat misleading; you can get that code even if the filesystems are local.

Do you have any NFS mounts defined? If so try commenting them out in /etc/filesystems. I'm trying to remember if an unreachable NFS server will give 518 or a different error. If you mount it without the background 'bg' option it can prevent booting in some cases.
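
To find the NFS entries quickly, something like the following could work. It is only a sketch: the helper name `list_nfs_stanzas` is made up, and it assumes the standard stanza layout with `vfs = nfs` and a comma-separated `options` line. It prints each NFS stanza and whether `bg` appears in its options.

```shell
#!/bin/sh
# list_nfs_stanzas FILE: print "<mount point> bg" or "<mount point> no-bg"
# for every stanza whose vfs is nfs. Hypothetical helper.
list_nfs_stanzas() {
    awk '
        function report() {
            if (name != "" && vfs == "nfs")
                print name, (opts ~ /(^|,)bg(,|$)/ ? "bg" : "no-bg")
        }
        /^\*/ { next }                          # skip comment lines
        /^[^ \t].*:[ \t]*$/ {                   # new stanza: report the previous one
            report()
            name = $0
            sub(/:[ \t]*$/, "", name)
            vfs = ""
            opts = ""
            next
        }
        $1 == "vfs"     { vfs = $NF }
        $1 == "options" { opts = $NF }
        END { report() }
    ' "$1"
}

# Usage against a copy of the file (path is a placeholder):
#   list_nfs_stanzas /mnt/etc/filesystems
```

Any stanza reported as `no-bg` is a candidate for commenting out (comment lines start with `*`) while you troubleshoot.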

You might have to use the absolute paths for some commands (/usr/sbin/fsck?).
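
If you are not sure where a command lives in the maintenance environment, a small loop can look for it. This is a sketch only; the helper name `find_cmd` and the directory list are guesses, not something taken from the IBM article.

```shell
#!/bin/sh
# find_cmd NAME: print the first executable called NAME found in a few
# common system directories. The directory list is an assumption.
find_cmd() {
    for dir in /usr/sbin /sbin /usr/bin /bin /etc; do
        if [ -x "$dir/$1" ]; then
            echo "$dir/$1"
            return 0
        fi
    done
    return 1
}

find_cmd fsck || echo "fsck not found in the usual directories"
```

If it prints a path, run the command by that full path or add its directory to PATH for the session.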

If you boot it with a console attached, how far do you get into the boot process and what are the last messages printed? It might also be worth giving this a shot: Capturing Debug Boot Output With An HMC
 
Old 11-17-2014, 05:46 AM   #5
Mark_667
Member
 
Registered: Aug 2005
Location: Manchester, England
Distribution: Ubuntu 20.04
Posts: 383

Original Poster
Rep: Reputation: 30
I've attached a PuTTY log of what happens when I try a normal boot; it never gets further than this.

I didn't try recreating /etc/filesystems because it looked fine to me. The file was present, didn't have any corrupt characters, etc.

Code:
Checking the / filesystem

The current volume is: /dev/hd4
Primary superblock is valid.
JS_LOGREDO: log redo processing for /dev/hd4
Primary superblock is valid.
Checking the /usr filesystem



The current volume is /dev/hd2
Primary superblock is valid.



#cat /etc/filesystems
The stanzas all looked alike, with a vfs of jfs2 and a log of /dev/hd8; the only thing that changed was the dev location, so for brevity I've included only the dev lines.

Code:
/:
	dev	/dev/hd4
/usr:
	dev /hd2
/var:
	dev /hd9var
/tmp:
	dev /dev/hd3
/home:
	dev /hd1
/opt:
	dev /dev/hd10opt
/admin:
	dev /dev/hd11admin
/var/adm/ras/livedump:
	dev /dev/livedump

cat /etc/vfstab gave a file not found error as did trying to cat /etc/mnttab

lslv -l rootvg
Unable to find rootvg in the Device Configuration Database
The only thing I can attribute this problem to is that when jumping through hoops trying to get Oracle installed I increased the swap size by 16GB (from 512MB) to match the amount of physical memory. Is there a way to undo this from the maintenance shell?
Attached Files
File Type: txt Putty output.txt (5.5 KB, 27 views)
 
Old 11-17-2014, 01:00 PM   #6
wingnut64
Member
 
Registered: Sep 2004
Distribution: AIX, RHEL, Ubuntu
Posts: 51

Rep: Reputation: 23
Quote:
cat /etc/vfstab gave a file not found error as did trying to cat /etc/mnttab

lslv -l rootvg
Unable to find rootvg in the Device Configuration Database
Those are expected: /etc/vfstab and /etc/mnttab are Solaris-specific, and the command you want there is 'lsvg -l rootvg' (to list the Logical Volumes in the rootvg Volume Group); lslv shows information about a specific Logical Volume.
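
Roughly, the difference between the two commands looks like this. The wrapper function `show_rootvg_layout` is just for illustration, and the guard is there so the snippet fails cleanly on a box without the AIX LVM commands.

```shell
#!/bin/sh
# show_rootvg_layout: run the volume group and logical volume queries in order.
# Hypothetical wrapper; only meaningful on AIX.
show_rootvg_layout() {
    if ! command -v lsvg >/dev/null 2>&1; then
        echo "lsvg not found; this only works on AIX" >&2
        return 1
    fi
    lsvg              # names of all volume groups
    lsvg -l rootvg    # logical volumes in rootvg
    lslv hd4          # details for a single logical volume (hd4 holds /)
}

show_rootvg_layout || echo "skipped"
```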

Quote:
The only thing I can attribute this problem to is that when jumping through hoops trying to get Oracle installed I increased the swap size by 16GB (from 512MB) to match the amount of physical memory. Is there a way to undo this from the maintenance shell?
Did you create a new swap space or extend the existing one? And did you use chps or smit -> Storage -> Logical Volume -> Paging space? If you used regular LVM commands to create or resize it that might have caused a problem.
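
For reference, chps sizes paging space in logical partitions rather than in MB or GB, so you first need the volume group's PP size. A small sketch of the arithmetic; the helper name `lps_for_mb` and the 128 MB PP size are assumptions (check the PP SIZE field in the output of 'lsvg rootvg'), and the snippet only prints the command rather than running it.

```shell
#!/bin/sh
# lps_for_mb SIZE_MB PP_MB: number of logical partitions needed to cover
# SIZE_MB, rounding up. Hypothetical helper.
lps_for_mb() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

pp_mb=128                            # assumed PP size in MB
lps=$(lps_for_mb 16384 "$pp_mb")     # 16 GB expressed in MB
echo "chps -s $lps hd6   # would add $lps logical partitions to hd6"
```

The same count with 'chps -d' would shrink the paging space by that many partitions.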

/etc/swapspaces is used to specify what paging spaces are automatically activated at bootup. I'm not sure how you could easily alter paging space in maintenance mode but if you created a new paging space it should be easy enough to yank it out of that file.

I believe though there should be some other messages written to the console after the "Welcome to AIX" banner before paging space is activated. Maybe try booting with debug output and see if it offers anything else.
 
Old 11-18-2014, 04:05 AM   #7
Mark_667
Member
 
Registered: Aug 2005
Location: Manchester, England
Distribution: Ubuntu 20.04
Posts: 383

Original Poster
Rep: Reputation: 30
Quote:
Did you create a new swap space or extend the existing one? And did you use chps or smit -> Storage -> Logical Volume -> Paging space? If you used regular LVM commands to create or resize it that might have caused a problem.
Interesting, I didn't know you had to use certain commands for certain filesystems. I did:
Code:
extendlv hd6 16G
To check swap space:
Code:
$ lsps -a
Page Space      Physical Volume   Volume Group Size %Used Active  Auto  Type Chksum
hd6             hdisk0            rootvg       16896MB     0   yes   yes    lv     0
$
 
Old 11-18-2014, 08:42 AM   #8
Mark_667
Member
 
Registered: Aug 2005
Location: Manchester, England
Distribution: Ubuntu 20.04
Posts: 383

Original Poster
Rep: Reputation: 30
I booted to the Open Firmware OK Prompt and issued the following command:
Code:
boot -s verbose
The current volume is: /dev/hd4
Primary superblock is valid.
Primary superblock is valid.

Code:
+ print 0
+ 1> /tmp/rc
+ read rc
+ 0< /tmp/rc
+ [ 0 -ne 0 ]
+ echo rc.boot: executing "mount /"
+ 1>> /tmp/boot_log
+ tee -a /../tmp/boot_log
+ mount -f /
exec(/usr/bin/tee,-a,/../tmp/boot_log){1310800,1245224}
+ 2>& 1
exec(/usr/sbin/mount,-f,/){655388,1441856}
exec(/sbin/helpers/jfs2/mount,-V,jfs2,-o,rw,log=/dev/hd8,/dev/hd4,/){1376310,655388}
+ print 0
+ 1> /../tmp/rc
+ read rc
+ 0< /../tmp/rc
+ [ 0 -ne 0 ]
+ [ -d /dev.org ]
+ /usr/lib/boot/mergedev
+ 2>& 1
+ /../usr/bin/tee -a /../tmp/boot_log
exec(/../usr/bin/tee,-a,/../tmp/boot_log){655390,1245224}
exec(/usr/lib/boot/mergedev){1310802,1245224}
mergedev replaced 0 files in the hardfile /dev directory
+ /../usr/lib/methods/showled 0x517 MOUNT /USR
exec(/../usr/lib/methods/showled,0x517,MOUNT /USR){655392,1245224}
 showled + echo rc.boot: executing "fsck -fp /usr"
+ 1>> /../tmp/boot_log
+ /../usr/sbin/fsck -fp /usr
+ 2>& 1
exec(/../usr/sbin/fsck,-fp,/usr){655394,1245224}
+ /../usr/bin/tee -a /../tmp/boot_log
exec(/../usr/bin/tee,-a,/../tmp/boot_log){655396,1245224}
Cannot open /etc/filesystems
: /usr is not a known file system
+ echo rc.boot: executing "mount /usr"
+ 1>> /../tmp/boot_log
+ /../usr/sbin/mount /usr
+ 2>& 1
exec(/../usr/sbin/mount,/usr){1310804,655398}
+ print 1
+ 1> /../tmp/rc
+ /../usr/bin/tee -a /../tmp/boot_log
exec(/../usr/bin/tee,-a,/../tmp/boot_log){655400,1245224}
AFopen failed: No such file or directory
mount: /usr is not a known file system
+ read rc
+ 0< /../tmp/rc
+ [ 1 -ne 0 ]
+ loopled 0x518 /USR MNT FAILED
exec(/usr/lib/methods/showled,0x518,/USR MNT FAILED){655402,1245224}
 showled
I can cat /etc/filesystems from a maintenance shell, could it be a permissions issue?
 
Old 11-18-2014, 02:21 PM   #9
wingnut64
Member
 
Registered: Sep 2004
Distribution: AIX, RHEL, Ubuntu
Posts: 51

Rep: Reputation: 23
Okay, it looks like this is while running the following code in /sbin/rc.boot, which is the second entry in your /etc/inittab and is run right after init is launched:

Code:
# Mount /usr
/../usr/lib/methods/showled 0x517 "MOUNT /USR"
echo "rc.boot: executing \"fsck -fp /usr\"" \
        >>/../tmp/boot_log
/../usr/sbin/fsck -fp /usr 2>&1 | \
        /../usr/bin/tee -a /../tmp/boot_log
echo "rc.boot: executing \"mount /usr\"" \
        >>/../tmp/boot_log
{ /../usr/sbin/mount /usr 2>&1; \
        print $? >/../tmp/rc; } | \
        /../usr/bin/tee -a /../tmp/boot_log
read rc </../tmp/rc
[ "$rc" -ne 0 ] && loopled 0x518 "/USR MNT FAILED"
So it is failing while trying to fsck /usr, not necessarily because /usr is damaged but because it does not know about it because /etc/filesystems is not accessible. I've seen other causes of issues where startup scripts do weird things like namefs mounts over top of /etc/filesystems, but this is happening so early in the boot process it sounds like the file is just screwed up.
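
If the way rc.boot gets the mount status looks odd: a pipeline's exit status is that of its last command (tee here), so the script writes the status of mount to a file and reads it back. A stand-alone sketch of the same pattern, with a fake mount function standing in for the real command and `echo` used in place of ksh's `print`; the temp file names are placeholders.

```shell
#!/bin/sh
log=$(mktemp)
rcfile=$(mktemp)

# Stand-in for "mount /usr": prints an error and fails with status 1.
fake_mount() {
    echo "mount: /usr is not a known file system"
    return 1
}

# The status of fake_mount is saved to a file because the pipeline
# itself would only report the status of tee.
{ fake_mount 2>&1; echo $? > "$rcfile"; } | tee -a "$log"
read rc < "$rcfile"

if [ "$rc" -ne 0 ]; then
    echo "mount failed with status $rc; rc.boot would call loopled 0x518 here"
fi

rm -f "$log" "$rcfile"
```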

Just to confirm, when you look at /etc/filesystems after booting in maintenance mode, you're selecting the option to import rootvg before mounting filesystems, explicitly mounting /dev/hd4 on a temporary directory, and then checking etc/filesystems under that temp mount? Otherwise you will be looking at a default file that's on the boot media.

Even if it looks OK I would try recreating the file from IBM's troubleshooting guide. As long as you make a backup copy you can always fall back, and if this fixes the issue you can add any new filesystems you created to it by hand if you need to.

Quote:
Step 10:
If your system is hanging at LED 517 or 518 during a Normal mode boot, it is possible the /etc/filesystems file is corrupt or missing. To temporarily replace the disk-based /etc/filesystems file, run the following commands:

mount /dev/hd4 /mnt
mv /mnt/etc/filesystems /mnt/etc/filesystems.[MMDDYY]
cp /etc/filesystems /mnt/etc/filesystems
umount /mnt

MMDDYY represents the current two-digit representation of the Month, Day and Year, respectively.
Whenever I make changes in maintenance mode I run the 'sync' command to flush everything to disk; otherwise they might get lost if you forcibly restart the LPAR right after.
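
Put together, the quoted step plus the sync could be wrapped up like this. It is a sketch, not IBM's procedure verbatim: the function name `replace_filesystems` is made up, it takes the mount point and the template file as arguments so it can be tried on a scratch directory first, and a backup made earlier the same day would be overwritten.

```shell
#!/bin/sh
# replace_filesystems ROOT TEMPLATE
# ROOT is where /dev/hd4 is mounted (for example /mnt);
# TEMPLATE is the file to copy into ROOT/etc/filesystems.
replace_filesystems() {
    root=$1
    template=$2
    stamp=$(date +%m%d%y)                            # MMDDYY, as in the quoted step
    target="$root/etc/filesystems"
    if [ -e "$target" ]; then
        mv "$target" "$target.$stamp" || return 1    # keep the original as a backup
    fi
    cp "$template" "$target" || return 1
    sync                                             # flush the change to disk
}

# Intended use from the maintenance shell (not run here):
#   mount /dev/hd4 /mnt
#   replace_filesystems /mnt /etc/filesystems
#   umount /mnt
```

To fall back, move the dated copy over /mnt/etc/filesystems again and run sync.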
 
  

