Amanda not able to copy backup files from holding disk to tapes
Hi Guys
I have an issue with my Amanda backup server, which is connected to a Scalar Quantum i500 via FC. I got the error below 3 days ago:

    These dumps were to tape 000289.
    *** A TAPE ERROR OCCURRED: [No more writable valid tape found].

Normally I load the proper tapes and run amflush to push the data from the holding disk to tape manually. This time, however, amflush did not help: Amanda immediately responded with an out-of-tape error again. Meanwhile, I also got some errors in dmesg:

    st3: Error 18 (sugg. bt 0x0, driver bt 0x0, host bt 0x0).
    scsi1 (0,3,0) : reservation conflict

I have not been able to run any backups for my company for the last 3 days, and I've tried whatever I could. Please, somebody help!!! Any response will be appreciated. Thanks a million in advance.
saucer
|
No need to shout.
Focus on the tape drive. That's where the error is coming from. Fix that, and Amanda will work again. Use mt to see what is happening and forget about Amanda for the moment. If you are getting dmesg errors for st3 indicating scsi issues, then that is the source of your problem. Start out by reviewing anything that you or anyone else has changed about that system (or bumped into) since it last worked properly. Check all the connections and the configurations. Is someone else messing with the configuration of the tape system and how it connects over FC? Or did you change anything? Try things like using mt to get the status of the drive. If you can get past that, then you can get Amanda working. |
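As a rough sketch of what "forget about Amanda for the moment" could look like in practice: the helper below only filters kernel log text for lines about SCSI tape (st) devices or reservation conflicts. The function name scsi_tape_errors is made up for this example, and /dev/nst3 is simply the device named in this thread; adjust both to your system.

```shell
# Hypothetical helper: keep only kernel log lines (read from stdin) that
# mention an st tape device such as "st3:" or a SCSI reservation conflict.
scsi_tape_errors() {
    grep -E -i 'reservation conflict|(^|[^a-z])st[0-9]+:'
}

# Typical use on the backup server (run as root, commented out here):
#   dmesg | scsi_tape_errors
#   mt -f /dev/nst3 status
```

If the filter keeps printing new reservation conflict lines while Amanda is idle, that points at the drive or the FC path rather than at the Amanda configuration.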
Also, see if you can find the manual for the tape system and look up troubleshooting and error messages. By googling "scsi reservation conflict Scalar Quantum i500", I found a similar tape system sold by DELL, and it says:
Quote:
|
The i500 is actually managed by another team; I am using one of the libraries in it and sharing the drive. I haven't changed anything on my Amanda media server. I think you are right, there could be too many hands on that Fxxx thing. I will chase it up in case someone changed something without telling the others. Anyway, this is my mt status:

    # mt -f /dev/nst3 status
    SCSI 2 tape drive:
    File number=2, block number=0, partition=0.
    Tape block size 0 bytes. Density code 0x44 (no translation).
    Soft error count since last status=0
    General status bits on (81010000):
     EOF ONLINE IM_REP_EN

cheers
|
So, that mt status actually looks alright. Are you using mtx to move tapes? Can you do an mtx to get the status of the library? If you aren't sure, you should be able to find the scsi device for the library in your amanda.conf as changerdev. Then reference that with -f in an mtx status, the same way you referenced /dev/nst3 in your mt status.
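If it helps, here is a small sketch for pulling the device settings out of amanda.conf so they can be compared with what you pass to mt and mtx. The function name amanda_devices is invented for this example, and the location of amanda.conf depends on your installation, so the path is left as an argument.

```shell
# Hypothetical helper: print the tapedev, changerdev and tpchanger settings
# from an Amanda config file. Commented-out settings are skipped because
# their first field starts with "#".
# Usage: amanda_devices /path/to/amanda.conf
amanda_devices() {
    awk '$1 == "tapedev" || $1 == "changerdev" || $1 == "tpchanger" { print $1, $2 }' "$1"
}

# Then query the changer with the device it reports, for example:
#   mtx -f /dev/sg4 status
```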
|
I got all the slot and tape details from mtx -f /dev/sg4 status, but every time I try amflush it returns disk full. Weird!
|
That's a somewhat generic tape error from Amanda. Two things:

1. Have you changed anything about your configuration since it was working?
2. Have you looked at /tmp/amanda/ for debug information on the error?

The files in there are organized by date-time stamped names within a couple of directories. Find the ones with the time stamp that corresponds to your attempt at amflush. Scan through those looking for specific error messages. In those error messages, make sure the references are to /dev/nst3 and /dev/sg4 as you would expect (and also that those match your amanda.conf entries). Anyway, those debug logs should help you figure out what is going on.
|
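One way to do that scan, as a sketch only: the function below lists lines containing "error", "fail" or "conflict" in debug files modified within the last N minutes. The name amanda_debug_errors and the search words are my own choices, not anything Amanda defines, and -mmin is an extension found in GNU and BSD find rather than plain POSIX.

```shell
# Hypothetical helper: show suspicious lines in recently modified debug files.
# $1: debug directory (this thread uses /tmp/amanda/)
# $2: how many minutes back to look (defaults to 60)
amanda_debug_errors() {
    dir=$1
    minutes=${2:-60}
    # -H prints the file name and -n the line number for each match.
    find "$dir" -type f -mmin "-$minutes" \
        -exec grep -H -n -i -E 'error|fail|conflict' {} +
}

# Example, right after a failed amflush (commented out here):
#   amanda_debug_errors /tmp/amanda 30
```

Run it right after a failed amflush so the time window only covers that attempt, then check that any device paths in the matching lines are /dev/nst3 and /dev/sg4.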
I could not see anything in particular in the logs; the only thing is that it just does not write anything to the drive.

    19:16:36 Config info:
             firstslot = "1"
             lastslot = "24"
             cleanslot = "-1"
             cleancycle = "120"
             offline_before_unload = "0"
             unloadpause = "0"
             autoclean = "0"
             autocleancount = "99"
             havereader = "1"
             driveslot = "0"
             poll_drive_ready = "10"
             initial_poll_delay = "20"
             max_drive_wait = "240"
    19:16:36 LOADSLOT -> load drive 0 (/dev/nst3) from slot next
    19:16:36 STATUS -> currently loaded slot = 11
             -> currently loaded barcode = "000300"
    19:16:36 EJECT -> ejecting tape from /dev/nst3
    19:16:36 STATUS -> currently loaded slot = 11
             -> currently loaded barcode = "000300"
    19:16:36 EJECT -> moving tape from drive 0 to storage slot 11
    19:16:36 Running: mtx unload 11 0
    19:17:20 Exit code: 0
             Stderr: Unloading Data Transfer Element into Storage Element 11...done
    19:17:20 -> status 0, result "Unloading Data Transfer Element into Storage Element 11...done"
    19:17:20 Return (0) -> 11 /dev/nst3
    19:17:20 LOADSLOT -> loading tape from slot 12 to drive 0 (/dev/nst3)
    19:17:20 Running: mtx load 12 0
    19:17:45 Exit code: 0
    19:17:45 -> status 0, result ""
    19:18:05 Running: /bin/mt -f /dev/nst3 rewind
    19:18:05 Exit code: 0
    19:18:05 Exit (0) -> 12 /dev/nst3
    chg-zd-mtx: pid 28482 finish time Fri May 14 19:18:05 2010

Again, I just keep getting the conflict error in dmesg about /dev/nst3, which is the drive. The i500 is not used only by me: there are 6 drives in the machine and only one is dedicated to me. I am starting to think it could have been accidentally claimed by their fancy shiny NetBackup media server. It might take me a few weeks to chase it up across the teams; you know, sometimes when shit happens, it happens.

Thanks. I've already attached an LTO-3 drive to my Amanda server locally via SCSI, and it works for the time being.
|
That debug log appears to be the mtx debug log, and it looks alright. Did you look through all the other debug logs? Each process has its own debug log, and the time stamps should follow the one you listed above. I'm not sure of the name of the one you should be looking for, but it may be taper. One of those should have a more explicit error.
I'm not sure how the reservations work on that tape system. Perhaps you can move tapes to the drive but are not allowed to write to it? That would seem odd; but, then, maybe someone who was changing the configuration made a mistake. Since they are on a different team and don't seem to communicate with you, they wouldn't realize they had made a mistake unless you talk to them. But that's speculation. You would have to talk to them to find out. |