LinuxQuestions.org - Amanda not able to copy backup file from holding disk to tapes

Hi Guys

I have an issue with my Amanda backup server, which is connected to a Quantum Scalar i500 via FC.

I got the error below 3 days ago.

These dumps were to tape 000289.
*** A TAPE ERROR OCCURRED: [No more writable valid tape found].

Normally I load the proper tapes and run amflush to push the data from the holding disk to tape manually.
However, this time amflush did not help; Amanda immediately responded with an out-of-tape error again.

Meanwhile, I also got some errors from dmesg:
st3: Error 18 (sugg. bt 0x0, driver bt 0x0, host bt 0x0).
scsi1 (0,3,0) : reservation conflict

I have not been able to run any backups for my company for the last 3 days, and I've tried everything I could. Please, somebody help!!!

Any response will be appreciated.

Thanks a million in advance.

saucer

No need to shout.

Focus on the tape drive. That's where the error is coming from. Fix that, and Amanda will work again. Use mt to see what is happening and forget about Amanda for the moment. If you are getting dmesg errors for st3 indicating scsi issues, then that is the source of your problem.

Start out by reviewing anything that you or anyone else has changed about that system (or bumped into) since it last worked properly. Check all the connections and the configurations. Is someone else messing with the configuration of the tape system and how it connects over FC? Or did you change anything?

Try things like using mt to get the status of the drive. If you can get past that, then you can get Amanda working.
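
As a rough sketch of that first check: the helper below counts kernel messages that mention a reservation conflict. The function name is made up for illustration, and the pattern is taken only from the dmesg lines posted above, so treat it as a starting point rather than a complete diagnostic.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): count kernel log lines
# that mention a SCSI reservation conflict. Reads messages on stdin.
count_reservation_conflicts() {
    # grep -c prints the count; it exits non-zero when the count is 0,
    # which is not an error for our purposes.
    grep -ci 'reservation conflict' || true
}

# Typical use on the backup server (needs the real hardware):
#   dmesg | count_reservation_conflicts
#   mt -f /dev/nst3 status
```

If the count keeps growing each time you touch the drive, the problem sits below Amanda, in the SCSI/FC layer.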

Also, see if you can find the manual for the tape system and look up troubleshooting and error messages. By googling "scsi reservation conflict Scalar Quantum i500", I found a similar tape system sold by DELL, and it says:

Quote:

Reservation Conflict - The library returns a Reservation Conflict (18h) whenever an initiator attempts to access a logical unit that has been reserved by another initiator

So, chase that clue. Did you set the system up? Do you have notes on the setup and configuration?

Quote:

Originally Posted by choogendyk (Post 3969063)

So, chase that clue. Did you set the system up? Do you have notes on the setup and configuration?

Thank you very much.
The i500 is actually managed by another team; I am using one of the libraries in it and sharing the drive. I haven't changed anything on my Amanda media server. I think you are right, there could be too many hands on that Fxxx thing. I will chase it up in case someone changed something without telling the others.

Anyway, here is my mt status:
# mt -f /dev/nst3 status
SCSI 2 tape drive:
File number=2, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x44 (no translation).
Soft error count since last status=0
General status bits on (81010000):
EOF ONLINE IM_REP_EN

cheers

So, that mt status actually looks alright. Are you using mtx to move tapes? Can you do an mtx to get the status of the library? If you aren't sure, you should be able to find the scsi device for the library in your amanda.conf as changerdev. Then reference that with -f in an mtx status, the same way you referenced /dev/nst3 in your mt status.
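
If it helps, here is a minimal sketch of pulling changerdev (or tapedev) out of amanda.conf so it can be handed to mtx. The function name and the config path in the comments are placeholders, and it assumes the simple one-line form of the parameter (name followed by a quoted value).

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): print the value of a
# simple string parameter such as changerdev or tapedev from an
# amanda.conf-style file.
# Assumes the common one-line form:  changerdev "/dev/sg4"
amanda_param() {
    # $1 = parameter name, $2 = path to amanda.conf
    awk -v key="$1" '
        $1 == key {
            val = $2
            gsub(/"/, "", val)   # strip the surrounding quotes
            print val
            exit                 # first uncommented match wins
        }
    ' "$2"
}

# Typical use (the config path is a placeholder; adjust to your setup):
#   changer=$(amanda_param changerdev /etc/amanda/DailySet1/amanda.conf)
#   mtx -f "$changer" status
```

Commented-out entries are skipped because their first field is "#" rather than the parameter name.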

Quote:

Originally Posted by choogendyk (Post 3969476)

Yes, mtx looks fine; I can manually load and unload the tapes.
I got all the slot details and tapes with mtx -f /dev/sg4 status.

But every time I try amflush, it returns disk full.

Weird!

Quote:

Originally Posted by xudonw1 (Post 3971402)

But every time I try amflush, it returns disk full.

You mean it returns: "[No more writable valid tape found]"?

That's a somewhat generic tape error from Amanda.

Two things:

1. Have you changed anything about your configuration since it was working?

2. Have you looked at /tmp/amanda/ for debug information on the error? The files in there are organized by date-time stamped names within a couple of directories. Find the ones with the time stamp that corresponds to your attempt at amflush. Scan through those looking for specific error messages. In those error messages, make sure the references are to /dev/nst3 and /dev/sg4 as you would expect (and also that those match your amanda.conf entries). Anyway, those debug logs should help you figure out what is going on.
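
For point 2, a small sketch of how the scan could be narrowed down. The function name and the search patterns are my own guesses at what is worth looking for, and -mmin is a GNU/BSD find extension rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): list debug files under a
# directory that were modified in the last N minutes and contain
# error-like messages.
recent_error_logs() {
    # $1 = debug directory (e.g. /tmp/amanda), $2 = age window in minutes
    # grep -l prints only the names of files that match.
    # grep exits non-zero when nothing matches; that is not an error here.
    find "$1" -type f -mmin "-$2" \
        -exec grep -liE 'error|conflict|no more writable' {} + || true
}

# Typical use right after a failed amflush:
#   recent_error_logs /tmp/amanda 30
```

Run it straight after the failed amflush so the time window only catches the files from that attempt, then read the files it lists.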

Quote:

Originally Posted by choogendyk (Post 3972015)

Thank you for your response again.

According to the logs, I could not see anything in particular; the only thing is that it just does not write anything to the drive.

19:16:36 Config info:
firstslot = "1"
lastslot = "24"
cleanslot = "-1"
cleancycle = "120"
offline_before_unload = "0"
unloadpause = "0"
autoclean = "0"
autocleancount = "99"
havereader = "1"
driveslot = "0"
poll_drive_ready = "10"
initial_poll_delay = "20"
max_drive_wait = "240"
19:16:36 LOADSLOT -> load drive 0 (/dev/nst3) from slot next
19:16:36 STATUS -> currently loaded slot = 11
-> currently loaded barcode = "000300"
19:16:36 EJECT -> ejecting tape from /dev/nst3
19:16:36 STATUS -> currently loaded slot = 11
-> currently loaded barcode = "000300"
19:16:36 EJECT -> moving tape from drive 0 to storage slot 11
19:16:36 Running: mtx unload 11 0
19:17:20 Exit code: 0
Stderr:
Unloading Data Transfer Element into Storage Element 11...done
19:17:20 -> status 0, result "Unloading Data Transfer Element into Storage Element 11...done"
19:17:20 Return (0) -> 11 /dev/nst3
19:17:20 LOADSLOT -> loading tape from slot 12 to drive 0 (/dev/nst3)
19:17:20 Running: mtx load 12 0
19:17:45 Exit code: 0
19:17:45 -> status 0, result ""
19:18:05 Running: /bin/mt -f /dev/nst3 rewind
19:18:05 Exit code: 0
19:18:05 Exit (0) -> 12 /dev/nst3
chg-zd-mtx: pid 28482 finish time Fri May 14 19:18:05 2010

Again, I keep getting the conflict error in dmesg about /dev/nst3, which is the drive.
The i500 is not used only by me: there are 6 drives in the machine and only one is dedicated to me. I am starting to think it could have been accidentally claimed by their fancy shiny NetBackup media server. It might take me a few weeks to chase it up across the teams; you know, sometimes when shit happens, it happens.

thanks,

I've already attached an LTO-3 drive locally to my Amanda server via SCSI, and it works for the time being.

That debug log appears to be the mtx debug log, and it appears to be alright. Did you look through all the other debug logs? Each process has a debug log, and the time stamps should follow the one you listed above. I'm not sure what the name is of the one you should be looking for, but maybe it would be taper. One of those should have a more explicit error.
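
To double-check a changer log like the one you posted without reading it line by line, something along these lines would do. It is only a sketch: the function name is invented, and it assumes the "Running:" / "Exit code:" line layout seen in your excerpt.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): read a chg-zd-mtx style
# debug log on stdin and print each "Running:" command whose following
# "Exit code:" line reports a non-zero status.
failed_changer_commands() {
    awk '
        # Remember the most recent command, minus timestamp and label.
        / Running: /   { sub(/^.* Running: /, ""); cmd = $0; next }
        # Report it only if the exit code that follows is not 0.
        / Exit code: / { if ($NF != 0) print cmd " -> exit " $NF }
    '
}

# Typical use (the file name is a placeholder):
#   failed_changer_commands < /path/to/changer.debug
```

No output means every mtx and mt call in that log returned 0, which matches what I see in your excerpt, so the failure has to be recorded in one of the other logs.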

I'm not sure how the reservations work on that tape system. Perhaps you can move tapes to the drive but are not allowed to write to it? That would seem odd; but, then, maybe someone who was changing the configuration made a mistake. Since they are on a different team and don't seem to communicate with you, they wouldn't realize they had made a mistake unless you talk to them. But that's speculation. You would have to talk to them to find out.