LinuxQuestions.org
Old 04-04-2014, 12:48 PM   #1
spoonervt
LQ Newbie
 
Registered: May 2009
Posts: 11

Rep: Reputation: 0
reading 'unknown' data from a tape


This is the issue:
I want to duplicate an LTO tape to create a monthly backup tape and reuse the original tape, so that the library automation keeps working with sequentially numbered tapes.

This is what I have:
A daily backup consists of a tape file written with 'tar' which documents the current version of the backup scripts and some more information (# hosts backed up, encryption key reference etc).
Following this is a varying number of encrypted data sets, written with dd.

All commands use the 'no-rewind device' to keep the tape in its current position. On the next day this procedure repeats itself (except that it is now an incremental backup).

After 1 week, the library moves on to a new tape.

To duplicate a tape, I can write a script that reads the tar file, parses the information, uses as many dd commands as needed to read the remaining files for that day, then reads the next day's tar file, and so on. That is quite a bit of error-prone scripting that I feel should not be needed.

I was expecting that I could use 'dd' to read the tar file off the tape and just dump it onto the next tape, again with dd. This way I would just use dd to read from the original and write directly to the copy. (I have 2 drives in the library.)

That does not work, though.
'dd' seems incapable of reading any data from the tape that was written with tar. Instead of data that reflect the tar data, I get no data back at all. If I repeat the dd command 4 more times, I eventually get the encrypted data that I originally wrote with the dd command.

So it looks like tar creates a data format on the tape that dd cannot get any data from, but after 4 unsuccessful read attempts (0 bytes each) I can finally read the data that were written with dd after the tar. So even when no data are read, the tape seems to be moving on to a new file header.
That pattern repeats itself.

Is there any way to perform a 'dumb' copy that is unaware of the actual data structure on the tape?

Sorry for the long question.
-Steffen
 
Old 04-04-2014, 02:18 PM   #2
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,138

Rep: Reputation: 1263
I haven't messed with tape in quite a while but we used to have to give the correct block size for dd to work. Is that still true?
 
1 members found this post helpful.
Old 04-04-2014, 02:46 PM   #3
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,776

Rep: Reputation: 2212
Each of those encrypted data sets is a separate file on the tape, each followed by a file mark.** dd is never going to read past a file mark, so you're always going to need a script to copy the tape. It doesn't need to be complex, though. Just keep reading files with dd until you get 0 bytes transferred and "mt status" indicates that you are at EOD, though it might still be a good idea to do your current "error prone" parsing just to verify that it is consistent with what you are reading from the tape.

** I'm not sure that LTO tapes work that way, but the effect will still be the same.
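
The loop described above could be sketched roughly as follows. This is a minimal sketch that has not been run against real hardware: the function name copy_tape_files is made up for illustration, it assumes GNU dd (whose status report starts with an "N+M records in" line) and Linux no-rewind tape devices, and it treats the first completely empty read as the end of the data. The block size passed in is assumed to be at least as large as the largest block on the tape.

```bash
# Hypothetical helper: copy tape files one by one until an empty read.
# Usage: copy_tape_files SRC_DEVICE DST_DEVICE BLOCK_SIZE
copy_tape_files() {
    local src=$1 dst=$2 bs=$3 n=0 status
    while :; do
        # One dd per tape file: dd stops at the file mark. With of= set,
        # only dd's status report (stderr) is captured here.
        status=$(LC_ALL=C dd if="$src" of="$dst" bs="$bs" 2>&1) || {
            printf '%s\n' "$status" >&2
            return 1
        }
        printf '%s\n' "$status" >&2
        # "0+0 records in" at the start of a line means nothing was read.
        if printf '%s\n' "$status" | grep -q '^0+0 records in'; then
            break
        fi
        n=$((n + 1))
    done
    echo "$n"    # number of non-empty tape files copied
}

# Example call (device names are assumptions):
# copy_tape_files /dev/nst0 /dev/nst1 64k
# mt -f /dev/nst0 status    # confirm that the source drive really reports EOD
```

An empty file on the tape would also end this loop early, which is why checking "mt status" for EOD afterwards is still worthwhile.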
 
1 members found this post helpful.
Old 04-04-2014, 03:02 PM   #4
spoonervt
LQ Newbie
 
Registered: May 2009
Posts: 11

Original Poster
Rep: Reputation: 0
>>I haven't messed with tape in quite a while but we used to have to give the correct block size for dd to work. Is that still true?

dd reads data up to the next file marker - this is the easy part. This is why I thought it would also be able to just slurp in the tar data without even checking.

I can easily read the encrypted data that were written with dd - I just don't get *anything* from the tar written data.
Since this section contains the reference to the encryption key used I need that.

I could have written the tarfile (created from disk) with a dd command - if I had known ;-( . That does not help me with my current issue.


I could buy a new set of numbered barcode labels and relabel the tapes. I just wasn't expecting this to be an issue. I use dd for disk and partition cloning and expected it to work as easily on tapes.

>>Each of those encrypted data sets is a separate file on the tape, each followed by a file mark.** dd is never going to read past a file mark, so you're always going to need a script to copy the tape.

I expected that to be a very brief script: write a loop that repeats dd commands until it doesn't return any more data => end of tape.
 
Old 04-04-2014, 03:19 PM   #5
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,776

Rep: Reputation: 2212
If the tar files were written with tar's default blocking factor (20 * 512 = 10240 bytes) and you are trying to read them while the drive is set for a larger fixed block size, those reads may fail. There should be something in /var/log/messages about that. My lack of knowledge about LTO tapes is a problem here. What was the dd blocksize for reading/writing the encrypted files? If that was just the default "bs=512", that is probably going to fail for the larger blocks from tar. It might be sufficient just to increase the blocksize for the copying operation to the largest one expected (dd will not try to perform multiple reads to fill that size unless you specifically request it, so all the output blocks will match the sizes of the input blocks).
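
The remark about dd not refilling blocks can be tried out without a tape drive. The sketch below is only an analogy: a pipe that delivers data in two bursts stands in for a device that returns fewer bytes than requested, and the function name short_read_demo is made up for this illustration. It assumes GNU dd, where iflag=fullblock is the way to "specifically request" that short reads be combined.

```bash
# Hypothetical demo: a writer that delivers 4 bytes, pauses, then 2 more,
# read by dd with a block size (8) larger than either chunk.
# Any extra operands (e.g. iflag=fullblock) are passed through to dd.
short_read_demo() {
    { printf 'aaaa'; sleep 1; printf 'bb'; } |
        LC_ALL=C dd bs=8 of=/dev/null "$@" 2>&1 | head -n 1
}

short_read_demo                   # default: each short read is its own record
short_read_demo iflag=fullblock   # GNU dd: keep reading until the block is full
```

On a typical GNU/Linux system the first call should report "0+2 records in" (two partial records, passed through as they arrived) and the second "0+1 records in" (one record assembled from both bursts).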
 
1 members found this post helpful.
Old 04-05-2014, 04:15 PM   #6
spoonervt
LQ Newbie
 
Registered: May 2009
Posts: 11

Original Poster
Rep: Reputation: 0
>>It might be sufficient just to increase the blocksize for the copying operation to the largest one expected (dd will not try to perform multiple reads to fill that size unless you specifically request it, so all the output blocks will match the sizes of the input blocks).

That is doing the trick, thanks. I wasn't even thinking of checking /var/log/messages, rather expecting an error message directly to STDERR.

I don't know what the block size picked by 'tar' or 'dd' is, but if I set it to 15000 (i.e. dd bs=15000 if=....., that was the first test and it worked right away) I can read the tar and dd files. If I dump them to a disk file, I can use tar to get the information from the tar created data, and use the reverse of the command chain that I used to create the encrypted data to decrypt those data as well.
I have not yet tested a tape-2-tape copy (i.e. 'dd bs=15000 if=/dev/nst0 of=/dev/nst1'), but I would assume that it works - and of course I will test the correctness of the new data tape before reusing the original.

Is there a good command line command that I could use to verify data, like 'diff (read data from tape0) (read data from tape1)'?
 
Old 04-05-2014, 10:25 PM   #7
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,776

Rep: Reputation: 2212
Easiest way to find the actual block size for a given file on the tape is to run
Code:
dd if=/dev/nst0 of=/dev/null bs=64k count=1
and look at the number of bytes dd reports for that single block.

Most basic way to compare:
Code:
cmp <(dd if=/dev/nst0 bs=32k) <(dd if=/dev/nst1 bs=32k) && echo OK
Adjust the block size as you wish, of course, as long as it is large enough.
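
Since each dd stops at a file mark, the one-liner above compares one tape file per run. For a tape holding many files, a loop along the following lines might help. This is a sketch under assumptions, not a tested tool: the function name compare_tape_files is invented here, it swaps cmp for sha256sum checksums so that it does not need process substitution, and it compares only the data, not the block boundaries.

```bash
# Hypothetical helper: compare two tapes file by file using checksums.
# Usage: compare_tape_files DEVICE_A DEVICE_B BLOCK_SIZE
compare_tape_files() {
    local a=$1 b=$2 bs=$3 n=0 empty sum_a sum_b
    empty=$(printf '' | sha256sum)    # checksum of an empty read
    while :; do
        # Each dd reads up to the next file mark; its status report stays
        # on stderr so that read errors remain visible.
        sum_a=$(dd if="$a" bs="$bs" | sha256sum)
        sum_b=$(dd if="$b" bs="$bs" | sha256sum)
        if [ "$sum_a" != "$sum_b" ]; then
            echo "MISMATCH in tape file $((n + 1))"
            return 1
        fi
        # Both reads came back empty: assume the end of the data.
        [ "$sum_a" = "$empty" ] && break
        n=$((n + 1))
    done
    echo "OK: $n tape files identical"
}

# Example call (device names are assumptions, both tapes rewound first):
# compare_tape_files /dev/nst0 /dev/nst1 32k
```

The two drives are read one after the other here rather than in parallel, which keeps the sketch simple at the cost of speed.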
 
1 members found this post helpful.
Old 04-06-2014, 09:09 AM   #8
spoonervt
LQ Newbie
 
Registered: May 2009
Posts: 11

Original Poster
Rep: Reputation: 0
thanks a lot,
I'll try that the next time both tape drives are idle.
-Steffen
 
Old 04-09-2014, 10:56 PM   #9
spoonervt
LQ Newbie
 
Registered: May 2009
Posts: 11

Original Poster
Rep: Reputation: 0
I'm back with more questions after I tried to apply the postings from above.

I need to read with the blocksize that the tape was written or the performance is so poor that I'll have to wait for weeks (~100kB/s data rate).

I wrote a script that does the following:

'dd if=/dev/nst0 of=/dev/null bs=64k count=1' # command from an earlier posting
to determine the block size, followed by
'mt -f /dev/nst0 bsr 1'
to step the tape back to the start of the file before I read it with the block size determined.

What I get back is, e.g.,
0+1 records in
0+1 records out
16384 bytes (16 kB) copied, 0.00407101 s, 4.0 MB/s
for the command that tests the block size, and
15+1623 records in
15+1623 records out
7061440 bytes (7.1 MB) copied, 7.29006 s, 969 kB/s
for the actual data read.

QUESTION 1:
What does the '15+1623 records' mean?
I could not find an explanation of this report anywhere in the documentation.


QUESTION 2:
I wrote a test tape with a tar block (bs=10240) followed by dd written blocks (bs=16384).

The algorithm for reading this tape is as follows:

1 check the block size of the next file on tape by reading one block with bs=64k (assuming that the real bs is smaller)
2 rewind one block
3 read the file with the found bs
repeat steps 1..3 until end of tape
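
Steps 1 and 2 could be wrapped in a small helper like the one below. It is a sketch only: the name probe_block_size is hypothetical, it assumes GNU dd's status format and the 'mt ... bsr 1' command quoted earlier in this post, and it only measures the first block of the file. Nothing in it guarantees that later blocks in the same file have the same size.

```bash
# Hypothetical helper: report the size of the next block on the tape and
# step back so that the following read starts at the same block again.
# Usage: probe_block_size /dev/nst0
probe_block_size() {
    local dev=$1 status size
    status=$(LC_ALL=C dd if="$dev" of=/dev/null bs=64k count=1 2>&1) || return 1
    # The byte-count line looks like "16384 bytes (16 kB) copied, ...".
    size=$(printf '%s\n' "$status" | awk '/ byte/ { print $1; exit }')
    [ -n "$size" ] || return 1
    # Only back up if a block was actually read.
    if [ "$size" -gt 0 ]; then
        mt -f "$dev" bsr 1 || return 1
    fi
    echo "$size"
}

# Example call (device name is an assumption):
# bs=$(probe_block_size /dev/nst0) && dd if=/dev/nst0 of=file1.bin bs="$bs"
```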

This is what I get as a log:
9 Wed Apr 9 23:01:37 EDT 2014
10 Start of reading file 1
11 Checking the block size of the next file
12 STDERR=0+1 records in 0+1 records out 10240 bytes (10 kB) copied, 0.00444374 s, 2.3 MB/s
13 This file was written with a block size bs=10240
14 .. now winding back 1 records to get back to the beginning of the file
15 Wed Apr 9 23:01:37 EDT 2014: start of reading the 1th file on tape, bs=10240
16 STDERR=3+0 records in 3+0 records out 30720 bytes (31 kB) copied, 0.00779656 s, 3.9 MB/s
17 Reading and copying of the 1th file on tape completed
18 after a total of 30720 bytes (31 kB) in 0 record(s) and a speed of 3.9 MB/s
19 --------------------------------------------------------------------------------
20 Wed Apr 9 23:01:37 EDT 2014
21 Start of reading file 2
22 Checking the block size of the next file
23 STDERR=0+1 records in 0+1 records out 8192 bytes (8.2 kB) copied, 0.00525879 s, 1.6 MB/s
24 This file was written with a block size bs=8192
25 .. now winding back 1 records to get back to the beginning of the file
26 Wed Apr 9 23:01:37 EDT 2014: start of reading the 2th file on tape, bs=8192
27 STDERR=dd: reading `/dev/nst1': Cannot allocate memory 1+0 records in 1+0 records out 8192 bytes (8.2 kB) copied, 0.00714527 s, 1.1 MB/s
28 Reading and copying of the 2th file on tape completed
29 after a total of 8192 bytes (8.2 kB) in 0 record(s) and a speed of 1.1 MB/s
30 --------------------------------------------------------------------------------
31 Wed Apr 9 23:01:37 EDT 2014
32 Start of reading file 3
33 Checking the block size of the next file
34 STDERR=0+1 records in 0+1 records out 16384 bytes (16 kB) copied, 0.00392653 s, 4.2 MB/s
35 This file was written with a block size bs=16384
36 .. now winding back 1 records to get back to the beginning of the file
37 Wed Apr 9 23:01:37 EDT 2014: start of reading the 3th file on tape, bs=16384
38 STDERR=25+3529 records in 25+3529 records out 14950424 bytes (15 MB) copied, 16.4022 s, 911 kB/s
39 Reading and copying of the 3th file on tape completed
40 after a total of 14950424 bytes (15 MB) in 3529 record(s) and a speed of 911 kB/s
41 --------------------------------------------------------------------------------

In lines 9-18 I check the block size of the first file (tar), step back one block and read it for real, giving me 3 blocks. The resulting data file can be read by tar and produces the correct data. My assumption is that the tape is now at the start of file 2, and I expect a block size of 16k for file 2.

In lines 20-29 I do the same for file 2, finding a block size of 8k and obviously reading 8k despite the error message. This is (supposed to be) the start of the data written with 'dd of=/dev/nst1 bs=16k'.

Lines 31-40 read another 15 MB of data, this time with bs=16k.

The data from lines 20-40 combined appear to belong together, as the decryption command correctly decrypts the file listing if I concatenate the 2 files.

What am I missing from the explanations in the previous posts?

Is there a good reference document anywhere on the web that explains how data is stored on a tape and how the end-of-file markers and block sizes partition the separate data files on a tape?

another long message - sorry.
-Steffen

ps: some of the messages in the log are obviously wrong, but since I don't know yet how to interpret the STDERR output of dd I was just guessing the format from a limited sampling of the output of read commands.
 
Old 04-10-2014, 08:01 AM   #10
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,776

Rep: Reputation: 2212
The "15+1623 records" means 15 full blocks (16KiB in that case) plus 1623 partial blocks (something less than 16KiB).
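
As a worked example, the average size of the partial blocks can be estimated from the numbers dd printed in the two reports quoted in the previous post. The helper name avg_partial_block is made up for this illustration, and the shell's integer arithmetic rounds the result down.

```bash
# Hypothetical helper: average size of the partial blocks in a dd report.
# Usage: avg_partial_block TOTAL_BYTES BLOCK_SIZE FULL_RECORDS PARTIAL_RECORDS
avg_partial_block() {
    local total=$1 bs=$2 full=$3 partial=$4
    # Bytes not accounted for by full blocks, spread over the partial ones.
    echo $(( (total - full * bs) / partial ))
}

avg_partial_block 7061440 16384 15 1623    # "15+1623 records", prints 4199
avg_partial_block 14950424 16384 25 3529   # "25+3529 records", prints 4120
```

In both reads the partial blocks average roughly 4 KiB, well below the 16 KiB that was requested.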

The "Cannot allocate memory" occurs when reading a tape block that is larger than the size (8KiB for that case) for which the kernel has allocated a buffer.

What this all means is that the block size on the tape is not constant. I don't know enough about LTO tapes to answer your final question.

I suggest just reading/copying everything at a 16k block size. That seems to be the largest you encounter. dd defaults to doing block-for-block copies, so your tar files will be reproduced with their 10240 block size and the encrypted files will have their 8k/16k/whatever block sizes reproduced on the new tape. The output blocking does not matter if you are going to disk -- it's just a continuous stream of bytes there. If you are going tape-to-tape you should maintain the same block sizes.
 
1 members found this post helpful.
Old 04-10-2014, 10:15 AM   #11
spoonervt
LQ Newbie
 
Registered: May 2009
Posts: 11

Original Poster
Rep: Reputation: 0
Thanks for the explanation. (I had guessed as much, but thought I must be wrong, as the information does not help a lot if the partial blocks may have different sizes.)
If I read tapes that were written with the default dd block size bs=512 using a larger block size, the tape performance drops - I assume because the drive needs to wind back and forth all the time.

I realized that I could get a tape performance boost by increasing the block size from the default 512 bytes to 16k (almost a factor of 10 in speed); trying to read a (test) tape that I created this way is what I described in my previous post.

Going back to the task at hand, I can use my script successfully on the original tapes (tar with 10240-byte blocks, and dd with 512-byte blocks). I started the tape-2-tape copy but have not yet compared/verified that the data are identical. I can't take advantage of the faster tape speed, but this is what I was using up to now anyway.

I assume that some of the speed bottleneck comes from the way I collect the data to be backed up from the various hosts:

ssh <hostname> "tar <tar options> --file=-|openssl <openssl options>"|dd of=/dev/nst0

My assumption was that this provides a constant, fast data stream through the ssh tunnel, but maybe it creates the varying block sizes if the tape is faster than the data arriving through the pipe.
After we upgrade to a more recent LTO6 system I can get rid of the encryption step, as LTO6 does the encryption in hardware, and I may have more luck with large blocks.


Thanks anyway,
Steffen
 
Old 04-10-2014, 10:42 AM   #12
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,776

Rep: Reputation: 2212
What I thought you were doing now is just copying the encrypted data, not decrypting/encrypting it. In that pipeline, the ssl encryption could well be what is slowing you down, but that's just a guess. Back when I was using the slower DDS2 and DDS3 tapes for backup, I had to do some fancy pipelining with a 256KiB ring buffer and some nonblocking I/O in order to keep the devices streaming. Of course that was on a much older and slower machine (i486).

I don't think I have much more to offer here. Sorry.
 
Old 04-13-2014, 04:09 PM   #13
spoonervt
LQ Newbie
 
Registered: May 2009
Posts: 11

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by rknichols View Post
(...)

I don't think I have much more to offer here. Sorry.
Follow up: the script that I created based on your info in this thread did copy the entire tape, and a verification script that I wrote confirmed that the copied tape is indeed identical.
Thanks again - you provided the information that I needed - Steffen
 
Old 07-07-2021, 02:25 AM   #14
unix-fan
LQ Newbie
 
Registered: Jul 2021
Posts: 12

Rep: Reputation: Disabled
Quote:
Originally Posted by rknichols View Post
What I thought you were doing now is just copying the encrypted data, not decrypting/encrypting it. In that pipeline, the ssl encryption could well be what is slowing you down, but that's just a guess. Back when I was using the slower DDS2 and DDS3 tapes for backup, I had to do some fancy pipelining with a 256KiB ring buffer and some nonblocking I/O in order to keep the devices streaming. Of course that was on a much older and slower machine (i486).

I don't think I have much more to offer here. Sorry.
Hello, I have a similar problem: I've got some old DDS_1-2 tapes that I would like to read from.
I wonder if you can point me to some in-depth documentation about dd and tapes, and about tape data protection.
The dd man page alone is not clear enough for a newbie like me.
Many thanks.

Last edited by unix-fan; 07-07-2021 at 02:26 AM.
 
  

