I'm not sure what you mean, but does
Code:
awk '(NR==1 && $0 ~ /\.doc /) { p=1; s=$0 ; next } { print } END { if (p) print s }' input-file > output-file
do what you mean? Or is the second line in the input file blank, and should it be skipped? If the above works for you, you should be able to write the necessary loop around it; just remember to write the changes to a temporary file first. (In general, you should not try to read from and write to the same file at the same time. With something like awk ... file > file, the shell truncates the file before awk has even read it.)
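For reference, here is one way the loop around the awk command could look. This is only a sketch: move_doc_line is a made-up name, and it assumes mktemp is available. Each result goes to a temporary file first and is then copied back over the original.

```bash
#!/bin/bash
# Hypothetical helper: apply the awk one-liner to every file given as an argument.
move_doc_line() {
    local file tmp
    for file in "$@"; do
        # Write the edited version to a fresh temporary file, never to $file itself.
        tmp="$(mktemp)" || return 1
        if awk '(NR==1 && $0 ~ /\.doc /) { p=1; s=$0; next } { print } END { if (p) print s }' \
               "$file" > "$tmp"; then
            # Copy the result back; this keeps the original file's inode and permissions.
            cat "$tmp" > "$file"
        fi
        rm -f "$tmp"
    done
}
```

Usage, after sourcing the function: move_doc_line *.txt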
Actually, you could do this in plain Bash (and cat):
Code:
#!/bin/bash
# Create a safe directory for temporary files.
WORK="$(mktemp -d)" || exit $?
# When the script exits, for whatever reason,
# remove the temporary directory, and close descriptor 4.
trap "rm -rf '$WORK'; exec 4<&-" EXIT
# Loop over each file supplied on the command line.
while [ $# -gt 0 ]; do
    file="$1"
    shift 1
    # Redirect descriptor 4 from the current file.
    exec 4<"$file" || exit $?
    # Read the first line from descriptor 4. IFS= keeps leading and
    # trailing whitespace; -r keeps backslashes as ordinary characters.
    IFS= read -u 4 -r first
    # Does the first line NOT contain (glob match) ".doc "?
    if [ "$first" = "${first/.doc /}" ]; then
        echo "$file: No match on first line. Skipped." >&2
        exec 4<&-
        continue
    fi
    # Copy the rest of the file into temporary file,
    cat <&4 >"$WORK/tempfile" || exit $?
    # and append the first line to it. (Unlike echo, printf does not
    # misinterpret a line that starts with a dash.)
    printf '%s\n' "$first" >> "$WORK/tempfile"
    # Close descriptor 4.
    exec 4<&-
    # Copy the contents of the temporary file back to the file.
    cat "$WORK/tempfile" >"$file" || exit $?
    echo "$file: Edited successfully." >&2
done
The above scriptlet uses the Bash read built-in to read the first line of the file. The -r option means that backslashes in the file are treated as ordinary characters, not as escape characters (for line continuations and such). File descriptor 4 is used for accessing the file (as an input stream). Descriptors 0, 1 and 2 are used for standard input, output and error, and descriptor 3 is often used to keep a handle on the working directory. Thus, 4 is usually the first free descriptor. Using a descriptor allows us to use multiple commands to step through the file contents, instead of reading from the beginning every time.
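A small demonstration, as a sketch (first_two_lines is just an illustrative name): two separate read commands on the same descriptor return consecutive lines, because the descriptor remembers its position in the file. It also prefixes read with an empty IFS; without that, read strips leading and trailing whitespace from the line even with -r.

```bash
#!/bin/bash
# Illustrative helper: print the first two lines of a file, read via descriptor 4.
first_two_lines() {
    local first second
    exec 4<"$1" || return 1
    # Each read continues where the previous one stopped.
    IFS= read -r -u 4 first
    IFS= read -r -u 4 second
    exec 4<&-
    # Brackets make preserved leading/trailing whitespace visible.
    printf '[%s]\n[%s]\n' "$first" "$second"
}
```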
The match is a glob match, meaning that you can use * and ? the same way as you can for file name matches. The comparison logic is simple: if the original string is equal to the string with the first occurrence of ".doc " removed, the body of the if clause is executed. (It just tests whether ".doc " can be removed from the string. If nothing was removed, the string does not contain ".doc ".)
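To make the test a bit more concrete, here is a sketch with three small helper functions (the names are made up). The first uses the same substitution trick as the script, the second does the same check with Bash's [[ ]] pattern matching, and the third shows that glob characters such as ? also work inside the substitution pattern.

```bash
#!/bin/bash
# Substitution trick: if deleting the first ".doc " changes the string,
# the string contained ".doc ".
has_doc_subst() {
    [ "$1" != "${1/.doc /}" ]
}
# Equivalent check with [[ ]]; the quoted part is matched literally.
has_doc_glob() {
    [[ $1 == *".doc "* ]]
}
# Glob characters work in the pattern too: ? matches exactly one character.
has_numbered_report() {
    [ "$1" != "${1/report?.doc /}" ]
}
```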
Since the input file was read through descriptor 4, cat <&4 can be used to read the rest of the file. If you used the file name here, it would read the entire file, including the first line. The output is saved to the temporary file. Finally, the first line is appended to the end of the temporary file.
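Put together, reading the first line and then copying the rest through the same descriptor rotates the first line to the end. A minimal sketch that just prints the result instead of writing it anywhere (rotate_first_line is a hypothetical name):

```bash
#!/bin/bash
# Hypothetical helper: print a file with its first line moved to the end.
rotate_first_line() {
    local first
    exec 4<"$1" || return 1
    # Consume line 1 from descriptor 4...
    IFS= read -r -u 4 first
    # ...so cat only sees the remaining lines.
    cat <&4
    printf '%s\n' "$first"
    exec 4<&-
}
```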
As usual, a temporary file is the safe way to handle file modifications. In this case, we cat the modified contents back to the original file. This only replaces the contents (and the modification timestamp); access rights et cetera are kept unchanged.
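If you want to see for yourself that overwriting with cat keeps the same file, you can compare the inode number and permission bits before and after. This sketch assumes GNU stat (the -c option); replace_contents_in_place is just an illustrative name, and it returns success only if neither value changed.

```bash
#!/bin/bash
# Illustrative helper: overwrite $2 with the contents of $1, then check that
# $2 is still the same inode with the same permission bits.
replace_contents_in_place() {
    local before after
    before="$(stat -c '%i %a' "$2")"   # GNU stat: inode number and octal mode
    cat "$1" > "$2" || return 1
    after="$(stat -c '%i %a' "$2")"
    [ "$before" = "$after" ]
}
```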
When the temporary file and the target file are on different filesystems, this is just as reliable as cp -f. If the file is very large, or your filesystem is very slow, the operation might get interrupted mid-copy, leaving you with only partial contents in the file.
If that is too risky, and you want to be really, REALLY sure you will never be left with broken data, you can change the script a little bit:
Code:
#!/bin/bash
# Loop over each file supplied on the command line.
while [ $# -gt 0 ]; do
    file="$1"
    shift 1
    # Make sure descriptor 4 is closed when this script exits.
    trap "exec 4<&-" EXIT
    # Redirect descriptor 4 from the current file.
    exec 4<"$file" || exit $?
    # Read the first line, keeping whitespace and backslashes intact.
    IFS= read -u 4 -r first
    # Does the first line NOT contain (glob match) ".doc "?
    if [ "$first" = "${first/.doc /}" ]; then
        echo "$file: No match on first line. Skipped." >&2
        exec 4<&-
        trap "" EXIT
        continue
    fi
    # Create a temporary directory in the target directory.
    temp="$file.$$-$RANDOM$RANDOM"
    while ! mkdir -m 0700 "$temp" &>/dev/null ; do
        # mkdir failed, but not because the name was taken: give up.
        [ -e "$temp" ] || exit 1
        temp="$file.$$-$RANDOM$RANDOM"
    done
    # If the script exits, remove the temporary directory.
    trap "rm -rf '$temp' ; exec 4<&-" EXIT
    # Copy the rest of the file into temporary file,
    tempfile="$temp/$RANDOM$RANDOM"
    cat <&4 >"$tempfile" || exit $?
    # and append the first line to it.
    printf '%s\n' "$first" >>"$tempfile" || exit $?
    # Close descriptor 4.
    exec 4<&-
    # (Try to) copy owner and mode from original file.
    chown --reference="$file" "$tempfile" &>/dev/null
    chmod --reference="$file" "$tempfile" &>/dev/null
    # Replace the original: make "$file" a hardlink to the temporary file.
    ln -f "$tempfile" "$file" || exit $?
    echo "$file: Edited successfully." >&2
    # Remove the temporary directory, and cancel exit trap.
    rm -rf "$temp"
    trap "" EXIT
done
The difference here is that the temporary directory is created in the same directory as the target file, using a random name. (I use random names to make certain bait-and-switch link attacks very difficult; just consider it a bit paranoid.) This should make sure that the temporary directory is on the same filesystem as the target file.
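Incidentally, mktemp can do the random naming for you if you give it a template that includes the target directory. A sketch, assuming a mktemp that accepts a path in the template (GNU and BSD ones do); make_sibling_tempdir is a made-up name. mktemp creates the directory accessible only by its owner and never reuses an existing name, so no retry loop is needed.

```bash
#!/bin/bash
# Made-up helper: create a private temporary directory next to file $1
# and print its path.
make_sibling_tempdir() {
    local dir
    dir="$(dirname -- "$1")"
    # The trailing X's are replaced with random characters.
    mktemp -d "$dir/.${1##*/}.XXXXXXXXXX"
}
```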
As you can see, this time the replacement is done using the ln command. It creates a hardlink. (With the -f flag, it will replace an existing file with a hardlink if necessary.) If you are unfamiliar with hardlinks, they're the "name" part of a file; any number of hardlinks can point to the same file content and metadata (ownership, access mode, timestamps), and they're all equal "names" for the file. (There is no "original" name, either; just the initial hardlink. When the last hardlink to a file is removed, and the file is no longer open in any process, it is removed from storage.)
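A quick way to see hardlinks in action (a throwaway sketch; hardlink_demo is just a name for it): the second name sees changes written through the first, and the data survives removing the first name.

```bash
#!/bin/bash
# Throwaway demo: two names for the same file content.
hardlink_demo() {
    local d
    d="$(mktemp -d)" || return 1
    printf 'hello\n' > "$d/first"
    ln "$d/first" "$d/second"        # second name, same inode
    printf 'changed\n' > "$d/first"  # write through the first name
    rm "$d/first"                    # remove the "initial" name
    cat "$d/second"                  # the content is still there: "changed"
    rm -rf "$d"
}
```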
This means that if no other process is editing the file at the same time, and your storage device does not break, you will always get either the old or new contents, never anything in between.
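One caveat, as far as I know: POSIX describes ln -f as unlinking the existing destination and then creating the new link, so there may be a brief moment when the target name does not exist at all. Some ln implementations avoid this, but if you want the swap itself to be atomic regardless, rename(2) is the tool: mv uses it when source and destination are on the same filesystem, and POSIX requires rename() to replace the destination atomically. The sketch below (atomic_replace is a hypothetical helper) could take the place of the chown/chmod/ln lines; it assumes GNU chown/chmod for --reference and, for flushing a single file, a GNU sync from coreutils 8.24 or later.

```bash
#!/bin/bash
# Hypothetical helper: atomically replace file $2 with file $1.
# Assumes both are on the same filesystem (e.g. $1 lives in a temporary
# directory next to $2); otherwise mv copies and the swap is not atomic.
atomic_replace() {
    local new="$1" target="$2"
    # Best effort: copy owner and mode from the old file (GNU --reference).
    chown --reference="$target" "$new" 2>/dev/null
    chmod --reference="$target" "$new" 2>/dev/null
    # Flush the new data to disk before the swap (GNU sync with a file argument).
    sync -- "$new"
    # rename(2): other processes see either the old or the new file, never a mix.
    mv -f -- "$new" "$target"
}
```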
For security enthusiasts who have read this far: if you were to take a write lease on the original file, you could make other processes wait before they can open it for reading or writing. (Unfortunately, Bash does not have a built-in for this; it's too Linux-specific.) Any delayed reader or writer will then still see the old file contents, unless you overwrite the contents first. If you also take a write lease on the temporary file, you can do the editing safely and securely, knowing that nobody else has opened either one without your knowledge. The maximum number of seconds a lease can delay access is defined in /proc/sys/fs/lease-break-time, and it's 45 seconds by default; certainly enough for most purposes. Because closing the file descriptor will release the lease, you'll need to call fdatasync() or fsync() explicitly to make sure the contents hit the disk. After that, you can do the hard link switcharoo, and only after that close the file descriptors. It might be interesting to write the skeleton for such a utility, but it would be Linux-only.