How about
Code:
awk -v c=20 -v n=9 'BEGIN {
    RS = "(\n|\r|\r\n|\n\r)"; FS = "[\n\r]"; RT = "\n";
    sp = " ";
    while (length(sp) < n) sp = sp sp;
    sp = substr(sp, 1, n);
}
{
    s = $0;
    i = match(substr(s, c, n), /[A-Za-z]/);
    if (i > 0) s = substr(s, 1, c+i-2) substr(sp, 1, n-i+1) substr(s, c+n);
    printf("%s%s", s, RT);
}' input-file > output-file
I added the semicolons so you can cram the whole thing onto a single line if you want.
On the first line of the command, c is the first column of the desired field (counting the first column as column 1), and n is the number of characters in the field. If your file contains non-ASCII characters, you need to use a matching locale: set the LANG and LC_ALL environment variables accordingly. GNU awk, at least, will then count characters rather than bytes.
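To see which characters c and n actually select, you can print just that slice of each line. This is only a sketch; show_field is a hypothetical helper name. With c=20 and n=9 on the sample data further down, the slice covers the eight-character third column plus the space after it:

```shell
# show_field (hypothetical helper): print characters c .. c+n-1 of each line,
# wrapped in brackets so that trailing spaces are visible.
show_field() {
    awk -v c="$1" -v n="$2" '{ print "[" substr($0, c, n) "]" }'
}

echo '233450213 20111230 90354B32 101011 2A2' | show_field 20 9
# prints [90354B32 ]
```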
The BEGIN rule sets the record separator RS to a regular expression that matches any newline convention (\n, \r, \r\n or \n\r). It sets the field separator FS to a newline or carriage return; since neither can appear inside a record, awk will not split the records into fields. GNU awk (gawk) sets RT to the text that matched the record separator for each record, and the snippet uses it to retain whatever newlines you use. Other awk variants do not set RT, so the BEGIN rule presets it to a UNIX newline: they will work too, but will use \n newlines in the output.
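Besides the separators, the BEGIN rule builds sp, a string of exactly n spaces used later as padding: it doubles a single space until the string is long enough, then trims it, which takes only about log2(n) concatenations. Isolated into a hypothetical make_pad helper, the trick looks like this:

```shell
# make_pad (hypothetical helper): build n spaces the way the BEGIN rule does
# and print them in brackets, followed by the resulting length.
make_pad() {
    awk -v n="$1" 'BEGIN {
        sp = " ";
        while (length(sp) < n) sp = sp sp;   # 1, 2, 4, 8, 16 ... spaces
        sp = substr(sp, 1, n);               # trim to exactly n characters
        printf("[%s] %d\n", sp, length(sp));
    }'
}

make_pad 9
# prints [         ] 9
```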
The main logic is in the default rule.
s is set to the complete record. This is an optimization: assigning to $0 directly would make awk re-split the record into fields every time, wasting CPU time.
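The re-splitting is easy to observe with awk's default field separator: after a new string is assigned to $0, NF reflects the new record. resplit_demo is a hypothetical name used only for this sketch.

```shell
# resplit_demo (hypothetical helper): print NF before and after assigning to $0.
resplit_demo() {
    awk '{
        before = NF;
        $0 = "x y";          # assigning to $0 makes awk split the record again
        print before, NF;
    }'
}

echo 'a b c' | resplit_demo
# prints 3 2
```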
i will contain the index of the first letter within the field, or 0 if the field does not contain letters.
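For example, on the first two sample lines shown further down, match() returns 0 for the field without letters and 6 for 90354B32, where B is the sixth character of the field. first_letter is a hypothetical helper wrapping just that call:

```shell
# first_letter (hypothetical helper): position of the first letter inside
# characters c .. c+n-1 of each line, or 0 when there is none.
first_letter() {
    awk -v c="$1" -v n="$2" '{ print match(substr($0, c, n), /[A-Za-z]/) }'
}

printf '%s\n' '233450212 20111230 90354332 101010 2A1' \
              '233450213 20111230 90354B32 101011 2A2' | first_letter 20 9
# prints 0, then 6
```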
If your input may contain non-ASCII letters, you might wish to use a different pattern, for example /[^0-9]/ to look for any non-digit, or /[^-+0-9 ]/ to accept digits, space, plus (+) and minus (-), but nothing else. In principle, it is always better to check that the string contains only acceptable characters than to check for unacceptable ones. You can always miss some, after all.
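To compare the three patterns, the sketch below prints where each one first matches in a string (compare_patterns is a hypothetical helper). Note that /[^0-9]/ also matches a leading sign, so a negative number would be blanked from its minus sign onward, while /[^-+0-9 ]/ leaves it alone. On the sample data, /[^0-9]/ also matches the separator space that n=9 includes at the end of the field, but replacing a space with a space changes nothing.

```shell
# compare_patterns (hypothetical helper): print the match() position of
# /[A-Za-z]/, /[^0-9]/ and /[^-+0-9 ]/ in each input line (0 = no match).
compare_patterns() {
    awk '{ print match($0, /[A-Za-z]/), match($0, /[^0-9]/), match($0, /[^-+0-9 ]/) }'
}

printf '%s\n' '-9035433' '+9035 x' | compare_patterns
# prints 0 1 0, then 7 1 7
```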
If the field contains a letter, the entire record is reconstructed. The first substr() keeps everything before the field plus the part of the field before the match, the second substr() adds the number of spaces needed to blank out the rest of the field, and the third keeps everything after the field.
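Printing the three pieces in brackets for the second sample line makes the arithmetic visible: the letter B sits at column c+i-1 = 25, so the first piece ends at column c+i-2 = 24, the padding is n-i+1 = 4 spaces, and the rest starts at column c+n = 29. show_pieces is a hypothetical helper for this sketch:

```shell
# show_pieces (hypothetical helper): print the three substr() pieces used to
# rebuild a record, each in brackets, for lines whose field contains a letter.
show_pieces() {
    awk -v c="$1" -v n="$2" 'BEGIN {
        sp = " "; while (length(sp) < n) sp = sp sp; sp = substr(sp, 1, n);
    }
    {
        i = match(substr($0, c, n), /[A-Za-z]/);
        if (i > 0)
            printf("[%s][%s][%s]\n", substr($0, 1, c+i-2), substr(sp, 1, n-i+1), substr($0, c+n));
    }'
}

echo '233450213 20111230 90354B32 101011 2A2' | show_pieces 20 9
# prints [233450213 20111230 90354][    ][101011 2A2]
```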
Given this input,
Code:
233450212 20111230 90354332 101010 2A1
233450213 20111230 90354B32 101011 2A2
233450214 20111231 9035433A 101012 2A3
233450215 20111231 90354331 101013 2A4
the command above will yield
Code:
233450212 20111230 90354332 101010 2A1
233450213 20111230 90354    101011 2A2
233450214 20111231 9035433  101012 2A3
233450215 20111231 90354331 101013 2A4
The command does not rely on spaces or field separators, only on c and n.
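Because only c and n matter, the same main rule also works on fixed-width data without any separators. The sketch below is a hypothetical blank_field wrapper that keeps the default rule but drops the RS/FS/RT settings, so it also runs in awks without regular-expression RS support; as an assumption of this sketch, it always writes \n newlines, and with CRLF input the \r stays at the end of each record.

```shell
# blank_field (hypothetical helper): blank out characters c .. c+n-1 from the
# first letter onward. Portable variant: default RS, output always uses "\n".
blank_field() {
    awk -v c="$1" -v n="$2" 'BEGIN {
        sp = " "; while (length(sp) < n) sp = sp sp; sp = substr(sp, 1, n);
    }
    {
        s = $0;
        i = match(substr(s, c, n), /[A-Za-z]/);
        if (i > 0) s = substr(s, 1, c+i-2) substr(sp, 1, n-i+1) substr(s, c+n);
        print s;
    }'
}

printf '%s\n' '000112A345999' | blank_field 5 6
# prints 000112    999
```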
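If you use GNU awk (gawk), you'll retain the newline convention. Any newline convention is accepted in the input by awk variants that treat a multi-character RS as a regular expression, such as gawk and mawk (POSIX leaves a multi-character RS unspecified, so some older awks may not handle it), but those other variants will convert the newlines to UNIX newlines ("\n") in the output.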
Hope this helps,