Quote:
Originally Posted by csegau
Hi all,
i am finding it difficult to handle multiline pattern matching. problem is like this.
csegau, as I understand it, your files have a structure similar to this:
Code:
# Name      Age
1 Alice     25
2 Bob       26
3 Carol     25
4 ThePersonT26
  hatHasAVer
  yLongName
5 Dave      26
and you are having trouble with, for example, finding all names that contain "Very". Am I correct?
If the first field is empty on every continuation line, this is quite easy to solve using awk. GNU awk 2.13 and later have a facility (the FIELDWIDTHS variable) that makes fixed-width fields much easier, but it is not too hard without it. The following should work in any awk that accepts a regular expression as RS (gawk and mawk do; a strictly POSIX awk may not):
Code:
awk 'BEGIN { RS="[\t\n\v\f\r ]*[\r\n]+"   # strips trailing whitespace, skips empty lines
             FS="\n"                       # keep awk from splitting the line itself
             OFS="\t"
             # Start column and width of each fixed-width field.
             col[1] =  1; len[1] =  2
             col[2] =  3; len[2] = 10
             col[3] = 13; len[3] =  2
             cols = 3
             row = 1
           }
           { if (substr($0, col[1], len[1]) ~ /^[\t ]*$/)   # continuation line: append
                 for (i = 1; i <= cols; i++)
                     field[i] = field[i] substr($0, col[i], len[i])
             else {                                         # a new record starts here
                 row = NR
                 for (i = 1; i <= cols; i++)
                     field[i] = substr($0, col[i], len[i])
             }
             # Rebuild $0 from the logical fields collected so far.
             NF = cols
             for (i = 1; i <= cols; i++) $i = field[i]
           }
           # Now you can use $1 .. $cols (or field[1] to field[cols]).
           # The starting row is in the variable row.
           $2 ~ /Very/ { print $0 }
          '
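The second-to-last line checks whether the second logical field contains Very and, if so, prints the entire record with tabs between the fields (since OFS is a tab). Note that this test runs after every physical line, so a record is printed as soon as the name collected so far matches, even if more continuation lines follow.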
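For comparison, here is a rough sketch of the same search using the GNU awk facility mentioned earlier, which I take to be the FIELDWIDTHS variable. It needs gawk, the helper name find_name and the sample variable are made up for illustration, and the widths 2, 10 and 2 are the ones assumed in the script above. Unlike that script, it only tests a record once all of its continuation lines have been collected.

```shell
# Sample data in the fixed-width layout assumed above (2, 10 and 2 characters).
sample='# Name      Age
1 Alice     25
2 Bob       26
3 Carol     25
4 ThePersonT26
  hatHasAVer
  yLongName
5 Dave      26'

# find_name is a hypothetical helper: it prints the complete logical records
# whose name column matches the regular expression given as its first argument.
find_name() {
    gawk -v pat="$1" '
    BEGIN { FIELDWIDTHS = "2 10 2"; OFS = "\t"; cols = 3 }

    # Test the logical record collected so far and print it if the name matches.
    function check(   i, out) {
        if (!have || f[2] !~ pat) return
        for (i = 1; i <= cols; i++) {
            gsub(/^ +| +$/, "", f[i])            # trim column padding
            out = (i > 1 ? out OFS : "") f[i]
        }
        print out
    }

    $1 ~ /^ *$/ {                                # empty first column: continuation
        for (i = 1; i <= cols; i++) f[i] = f[i] $i
        next
    }
    {                                            # first line of a new record
        check(); have = 1
        for (i = 1; i <= cols; i++) f[i] = $i
    }
    END { check() }
    '
}
```

With the sample above, `printf '%s\n' "$sample" | find_name Very` should print record 4 with its name merged into one tab-separated line.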
Another alternative is to reconstruct the data first, one merged record per line, using e.g. tabs (\t) or pipes (|) as the field separator (set OFS accordingly):
Code:
awk '
BEGIN { RS="[\t\n\v\f\r ]*[\r\n]+"   # strips trailing whitespace, skips empty lines
        FS="\n"                       # keep awk from splitting the line itself
        OFS="\t"
        # Start column and width of each fixed-width field.
        col[1] =  1; len[1] =  2
        col[2] =  3; len[2] = 10
        col[3] = 13; len[3] =  2
        cols = 3
        row = 0
      }
      { if (substr($0, col[1], len[1]) ~ /^[\t ]*$/)   # continuation line: append
            for (i = 1; i <= cols; i++)
                field[i] = field[i] substr($0, col[i], len[i])
        else {                                         # a new record starts here
            if (row) {                                 # print the previous record, if any
                printf("%s", field[1])
                for (i = 2; i <= cols; i++)
                    printf("%s%s", OFS, field[i])
                printf("\n")
            }
            row = NR
            for (i = 1; i <= cols; i++)
                field[i] = substr($0, col[i], len[i])
        }
      }
END   { if (row) {                                     # print the last record
            printf("%s", field[1])
            for (i = 2; i <= cols; i++)
                printf("%s%s", OFS, field[i])
            printf("\n")
        }
      }'
Since the latter script will merge all split fields, you can use grep or sed on the output.
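As a rough sketch of that last step, the merging can be wrapped in a shell function and its output piped into grep or sed. This is a condensed variant, not the exact script above: it uses the default RS and trims padding with sub() and gsub() instead of relying on a regular-expression RS. The names merge_records and sample are made up for illustration, and the column layout is the one assumed above.

```shell
# Sample data in the fixed-width layout assumed above (2, 10 and 2 characters).
sample='# Name      Age
1 Alice     25
2 Bob       26
3 Carol     25
4 ThePersonT26
  hatHasAVer
  yLongName
5 Dave      26'

# merge_records is a hypothetical helper: it joins continuation lines and
# prints one tab-separated logical record per line.
merge_records() {
    awk '
    BEGIN { OFS = "\t"; cols = 3
            col[1] =  1; len[1] =  2
            col[2] =  3; len[2] = 10
            col[3] = 13; len[3] =  2 }

    # Print the logical record collected so far, without column padding.
    function emit(   i, out) {
        if (!row) return
        for (i = 1; i <= cols; i++) {
            gsub(/^[ \t]+|[ \t]+$/, "", field[i])
            out = (i > 1 ? out OFS : "") field[i]
        }
        print out
    }

    { sub(/[ \t\r]+$/, "") }                     # drop trailing whitespace and CR
    $0 == "" { next }                            # skip empty lines
    substr($0, col[1], len[1]) ~ /^[ \t]*$/ {    # continuation line: append
        for (i = 1; i <= cols; i++)
            field[i] = field[i] substr($0, col[i], len[i])
        next
    }
    {                                            # first line of a new record
        emit(); row = NR
        for (i = 1; i <= cols; i++)
            field[i] = substr($0, col[i], len[i])
    }
    END { emit() }
    '
}

# Ordinary line-based tools now see each record on a single line.
printf '%s\n' "$sample" | merge_records | grep 'Very'
printf '%s\n' "$sample" | merge_records | sed -n '/26$/p'
```

The grep call should print only record 4, with the name merged back together; the sed call prints every record whose last field is 26.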
Hope this helps.