Sed one-liner to drop data from beginning of file?
Hello, I'm trying to drop all data from the beginning of a file up to the first occurrence of a specific opening XML tag. I need this operation to run as fast as possible since it will be used on huge files (several GB) that, for the most part, don't have any newlines in them.
This is the best I can come up with so far, and it doesn't quite work:
Code:
sed '1,/<foo / s/^.*<foo /<foo /'
When I run it on this file:
--- asdga sdf asdf asf a garbage garbage</foo><foo xmlns=...</foo><foo ...></foo> ---
I get:
--- asdga sdf asdf asf a <foo ...></foo> ---
So it doesn't remove the garbage at the beginning, and it also removes too many 'foo' tags because of the greedy pattern match. How can I get sed to match *everything* up to a token, not just line by line? Or, alternatively, is there some other command I can use that would still run as fast? Thanks.
With the limited amount of input data, the following seems to work:
Code:
sed -n -e '/<foo /,/<foo / s/^.*<foo /<foo /' -e '/<foo /,$ p' foo
Forrest
Csplit may be faster, however. I haven't checked.
Thanks forrestt, but that throws away every foo except one. I need to preserve all the foo tags I can (basically I just have to drop some malformed xml from the beginning of the file, and then keep processing from the first opening tag).
Thanks for the help though.
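One way to read the requirement above (drop everything before the first opening tag, keep every byte after it) is as a byte-offset problem rather than a regex substitution. The following is only a sketch under assumptions: `drop_before_tag` is a hypothetical helper name, it needs a `grep` that supports `-a`, `-b`, `-o` and `-F`, and it has not been benchmarked on multi-GB input. Note that grep still buffers whole lines, so on a file with almost no newlines the search step may be slow or memory-hungry; the `tail -c` copy itself is a plain byte copy.

```shell
# Hypothetical helper: print FILE starting at the first occurrence of TAG.
# Usage: drop_before_tag FILE TAG
drop_before_tag() {
    file=$1
    tag=$2
    # 0-based byte offset of the first match:
    #   -a  treat the input as text even if it looks binary
    #   -b  print the byte offset of each match
    #   -o  report each match on its own line
    #   -F  take TAG as a literal string, not a regex
    offset=$(grep -a -b -o -F -- "$tag" "$file" | head -n 1 | cut -d: -f1)
    # No match: print nothing and signal failure.
    [ -n "$offset" ] || return 1
    # tail -c +N starts at the N-th byte (1-based), hence offset + 1.
    tail -c +"$((offset + 1))" "$file"
}
```

For example, `drop_before_tag foo.xml '<foo ' > cleaned.xml` would write everything from the first `<foo ` onward, leaving all later foo tags untouched (`cleaned.xml` is just an illustrative output name).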
Moin,
I don't know whether my "solution" ;-) fits your needs; it's a bit odd. sed has no non-greedy quantifier; for that you need Perl-compatible regular expressions (which is why you should check whether you can do the job in Perl). I went another way:
Code:
sed -r '1,/<foo /{s/$/|/;s/(<foo )/|\1/;s/[^|]*\|//;/^\|*$/d;s/\|//}' foo.xml
Code:
jan@jack:~/tmp> sed -r '1,/<foo /{ # start a code block
Jan
Code:
awk '/<\/foo>/{
Moin,
Quote:
Code:
awk ' BEGIN { found = 0; }