Get strings distributed along up to 3 lines
Hello to all in forum,
Please help. I don't know whether this is a job for awk, sed, perl, etc. Given the following text, I want to extract two patterns and print related patterns on the same line:
Code:
pattern1: bc[0-9]d
If the next pattern found after a pattern1 is pattern2, they are related and should be printed on the same line. If two consecutive pattern1 matches are found (across 2 or 3 lines, or on the same line), it means the previous pattern1 has no pattern2. Input: Code:
abc1defghi Code:
bc1d ijk3lmnopqrs reads line by line and, as you can see, a pattern can begin on one line and end on the next, or even begin on one line and end two lines below. The goal is to learn how to do it for this sample file and then extend it to a big file. Thanks in advance for any help. |
If your text is not divided by newlines you could use grep:
Code:
grep -o -e 'bc[0-9]d' -e 'jk[0-9]lmnopqrs' file
Code:
bc1d |
Hello konsolebox,
Thanks for the answer. The file doesn't have blank lines, but it has newline characters like any standard file. I'm trying this in Cygwin, but I only get this result:
Code:
$ grep -o -e 'bc[0-9]d' -e 'jk[0-9]lmnopqrs' file |
You can use this C code to convert your file:
Code:
#include <unistd.h>
Code:
./output_binary < file | grep -o -e 'bc[0-9]d' -e 'jk[0-9]lmnopqrs' |
Quote:
Code:
paste -s -d"\0" <$InFile2 \
Code:
bc1d jk3lmnopqrs |
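For readers following along, here is a minimal sketch of the whole job as one pipeline: join the lines, pull out both patterns with grep -o, then pair each pattern1 with the pattern2 that follows it. The function name pair_patterns and the sample text are made up for illustration, and the second pattern is taken from the grep command earlier in the thread. Deleting \r as well as \n is an assumption, in case the file has Windows line endings under Cygwin.

```shell
# Hypothetical helper: reads text on stdin, prints one line per pattern1,
# followed by its pattern2 when there is one.
pair_patterns() {
    tr -d '\r\n' |                                    # join all lines into one
    grep -o -e 'bc[0-9]d' -e 'jk[0-9]lmnopqrs' |      # one match per output line
    awk '
        /^bc/ {                       # a new pattern1
            if (held != "") print held   # previous pattern1 had no pattern2
            held = $0
            next
        }
        {                             # a pattern2: pair it with the held pattern1
            print held, $0
            held = ""
        }
        END { if (held != "") print held }
    '
}

# Made-up sample: bc4d is split over two lines, jk5lmnopqrs over three.
printf 'abc1defghi\njk3lmnopqrs\ntuvab\nc4defghi\nabc8defghij\nk5lmnop\nqrstu\n' | pair_patterns
```

For this sample it should print bc1d jk3lmnopqrs, then bc4d alone, then bc8d jk5lmnopqrs. Note that tr turns the input into one very long line, which may be a problem for a multi-gigabyte file.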
Hello konsolebox and Daniel,
I'll try your code ASAP. The original input file is a dump of a binary file made with the xxd command; it produces a 4 GB file with 256 characters per line. Do you think I could use the same code with this large file? Or is there a way to apply the regex for the patterns directly to the binary? Thanks again for the help. |
Hello konsolebox and Daniel,
I have an issue extracting patterns when they are on the same line. I want to extract the pattern c + number + some characters + k + number + 7 characters (in blue below):
Code:
abc1defghijk3lyyuopqtstuvwxyzzabc4defghijklmnopuqrstuvwxxyzabc8defghijk5lmnopqrstuvwxyzwwww
Code:
$ echo "abc1defghijk3lmnopqrstuvwxyzzabc4defghijklmnopuqrstuvwxxyzabc8defghijk5lmnopqrstuvwxyzwwww" | grep -o -e 'c[0-9].*k[0-9].\{7\}'
Thanks in advance for your help. |
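One possible way around the over-long match, sketched here for illustration: grep's default and extended syntaxes have no lazy quantifier, so `.*` always runs to the last k + digit on the line. Replacing `.*` with a negated character class keeps each match inside one c...k block. This assumes the characters between the c and the k never contain another c or k, which holds for the echo sample but may not hold for real data. The function name extract_pairs is hypothetical.

```shell
line='abc1defghijk3lmnopqrstuvwxyzzabc4defghijklmnopuqrstuvwxxyzabc8defghijk5lmnopqrstuvwxyzwwww'

# Hypothetical helper: c + digit, then anything except c or k,
# then k + digit + exactly 7 characters.
extract_pairs() {
    grep -oE 'c[0-9][^ck]*k[0-9].{7}'
}

printf '%s\n' "$line" | extract_pairs
```

On the echo sample this should print c1defghijk3lmnopqr and c8defghijk5lmnopqr on separate lines. The c4 block is skipped because no k + digit follows it before the next c.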
I would try to use the string bc as the line separator (instead of newline), next remove all the newlines, and finally print matching lines using a regexp like ^[0-9]d.*jk[0-9]lmnopqrs. You can use awk or perl to implement it. |
Hello Pan64,
Could you please help me with how to do it in awk or perl? As explained in the first post, I need two patterns. Pattern1 always occurs and pattern2 does not, but both can span more than one or two lines, with an input of 128 bytes per line (xxd was used for the dump). Thanks for any help. |
@Perseus Have you tried my solution? How did it go? What needed to be changed?
|
Something like this:
\n? is there because a newline can be found almost anywhere.
Code:
awk 'BEGIN { RS="b\n?c"; } # set record separator |
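A fuller sketch of the record-separator idea, written for illustration and not the code posted above. It assumes an awk that accepts a multi-character RS (gawk and mawk do; POSIX leaves this unspecified), and it removes the newlines with tr first instead of allowing for them inside RS. The function name rs_extract and the sample text are hypothetical.

```shell
# Hypothetical helper: reads text on stdin, prints one line per pattern1,
# followed by its pattern2 when the same record contains one.
rs_extract() {
    tr -d '\r\n' |
    awk '
        BEGIN { RS = "bc" }            # every record starts right after a "bc"
        NR > 1 && /^[0-9]d/ {          # record 1 is the text before the first "bc"
            p1 = "bc" substr($0, 1, 2) # rebuild pattern1, e.g. bc1d
            if (match($0, /jk[0-9]lmnopqrs/))
                print p1, substr($0, RSTART, RLENGTH)
            else
                print p1               # no pattern2 before the next pattern1
        }
    '
}

# Made-up sample: bc4d is split over two lines, jk5lmnopqrs over three.
printf 'abc1defghi\njk3lmnopqrs\ntuvab\nc4defghi\nabc8defghij\nk5lmnop\nqrstu\n' | rs_extract
```

Because awk reads one bc-delimited record at a time, this should not need to hold the whole joined text in memory, which may matter for the 4 GB dump mentioned earlier.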
Quote:
Code:
awk -F "" 'BEGIN {RS="c"} |
Hello to all
Many thanks for your help and time. Sure, I've tried everyone's code, but when I try to replicate it on a real file with grep or awk, the regex does not seem to work for pattern-2. I want to extract these patterns:
pattern-1: ff77 + 6 to 18 characters + 532064 + 10 characters + 814 + 13 characters
pattern-2: 059 + 32 to 34 characters + some characters + 940e + 28 characters
For pattern-1 the regex I'm using works, but for pattern-2 it takes more characters than I want.
Regex used for pattern-1: ff77.{6,18}532064.{10}814.{13} --> it works
Regex for pattern-2: 059.{32,34}.*940e.\{28\} --> it takes characters belonging to more than one pattern-2.
The end of pattern-2 is always followed by 9506. The regex I have now for pattern-2 takes all the characters in red.
Code:
93114444444c55535f529332939333303693303032353807ffffffffffffffff77000001532064022272619f81422060001fffff0015000a4800015a00074200
Code:
93114444444c55535f529332939333303693303032353807ffffffffffffffff77000001532064022272619f81422060001fffff0015000a4800015a00074200 |
Yes, I think this is the greediness of the regexp. You need to set ff77 as the record separator to avoid such problems.
|
Hi.
If you use grep or perl, you may use the non-greedy regex `.*?', like this:
Code:
$ tr -d '\n' <infile | grep -Po '059.{32,34}.*?940e.{28}' |
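To see the difference between the greedy and the non-greedy form, here is a small sketch on made-up data. The sample string and the shortened counts ({2} instead of {32,34} and {28}) are assumptions chosen only to keep the example readable, and the function names greedy and lazy are hypothetical. It requires a grep built with -P support (GNU grep usually is). Separately, the pattern-2 regex quoted earlier mixes two brace styles, {32,34} and \{28\}; under -E or -P the escaped form matches literal braces, and under plain grep the unescaped form does, so that may also contribute to the unexpected results.

```shell
# Made-up sample: two pattern-2 blocks, each followed by 9506.
sample='xx059AAmm940eBB9506yy059CCnn940eDD9506zz'

# Greedy: .* runs to the LAST 940e on the line.
greedy() { grep -Po '059.{2}.*940e.{2}'; }

# Non-greedy: .*? stops at the FIRST 940e after each 059.
lazy() { grep -Po '059.{2}.*?940e.{2}'; }

printf '%s\n' "$sample" | greedy
printf '%s\n' "$sample" | lazy
```

The greedy form should print one long match, 059AAmm940eBB9506yy059CCnn940eDD, while the non-greedy form should print 059AAmm940eBB and 059CCnn940eDD on separate lines.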