LinuxQuestions.org
05-12-2011, 05:43 AM   #1
csegau
LQ Newbie
 
Registered: Oct 2009
Posts: 27

Rep: Reputation: 0
multiline pattern matching


Hi all,

I am finding it difficult to handle multiline pattern matching. The problem is this.

I have a formatted text file in which each column is 'X' characters wide. When the text in a column exceeds 'X' characters, the remaining characters are placed on the next line. The same happens with the other columns of the file.

Now I have to search for a string that is longer than 'X' characters.

The problem is that my search string is on a single line, but the text file contains it split across multiple lines because of the column width.

How do I search for it?

Thanks in advance
 
05-12-2011, 06:58 AM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191
I know what you have written is probably clear to you, but I am way lost. How about you show an actual example of the input and the desired output?

Also, what have you tried?
 
05-13-2011, 02:06 PM   #3
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Quote:
Originally Posted by csegau
Hi all,

I am finding it difficult to handle multiline pattern matching. The problem is this.
csegau, as I understand it, your files have a structure similar to this:
Code:
# Name      Age
1 Alice     25
2 Bob       26
3 Carol     25
4 ThePersonT26
  hatHasAVer
  yLongName
5 Dave      26
and you're having trouble, for example, finding all names that contain "Very". Am I correct?

If the first field is empty for all continuation lines, then this is quite easy to solve using awk. GNU awk versions 2.13 and later have a facility (the FIELDWIDTHS variable) that makes this much easier, but it's not too hard without it. The script below should work with any awk that accepts a regular expression as RS, such as gawk or mawk:
Code:
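awk 'BEGIN    { RS="[\t\n\v\f\r ]*[\r\n]+"   # record = one line, trailing whitespace dropped
                FS="\n"                      # records hold no newline, so no field splitting
                OFS="\t"                     # tab between fields on output
                col[1] = 1;  len[1] = 2      # start column and width of field 1
                col[2] = 3;  len[2] = 10     # ... of field 2
                col[3] = 13; len[3] = 2      # ... of field 3
                cols   = 3                   # number of fields
                row    = 1
              }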
              { if (substr($0, col[1], len[1]) ~ /^[\t ]*$/)
                    for (i = 1; i <= cols; i++)
                        field[i] = field[i] substr($0, col[i], len[i])
                else {
                    row = NR
                    for (i = 1; i <= cols; i++)
                        field[i] = substr($0, col[i], len[i])
                }

                NF = cols
                for (i = 1; i <= cols; i++) $i = field[i]
              }

     # Now you can use $1 .. $cols (or field[1] to field[cols]).
     # The starting row is in variable 'row'.

     $2 ~ /Very/ { print $0 }
    '
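The second to last line checks if the second logical field contains Very, and if so, prints the entire record with tabs between the fields (since OFS is a tab). Note that the test runs after every physical line, so a pattern that already matches before the last continuation line (for example /Person/ in the data above) prints the record once per line, each time with the field as merged so far. The fields also keep their trailing padding.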
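For comparison, here is a minimal sketch of the GNU awk route, assuming the facility in question is the FIELDWIDTHS variable and that gawk is installed. The file name people.txt and the helper names trim and show are made up for illustration. Unlike the script above, it tests a record only once all of its continuation lines have been merged, and it strips the trailing padding from each piece.

```shell
#!/bin/sh
# Sample table from earlier in this post; people.txt is a hypothetical name.
cat > people.txt <<'EOF'
# Name      Age
1 Alice     25
2 Bob       26
3 Carol     25
4 ThePersonT26
  hatHasAVer
  yLongName
5 Dave      26
EOF

if command -v gawk >/dev/null 2>&1; then
    # FIELDWIDTHS is a gawk extension: fields are cut at fixed widths 2, 10, 2.
    result=$(gawk '
        # Remove trailing blanks (and a stray CR) from one piece of a field.
        function trim(s) { sub(/[ \t\r]+$/, "", s); return s }
        # Print the pending logical record if its merged name matches.
        function show()  { if (have && name ~ /Very/) print id, name, age }
        BEGIN { FIELDWIDTHS = "2 10 2"; OFS = "\t" }
        # Blank first field: continuation line, append to the pending record.
        $1 ~ /^[ \t]*$/ { name = name trim($2); age = age trim($3); next }
        # Otherwise a new record starts; report the previous one first.
        { show(); have = 1; id = trim($1); name = trim($2); age = trim($3) }
        END { show() }
    ' people.txt)
    printf '%s\n' "$result"
else
    echo "gawk not found; this sketch needs GNU awk" >&2
fi
```

With the sample data this should print only record 4, with the name joined back into ThePersonThatHasAVeryLongName. The field names id, name and age are hard-coded for this three-column layout; a wider table would want arrays as in the script above.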

Another alternative is to reconstruct the data, using e.g. tabs \t or pipes | as the field separator:
Code:
awk '
BEGIN { RS="[\t\n\v\f\r ]*[\r\n]+"
        FS="\n"
        OFS="\t"
        col[1] = 1;  len[1] = 2
        col[2] = 3;  len[2] = 10
        col[3] = 13; len[3] = 2
        cols   = 3
        row    = 0
      }
      { if (substr($0, col[1], len[1]) ~ /^[\t ]*$/)
            for (i = 1; i <= cols; i++)
                field[i] = field[i] substr($0, col[i], len[i])
        else {
            if (row) {
                printf("%s", field[1])
                for (i = 2; i <= cols; i++)
                    printf("%s%s", OFS, field[i])
                printf("\n")
            }
            row = NR
            for (i = 1; i <= cols; i++)
                field[i] = substr($0, col[i], len[i])
        }
      }
END   { if (row) {
            printf("%s", field[1])
            for (i = 2; i <= cols; i++)
                printf("%s%s", OFS, field[i])
            printf("\n")
        }
      }'
Since the latter script will merge all split fields, you can use grep or sed on the output.
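As a rough illustration of that last step, the sketch below merges the sample table and pipes the result to grep. It uses a condensed variant of the merging script, not the exact one above: it keeps the same column layout but relies on the default RS (so it should not need a regex-capable RS) and strips trailing padding from each piece. The file name people.txt, the shell function merge_wrapped and the awk helpers emit and piece are all hypothetical names.

```shell
#!/bin/sh
# Sample table from earlier in this post; people.txt is a hypothetical name.
cat > people.txt <<'EOF'
# Name      Age
1 Alice     25
2 Bob       26
3 Carol     25
4 ThePersonT26
  hatHasAVer
  yLongName
5 Dave      26
EOF

# Merge wrapped lines into one tab-separated line per logical record.
merge_wrapped() {
    awk '
    # Print the pending logical record, if there is one.
    function emit(   i, line) {
        if (!started) return
        line = field[1]
        for (i = 2; i <= cols; i++) line = line OFS field[i]
        print line
    }
    # Cut column i out of the current line and drop its trailing padding.
    function piece(i,   s) {
        s = substr($0, col[i], len[i])
        sub(/[ \t]+$/, "", s)
        return s
    }
    BEGIN { OFS = "\t"; cols = 3
            col[1] = 1;  len[1] = 2
            col[2] = 3;  len[2] = 10
            col[3] = 13; len[3] = 2 }
    # Tolerate CRLF line endings.
    { sub(/\r$/, "") }
    # Blank first column: continuation line, append each piece.
    substr($0, col[1], len[1]) ~ /^[ \t]*$/ {
        for (i = 1; i <= cols; i++) field[i] = field[i] piece(i)
        next
    }
    # New record: flush the previous one, then start collecting.
    { emit(); started = 1
      for (i = 1; i <= cols; i++) field[i] = piece(i) }
    END { emit() }
    ' "$@"
}

merge_wrapped people.txt | grep 'Very'
```

On the sample data, grep should report only the merged line for record 4. Stripping the padding also removes a space that happens to fall at the end of a wrapped piece, so piece() would need adjusting if such spaces are significant.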

Hope this helps.

Last edited by Nominal Animal; 05-13-2011 at 02:07 PM.
 
  

