Help with awk

mierdatuti · 08-04-2015, 08:43 AM

Hi,
I need to write an awk script that does the following. I have a text file in this format:

Code:

xxxddd
kk example
lsdjfs

sdkjlf
jfdsjlkf

kk example2 
fsdlkfj
----
sdfkjldsf

kk example3
djfjff

I would like to make three files. Each file must contain the lines between two "kk" lines, like this:

file-> example

Code:

lsdjfs

sdkjlf
jfdsjlkf

file-> example2

Code:

fsdlkfj
----
sdfkjldsf

file-> example3

Code:

djfjff

I'm trying to parse every line with awk, but it doesn't work.

Code:

{
     if (keep == 0) {
          fnd=index($0,"kk");
          if (fnd) {
               #keep=1;
               if ($2) {
                    print $2;
                    var=$2;
                    print "creating file " $2>>$2
                    keep=1;
               }
          }else
          {
               keep=0
          }

     }
     if (keep == 1) {

          getline sig;
          print "next line ....." sig;
          fnd=index(sig,"kk");
          if (fnd) {
               keep=0;
          }else
               print "-------------------variable is ...." sig
          print sig >> var
     }
}

Could some guru help me, please?
Thanks

HMW · 08-04-2015, 09:25 AM

Well, if you HAVE to do this with awk, I am sure one of the 'awk gurus' (I am not a member of that specific clan) can help you out. But! If it were me, I would check out csplit, which ought to be able to do what you want.

Can't try it out for you right now since I am <rant>stuck behind one of those wondrous machines designed in Cupertino that ship with a version of csplit that lacks half of the functionality one would EXPECT!!!</rant>

Anyway, check out this thread:
http://www.linuxquestions.org/questi...es-4175546320/

What happens if you try this command:

Code:

csplit --suppress-matched infile.txt '/^kk/' {*}

Does it do the trick?

Best regards,
HMW

mierdatuti · 08-04-2015, 09:36 AM

Thanks. I would like to do this with awk, but meanwhile I tried your method:

Code:

csplit: unrecognized option '--suppress-matched'
Try `csplit --help' for more information.
[sp80439@oc6566503017 kk]$ csplit --help
Usage: csplit [OPTION]... FILE PATTERN...
Output pieces of FILE separated by PATTERN(s) to files `xx00', `xx01', ...,
and output byte counts of each piece to standard output.

Mandatory arguments to long options are mandatory for short options too.
  -b, --suffix-format=FORMAT  use sprintf FORMAT instead of %02d
  -f, --prefix=PREFIX        use PREFIX instead of `xx'
  -k, --keep-files           do not remove output files on errors
  -n, --digits=DIGITS        use specified number of digits instead of 2
  -s, --quiet, --silent      do not print counts of output file sizes
  -z, --elide-empty-files    remove empty output files
      --help     display this help and exit
      --version  output version information and exit

Read standard input if FILE is -.  Each PATTERN may be:

  INTEGER            copy up to but not including specified line number
  /REGEXP/[OFFSET]   copy up to but not including a matching line
  %REGEXP%[OFFSET]   skip to, but not including a matching line
  {INTEGER}          repeat the previous pattern specified number of times
  {*}                repeat the previous pattern as many times as possible

A line OFFSET is a required `+' or `-' followed by a positive integer.

grail · 08-04-2015, 09:47 AM

Code:

awk 'NR>1{print > file[2]}{split(RT,file)}' ORS="" RS="kk[^\n]*\n" file

HMW · 08-04-2015, 11:09 AM

Quote:

Originally Posted by mierdatuti

Thanks. I would like to do this with awk, but meanwhile I tried your method:

Code:

csplit: unrecognized option '--suppress-matched'
Try `csplit --help' for more information.

Strange, it works for me. My version of csplit (using Debian 8.1):

Code:

$ csplit --version
csplit (GNU coreutils) 8.23
Copyright © 2014 Free Software Foundation, Inc.

But, my approach with csplit:

Code:

csplit --suppress-matched lqcsplit.txt '/kk/' {*}

Produces four files:

Code:

$ cat xx00
xxxddd

Code:

$ cat xx01
lsdjfs

sdkjlf
jfdsjlkf

Code:

$ cat xx02
fsdlkfj
----
sdfkjldsf

Code:

$ cat xx03
djfjff

So it's close, but no cigar. Check out grail's awk instead.

Best regards,
HMW
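
For a csplit without --suppress-matched, one possible workaround is to let csplit keep the kk lines and then rename the pieces from their first line. This is only a sketch: it assumes GNU csplit (for -z and {*}, both listed in the help output earlier in the thread), and the sample file, the piece. prefix and the workdir variable are made up for illustration.

```shell
# Hypothetical sample input, copied from the first post.
workdir=$(mktemp -d)
printf '%s\n' 'xxxddd' 'kk example' 'lsdjfs' '' 'sdkjlf' 'jfdsjlkf' '' \
    'kk example2 ' 'fsdlkfj' '----' 'sdfkjldsf' '' 'kk example3' 'djfjff' \
    > "$workdir/infile.txt"

(
    cd "$workdir" || exit 1
    # Start a new piece at every line beginning with "kk ".
    # -s: no byte counts, -z: drop empty pieces, -f: prefix for piece names.
    csplit -s -z -f piece. infile.txt '/^kk /' '{*}'

    for piece in piece.*; do
        # First line of a real section is "kk <name>".
        read -r tag name rest < "$piece"
        if [ "$tag" = "kk" ] && [ -n "$name" ]; then
            tail -n +2 "$piece" > "$name"   # body without the kk header
        fi
        rm -f "$piece"                      # also discards the text before the first kk
    done
)
ls "$workdir"
```

The file names come straight from the input, so this is only reasonable for input you trust.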

syg00 · 08-04-2015, 07:29 PM

One of the hard things I found with learning awk (and perl) is to not try and code like you would for a "traditional" language.
awk offers a lot of facilities that help (and hide) with the mundane work that has to be done. You would do well to examine grail's post - I imagine understanding it will be difficult without having the documentation handy.

Nice use of RT there grail.

astrogeek · 08-04-2015, 08:47 PM

Quote:

Originally Posted by grail

Code:

awk 'NR>1{print > file[2]}{split(RT,file)}' ORS="" RS="kk[^\n]*\n" file

I stand in awe...

Quote:

Originally Posted by syg00

One of the hard things I found with learning awk (and perl) is to not try and code like you would for a "traditional" language.
awk offers a lot of facilities that help (and hide) with the mundane work that has to be done. You would do well to examine grail's post - I imagine understanding it will be difficult without having the documentation handy.

Nice use of RT there grail.

I have the O'Reilly sed & awk 2nd ed in hand and still had to break it down before understanding it. This is the first actual use of RT that I recall seeing and had to refer to it (page 266) twice before it sunk in! Very nice - I'll "steal" (i.e. learn from) this one - thanks!

Note: RT is a gawk extension, so this may not work on other awks.
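
For awks without RT, a line-by-line variant is possible. The sketch below is not grail's method but a plain POSIX-style alternative: it treats any line whose first word is exactly kk as a header and uses the second word as the output file name. The sample file and the names workdir and out are assumptions for illustration.

```shell
# Hypothetical sample input, copied from the first post.
workdir=$(mktemp -d)
printf '%s\n' 'xxxddd' 'kk example' 'lsdjfs' '' 'sdkjlf' 'jfdsjlkf' '' \
    'kk example2 ' 'fsdlkfj' '----' 'sdfkjldsf' '' 'kk example3' 'djfjff' \
    > "$workdir/infile.txt"

(
    cd "$workdir" || exit 1
    awk '
    $1 == "kk" && NF >= 2 {          # header line: "kk <name>"
        if (out != "") close(out)    # finish the previous output file
        out = $2                     # second word becomes the new file name
        next                         # do not copy the header itself
    }
    out != "" { print > out }        # copy body lines; text before the first kk is skipped
    ' infile.txt
)
ls "$workdir"
```

Because the redirect uses > rather than >>, each output file is truncated the first time it is opened in a run, so running the script again does not duplicate lines.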

syg00 · 08-04-2015, 09:44 PM

I have books on everything - except awk. I didn't see the need initially, as I thought it was something I wouldn't get to use much.
Wrong.
I make do with the manual - people like grail keep teaching me new things all the time.

astrogeek · 08-04-2015, 11:56 PM

Yea, I always buy the books...

For some reason awk has never stuck, in the sense that I know when to use it but I rarely see the intuitive path to a good awk solution quickly. As a result, I end up doing many things with sed and shell scripts that would be better done with awk. I then see an effective awk one-liner from someone like grail and cry...

I have recently been following and now participating in awk problem threads to try to remedy that.

AnanthaP · 08-05-2015, 01:42 AM

If a line begins with kk, the destination file name is in $2 (the second word). All other lines ($0) get redirected to this file.
So:

Code:

BEGIN {
 redirFile=stdout ;
}
{
 if(substr($1,1,2)=="kk" redirFile=$2 ;
 else print $0>>redirFile
}

OK

astrogeek · 08-05-2015, 02:23 AM

The starting premise looks OK, but there are problems if you test it...

Not "all other lines get redirected", only those between "kk" lines. The leading lines cause a problem with the redirect.

You are missing a closing parenthesis on the if(... clause. Probably a typo, but it should have been tested.

stdout is not a predefined name in awk, so redirFile=stdout just assigns the empty value of an uninitialized variable. That is invalid as a redirect target, so the script fails on the leading lines before the first "kk" line.

If you fix the parenthesis and avoid the NULL redirect, it works. But because of >>, running it twice appends to the existing files instead of rewriting them from the input, so the old output files have to be removed first as an extra step, which can be confusing!

Nice try, see if you can tweak it up from here!
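
One way the points above could be applied, keeping the structure of the posted script: restore the parenthesis, start with an empty redirFile and skip lines while it is empty, and use > instead of >>. This is a sketch, not a tested fix from the thread; the script name split_kk.awk, the sample input and the workdir variable are hypothetical.

```shell
# Hypothetical sample input, copied from the first post.
workdir=$(mktemp -d)
printf '%s\n' 'xxxddd' 'kk example' 'lsdjfs' '' 'sdkjlf' 'jfdsjlkf' '' \
    'kk example2 ' 'fsdlkfj' '----' 'sdfkjldsf' '' 'kk example3' 'djfjff' \
    > "$workdir/infile.txt"

cat > "$workdir/split_kk.awk" <<'EOF'
BEGIN {
    redirFile = ""                          # no output file until the first kk line
}
{
    if (substr($1, 1, 2) == "kk") {         # closing parenthesis restored
        if (redirFile != "") close(redirFile)
        redirFile = $2
    } else if (redirFile != "") {           # leading lines are skipped, not redirected
        print $0 > redirFile                # ">" truncates once per run, then keeps writing
    }
}
EOF

# Run it twice: the output files are rewritten, not appended to.
( cd "$workdir" && awk -f split_kk.awk infile.txt && awk -f split_kk.awk infile.txt )
cat "$workdir/example3"
```

Like the original, the substr test also treats a body line such as "kkfoo bar" as a header; comparing $1 == "kk" would be stricter.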

grail · 08-05-2015, 05:16 AM

Quote:

Originally Posted by astrogeek

I have recently been following and now participating in awk problem threads to try to remedy that.

That and the online manual were how I got to know awk. I had never used it prior to seeing it in some posts by ghostdog when I first joined LQ.

AnanthaP · 08-05-2015, 05:51 AM

Hi astrogeek

In post #11

Yes, all other lines get redirected.
No starting parenthesis and hence no closing parenthesis in the if. (A single statement doesn't require curly braces so long as it ends with a semicolon.)
The double redirection (>>) is not really required in awk. With >, the first write zaps the destination file and later writes in the same run are added after it. (I wrote >> by habit.)
I have yet to test it out. Feel free to improve on it.

OK

grail · 08-05-2015, 07:37 AM

Actually, point 2 is correct. However, I believe it is the missing round bracket on your if that astrogeek was referring to.

Also, once the bracket is in, testing yields:

Code:

$ ./anathap.awk op_data
awk: ./anathap.awk:8: (FILENAME=op_data FNR=1) fatal: expression for `>>' redirection has null string value

danielbmartin · 08-05-2015, 09:33 AM

Quote:

Originally Posted by grail

Code:

awk 'NR>1{print > file[2]}{split(RT,file)}' ORS="" RS="kk[^\n]*\n" file

I'm in the ditch.

I used this ...

Code:

   Path=${0%%.*}
 InFile=$Path"inp.txt"

echo; echo; echo "Method of LQ Guru grail."
# awk               'NR>1{print > file[2]}{split(RT,file)}' ORS="" RS="kk[^\n]*\n" file
awk -v file=$InFile 'NR>1{print > file[2]}{split(RT,file)}' ORS="" RS="kk[^\n]*\n" $InFile

... and got this result ...

Code:

Method of LQ Guru grail.
awk: NR>1{print > file[2]}{split(RT,file)}
awk:                    ^ use of non-array as array

Please advise.

Daniel B. Martin
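
The "use of non-array as array" message comes from -v file=$InFile: that makes file a scalar (string) variable, while the program uses file as the array filled by split(). In grail's command the trailing word file is simply the input file name, not a variable. Below is a sketch of a corrected call, assuming gawk is installed (RT is gawk-only). The array is renamed to parts purely to avoid the name confusion, and the sample data and temporary paths are hypothetical.

```shell
# Hypothetical sample input and output directory.
InFile=$(mktemp)
outdir=$(mktemp -d)
printf '%s\n' 'xxxddd' 'kk example' 'lsdjfs' 'kk example2' 'fsdlkfj' > "$InFile"

if command -v gawk >/dev/null 2>&1; then
    # No -v assignment: "parts" must stay free so split() can create it as an array.
    ( cd "$outdir" &&
      gawk 'NR>1{print > parts[2]}{split(RT,parts)}' ORS="" RS="kk[^\n]*\n" "$InFile" )
    ls "$outdir"
else
    echo "gawk not found; RT is a gawk extension" >&2
fi
```

Quoting "$InFile" also keeps the call working when the path contains spaces.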