LinuxQuestions.org
Old 04-18-2012, 12:34 PM   #1
Linux_Kidd
Member
 
Registered: Jan 2006
Location: USA
Posts: 737

Rep: Reputation: 78
Another newb Awk Q


does regex "^" (begins with directive) still apply inside when ~ is being used, or is there a replacement for ~ that means "begins with"?

if ( NF == 0 || $0 ~ /^(xxx)|(yyy)|(zzz)|(aaa)|(^)/ ) {}
 
Old 04-18-2012, 12:54 PM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,
Quote:
Originally Posted by Linux_Kidd
does regex "^" (begins with directive) still apply inside when ~ is being used, or is there a replacement for ~ that means "begins with"?
Yes, it does.
Quote:
if ( NF == 0 || $0 ~ /^(xxx)|(yyy)|(zzz)|(aaa)|(^)/ ) {}
I'm not sure about this command though. Have a look at this:
Code:
if ( NF == 0 || $0 ~ /^(xxx|yyy|zzz|aaa)/ ) { }
I removed the (^) part (why is it there?) and rewrote the way the words are OR-ed: the alternatives now sit inside a single pair of parentheses, so the leading ^ applies to all of them.
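To see why the grouping matters: in an awk regex, alternation binds more loosely than concatenation, so in the ungrouped form the leading ^ anchors only the first alternative. A minimal sketch, using made-up sample lines rather than the real data:

```shell
# Sample input (invented for illustration): only the first line starts with a keyword.
input='xxx at start
has yyy in the middle
nothing here'

# Ungrouped: ^ applies to (xxx) only, so yyy matches anywhere in the line.
ungrouped=$(printf '%s\n' "$input" | awk '$0 ~ /^(xxx)|(yyy)/')

# Grouped: ^ applies to every alternative inside the parentheses.
grouped=$(printf '%s\n' "$input" | awk '$0 ~ /^(xxx|yyy)/')

printf 'ungrouped:\n%s\n\ngrouped:\n%s\n' "$ungrouped" "$grouped"
```

The ungrouped pattern should also select the line with yyy in the middle, while the grouped one should select only the line that begins with xxx.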

Hope this helps.
 
Old 04-18-2012, 12:59 PM   #3
Linux_Kidd
Member
 
Registered: Jan 2006
Location: USA
Posts: 737

Original Poster
Rep: Reputation: 78
i'll try that.
i believe my regex was taking (^) as a literal, need to verify
 
Old 04-18-2012, 01:05 PM   #4
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,
Quote:
Originally Posted by Linux_Kidd
i believe my regex was taking (^) as a literal, need to verify
If that is true then you need to escape it; otherwise it is treated as the start-of-string anchor, which matches every line, so the condition is true for all lines in the file.

With looking for a literal ^ also included, the command becomes:
Code:
if ( NF == 0 || $0 ~ /^(xxx|yyy|zzz|aaa|\^)/ ) { }
Hope this helps.
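A quick way to check the escaped form, assuming a few invented sample lines: the backslash makes the caret a literal character, while the unescaped ^ in front of the group still anchors at the start of the line.

```shell
# Sample lines are invented for illustration.
input='^ caret first
xxx first
a ^ in the middle
other'

# \^ is a literal caret; the leading ^ still anchors the whole group.
matched=$(printf '%s\n' "$input" | awk '$0 ~ /^(xxx|\^)/')

printf '%s\n' "$matched"
```

Only the lines that begin with xxx or with a literal ^ should come out; the line with ^ in the middle should not.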

Last edited by druuna; 04-18-2012 at 01:10 PM. Reason: litteral -> literal
 
Old 04-18-2012, 01:37 PM   #5
Linux_Kidd
Member
 
Registered: Jan 2006
Location: USA
Posts: 737

Original Poster
Rep: Reputation: 78
how to get remainder of fields into a variable?

for (i=7; i<=NF; i++) {
last = $last + $i }
print $1,$2,last


thnx

Last edited by Linux_Kidd; 04-18-2012 at 01:49 PM.
 
Old 04-18-2012, 02:01 PM   #6
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

Have a look at this example:
Code:
for ( i=1 ; i<=NF ; i++ ) { printf("%s ", $i) } { print "no more fields" }
Example run:
Code:
$ cat infile
1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
$ awk '{ for ( i=1 ; i<=NF ; i++ ) { printf("%s ", $i) } { print "no more fields" }}' infile 
1 no more fields
1 2 no more fields
1 2 3 no more fields
1 2 3 4 no more fields
1 2 3 4 5 no more fields
Info about printf: Examples Using printf (gnu.org)

Hope this helps.
 
Old 04-18-2012, 02:14 PM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
If I am following the last example I assume you wish to concatenate all remaining fields into the variable last?
If so then simply place the FS between items, assuming you still wish to use it, as + is for arithmetic:
Code:
for (i=7; i<=NF; i++)
    last = last FS $i

print $1,$2,last
Also note that last is a string variable and not a field identifier so the $ is not needed.

If I am on the wrong path you can ignore me
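One detail to watch when a loop like this runs once per input line: last keeps its value from the previous record unless it is reset, and joining with a separator on every pass leaves a leading separator. A minimal sketch with both handled, assuming whitespace-separated input and an invented two-line sample:

```shell
# Invented sample: fields 7..NF are "g h i" on line 1 and "G H" on line 2.
input='a b c d e f g h i
A B C D E F G H'

joined=$(printf '%s\n' "$input" | awk '{
    last = ""                                   # reset for every record
    for (i = 7; i <= NF; i++)
        last = (last == "" ? $i : last FS $i)   # avoid a leading separator
    print $1, $2, last
}')

printf '%s\n' "$joined"
```

Without the reset, the second line would still carry the fields collected from the first one.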
 
Old 04-18-2012, 02:17 PM   #8
Linux_Kidd
Member
 
Registered: Jan 2006
Location: USA
Posts: 737

Original Poster
Rep: Reputation: 78
ok, my skills within awk scripts are very rusty....

from within awk script, strip out the zero from field #1 when printing $1,$2,$3,$4,$5

0ABCD XXXX YYYY DDDD FFFF
 
Old 04-18-2012, 02:32 PM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
I am not sure I understand the difficulty?
Code:
awk 'sub(/0/,"",$1)' file
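Two things worth knowing about a bare sub() used as the pattern: a line is printed only when a substitution actually happened, and /0/ matches the first zero anywhere in the field. A hedged variant, assuming the goal is to drop a single leading zero from field 1 and still print every line (the second sample line is invented):

```shell
# First line is the sample from the question; the second is invented
# to show a field 1 that has no leading zero.
input='0ABCD XXXX YYYY DDDD FFFF
ABCD0 XXXX YYYY DDDD FFFF'

# ^0 limits the match to a leading zero; print runs whether or not sub() matched.
stripped=$(printf '%s\n' "$input" | awk '{ sub(/^0/, "", $1); print }')

printf '%s\n' "$stripped"
```

Keep in mind that assigning to $1 makes awk rebuild the line with OFS between the fields.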
 
Old 04-18-2012, 02:50 PM   #10
Linux_Kidd
Member
 
Registered: Jan 2006
Location: USA
Posts: 737

Original Poster
Rep: Reputation: 78
i do something like this, was looking to make it a tad more elegant

Code:
#!/bin/bash
awk '
BEGIN {
OFS=",";
}
{
        if ( NF == 0 || $1 ~ /^(aaa|bbb|-|+|=|0$|1\/|\/\/|TPP|PASSWORD|DYNAMIC|1E)/ ) {}
        else {
        last = $6$7$8$9$10$11$12$13
        print $1,$2,$3,$4,$5,last;}
} ' | sed 's/^0\(.*\)/\1/'
i execute like this
./script.awk < infile > outfile

i also need to add regex for mm/dd/yy in the If statement, but having a hard time as \d{2}/\d{2}/\d{2} fails
should i use [:digit:]\{2\}\/[:digit:]\{2\}\/[:digit:]\{2\}

Last edited by Linux_Kidd; 04-18-2012 at 02:53 PM.
 
Old 04-18-2012, 03:08 PM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Well I'll answer the last part first, unless using version 4 or greater of gawk you must use --re-interval as a switch to use {}.
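On the pattern itself: awk uses POSIX extended regular expressions, so \d is not available, [:digit:] only works inside a bracket expression (written [[:digit:]]), and the braces of an interval are not escaped. A minimal sketch that sidesteps interval support entirely by spelling the digits out, assuming the date is the whole first field (sample lines invented):

```shell
# Invented sample lines; only the first has mm/dd/yy as its first field.
input='04/18/12 some entry
not a date
4/18/2012 other'

# Interval form, for awks that support {n}:
#   $1 ~ /^[[:digit:]]{2}\/[[:digit:]]{2}\/[[:digit:]]{2}$/
# Form without intervals, used here:
dated=$(printf '%s\n' "$input" | awk '$1 ~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/')

printf '%s\n' "$dated"
```

The same expression can be added as one more test in the if statement rather than squeezed into the existing alternation.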

Obviously the sed is unnecessary, since the zero can be removed inside awk as shown previously.

As the true branch of the if does nothing, I would suggest simply negating the current test.
Code:
if ( NF && $1 !~ /^(aaa|bbb|-|+|=|0$|1\/|\/\/|TPP|PASSWORD|DYNAMIC|1E)/ )
And as for the concatenation, is the fifth field unique or the first of its kind? If so, you could do something like this (gensub is a gawk extension):
Code:
#if first of its kind
last = gensub(".*"$5,"","1"); gsub(/ /,"",last)

#if unique you can do the above all together
last = gensub(".*"$5"| ","","g")
 
Old 04-18-2012, 03:59 PM   #12
Linux_Kidd
Member
 
Registered: Jan 2006
Location: USA
Posts: 737

Original Poster
Rep: Reputation: 78
Quote:
Originally Posted by grail
If I am following the last example I assume you wish to concatenate all remaining fields into the variable last?
If so then simply place the FS between items, assuming you still wish to use it, as + is for arithmetic:
Code:
for (i=7; i<=NF; i++)
    last = last FS $i

print $1,$2,last
Also note that last is a string variable and not a field identifier so the $ is not needed.

If I am on the wrong path you can ignore me
all helps a lot. i dont need FS in between, i need a space.
 
Old 04-19-2012, 12:29 AM   #13
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,
Quote:
Originally Posted by Linux_Kidd
all helps a lot.
You're welcome.

Quote:
i dont need FS in between, i need a space.
The default FS is a space. Using FS instead of a literal space makes the script more flexible. If you ever need to change the FS to something other than a space, you only need to change it once and do not have to search the complete script for relevant spaces that need changing.

When you create a small awk script (or one-liner) this isn't too relevant, but it is good practise nonetheless.

Hope this clears things up.
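A small sketch of how the two separators interact, assuming FS is left at its default and OFS is set to a comma as in the script posted earlier in the thread (the sample line is invented):

```shell
# Invented sample line with eight space-separated fields.
input='a b c d e f g h'

# FS (input separator) keeps its default, a single space;
# OFS (output separator) is a comma.
out=$(printf '%s\n' "$input" | awk 'BEGIN { OFS = "," } {
    last = $6
    for (i = 7; i <= NF; i++)
        last = last FS $i      # joined with FS, i.e. a space here
    print $1, $2, last         # each comma in print becomes OFS
}')

printf '%s\n' "$out"
```

So the fields gathered into last stay space-separated, while the fields listed in print are separated by OFS.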
 
Old 04-19-2012, 04:54 AM   #14
Linux_Kidd
Member
 
Registered: Jan 2006
Location: USA
Posts: 737

Original Poster
Rep: Reputation: 78
Quote:
Originally Posted by druuna
Hi,
You're welcome.

The default FS is a space. Using FS instead of a literal space makes the script more flexible. If you ever need to change the FS to something other than a space, you only need to change it once and do not have to search the complete script for relevant spaces that need changing.

When you create a small awk script (or one-liner) this isn't too relevant, but it is good practise nonetheless.

Hope this clears things up.
my bad, i thought i defined FS, but it was OFS. i will try with FS in there and test. too many things going on right now.

this small script processes a bunch of txt files, spits it out pipe delimited for Excel. script took 3sec to process the files. the files were previously processed manually by a person, that person estimated he spent approx 40 man hrs to process the files !!!
 
Old 04-19-2012, 06:50 AM   #15
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
You don't happen to work in a bank by chance? I had a very similar experience where a process could only be run over the weekend as it took almost 40hrs to complete a single run. After I had been there 3 months we were able to run it ad hoc whenever we liked as it took about 3 minutes.
 
  

