LinuxQuestions.org
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

Old 03-06-2014, 11:40 AM   #1
newmanium2001
LQ Newbie
 
Registered: Apr 2009
Posts: 14

Rep: Reputation: 1
Shell Scripting: gawk Record Separator and Regex


Hi,

I've been going crazy trying to figure out what I'm doing wrong with gawk here. Would appreciate any feedback I could get on this.

Given an input file like this:

Code:
*******************************************
*******************************************
foo bar
something something
*.somewhere.com
foo again

*******************************************
*******************************************

foo bar
*.somewhereelse.com
some more stuff

*******************************************
*******************************************
I want to read it in with gawk and use the two lines of asterisks as the record separator (RS). Here's what I've got so far, but it only creates one record in awk every time I try it:

Code:
gawk -W dump-variables=awk.out 'BEGIN {RS="^\*+"}  {
        print $0
        print NR
        print "\n"

}'
Output:
Code:
*******************************************
*******************************************
foo bar
something something
*.somewhere.com
foo again

*******************************************
*******************************************

foo bar
*.somewhereelse.com
some more stuff

*******************************************
*******************************************

1
What am I doing wrong with the RS regular expression?
 
Old 03-06-2014, 11:57 AM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191
If you are going to use both lines of asterisks as the separator, then you need to tell awk that. Also, I think your anchor is not helping: in RS, ^ matches only at the very start of the input, not at the start of each line.

Something like this should work:
Code:
awk '{print "|"$0"|"}END{print NR}' RS='[*]+\n[*]+\n' file
I often find escaping something like an asterisk can also cause problems, so I prefer '[*]' instead.
 
Old 03-06-2014, 03:20 PM   #3
newmanium2001
LQ Newbie
 
Registered: Apr 2009
Posts: 14

Original Poster
Rep: Reputation: 1
That did it. Thanks! I was trying to keep it simple by matching just one line of asterisks at a time, but yours handles both lines. And yeah, I'd agree that something funky is going on with escaping the asterisk in my example. Bracketing works much better!
 
  


Tags
bash, gawk


