LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 03-16-2013, 11:33 AM   #16
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881


Quote:
Originally Posted by grail View Post
Code:
awk '{print > "file"++i}' RS="" infile
Remarkably concise, but I don't understand how it works. Please walk us through it.

Daniel B. Martin
 
Old 03-16-2013, 12:53 PM   #17
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

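RS="" - Set the record separator to the empty string. awk then reads in "paragraph mode": records are separated by one or more blank lines

print > "file"++i - print the current record (i.e. everything up to the next blank line) into a file called "fileN", where N is 1, 2, 3, etc.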
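
To see the two pieces at work, here is a small sketch using a throwaway directory and a made-up two-paragraph input. The parentheses around the file name are an addition to the one-liner: an unparenthesised concatenation after > is ambiguous and is rejected by some awk implementations, although gawk accepts it.

```shell
# Sketch only: the directory and sample data are made up for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two paragraphs separated by two blank lines.
printf 'one\ntwo\n\n\nthree\nfour\n' > infile

# RS="" selects paragraph mode: one or more blank lines end a record.
# Each record ($0) is written to file1, file2, ...
awk '{ print > ("file" ++i) }' RS="" infile

cat file1    # prints the lines "one" and "two"
cat file2    # prints the lines "three" and "four"
```

Note that the run of two blank lines still yields only two output files; paragraph mode treats any number of consecutive blank lines as one separator.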
 
Old 03-16-2013, 01:47 PM   #18
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Quote:
Originally Posted by grail View Post
RS="" - Set record separator to an empty line

print > "file"++i - print the current record (ie all up to the empty line) into a file called "fileN", where N is 1, 2, 3, etc
Thank you for this explanation. I now understand a distinction between record and line.

Now, a nitpick. Empty line could mean a null line, or it could mean a line containing only white space. When displayed on the screen both look alike. Your solution is short and sweet (I admire that) but it depends on empty line = null line.

Daniel B. Martin
 
Old 03-17-2013, 10:51 AM   #19
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Quote:
Now, a nitpick. Empty line could mean a null line, or it could mean a line containing only white space. When displayed on the screen both look alike. Your solution is short and sweet (I admire that) but it depends on empty line = null line.
And I am sure by now you could easily convert this to allow for whitespace.
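
One possible conversion, sketched under the assumption that a "blank" line may contain spaces or tabs: blank those lines out with sed before awk sees them, so paragraph mode still applies. The directory and sample data are made up for illustration.

```shell
# Sketch only: the directory and sample data are made up for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The separator line between the paragraphs holds a space and a tab,
# so it looks blank on screen but is not empty.
printf 'one\ntwo\n \t\nthree\nfour\n' > infile

# Turn whitespace-only lines into truly empty ones, then split the
# stream in paragraph mode exactly as before.
sed 's/^[[:space:]]*$//' infile | awk -v RS= '{ print > ("file" ++i) }'

cat file1    # prints the lines "one" and "two"
cat file2    # prints the lines "three" and "four"
```

Without the sed step the whole input would be a single record, because awk's paragraph mode only recognises lines that are completely empty.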
 
Old 03-17-2013, 08:31 PM   #20
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

This is an interesting problem and, as a learning experience, I improved on previous solutions.

Instead of sequence numbers I used the first line in each "paragraph" as part of the output file names.

This InFile ...
Code:
able
choice1-1
choice1-2
choice1-3

baker
choice2-1
choice2-2
choice2-3
choice2-4
choice2-5

charlie
choice3-1
choice3-2

dog
choice4-1
choice4-2
choice4-3
... produces these four OutFiles ...
dbm686out.able
Code:
able
choice1-1
choice1-2
choice1-3
dbm686out.baker
Code:
baker
choice2-1
choice2-2
choice2-3
choice2-4
choice2-5
dbm686out.charlie
Code:
charlie
choice3-1
choice3-2
dbm686out.dog
Code:
dog
choice4-1
choice4-2
choice4-3
This code (using bash) does the job...
Code:
# File identification
Path=$(readlink -f "$0" | cut -d'.' -f1)  # script path up to the first dot
InFile="${Path}inp.txt"

# In this version each output file includes the first line of each paragraph.
o="${Path}out"   # o = output file name prefix
rm -f "$o".*     # Blow away any leftover output files.
ofid=""          # Initialize ofid, Output File IDentifier.
# read trims leading and trailing whitespace, so a whitespace-only line counts as blank.
while read -r line
  do
    if [[ -z "$ofid" ]]
      then ofid=$line
    fi
    if [[ -z "$line" ]]
      then ofid=""
      else printf '%s\n' "$line" >> "$o.$ofid"
    fi
  done < "$InFile"
... and this code (based on grail's superb awk one-liner) is more concise...
Code:
# File identification
Path=$(readlink -f "$0" | cut -d'.' -f1)
InFile="${Path}inp.txt"

# In this version each output file includes the first line of each paragraph.
o="${Path}out"  # o = output file name prefix
awk -v o="$o" '{print > (o "." $1)}' RS="" "$InFile"
Suggestions and corrections are gratefully accepted.

Daniel B. Martin
 
Old 03-17-2013, 08:37 PM   #21
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

This is an interesting problem and, as a learning experience, I improved on previous solutions.

Instead of sequence numbers I used the first line in each "paragraph" as part of the output file names.

This InFile ...
Code:
able
choice1-1
choice1-2
choice1-3

baker
choice2-1
choice2-2
choice2-3
choice2-4
choice2-5

charlie
choice3-1
choice3-2

dog
choice4-1
choice4-2
choice4-3
... produces these four OutFiles ...
dbm690out.able
Code:
choice1-1
choice1-2
choice1-3
dbm690out.baker
Code:
choice2-1
choice2-2
choice2-3
choice2-4
choice2-5
dbm690out.charlie
Code:
choice3-1
choice3-2
dbm690out.dog
Code:
choice4-1
choice4-2
choice4-3
This code (using bash) does the job...
Code:
# File identification
Path=$(readlink -f "$0" | cut -d'.' -f1)  # script path up to the first dot
InFile="${Path}inp.txt"

# In this version each output file excludes the first line of each paragraph.
o="${Path}out"   # o = output file name prefix
rm -f "$o".*     # Blow away any leftover output files.
ofid=""          # Initialize ofid, Output File IDentifier.
while read -r line
  do
    if [[ -z "$ofid" ]]
      then ofid=$line
    fi
    if [[ -z "$line" ]]
      then ofid=""
    fi
    if [[ "$ofid" != "$line" ]]
      then printf '%s\n' "$line" >> "$o.$ofid"
    fi
  done < "$InFile"
... and this code (based on grail's superb awk one-liner) is more concise...
Code:
# File identification
Path=$(readlink -f "$0" | cut -d'.' -f1)
InFile="${Path}inp.txt"

# In this version each output file excludes the first line of each paragraph.
o="${Path}out"  # o = output file name prefix
awk -v o="$o" '{t=$1;$1="";sub(/^ /,"");gsub(" ","\n")} {print > (o "." t)}' RS="" "$InFile"
Suggestions and corrections are gratefully accepted.

Daniel B. Martin

Last edited by danielbmartin; 03-18-2013 at 10:02 AM. Reason: Fix bug identified by grail (post #22)
 
Old 03-18-2013, 08:41 AM   #22
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Might want to check the output files that are using the second awk solution. I think you will find that your data is not line for line, but now on a single line.

Example:

Instead of dbm690out.able being:
Code:
choice1-1
choice1-2
choice1-3
I believe it will look like:
Code:
choice1-1 choice1-2 choice1-3
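
The cause is that assigning to any field makes awk rebuild $0 by joining the fields with OFS, which is a single space by default; in paragraph mode the newlines inside the record count as field separators, so they are replaced too. A minimal sketch with a made-up paragraph:

```shell
# Sketch only: the three-line paragraph is made up for illustration.
para='able\nchoice1-1\nchoice1-2\n'

# Default OFS (a space): clearing $1 rebuilds $0 on a single line.
joined=$(printf "$para" | awk -v RS= '{ $1 = ""; print }')
printf '[%s]\n' "$joined"      # [ choice1-1 choice1-2]

# OFS set to a newline: the rebuilt record keeps one field per line;
# sub() drops the newline left in front by the emptied first field.
lines=$(printf "$para" | awk -v RS= -v OFS='\n' '{ $1 = ""; sub(/^\n/, ""); print }')
printf '%s\n' "$lines"
```

The second form only keeps the original lines intact while every line is a single word, since each word is still a separate field.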
 
Old 03-18-2013, 10:05 AM   #23
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Quote:
Originally Posted by grail View Post
Might want to check the output files that are using the second awk solution. I think you will find that your data is not line for line, but now on a single line.
Recognition of a bug is the first step toward fixing the bug. The man who points out a flaw in my code is helping me. Thank you, grail.

I edited post #21 to show corrected code. It works but is unlovely. Is there a cleaner way?

Daniel B. Martin
 
Old 03-18-2013, 01:19 PM   #24
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

How about:
Code:
awk -v o="$o" '{t=$1;$1="";sub(/^\n/,"");print > (o "." t)}' RS="" OFS="\n" file
And just as a quickie, a ruby alternative:
Code:
ruby -ane 'BEGIN{$/=""};IO.write("name."+ $F[0],$F[1..-1]*"\n")' file
 
Old 03-18-2013, 09:42 PM   #25
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Now, let's make the problem more challenging by permitting multi-word "choice" lines.

With this InFile ...
Code:
able
how now
brown cow

baker
now is the time
for all good men
to come to the aid
of their party

charlie
the quick brown fox
jumps over
the lazy programmer

dog
words to live by:
let sleeping dogs lie
...this bash code ...
Code:
# File identification
Path=$(readlink -f "$0" | cut -d'.' -f1)  # script path up to the first dot
InFile="${Path}inp.txt"

# In this version each output file excludes the first line of each paragraph.
o="${Path}out"   # o = output file name prefix
rm -f "$o".*     # Blow away any leftover output files.
ofid=""          # Initialize ofid, Output File IDentifier.
while read -r line
  do
    if [[ -z "$ofid" ]]
      then ofid=$line
    fi
    if [[ -z "$line" ]]
      then ofid=""
    fi
    if [[ "$ofid" != "$line" ]]
      then printf '%s\n' "$line" >> "$o.$ofid"  # quoted, so inner spacing survives
    fi
  done < "$InFile"

# For debugging...
for file in "$o"*; do echo; echo "$file ..."; cat "$file"; done
... produces this result ...
Code:
/home/daniel/Desktop/LQfiles/dbm690out.able ...
how now
brown cow

/home/daniel/Desktop/LQfiles/dbm690out.baker ...
now is the time
for all good men
to come to the aid
of their party

/home/daniel/Desktop/LQfiles/dbm690out.charlie ...
the quick brown fox
jumps over
the lazy programmer

/home/daniel/Desktop/LQfiles/dbm690out.dog ...
words to live by:
let sleeping dogs lie
... but I'm unable to code an equivalent in awk. Anyone care to take a shot at it?

Daniel B. Martin
 
Old 03-19-2013, 12:43 AM   #26
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

My hint: have a look at the input field separator (FS).
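
One way to follow that hint, offered as a sketch rather than the answer grail had in mind: set FS to a newline so that each line of a paragraph becomes one field. The output prefix, file names and sample data below are made up for illustration.

```shell
# Sketch only: the directory, prefix and sample data are made up.
workdir=$(mktemp -d)
o="$workdir/out"

printf 'able\nhow now\nbrown cow\n\nbaker\nnow is the time\nfor all good men\n' > "$workdir/inp.txt"

# RS="" gives one record per paragraph; FS="\n" gives one field per line.
# $1 is the header line, $2..$NF are the multi-word lines, left untouched.
awk -v o="$o" -v RS= -v FS='\n' '{
    f = o "." $1                              # output name from the header line
    for (n = 2; n <= NF; n++) print $n > f    # write the remaining lines as-is
    close(f)                                  # avoid piling up open files
}' "$workdir/inp.txt"

cat "$o.able"
cat "$o.baker"
```

Because no field is ever assigned, $0 is never rebuilt and the spaces inside each line are preserved.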
 
  



Tags
bash scripting, sed, shell script, shell scripting




All times are GMT -5. The time now is 05:17 AM.
