Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Code:
$ cat infile
One or more
lines of text immediately
followed by.
The author name on one line.


More
lines.
firstfire
$ sed -nr 'H; x; $ba; /\n\n/{:a; s/^\n|\n\n//g; s/\n/ /g; s/([^.]+\.) +(.*)/"\1", "\2"/; p; b}; x;' infile
"One or more lines of text immediately followed by.", "The author name on one line."
"More lines.", "firstfire"
This is hardly a one-liner, though.
Last edited by firstfire; 09-12-2012 at 11:17 AM.
Reason: Fixed file name
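For anyone who finds the sed one-liner hard to follow, here is the same script laid out one command per line with comments. This is one reading of how it works, not firstfire's own explanation. It assumes GNU sed (for -r and \n in patterns), and the wrapper name quotes_sed is made up for illustration.

```shell
# Annotated layout of the sed script posted above (GNU sed assumed).
# quotes_sed is a hypothetical wrapper name, not part of the original post.
quotes_sed() {
    sed -nr '
        # Append the current line to the hold space, then swap so the
        # pattern space holds everything collected so far.
        H
        x
        # On the last input line, jump straight to the flush at label a.
        $ba
        # Two newlines in a row show up once a line is appended after an
        # empty line; with two empty lines between quotes, that is the
        # second empty line.
        /\n\n/{
            :a
            # Strip the leading newline and the empty-line separator.
            s/^\n|\n\n//g
            # Join the remaining lines with spaces.
            s/\n/ /g
            # Split at the first period followed by a space: text, then author.
            s/([^.]+\.) +(.*)/"\1", "\2"/
            p
            # End this cycle without swapping back, so collecting starts over.
            b
        }
        # No separator yet: swap back so the hold space keeps accumulating.
        x
    ' "$@"
}

# Sample input: two quotes separated by two empty lines.
printf 'One or more\nlines of text immediately\nfollowed by.\nThe author name on one line.\n\n\nMore\nlines.\nfirstfire\n' | quotes_sed
```

Run on that sample, it should print the same two quoted lines as in the post above. Note that the split happens at the first period followed by a space, so a quote containing more than one sentence would be split too early.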
Code:
# a one liner should be less than 80 characters
# 84 chars puts it a bit over
awk -F'\n' -vRS='\n\n\n' '{author=$NF; NF--; printf("\"%s\",\"%s\"\n", $0, author)}'
# we can squeeze it down a bit: 62 chars
awk -F\\n -vRS='\n\n\n' -vQ=\" '{a=$NF;NF--;$0=Q$0Q","Q a Q}1'
# 60 chars, requires gawk version 4+
awk -F\\n -vRS=\\n{3} -vQ=\" '{a=$NF;NF--;$0=Q$0Q","Q a Q}1'
# But if the last "famous quote" isn't followed by 2 line breaks you need
# -vRS='\n(\n\n|$)'
# I also thought a=$(NF--); should be equivalent to a=$NF;NF--;
# but this doesn't work for some reason...
As a learning exercise I like to implement and test solutions posted by respondents to interesting problems such as this one. My test program (shown below) uses four solutions -- my own, and those already posted by firstfire, colucix, and ntubski. I constructed a test file of real-world quotations.
It is perplexing to find that the output files from the four solutions differ to some degree. Perhaps I have misunderstood the problem; perhaps there is ambiguity in the OP's problem statement.
Input file ...
Code:
Politics is the art of looking for trouble, finding it everywhere,
diagnosing it incorrectly, and applying the wrong remedies.
Groucho Marx

Too bad all the people who know how to run the country are
busy driving cabs and cutting hair.
George Burns

My husband and I are either going to buy a dog or have a child.
We can't decide whether to ruin our carpet or ruin our lives.
Rita Rudner

Men occasionally stumble over the truth, but most of them
pick themselves up and hurry off as if nothing happened.
Winston Churchill

Giving money and power to government is like giving whiskey and
car keys to teenage boys.
P.J. O'Rourke

The difference between genius and stupidity is that genius has limits.
Albert Einstein
My test program ...
Code:
#!/bin/bash
# Daniel B. Martin Sep12
#
# To execute this program, launch a terminal session and enter:
# bash /home/daniel/Desktop/LQfiles/dbm472.bin
#
# This program inspired by
# http://www.linuxquestions.org/questions/programming-9/
# convert-text-paragraph-for-database-4175426737/
# File identification
InFile='/home/daniel/Desktop/LQfiles/dbm472inp.txt'
OutFile1='/home/daniel/Desktop/LQfiles/dbm472out1.txt'
OutFile2='/home/daniel/Desktop/LQfiles/dbm472out2.txt'
OutFile3='/home/daniel/Desktop/LQfiles/dbm472out3.txt'
OutFile4='/home/daniel/Desktop/LQfiles/dbm472out4.txt'
# 1) Change all line breaks to tildes.
# 2) Change all double tildes to single line breaks.
# 3) Prefix and postfix every line with a double-quote .. and ..
# replace the last tilde in each line with a comma.
# 4) Change all tildes to blanks.
echo; echo "Method of DBM"
tr "\n" "~" < $InFile \
|sed -r 's/~~/\n/g' \
|sed -r 's/(.*)~(.*)/"\1","\2"/' \
|tr '~' ' ' \
> $OutFile1
cat $OutFile1
echo; echo "Method of LQ member firstfire"
sed -nr 'H; x; $ba; /\n\n/{:a; s/^\n|\n\n//g;
s/\n/ /g; s/([^.]+\.) +(.*)/"\1", "\2"/; p; b}; x;' $InFile > $OutFile2
cat $OutFile2
echo; echo "Method of LQ moderator colucix"
awk 'BEGIN { RS = "\n\n\n" }
{
    gsub(/^|$/,"\"")
    sub(/\n+"$/,"\"")
    $0 = gensub(/\n([^\n]+"$)/,"\",\"\\1","g")
    gsub(/\n/," ")
    sub(/$/,"\n")
    print
}' $InFile > $OutFile3
cat $OutFile3
echo; echo "Method of LQ Senior Member ntubski"
awk -F\\n -vRS='\n(\n\n|$)' -vQ=\" '{a=$NF;NF--;$0=Q$0Q","Q a Q}1' $InFile > $OutFile4
cat $OutFile4
echo; echo "Normal end of job."; echo
exit
Readers are invited to comment on any aspect of my testing and/or correct their solutions.
Daniel B. Martin
Last edited by danielbmartin; 09-12-2012 at 09:11 AM.
Reason: Correct t7po
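To see exactly where the four output files disagree, rather than eyeballing the cat output, something like the following could be added. It is only a sketch: the function name report_differences is made up, and it relies on nothing beyond the standard cmp and diff utilities.

```shell
# report_differences REF FILE...
# Compare each FILE against REF and show the differing lines, if any.
# (Hypothetical helper, not part of the test program above.)
report_differences() {
    ref=$1
    shift
    for f in "$@"; do
        if cmp -s "$ref" "$f"; then
            echo "$f: identical to $ref"
        else
            echo "$f: differs from $ref"
            # diff exits non-zero when files differ; that is expected here.
            diff "$ref" "$f" || true
        fi
    done
}
```

In the test program it could be called near the end as report_differences "$OutFile1" "$OutFile2" "$OutFile3" "$OutFile4".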
Quote:
It is perplexing to find that the output files from the four solutions differ to some degree. Perhaps I have misunderstood the problem; perhaps there is ambiguity in the OP's problem statement.
firstfire, colucix, and I all understood the separator between quotes to be 2 empty lines (which is 3 newline characters), whereas you understood it to be 2 newline characters (which is 1 empty line). Upon rereading the original post I suspect your interpretation was the intended one, although the notation of the example input is kind of confusing...
Code:
One or more
lines of text immediately followed by.
The author name on one line.
[Two line] # looks like
[breaks.] # 2 empty lines
Code:
"One or more lines of text immediately followed by.","The author name on one line."
[Line break] # but I didn't think this indicated an empty line
I think the lesson here is always give a concrete example input and output.
Last edited by ntubski; 09-12-2012 at 05:54 PM.
Reason: I should speak for myself :/
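When it is unclear whether a file uses one or two empty lines between records, dumping the raw bytes settles it. A quick sketch with od -c, which prints each newline as \n; the sample input here is made up for illustration.

```shell
# One empty line between records: two newline characters in a row.
printf 'quote\nauthor\n\nquote\nauthor\n' | od -c

# Two empty lines between records: three newline characters in a row.
printf 'quote\nauthor\n\n\nquote\nauthor\n' | od -c
```

Running od -c on the real input file shows the same thing without any guessing about how a forum post rendered the blank lines.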
I quickly noticed that my question was inaccurate about how many line breaks there are. Thanks for the comment, ntubski.
- so I replaced \n\n\n with \n\n\n* where appropriate.
- The use of "gensub" in a script worked only after installing gawk.
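The \n\n\n* separator accepts two or more newlines in a row. A related approach, not taken from the thread, is awk's paragraph mode: with RS set to the empty string, any run of blank lines ends a record, and this is standard awk, so neither a regex RS nor gensub is needed. A minimal sketch, where the function name quotes_to_csv is made up for illustration:

```shell
# quotes_to_csv [FILE...]
# Paragraph mode (RS = "") splits records on one or more blank lines.
# With FS = "\n", every line of a record becomes one field.
# (Hypothetical helper; a sketch, not one of the posted solutions.)
quotes_to_csv() {
    awk 'BEGIN { RS = ""; FS = "\n" }
    NF < 2 { next }                       # need at least a text line and an author
    {
        text = $1
        for (i = 2; i < NF; i++)          # join all lines except the last
            text = text " " $i
        printf "\"%s\",\"%s\"\n", text, $NF   # the last line is the author
    }' "$@"
}

# Sample input with one empty line between quotes.
printf 'Too bad all the people who know how to run the country are\nbusy driving cabs and cutting hair.\nGeorge Burns\n\nThe difference between genius and stupidity is that genius has limits.\nAlbert Einstein\n' | quotes_to_csv
```

The same function gives the same result whether the quotes are separated by one empty line or by two, which sidesteps the ambiguity discussed above.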
--
"Actually we know only when we know little: with knowledge grows doubt.","Goethe, Maximen und Reflexionen."