LinuxQuestions.org

linuxScriptGirl 03-15-2011 11:01 AM

Sed substitution using &
 
I have searched high and low and hope you can help me. I need to find each line in a file which does NOT begin with a double quote (") and append that line to the previous line.

I have been successful doing this using the following command:

cat filname.csv | sed -e :a -e '$!Ns/\n[^"]//;ta -e 'P;D' > newfilename.csv

My issue is the substitution. As you would expect, after the line is appended to the previous line, its first character is removed. I need it to not be removed. I tried
cat filname.csv | sed -e :a -e '$!Ns/\n[^"]/&/;ta -e 'P;D' > newfilename.csv

but it just hangs.

Goal:
Input:
"line 1"
line 2
Output with existing sed command is:
line 1ine2

I need it to be line1line2.

Any help you can provide would be GREATLY appreciated!!

Snark1994 03-15-2011 11:59 AM

I can't quite follow your syntax (it doesn't seem to work on my prompt) but I can see what the problem is (I think).

You're calling
Code:

s/\n[^"]//
This will delete the newline and the character after it. You need to add a group:

Code:

s/\n([^"])/\1/
This replaces \n and a non-quote with the non-quote character.

Hope this helps,

linuxScriptGirl 03-15-2011 01:07 PM

Quote:

Originally Posted by Snark1994 (Post 4291647)
I can't quite follow your syntax (it doesn't seem to work on my prompt) but I can see what the problem is (I think).

You're calling
Code:

s/\n[^"]//
This will delete the newline and the character after it. You need to add a group:

Code:

s/\n([^"])/\1/
This replaces \n and a non-quote with the non-quote character.

Hope this helps,



==========================================================
Thanks for your suggestion.

I tried your \1 within the quotes and I get this message:
sed: -e expression #2, char 1: invalid reference \1 on `s' command's RHS.
Do you know what that means?

Ignotum Per Ignotius 03-15-2011 01:44 PM

Hi linuxScriptGirl.

...Seems like you are strangely attractive to Welsh slackers...

I think the answer (well an answer) to your problem is to break it into three operations, since the newline is a bit of a pain. By translating the newline into some obscure character (i.e. a character which one could reliably assume will never appear in your input file), it becomes pretty straightforward. Here's my stab at it, anyway...

Code:

cat filename.csv | tr '\n' '' | sed 's/\([^\"]\)/ \1/g' | tr '' '\n' > filename.csv
I tried it on this file:

Code:

"line 1"
"line 2"
"line 3"
line 4
"line 5"
"line 6"
line 7
line 8
"line 9"
line 10

...and got this:

Code:

"line 1"
"line 2"
"line 3" line 4
"line 5"
"line 6" line 7 line 8
"line 9" line 10

The newline is changed into a space: you can easily eliminate this if you don't want it, by tweaking the sed script.
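
For anyone wanting to try the idea end to end, here is a rough sketch. It assumes the stand-in character is the ASCII unit separator (octal 037), which is purely a pick for illustration; any byte that never occurs in your data would do. The variable names (sep, input, joined) are made up for the example, and the result goes into a variable rather than back over the input file.

```shell
# Assumption: octal 037 (ASCII unit separator) never occurs in the data.
sep=$(printf '\037')

# Sample data from this post, held in a variable instead of a file.
input='"line 1"
"line 2"
"line 3"
line 4
"line 5"
"line 6"
line 7
line 8
"line 9"
line 10'

# 1. tr turns every newline into the stand-in, giving sed one long line.
# 2. sed swaps "stand-in + non-quote character" for "space + that character".
# 3. tr turns the remaining stand-ins back into newlines.
joined=$(printf '%s\n' "$input" \
  | tr '\n' '\037' \
  | sed "s/${sep}\([^\"]\)/ \1/g" \
  | tr '\037' '\n')

printf '%s\n' "$joined"
```

Dropping the space in front of \1 in the sed replacement would glue the lines together with nothing in between.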

Nos da cariad... :)

Snark1994 03-15-2011 02:55 PM

Quote:

Originally Posted by Ignotum Per Ignotius (Post 4291728)
...Seems like you are strangely attractive to Welsh slackers...

Nah, we're strangely attractive to everyone else is what it is... isn't it? ;)

EDIT: Darn, forgot to answer the question. You just need to escape the parentheses:
Code:

s/\n\([^"]\)/\1/
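
To see the difference on the two-line example from the first post, here is a small sketch (the variable names are just for illustration). In sed's default basic regular expressions a group is written \( \); with -E (spelled -r on older GNU sed) plain parentheses do the grouping. A lone N is used here only to pull the second line into the pattern space.

```shell
# Sample input: one quoted line followed by an unquoted continuation line.
sample='"line 1"
line 2'

# Basic regular expressions (sed's default): the group needs backslashes.
bre=$(printf '%s\n' "$sample" | sed 'N;s/\n\([^"]\)/\1/')

# Extended regular expressions (-E): plain parentheses form the group.
ere=$(printf '%s\n' "$sample" | sed -E 'N;s/\n([^"])/\1/')

printf '%s\n' "$bre"
printf '%s\n' "$ere"
```

Both commands print "line 1"line 2 with the l of line 2 kept. Without the backslashes and without -E, the parentheses are taken literally, no group exists, and sed rejects \1 with the "invalid reference" message quoted above.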

grail 03-15-2011 06:00 PM

How about:
Code:

sed -r ':a /^"/{N;s/\n([^"])/\1/};ta' filname.csv > newfilname.csv

Ignotum Per Ignotius 03-15-2011 06:12 PM

Snark's answer's better than my quick 'n dirty effort, since it makes no assumptions about file content.

Follow his advice & ignore mine.

Quote:

I tried
Code:

cat filname.csv | sed -e :a -e '$!Ns/\n[^"]/&/;ta -e 'P;D' > newfilename.csv
but it just hangs.
...Out of interest, whence came this script? It doesn't work properly even with Snark's correction.

If you're interested in the post-mortem, there are a few typos in there: the odd number of single quotes means that the thing will go into interactive mode (you need a single quote after the ta); missing a semicolon after the N prompts sed to grumble about "extra characters after command", and lastly the ampersand causes it to grind to a halt after a couple of lines.
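
A quick way to see what & does without hanging your terminal, as a sketch with made-up sample text: & stands for the whole of whatever the pattern matched, so s/\n[^"]/&/ puts back exactly what it removed. The substitution still counts as successful, which is what keeps the ta branch firing.

```shell
# & expands to the entire matched text, so this wraps the match in brackets.
wrapped=$(printf 'abc\n' | sed 's/b/[&]/')
printf '%s\n' "$wrapped"      # a[b]c

# With & alone the line comes out unchanged, yet the s command still reports
# success: the p flag prints only when a substitution was made.
unchanged=$(printf 'abc\n' | sed -n 's/b/&/p')
printf '%s\n' "$unchanged"    # abc

# No match means no success, so nothing is printed here.
nomatch=$(printf 'abc\n' | sed -n 's/x/&/p')
printf '%s\n' "[$nomatch]"    # []
```

In the loop, t branches whenever the preceding s succeeded. Once the last line has been read, $!N stops adding input, the pattern space never changes, and as long as it still holds a newline followed by a non-quote the s command keeps succeeding, so the script never reaches P;D.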

If you need some good ready-made SED scripts, this is a great place (particularly if you're as idle as I am) --- the script you need is a simple modification of this one

Code:

# if a line begins with an equal sign, append it to the previous line
# and replace the "=" with a single space
sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D'

(which is listed a couple of screens down the page). The only modifications to make to it are to swap the = for a [^"] and to mark out the latter as a sub-expression (using \( \) and \1), which gives you this

Code:

sed -e :a -e '$!N;s/\n\([^"]\)/\1/;ta' -e 'P;D'
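
Here is a sketch of that script run against the ten-line test file posted earlier in the thread, fed in from a shell variable (the names input and joined are only for the example):

```shell
# Test data from earlier in the thread.
input='"line 1"
"line 2"
"line 3"
line 4
"line 5"
"line 6"
line 7
line 8
"line 9"
line 10'

# :a     label marking the top of the loop
# $!N    unless this is the last line, append the next input line
# s///   if the newline is followed by a non-quote, drop the newline
#        and keep that character (\1)
# ta     if the substitution happened, go back to :a and try the next line
# P;D    print and delete the first line of the pattern space, then start
#        over with whatever is left
joined=$(printf '%s\n' "$input" \
  | sed -e :a -e '$!N;s/\n\([^"]\)/\1/;ta' -e 'P;D')

printf '%s\n' "$joined"
```

Lines 4, 7, 8 and 10 end up glued directly onto the quoted line before them, with no space in between, so the sample comes out as six lines.
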
I should also point out that you won't be able simply to pipe in your file and redirect the output back into the same file: the shell empties the output file before sed gets a chance to read it, so chances are you'll end up with an empty file.

If you want to change the file itself, use the -i switch to edit the file in place:

Code:

sed -i -e :a -e '$!N;s/\n\([^"]\)/\1/;ta' -e 'P;D' filename.csv
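
The -i switch is not part of POSIX sed, and BSD sed wants a backup-suffix argument after it. If that is a concern, or you would rather keep the original until the new version is known to be good, one common alternative is to write to a temporary file and then move it over the original. A sketch, in which the scratch directory, file contents and variable names are all invented for the demonstration:

```shell
# Scratch directory and sample file, made up for this demonstration.
workdir=$(mktemp -d)
file="$workdir/filename.csv"
printf '"line 1"\nline 2\n"line 3"\n' > "$file"

# Write the result to a separate file; replace the original only if sed
# exited successfully.
sed -e :a -e '$!N;s/\n\([^"]\)/\1/;ta' -e 'P;D' "$file" > "$file.tmp" \
  && mv "$file.tmp" "$file"

result=$(cat "$file")
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -r "$workdir"
```
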
Hope this answers your question!

Wales over & out.

Ignotum Per Ignotius 03-16-2011 05:45 PM

linuxScriptGirl,

How did you get on? Did our suggestions do what you wanted, or do you require further assistance? If the former, then be a dear and mark the thread [SOLVED]; if the latter, let us have your questions... :)

linuxScriptGirl 03-17-2011 10:18 AM

Quote:

Originally Posted by Snark1994 (Post 4291814)
Nah, we're strangely attractive to everyone else is what it is... isn't it? ;)

EDIT: Darn, forgot to answer the question. You just need to escape the parentheses:
Code:

s/\n\([^"]\)/\1/

=============================
Thanks so much. It is working now!

linuxScriptGirl 03-17-2011 10:20 AM

Quote:

Originally Posted by Ignotum Per Ignotius (Post 4291728)
Hi linuxScriptGirl.

...Seems like you are strangely attractive to Welsh slackers...

I think the answer (well an answer) to your problem is to break it into three operations, since the newline is a bit of a pain. By translating the newline into some obscure character (i.e. a character which one could reliably assume will never appear in your input file), it becomes pretty straightforward. Here's my stab at it, anyway...

Code:

cat filename.csv | tr '\n' '' | sed 's/\([^\"]\)/ \1/g' | tr '' '\n' > filename.csv
I tried it on this file:

Code:

"line 1"
"line 2"
"line 3"
line 4
"line 5"
"line 6"
line 7
line 8
"line 9"
line 10

...and got this:

Code:

"line 1"
"line 2"
"line 3" line 4
"line 5"
"line 6" line 7 line 8
"line 9" line 10

The newline is changed into a space: you can easily eliminate this if you don't want it, by tweaking the sed script.

Nos da cariad... :)

========================================
Thanks for your suggestion. I was able to be successful with the suggestion from SNARK1994. I will keep this handy for the future, though.


All times are GMT -5.