LinuxQuestions.org


vincix 03-03-2017 04:01 PM

print only changed line with sed after double substitution
 
Hi,

As the title says, I'm trying to print only the lines that have been changed with sed after a double substitution.
This is the text file:
Code:

The grand old Duke of York
He had ten thousand men
He marched them up to the top of the hill
And he marched them down again
And when they were up they were up
And when they were down they were down
And when they were only half-way up
They were neither up nor down

Normally I know that I need to use the -n option with the p flag, but in this case I'm trying to print only the lines that were altered by both substitutions. The two substitutions are 's/down/DOWN/' and 's/up/UP/'.
So for instance, sed -e 's/up/UP/' -e 's/down/DOWN/' file.txt will display the whole file, including the lines which haven't been altered.

sed -n -e 's/up/UP/p' -e 's/down/DOWN/p' file.txt won't work either, because lines containing both 'up' and 'down' are going to be displayed twice (once for each substitution).
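
The double printing is easy to reproduce. This is only a minimal sketch: it uses four made-up sample lines instead of the rhyme, and the `sample` and `dup` variable names are just for illustration. Each p flag fires independently, so a line matched by both substitutions is printed once after each of them.

```shell
# Made-up sample input (not the rhyme from the post).
sample=$(printf '%s\n' 'up only' 'down only' 'up and down' 'neither')

# Each successful substitution prints the line through its own p flag,
# so the 'up and down' line is printed twice.
dup=$(printf '%s\n' "$sample" | sed -n -e 's/up/UP/p' -e 's/down/DOWN/p')
printf '%s\n' "$dup"
```

The first copy of the doubled line is printed after only the first substitution has been applied, so the two copies are not even identical.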

So how should I go about this problem?

syg00 03-03-2017 04:15 PM

Use regex to do the substitution for both in one stanza.

vincix 03-04-2017 01:43 AM

I can only think of the \U option, which turns what it matches into uppercase. I don't want the solution, but I'd like to know in principle how I could actually use regex for two different strings and apply the same action to both. I was thinking of something like 's/"up|down"/\U/', which doesn't work because it interprets them literally. And neither would \| work between "up" and "down" :)

Turbocapitalist 03-04-2017 02:16 AM

Quote:

Originally Posted by vincix (Post 5678742)
So how should I go about this problem?

You probably need to clarify the problem a little more. You can do a lot with t, b, and : alone. The t will branch if s/up/&/; succeeds, though in effect it is just a check.

If you are only ever going to be using GNU sed then you can do it more concisely with T instead and skip the jumping.
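
A rough sketch of the T variant, assuming GNU sed and assuming the goal is to print only lines changed by both substitutions. The sample lines and the `sample` and `both` variable names are made up for illustration.

```shell
# Made-up sample input.
sample=$(printf '%s\n' 'up only' 'down only' 'up and down' 'neither')

# GNU sed only: T jumps to the end of the script when s/up/UP/ did NOT succeed.
# With -n nothing is printed there, so a line is printed only if it reaches
# the second substitution and that one succeeds too (via its p flag).
both=$(printf '%s\n' "$sample" | sed -n 's/up/UP/; T; s/down/DOWN/p')
printf '%s\n' "$both"
```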

syg00 03-04-2017 02:19 AM

Perhaps you shouldn't be so keen to reject options. You might be pleasantly surprised.

Some tips if I might:
- look at "-r"
- check the (GNU) doco for the first few sentences describing the "s" command. Particularly re the matched portion of the pattern space.

(no need for branching in this case)

vincix 03-04-2017 02:58 AM

I'm not rejecting options. After all -n and -e are options. It's just that you suggested using regex and I still don't know exactly how I could solve the problem through regex alone. Yes, actually, I've already looked at -r, and sed does interpret | as or, but that doesn't seem to be the right solution. Anyway, I haven't heard of t, b, or :, so I'll have to read a little bit more.

astrogeek 03-04-2017 03:00 AM

Think like this...

Code:

$ sed -rn /ADDRESS/s/MATCH/REPLACE/gp
...using the hints provided by syg00. It works.

I had to use the ADDRESS to get only the desired line(s), then using the not so subtle 's' hint provided by syg00 and a previously mentioned operator provided the right result.

Not to give everything away, here is an obfuscated example with the text in ud.txt:

Code:

$ sed -rn '/.../s/up|down/.../gp' ud.txt
They were neither UP nor DOWN

Perhaps syg00 has seen a way to get it without the address...?

MadeInGermany 03-04-2017 03:12 AM

With the t and d commands
Code:

sed '
s/up/UP/
t s2
d
:s2
s/down/DOWN/
t
d
' file.txt

With awk
Code:

awk 'sub(/up/,"UP") && sub(/down/,"DOWN")' file.txt
If you want to replace multiple ups and downs per line then you need the g modifier in sed or gsub in awk.
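
To make the difference concrete, here is a small sketch comparing sub and gsub on made-up lines (the `sample`, `first` and `all` names are illustrative only):

```shell
# Made-up sample input with repeated words on one line.
sample=$(printf '%s\n' 'up up' 'up down up down' 'neither')

# sub replaces only the first occurrence of each word on a line.
first=$(printf '%s\n' "$sample" | awk 'sub(/up/,"UP") && sub(/down/,"DOWN")')

# gsub replaces every occurrence; its return value (the number of
# replacements) is still usable as the print condition.
all=$(printf '%s\n' "$sample" | awk 'gsub(/up/,"UP") && gsub(/down/,"DOWN")')

printf '%s\n%s\n' "$first" "$all"
```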

syg00 03-04-2017 04:19 AM

Quote:

Originally Posted by vincix (Post 5678872)
I don't want the solution, but I'd like to know in principle how I could actually use regex

This is the approach that will encourage contributions. You are making the effort, I am happy to help. If you eventually feel lost, ask and I will supply my solution. It may not be correct, or sufficient, but hopefully we may all learn something by the exercise.

vincix 03-04-2017 12:42 PM

This is what I came up with:
Code:

sed -nE '/up|down/s/up|down/\U/gp' duke.txt
The problem is that "U" is interpreted as literal "U", and it doesn't convert to uppercase letters. How do I make sed interpret it correctly?

By the way, I think the correct option was -E, not -r, in order to make sed interpret extended regex. I was referring to -E when I said that sed was eventually interpreting | as "or".

Turbocapitalist 03-04-2017 01:22 PM

You'll probably use an ampersand & instead of \U

Here's another alternative:

Code:

sed -e '/down/s/up/&/; t; d;' duke.txt
Though neither example does much with regex; they are more about sed programming.

The t is a conditional jump. When used without a destination it defaults to a jump to the end of the sed script.
Thus if the // pattern matches AND the s/// substitution succeeds, hop over the command to delete the line.
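
The same script spread over several lines with comments, as a sketch on made-up sample lines (the `sample` and `kept` variable names are illustrative):

```shell
# Made-up sample input.
sample=$(printf '%s\n' 'up only' 'down only' 'up and down' 'neither')

kept=$(printf '%s\n' "$sample" | sed '
# Only on lines containing "down": replace "up" with itself (a pure test).
/down/s/up/&/
# t with no label: if that substitution succeeded, jump to the end of the
# script, where the line is printed automatically.
t
# Otherwise delete the line, so nothing is printed for it.
d
')
printf '%s\n' "$kept"
```

Because the replacement is just &, the surviving line comes out unchanged; swap in real replacement text if the line should be altered as well.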

vincix 03-04-2017 01:29 PM

I don't insist doing it with regex (only). syg00 had suggested it at the beginning of the thread and that's why I was curious. I'm fine with using sed options. The question is, why doesn't \U work? I've seen several examples on the internet.

P.S. Only now did I see that on mac it works only with -E (for extended regex), but on CentOS it seems to be working with -r (only?).

Turbocapitalist 03-04-2017 01:33 PM

In which context have you seen \U mentioned? I don't see it in the regex manual or in the manual for sed itself.

Code:

man 7 regex
man sed

Though \U does have a meaning in perl's pattern matching

Code:

man perlre

vincix 03-04-2017 01:34 PM

https://www.gnu.org/software/sed/man...s_0022-Command

And yes, I was working with sed on mac, and now I see it's behaving slightly differently on CentOS 7 when using \U. It doesn't interpret it as a literal \U, but it still doesn't work. It simply deletes both matches ("up" and "down").
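
The deleting behaviour can be reproduced in isolation. A minimal sketch, assuming GNU sed (on a sed without this extension the result will differ); the variable names are made up for illustration:

```shell
line='up and down'

# GNU sed: \U switches on uppercasing for the rest of the replacement,
# but the replacement contains no other text, so each match becomes empty.
without_amp=$(printf '%s\n' "$line" | sed -E 's/up|down/\U/g')

# & stands for the matched text, so \U& puts it back, in uppercase.
with_amp=$(printf '%s\n' "$line" | sed -E 's/up|down/\U&/g')

printf '[%s]\n[%s]\n' "$without_amp" "$with_amp"
```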

Turbocapitalist 03-04-2017 01:44 PM

If you want portability you'll need to give up on \U in sed scripting and stay closer to POSIX.

Code:

sed -e 's/up/UP/g' -e 't up' -e 'd' -e ':up' -e 's/down/DOWN/g' -e 't' -e 'd' duke.txt

astrogeek 03-04-2017 02:23 PM

Here is my complete solution, just for... completeness...

Code:

$ sed -rn '/up.*down/s/up|down/\U&/gp' ud.txt
They were neither UP nor DOWN

This is GNU of course, on Slackware, mileage may vary on the mac.
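
One thing to watch with the /up.*down/ address: it only selects lines where "up" comes before "down". If the order can vary, nesting two addresses may be a safer sketch. This assumes GNU sed (for -r and \U); the sample lines and variable names are made up.

```shell
# Made-up sample input, including a line with the words in reverse order.
sample=$(printf '%s\n' 'up only' 'up and down' 'down then up')

# Address that requires "up" before "down".
ordered=$(printf '%s\n' "$sample" | sed -rn '/up.*down/s/up|down/\U&/gp')

# Nested addresses: the line must contain both words, in any order.
either=$(printf '%s\n' "$sample" | sed -rn '/up/{ /down/s/up|down/\U&/gp; }')

printf '%s\n--\n%s\n' "$ordered" "$either"
```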

vincix 03-04-2017 02:28 PM

That's good to know, but that's not exactly what I wanted it to display. I wanted it to display ALL altered lines only ONCE. The problem is that with my initial solution, even though it displays lines that contain either 'up' or 'down', lines that contain both 'up' and 'down' are displayed twice (which is quite obvious behaviour: first it changes up to UP and displays the line as it is, and then that line is in turn altered by changing down to DOWN, so that's the second time). So your solution only solves this part of the problem, but not the whole problem through one command.

astrogeek 03-04-2017 02:36 PM

Quote:

Originally Posted by vincix (Post 5679141)
I wanted it to display ALL altered lines only ONCE.

Then the rules seem to have been changed...

Quote:

Originally Posted by vincix (Post 5678742)
in this case I'm trying to print only the lines that were altered by both substitutions. The two substitutions are 's/down/DOWN/' and 's/up/UP/'.

...which will be only the one line.

If you want to also show lines which only include one of the words, simply remove the address...

Code:

$ sed -rn 's/up|down/\U&/gp' ud.txt
He marched them UP to the top of the hill
And he marched them DOWN again
And when they were UP they were UP
And when they were DOWN they were DOWN
And when they were only half-way UP
They were neither UP nor DOWN


vincix 03-04-2017 02:45 PM

I don't see any difference in meaning between the two quotes, but ok. Come to think about it, though, I understand now why the initial post might have been ambiguous: you're thinking that I'm talking about both substitutions on the same line, but I was referring to the file as a whole. Anyway, now we're talking about the same thing. It's much easier than I initially thought.

What is the role of & after U?

astrogeek 03-04-2017 02:49 PM

Perhaps there is a language barrier, but those sentences are quite different to my understanding.

Quote:

Originally Posted by vincix (Post 5679155)
What is the role of & after U?

Did you not read post #5?

vincix 03-04-2017 02:57 PM

Oh, yes, I know about &. I guess I thought differently about it in this context because of the \U. So \U alters the matched pattern.
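
A small sketch of the two pieces separately: & on its own simply re-inserts the match, and \E ends a \U conversion so later replacement text is left alone. The \U and \E parts assume GNU sed, and the variable names are made up.

```shell
line='up and down'

# & alone: wrap each match in brackets without changing its case.
plain=$(printf '%s\n' "$line" | sed -E 's/up|down/[&]/g')

# GNU sed: \U uppercases the match, \E stops the conversion,
# so the literal suffix "ward" stays lowercase.
scoped=$(printf '%s\n' "$line" | sed -E 's/up/\U&\Eward/')

printf '%s\n%s\n' "$plain" "$scoped"
```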

Yeah, I guess you're right about the sentence, I realised now after rereading it.

astrogeek 03-04-2017 03:13 PM

I have to admit that even though I use sed daily, I did not see this case until reading syg00's hint. I think that was the pleasant surprise they mentioned.

Using sed is a lot like learning to ride a bicycle, skill improves only with number of attempts!

If your question is sufficiently answered then please mark the thread as solved.

Good luck!

MadeInGermany 03-06-2017 04:28 AM

With the OR condition, the single-substitution approach is nearly impossible with a non-GNU sed, which lacks \U.
With awk
Code:

awk '{ p1=sub(/up/,"UP"); p2=sub(/down/,"DOWN") } (p1 || p2)' file.txt
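
For comparison, a plain-sed sketch of the same OR logic, using two separate substitutions instead of alternation and \U. It relies on t testing whether any substitution has succeeded since the line was read. The sample lines and variable names are made up.

```shell
# Made-up sample input.
sample=$(printf '%s\n' 'up only' 'down only' 'up and down' 'neither')

# Run both substitutions, then: t jumps to the end (line printed once) if
# either one succeeded; otherwise d deletes the unchanged line.
changed=$(printf '%s\n' "$sample" | sed -e 's/up/UP/g' -e 's/down/DOWN/g' -e t -e d)
printf '%s\n' "$changed"
```

Each changed line is printed exactly once because printing happens only at the end of the script, not per substitution.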

vincix 03-06-2017 05:54 AM

@MadeInGermany I did see your previous solution (you understood it the same way astrogeek did, which was linguistically correct, even if it wasn't exactly my intention), but I wanted to go through each step by myself as much as possible. That's why I tried not to take it into consideration until I reached a later stage.

After I finish (as it were) with sed, I'll go on to learn some awk, too. I had already started it, but I combined it with some bash and so on, and so forth.


All times are GMT -5.