LinuxQuestions.org
Old 03-03-2023, 08:26 AM   #16
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster

Quote:
Originally Posted by astrogeek View Post
You may want to reconsider the thread title: SED syntax question

Great solutions to the problem however!
Yes, the thread veered off course but that's okay. Each member contributed new ideas.

This thread is not marked SOLVED because we might still get back to sed syntax.

I don't understand how to use (. ){4} to indicate "repeat four times." Not knowing the proper name of this language construct, I couldn't do an effective search. Please advise.

Daniel B. Martin

.
 
Old 03-03-2023, 08:29 AM   #17
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,597

Quote:
Originally Posted by pan64 View Post
using two tools instead of only one costs more
It's more accurate to say that using multiple tools has an overhead of transferring data between them, which can reduce performance. Two fast tools with an overhead may still end up faster than a single slow tool.

However, I'm intrigued that the cut+paste solution appears so much faster. If Shruggy was still around he would already have pointed out that mawk is faster than gawk.


Quote:
Do not try to measure performance on a few lines (because you will not be able to produce an interpretable result), but on millions of lines of data.
Or put another way: non-real testing does not predict real world performance.

As a learning exercise, there's no "real world" target here, but it's useful to remember a few things...

* Small amounts of data/iterations can mask random fluctuations, and also don't reveal overheads that affect how code scales.
* There's a difference between one file with a million lines, a million files with one line, and a thousand files with a thousand lines.
* Many systems are complex and simply looping lots of times often does not reflect what a live system will be doing.
* Sometimes the right choice can be to reduce performance of a single task in order to improve overall system performance.

When performance matters, use profiling tools on a controlled replica of the live environment, with accurate data, behaviours, and measurements - and then one gets a better idea of the ideal areas to focus optimization efforts. Of course writing that is easier than doing it - especially if there are unknown/unexpected usage peaks that mean what one thinks is accurate user behaviour doesn't actually apply at the critical times, or affects resources one isn't monitoring.

Anyhow, this is probably veering too far off-track on a thread that started about readability, so I'll stop there. :)

 
Old 03-03-2023, 08:34 AM   #18
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,804

Quote:
Originally Posted by danielbmartin View Post
Yes, the thread veered off course but that's okay. Each member contributed new ideas.

This thread is not marked SOLVED because we might still get back to sed syntax.

I don't understand how to use (. ){4} to indicate "repeat four times." Not knowing the proper name of this language construct, I couldn't do an effective search. Please advise.

Daniel B. Martin

.
This is a simple regexp construct:
Code:
(, ){4}
# means
(, )(, )(, )(, )
That's all. In general you can specify two numbers {x,y}, where x is the minimum and y the maximum number of occurrences.
https://www.regular-expressions.info/repeat.html
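
A quick way to check this at the prompt, as a minimal sketch: it assumes a sed that accepts -E for extended regexps (GNU sed also accepts the -r used elsewhere in this thread), and the input string is made up for illustration.

```shell
# Sample input, made up for illustration.
input='a b c d e'

# (. ){4} and the written-out form match the same text: "a b c d ".
short=$(echo "$input" | sed -E 's/(. ){4}/X/')
long=$(echo "$input" | sed -E 's/(. )(. )(. )(. )/X/')
echo "$short"   # Xe
echo "$long"    # Xe

# {2,3} accepts two or three repetitions and takes three when it can.
range=$(echo "$input" | sed -E 's/(. ){2,3}/X/')
echo "$range"   # Xd e
```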
 
Old 03-03-2023, 08:45 AM   #19
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,597

Quote:
Originally Posted by danielbmartin View Post
I don't understand how to use (. ){4} to indicate "repeat four times." Not knowing the proper name of this language construct, I couldn't do an effective search. Please advise.
The {4} is called a quantifier.

(I use the term numeric quantifier, to differentiate from the shorthand quantifiers (? * +), but I'm not sure how widespread that usage is.)

The issue here is that quantifying a capturing group doesn't give you more groups, it only changes what is stored in the single group.

I've not encountered a regex implementation that has any mechanism for capturing multiple groups without explicitly defining the separate groups.
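
To make the difference concrete, here is a small sketch, assuming a sed that accepts -E and a made-up input string. The quantified group leaves a single back-reference, while writing the groups out gives one back-reference per repetition.

```shell
# Sample input, made up for illustration.
input='a b c d e'

# One quantified group: \1 holds only the last repetition ("d ").
one=$(echo "$input" | sed -E 's/(. ){4}/[\1]/')
echo "$one"    # [d ]e

# Four explicit groups: \1 to \4 each hold one repetition.
four=$(echo "$input" | sed -E 's/(. )(. )(. )(. )/[\1][\2][\3][\4]/')
echo "$four"   # [a ][b ][c ][d ]e
```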


Last edited by boughtonp; 03-03-2023 at 08:58 AM.
 
Old 03-03-2023, 08:54 AM   #20
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,781

Interesting question. I had to test this out.
Yes, the quantifier {x,y} tries to match the preceding character or group as many times as possible (greedy), between x and y times.
And evidently a back-reference to the group gets the last repetition that matched.
The following demonstrates it:
Code:
echo abcdefgh | sed -r 's/(..)*/\1/'
gh
The repetition is matched first; the reference then yields the last repetition that matched.
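
The same behaviour shows up with a bounded quantifier. This is a sketch under the same assumptions (a sed that accepts -E, GNU sed in my case); the brackets in the replacement are only there to make the captured text visible.

```shell
# {1,3} stops after three pairs ("abcdef"), so \1 is the third pair.
bounded=$(echo abcdefgh | sed -E 's/(..){1,3}/[\1]/')
echo "$bounded"     # [ef]gh

# * has no upper bound, so it consumes all four pairs and \1 is the fourth.
unbounded=$(echo abcdefgh | sed -E 's/(..)*/[\1]/')
echo "$unbounded"   # [gh]
```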

Last edited by MadeInGermany; 03-03-2023 at 09:03 AM.
 
Old 03-03-2023, 04:28 PM   #21
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Quote:
Originally Posted by pan64 View Post
You can definitely try it. Using regexp is much slower than ${var:x:y}, so I'm not really sure about that.
Okay, I'll try not to gloat too much, but I believe I am entitled to an "I told you so".
Code:
$ cat bench.bash
#!/bin/bash

yes abcdefgh | head -n 250000 > input.txt

echo time "sed -r 's/(.)./\1 /g' | paste ..."
time sed -r 's/(.)./\1 /g' input.txt | paste -d' ' - input.txt >/dev/null


echo time while read -r line ...
time while read -r line;
do
    echo "${line:0:1} ${line:2:1} ${line:4:1} ${line:6:1} ${line}"
done <input.txt >/dev/null

$ ./bench.bash
time sed -r 's/(.)./\1 /g' | paste ...

real    0m0.504s
user    0m0.478s
sys     0m0.020s
time while read -r line ...

real    0m7.582s
user    0m6.344s
sys     0m1.190s
 
Old 03-04-2023, 03:20 AM   #22
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,804

Quote:
Originally Posted by ntubski View Post
Okay, I'll try not to gloat too much, but I believe I am entitled to an "I told you so".
I wrote a few lines about it, but dropped them. So here they are again:
1. Regex is in general slow; solving a problem without regex will probably be faster (obviously only if that is possible).
2. Forking new processes is extremely expensive (compared to a few lines of code using built-in functions).
3. Any kind of execution depends on the amount of data, the type of data (length, whatever), and the complexity of the regexp, so there is no general rule here.
4. Everything also depends on the available resources and load.
5. Not to mention the cache: whether the tools (files) we use are already cached or not (or whether they are located on a remote host...).
6. Finally, one single measurement is not enough to say anything.

But anyway, you are probably right: bash itself is terribly slow for that kind of operation.
 
  

