LinuxQuestions.org
Old 02-04-2013, 11:08 AM   #16
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,167


Quote:
Originally Posted by schneidz View Post
thanks but it says the function s|\(^.*\) \(.*\)|s/\1\/\2/g| cannot be parsed
Okay, let's try again by breaking that sed into smaller pieces, hoping they will be "digestible" by aix.
Code:
 sed 's/^/s\//' $InFile1 \
|tr " " "/"              \
|sed 's/$/\/g/'          \
|sed -f - $InFile2 > $OutFile3
Daniel B. Martin
 
Old 02-04-2013, 11:12 AM   #17
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 4,216

Original Poster
Quote:
Originally Posted by danielbmartin View Post
Okay, let's try again by breaking that sed into smaller pieces, hoping they will be "digestible" by aix.
Code:
 sed 's/^/s\//' $InFile1 \
|tr " " "/"              \
|sed 's/$/\/g/'          \
|sed -f - $InFile2 > $OutFile3
Daniel B. Martin
once again thanx, but according to the previous error it seems like aix sed cant read from stdin: sed: 0602-420 Cannot open pattern file -.
 
Old 02-04-2013, 11:20 AM   #18
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,167

Quote:
Originally Posted by schneidz View Post
once again thanx, but according to the previous error it seems like aix sed cant read from stdin: sed: 0602-420 Cannot open pattern file -.
Not giving up yet! This version uses an intermediate file.
Code:
 sed 's/^/s\//' $InFile1 \
|tr " " "/"              \
|sed 's/$/\/g/'          \
> $Work1
sed -f $Work1 $InFile2 > $OutFile4
Daniel B. Martin
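
To make the intermediate-file approach concrete, here is a minimal sketch on a tiny made-up data set. The temporary directory and the file names pairs.txt, source.txt and script.sed are placeholders for illustration, not names used in the thread.

```shell
# Hypothetical sample data; file names are placeholders.
tmp=$(mktemp -d)
printf 'hello world\nsimon zelda\n' > "$tmp/pairs.txt"
printf 'hello my name is simon\n' > "$tmp/source.txt"

# Turn each "old new" pair into a sed command of the form s/old/new/g.
sed 's/^/s\//' "$tmp/pairs.txt" \
| tr ' ' '/' \
| sed 's/$/\/g/' \
> "$tmp/script.sed"

# Show the generated script: one substitution command per pair.
cat "$tmp/script.sed"

# Apply the script from a regular file, which avoids "sed -f -".
result=$(sed -f "$tmp/script.sed" "$tmp/source.txt")
echo "$result"
```

The generated script holds one s/old/new/g command per line of the pairs file, and sed applies them in order to every line of the source.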
 
Old 02-04-2013, 11:34 AM   #19
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 4,216

Original Poster
Quote:
Originally Posted by danielbmartin View Post
Not giving up yet! This version uses an intermediate file.
Code:
 sed 's/^/s\//' $InFile1 \
|tr " " "/"              \
|sed 's/$/\/g/'          \
> $Work1
sed -f $Work1 $InFile2 > $OutFile4
Daniel B. Martin
thanx, i tried it with a 22 line infile1 and a 7 line infile2 and it seems to work well.
now i will time it using the large datasets and see what happens.

thanks a lot (even if unsuccessful, at least i learned a bit more about sed).
 
Old 02-04-2013, 11:53 AM   #20
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,167

Quote:
Originally Posted by schneidz View Post
thanx, i tried it with a 22 line infile1 and a 7 line infile2 and it seems to work well.
now i will time it using the large datasets and see what happens.
Suggestion: test timidly. Start with a full-size InFile1 and an InFile2 which is a 10% subset of the real thing. Then 20%, then 30%. It will be instructive if the execution time increases linearly.

Daniel B. Martin
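
One way to run that kind of incremental test is sketched below. Everything here is a stand-in: the one-line script.sed, the generated source.txt, and the use of `date +%s` for timing (which is assumed to be available; it is common but not guaranteed on every system).

```shell
# Hypothetical stand-ins for the real substitution list and source file.
tmp=$(mktemp -d)
printf 's/7/seven/g\n' > "$tmp/script.sed"
seq 1 1000 > "$tmp/source.txt"

total=$(wc -l < "$tmp/source.txt")
for pct in 10 20 30; do
    # Take the first pct percent of the source as the test subset.
    lines=$(( total * pct / 100 ))
    head -n "$lines" "$tmp/source.txt" > "$tmp/subset_$pct.txt"

    # Time the full substitution list against the subset.
    begin=$(date +%s)
    sed -f "$tmp/script.sed" "$tmp/subset_$pct.txt" > /dev/null
    finish=$(date +%s)
    echo "$pct% ($lines lines): $(( finish - begin )) s"
done
```

If the reported times grow roughly in step with the subset size, a full run can be estimated before committing to it.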
 
Old 02-04-2013, 12:53 PM   #21
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,698

If you would like to keep the initial changes all in sed, you could try:
Code:
sed 's/\(^\|$\| \)/\//g;s/^/s/' infile > workfile
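
Note that `\|` inside a basic regular expression is a GNU extension, so a non-GNU sed may not accept the one-liner above. A sketch of an equivalent that sticks to plain substitutions is shown here; the sample pairs and the temporary directory are made up for illustration.

```shell
# Hypothetical sample input: one "old new" pair per line.
tmp=$(mktemp -d)
printf 'hello world\nsimon zelda\n' > "$tmp/infile"

# Same result without alternation: replace the space with a slash,
# then prepend "s/" and append "/g".
sed -e 's/ /\//g' -e 's/^/s\//' -e 's/$/\/g/' "$tmp/infile" > "$tmp/workfile"

cat "$tmp/workfile"
```

Each input line such as "hello world" comes out as s/hello/world/g, the same form the tr-based pipeline produces.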
 
Old 02-04-2013, 01:43 PM   #22
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 4,216

Original Poster
Quote:
Originally Posted by danielbmartin View Post
Suggestion: test timidly. Start with a full-size InFile1 and an InFile2 which is a 10% subset of the real thing. Then 20%, then 30%. It will be instructive if the execution time increases linearly.

Daniel B. Martin
it takes about 2 minutes to cross-correlate a list of 10 substitutions against the large file.


however i get an error like:
Code:
time sed -f sed.f dataset.txt > dataset.sub
sed: 0602-405 There are too many commands for the s/123456789/schneidz5/g function.
when i try to do all the substitutions.

edit: 100 substitutions took about 8 and 1/2 minutes. i tried with 1000 but i got the error above.
(1 substitution took about 1minute 8seconds. so its not linear... its like a bulk discount)

Last edited by schneidz; 02-04-2013 at 01:59 PM.
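
A rough reading of those three timings, treating them as a fixed cost plus a per-substitution cost, can be worked out as below. The figures are the rounded ones from the post converted to seconds, so the result is only a ballpark estimate.

```shell
# Timings reported in the post, converted to seconds.
t1=68      # 1 substitution: 1 min 8 s
t10=120    # 10 substitutions: about 2 min
t100=510   # 100 substitutions: about 8.5 min

# Marginal cost of each additional substitution over the 1-substitution run.
report=$(awk -v t1="$t1" -v t10="$t10" -v t100="$t100" 'BEGIN {
    printf "1 -> 10 subs:  %.1f s per extra substitution\n", (t10 - t1) / 9
    printf "1 -> 100 subs: %.1f s per extra substitution\n", (t100 - t1) / 99
}')
echo "$report"
```

On these numbers the "bulk discount" looks like a fixed cost of about a minute for one pass over the large file, plus roughly 4.5 to 5.8 seconds for each additional substitution.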
 
Old 02-04-2013, 03:28 PM   #23
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,167

Quote:
Originally Posted by schneidz View Post
... however i get an error like:
Code:
time sed -f sed.f dataset.txt > dataset.sub
sed: 0602-405 There are too many commands for the s/123456789/schneidz5/g function.
when i try to do all the substitutions.

edit: 100 substitutions took about 8 and 1/2 minutes. i tried with 1000 but i got the error above.
(1 substitution took about 1minute 8seconds. so its not linear... its like a bulk discount)
100 substitutions ran; 1000 did not. It may be expedient (though not elegant) to run 500 subs at a time until the whole task is accomplished. This might be done with a loop in which each iteration chews off the next 500 lines of File1, and makes all the substitutions in File2.

500 is a guess, maybe the upper limit is a lower number.

There is light at the end of this tunnel!

Daniel B. Martin
 
Old 02-04-2013, 04:05 PM   #24
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 4,216

Original Poster
Quote:
Originally Posted by danielbmartin View Post
100 substitutions ran; 1000 did not. It may be expedient (though not elegant) to run 500 subs at a time until the whole task is accomplished. This might be done with a loop in which each iteration chews off the next 500 lines of File1, and makes all the substitutions in File2.

500 is a guess, maybe the upper limit is a lower number.

There is light at the end of this tunnel!

Daniel B. Martin
yes i am in the process of haxing something together using split grep and sed. so far looks promising.
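
The post does not show the script, so the following is only a guess at what a split-based version could look like. It assumes the substitution commands are already in a file, uses a chunk size of 2 in place of 500, and all file names are invented for the example.

```shell
# Hypothetical sample data; 2 commands per chunk stands in for 500.
tmp=$(mktemp -d)
printf 's/hello/world/g\ns/l33tz/h4x0r/g\ns/chunl/akuma/g\ns/quest/tribe/g\ns/salad/carot/g\n' > "$tmp/script.sed"
printf 'hello simon, have some salad\n' > "$tmp/current.txt"

# split -l N writes pieces named <prefix>aa, <prefix>ab, and so on.
split -l 2 "$tmp/script.sed" "$tmp/chunk."

# Apply each chunk in turn, feeding the output of one pass into the next.
for chunk in "$tmp"/chunk.*; do
    sed -f "$chunk" "$tmp/current.txt" > "$tmp/next.txt"
    mv "$tmp/next.txt" "$tmp/current.txt"
done

cat "$tmp/current.txt"
```

Because each pass reads the whole source file again, smaller chunks mean more passes; the chunk size is a trade-off between the sed command limit and total run time.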
 
Old 02-05-2013, 12:15 AM   #25
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,698

Instead of split, grep and sed, maybe a simple awk can prepare your files:
Code:
awk '!(NR%500){n++}{print "s/"$1"/"$2"/g" > "workfile" n}' infile
Now you can simply loop through the files and use your sed -f option. Simply change 500 to whatever you find to be an acceptable number of changes.

Last edited by grail; 02-05-2013 at 12:16 AM.
 
Old 02-06-2013, 11:36 AM   #26
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,167

Quote:
Originally Posted by grail View Post
Instead of split, grep and sed, maybe a simple awk can prepare your files:
Code:
awk '!(NR%500){n++}{print "s/"$1"/"$2"/g" > "workfile" n}' infile
Now you can simply loop through the files and use your sed -f option. Simply change 500 to whatever you find to be an acceptable number of changes.
I like this idea and attempted to construct a simple test case, but cannot make it work.
Code:
# Create a test file which contains 100 lines,
#   each of the form (number) XXXXX,
#   and break it into 5 equal segments.
seq -w 100          \
|sed 's/$/ XXXXX/'  \
 > $Work3
for ((pass=1;pass<=5;pass=pass+5))
do
  echo "This is loop iteration # $pass"
# grail said: awk '!(NR%500){n++}{print "s/"$1"/"$2"/g" > "workfile" n}' infile
              awk '!(NR%20) {n++}{print "s/"$1"/"$2"/g" > "$Work4"   n}' $Work3
  echo; echo "Segment $pass of input file Work3 ..."; cat $Work4
done
File Work3 is created as desired but the awk isn't producing Work4.
This is what happened.
Code:
This is loop iteration # 1

Segment 1 of input file Work3 ...
cat: /home/daniel/Desktop/LQfiles/dbm614w04.txt: No such file or directory
Please advise.

Daniel B. Martin
 
Old 02-06-2013, 01:38 PM   #27
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 2,543

Quote:
Originally Posted by danielbmartin View Post
Code:
# grail said: awk '!(NR%500){n++}{print "s/"$1"/"$2"/g" > "workfile" n}' infile
              awk '!(NR%20) {n++}{print "s/"$1"/"$2"/g" > "$Work4"   n}' $Work3
awk doesn't see shell variables inside a single-quoted program; you need something like:
Code:
awk -vWork4="$Work4" '!(NR%20) {n++}{print "s/"$1"/"$2"/g" > (Work4 n)}' $Work3
# or some trickiness with quoting:
awk '!(NR%20) {n++}{print "s/"$1"/"$2"/g" > ("'"$Work4"'" n)}' $Work3
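
A small sketch of why the original attempt produced no Work4 file: inside the single-quoted awk program, "$Work4" is an ordinary string literal, so the output most likely went to a file literally named $Work4 in the current directory. The path and sample pairs below are made up for the demonstration.

```shell
# Hypothetical path and sample data.
tmp=$(mktemp -d)
Work4="$tmp/work4_"
printf 'a b\nc d\n' > "$tmp/pairs.txt"

# The quoted "$Work4" is a literal string to awk; the -v variable holds the real path.
out=$(awk -v Work4="$Work4" 'BEGIN {
    print "literal:  " "$Work4"
    print "variable: " Work4
}')
echo "$out"

# With the value passed in, each record goes to <Work4><record number>.
awk -v Work4="$Work4" '{ print "s/" $1 "/" $2 "/g" > (Work4 NR) }' "$tmp/pairs.txt"
cat "${Work4}1"
```

The parentheses around the redirection target make the concatenation explicit, which avoids relying on how a particular awk parses an unparenthesized expression after `>`.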
 
Old 02-06-2013, 09:54 PM   #28
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,167

Quote:
Originally Posted by ntubski View Post
... you need something like:
Code:
awk '!(NR%20) {n++}{print "s/"$1"/"$2"/g" > ("'"$Work4"'" n)}' $Work3
Thank you for getting me over that hurdle. The code runs but does not produce the expected output. For ease of testing I scaled back to a source file with only 9 lines and code which attempts to parcel them out 3 at a time.

This code ...
Code:
# Create a test file which contains 9 lines,
#   each of the form (number) XXXXX,
#   and break it into 3 equal segments.
seq -w 9 |sed 's/$/ XXXXX/' > $Work3
for ((pass=1;pass<=3;pass++))
do
  rm $Work5
  echo "This is loop iteration # $pass"
  awk '!(NR%4) {n++} {print "s/"$1"/"$2"/g" > ("'"$Work5"'" n)}' $Work3
  echo "Work5 ..."; cat $Work5              
done
... produced this result ...
Code:
This is loop iteration # 1
Work5 ...
s/1/XXXXX/g
s/2/XXXXX/g
s/3/XXXXX/g
This is loop iteration # 2
Work5 ...
s/1/XXXXX/g
s/2/XXXXX/g
s/3/XXXXX/g
This is loop iteration # 3
Work5 ...
s/1/XXXXX/g
s/2/XXXXX/g
s/3/XXXXX/g
Observe that it dished out the same three lines on each iteration.

Please advise.

Daniel B. Martin
 
Old 02-06-2013, 11:33 PM   #29
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 2,543

Quote:
Originally Posted by danielbmartin View Post
Thank you for getting me over that hurdle. The code runs but does not produce the expected output. For ease of testing I scaled back to a source file with only 9 lines and code which attempts to parcel them out 3 at a time.
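The awk code grail proposed already outputs to separate files in a single run, so the outer loop just regenerates the same files and shows the first one each time. Try this: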
Code:
seq -w 9 | sed 's/$/ XXXXX/' > "$Work3"

# modified n++ condition to avoid small hiccup on the first parcel
awk '(n*3 < NR) {n++} {print "s/"$1"/"$2"/g" > ("'"$Work5"'" n)}' "$Work3"

for work in "$Work5"* ; do
    echo "$work ..."
    cat "$work"
done
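
The "small hiccup" can be seen by running both conditions on the same nine lines and counting what lands in each file. This is only an illustration; the prefixes a and b and the temporary directory are invented, and the parcel size is 3.

```shell
# Nine numbered lines as hypothetical input.
tmp=$(mktemp -d)
seq 1 9 > "$tmp/lines.txt"

# Original condition: n changes on every 3rd record, so record 3
# already starts the second file and the first file gets only 2 lines.
awk -v prefix="$tmp/a" '!(NR % 3) { n++ } { print > (prefix n) }' "$tmp/lines.txt"

# Modified condition: n changes when the current file is full,
# so every file gets exactly 3 lines.
awk -v prefix="$tmp/b" '(n * 3 < NR) { n++ } { print > (prefix n) }' "$tmp/lines.txt"

wc -l "$tmp"/a* "$tmp"/b*
```

With the original condition the files hold 2, 3, 3 and 1 lines (the first one has no numeric suffix because n is still unset); with the modified condition there are three files of 3 lines each.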
 
Old 02-08-2013, 12:52 PM   #30
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,167

InFile1 ...
Code:
hello world
l33tz h4x0r
chunl akuma
quest tribe
salad carot
simon zelda
InFile2 ...
Code:
hello my name is simon, and i like to do drawings; simon says.
lemonade was a popular drink in my day, and it still is.
g0t r00tz third-line: chunli akuma ken ryu sakura
third-line: choppin broccoli -- helloproject2501helloceltics#35hello123
you dont win friends with salad
first-line: deus ex second-line: counter strike v1.6 third-line: burden of 80 proof fourth-line: battle field 2
first-line: a tribe called quest - midnite marauders second-line: the perceptionists - black dialog third-line: buju banton - rasta got soul
Code ...
Code:
#
# Method of LQ Member danielbmartin #14 using sed
#   to break the change-pairs file into pieces,
#   and apply each piece individually to the source file.
#
# Rework InFile1 (the change pairs) into substitution pairs
#   for subsequent use by a "sed -f".
 sed 's/^/s\//' $InFile1 \
|tr " " "/"              \
|sed 's/$/\/g/'          \
> $Work01
# Make a copy of InFile2 (the source file), which will be 
#   incrementally transformed to the desired end product.
cat $InFile2 > $OutFile14
start=1
step=4   # step = number of lines in each subset
for ((start=1;;start=start+step))
do
  let stop=start+step-1
# Use sed to create Work09, a subset of the change file.
  sed $start','$stop'!d' $Work01 > $Work09
# If Work09 is an empty file, leave this for-loop.
# This escapes from what would otherwise be an infinite loop.
  if [ ! -s $Work09 ]; then break; fi
  echo; echo "Now applying this subset of the change file..."; cat $Work09
  sed -f $Work09 $OutFile14 > $Work14
  cat $Work14 > $OutFile14
done
This code applies the change-pairs 4 at a time.
In production use you would change the value of variable step to 300, 400, 500, whatever value your system can handle.
In production use you would disable the echo statements which are used for explanation.

Execution produced this on-screen display ...
Code:
Now applying this subset of the change file...
s/hello/world/g
s/l33tz/h4x0r/g
s/chunl/akuma/g
s/quest/tribe/g

Now applying this subset of the change file...
s/salad/carot/g
s/simon/zelda/g
... and produced this end product ...
Code:
world my name is zelda, and i like to do drawings; zelda says.
lemonade was a popular drink in my day, and it still is.
g0t r00tz third-line: akumai akuma ken ryu sakura
third-line: choppin broccoli -- worldproject2501worldceltics#35world123
you dont win friends with carot
first-line: deus ex second-line: counter strike v1.6 third-line: burden of 80 proof fourth-line: battle field 2
first-line: a tribe called tribe - midnite marauders second-line: the perceptionists - black dialog third-line: buju banton - rasta got soul
Daniel B. Martin
 