LinuxQuestions.org
Old 07-14-2010, 03:24 PM   #1
jhurley03
LQ Newbie
 
Registered: Jul 2010
Posts: 5

Rep: Reputation: 0
Shell Scripting


I need a utility which will scan in a text file and search and replace strings. I also want to keep track of how many strings I've replaced. The following is an example paragraph and which text to replace.

"Bill Gates" with "Mr. Noodles",
"Gates" with "Noodles",
"Paul Allen" or "Allen" as "The Other Guy",
"Altair" as "WTF",
"Apple" as "The Prodigal Son",
"Tandy" as "Who?"
and "Microsoft" as "Blue Screen of Death"

"Bill Gates was born in Seattle in 1955, the second of three children in a well-to-do family. His father, William H. Gates II, was a lawyer, while his mother, Mary Gates, was a teacher, a regent of the University of Washington, and member of several corporate boards. Gates was first exposed to computers at school in the late 1960s with his friend Paul Allen, the son of two Seattle librarians. By the time Gates was 14, the two friends were writing and testing computer programs for fun and profit.
In 1972 they established their first company, Traf-O-Data, which sold a rudimentary computer that recorded and analyzed traffic data. Allen went on to study computer science at the University of Washington and then dropped out to work at Honeywell, while Gates enrolled at Harvard. Inspired in 1975 by an issue of Popular Electronics that showed the new Altair microcomputer kit just released by MITS Computer, Gates and Allen wrote a version of BASIC for the machine. Later that year Gates left college to work full time developing programming languages for the Altair, and he and Allen relocated to Albuquerque, New Mexico, to be near MITS Computer, where Allen took a position as director of software development. Gates and Allen named their partnership Micro-soft. Their revenues for 1975 totaled $16,000.
A year later, Gates published "An Open Letter to Hobbyists" in the Altair newsletter, in which he enjoined users to avoid illegally copied software. Arguing that software piracy prevented "good software from being written," Gates wrote prophetically, "Nothing would please me more than being able to hire ten programmers and deluge the hobby market with good software." In November 1976 Allen left MITS to devote his full attention to Microsoft, and the company's tradename was registered. In 1977 Apple and Radio Shack licensed Microsoft BASIC for their Apple II and Tandy computers, with the Apple license going for a flat fee of $21,000. As Apple sold a million machines complete with BASIC, Microsoft's unit revenues dropped to two cents a copy.
That same year Microsoft released its second programming language, Microsoft FORTRAN, which was followed in 1978 by a version of COBOL. Both were written for the CP/M operating system, one of many available in the rapidly expanding but still unstandardized microcomputer market. As CP/M was adopted by computer manufacturers including Sirius, Zenith, and Sharp, Microsoft became the leading distributor for microcomputer languages. By the end of 1978 Microsoft had 13 employees, a sales subsidiary in Japan, and $1 million in revenues. The following year Gates and Allen moved the company to Bellevue, Washington."
 
Old 07-14-2010, 04:29 PM   #2
sntnlz
Member
 
Registered: Jun 2005
Location: Virginia, USA
Distribution: Mageia/Kali/Ubuntu/openSUSE/Manjaro/Pop!_OS
Posts: 39

Rep: Reputation: 18
Try "gawk". If you're not familiar with it, go to http://www.gnu.org/manual/gawk/gawk.html for an introduction.
 
Old 07-14-2010, 05:48 PM   #3
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Rep: Reputation: 757
How about
Code:
sed -e 's/Bill Gates/Mr\. Noodles/g;s/Gates/Noodles/g;s/Paul Allen/The Other Guy/g;s/Allen/The Other Guy/g;s/Altair/WTF/g;s/Apple/The Prodigal Son/g;s/Tandy/Who?/g;s/Microsoft/Blue Screen of Death/g' input > output
This is pretty much straightforward.
 
Old 07-14-2010, 09:52 PM   #4
jhurley03
LQ Newbie
 
Registered: Jul 2010
Posts: 5

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by crts
How about
Code:
sed -e 's/Bill Gates/Mr\. Noodles/g;s/Gates/Noodles/g;s/Paul Allen/The Other Guy/g;s/Allen/The Other Guy/g;s/Altair/WTF/g;s/Apple/The Prodigal Son/g;s/Tandy/Who?/g;s/Microsoft/Blue Screen of Death/g' input > output
This is pretty much straightforward.
I did run the command you posted and it worked, but what I'm looking for is my own script that I can run that will search and replace the text and keep track of how many strings have been replaced.

Last edited by jhurley03; 07-14-2010 at 09:55 PM.
 
Old 07-14-2010, 09:55 PM   #5
vikas027
Senior Member
 
Registered: May 2007
Location: Sydney
Distribution: RHEL, CentOS, Ubuntu, Debian, OS X
Posts: 1,305

Rep: Reputation: 107
Question

Quote:
Originally Posted by jhurley03
I know about sed, but what I'm looking for is my own script that I can run that will search and replace the text and keep track of how many strings have been replaced.

What do you mean by your own script?

Do you want the user to input values (old values and new values)?
 
Old 07-14-2010, 10:02 PM   #6
vikas027
Senior Member
 
Registered: May 2007
Location: Sydney
Distribution: RHEL, CentOS, Ubuntu, Debian, OS X
Posts: 1,305

Rep: Reputation: 107
Quote:
Originally Posted by sntnlz
Try "gawk". If you're not familiar with it, go to http://www.gnu.org/manual/gawk/gawk.html for an introduction.
Could you please give an example, I generally use sed for this purpose.
 
Old 07-14-2010, 10:16 PM   #7
jhurley03
LQ Newbie
 
Registered: Jul 2010
Posts: 5

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by vikas027
What do you mean by your own script?

Do you want the user to input values (old values and new values)?
Yes
 
Old 07-14-2010, 10:27 PM   #8
jhurley03
LQ Newbie
 
Registered: Jul 2010
Posts: 5

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by vikas027
What do you mean by your own script?

Do you want the user to input values (old values and new values)?
The following script reads in a file (supplied as a command-line argument) line by line into a variable called linein:

#!/bin/bash
# Sample code for reading in a file line by line.
fileIn=$1

# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r linein
do
    echo "$linein"
done < "$fileIn"

The following snippet replaces the first occurrence of a substring inside a string variable:

${stringZ/abc/xyz} # This looks at a string named 'stringZ' and replaces the first occurrence of 'abc' with 'xyz'

This snippet replaces ALL occurrences of the substring inside a string variable:

${stringZ//abc/xyz}

So now you need to put it all together.
Your script will work like this:

fileStringReplace <filename> <stringToSearchFor> <stringToReplaceWith>

It will replace 'stringToSearchFor' with 'stringToReplaceWith' inside 'filename'. It will return how many strings were replaced.
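One way the pieces above could be put together is sketched below. It is only a sketch under a few assumptions: the function name file_string_replace is made up for illustration, the converted text goes to stdout, and the count is written to stderr so the two can be redirected separately. The search string is treated as a literal string, not as a pattern.

```shell
#!/bin/bash
# Sketch: fileStringReplace <filename> <stringToSearchFor> <stringToReplaceWith>
# file_string_replace is a hypothetical function name, not from the thread.
file_string_replace() {
    local file=$1 search=$2 replace=$3
    local line rest new total=0

    # An empty search string would match everywhere and never finish.
    [[ -n $search ]] || { echo "search string must not be empty" >&2; return 1; }

    # The || test also handles a last line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # Count matches by repeatedly cutting off everything up to and
        # including the first occurrence of the search string.
        rest=$line
        while [[ $rest == *"$search"* ]]; do
            rest=${rest#*"$search"}
            total=$((total + 1))
        done
        # Quoting both parts keeps them literal (no glob characters, no &).
        new=${line//"$search"/"$replace"}
        printf '%s\n' "$new"
    done < "$file"

    # Report the count on stderr so stdout holds only the converted text.
    printf '%d replacement(s)\n' "$total" >&2
}

# Run only when called with the three expected arguments.
if [[ $# -eq 3 ]]; then
    file_string_replace "$1" "$2" "$3"
fi
```

Saved as fileStringReplace, it could be called as ./fileStringReplace input "Gates" "Noodles" > output, and the count would still appear on the terminal.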
 
Old 07-14-2010, 10:36 PM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
I am curious how far you have thought this through, what happens if the paragraph looks like this:

Quote:
By the time he was 12 years old Bill
Gates will have ...
Strictly speaking this is "Bill Gates", but I believe your solution will end up calling him "Bill Noodles" and not "Mr. Noodles"
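The effect described above can be reproduced with a small bash sketch, assuming the text is processed one line at a time. The function names replace_by_line and replace_whole_text are hypothetical. The second function is only one possible workaround: it relies on bash's extglob option and drops the line break inside the name.

```shell
#!/bin/bash
shopt -s extglob   # enables +(...) patterns inside ${var//pattern/replacement}

# Line-by-line replacement: "Bill" and "Gates" are never seen together
# when they sit on different lines.
replace_by_line() {
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line//Bill Gates/Mr. Noodles}
        line=${line//Gates/Noodles}
        printf '%s\n' "$line"
    done
}

# Whole-text replacement: +([[:space:]]) matches one or more blanks or
# newlines between the two words.
replace_whole_text() {
    local text=$1
    printf '%s\n' "${text//Bill+([[:space:]])Gates/Mr. Noodles}"
}

text=$'By the time he was 12 years old Bill\nGates will have ...'
replace_by_line <<< "$text"      # second line becomes "Noodles will have ..."
replace_whole_text "$text"       # one line containing "Mr. Noodles"
```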
 
Old 07-14-2010, 11:54 PM   #10
jhurley03
LQ Newbie
 
Registered: Jul 2010
Posts: 5

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by grail
I am curious how far you have thought this through, what happens if the paragraph looks like this:



Strictly speaking this is "Bill Gates", but I believe your solution will end up calling him "Bill Noodles" and not "Mr. Noodles"
I guess that could possibly happen.
 
Old 07-15-2010, 12:21 AM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Well, my previous issue aside, I would use awk and have two files: one with your text and the other a delimited listing of changes.

pseudo code:

Read first file into an array to be looped over for the changes.
For each line in the second file:
a - make necessary changes
b - count each change into its own array based on change name
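For anyone who would rather stay in bash than awk, the pseudo code above might be sketched roughly as follows. The function name apply_changes and the "old|new" line layout of the change file are assumptions made for illustration, and associative arrays need bash 4 or later.

```shell
#!/bin/bash
# Sketch: apply_changes <change_file> <text_file>
# apply_changes and the "old|new" file layout are hypothetical.
apply_changes() {
    local change_file=$1 text_file=$2
    local -a olds=() news=()
    local -A counts=()
    local old new line rest i

    # Read the change list into parallel arrays, keeping file order so that
    # "Bill Gates" can be listed (and applied) before plain "Gates".
    while IFS='|' read -r old new || [[ -n $old ]]; do
        [[ -n $old ]] || continue
        olds+=("$old")
        news+=("$new")
        counts[$old]=0
    done < "$change_file"

    while IFS= read -r line || [[ -n $line ]]; do
        for i in "${!olds[@]}"; do
            old=${olds[i]}
            new=${news[i]}
            # a - count the literal occurrences left in this line
            rest=$line
            while [[ $rest == *"$old"* ]]; do
                rest=${rest#*"$old"}
                counts[$old]=$(( ${counts[$old]} + 1 ))
            done
            # b - make the change
            line=${line//"$old"/"$new"}
        done
        printf '%s\n' "$line"
    done < "$text_file"

    # Totals go to stderr so stdout holds only the converted text.
    for old in "${olds[@]}"; do
        printf '%s: %d\n' "$old" "${counts[$old]}" >&2
    done
}
```

A call such as apply_changes change input_file > output_file would leave the per-string totals on the terminal.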
 
Old 07-15-2010, 01:14 AM   #12
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Rep: Reputation: 757
Quote:
Originally Posted by grail
I am curious how far you have thought this through, what happens if the paragraph looks like this:
Code:
By the time he was 12 years old Bill
Gates will have ...
Strictly speaking this is "Bill Gates", but I believe your solution will end up calling him "Bill Noodles" and not "Mr. Noodles"
Hi grail,

that is a very good point. I totally did not take this into consideration. So here is a modified approach which will also be able to handle input parameters:
Code:
#!/bin/bash
file="$1"
search="$2"
replace="$3"

sed -r "$ ! {N;s/"${search/ /[[:blank:]]*\\n{,1\}[[:blank:]]*}"/$replace/g};$ s/$search/$replace/g" "$file"
This will not work if
Code:
Bill



Gates
spans more than one line break. This could be handled by adjusting the corresponding quantifier. Some issues might also arise when passing parameters that contain special characters like '.'. In that case they should be escaped accordingly.
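As a rough illustration of the escaping mentioned above, the following bash sketch prefixes the troublesome characters with a backslash before the strings are handed to sed. The helper names are hypothetical, and the character sets assume sed's basic regular expressions (no -r) with "/" as the delimiter. With -r, characters such as +, ?, (, ), {, } and | would need the same treatment.

```shell
#!/bin/bash
# Hypothetical helpers, not from the thread.

# Prefix every character of $1 that appears in the set $2 with a backslash.
escape_chars() {
    local s=$1 set=$2 out= ch i
    for (( i = 0; i < ${#s}; i++ )); do
        ch=${s:i:1}
        [[ $set == *"$ch"* ]] && out+='\'
        out+=$ch
    done
    printf '%s\n' "$out"
}

# Characters that are special in a basic regular expression, plus "/".
escape_sed_pattern()     { escape_chars "$1" '\/.*^$[]'; }

# Characters that are special in the replacement part: \, / and &.
escape_sed_replacement() { escape_chars "$1" '\/&'; }

escape_sed_pattern 'Mr. Noodles'   # prints Mr\. Noodles
escape_sed_pattern 'CP/M'          # prints CP\/M
```

The results could then be used as sed "s/$(escape_sed_pattern "$search")/$(escape_sed_replacement "$replace")/g" "$file".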
 
Old 07-15-2010, 04:00 AM   #13
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Well I will let you play with the particulars of words over multiple lines, but the following works with the current input:

Code:
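# change file
Bill Gates|Mr. Noodles
Gates|Noodles
Paul Allen|The Other Guy
Allen|The Other Guy
Altair|WTF
Apple|The Prodigal Son
Tandy|Who?
Microsoft|Blue Screen of Death

# script file (the #! line must be the first line of the file)
#!/usr/bin/awk -f

BEGIN{ FS="|" }

# First file: store each old/new pair in consecutive array slots
FILENAME == ARGV[1]{
	change[i++]=$1
	change[i++]=$2
}

# First record of the second file: switch back to the default field separator
!f && FILENAME != ARGV[1]{
	FS=" "
	f++
}

# Second file: apply the changes in listed order and count them per search string
# (gsub treats change[x] as a regular expression and returns the number of substitutions)
FILENAME == ARGV[2]{
	for(x=0;x < length(change);x+=2)
		n[change[x]] += gsub(change[x],change[x+1])

	print
}

# After the converted text, print the total for each search string
END{
	print ""
	for(c in n)
		print c" has a total of "n[c]" changes"
}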
This would be called in the following way with input_file containing your text above:
Code:
./change.awk change input_file > output_file
From this my output looks like:
Quote:
"Mr. Noodles was born in Seattle in 1955, the second of three children in a well-to-do family. His father, William H. Noodles II, was a lawyer, while his mother, Mary Noodles, was a teacher, a regent of the University of Washington, and member of several corporate boards. Noodles was first exposed to computers at school in the late 1960s with his friend The Other Guy, the son of two Seattle librarians. By the time Noodles was 14, the two friends were writing and testing computer programs for fun and profit.
In 1972 they established their first company, Traf-O-Data, which sold a rudimentary computer that recorded and analyzed traffic data. The Other Guy went on to study computer science at the University of Washington and then dropped out to work at Honeywell, while Noodles enrolled at Harvard. Inspired in 1975 by an issue of Popular Electronics that showed the new WTF microcomputer kit just released by MITS Computer, Noodles and The Other Guy wrote a version of BASIC for the machine. Later that year Noodles left college to work full time developing programming languages for the WTF, and he and The Other Guy relocated to Albuquerque, New Mexico, to be near MITS Computer, where The Other Guy took a position as director of software development. Noodles and The Other Guy named their partnership Micro-soft. Their revenues for 1975 totaled $16,000.
A year later, Noodles published "An Open Letter to Hobbyists" in the WTF newsletter, in which he enjoined users to avoid illegally copied software. Arguing that software piracy prevented "good software from being written," Noodles wrote prophetically, "Nothing would please me more than being able to hire ten programmers and deluge the hobby market with good software." In November 1976 The Other Guy left MITS to devote his full attention to Blue Screen of Death, and the company's tradename was registered. In 1977 The Prodigal Son and Radio Shack licensed Blue Screen of Death BASIC for their The Prodigal Son II and Who? computers, with the The Prodigal Son license going for a flat fee of $21,000. As The Prodigal Son sold a million machines complete with BASIC, Blue Screen of Death's unit revenues dropped to two cents a copy.
That same year Blue Screen of Death released its second programming language, Blue Screen of Death FORTRAN, which was followed in 1978 by a version of COBOL. Both were written for the CP/M operating system, one of many available in the rapidly expanding but still unstandardized microcomputer market. As CP/M was adopted by computer manufacturers including Sirius, Zenith, and Sharp, Blue Screen of Death became the leading distributor for microcomputer languages. By the end of 1978 Blue Screen of Death had 13 employees, a sales subsidiary in Japan, and $1 million in revenues. The following year Noodles and The Other Guy moved the company to Bellevue, Washington."

Tandy has a total of 1 changes
Paul Allen has a total of 1 changes
Gates has a total of 11 changes
Apple has a total of 4 changes
Altair has a total of 3 changes
Bill Gates has a total of 1 changes
Microsoft has a total of 7 changes
Allen has a total of 7 changes
 
  


All times are GMT -5. The time now is 08:20 PM.
