I need a utility that reads a text file and searches for and replaces strings. I also want to keep track of how many strings I've replaced. Below are the replacements to make, followed by an example passage.
"Bill Gates" with "Mr. Noodles",
"Gates" with "Noodles",
"Paul Allen" or "Allen" as "The Other Guy",
"Altair" as "WTF",
"Apple" as "The Prodigal Son",
"Tandy" as "Who?"
and "Microsoft" as "Blue Screen of Death"
"Bill Gates was born in Seattle in 1955, the second of three children in a well-to-do family. His father, William H. Gates II, was a lawyer, while his mother, Mary Gates, was a teacher, a regent of the University of Washington, and member of several corporate boards. Gates was first exposed to computers at school in the late 1960s with his friend Paul Allen, the son of two Seattle librarians. By the time Gates was 14, the two friends were writing and testing computer programs for fun and profit.
In 1972 they established their first company, Traf-O-Data, which sold a rudimentary computer that recorded and analyzed traffic data. Allen went on to study computer science at the University of Washington and then dropped out to work at Honeywell, while Gates enrolled at Harvard. Inspired in 1975 by an issue of Popular Electronics that showed the new Altair microcomputer kit just released by MITS Computer, Gates and Allen wrote a version of BASIC for the machine. Later that year Gates left college to work full time developing programming languages for the Altair, and he and Allen relocated to Albuquerque, New Mexico, to be near MITS Computer, where Allen took a position as director of software development. Gates and Allen named their partnership Micro-soft. Their revenues for 1975 totaled $16,000.
A year later, Gates published "An Open Letter to Hobbyists" in the Altair newsletter, in which he enjoined users to avoid illegally copied software. Arguing that software piracy prevented "good software from being written," Gates wrote prophetically, "Nothing would please me more than being able to hire ten programmers and deluge the hobby market with good software." In November 1976 Allen left MITS to devote his full attention to Microsoft, and the company's tradename was registered. In 1977 Apple and Radio Shack licensed Microsoft BASIC for their Apple II and Tandy computers, with the Apple license going for a flat fee of $21,000. As Apple sold a million machines complete with BASIC, Microsoft's unit revenues dropped to two cents a copy.
That same year Microsoft released its second programming language, Microsoft FORTRAN, which was followed in 1978 by a version of COBOL. Both were written for the CP/M operating system, one of many available in the rapidly expanding but still unstandardized microcomputer market. As CP/M was adopted by computer manufacturers including Sirius, Zenith, and Sharp, Microsoft became the leading distributor for microcomputer languages. By the end of 1978 Microsoft had 13 employees, a sales subsidiary in Japan, and $1 million in revenues. The following year Gates and Allen moved the company to Bellevue, Washington."
sed -e 's/Bill Gates/Mr. Noodles/g;s/Gates/Noodles/g;s/Paul Allen/The Other Guy/g;s/Allen/The Other Guy/g;s/Altair/WTF/g;s/Apple/The Prodigal Son/g;s/Tandy/Who?/g;s/Microsoft/Blue Screen of Death/g' input > output
This is pretty much straightforward.
I know about sed, and I did run the command you posted and it worked, but what I'm looking for is my own script that I can run that will search and replace the text and keep track of how many strings have been replaced.
What do you mean by your own script?
Do you want the user to input values (old values and new values)?
Well, my previous issue aside, I would use awk with two files: one containing your text and the other a delimited list of changes.
pseudo code:
Read first file into an array to be looped over for the changes.
For each line in the second file:
a - make necessary changes
b - count each change into its own array based on change name
I am curious how far you have thought this through. What happens if the paragraph looks like this:
Code:
By the time he was 12 years old Bill
Gates will have ...
Strictly speaking this is "Bill Gates", but I believe your solution will end up calling him "Bill Noodles" and not "Mr. Noodles".
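The problem described above is easy to reproduce. The following is only a sketch, assuming an awk that supports paragraph mode (RS=""); the function names replace_by_line and replace_by_paragraph are made up for illustration and are not part of anyone's posted solution.

```shell
#!/bin/sh
# Line-by-line replacement, as in the sed one-liner: "Bill" and "Gates"
# sit on different lines, so only the shorter pattern "Gates" can match.
replace_by_line() {
    sed -e 's/Bill Gates/Mr. Noodles/g' -e 's/Gates/Noodles/g'
}

# Paragraph-mode sketch: RS="" makes awk read each blank-line-separated
# paragraph as one record, so the name can match across a line break.
# The line break inside the matched name is dropped by the replacement.
replace_by_paragraph() {
    awk 'BEGIN { RS = "" }
         NR > 1 { print "" }    # put one blank line back between paragraphs
         {
             gsub(/Bill[ \t\n]+Gates/, "Mr. Noodles")
             gsub(/Gates/, "Noodles")
             print
         }'
}

sample='By the time he was 12 years old Bill
Gates will have ...'

printf '%s\n' "$sample" | replace_by_line
# By the time he was 12 years old Bill
# Noodles will have ...

printf '%s\n' "$sample" | replace_by_paragraph
# By the time he was 12 years old Mr. Noodles will have ...
```

Reading by paragraph keeps memory use modest, but it changes the line layout wherever a name was split, and runs of several blank lines collapse into one.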
Hi grail,
That is a very good point; I did not take this into consideration at all. So here is a modified approach that can also handle input parameters:
Gates
spans over more than one line. This could be handled by adjusting the corresponding quantifier. Issues might also arise when passing parameters that contain special characters like '.'; these should be escaped accordingly.
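To illustrate the escaping point: a search string such as "Mr." is a regular expression to awk, so the dot matches any character. The sketch below is one possible way to treat the search string literally; literal_replace and escape_regex are hypothetical names, and the list of escaped characters is an assumption that covers the common metacharacters.

```shell
#!/bin/sh
# literal_replace OLD NEW (hypothetical helper): replace every occurrence of
# the literal text OLD with NEW on stdin. The strings are passed through the
# environment because awk's -v option would interpret backslash escapes.
literal_replace() {
    OLD="$1" NEW="$2" awk '
        # Put a backslash in front of each regex metacharacter in s.
        function escape_regex(s,    out, i, c) {
            out = ""
            for (i = 1; i <= length(s); i++) {
                c = substr(s, i, 1)
                if (index("\\.^$*+?()[{|", c) > 0)
                    out = out "\\" c
                else
                    out = out c
            }
            return out
        }
        BEGIN { re = escape_regex(ENVIRON["OLD"]); repl = ENVIRON["NEW"] }
        { gsub(re, repl); print }
    '
}

printf 'Mr. Noodles met Mrs Noodles\n' | literal_replace 'Mr.' 'Dr.'
# Dr. Noodles met Mrs Noodles
```

Without the escaping step, the pattern "Mr." would also match "Mrs" in this input. Note that only the search side is handled here: an "&" or a backslash in the replacement text is still special to gsub and would need its own escaping.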
Well I will let you play with the particulars of words over multiple lines, but the following works with the current input:
Code:
# change file
Bill Gates|Mr. Noodles
Gates|Noodles
Paul Allen|The Other Guy
Allen|The Other Guy
Altair|WTF
Apple|The Prodigal Son
Tandy|Who?
Microsoft|Blue Screen of Death
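# script file
#!/usr/bin/awk -f
# Usage: ./change.awk change input_file
# The change file holds one "old|new" pair per line.
BEGIN{ FS="|" }
# First file: store the pairs as change[0]=old, change[1]=new, change[2]=old, ...
FILENAME == ARGV[1]{
    change[i++]=$1
    change[i++]=$2
}
# On the first line of the second file, switch back to default field splitting
!f && FILENAME != ARGV[1]{
    FS=" "
    f++
}
# Second file: apply every pair in order and count the substitutions per search string.
# i is the number of stored entries, used here instead of length(change),
# which not every awk accepts for arrays.
FILENAME == ARGV[2]{
    for(x=0;x < i;x+=2)
        n[change[x]] += gsub(change[x],change[x+1])
    print
}
# Summary: how many times each search string was replaced
END{
    print ""
    for(c in n)
        print c" has a total of "n[c]" changes"
}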
This would be called in the following way with input_file containing your text above:
Code:
./change.awk change input_file > output_file
From this my output looks like:
Quote:
"Mr. Noodles was born in Seattle in 1955, the second of three children in a well-to-do family. His father, William H. Noodles II, was a lawyer, while his mother, Mary Noodles, was a teacher, a regent of the University of Washington, and member of several corporate boards. Noodles was first exposed to computers at school in the late 1960s with his friend The Other Guy, the son of two Seattle librarians. By the time Noodles was 14, the two friends were writing and testing computer programs for fun and profit.
In 1972 they established their first company, Traf-O-Data, which sold a rudimentary computer that recorded and analyzed traffic data. The Other Guy went on to study computer science at the University of Washington and then dropped out to work at Honeywell, while Noodles enrolled at Harvard. Inspired in 1975 by an issue of Popular Electronics that showed the new WTF microcomputer kit just released by MITS Computer, Noodles and The Other Guy wrote a version of BASIC for the machine. Later that year Noodles left college to work full time developing programming languages for the WTF, and he and The Other Guy relocated to Albuquerque, New Mexico, to be near MITS Computer, where The Other Guy took a position as director of software development. Noodles and The Other Guy named their partnership Micro-soft. Their revenues for 1975 totaled $16,000.
A year later, Noodles published "An Open Letter to Hobbyists" in the WTF newsletter, in which he enjoined users to avoid illegally copied software. Arguing that software piracy prevented "good software from being written," Noodles wrote prophetically, "Nothing would please me more than being able to hire ten programmers and deluge the hobby market with good software." In November 1976 The Other Guy left MITS to devote his full attention to Blue Screen of Death, and the company's tradename was registered. In 1977 The Prodigal Son and Radio Shack licensed Blue Screen of Death BASIC for their The Prodigal Son II and Who? computers, with the The Prodigal Son license going for a flat fee of $21,000. As The Prodigal Son sold a million machines complete with BASIC, Blue Screen of Death's unit revenues dropped to two cents a copy.
That same year Blue Screen of Death released its second programming language, Blue Screen of Death FORTRAN, which was followed in 1978 by a version of COBOL. Both were written for the CP/M operating system, one of many available in the rapidly expanding but still unstandardized microcomputer market. As CP/M was adopted by computer manufacturers including Sirius, Zenith, and Sharp, Blue Screen of Death became the leading distributor for microcomputer languages. By the end of 1978 Blue Screen of Death had 13 employees, a sales subsidiary in Japan, and $1 million in revenues. The following year Noodles and The Other Guy moved the company to Bellevue, Washington."
Tandy has a total of 1 changes
Paul Allen has a total of 1 changes
Gates has a total of 11 changes
Apple has a total of 4 changes
Altair has a total of 3 changes
Bill Gates has a total of 1 changes
Microsoft has a total of 7 changes
Allen has a total of 7 changes
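Because the summary is printed to standard output, it ends up in output_file together with the rewritten text. The following is only a sketch of one variation, not the script posted above: it sends the counts to standard error and adds a grand total. The name replace_and_count is made up, and writing to "/dev/stderr" assumes an awk or a system that provides it (gawk, mawk and Linux in general do).

```shell
#!/bin/sh
# replace_and_count CHANGE_FILE TEXT_FILE (hypothetical helper)
# Writes the rewritten text to stdout, and the per-pattern counts plus a
# grand total to stderr, so redirecting stdout captures only the text.
replace_and_count() {
    awk -F'|' '
        # First file: remember each "old|new" pair in input order.
        NR == FNR { old[++pairs] = $1; repl[pairs] = $2; next }
        # Second file: apply every pair to the current line and count hits.
        {
            for (p = 1; p <= pairs; p++) {
                hits = gsub(old[p], repl[p])
                count[p] += hits
                total += hits
            }
            print
        }
        END {
            for (p = 1; p <= pairs; p++)
                print old[p] ": " (count[p] + 0) > "/dev/stderr"
            print "total: " (total + 0) > "/dev/stderr"
        }
    ' "$1" "$2"
}

# Usage, with the same two files as above:
#   replace_and_count change input_file > output_file
```

The pairs are applied in the order they appear in the change file, so longer strings such as "Bill Gates" still have to be listed before shorter ones such as "Gates". The NR == FNR test assumes the change file is not empty.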