LinuxQuestions.org
Old 07-03-2009, 01:12 AM   #1
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 115

Rep: Reputation: 15
Need a script to find/replace numbers with names in 1 file using another as the guide


Hi all,

I am trying to find / write a shell script that will go through a file organized like this (but with thousands of lines)...

93,5.00,"contig00002",169,83,"jgi|Brafl1|100379|fgenesh2_pg.scaffold_359000019"
579,1.00,"contig00003",3,380,"jgi|Brafl1|114745|estExt_fgenesh2_pm.C_1200006"
450,5.00,"contig00007",2,352,"jgi|Brafl1|274326|estExt_GenewiseH_1.C_8420008"

...and check the region of each line between the second and third pipes (the 6-digit numbers) against the values in the first column of a separate text file in CSV format like this...

274326,"Wnt family of developmental regulators"
114745,"FOG: Hormone receptors"
100379,"Transcription factor tinman/NKX2-3, contains HOX domain"

...and when they match, replace the value to the right of the third pipe (e.g., fgenesh2_pg.sca...) with the value in the second column in the CSV file associated with that number.

I'm new at scripting but I'm sitting here with Burtch's Linux Shell Scripting with Bash trying to figure out where to start. If anyone can point me to a publicly available script that would be a good starting point or has some suggestions, I would really appreciate it.

Last edited by kmkocot; 07-03-2009 at 01:13 AM.
 
Old 07-03-2009, 01:15 AM   #2
veerain
Senior Member
 
Registered: Mar 2005
Location: Earth bound to Helios
Distribution: Custom
Posts: 2,524

Rep: Reputation: 315
Use an awk-based script or a sed-based script.
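
As a rough illustration of the sed route (a sketch, not something posted in this thread): one option is to turn each row of the CSV into a sed substitution command, then run the generated script over the data file. The file names csv.txt, input.txt, map.sed and output.txt are assumptions, and the sketch assumes the descriptions contain no '#', '&' or backslash characters.

```shell
#!/bin/sh
# Sample data copied from the first post; all file names are assumptions.
workdir=$(mktemp -d)
cat > "$workdir/csv.txt" <<'EOF'
274326,"Wnt family of developmental regulators"
114745,"FOG: Hormone receptors"
100379,"Transcription factor tinman/NKX2-3, contains HOX domain"
EOF
cat > "$workdir/input.txt" <<'EOF'
93,5.00,"contig00002",169,83,"jgi|Brafl1|100379|fgenesh2_pg.scaffold_359000019"
579,1.00,"contig00003",3,380,"jgi|Brafl1|114745|estExt_fgenesh2_pm.C_1200006"
450,5.00,"contig00007",2,352,"jgi|Brafl1|274326|estExt_GenewiseH_1.C_8420008"
EOF

# Step 1: rewrite every CSV row  ID,"description"  into a sed command that
# replaces whatever follows |ID| (up to the closing quote) with the description.
# '#' is the delimiter of the generated commands because descriptions may contain '/'.
sed 's/^\([0-9]*\),"\(.*\)"$/s#|\1|[^|"]*"$#|\1|\2"#/' \
    "$workdir/csv.txt" > "$workdir/map.sed"

# Step 2: apply the generated commands to the data file.
sed -f "$workdir/map.sed" "$workdir/input.txt" > "$workdir/output.txt"

cat "$workdir/output.txt"
```

Every generated command is tried against every input line, so this is simple but not fast when both files are large.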
 
Old 07-03-2009, 02:18 AM   #3
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 242
See here for a small example.
 
Old 07-03-2009, 02:39 AM   #4
vonbiber
Member
 
Registered: Apr 2009
Distribution: slackware 14.0 32-bit, slackware 14.1 64-bit
Posts: 329

Rep: Reputation: 52
OK, here is a rough sketch of what you might do.
These are the commands used below (it would be a good
idea to read the man page for each):
cut, grep, sed, tr, and the shell builtin read
sh or bash

Read the input file line by line.
Say it's named input.txt.
A while/read loop does that without splitting lines
on whitespace (which for line in $(cat input.txt) would do):
while IFS= read -r line
do
    # ... process "$line" here ...
done < input.txt

Now, inside the loop,

you need to retrieve the code of the
region. You can use the command 'cut' for that:
get the 3rd field of the '|'-delimited line:

region=$(printf '%s\n' "$line" | cut -d'|' -f3)

Then you can use the grep command to look for that region
number at the start of a line in your CSV file. If grep
returns a line, you retrieve the text by using 'cut' again
to get everything from the second field onward (this time
using ',' as the delimiter; -f2- rather than -f2, because
the description can itself contain commas).
Then you substitute the text for the part after the third
pipe and write the result to another file (which will
eventually replace your input.txt file):

text=$(grep "^$region," csv.txt | cut -d',' -f2- | tr -d '"')
if [ -n "$text" ]; then
    # '#' is the sed delimiter because the text may contain '/';
    # this still assumes the text has no '#', '&' or backslash
    printf '%s\n' "$line" | sed "s#|[^|\"]*\"\$#|$text\"#"
else
    printf '%s\n' "$line"   # no match: keep the line as it is
fi >> output.txt
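
With thousands of lines, running grep once per input line gets slow. A possible variant, assuming bash 4 or later (associative arrays are not available in plain sh), loads the CSV into a lookup table once and then rewrites each line with parameter expansion. The array name "name" and the file names are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sample data copied from the first post; all file names are assumptions.
workdir=$(mktemp -d)
cat > "$workdir/csv.txt" <<'EOF'
274326,"Wnt family of developmental regulators"
114745,"FOG: Hormone receptors"
100379,"Transcription factor tinman/NKX2-3, contains HOX domain"
EOF
cat > "$workdir/input.txt" <<'EOF'
93,5.00,"contig00002",169,83,"jgi|Brafl1|100379|fgenesh2_pg.scaffold_359000019"
579,1.00,"contig00003",3,380,"jgi|Brafl1|114745|estExt_fgenesh2_pm.C_1200006"
450,5.00,"contig00007",2,352,"jgi|Brafl1|274326|estExt_GenewiseH_1.C_8420008"
EOF

# Build the lookup table: ID -> description (needs bash 4+).
declare -A name
while IFS=, read -r id desc; do
    [[ -n $id ]] || continue          # skip blank lines
    desc=${desc#\"}                   # strip the leading quote
    desc=${desc%\"}                   # strip the trailing quote
    name[$id]=$desc                   # desc keeps any embedded commas
done < "$workdir/csv.txt"

# Rewrite each data line using only shell string operations.
while IFS= read -r line; do
    head=${line%|*}                   # everything before the last pipe
    id=${head##*|}                    # text between the 2nd and 3rd pipes
    if [[ -n $id && -n ${name[$id]+set} ]]; then
        printf '%s|%s"\n' "$head" "${name[$id]}"
    else
        printf '%s\n' "$line"         # unknown ID: keep the line unchanged
    fi
done < "$workdir/input.txt" > "$workdir/output.txt"

cat "$workdir/output.txt"
```

Because no sed substitution is involved, characters such as '/' or '&' in the descriptions need no special handling here.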
 
Old 07-03-2009, 03:30 AM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1953
This is definitely an awk job; no other tools should be necessary. Since awk is field-based, it's almost trivial to design a script that compares one field to a value and changes another field based on the result.

Even I could probably design a basic script, just a simple if-loop, replace, and print, but I'm not sure how you'd go about searching through the values in a separate file for matching.

Check out the awk tutorial at the unix grymoire for help here. It takes some time to work through, but it will be worth it for jobs like this.

Edit: check out ghostdog's link above. That's exactly what I'm talking about.
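
On searching a separate file for matches: a common awk idiom is to pass both files on the command line and use NR == FNR to tell the first file from the second. A minimal sketch with the sample data from the first post; the file names and the array name "name" are assumptions, and this is not necessarily what the linked example does.

```shell
#!/bin/sh
# Sample data copied from the first post; all file names are assumptions.
workdir=$(mktemp -d)
cat > "$workdir/csv.txt" <<'EOF'
274326,"Wnt family of developmental regulators"
114745,"FOG: Hormone receptors"
100379,"Transcription factor tinman/NKX2-3, contains HOX domain"
EOF
cat > "$workdir/input.txt" <<'EOF'
93,5.00,"contig00002",169,83,"jgi|Brafl1|100379|fgenesh2_pg.scaffold_359000019"
579,1.00,"contig00003",3,380,"jgi|Brafl1|114745|estExt_fgenesh2_pm.C_1200006"
450,5.00,"contig00007",2,352,"jgi|Brafl1|274326|estExt_GenewiseH_1.C_8420008"
EOF

awk -F'|' -v OFS='|' '
    # First file (csv.txt): NR == FNR holds only while reading it.
    NR == FNR {
        comma = index($0, ",")           # split at the first comma only
        id    = substr($0, 1, comma - 1)
        desc  = substr($0, comma + 1)
        gsub(/^"|"$/, "", desc)          # drop the surrounding quotes
        name[id] = desc
        next
    }
    # Second file (input.txt): field 3 is the ID, field 4 the value to replace.
    $3 in name { $4 = name[$3] "\"" }
    { print }
' "$workdir/csv.txt" "$workdir/input.txt" > "$workdir/output.txt"

cat "$workdir/output.txt"
```

Lines whose ID is not in the CSV are printed unchanged, and the CSV is read only once, so this should scale to thousands of lines.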

Last edited by David the H.; 07-03-2009 at 03:32 AM.
 
  

