LinuxQuestions.org


kmkocot 07-03-2009 01:12 AM

Need a script to find/replace numbers with names in 1 file using another as the guide
 
Hi all,

I am trying to find / write a shell script that will go through a file organized like this (but with thousands of lines)...

93,5.00,"contig00002",169,83,"jgi|Brafl1|100379|fgenesh2_pg.scaffold_359000019"
579,1.00,"contig00003",3,380,"jgi|Brafl1|114745|estExt_fgenesh2_pm.C_1200006"
450,5.00,"contig00007",2,352,"jgi|Brafl1|274326|estExt_GenewiseH_1.C_8420008"

...and check the region of each line between the second and third pipes (the 6-digit numbers) against the values in the first column of a separate text file in CSV format like this...

274326,"Wnt family of developmental regulators"
114745,"FOG: Hormone receptors"
100379,"Transcription factor tinman/NKX2-3, contains HOX domain"

...and when they match, replace the value to the right of the third pipe (e.g., fgenesh2_pg.sca...) with the value in the second column in the CSV file associated with that number.

I'm new at scripting but I'm sitting here with Burtch's Linux Shell Scripting with Bash trying to figure out where to start. If anyone can point me to a publicly available script that would be a good starting point or has some suggestions, I would really appreciate it.

contusion 07-03-2009 01:15 AM

Use an awk-based or a sed-based script.

ghostdog74 07-03-2009 02:18 AM

See here for a small example.

vonbiber 07-03-2009 02:39 AM

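OK, I'm going to sketch roughly what you might do.
Here are the commands used below (it would be a good
idea to read the man page for each):
cut, grep, sed, and the shell builtin read
sh or bash

You read the input file line by line.
Say it's named input.txt.
You can do that with a loop like this (a
"for line in $(cat input.txt)" loop would split on any
whitespace, not only on newlines):
while IFS= read -r line
do
    ..... you process $line here
    ....
done < input.txt

OK, now inside the loop,

you need to retrieve the ID number (called the
region below). You can use the command 'cut' for that,
to get the 3rd field of the '|' delimited line:

region=$(echo "$line" | cut -d'|' -f3)

Then you can use the grep command to look for that region
number at the start of a line in your CSV file, and if grep
returns a line you retrieve the text by using 'cut' again to get
everything from the second field on (this time using ',' as the
delimiter; the text itself can contain commas) and sed to strip
the surrounding quotes.
Then you substitute the text for whatever follows the last pipe
(the third one in your lines) and you write the line to another
file (that will eventually replace your input.txt file).
The substitution uses shell parameter expansion rather than sed,
because the text can contain characters such as '/' and '&' that
are special in a sed replacement. Lines with no match are written
unchanged.

text=$(grep -m1 "^$region," csv.txt | cut -d',' -f2- | sed 's/^"//; s/"$//')
if [ -n "$text" ]; then
    line="${line%|*}|$text\""
fi
echo "$line" >> output.txt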
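Put together, the steps above might look like the sketch below. The function name annotate_lines and the temporary demo files are made up for illustration; the sample rows are the ones from the first post. It assumes each ID appears at the start of a CSV line, that the third pipe is the last pipe on each input line, and that each input line ends with a closing double quote.

```shell
#!/usr/bin/env bash

# annotate_lines INPUT CSV  (hypothetical helper name)
# Prints INPUT with the text after the last pipe replaced by the
# description that CSV lists for the ID in the third pipe-delimited field.
annotate_lines() {
    local input=$1 csv=$2 line region text
    # "|| [ -n "$line" ]" also handles a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # -s makes cut print nothing for lines that contain no pipe at all.
        region=$(printf '%s\n' "$line" | cut -s -d'|' -f3)
        text=
        if [ -n "$region" ]; then
            # Take everything after the first comma (descriptions may
            # contain commas) and strip the surrounding double quotes.
            text=$(grep -m1 "^$region," "$csv" | cut -d',' -f2- | sed 's/^"//; s/"$//')
        fi
        if [ -n "$text" ]; then
            # Drop everything from the last pipe on, then append the
            # description and the closing quote.
            line="${line%|*}|$text\""
        fi
        printf '%s\n' "$line"
    done < "$input"
}

# Demo with the sample rows from the first post, in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/input.txt" <<'EOF'
93,5.00,"contig00002",169,83,"jgi|Brafl1|100379|fgenesh2_pg.scaffold_359000019"
579,1.00,"contig00003",3,380,"jgi|Brafl1|114745|estExt_fgenesh2_pm.C_1200006"
450,5.00,"contig00007",2,352,"jgi|Brafl1|274326|estExt_GenewiseH_1.C_8420008"
EOF
cat > "$tmp/csv.txt" <<'EOF'
274326,"Wnt family of developmental regulators"
114745,"FOG: Hormone receptors"
100379,"Transcription factor tinman/NKX2-3, contains HOX domain"
EOF

annotate_lines "$tmp/input.txt" "$tmp/csv.txt" > "$tmp/output.txt"
cat "$tmp/output.txt"
```

This starts grep, cut and sed once per input line, which is fine for a few thousand lines but gets slow on much larger files.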

David the H. 07-03-2009 03:30 AM

This is definitely an awk job; no other tools should be necessary. Since awk is field-based, it's almost trivial to design a script that compares one field to a value and changes another field based on the result.

Even I could probably design a basic script, just a simple if test, replace, and print, but I'm not sure how you'd go about searching a separate file for the matching values.
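For the separate-file lookup, one common awk idiom is to read the CSV first into an array keyed by ID, then look each ID up while reading the main file. The sketch below is only one way to do it, not necessarily what the linked example does. The function name annotate_with_awk and the demo file names are made up for illustration, and it assumes every input line ends with a closing double quote.

```shell
#!/usr/bin/env bash

# annotate_with_awk CSV INPUT  (hypothetical helper name)
# Prints INPUT with the text after the third pipe replaced by the
# description that CSV lists for the ID between the second and third pipes.
annotate_with_awk() {
    awk '
        # NR == FNR is only true while awk reads the first file (the CSV).
        NR == FNR {
            comma = index($0, ",")
            if (comma == 0) next
            id = substr($0, 1, comma - 1)
            desc = substr($0, comma + 1)   # may itself contain commas
            gsub(/^"|"$/, "", desc)        # strip the surrounding quotes
            name[id] = desc
            next
        }
        # Second file: split on pipes and look up the third piece.
        {
            n = split($0, part, "|")
            if (n >= 4 && (part[3] in name))
                $0 = part[1] "|" part[2] "|" part[3] "|" name[part[3]] "\""
            print
        }
    ' "$1" "$2"
}

# Demo with the sample rows from the first post, in a scratch directory.
demo=$(mktemp -d)
cat > "$demo/csv.txt" <<'EOF'
274326,"Wnt family of developmental regulators"
114745,"FOG: Hormone receptors"
100379,"Transcription factor tinman/NKX2-3, contains HOX domain"
EOF
cat > "$demo/input.txt" <<'EOF'
93,5.00,"contig00002",169,83,"jgi|Brafl1|100379|fgenesh2_pg.scaffold_359000019"
579,1.00,"contig00003",3,380,"jgi|Brafl1|114745|estExt_fgenesh2_pm.C_1200006"
450,5.00,"contig00007",2,352,"jgi|Brafl1|274326|estExt_GenewiseH_1.C_8420008"
EOF

annotate_with_awk "$demo/csv.txt" "$demo/input.txt" > "$demo/output.txt"
cat "$demo/output.txt"
```

Because the CSV is held in memory and each file is read only once, this should scale to thousands of lines without running a separate search per line. Note that the CSV goes first on the command line.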

Check out the awk tutorial at the unix grymoire for help here. It takes some time to work through, but it will be worth it for jobs like this.

Edit: check out ghostdog's link above. That's exactly what I'm talking about.


All times are GMT -5.