Originally Posted by kmkocot
I am trying to find / write a shell script that will go through a file organized like this (but with thousands of lines)...
...and check the region of each line between the second and third pipes (the 6-digit numbers) against the values in the first column of a separate text file in CSV format like this...
274326,"Wnt family of developmental regulators"
114745,"FOG: Hormone receptors"
100379,"Transcription factor tinman/NKX2-3, contains HOX domain"
...and when they match, replace the value to the right of the third pipe (e.g., fgenesh2_pg.sca...) with the value in the second column in the CSV file associated with that number.
I'm new at scripting but I'm sitting here with Burtch's Linux Shell Scripting with Bash trying to figure out where to start. If anyone can point me to a publicly available script that would be a good starting point or has some suggestions, I would really appreciate it.
OK, I'm going to sketch roughly what you might do.
Here are the commands used below (it would be a good idea to read the man page for each):
cat, cut, grep, sed, eval
sh or bash
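You read the input file line by line;
say it's named input.txt.
You can do that with a loop like this (a "for line in $(cat input.txt)" loop
would split on every space, not just on newlines, so use read instead):
while IFS= read -r line; do
    # ..... you process "$line" here, one line at a time
done < input.txt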
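If you want to see for yourself why a while/read loop is the safer way to go line by line, here is a small sketch. The sample lines and the helper names count_lines and count_words are made up for the demo; only the loop shapes matter.

```shell
# Hypothetical two-line sample; the second line contains a space.
tmp=$(mktemp)
printf '%s\n' 'a|b|274326|first' 'c|d|114745|second entry' > "$tmp"

# Counts whole lines: IFS= keeps surrounding blanks, -r keeps backslashes.
count_lines() {
    n=0
    while IFS= read -r line; do
        n=$((n + 1))
    done < "$1"
    echo "$n"
}

# Counts the pieces a for-loop over $(cat ...) would iterate on.
count_words() {
    n=0
    for word in $(cat "$1"); do
        n=$((n + 1))
    done
    echo "$n"
}

count_lines "$tmp"   # 2
count_words "$tmp"   # 3, because "second entry" is split at the space
```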
OK, now inside the loop,
you need to retrieve the region code.
You can use the 'cut' command for that, to
get the 3rd field of the '|' delimited line
(quote $line so the shell leaves it alone):
region=$(echo "$line" | cut -d'|' -f3)
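A quick way to try the cut step on its own, assuming your lines look roughly like the made-up one below (I don't have your real data, so the field contents are placeholders):

```shell
# Made-up line in the same shape: the 6-digit code is the third '|' field.
line='aaa|bbb|274326|old_value'

# printf with a quoted "$line" hands the text to cut exactly as it is.
region=$(printf '%s\n' "$line" | cut -d'|' -f3)
echo "$region"   # 274326
```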
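Then you can use the grep command to look for that region
number in your CSV file (anchor it with ^ and a trailing comma
so it only matches the first column), and if grep returns a line
you retrieve the text by using 'cut' again, this time with ','
as the delimiter. Take everything from the second field on (-f2-),
because some of your descriptions contain commas themselves.
Then you replace whatever follows the third pipe with that text and
you write the result to another file (that will eventually
replace your input.txt file). Plain cut and string joining are
safer here than sed or eval, because your lines contain '|' and
your descriptions contain '/', and both would get misinterpreted:
text=$(grep "^$region," csv.txt | cut -d',' -f2- | tr -d '"')
if [ -n "$text" ]; then
    line="$(echo "$line" | cut -d'|' -f1-3)|$text"
fi
echo "$line" >> output.txt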
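Putting the pieces together, here is a rough sketch of the whole thing rather than a tested solution for your exact files. It assumes the code is always the third '|' field, that the CSV description is wrapped in double quotes as in your sample, and that everything after the third pipe should be replaced, as the question asks. The function name annotate is made up.

```shell
#!/bin/sh
# annotate INPUT CSV
# Prints INPUT with the text after the third '|' replaced by the CSV
# description whose first column equals field 3. Unmatched lines pass
# through unchanged. (annotate is a hypothetical name.)
annotate() {
    while IFS= read -r line; do
        region=$(printf '%s\n' "$line" | cut -d'|' -f3)

        # Only look up purely numeric codes; this also keeps regex
        # metacharacters out of the grep pattern.
        case $region in
            ''|*[!0-9]*) match='' ;;
            *) match=$(grep "^$region," "$2" | head -n 1) ;;
        esac

        if [ -n "$match" ]; then
            text=${match#*,}     # drop the code and the first comma
            text=${text#\"}      # drop the leading quote, if any
            text=${text%\"}      # drop the trailing quote, if any
            prefix=$(printf '%s\n' "$line" | cut -d'|' -f1-3)
            printf '%s|%s\n' "$prefix" "$text"
        else
            printf '%s\n' "$line"
        fi
    done < "$1"
}
```

You would call it as: annotate input.txt csv.txt > output.txt, check the result, and only then move output.txt over input.txt. It runs grep once per input line, so on very large files it will be slow, but for a few thousand lines it should be fine.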