LinuxQuestions.org
Old 07-03-2009, 02:12 AM   #1
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 112

Rep: Reputation: 15
Need a script to find/replace numbers with names in 1 file using another as the guide


Hi all,

I am trying to find / write a shell script that will go through a file organized like this (but with thousands of lines)...

93,5.00,"contig00002",169,83,"jgi|Brafl1|100379|fgenesh2_pg.scaffold_359000019"
579,1.00,"contig00003",3,380,"jgi|Brafl1|114745|estExt_fgenesh2_pm.C_1200006"
450,5.00,"contig00007",2,352,"jgi|Brafl1|274326|estExt_GenewiseH_1.C_8420008"

...and check the region of each line between the second and third pipes (the 6-digit numbers) against the values in the first column of a separate text file in CSV format like this...

274326,"Wnt family of developmental regulators"
114745,"FOG: Hormone receptors"
100379,"Transcription factor tinman/NKX2-3, contains HOX domain"

...and when they match, replace the value to the right of the third pipe (e.g., fgenesh2_pg.sca...) with the value in the second column in the CSV file associated with that number.

I'm new at scripting but I'm sitting here with Burtch's Linux Shell Scripting with Bash trying to figure out where to start. If anyone can point me to a publicly available script that would be a good starting point or has some suggestions, I would really appreciate it.

Last edited by kmkocot; 07-03-2009 at 02:13 AM.
 
Old 07-03-2009, 02:15 AM   #2
veerain
Member
 
Registered: Mar 2005
Posts: 387

Rep: Reputation: 43
Use an awk-based or a sed-based script.
 
Old 07-03-2009, 03:18 AM   #3
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
See here for a small example.
 
Old 07-03-2009, 03:39 AM   #4
vonbiber
Member
 
Registered: Apr 2009
Distribution: slackware
Posts: 299

Rep: Reputation: 50
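OK, I'm going to sketch roughly what you might do.
Here are the commands used below (it would be a good
idea to read the man page for each):
cut, grep, tr, printf, read
sh or bash

You read the input file line by line.
Say it's named input.txt.
You can do that with a loop like this (a "for line in $(cat input.txt)"
loop would split on any whitespace, not just at line ends):
while IFS= read -r line
do
..... you process line by line
....
done < input.txt

OK, now inside the loop,

you need to retrieve the number between the second and
third pipes. You can use the command 'cut' for that,
to get the 3rd field of the '|' delimited line:

region=$(printf '%s\n' "$line" | cut -d'|' -f3)

Then you can use the grep command to look for that
number at the start of a line in your CSV file, and if grep
returns a line you retrieve the text by using 'cut' again to get
everything from the second field on (this time using ',' as
delimiter, and '-f2-' rather than '-f2' because some descriptions
contain commas themselves), with 'tr' dropping the surrounding quotes.
Then you substitute the text for whatever follows the third pipe
and write the result to another file (that will eventually
replace your input.txt file). "${line%|*}" is the line with
everything from the last pipe on removed. Building a sed command
from the text and running it through eval would break on
descriptions that contain '/', such as "tinman/NKX2-3".

text=$(grep "^$region," csv.txt | cut -d',' -f2- | tr -d '"')
if [ -n "$text" ]
then
printf '%s|%s"\n' "${line%|*}" "$text" >> output.txt
else
printf '%s\n' "$line" >> output.txt
fi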
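
For reference, the steps above could be assembled into one script along these lines. This is only a sketch under a few assumptions: the function name replace_names and the file names are made up for illustration, the IDs are assumed to be plain digits (they end up inside a grep pattern), and only the first matching CSV line is used if an ID appears more than once. It reads with "while read" and rebuilds the line with parameter expansion rather than sed and eval, so descriptions containing '/' or ',' pass through unchanged.

```shell
#!/bin/sh
# replace_names INPUT CSV
# Hypothetical helper: prints INPUT with the text after the third pipe
# replaced by the matching description from CSV.
replace_names() {
    while IFS= read -r line || [ -n "$line" ]
    do
        # Number between the second and third pipes (-s: skip lines without pipes).
        region=$(printf '%s\n' "$line" | cut -s -d'|' -f3)
        text=
        if [ -n "$region" ]
        then
            # Match the number only as the whole first column, keep commas
            # inside the description (-f2-), and drop its quotes.
            text=$(grep "^$region," "$2" | head -n 1 | cut -d',' -f2- | tr -d '"')
        fi
        if [ -n "$text" ]
        then
            # ${line%|*} is the line with everything from the last pipe on removed.
            printf '%s|%s"\n' "${line%|*}" "$text"
        else
            # No match: pass the line through unchanged.
            printf '%s\n' "$line"
        fi
    done < "$1"
}

# Demo using the sample data from the first post.
tmp=$(mktemp -d)
cat > "$tmp/input.txt" <<'EOF'
93,5.00,"contig00002",169,83,"jgi|Brafl1|100379|fgenesh2_pg.scaffold_359000019"
579,1.00,"contig00003",3,380,"jgi|Brafl1|114745|estExt_fgenesh2_pm.C_1200006"
450,5.00,"contig00007",2,352,"jgi|Brafl1|274326|estExt_GenewiseH_1.C_8420008"
EOF
cat > "$tmp/csv.txt" <<'EOF'
274326,"Wnt family of developmental regulators"
114745,"FOG: Hormone receptors"
100379,"Transcription factor tinman/NKX2-3, contains HOX domain"
EOF
replace_names "$tmp/input.txt" "$tmp/csv.txt"
rm -rf "$tmp"
```

Note that this runs grep once per input line, so with thousands of lines it will be noticeably slower than a single awk pass, though it should still finish.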
 
Old 07-03-2009, 04:30 AM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1950
This is definitely a job for awk; no other tools should be necessary. Since awk is field-based, it's almost trivial to write a script that compares one field to a value and changes another field based on the result.
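
To illustrate the field-based idea, here is a minimal sketch that splits each line on pipes, compares the third field to a given number, and changes the fourth field when it matches. The function name relabel and the replacement text are made up for the example, and the sketch assumes every data line has exactly four pipe-separated fields with a closing quote at the end, as in the sample from the first post.

```shell
#!/bin/sh
# relabel ID TEXT
# Hypothetical helper: reads lines on stdin and prints them, replacing the
# text after the third pipe with TEXT on lines whose third field equals ID.
relabel() {
    # -F'|' splits input on pipes; OFS='|' rejoins the fields on output.
    awk -F'|' -v OFS='|' -v id="$1" -v text="$2" '
        $3 == id { $4 = text "\"" }   # change field 4 only when field 3 matches
        { print }                     # print every line, changed or not
    '
}

# Sample line taken from the first post.
line='93,5.00,"contig00002",169,83,"jgi|Brafl1|100379|fgenesh2_pg.scaffold_359000019"'
printf '%s\n' "$line" | relabel 100379 'EXAMPLE TEXT'
# prints: 93,5.00,"contig00002",169,83,"jgi|Brafl1|100379|EXAMPLE TEXT"
```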

Even I could probably write a basic script, just a simple if test, a replace, and a print, but I'm not sure how you'd go about searching through the values in a separate file for a match.
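
One common way to handle the separate file, sketched here under some assumptions, is to give awk both files and use the NR == FNR test, which is true only while the first file is being read. The CSV is loaded into an array keyed by number, and the array is then consulted for each line of the data file. The function name annotate and the array name are made up for the example, and the sketch assumes the CSV file is not empty (otherwise NR == FNR would also be true for the data file).

```shell
#!/bin/sh
# annotate CSV INPUT
# Hypothetical helper: prints INPUT with the text after the third pipe
# replaced by the description that CSV gives for the number in field 3.
annotate() {
    awk -F'|' -v OFS='|' '
        # First file (the CSV): remember each description by its number.
        NR == FNR {
            comma = index($0, ",")          # split at the first comma only
            if (comma == 0) next
            id = substr($0, 1, comma - 1)
            desc = substr($0, comma + 1)
            gsub(/^"|"$/, "", desc)         # strip the surrounding quotes
            name[id] = desc
            next
        }
        # Second file: replace field 4 when field 3 is a known number.
        ($3 in name) { $4 = name[$3] "\"" }
        { print }
    ' "$1" "$2"
}

# Demo using the sample data from the first post.
tmp=$(mktemp -d)
cat > "$tmp/csv.txt" <<'EOF'
274326,"Wnt family of developmental regulators"
114745,"FOG: Hormone receptors"
100379,"Transcription factor tinman/NKX2-3, contains HOX domain"
EOF
cat > "$tmp/input.txt" <<'EOF'
93,5.00,"contig00002",169,83,"jgi|Brafl1|100379|fgenesh2_pg.scaffold_359000019"
579,1.00,"contig00003",3,380,"jgi|Brafl1|114745|estExt_fgenesh2_pm.C_1200006"
450,5.00,"contig00007",2,352,"jgi|Brafl1|274326|estExt_GenewiseH_1.C_8420008"
EOF
annotate "$tmp/csv.txt" "$tmp/input.txt"
rm -rf "$tmp"
```

Because the lookup table is held in memory, each file is read only once, which suits an input with thousands of lines.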

Check out the awk tutorial at the Unix Grymoire for help here. It takes some time to work through, but it will be worth it for jobs like this.

Edit: check out ghostdog's link above. That's exactly what I'm talking about.

Last edited by David the H.; 07-03-2009 at 04:32 AM.
 
  

