[SOLVED] script to match patterns on files and echo on the same line

rebelbuttmunch · 11-22-2011, 04:09 AM

Hi,
I've got two files
file 1:
A hello
B goodbye
C cya

file2
data A data data (lots of data)
data C data data
data B data data

I want to iterate through the first file, using the first column as the key to grep the second file, and merge each match with its key line so the output is on one line, like this:
result:
A hello data A data data
B goodbye data B data data
C cya data C data data

I'm able to do everything except merge the results onto one line. The greps I'm using put each match on its own line, like this:

A Hello
A Hello data data data

Ideas?

sycamorex · 11-22-2011, 04:40 AM

Can you post the code that you've got so far?

rebelbuttmunch · 11-22-2011, 04:58 AM

A colleague solved it for me, using a file with a column of sorted key values:

for seq in `cat seqid_uniq_sort.txt`; do echo `cat file_2 | grep $seq && cat file_1.txt | grep $seq`; done

crts · 11-22-2011, 05:46 AM

Quote:

Originally Posted by rebelbuttmunch

A colleague solved it for me, using a file with a column of sorted key values:

for seq in `cat seqid_uniq_sort.txt`; do echo `cat file_2 | grep $seq && cat file_1.txt | grep $seq`; done

Hi,

Thanks for following up and posting the solution. Allow me a few comments on where I think the script can be improved.

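Do not use 'for' to read lines of a file. It iterates over whitespace-separated words rather than lines, and those words are also subject to filename expansion. The following link covers the problem in more detail: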
http://mywiki.wooledge.org/DontReadLinesWithFor

You do not need to 'cat' the files and then pipe them to grep. Just use
grep "$seq" filename

It is also good practice to double-quote variables.
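Putting the three comments above together, the loop could be sketched roughly as follows. This is only an illustration: merge_by_key is a made-up helper name, the three arguments stand in for the key file and the two data files, and the file1 line is printed first to match the order asked for in the first post. grep -F treats the key as a fixed string that may match anywhere in a line, so a key that matches more than one line will still produce multi-line output.

```shell
# Sketch: merge_by_key KEYS FILE1 FILE2   (hypothetical helper name)
# KEYS holds one key per line, like seqid_uniq_sort.txt in the quoted loop.
# Note: seq, left and right are ordinary (global) shell variables here.
merge_by_key() {
    # Read KEYS line by line instead of word-splitting the output of cat.
    while IFS= read -r seq; do
        [ -n "$seq" ] || continue            # skip empty lines
        left=$(grep -F -- "$seq" "$2")       # matching line(s) in FILE1
        right=$(grep -F -- "$seq" "$3")      # matching line(s) in FILE2
        printf '%s %s\n' "$left" "$right"    # print both on one line
    done < "$1"
}
```

It would be called as, for example, merge_by_key seqid_uniq_sort.txt file_1.txt file_2, using the file names from the quoted loop.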

Finally, here is an alternative that also works with your sample data and does not require a third index file.

Code:

paste <(sort -k2 file2) <(sort -k1 file)
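To see what the one-liner does, here is a small walk-through on the sample data from the first post. It is a sketch under two assumptions: the temporary file names are placeholders, and both files contain exactly the same keys, once each, because paste simply joins line N of one input with line N of the other. The <(...) process substitution needs a shell such as bash, so plain temporary files are used here instead.

```shell
# Sample data from the first post, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'A hello' 'B goodbye' 'C cya' > "$tmp/file1"
printf '%s\n' 'data A data data' 'data C data data' 'data B data data' > "$tmp/file2"

# Sort each file on its key column so matching keys end up on the same line number.
sort -k2 "$tmp/file2" > "$tmp/file2.sorted"
sort -k1 "$tmp/file1" > "$tmp/file1.sorted"

# paste joins corresponding lines, separated by a TAB.
merged=$(paste "$tmp/file2.sorted" "$tmp/file1.sorted")
printf '%s\n' "$merged"

rm -rf "$tmp"
```

On this sample the result is three lines, each consisting of the file2 line, a TAB, and the file1 line with the same key. If a key were missing from one file, the remaining lines would be paired with the wrong partner.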

However, the formatting looks a bit different. Your files have trailing blanks, which are kept in the output. If this is not wanted, you could do something like:

Code:

paste <(sort -k2 file2) <(sort -k1 file) | sed 's/[[:blank:]]\+/ /g'

The 'sed' replaces each run of spaces and tabs with a single space.
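A quick way to check the squeezing on a single line is sketched below. The input string is made up to mimic one line of paste output, with trailing blanks after the file2 part followed by the TAB that paste inserts. The pattern uses the interval \{1,\} instead of \+, since \+ in a basic regular expression is a GNU extension and may not work with other sed implementations.

```shell
# One made-up line: two trailing spaces, then the TAB separator from paste.
squeezed=$(printf 'data A data data  \tA hello\n' | sed 's/[[:blank:]]\{1,\}/ /g')
printf '%s\n' "$squeezed"
```

Every run of blanks, including the TAB, becomes one space. A run of blanks at the very end of a line would likewise be reduced to a single space rather than removed.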

Hope this helps.