LinuxQuestions.org

Search text file for records in another text file and pull extra data over to new (https://www.linuxquestions.org/questions/linux-newbie-8/search-text-file-for-records-in-another-text-file-and-pull-extra-data-over-to-new-4175434885/)

schneidz 10-31-2012 07:54 PM

um, you need to read the man pages.

what you are trying to do above is search for the string "MSENGNKNIAIVEAFSETDKKTGEVVTLVPNTNNTVQPVALMRLGLFVPTLKSTSRGRKGQMVSMDASAELKQLSLAKAEGYEDIRISGLRLDMDNDFKT WVGIIHAFAKHKVVGDTVTLPFVEFVRLCGIPTARSSAKLRKRLDSSLSRIATNTISFRSKGSDEFYVTHLVQTAKYSVKHDTVELKADPKIFELYQFDK KVLLQLRAINELSRKESAQALYTFIESLPPDPAPISLARLRARLNLTSRTITQNATVRKAMEQLREIGYLDYTEVKRGNSVYFVIHYRRPKLRQAQISTK IDNDETEYSLPDENQDDIIDVVPDEKEGKMVMLSKEELALLEELRKAKTRK" in file_1.txt

Adzrules 11-01-2012 03:07 AM

Quote:

Originally Posted by schneidz (Post 4819334)
um, you need to read the man pages.

what you are trying to do above is search for the string "MSENGNKNIAIVEAFSETDKKTGEVVTLVPNTNNTVQPVALMRLGLFVPTLKSTSRGRKGQMVSMDASAELKQLSLAKAEGYEDIRISGLRLDMDNDFKT WVGIIHAFAKHKVVGDTVTLPFVEFVRLCGIPTARSSAKLRKRLDSSLSRIATNTISFRSKGSDEFYVTHLVQTAKYSVKHDTVELKADPKIFELYQFDK KVLLQLRAINELSRKESAQALYTFIESLPPDPAPISLARLRARLNLTSRTITQNATVRKAMEQLREIGYLDYTEVKRGNSVYFVIHYRRPKLRQAQISTK IDNDETEYSLPDENQDDIIDVVPDEKEGKMVMLSKEELALLEELRKAKTRK" in file_1.txt

Indirectly though. I want to find the corresponding CDS.001 line in file_1.txt and then pull the next few lines below that record so that I can obtain the MSENGNKNIAIVEAFSETDKKTGEVVTLVPNTNNTVQPVALMRLGLFVPTLKSTSRGRKGQMVSMDASAELKQLSLAKAEGYEDIRISGLRLDMDNDFKT WVGIIHAFAKHKVVGDTVTLPFVEFVRLCGIPTARSSAKLRKRLDSSLSRIATNTISFRSKGSDEFYVTHLVQTAKYSVKHDTVELKADPKIFELYQFDK KVLLQLRAINELSRKESAQALYTFIESLPPDPAPISLARLRARLNLTSRTITQNATVRKAMEQLREIGYLDYTEVKRGNSVYFVIHYRRPKLRQAQISTK IDNDETEYSLPDENQDDIIDVVPDEKEGKMVMLSKEELALLEELRKAKTRK bits.

I suppose I effectively want to say: "Find this record from file_1.txt in file_2.txt and then copy that line and the next few lines until a ">CDS.*" is encountered."
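The "copy that line and the next few lines until the next >CDS header" idea can be sketched with awk. This is only a sketch under assumptions: the file names ids.txt and records.txt and their contents are made up, and each wanted header is assumed to match a header line in the records file exactly.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# ids.txt: the wanted record headers, one per line (hypothetical).
printf '%s\n' '>CDS.001' '>CDS.003' > ids.txt

# records.txt: header lines starting with ">" followed by sequence lines.
printf '%s\n' '>CDS.001' 'MSENGNK' 'WVGIIHA' '>CDS.002' 'MKTAYIA' '>CDS.003' 'MVLSPAD' > records.txt

# First file: remember every wanted header.
# Second file: each header line switches printing on or off, and the
# sequence lines below it follow that decision until the next header.
result=$(awk '
    NR == FNR { wanted[$0] = 1; next }
    /^>/      { printing = ($0 in wanted) }
    printing
' ids.txt records.txt)

printf '%s\n' "$result"
```

Unlike grep -A 1, this does not need to know in advance how many sequence lines follow each header.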

millgates 11-01-2012 03:21 AM

What schneidz meant is that you mixed up the files: you are searching for patterns from file_2 in file_1 instead of the other way around. The filename after -f should be the file containing the strings you want to search for, while the other one is the file in which you want to search.
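A small illustration of the argument order may help. The file names patterns.txt and data.txt and their contents are invented for the example; -F is added on the assumption that the patterns should be treated as plain strings rather than regular expressions.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# patterns.txt: one search string per line (hypothetical IDs).
printf '%s\n' '>CDS.001' '>CDS.003' > patterns.txt

# data.txt: the file to be searched (hypothetical records).
printf '%s\n' '>CDS.001' 'MSENGNK' '>CDS.002' 'MKTAYIA' '>CDS.003' 'MVLSPAD' > data.txt

# -f FILE reads the patterns from FILE; the last argument is the file searched.
# -F treats every pattern as a fixed string.
matches=$(grep -F -f patterns.txt data.txt)

printf '%s\n' "$matches"
```

Swapping the two file names would instead look for every line of data.txt inside patterns.txt.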

Adzrules 11-01-2012 04:37 AM

Quote:

Originally Posted by millgates (Post 4819489)
What schneidz meant is that you mixed up the files: you are searching for patterns from file_2 in file_1 instead of the other way around. The filename after -f should be the file containing the strings you want to search for, while the other one is the file in which you want to search.

I see.

However, if I switch things around and run fgrep -f FileIWantToSearch FileWithinWhichToSearch, I get a blank text file again. I've tried it both ways round and always get a blank text file.

Adzrules 11-01-2012 05:02 AM

Okay guys, thanks ever so much for your help! It has been much appreciated. My apologies for my idiocy at points, but I have solved it myself now with the following command:

Code:

for i in `cat file_1.txt`; do grep $i file_2.txt -A 1; done

millgates 11-01-2012 06:18 AM

Hm, it seems that there are trailing spaces in your file_1.txt. That's why grep -f did not work: it treated the spaces as part of the pattern to search for. If you remove the spaces, it should work.
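To see the effect of trailing spaces and one possible way to remove them, here is a rough sketch. The file contents are invented, file_1.clean.txt is a made-up name, and the sed expression assumes that stripping all trailing whitespace from each pattern line is acceptable.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical file_1.txt in which every line ends with a stray space.
printf '%s \n' '>CDS.001' '>CDS.003' > file_1.txt
printf '%s\n' '>CDS.001' 'MSENGNK' '>CDS.002' 'MKTAYIA' '>CDS.003' 'MVLSPAD' > file_2.txt

# With the trailing spaces still in the patterns, no line matches.
before=$(grep -F -c -f file_1.txt file_2.txt)

# Strip trailing whitespace from every pattern line.
sed 's/[[:space:]]*$//' file_1.txt > file_1.clean.txt

# Now each ID matches; -A 1 also prints the line after each match.
after=$(grep -F -A 1 -f file_1.clean.txt file_2.txt)

printf 'matching lines before cleaning: %s\n' "$before"
printf '%s\n' "$after"
```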

David the H. 11-03-2012 11:53 AM

This is also incorrect in several ways.

Code:

for i in `cat file_1.txt`; do grep $i file_2.txt -A 1; done

1) Don't Read Lines With For. Use a while+read loop.

2) Useless Use Of Cat. Although in this case it's really part and parcel of #1.

3) QUOTE ALL OF YOUR VARIABLE SUBSTITUTIONS. You should never leave the quotes off a parameter expansion unless you explicitly want the resulting string to be word-split by the shell (globbing patterns are also expanded). This is a vitally important concept in scripting, so train yourself to do it correctly now. You can learn about the exceptions later.

http://mywiki.wooledge.org/Arguments
http://mywiki.wooledge.org/WordSplitting
http://mywiki.wooledge.org/Quotes

4) In addition, while not wrong per se, $(..) is highly recommended over `..`. Backticks are legacy syntax, and they are harder to nest and quote correctly.
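Points 3 and 4 can be seen in a short demo. The count_args helper, the sample value and the sample path are all invented for illustration.

```shell
# Hypothetical value containing a space, as a line of a text file might.
entry='CDS 001'

# Helper (made up for this demo) that prints how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args $entry)     # split on the space: two arguments
quoted=$(count_args "$entry")     # kept intact: one argument

# $(..) nests without extra escaping, and each level can carry its own quotes.
parent=$(basename "$(dirname "/data/run 1/file_1.txt")")

echo "unquoted: $unquoted, quoted: $quoted, parent: $parent"
```

Writing the nested call with backticks would require escaping the inner backticks, which is one reason $(..) is usually preferred.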

Code:

while IFS='' read -r entry ; do
        grep -A 1 "$entry" file_2.txt
done <file_1.txt

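I added IFS='' to preserve leading and trailing whitespace in each line read from the file. Leave it off if that isn't an issue, or if you want the whitespace removed. If file_1.txt really does have trailing spaces, as millgates suspects, leaving it off strips them before grep sees the pattern.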
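The difference IFS makes to read can be checked on a single line. The sample value below is invented; the here-documents just feed that one line to read.

```shell
# Hypothetical line with leading and trailing spaces.
line='  >CDS.001  '

# With IFS empty for this read, the surrounding whitespace is preserved.
IFS='' read -r kept <<EOF
$line
EOF

# With the default IFS, read trims leading and trailing whitespace.
read -r trimmed <<EOF
$line
EOF

printf 'kept:    [%s]\n' "$kept"
printf 'trimmed: [%s]\n' "$trimmed"
```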

grep -f is probably still the best option, though you need to be sure the patterns match exactly, as millgates mentioned. Another thing to watch out for is DOS-style line endings in your files. The extra carriage return would generally make matches fail.
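A quick way to check for carriage returns and remove them, as a sketch only: the file contents are invented, file_1.unix.txt is a made-up name, and tr is just one of several tools that could do the conversion.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical pattern file saved with DOS (CRLF) line endings.
printf '%s\r\n' '>CDS.001' '>CDS.003' > file_1.txt
printf '%s\n' '>CDS.001' 'MSENGNK' '>CDS.002' 'MKTAYIA' '>CDS.003' 'MVLSPAD' > file_2.txt

# Count the lines that contain a carriage return.
cr=$(printf '\r')
crlf_lines=$(grep -c "$cr" file_1.txt)

# Delete every carriage return, then search with the cleaned patterns.
tr -d '\r' < file_1.txt > file_1.unix.txt
matches=$(grep -F -c -f file_1.unix.txt file_2.txt)

printf 'lines with CR: %s\n' "$crlf_lines"
printf 'matching lines after conversion: %s\n' "$matches"
```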


All times are GMT -5.