[SOLVED] Need to concatenate many files with the same name occurring in many subdirectories
Hi all,
I have a "master" directory containing many subdirectories. Each subdirectory contains many files named #####.fa where # is any integer. I'm trying to write a script that will search through all of the subdirectories, find all the files with the same name, and concatenate them together into one output file of the same name in the master directory. The tricky part is that each file name doesn't necessarily occur in every folder. Do I need to make a query list for this or is there a way to specify files of the same name occurring in different directories?
I'd start with using find or ls -R to generate a list of all files in all dirs, pipe through sort -u (or uniq) to get a unique list, then use that to drive the concatenate process.
That concatenates the contents of all of the files of interest into one file which is very close. I'm trying to only concatenate files with the same name. I'm sorry I was unclear. I feel like the first half of the script is what the doctor ordered. I'm trying to play around with xargs.
Thanks,
Kevin
Not in my installation. I just tested it (with an admittedly small sample: two identical files of the same name each, in the main directory and a sub-directory) and it produces two concatenated files, each of which matches the name of one pair.
Tinkster: My apologies... Operator error. I used FILENAME as the target of the print command at the end (instead of newfile) and it concatenated them all to that file. Now it is behaving as you said it should. Thank you!
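For anyone landing here later, the approach suggested above (build a unique list of file names with find and sort -u, then let that list drive the concatenation) can be sketched roughly as follows. This is only a sketch, not the script that was actually posted in the thread: the function name merge_by_name is made up for illustration, and it assumes GNU find and sort (for -printf and -z) and file names without glob characters, which holds for the #####.fa pattern.

```shell
#!/usr/bin/env bash
# Sketch: merge same-named *.fa files from every subdirectory of a master
# directory into one file of that name in the master directory itself.
# merge_by_name is a hypothetical helper name, not something from the thread.
merge_by_name() {
    local master=$1 name
    # Unique list of base names found below the master directory.
    # -mindepth 2 skips files sitting directly in the master directory,
    # so a re-run does not feed earlier output back in.
    find "$master" -mindepth 2 -type f -name '*.fa' -printf '%f\n' | sort -u |
    while IFS= read -r name; do
        # Collect every file with this base name (it need not exist in each
        # subdirectory) and concatenate them in sorted path order.
        find "$master" -mindepth 2 -type f -name "$name" -print0 |
            sort -z | xargs -0 cat > "$master/$name"
    done
}
```

Call it as merge_by_name /path/to/master. Because each output file is overwritten with > rather than appended to with >>, running it twice should not duplicate the contents.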
Similar question...
I have a directory with many subdirectories each named like so: KOG0001, KOG0002, ...KOG9999.
Each of these subdirectories contains a variable number of two kinds of files (nuc and prot) named like so: Capitella_sp_nuc_hits.fasta (nuc) and Capitella_sp_prot_hits.fasta (prot). The Capitella_sp part represents the name of the species and varies from file to file.
I'm trying to write a script that will go through each subdirectory and concatenate the contents of all the _prot_hits.fasta files into one file in the main directory named like KOG0001.fasta, KOG0002.fasta, and so on. I think I have it figured out except how to reference the source files that I want. Can anyone help me out?
Code:
find . -maxdepth 2 -type f -name "KOG[0-9][0-9][0-9][0-9]*_prot_hits.fasta" -printf "%p\n" | awk -F"_" '{print $1}' | sed 's/$/&.fasta/g' awk -F"/" 'BEGIN {filename=$3 while((getline line < **original_source_files** ) > 0) {print line >> filename} close(filename)}'
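A few things are likely to trip up the pipeline above: find's -name test only matches the last component of the path, so a pattern starting with KOG will not match files called something like Capitella_sp_prot_hits.fasta; there is no pipe between the sed and the second awk; and inside an awk BEGIN block no input line has been read yet, so $3 is empty. One possible way around all of that is to skip find and awk and let the shell glob do the work. The following is a minimal sketch under those assumptions, not a tested answer from the thread; merge_prot_hits is a hypothetical function name and it needs bash for the array.

```shell
#!/usr/bin/env bash
# Sketch: for each KOG#### subdirectory of a main directory, concatenate all
# *_prot_hits.fasta files into <main>/KOG####.fasta.
# merge_prot_hits is a hypothetical helper name, not something from the thread.
merge_prot_hits() {
    local main=$1 dir files
    for dir in "$main"/KOG[0-9][0-9][0-9][0-9]/; do
        [ -d "$dir" ] || continue          # the glob matched no directory
        dir=${dir%/}                       # drop the trailing slash
        files=("$dir"/*_prot_hits.fasta)   # expands in sorted order
        [ -e "${files[0]}" ] || continue   # no prot files in this subdirectory
        # ${dir##*/} is the bare directory name, e.g. KOG0001
        cat "${files[@]}" > "$main/${dir##*/}.fasta"
    done
}
```

Run from the main directory, that would be merge_prot_hits . and the nuc files are left untouched because they never match the *_prot_hits.fasta pattern.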