LinuxQuestions.org


niharikaananth 08-15-2011 02:58 AM

Merging many files as one
 
Hi. I have around 71 files in a folder and I want to merge them all into one file. The files are numbered, i.e. 1_file.txt, 2_something.txt, 3_someothername.txt and so on up to 71_lastfile.txt. I ran "cat * > ../notes.txt", but the contents do not come out in order. Whatever command I use to merge these files, the new file should follow the numeric sequence, i.e. 1_*.txt, 2_*.txt, 3_*.txt and so on. Could anybody guide me on how to merge all 71 files into one file in that order? Otherwise I will have to copy and paste each file one by one, which takes much more time.

sycamorex 08-15-2011 03:22 AM

Try >> instead of >.
This will append to the file instead of overwriting it.

grail 08-15-2011 06:05 AM

If I understand correctly, the issue is the order in which the files are being passed to cat. Try using sort to get them in the correct order and then pass them to cat.
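Spelled out, grail's suggestion could look like the minimal sketch below. It assumes sort and xargs that accept NUL-separated input (-z and -0, as the GNU and BSD versions do) and that every name starts with its number, like 12_notes.txt; merge_numeric is a hypothetical helper name, not a standard command. Run it from inside the folder, because with full paths sort -n would not see a leading number.

```shell
#!/bin/bash
# Sketch: concatenate files in the numeric order of their leading number.
# Assumes sort -z and xargs -0 (GNU/BSD); merge_numeric is a hypothetical
# helper name. Pass bare names like 12_notes.txt, not full paths, so that
# sort -n sees the number at the start of each name.
merge_numeric() {
    local out=$1
    shift
    # NUL-separate the names so spaces in them survive the pipeline,
    # sort on the leading number, then cat the files in that order.
    printf '%s\0' "$@" | sort -z -n | xargs -0 cat -- > "$out"
}

# Usage (from inside the folder):
#   merge_numeric ../notes.txt *.txt
```

Writing the result to ../notes.txt keeps the output file out of the *.txt glob it is built from.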

TobiSGD 08-15-2011 06:09 AM

Assuming that you are using Bash 3.0 or newer:
Code:

for i in {1..71}
  do
    # "${i}"_ rather than $i: the glob 1*.txt would also
    # match 10_*.txt through 19_*.txt
    cat "${i}"_*.txt
  done > newfile
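A variation on the numbered loop that does not hard-code the count, as a minimal sketch: it assumes names of the form N_something.txt with N not zero-padded, and merge_by_number is a hypothetical helper name. One caveat with matching on the number alone is that a pattern like 1*.txt also picks up 10_*.txt through 19_*.txt, so this version anchors each number with the underscore.

```shell
#!/bin/bash
# Sketch: merge N_*.txt files in numeric order without hard-coding 71.
# Assumes N is not zero-padded; merge_by_number is a hypothetical helper.
# The body runs in a subshell ( ... ) so nullglob does not leak out.
merge_by_number() (
    out=$1
    max=0
    shopt -s nullglob
    # find the largest leading number among the N_*.txt files
    for f in [0-9]*_*.txt; do
        n=${f%%_*}
        [[ $n =~ ^[0-9]+$ ]] || continue
        (( 10#$n > max )) && max=$(( 10#$n ))
    done
    # "${i}"_*.txt is anchored on the underscore, so 1 does not
    # also pick up 10_*.txt .. 19_*.txt the way 1*.txt would
    for (( i = 1; i <= max; i++ )); do
        files=( "${i}"_*.txt )
        # skip gaps: with nullglob, a bare cat would read stdin instead
        (( ${#files[@]} )) && cat -- "${files[@]}"
    done > "$out"
)

# Usage (from inside the folder):
#   merge_by_number ../notes.txt
```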


sycamorex 08-15-2011 06:13 AM

Quote:

Originally Posted by grail (Post 4443362)
If I understand correctly, the issue is the order in which the files are being passed to cat. Try using sort to get them in the correct order and then pass them to cat.

Ooops, I forgot about that.

David the H. 08-15-2011 01:07 PM

I'm assuming that the problem you're experiencing is that filename globbing is based on dictionary sorting, so you get sequences like this:

10 11 12 ... 18 19 1 20 21 ... 69 70 71 7 8 9

The only ways to get around this are to use sort with the numerical sorting option, some other technique for matching the actual sequence, like TobiSGD offered, or else to rename your files so that they are all zero-padded (or otherwise in alphanumeric order). I generally prefer the last myself, as it solves the problem permanently.
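To see the ordering problem and why zero-padding fixes it, here is a small sketch that prints the glob order of a few numbered names before and after padding. It sets LC_ALL=C so that the order is plain byte order; other locales may differ in the details. glob_order is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: show how a glob orders numbered names, with and without
# zero-padding. LC_ALL=C forces byte-order sorting so the result does
# not depend on your locale; glob_order is a hypothetical helper name.
glob_order() (
    LC_ALL=C
    cd "$1" || exit
    # the glob expands in sorted order; echo prints it on one line
    echo *.txt
)

demo=$(mktemp -d)
mkdir "$demo/plain" "$demo/padded"
for n in 1 2 10; do
    touch "$demo/plain/${n}_x.txt"
    # printf %02d turns 1 into 01, so the names sort numerically
    touch "$demo/padded/$(printf '%02d' "$n")_x.txt"
done

glob_order "$demo/plain"     # 10_x.txt 1_x.txt 2_x.txt
glob_order "$demo/padded"    # 01_x.txt 02_x.txt 10_x.txt
rm -rf "$demo"
```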

There are several batch renaming utilities out there for cleaning up filenames, and the topic comes up here regularly, so search around a bit. But here's a quick script I just whipped up that can handle simple jobs.
Code:

#!/bin/bash

shopt -s extglob    #needed for zero-stripping below

#loop through the files given to the script
#(you can use a glob, like "*.txt")
#defaults to globbing everything in the PWD
(( $# )) || set -- "$PWD"/*

#(prefix)(number)(suffix), anchored so no part of the name is dropped
re='^([^0-9]*)([0-9]+)(.*)$'

for file in "$@" ; do

    #work on the name only, so digits in the directory path are ignored
    base=${file##*/}
    dir=${file%"$base"}

    #ignore any files without numbers, otherwise break the name into
    #(prefix)-(number)-(suffix), stored in the BASH_REMATCH array
    [[ $base =~ $re ]] || continue

    #pad the number (2 digits by default).  strip any existing
    #leading zeroes first, or bash will treat them as octal
    printf -v numpad "%02d" "${BASH_REMATCH[2]##*(0)}"

    #build the new filename
    newfile="${dir}${BASH_REMATCH[1]}${numpad}${BASH_REMATCH[3]}"

    #skip names that are already padded
    [[ $newfile == "$file" ]] && continue

    #confirm the result. remove the echo to rename
    echo mv "$file" "$newfile"

done

exit 0

This assumes that the names have only a single number sequence in them. It separates the number string from the non-number strings before and after it, and pads it to 2 places (simply change "%02d" if you want more). Then it reassembles the pieces into a new filename. You can give it a list of files, or else it defaults to everything in the present working directory.
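The split-pad-reassemble step can be tried on single names with a function like the following, a sketch rather than a drop-in replacement for the script. pad_name is a hypothetical helper; it uses 10# arithmetic to avoid the octal problem instead of the extglob zero-stripping the script uses, and it takes an optional width (default 2) in place of editing "%02d".

```shell
#!/bin/bash
# Sketch: pad the first digit run in a bare file name.
# pad_name is a hypothetical helper; the optional 2nd argument is the width.
pad_name() {
    local name=$1 width=${2:-2} re='^([^0-9]*)([0-9]+)(.*)$' num
    # no digits: return the name unchanged
    [[ $name =~ $re ]] || { printf '%s\n' "$name"; return; }
    # 10# forces base 10, so a leading zero is not read as octal
    printf -v num '%0*d' "$width" "$(( 10#${BASH_REMATCH[2]} ))"
    printf '%s\n' "${BASH_REMATCH[1]}${num}${BASH_REMATCH[3]}"
}

pad_name 7_notes.txt        # 07_notes.txt
pad_name 12_notes.txt       # 12_notes.txt
pad_name 009_old.txt 3      # 009_old.txt
pad_name chapter5.txt 3     # chapter005.txt
```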

Finally, be careful with it. It might have unforeseen side-effects, so I've disabled the actual renaming operation. Don't remove the echo at the end until you've confirmed that it works.


All times are GMT -5.