LinuxQuestions.org

loveulinux 09-24-2011 01:30 AM

Command to merge several file as one
 
Hi all. There are nearly 70 files (notes) in a folder, all belonging to one particular course, and the file names are not listed in numerical order when I run ls. The names look like 1_some_name.txt, 2_some_other_name.txt, 3_some_different_name.txt and so on, up to 69_some_last_filename.txt. I merged all the files into one using the command below.

"ls | sort -n | for i in `awk '{print $1}'`; do cat $i >> ../Newfile.txt; done"

It worked well and does not seem to have missed any of the files.

When I opened Newfile.txt, I found it difficult to tell where each file begins, since there is neither a gap nor a heading between the files. I would like each file name (1_some_name.txt, 2_some_other_name.txt, ...) to appear as a heading before that file's content, with a gap of at least 3 lines between files. Could anybody please tell me how to do this?

allend 09-24-2011 02:17 AM

Just add an echo command in the loop
Code:

do echo -e "\n\n\n$i"; cat $i >> ../Newfile.txt; done
It is considered poor practice to parse the output of the ls command as the output from ls can vary.

Perhaps
Code:

for i in ?_*.txt ; do echo -e "\n\n\n$i"; cat $i >> ../Newfile.txt; done
then
Code:

for i in ??_*.txt ; do echo -e "\n\n\n$i"; cat $i >> ../Newfile.txt; done
would do what you want.
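
One way to see why parsing ls is fragile: a minimal sketch, assuming a hypothetical file name that contains a space (none of the files in this thread do). The helper names count_ls_words and count_glob_matches are invented for the demonstration.

```bash
#!/bin/bash
# Count loop iterations when the output of ls is word-split by the shell.
count_ls_words() {
    local dir=$1 n=0 w
    for w in $(ls "$dir"); do n=$((n + 1)); done
    echo "$n"
}

# Count loop iterations when the shell expands a glob in the same directory.
count_glob_matches() {
    local dir=$1 n=0 f
    for f in "$dir"/*.txt; do
        [[ -e $f ]] && n=$((n + 1))
    done
    echo "$n"
}

demo=$(mktemp -d)
touch "$demo/1_my notes.txt" "$demo/2_other.txt"      # hypothetical names
echo "ls words:     $(count_ls_words "$demo")"        # 3: the space splits one name in two
echo "glob matches: $(count_glob_matches "$demo")"    # 2: one per file
rm -r "$demo"
```

The glob hands each file name to the loop as a single word, which is why the later replies quote "$i" and drop ls entirely.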

David the H. 09-24-2011 03:55 PM

I personally think it's better to start by ensuring that the filenames are all zero padded to the same depth. Then it becomes safer and easier to use simple globbing and a loop, like the one given above. The shell will do all the sorting for you.

Code:


for i in [0-9]*.txt ; do echo -e "\n\n\n$i"; cat "$i" >> ../Newfile.txt; done
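
To illustrate the sorting point, here is a small sketch that lists the same three files in glob order before and after zero-padding. The file names and the helper list_in_glob_order are hypothetical, and the exact position of 10_c.txt relative to 1_a.txt depends on the locale; what holds in any locale is that unpadded 10_c.txt is listed before 2_b.txt.

```bash
#!/bin/bash
# Print the names matching [0-9]*.txt in a directory, in the shell's glob order.
list_in_glob_order() {
    local dir=$1 f
    for f in "$dir"/[0-9]*.txt; do
        [[ -e $f ]] && printf '%s\n' "${f##*/}"
    done
}

demo=$(mktemp -d)
touch "$demo/1_a.txt" "$demo/2_b.txt" "$demo/10_c.txt"   # hypothetical names

echo "unpadded:"
list_in_glob_order "$demo"        # 10_c.txt comes before 2_b.txt

# Pad the single-digit names by hand for the demonstration.
mv "$demo/1_a.txt" "$demo/01_a.txt"
mv "$demo/2_b.txt" "$demo/02_b.txt"

echo "padded:"
list_in_glob_order "$demo"        # 01_a.txt, 02_b.txt, 10_c.txt

rm -r "$demo"
```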

I wrote up a script a few months ago to automate the zero-padding of filenames. It was very simple at first, but I've since fleshed it out.

Code:

#!/bin/bash

# Pads numbers in file names if found.
# See help message for more.

# Set the environment
shopt -s extglob
IFS=''

BCYAN="${BCYAN:-\e[1;36m}"    #Define text color codes, for prettified output.
RESET="${RESET:-\e[0m}"              #Overridden by environment defaults if they exist.

#set default padding level
pad=2

# Set up the help dialog
help+=( '' )
help+=( '\tA quick script to zero-pad files that contain numbers.' )
help+=( '\tIt will only pad the first number string it finds in the name, and ignores files without numbers.' )
help+=( '' )
help+=( "\tUsage: \t${BCYAN}${0##*/} [-n <num>] <files>${RESET}" )
help+=( "\t\t${BCYAN}${0##*/} -h${RESET}" )
help+=( '' )
help+=( '\tUse -n to specify the number of digits to pad, from 2-9 digits.  Defaults to '"$pad"' if not used.' )
help+=( '\tIf no files are given, it processes the current directory.' )
help+=( '' )


# Process input options
# "-h" : print help & exit.
# "-n" : test for valid input value and update "pad" variable
while getopts ":hn:" opt; do

    case "$opt" in

        h)  IFS=$'\n'
            echo -e "${help[*]}" >&2
            exit "2"
            ;;

        n)  if [[ "$OPTARG" =~ [^[:digit:]] ]] || (( "10#$OPTARG" < 2 )) || (( "10#$OPTARG" > 9 )); then
                echo
                echo -e "${BCYAN}invalid option: [$OPTARG].${RESET}" >&2
                echo -e "${BCYAN}-n must be an integer from 2 to 9${RESET}" >&2
                echo -e "${BCYAN}Falling back to the default of $pad${RESET}"
                echo
            else
                pad="$(( 10#$OPTARG ))"
            fi
            ;;

        \?) echo -e "${BCYAN}Invalid option: [-$OPTARG].  Ignoring.${RESET}" >&2
            ;;
    esac
done

shift $(( OPTIND - 1 )) ; OPTIND=1

# Now check for files in the input.
# If nothing given, set input parameters to files in current directory.
if [[ -z "$*" ]]; then
        set -- ./*
fi

# Process files in input parameters
for file in "$@" ; do

        # Ignore files without digits
        [[ "$file" != *[0-9]* ]] && continue

        # Split filename into prefix-digits-suffix
        [[ "$file" =~ ([^0-9]*)([0-9]+)(.*) ]]

        # Pad digits to desired width
        printf -v numpad "%0*d" "$pad" "${BASH_REMATCH[2]##*(0)}"

        # Add old and new filenames to arrays for final processing
        oldfile+=( "$file" )
        newfile+=( "${BASH_REMATCH[1]}${numpad}${BASH_REMATCH[3]}" )

done

# If there are any files to rename, ask to confirm the operation.
# And rename if confirmed.
if [[ -n "${oldfile[*]}" ]]; then

        echo
        echo -e "${BCYAN}Rename the following files?${RESET}"
        echo
        for i in "${!oldfile[@]}" ; do
                echo -e "${oldfile[i]/#$PWD/.}\t-->\t${newfile[i]/#$PWD/.}"
        done
        echo

        read -p "(y/n): "
        echo

        case "$REPLY" in

                y|Y*)   for i in "${!oldfile[@]}" ; do
                                mv -n "${oldfile[i]}" "${newfile[i]}"
                        done
                        echo
                        ;;

                *)      echo -e "${BCYAN}Aborting.${RESET}"
                        echo
                        exit 1
                        ;;
        esac

# Otherwise just exit.
else
        echo
        echo -e "${BCYAN}No files to rename.${RESET}" >&2
        echo -e "${BCYAN}Exiting.${RESET}" >&2
        echo
        exit 1

fi

exit 0


loveulinux 10-01-2011 12:44 PM

Sorry for the late response, and sorry if my poor English did not make clear what I need. I ran the command below, but the file names still do not appear as headings in Newfile.txt before each file's content.

"ls | sort -n | for i in `awk '{print $1}'`; do echo -e "\n\n\n$i"; cat $i >> ../Newfile.txt; done"

It added the contents of all the files to Newfile.txt in order (from 1 to 69), but I need each file name to appear as a heading before that file's content in Newfile.txt. The command only echoes the 69 file names to the terminal and does not add them to the file. See the output below:

64_SSH_Introduction.txt



65_IPTables.txt



66_IPv6_IPTables.txt



67_NMap_Introduction.txt



68_Nessus_Introduction.txt



69_Snort_Sniffer_Logger.txt

This is what I need as headings: 1_first_file.txt should be the heading on the first line of Newfile.txt, followed by the content of 1_first_file.txt; then 2_second_file.txt as a heading after the first file's content, followed by the content of 2_second_file.txt; and so on up to 69_Snort_Sniffer_Logger.txt.

I removed Newfile.txt and ran the command below inside the same directory.

"for i in ?_*.txt ; do echo -e "\n\n\n$i"; cat $i >> ../Newfile.txt; done"

I got the output below, and Newfile.txt was empty when I ran cat on it.
"; cat $i >> ../Newfile.txt; done



?_*.txt
cat: ?_*.txt: No such file or directory"

Then I ran the command below:
"for i in ??_*.txt ; do echo -e "\n\n\n$i"; cat $i >> ../Newfile.txt; done"
and found no difference between it and "ls | sort -n | for i in `awk '{print $1}'`; do echo -e "\n\n\n$i"; cat $i >> ../Newfile.txt; done". That is, the contents of all 69 files are added in order, but without the headings described in the previous paragraph. Could you please show me how to modify this command?



loveulinux 10-01-2011 01:01 PM

Quote:

Originally Posted by David the H. (Post 4481311)
I personally think it's better to start by ensuring that the filenames are all zero padded to the same depth. Then it becomes safer and easier to use simple globbing and a loop, like the one given above. The shell will do all the sorting for you.

Code:


for i in [0-9]*.txt ; do echo -e "\n\n\n$i"; cat "$i" >> ../Newfile.txt; done

Sorry for the late response, and thanks for the reply. I tried this and found that it adds the contents of all 69 files to Newfile.txt in order, but not each file's name as a heading before its content. Please see my other post in this thread for more details.

I copied David the H.'s zero-padding script as-is into a test file in the directory where all 69 files are stored and ran it. The script works great, but since I am a beginner in shell scripting, I can neither modify nor understand it. I did not find Newfile.txt, and got output like the following for all files:
mv: `./68_Nessus_Introduction.txt' and `./68_Nessus_Introduction.txt' are the same file
mv: `./69_Snort_Sniffer_Logger.txt' and `./69_Snort_Sniffer_Logger.txt' are the same file
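
The "are the same file" messages are what mv prints when the old and new names are identical, for example when a name already has two digits. A minimal sketch of one way to avoid them, by skipping names that padding would not change. The helpers pad_first_number and plan_renames are hypothetical and not part of the posted script, and the sketch uses 10# arithmetic in place of the script's extglob pattern to strip leading zeros.

```bash
#!/bin/bash
# Return the name with its first run of digits zero-padded to the given width.
pad_first_number() {
    local name=$1 width=${2:-2}
    local re='^([^0-9]*)([0-9]+)(.*)$'
    if [[ $name =~ $re ]]; then
        # 10# forces base 10, so "08" is read as 8 rather than as invalid octal.
        printf '%s%0*d%s\n' "${BASH_REMATCH[1]}" "$width" \
            "$((10#${BASH_REMATCH[2]}))" "${BASH_REMATCH[3]}"
    else
        printf '%s\n' "$name"          # no digits: leave the name unchanged
    fi
}

# Print "old -> new" for every name that would actually change.
plan_renames() {
    local width=$1 old new
    shift
    for old in "$@"; do
        new=$(pad_first_number "$old" "$width")
        [[ $old == "$new" ]] && continue   # already padded: nothing to rename
        printf '%s -> %s\n' "$old" "$new"
    done
}

plan_renames 2 1_intro.txt 68_Nessus_Introduction.txt
# prints: 1_intro.txt -> 01_intro.txt
```

The same comparison could be placed in the posted script's loop before the names are added to the oldfile and newfile arrays. Note also that the script only renames files; it was never meant to create Newfile.txt.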

grail 10-01-2011 01:36 PM

Well, as you know there are 70 files each starting with a number, how about something like:
Code:

#!/bin/bash

exec 6>&1 >Newfile.txt

for FILE in {1..70}_*
do
    if [[ -e $FILE ]]
    then
        echo -e "$FILE\n"
        cat "$FILE"
        echo
    else
        break
    fi
done

exec >&6 6>&-
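
For readers unfamiliar with the two exec lines: the first saves the current stdout as file descriptor 6 and points stdout at the file, and the last restores stdout and closes descriptor 6. A small sketch of that pattern next to an alternative that scopes the redirection to a command group; the function names and sample text are made up for illustration.

```bash
#!/bin/bash
# Redirect stdout for a stretch of the script, then restore it.
write_report() {
    local out=$1
    exec 6>&1 >"$out"      # save stdout as fd 6, then send stdout to the file
    echo "heading"
    echo "body"
    exec >&6 6>&-          # restore stdout from fd 6 and close fd 6
}

# Same result, with the redirection limited to one command group.
write_report_grouped() {
    local out=$1
    {
        echo "heading"
        echo "body"
    } >"$out"
}

tmp=$(mktemp)
write_report "$tmp"
echo "this line goes to the terminal again"
cat "$tmp"
rm "$tmp"
```

In a loop-based script, writing "done > Newfile.txt" after the loop has the same effect as the grouped form.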


allend 10-01-2011 09:08 PM

I see the problem with what I originally posted. Forgot to redirect the output of echo.
Code:

for i in ?_*.txt ; do echo -e "\n\n\n$i" >> ../Newfile.txt; cat "$i" >> ../Newfile.txt; done
grail's solution is better. A neat answer to the numbering problem.
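
A sketch that combines the two passes (single-digit names, then double-digit names) and redirects the whole loop once, so the heading and the content cannot end up in different places. The function name merge_with_headings is hypothetical, and the patterns use [0-9] rather than ? so that only numeric prefixes match.

```bash
#!/bin/bash
# Append each matching file in the current directory to "$1", preceded by
# three blank lines and the file's own name.
merge_with_headings() {
    local out=$1 f
    for f in [0-9]_*.txt [0-9][0-9]_*.txt; do
        [[ -e $f ]] || continue        # a pattern with no match stays literal
        printf '\n\n\n%s\n' "$f"
        cat -- "$f"
    done >> "$out"
}

demo=$(mktemp -d)
printf 'one\n' > "$demo/1_a.txt"       # hypothetical sample files
printf 'two\n' > "$demo/2_c.txt"
printf 'ten\n' > "$demo/10_b.txt"
( cd "$demo" && merge_with_headings merged.out && cat merged.out )
rm -r "$demo"
```

From inside the notes directory, the call for this thread would be merge_with_headings ../Newfile.txt.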

loveulinux 10-02-2011 12:25 AM

Super! I tried "??" instead of "?" and it gave exactly the output I wanted. With a single question mark ("?") I got the error below:
cat: ?_*.txt: No such file or directory.
I just want to clarify one thing: "\n\n\n" is simply the 3-line gap between files, isn't it?
Anyway, thank you, master, for showing me how to use patterns like ??_*.txt instead of ls, sort, etc.

loveulinux 10-02-2011 12:28 AM

Thanks, grail, for your shell script suggestion. I copied the script as-is into a test file inside the same directory and ran it. "echo $?" showed an exit status of 0, and Newfile.txt was created in the same directory, but it is empty when I cat it.


allend 10-02-2011 01:32 AM

Quote:

Super! I tried "??" instead of "?" and it gave exactly the output I wanted. With a single question mark ("?") I got the error below:
cat: ?_*.txt: No such file or directory.
I just want to clarify one thing: "\n\n\n" is simply the 3-line gap between files, isn't it?
If you have used David the H.'s script, then you have likely changed the filenames from, say, 1_somename.txt to 01_somename.txt, hence the failed match with a single question mark.
Yes, the \n\n\n sequence simply outputs three new lines.
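
As a quick check of the spacing, a minimal sketch using printf, which behaves the same in every POSIX shell, whereas echo -e is a bash feature. The function name print_heading is hypothetical.

```bash
#!/bin/bash
# Print three empty lines, then the heading text on a line of its own.
print_heading() {
    printf '\n\n\n%s\n' "$1"
}

print_heading "1_first_file.txt" | wc -l    # counts 4 lines: three blank plus the heading
```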

grail 10-02-2011 03:50 AM

Quote:

but it is empty when I cat it.
So to confirm, as I did test it before posting: you are running the script from within the directory where all the files are, and the names follow this convention:
Code:

1_blah.txt
25_blah.txt
...

If the files are not named N_whatever, where N is a number from 1 to 70, then it will not work.
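
If the names have since been zero-padded (01_whatever), the pattern 1_* no longer matches and the loop breaks on its first pass, which would leave Newfile.txt empty. A hedged sketch of a variant that sorts on the numeric prefix instead, so padded and unpadded names both work. The function name merge_numeric is hypothetical, and the sketch assumes file names contain no tabs or newlines.

```bash
#!/bin/bash
# Concatenate files named <number>_<anything> from the current directory into
# "$1" in numeric order, each preceded by its name and a blank line.
merge_numeric() {
    local out=$1 f re='^([0-9]+)_'
    for f in [0-9]*_*; do
        [[ -f $f && $f =~ $re ]] || continue
        # Emit "<number><TAB><name>" so sort can order by the numeric prefix.
        printf '%d\t%s\n' "$((10#${BASH_REMATCH[1]}))" "$f"
    done | sort -n | cut -f2- |
    while IFS= read -r f; do
        printf '%s\n\n' "$f"
        cat -- "$f"
        echo
    done > "$out"
}
```

Run from inside the notes directory, for example as merge_numeric Newfile.txt. The output name must not itself match [0-9]*_*, or it would be picked up as an input.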

loveulinux 10-02-2011 04:52 AM

Thank you all. Let me take a close look at David's and grail's scripts and then try.

All times are GMT -5.