[SOLVED] Command to merge several files into one

loveulinux · 09-24-2011, 01:30 AM

Hi all, there are nearly 70 files (notes) in a folder, all belonging to one particular course. The file names are not listed in numerical order when I run ls. The names look like 1_some_name.txt, 2_some_other_name.txt, 3_some_different_name.txt and so on up to 69_some_last_filename.txt. I merged all the files into one using the command below.

"ls | sort -n | for i in `awk '{print $1}'`; do cat $i >> ../Newfile.txt; done"

It worked great, and the command does not seem to have missed any of the files.

When I opened Newfile.txt, I found it difficult to tell where each file begins, since there is neither a gap nor a heading between the files. I would like each file name (1_some_name.txt, 2_some_other_name.txt, ...) to appear as a heading before that file's content, with a gap of at least 3 lines between files. Could anybody please tell me how to do this?

allend · 09-24-2011, 02:17 AM

Just add an echo command in the loop

Code:

do echo -e "\n\n\n$i"; cat $i >> ../Newfile.txt; done

It is considered poor practice to parse the output of the ls command as the output from ls can vary.

Perhaps

Code:

for i in ?_*.txt ; do echo -e "\n\n\n$i"; cat $i >> ../Newfile.txt; done

then

Code:

for i in ??_*.txt ; do echo -e "\n\n\n$i"; cat $i >> ../Newfile.txt; done

would do what you want.

David the H. · 09-24-2011, 03:55 PM

I personally think it's better to start by ensuring that the filenames are all zero padded to the same depth. Then it becomes safer and easier to use simple globbing and a loop, like the one given above. The shell will do all the sorting for you.

Code:

for i in [0-9]*.txt ; do echo -e "\n\n\n$i"; cat "$i" >> ../Newfile.txt; done
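To illustrate why the padding matters, here is a minimal sketch (the file names and the temporary directory are made up for the demo). The shell returns glob matches in lexical order, so an unpadded 10 lands ahead of 2, while padded names come back in numeric order.

```bash
#!/bin/bash
# Illustration only: demo files are created in a throwaway directory.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# Unpadded names: the glob result is sorted as text, not as numbers.
touch 1_a.txt 2_b.txt 10_c.txt
unpadded=( [0-9]*.txt )
echo "unpadded: ${unpadded[*]}"    # 10_c.txt is listed before 2_b.txt

# Zero-padded names: text order and numeric order now agree.
rm -f -- *.txt
touch 01_a.txt 02_b.txt 10_c.txt
padded=( [0-9]*.txt )
echo "padded:   ${padded[*]}"
```

The exact position of 1_a.txt relative to 10_c.txt can depend on the locale, but 2_b.txt ends up last in the unpadded case either way.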

I wrote up a script a few months ago to automate the zero-padding of filenames. It was very simple at first, but I've since fleshed it out.

Code:

#!/bin/bash

# Pads numbers in file names if found.
# See help message for more.

# Set the environment
shopt -s extglob
IFS=''

BCYAN="${BCYAN:-\e[1;36m}"    #Define text color codes, for prettified output.
RESET="${RESET:-\e[0m}"	      #Overridden by environment defaults if they exist.

#set default padding level
pad=2

# Set up the help dialog
help+=( '' 																					)
help+=( '\tA quick script to zero-pad files that contain numbers.' 										)
help+=( '\tIt will only pad the first number string it finds in the name, and ignores files without numbers.'		)
help+=( '' 																					)
help+=( "\tUsage: \t${BCYAN}${0##*/} [-n <num>] <files>${RESET}"											)
help+=( "\t\t${BCYAN}${0##*/} -h${RESET}" 															)
help+=( '' 																					)
help+=( '\tUse -n to specify the number of digits to pad, from 2-9 digits.  Defaults to '"$pad"' if not used.'	)
help+=( '\tIf no files are given, it processes the current directory.' 									)
help+=( '' 																					)


# Process input options
# "-h" : print help & exit.
# "-n" : test for valid input value and update "pad" variable
while getopts ":hn:" opt; do

     case "$opt" in

          h) IFS=$'\n'
             echo -e "${help[*]}" >&2
             exit "2"
          ;;

          n)	if [[ "$OPTARG" =~ [^[:digit:]] ]] || (( "10#$OPTARG" < 2 )) || (( "10#$OPTARG" > 9 )); then
				echo
				echo -e "${BCYAN}invalid option: [$OPTARG].${RESET}" >&2
				echo -e "${BCYAN}-n must be an integer from 2 to 9${RESET}" >&2
				echo -e "${BCYAN}Falling back to the default of $pad${RESET}"
				echo
			else
				pad="$(( 10#$OPTARG ))"
			fi
          ;;

          \?) echo -e "${BCYAN}Invalid option: [-$OPTARG].  Ignoring.${RESET}" >&2
          ;;
     esac
done

shift $(( OPTIND - 1 )) ; OPTIND=1

# Now check for files in the input.
# If nothing given, set input parameters to files in current directory.
if [[ -z "$*" ]]; then
	set -- ./*
fi

# Process files in input parameters
for file in "$@" ; do

	# Ignore files without digits
	[[ "$file" != *[0-9]* ]] && continue

	# Split filename into prefix-digits-suffix
	[[ "$file" =~ ([^0-9]*)([0-9]+)(.*) ]]

	# Pad digits to desired width
	printf -v numpad "%0*d" "$pad" "${BASH_REMATCH[2]##*(0)}"

	# Add old and new filenames to arrays for final processing
	oldfile+=( "$file" )
	newfile+=( "${BASH_REMATCH[1]}${numpad}${BASH_REMATCH[3]}" )

done

# If there are any files to rename, ask to confirm the operation.
# And rename if confirmed.
if [[ -n "${oldfile[*]}" ]]; then

	echo
	echo -e "${BCYAN}Rename the following files?${RESET}"
	echo
	for i in "${!oldfile[@]}" ; do
		echo -e "${oldfile[i]/#$PWD/.}\t-->\t${newfile[i]/#$PWD/.}"
	done
	echo

	read -p "(y/n): "
	echo

	case "$REPLY" in

		y|Y*)  	for i in "${!oldfile[@]}" ; do
					mv -n "${oldfile[i]}" "${newfile[i]}"
				done
				echo
				;;

		*)		echo -e "${BCYAN}Aborting.${RESET}"
				echo
				exit 1
				;;
	esac

# Otherwise just exit.
else
	echo
	echo -e "${BCYAN}No files to rename.${RESET}" >&2
	echo -e "${BCYAN}Exiting.${RESET}" >&2
	echo
	exit 1

fi

exit 0
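For anyone who only wants to see the core idea, the padding step can be sketched in isolation. This is not the script above: pad_name is a hypothetical helper, and it strips leading zeros with the 10# base prefix instead of the extglob pattern the script uses. It only prints the new name and renames nothing.

```bash
#!/bin/bash
# pad_name NAME [WIDTH]: print NAME with its first run of digits zero-padded.
# Hypothetical helper for illustration; WIDTH defaults to 2.
pad_name() {
    local name=$1 width=${2:-2}
    if [[ $name =~ ^([^0-9]*)([0-9]+)(.*)$ ]]; then
        # 10# forces base 10, so values like 08 are not read as octal.
        printf '%s%0*d%s\n' "${BASH_REMATCH[1]}" "$width" \
            "$(( 10#${BASH_REMATCH[2]} ))" "${BASH_REMATCH[3]}"
    else
        # No digits in the name: leave it unchanged.
        printf '%s\n' "$name"
    fi
}

pad_name 1_some_name.txt              # prints 01_some_name.txt
pad_name 69_some_last_filename.txt    # prints 69_some_last_filename.txt
pad_name 08_notes.txt 3               # prints 008_notes.txt
```

Names that are already padded to the requested width come out unchanged, which is the case where a rename would have nothing to do.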

loveulinux · 10-01-2011, 12:44 PM

Sorry for the late response, and sorry if my poor English does not explain clearly what I need. I ran the command below, but the file names still do not appear as headings in Newfile.txt before each file's content.

"ls | sort -n | for i in `awk '{print $1}'`; do echo -e "\n\n\n$i"; cat $i >> ../Newfile.txt; done"

It added the contents of all the files in order (from 1 to 69) to Newfile.txt, but I need each of the names below to appear as a heading before the corresponding file's content in Newfile.txt. The command above just echoes the 69 file names to the terminal and does not add them to the file. See the output below:

64_SSH_Introduction.txt

65_IPTables.txt

66_IPv6_IPTables.txt

67_NMap_Introduction.txt

68_Nessus_Introduction.txt

69_Snort_Sniffer_Logger.txt

This is what I need as headings, i.e. 1_first_file.txt should be the heading on the first line of Newfile.txt, followed by the content of 1_first_file.txt, then 2_second_file.txt as a heading after the first file's content, then the content of 2_second_file.txt, and so on up to 69_Snort_Sniffer_Logger.txt.

I removed Newfile.txt and ran the command below inside the same directory.

"for i in ?_*.txt ; do echo -e "\n\n\n$i"; cat $i >> ../Newfile.txt; done"

I got the output below, and Newfile.txt was empty when I ran cat on it.
"; cat $i >> ../Newfile.txt; done

?_*.txt
cat: ?_*.txt: No such file or directory"

Then I ran the command below:
"for i in ??_*.txt ; do echo -e "\n\n\n$i"; cat $i >> ../Newfile.txt; done"
and found no difference between "ls | sort -n | for i in `awk '{print $1}'`; do echo -e "\n\n\n$i"; cat $i >> ../Newfile.txt; done" and the above command. That is, the contents of all 69 files are added in order, but without the headings described in the previous paragraph. Could you please show how to modify this command?

loveulinux · 10-01-2011, 01:01 PM

Quote:

Originally Posted by David the H.

I personally think it's better to start by ensuring that the filenames are all zero padded to the same depth. Then it becomes safer and easier to use simple globbing and a loop, like the one given above. The shell will do all the sorting for you.

Code:

for i in [0-9]*.txt ; do echo -e "\n\n\n$i"; cat "$i" >> ../Newfile.txt; done

Sorry for the late response, and thanks for the reply. I tried this and found that it adds the contents of all 69 files to Newfile.txt in order, but does not add each file's name as a heading before its content. Please see my other post in this thread for more details.

I wrote up a script a few months ago to automate the zero-padding of filenames. It was very simple at first, but I've since fleshed it out.

I copied the script as-is into a test file in the same directory where all 69 files are stored and executed it. The script works great, but since I am a beginner in shell scripting, I can neither modify it nor understand it. I did not find a Newfile.txt, and I got output like the following for all the files:
mv: `./68_Nessus_Introduction.txt' and `./68_Nessus_Introduction.txt' are the same file
mv: `./69_Snort_Sniffer_Logger.txt' and `./69_Snort_Sniffer_Logger.txt' are the same file

grail · 10-01-2011, 01:36 PM

Well, as you know there are 70 files each starting with a number, how about something like:

Code:

#!/bin/bash

exec 6>&1 >Newfile.txt

for FILE in {1..70}_*
do
    if [[ -e $FILE ]]
    then
        echo -e "$FILE\n"
        cat "$FILE"
        echo
    else
        break
    fi
done

exec >&6 6>&-

allend · 10-01-2011, 09:08 PM

I see the problem with what I originally posted. Forgot to redirect the output of echo.

Code:

for i in ?_*.txt ; do echo -e "\n\n\n$i" >> ../Newfile.txt; cat $i >> ../Newfile.txt; done

grail's solution is better. A neat answer to the numbering problem.
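As a variation on the one-liner above, the whole loop can be redirected once, so the heading and the content cannot end up in different places. This is only a sketch: merge_notes is a made-up name, the demo files are invented, and printf stands in for echo -e. Listing the single-digit pattern before the double-digit one keeps 1 to 9 ahead of 10 to 69.

```bash
#!/bin/bash
# merge_notes OUTFILE: write every N_*.txt / NN_*.txt in the current
# directory to OUTFILE, each preceded by three newlines and its name.
# The function body is a subshell, so nullglob does not leak to the caller.
merge_notes() (
    shopt -s nullglob          # a pattern with no match expands to nothing
    out=$1
    for f in ?_*.txt ??_*.txt; do
        printf '\n\n\n%s\n' "$f"
        cat -- "$f"
    done > "$out"              # one redirection covers heading and content
)

# Demo in a throwaway directory with made-up notes.
demo=$(mktemp -d) || exit 1
mkdir "$demo/notes" && cd "$demo/notes" || exit 1
printf 'one\n' > 1_a.txt
printf 'two\n' > 2_b.txt
printf 'ten\n' > 10_c.txt

merge_notes ../Newfile.txt
cat ../Newfile.txt
```

Because the output file lives in the parent directory, it can never be picked up by the patterns and merged into itself.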

loveulinux · 10-02-2011, 12:25 AM

Super, I tried "??" instead of "?" and it gave exactly the output I wanted. With a single question mark ("?") I got the error below:
cat: ?_*.txt: No such file or directory
I just want to clarify one thing: "\n\n\n" is simply the 3-line gap between files, isn't it?
Anyway, thank you, master, for showing me how to use patterns like ??_*.txt instead of ls, sort, etc.

loveulinux · 10-02-2011, 12:28 AM

Thanks for the shell script suggestion, grail. I copied the script as-is into a test file inside the same directory and executed it. "echo $?" reported an exit status of 0 and Newfile.txt was created in the same directory, but it is empty when I cat it.

allend · 10-02-2011, 01:32 AM

Quote:

Super, I tried "??" instead of "?" and it gave exactly the output I wanted. With a single question mark ("?") I got the error below:
cat: ?_*.txt: No such file or directory
I just want to clarify one thing: "\n\n\n" is simply the 3-line gap between files, isn't it?

If you have used David the H.'s script, then you have likely changed the filenames from, say, 1_somename.txt to 01_somename.txt, hence the failed match with a single question mark.
Yes, the \n\n\n sequence simply outputs three newlines.
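A small sketch of that failed match, using made-up names in a temporary directory: once every name starts with two digits, ?_*.txt matches nothing, and bash by default passes an unmatched pattern through unchanged. That literal pattern is what cat then complains about.

```bash
#!/bin/bash
# Illustration only: two already-padded demo files in a throwaway directory.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
touch 01_somename.txt 10_other.txt

single=( ?_*.txt )     # no file has exactly one character before "_"
double=( ??_*.txt )    # both files have two characters before "_"

printf 'single: %s\n' "${single[@]}"   # prints the pattern itself
printf 'double: %s\n' "${double[@]}"
```

This assumes the default shell options; with nullglob or failglob set, the unmatched pattern would be dropped or reported as an error instead.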

grail · 10-02-2011, 03:50 AM

Quote:

but it is empty when I cat it.

So to confirm, as I did test it before posting: you are running the script from within the directory where all the files are, and the files follow this naming convention:

Code:

1_blah.txt
25_blah.txt
...

If the files are not named N_whatever, where N is a number from 1 to 70, then it will not work.
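If the names may have been padded in the meantime, one possible workaround is to try both spellings of each number. The following is only a sketch under that assumption, not the script posted earlier: list_in_numeric_order is a hypothetical helper, it accepts at most one leading zero, and it skips missing numbers instead of stopping at the first gap.

```bash
#!/bin/bash
# Print matching file names in numeric order, accepting N_* and 0N_*.
# The function body is a subshell so nullglob stays local to it.
list_in_numeric_order() (
    shopt -s nullglob                 # unmatched patterns expand to nothing
    for n in {1..70}; do
        for f in "${n}"_* "0${n}"_*; do
            printf '%s\n' "$f"
        done
    done
)

# Demo with made-up names: one unpadded, one padded, and gaps in between.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
touch 1_a.txt 03_b.txt 12_c.txt
list_in_numeric_order
```

The printed list could then be fed to a loop that writes each heading and file to the merged output.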

loveulinux · 10-02-2011, 04:52 AM

Thank you all. Let me take a closer look at David's and grail's scripts and then try again.
