[SOLVED] Recursively naming files in a directory after processing them in a "for" loop

geodave0110 · 11-13-2011, 01:32 AM

Hello All,

I ran across a sed command that removes line breaks in a text file, and I'd like to recursively run this command for a set of files in a directory. But I'd like to retain the name of each text file and add an incrementing number at its end.

Here's what I've done so far:

Code:

#!/bin/bash
for file in *.txt
do
sed -n -e ":a" -e "$ s/\n//gp;N;b a" "$file"
done

When I run this script I basically get each file's content displayed in my terminal the way I want it. But I'm having a hard time figuring out a way to recursively write the new contents to separate files named incrementally.

I'd be happy to provide clearer information if the above doesn't make sense. Any help or suggestions are much appreciated!

PS. Can someone please explain to me what the sed command's steps are doing? I'm new to programming in Linux and would like to learn beyond merely plugging and chugging lines without really knowing their details.

frieza · 11-13-2011, 02:06 AM

try

Code:

let count=0 #set an increment variable
for file in `ls *.txt` #act on output of 'ls *.txt*'
do
filename=`echo $file | cut -d. -f1` #cut off the .txt extension
newfile=$filename$x.txt #add increment variable to new file name
cp -v $file $newfile #copy file to new file name
sed -i -n -e ":a" -e "$ s/\n//gp;N;b a" "$newfile" #perform sed on new file
let count=count+1 #increment increment variable
done

hope this helps

geodave0110 · 11-13-2011, 04:00 AM

Thank you frieza! Your additions worked perfectly. Thanks especially for commenting your script so I know what's happening at each step!

grail · 11-13-2011, 05:11 AM

Well, I don't normally like to pick on someone's code, but when you're already using a superior method it seems necessary:

Code:

for file in *.txt # No word splitting done

for file in `ls *.txt` # Suffers from word splitting and parsing of ls

For more information on both issues listed see Ls Parsing and Word Splitting
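
To see the difference concretely, here is a minimal sketch. It assumes a scratch directory made with mktemp and a made-up file name containing a space; neither comes from the original post. It only counts how many loop iterations each form produces.

```shell
#!/bin/bash
# Scratch directory so nothing in the current directory is touched.
tmpdir=$(mktemp -d)
touch "$tmpdir/my notes.txt"   # hypothetical file name containing a space

cd "$tmpdir" || exit 1

# Glob: the shell hands the loop one word per matching file.
glob_count=0
for file in *.txt; do
    glob_count=$((glob_count + 1))
done

# Parsing ls: the output is split on whitespace, so "my notes.txt"
# turns into two words, "my" and "notes.txt".
ls_count=0
for file in $(ls *.txt); do
    ls_count=$((ls_count + 1))
done

echo "glob iterations: $glob_count"
echo "ls iterations:   $ls_count"

cd - >/dev/null || exit 1
rm -rf "$tmpdir"
```

With the single file above, the glob loop runs once and the ls loop runs twice, and neither word in the second case names a real file.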

I am also curious how you claim:

Quote:

Your additions worked perfectly.

When the following line is wrong:

Code:

newfile=$filename$x.txt #add increment variable to new file name

The increment is not added as 'x' is not the counter but rather the variable 'count'

Also, no need to go through so many steps or outside commands to perform the following:

Code:

filename=`echo $file | cut -d. -f1` #cut off the .txt extension
newfile=$filename$x.txt #add increment variable to new file name

# simply use
newfile=${file%.*}$((count++)).txt

Lastly, if any of your file names have whitespace or special characters, you will need to quote them when used in the cp command.
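
As a rough illustration of the expansion and of why the quotes matter, here is a small sketch. The file name and the scratch directory are assumptions made up for the demonstration.

```shell
#!/bin/bash
count=0
file="my notes.txt"            # hypothetical name with a space

# ${file%.*} removes the shortest trailing match of ".*", i.e. the extension.
# $((count++)) expands to the current value, then increments count.
newfile=${file%.*}$((count++)).txt

echo "$newfile"    # my notes0.txt
echo "$count"      # 1

# Quoted, each name reaches cp as a single argument even with the space.
tmpdir=$(mktemp -d)
touch "$tmpdir/$file"
cp "$tmpdir/$file" "$tmpdir/$newfile"
ls "$tmpdir"
rm -rf "$tmpdir"
```

Without the double quotes, cp would receive "my" and "notes.txt" as separate arguments and fail to find either.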

David the H. · 11-13-2011, 07:42 AM

Recent versions of bash (4.0+) have a new feature allowing globbing through subdirectories.
You can also use a subshell to avoid resetting everything each time.

Code:

#!/bin/bash

# enable ** globbing, and also nullglob, so that it doesn't error on empty directories.
shopt -s globstar nullglob

# set the starting number to count from
c=1

# Now loop through the list of subdirectories produced by **/.  Include the topdir too.
for dir in . **/ ; do

	# run each loop in a subshell ( everything inside (...) )
	# this avoids having to reset everything for each directory
	# when it exits, you're back at the starting directory with all variables at their initial values

	(
		cd "${dir}" || exit 1
		echo "Now processing [$dir]"

		#loop through each file in the directory
		for file in * ; do
			# write the joined text to a new numbered file; the original is left untouched
			sed -n -e ":a" -e "$ s/\n//gp;N;b a" "$file" > "${file%.*}-$(( c++ )).txt"
		done
	)

	echo
done

exit 0
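
One consequence of the subshell is worth checking before relying on the counter: changes made inside ( ... ) are not visible outside it, so an increment made inside the subshell is discarded when it exits. A small sketch, independent of any files (the variable start_dir is only there for the demonstration):

```shell
#!/bin/bash
c=1
start_dir=$PWD

(
    # Both the directory change and the assignment stay inside the subshell.
    cd / || exit 1
    c=$((c + 5))
    echo "inside subshell:  c=$c, dir=$PWD"
)

# Back in the parent shell, c and the working directory are unchanged.
echo "after subshell:   c=$c, dir=$PWD"
```

So with this layout the numbering starts again from the initial value in every directory. If the numbers should keep growing across directories, the counter would have to be updated outside the subshell.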

Nominal Animal · 11-13-2011, 03:04 PM

I would personally use find; the Bash globbing rules (especially wrt. files that start with a dot) may not do exactly what one might expect. Find is more clear-cut.
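
A quick way to see the dot-file difference, assuming bash's default settings (dotglob off) and two made-up file names in a scratch directory:

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
touch "$tmpdir/visible.txt" "$tmpdir/.hidden.txt"   # hypothetical names

# Default globbing: * does not match names that begin with a dot.
glob_matches=0
for f in "$tmpdir"/*; do
    [ -e "$f" ] && glob_matches=$((glob_matches + 1))
done

# find reports every regular file, hidden or not.
find_matches=0
while IFS= read -r -d '' f; do
    find_matches=$((find_matches + 1))
done < <(find "$tmpdir" -type f -print0)

echo "glob matched: $glob_matches"
echo "find matched: $find_matches"

rm -rf "$tmpdir"
```

Here the glob sees one file and find sees two.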

Code:

#!/bin/bash

# In a UTF-8 locale, invalid byte sequences may halt execution.
# Avoid that by explicitly using a C/POSIX locale.
export LC_ALL=C LANG=C

# First command line parameter must exist.
if [ $# -lt 1 ] || [ ! -e "$1" ]; then
    echo "Usage: $0 directories-or-files..." >&2
    exit 1
fi

find "$@" -type f -print0 | while IFS= read -rd "" FILE ; do
    # Verify file still exists.
    [ -f "$FILE" ] || continue

    # Check if the file name ends with a version number. Skip if so.
    INDEX="${FILE##*.}"
    [ -n "$INDEX" -a "$FILE" != "$INDEX" -a -z "${INDEX//[0-9]/}" ] && continue

    # Find the first unused index, starting with 1.
    INDEX=1
    NEWFILE="$FILE.$INDEX"
    while [ -e "$NEWFILE" ]; do
        INDEX=$((INDEX+1))
        NEWFILE="$FILE.$INDEX"
    done

    # Copy $FILE to $NEWFILE, removing all newlines.
    tr -d '\r\n' <"$FILE" >"$NEWFILE" || exit $?

    # If possible, retain mode and owner.
    chown --reference="$FILE" "$NEWFILE" &>/dev/null
    chmod --reference="$FILE" "$NEWFILE" &>/dev/null

    # Let the user know.
    echo "$FILE: ${NEWFILE##*/}"
done

This one will skip files that have a numeric suffix.
For other files, it will find the first numeric suffix (starting at .1) that does not exist yet.
It will then use tr to strip newlines from the original file, saving the result as the new file.
It will attempt to retain the owner and mode, but quietly.
Finally, it will output the path to the old file, and the new file name, if successful.

If you run it without command-line parameters, or the first parameter is not an existing file or directory, it will output simple help and abort.

For testing, comment out the tr, chown, and chmod lines by adding a # at the beginning of those three lines. That way the script will just say what file names it would use, but not actually create any new files (nor modify existing ones).
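
The two naming rules described above (skip names that already end in a number, otherwise take the first free number) can also be tried in isolation. This is only a sketch: has_numeric_suffix and first_unused are hypothetical helper names, and the files live in a scratch directory.

```shell
#!/bin/bash
# has_numeric_suffix and first_unused are hypothetical helper names.

# Succeeds if the name ends in ".<digits>", e.g. "notes.txt.3".
has_numeric_suffix() {
    local suffix="${1##*.}"
    [ -n "$suffix" ] && [ "$1" != "$suffix" ] && [ -z "${suffix//[0-9]/}" ]
}

# Prints the first "<file>.<n>" (n = 1, 2, ...) that does not exist yet.
first_unused() {
    local index=1
    while [ -e "$1.$index" ]; do
        index=$((index + 1))
    done
    printf '%s\n' "$1.$index"
}

tmpdir=$(mktemp -d)
touch "$tmpdir/notes.txt" "$tmpdir/notes.txt.1"

has_numeric_suffix "$tmpdir/notes.txt.1" && echo "skip notes.txt.1"
has_numeric_suffix "$tmpdir/notes.txt"   || echo "process notes.txt"
next=$(first_unused "$tmpdir/notes.txt")
echo "next free name: ${next##*/}"

rm -rf "$tmpdir"
```

Because notes.txt.1 already exists in the scratch directory, the next free name for notes.txt is notes.txt.2.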

frieza · 11-14-2011, 12:06 PM

Quote:

Originally Posted by grail

I am also curious how you claim:

When the following line is wrong:

Code:

newfile=$filename$x.txt #add increment variable to new file name

The increment is not added as 'x' is not the counter but rather the variable 'count'

Also, no need to go through so many steps or outside commands to perform the following:

D'oh, my bad, you're absolutely right on that count. As for the rest, that's the best I could pull off with my knowledge of bash scripting. I'll be the first to admit it was a crude solution, even setting the typo aside.