If you want to support all possible file names, and optionally scan subdirectories too, use something like this:
Code:
# List of subtrees to search for
# (this defaults to all command-line parameters, if used in a script)
trees=("$@")
# Make sure locale does not affect file name handling
OLD_LC_ALL="$LC_ALL" OLD_LANG="$LANG"
export LC_ALL=C LANG=C
# Array containing all files ending with _extracted into list1
list1=()
while read -rd "" file ; do
    list1[${#list1[@]}]="$file"
done < <( find "${trees[@]}" -maxdepth 1 -type f -name '*_extracted' -print0 )
# Array containing all files ending with .roi into list2
list2=()
while read -rd "" file ; do
    list2[${#list2[@]}]="$file"
done < <( find "${trees[@]}" -maxdepth 1 -type f -name '*.roi' -print0 )
# Restore locale
LANG="$OLD_LANG" LC_ALL="$OLD_LC_ALL"
If you want files in subdirectories too, just omit the -maxdepth 1 parameters to the find commands.
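Once the arrays are populated, quoted expansion keeps each name intact when you process them. A small self-contained sketch (the sample names here are made up, standing in for whatever find collected):

```shell
# Hypothetical sample values standing in for find's results
list1=("plain_extracted" "name with spaces_extracted" $'new\nline_extracted')
# "${list1[@]}" expands each element as exactly one word,
# so spaces and newlines inside names survive
for file in "${list1[@]}"; do
    printf 'Processing %q\n' "$file"
done
echo "Total: ${#list1[@]} files"
```

The %q format of printf quotes the name so even a newline shows up visibly on one line.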
This uses ASCII NULs as separators. Since the Linux kernel itself uses NUL to terminate strings (such as pathnames), a NUL byte can never appear inside a file name, so this works for all possible file names.
The locale must be set to POSIX (LANG=C LC_ALL=C) because in UTF-8 locales, non-UTF-8 byte sequences (say, file names in the cp1252 character set) are an error and cause the commands to abort. The POSIX locale ensures all file names are treated as opaque byte strings, no matter which characters in which charset the names might contain.
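To see the NUL-safe pipeline working, here is a minimal demonstration you can paste into a shell. It creates a throwaway directory (via mktemp, an assumption of this sketch) holding a file whose name contains a newline, then collects it the same way as above:

```shell
# Create a scratch directory with one deliberately nasty file name
tmpdir=$(mktemp -d)
touch "$tmpdir/"$'bad\nname.roi'
# Collect matches NUL-delimited, exactly as in the main example
found=()
while read -rd "" file ; do
    found[${#found[@]}]="$file"
done < <( find "$tmpdir" -type f -name '*.roi' -print0 )
echo "Found ${#found[@]} file(s)"
rm -rf "$tmpdir"
```

A newline-delimited version (plain find | while read) would have split that name into two bogus entries; the NUL-delimited one returns it as a single element, newline and all.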
A funny side note I just noticed:
Starting with an empty array, list=(), the following two lines are equivalent; each appends $new as a new element to the array:
Code:
list=("${list[@]}" "$new")
list[${#list[@]}]="$new"
The latter one is orders of magnitude faster, because the former rebuilds the entire array on every append. On my machine, the former takes 330 times longer to run for 2000 files!
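You can reproduce the comparison yourself; the exact ratio will vary by machine, but the shape of the result should not. A rough timing sketch:

```shell
# Method 1: re-create the whole array on each append (O(n^2) overall)
a=()
time for ((i = 0; i < 2000; i++)); do
    a=("${a[@]}" "file$i")
done
# Method 2: assign to the next free index (O(n) overall)
b=()
time for ((i = 0; i < 2000; i++)); do
    b[${#b[@]}]="file$i"
done
```

Note that recent bash versions also support list+=("$new"), which appends in place like the second method.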
Hope this helps,