Quote:
Originally Posted by qweeak
Some of the commands' output can't be correctly piped.
Yes they can, if you use ASCII NUL (zero byte) as the separator. For example,
Code:
find . -mindepth 1 -maxdepth 1 -print0 | xargs -r0 command...
will run command... on the names of the items in the current directory, splitting them into as many groups as necessary. It will work for all possible file names, and all possible command-line arguments, regardless of what characters they may contain. (You should make sure you run that using the C or POSIX locale, i.e. set both the LANG and LC_ALL environment variables to C. That way the commands will work correctly regardless of the character set used for the file names, even when more than one character set is used.)
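As a quick check of the claim above, here is a minimal sketch that creates a few awkwardly named files in a scratch directory and counts how many arguments a command receives with and without NUL separators. The directory and file names are made up for illustration, and it assumes GNU find and xargs.

```shell
#!/bin/bash
export LANG=C LC_ALL=C

# Scratch directory with awkward file names (all hypothetical).
demo="$(mktemp -d)" || exit $?
trap 'rm -rf "$demo"' EXIT
touch "$demo/plain" "$demo/two words" "$demo/new
line"

# NUL-separated: each name reaches the command as exactly one argument.
# The inner sh only reports how many arguments it was given.
nul_count="$(find "$demo" -mindepth 1 -maxdepth 1 -print0 |
             xargs -r0 sh -c 'echo "$#"' sh)"

# Default xargs splitting on whitespace: names with spaces or
# newlines are broken into several arguments.
ws_count="$(find "$demo" -mindepth 1 -maxdepth 1 -print |
            xargs sh -c 'echo "$#"' sh)"

echo "NUL-separated:        $nul_count arguments"
echo "whitespace-separated: $ws_count arguments"
```

Three files were created, so only the NUL-separated count should come out as 3; the whitespace-separated one will not match the real number of files.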
If you have multiple parallel data streams, you will have to use temporary files. There, too, you'll need to use NULs as separators to support all possible strings.
For binary data, you'll need to use a separate file for each logical parameter.
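One possible way to handle two parallel streams is sketched below: each stream goes into its own temporary file with NUL separators, and both are read back in lockstep through separate file descriptors. The file names, the sample strings, and the choice of descriptors 3 and 4 are assumptions made for the example.

```shell
#!/bin/bash
# Two parallel NUL-separated streams stored in temporary files,
# then read back pairwise. All names here are hypothetical.
work="$(mktemp -d)" || exit $?
trap 'rm -rf "$work"' EXIT

printf '%s\0' 'first name' 'second
name' > "$work/names"
printf '%s\0' 'value one' 'value  two' > "$work/values"

pairs=0
# Descriptors 3 and 4 keep the two streams apart from standard input.
while IFS= read -rd '' name <&3 && IFS= read -rd '' value <&4 ; do
    printf 'name=<%s> value=<%s>\n' "$name" "$value"
    pairs=$((pairs + 1))
done 3< "$work/names" 4< "$work/values"

echo "read $pairs pairs"
```

Because the loop is fed by redirections rather than a pipe, it runs in the current shell, so variables set inside it (like pairs here) are still available afterwards.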
You see, all POSIX systems use the NUL (\0) as an end-of-string mark, internally. It is the only byte value you cannot use in a parameter string given to a syscall. (Binary data is only supported by specific syscalls, and even then, the length of the binary data is always exactly specified. In the command line, such binary data is always specified as encoded strings -- for example, consider IP addresses, MAC addresses, device numbers, and so on. Thus, you almost always only need to worry about passing string data correctly.)
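A small sketch of what this means in practice, assuming Bash: a NUL byte passes through a pipe untouched, but it cannot be stored in a shell variable, just as it cannot be part of an argument string. The variable names are made up for the example.

```shell
#!/bin/bash
# Three bytes travel through the pipe: 'a', NUL, 'b'.
bytes_in_pipe=$(( $(printf 'a\0b' | wc -c) ))

# A shell variable, like any argument string, cannot hold the NUL.
# Bash drops it during command substitution (newer versions also print
# a warning on standard error, silenced here).
{ value="$(printf 'a\0b')"; } 2>/dev/null

echo "bytes in pipe: $bytes_in_pipe"
echo "characters in variable: ${#value}"
```

This is why NUL-separated data has to be streamed through pipes or files and consumed one item at a time, rather than collected into a single variable.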
The problem, of course, is that not all shells and utilities support NUL separators at all. (For example, I don't know of any shell that supports NUL as the internal field separator, IFS.)
Some commands, such as GNU tar (--null -T filename), can read file names from a file instead of the command line, using NUL separators. This is important when you need to supply more parameters to a single command than the kernel might allow. Again, this is an extra feature only available in some commands -- although in general, it is actually quite simple to write a patch to add this functionality to most utilities.
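A minimal sketch of the tar case, assuming GNU tar and GNU find: the name list is written to a file with NUL separators and then handed to tar with --null -T, so no file name ever appears on tar's command line. The directory layout and file names are invented for the example.

```shell
#!/bin/bash
export LANG=C LC_ALL=C
work="$(mktemp -d)" || exit $?
trap 'rm -rf "$work"' EXIT

# A small hypothetical tree to archive.
mkdir "$work/src"
touch "$work/src/one" "$work/src/two words" "$work/src/  padded  "

# Write the NUL-separated name list outside the tree being archived.
( cd "$work/src" && find . -type f -print0 ) > "$work/list"

# --null is given before -T, in the same order as in the post,
# so the list is parsed using NUL separators.
( cd "$work/src" && tar -cf "$work/out.tar" --null -T "$work/list" )

members=$(( $(tar -tf "$work/out.tar" | wc -l) ))
echo "archived $members files"
```

The list file can be arbitrarily long, since it is read by tar itself and is not subject to the kernel's limit on command-line length.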
Quote:
Originally Posted by qweeak
Which is the best shell for this kind of operation involving a huge number of arguments?
Bash, because the Bash read built-in command supports the NUL separator internally (-d ""). GNU find has -print0 and also supports -printf "stuff\0", so you can use it for everything regarding directories and files. GNU awk (gawk) also supports RS="\0" and FS="\0", so it can be used to very easily filter result lists using NUL separators.
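To illustrate combining -printf with read -d "", here is a rough sketch in which GNU find emits two NUL-terminated fields per file (size, then path) and Bash reads them back in pairs. The scratch files and variable names are assumptions for the example.

```shell
#!/bin/bash
export LANG=C LC_ALL=C
work="$(mktemp -d)" || exit $?
trap 'rm -rf "$work"' EXIT

# Two hypothetical files: one with five bytes, one empty.
printf 'hello' > "$work/five bytes"
: > "$work/empty
file"

total=0
count=0
# Each record is two NUL-terminated fields: size in bytes, then path.
while IFS= read -rd '' size && IFS= read -rd '' path ; do
    printf '%s bytes: %q\n' "$size" "$path"
    total=$((total + size))
    count=$((count + 1))
done < <(find "$work" -type f -printf '%s\0%p\0')

echo "$count files, $total bytes"
```

Process substitution is used instead of a pipe so that the loop runs in the current shell and the totals survive after it ends.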
Here is a contrived example:
Code:
#!/bin/bash
# Make sure find et al. ignore file name character set, and Just Work.
export LANG=C LC_ALL=C

# "$WORK": automatically removed temporary directory for temp files.
WORK="$(mktemp -d)" || exit $?
trap 'rm -rf "$WORK"' EXIT

(   if [ $# -gt 0 ]; then
        # Supply all command line parameters to find
        find "$@" -print0
    else
        # No command line parameters, so do a default find
        find . -print0
    fi
) | while IFS= read -rd "" FILE ; do
    # Do something with each "$FILE".
    # IFS= keeps leading and trailing whitespace in the name intact;
    # file -b omits the name itself, so grep only sees the description.
    if file -b "$FILE" | grep -qe executable ; then
        printf '%s\0' "$FILE"
    fi
done > "$WORK/executables"

# List all files containing executable content,
# regardless of whether the files are marked executable or not.
xargs -r0 ls -ld < "$WORK/executables"
There are better ways to do the things the above scriptlet does. I only wanted to illustrate using a subshell as part of a pipe, a while loop to read (and also to emit) strings with NUL separators, and how to use a temporary file (in an automatically removed temporary directory) to store NUL-separated strings. These are usually enough to handle most situations.
(I considered using tar -czf - --null -T "$WORK/executables" as the last line instead; it would have emitted a gzipped tarball containing all executable files, even if there were millions of them -- certainly more than you can ever supply to a single command using command-line parameters.)
Because you provided no specifics, my answer is a bit more vague than I'd really prefer. If you can provide a few more details, such as whether the number of parameters is the actual problem, whether the parameters may contain characters that are easily mangled by shells, or whether you need more than one parallel data channel, I might be able to help you more.