LinuxQuestions.org
Old 04-16-2012, 01:23 AM   #16
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193

Cheers David and NA, once again reminding / showing ways I had forgotten / didn't know
 
Old 04-16-2012, 06:26 PM   #17
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by tuxdev
It's really quite a huge reason, though. sed, find, and tar have POSIX mandated behavior (though tar has some weirdities). ps is actually not recommended for largely the same reasons as ls: it's meant for *humans*, and completely inappropriate for scripts. psgrep is the script-safe way of going about most things you would want to use ps for.
Really? I have 2 questions for you, then:
  1. How does one use extended regexes with a POSIX-compliant version of sed? I personally don't want to have half of my expression consist of "\", so I use -r on Linux and -E on FreeBSD.
  2. What is the meaning of this statement (taken from some mysterious ls spec): "The default format shall be to list one entry per line to standard output; the exceptions are to terminals or when one of the -C, [XSI] -m, or -x options is specified. If the output is to a terminal, the format is implementation-defined."? Also, why would that matter for a program whose output must only be directly viewed by a human?
Kevin Barry
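
For readers following point 1, here is a minimal sketch of that workaround: probe which extended-regex flag the local sed accepts, then wrap it. It assumes the only candidates are -E and -r; `sed_ere` and `SED_ERE_FLAG` are made-up names for illustration, not standard tools.

```shell
# Pick whichever extended-regex flag the local sed accepts.
# Assumption: sed understands either -E (BSD, newer GNU) or -r (older GNU).
if printf 'x\n' | sed -E 's/(x)/\1/' >/dev/null 2>&1; then
    SED_ERE_FLAG=-E
else
    SED_ERE_FLAG=-r
fi

# sed_ere is a hypothetical wrapper name, not a standard command.
sed_ere() {
    sed "$SED_ERE_FLAG" "$@"
}

# Swap two words without backslash-escaping the groups.
printf 'hello world\n' | sed_ere 's/([a-z]+) ([a-z]+)/\2 \1/'
```

The basic-regex spelling of the same substitution would be `s/\([a-z][a-z]*\) \([a-z][a-z]*\)/\2 \1/`, which is the backslash-heavy form the post wants to avoid.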
 
Old 04-17-2012, 06:41 AM   #18
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
I just realized something about globstar that needs a warning.

It's not good at all for use on large file trees. It can easily expand into a string that's much too large for the shell to handle. If I run it on my $HOME directory, for example, it badly bogs down my system with some kind of memory allocation overload.

It works a treat on small trees though.
 
Old 04-17-2012, 06:53 AM   #19
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by David the H.
It's not good at all for use on large file trees. It can easily expand into a string that's much too large for the shell to handle. If I run it on my $HOME directory, for example, it badly bogs down my system with some kind of memory allocation overload.
Is that something that can be detected and prevented (e.g. depth limit,) or do you just have to take your chances?
Kevin Barry
 
Old 04-17-2012, 08:10 AM   #20
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
I don't know really. I can't think of any way offhand to avoid it, since it's the shell itself expanding the pattern into the file list during the parsing stage. And the limits to the shell's command line depend on the system.

I only figured out what was happening because it seemed to just freeze into a memory-consuming loop on certain directories, like $HOME. Only after ctrl+c'ing a few times did I get the error message telling me what was wrong.

I wonder if there's not (also?) some inefficiency bug involved, because it still bogs down even when the globbing pattern itself should result in just a few files. I'd like to hear if anyone else can confirm the behavior I'm getting (I'm using the debian-supplied 4.2.20(1)-release version).

It's a shame really. I was just getting used to using it, and now I have to start being more careful.
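
One possible answer to the depth-limit question, as a sketch rather than a tested fix for the slowdown: skip `**` and spell out each level you want, so the expansion can never descend further. It assumes Bash; the temporary tree and the `shallow` array are only there for the demonstration.

```shell
# Sketch: cap the depth by listing each level explicitly.
shopt -s nullglob                      # patterns with no match expand to nothing

tmp=$(mktemp -d)                       # throwaway tree, four levels deep
mkdir -p "$tmp/a/b/c/d" "$tmp/x/y"

# Directories at depth 1 and 2 only; add "$tmp"/*/*/*/ for a third level.
shallow=( "$tmp"/*/ "$tmp"/*/*/ )
printf '%s\n' "${shallow[@]#"$tmp"/}"  # strip the temp prefix for display

rm -rf "$tmp"
```

This bounds how much the shell has to expand, but it does not help if a single level is already huge.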
 
Old 04-17-2012, 08:55 AM   #21
bartonski
Member
 
Registered: Jul 2006
Location: Louisville, KY
Distribution: Fedora 12, Slackware, Debian, Ubuntu Karmic, FreeBSD 7.1
Posts: 443

Original Poster
Blog Entries: 1

Rep: Reputation: 48
Quote:
Originally Posted by David the H.
I just realized something about globstar that needs a warning.

It's not good at all for use on large file trees. It can easily expand into a string that's much too large for the shell to handle. If I run it on my $HOME directory, for example, it badly bogs down my system with some kind of memory allocation overload.

It works a treat on small trees though.
That makes sense... I presume that globstar falls victim to the same limitations that globbing does... sooner or later, you're going to fill the environment buffer, and then all bets are off.
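
The limit usually meant here is ARG_MAX, the space the kernel allows for arguments plus environment when an external command is executed; builtins are not subject to it. A rough sketch for comparing a glob against that limit follows. The `/usr/bin/*` pattern is only an example, and the variable names are made up.

```shell
# Bytes the system allows for argv + environment of an external command.
limit=$(getconf ARG_MAX)

# Expand a glob into an array; this happens inside the shell.
names=( /usr/bin/* )

# Approximate space the names would need as arguments:
# each name plus its terminating NUL byte.
size=$(( $(printf '%s\0' "${names[@]}" | wc -c) ))

echo "ARG_MAX is $limit bytes; this glob needs about $size bytes"
```

printf is a builtin, so the measurement itself works even when the list would be too long to hand to an external program.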
 
Old 04-17-2012, 09:19 AM   #22
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Yep, exactly. Only with globstar, it's more likely to be sooner than later.
 
Old 04-17-2012, 11:23 AM   #23
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
That's no worse than find, but it would be helpful if you could glob to standard output directly, without passing the names as arguments to another command/program.
Kevin Barry
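
Something close to "globbing to standard output" is possible with the printf builtin, since builtins never go through the exec argument limit. A minimal sketch, assuming Bash 4 or later and an xargs that supports -0 (GNU and BSD both do); `list_dirs` is a made-up helper name.

```shell
# Sketch only: list_dirs is a hypothetical helper, not part of Bash.
shopt -s globstar nullglob

# Print every directory below $1, NUL-terminated, using only builtins.
# The function body is a subshell, so the cd does not affect the caller.
list_dirs() (
    cd "$1" || exit 1
    dirs=( **/ )
    if (( ${#dirs[@]} > 0 )); then
        printf '%s\0' "${dirs[@]}"
    fi
)

# Demonstration on a throwaway tree.
tmp=$(mktemp -d)
mkdir -p "$tmp/one/two" "$tmp/three"

# xargs batches the names, so no single command line holds the whole list.
list_dirs "$tmp" | xargs -0 -n 2 echo
```

This sidesteps the argument-length limit only; the shell still has to build the full expansion in memory first.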
 
Old 04-17-2012, 12:15 PM   #24
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Quote:
Originally Posted by David the H.
I'd like to hear if anyone else can confirm the behavior I'm getting (I'm using the debian-supplied 4.2.20(1)-release version).
I happened to have a Scientific Linux 6.2 virtual machine running, with Bash-4.1.2(1)-release (x86_64-redhat-linux-gnu).

First, I use a helper function to create the test trees:
Code:
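# mkdirs DEPTH DIR(S)...
# Create every DIR in the current directory, then recurse into each one
# until DEPTH levels have been created.
mkdirs() {
    local depth=$(( $1 - 1 ))   # levels still to create below this one
    local dir=""
    shift 1
    [ "$depth" -ge 0 ] || return 0
    for dir in "$@" ; do
        (
            # Subshell: the cd never affects the caller.
            mkdir "$dir" || exit $?
            cd "$dir" || exit $?
            mkdirs "$depth" "$@" || exit $?
        ) || return 1
    done
    return 0
}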
It creates the directories recursively, to the desired depth.

Using
Code:
mkdirs 7 dir-one dir-two dir-three dir-four dir-five
you create a 97655-directory tree, with five entries at each level in each subtree. It does each directory separately, so it takes a few minutes to run. (Note: you don't really need to do this to run the other tests.)
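
The directory count can be checked with a few lines of shell arithmetic: five names at each of seven levels gives 5 + 25 + ... + 5^7 directories. The variable names below are only for illustration.

```shell
# Count the directories the mkdirs call above creates: 5 names, 7 levels.
names=5
levels=7
per_level=1
total=0
for (( depth = 1; depth <= levels; depth++ )); do
    per_level=$(( per_level * names ))   # 5, 25, 125, ...
    total=$(( total + per_level ))
done
echo "$total"    # prints 97655
```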

Recursive globbing,
Code:
shopt -s globstar
dirlist=(**/)
has no problems at all. It takes just a couple of seconds. You can use it with any builtins without a hitch, for example:
Code:
printf '%s' "${dirlist[@]}" | wc -c
5800784
which takes only a couple of seconds, too. This means there seems to be nothing wrong in recursive globbing, as long as you use builtins only, and the entire list.

If you take the above as given, then you can continue with just the dirlist array, which you can synthesize in just a few seconds using
Code:
dirlist=()
list="dir-one dir-two dir-three dir-four dir-five"
for D1 in $list ; do
    for D2 in $list ; do
        for D3 in $list ; do
            for D4 in $list ; do
               for D5 in $list ; do
                   for D6 in $list ; do
                       for D7 in $list ; do
                           dirlist[${#dirlist[@]}]="$D1/$D2/$D3/$D4/$D5/$D6/$D7"
                       done
                       dirlist[${#dirlist[@]}]="$D1/$D2/$D3/$D4/$D5/$D6"
                   done
                   dirlist[${#dirlist[@]}]="$D1/$D2/$D3/$D4/$D5"
               done
               dirlist[${#dirlist[@]}]="$D1/$D2/$D3/$D4"
           done
           dirlist[${#dirlist[@]}]="$D1/$D2/$D3"
       done
       dirlist[${#dirlist[@]}]="$D1/$D2"
   done
   dirlist[${#dirlist[@]}]="$D1"
done
Now, if you use the dirlist in any way, say count the total length of the directory names (the trivial equivalent of the above):
Code:
function totallen() {
    local total=0
    while [ $# -gt 0 ]; do
        local string="$1"
        shift 1
        total=$(( total + ${#string} ))
    done
    echo $total
}
then expect to sit and wait. I ran it on a slightly smaller dirlist with 82030 directories in it:
Code:
time totallen "${dirlist[@]}"
4913284
real    17m24.167s
user    17m12.304s
sys      0m1.106s
Yup, that is seventeen minutes. It boils down to about 80 entries per second, just for summing the parameter string lengths.

On Bash-4.1.2(1)-release (x86_64-redhat-linux-gnu), slicing is slow. Echoing the second set of five names (i.e. the sixth to tenth entries in the array; remember that indices start at zero in Bash):
Code:
time echo "${dirlist[@]:5:5}"
dir-one/dir-one/dir-one/dir-one/dir-one/dir-one dir-one/dir-one/dir-one/dir-one/dir-one/dir-two/dir-one dir-one/dir-one/dir-one/dir-one/dir-one/dir-two/dir-two dir-one/dir-one/dir-one/dir-one/dir-one/dir-two/dir-three dir-one/dir-one/dir-one/dir-one/dir-one/dir-two/dir-four
real	0m0.357s
user	0m0.357s
sys	0m0.001s
Not too bad, except if you do any serious work with arrays in Bash, it will be treacle-fast.

However, on Bash-4.2.10(1)-release x86_64-pc-linux-gnu, it .. it takes ages:
Code:
time echo "${dirlist[@]:5:5}"
dir-one/dir-one/dir-one/dir-one/dir-one/dir-one dir-one/dir-one/dir-one/dir-one/dir-one/dir-two/dir-one dir-one/dir-one/dir-one/dir-one/dir-one/dir-two/dir-two dir-one/dir-one/dir-one/dir-one/dir-one/dir-two/dir-three dir-one/dir-one/dir-one/dir-one/dir-one/dir-two/dir-four
real	1m38.657s
user	1m38.598s
sys	0m0.064s
Either Red Hat has applied a patch to Bash which greatly improves the array slicing speed, or there is a severe regression in Bash-4.2.10 compared to Bash-4.1.2.

While I didn't check for memory leaks, I think the above indicates the real issue is the slow string handling, and insanely slow array handling, in Bash. If you do array slicing or access in a loop, it will look like the loop has frozen, simply because it works so slowly.

On Bash-4.2.10(1)-release x86_64-pc-linux-gnu, don't use large arrays, or simply referencing an array member takes a significant fraction of a minute!
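
To see how your own Bash behaves, a small self-contained timing test along these lines may help. It is a sketch: the 20000-entry size and the `big` and `slice` names are arbitrary choices, and the timings will differ between versions.

```shell
# Build a throwaway array; 20000 entries is an arbitrary size for the test.
big=()
for (( i = 0; i < 20000; i++ )); do
    big+=( "entry-$i" )
done

# Time the same kind of slice as above: five elements starting at index 5.
# The ":" builtin does nothing, so only the expansion is being measured.
time : "${big[@]:5:5}"

# Keep the slice so the result can be inspected.
slice=( "${big[@]:5:5}" )
echo "${#slice[@]} elements, first is ${slice[0]}"
```

Increase the array size to get closer to the 97655-entry case measured in this post.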

Edited to add: A for loop is not too bad:
Code:
len=0
for dir in "${dirlist[@]}" ; do
    len=$(( len + ${#dir} ))
done
echo $len
executes in a few seconds, too, so apparently for loops don't suffer that much. Hey, this group slicing method -- for Bash-internal xargs-like processing, for example -- seems to work, too:
Code:
perslice=5
slice=()
for dir in "${dirlist[@]}" ; do
    if [ ${#slice[@]} -ge $perslice ]; then
        # Do something with "${slice[@]}"
        slice=()
    fi
    slice[${#slice[@]}]="$dir"
done
if [ ${#slice[@]} -gt 0 ]; then
    # Do something with leftovers "${slice[@]}"
    :   # placeholder: an if body holding only a comment is a syntax error
fi
which takes only a few seconds, sans the something that works on the slices. So there is a workaround, it seems, to Bash string/array weaknesses, here.
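
The same idea can be wrapped into a reusable function. This is a sketch under the assumption that iterating "$@" with a for loop stays fast, as the for-loop timing above suggests; `process_in_slices` and `show_slice` are made-up names.

```shell
# process_in_slices SIZE HANDLER ITEM...
# Hypothetical helper: calls HANDLER with at most SIZE items at a time.
process_in_slices() {
    local perslice=$1 handler=$2 item
    local -a slice=()
    shift 2
    for item in "$@"; do
        slice+=( "$item" )
        if (( ${#slice[@]} == perslice )); then
            "$handler" "${slice[@]}"    # a full slice is ready
            slice=()
        fi
    done
    if (( ${#slice[@]} > 0 )); then
        "$handler" "${slice[@]}"        # leftovers, fewer than SIZE items
    fi
}

# Example handler: report how many items it was given.
show_slice() { echo "$# items: $*"; }

process_in_slices 2 show_slice a b c d e
```

With the array from this post the call would be `process_in_slices 5 some_handler "${dirlist[@]}"`, where `some_handler` is whatever function does the real work.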

Last edited by Nominal Animal; 04-17-2012 at 12:26 PM.
 
Old 04-17-2012, 02:51 PM   #25
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193
Well using the example code on my CLFS system running on vbox and the example of:
Code:
mkdirs 7 dir-one dir-two dir-three dir-four dir-five
My output doesn't look so bad:
Code:
GNU bash, version 4.2.10(1)-release (x86_64-unknown-linux-gnu)

5800784

real	0m0.748s
user	0m0.703s
sys	0m0.042s
5800784

real	4m26.532s
user	4m25.941s
sys	0m0.262s
dir-five/dir-five/dir-five/dir-five/dir-five/dir-five/ dir-five/dir-five/dir-five/dir-five/dir-five/dir-five/dir-five/ dir-five/dir-five/dir-five/dir-five/dir-five/dir-five/dir-four/ dir-five/dir-five/dir-five/dir-five/dir-five/dir-five/dir-one/ dir-five/dir-five/dir-five/dir-five/dir-five/dir-five/dir-three/

real	0m0.115s
user	0m0.114s
sys	0m0.000s
4.5 mins maybe not flash, but for the number of levels it is not too shabby
 
Old 04-17-2012, 05:04 PM   #26
tuxdev
Senior Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 2,012

Rep: Reputation: 115
Quote:
How does one use extended regexes with a POSIX-compliant version of sed? I personally don't want to have half of my expression consist of "\", so I use -r on Linux and -E on FreeBSD.
You want your regexes to be pretty, or do you want them to be portable? You don't get both..

Quote:
What is the meaning of this statement (taken from some mysterious ls spec): "The default format shall be to list one entry per line to standard output; the exceptions are to terminals or when one of the -C, [XSI] -m, or -x options is specified. If the output is to a terminal, the format is implementation-defined."? Also, why would that matter for a program whose output must only be directly viewed by a human?
That statement says one thing about the behavior of ls, but the document as a whole leaves a large amount of important details open. You just can't depend on ls doing what you think it's doing across different implementations/platforms.

I suppose if you just flat out assume that your tools are GNU, then you can get away with a lot of stuff, but that's fundamentally unportable.
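
As an illustration of the ls point, here is a sketch that counts files with a glob instead of reading ls output. It assumes Bash (for the `$'\n'` quoting); the temporary directory and file names are invented for the demonstration.

```shell
# Throwaway directory with names that are awkward to parse as text.
tmp=$(mktemp -d)
touch "$tmp/plain" "$tmp/with space" "$tmp/with"$'\n'"newline"

# The glob hands over each name intact, whatever characters it contains.
count=0
for entry in "$tmp"/*; do
    [ -e "$entry" ] || continue      # skip the literal pattern if nothing matched
    count=$(( count + 1 ))
done
echo "glob sees $count files"

# Line count from ls; it need not equal the number of files.
ls "$tmp" | wc -l

rm -rf "$tmp"
```

The loop never has to guess where one name ends and the next begins, which is the part that ls output leaves open.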
 
  


All times are GMT -5. The time now is 07:15 AM.
