bash 3.0+ script: array_unique(): can this function still be further optimized?
ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
bash 3.0+ script: array_unique(): can this function still be further optimized?
I'm currently making a set of functions that can be used for common manipulation of arrays. All of them are packaged as array_*().. One of them, array_unique(), was what I tried to optimize yesterday. Here's the code:
Code:
# array/unique.sh
#
# provides array_unique()
#
# author: konsolebox
# copyright free, 2010
#
# bool array_unique (avar <input>, [avar <result>])
#
# Removes duplicate entries from array. If a result variable is
# specified, changes are saved to that variable, or else the input
# variable gets modified instead. Indices are preserved.
#
# Please don't specify variables that have same names as the local
# variables below. All of the local variables are prefixed with __ so
# it wont' really be a namespace problem.
#
# Note: At first I thought that a method like the one presented by the
# second version will yield faster result since it doesn't
# reallocate the indices everytime a repeating line is removed.
# That could have been the case but as tests shows, the first
# version yielded faster performance in most cases. The second
# version is only faster when there are -very- many similarities
# in many lines.
#
# It appears that the overhead of having manier instructions is
# greater than the overhead of recollecting indices. The second
# version could be just faster if its lines can be reduced. It
# can perhaps also be faster if implemented in languages other
# than bash.
#
if true; then
function array_unique {
case "$#" in
1)
eval "
local -a __T=(\"\${!$1[@]}\") || return
local -i __I=0 __J __C=\${#__T[@]} __D=0
for (( ; __I < __C; ++__I )); do
for (( __J = __I + 1; __J < __C; ++__J )); do
[[ \${$1[__T[__I]]} = \"\${$1[__T[__J]]}\" ]] && {
unset $1\\[__T\\[__J\\]\\] __T\\[__J\\]
(( ++__D ))
}
done
[[ __D -gt 0 ]] && {
__T=(\"\${__T[@]:__I + 1}\")
(( __C -= __D + __I + 1, __I = -1, __D = 0 ))
}
done
return 0
"
;;
2)
eval "
local -a __T=(\"\${!$1[@]}\") || return
local -i __I=0 __J __C=\${#__T[@]} __D=0
$2=() || return
for (( ; __I < __C; ++__I )); do
for (( __J = __I + 1; __J < __C; ++__J )); do
[[ \${$1[__T[__I]]} = \"\${$1[__T[__J]]}\" ]] && {
unset __T\\[__J\\]
(( ++__D ))
}
done
__J=__T[__I]
$2[__J]=\${$1[__J]} || return
[[ __D -gt 0 ]] && {
__T=(\"\${__T[@]:__I + 1}\")
(( __C -= __D + __I + 1, __I = -1, __D = 0 ))
}
done
return 0
"
;;
# *)
# echo "invalid number of arguments."
# exit 1
# ;;
esac
return 1
}
else
function array_unique {
case "$#" in
1)
eval "
local -i -a __T=(\"\${!$1[@]}\") __U=() || return
local -i __I=0 __J __C=\${#__T[@]} __P
for (( ; __I < __C; __I += __U[__I] + 1 )); do
for (( __P = __I, __J = __I + __U[__I] + 1; __J < __C; __J += __U[__J] + 1 )); do
[[ \${$1[__T[__I]]} = \"\${$1[__T[__J]]}\" ]] && {
(( __U[__P] += __U[__J] + 1 ))
unset $1\\[__T\\[__J\\]\\]
continue
}
(( __P = __J ))
done
done
return 0
"
;;
2)
eval "
local -i -a __T=(\"\${!$1[@]}\") __U=() || return
local -i __I=0 __J __C=\${#__T[@]} __P
$2=() || return
for (( ; __I < __C; __I += __U[__I] + 1 )); do
__J=__T[__I]
$2[__J]=\${$1[__J]} || return
for (( __P = __I, __J = __I + __U[__I] + 1; __J < __C; __J += __U[__J] + 1 )); do
[[ \${$1[\${__T[__I]}]} = \"\${$1[__T[__J]]}\" ]] && {
(( __U[__P] += __U[__J] + 1 ))
continue
}
(( __P = __J ))
done
done
return 0
"
;;
# *)
# echo "invalid number of arguments."
# exit 1
# ;;
esac
return 1
}
fi
Sample test script:
Code:
source unique.sh
N=10 # The heaviest value I tried was probably N=1000 Z=1000
Z=10
declare -a -i A=()
for (( I = 0; I <= N; ++I )); do
(( A[I] = RANDOM % Z ))
done
unset "A[$(( RANDOM % N ))]" "A[$(( RANDOM % N ))]" "A[$(( RANDOM % N ))]" "A[$(( RANDOM % N ))]"
echo "array_unique A"
echo "before: $(set | grep ^A=)"
time array_unique A
echo "after: $(set | grep ^A=)"
echo
declare -a -i A=()
for (( I = 0; I <= N; ++I )); do
(( A[I] = RANDOM % Z ))
done
unset "A[$(( RANDOM % N ))]" "A[$(( RANDOM % N ))]" "A[$(( RANDOM % N ))]" "A[$(( RANDOM % N ))]"
echo "array_unique A B"
echo "before: $(set | grep ^A=)"
time array_unique A B
echo "after: $(set | grep ^B=)"
Can anyone further optimize this function? Please don't tell me that I'm just wasting my time trying to optimize this again . Thanks.
Last edited by konsolebox; 07-20-2010 at 08:01 AM.
Find the best algorithm first, you're using quadratic time solutions.
Code:
function array_unique {
local -A __U
eval "for n in \"\${$1[@]}\" ; do
__U[\$n]=1
done"
case "$#" in
1) eval "$1=( \${!__U[@]} )";;
2) eval "$2=( \${!__U[@]} )";;
*) return 1;;
esac
}
EDIT:
If you allow external programs, this one is faster
Code:
function array_unique {
local __OIFS="$IFS"
local __s='eval "echo \"\${$1[*]}\"" | sort -u'
case "$#" in
1) IFS=$'\n'; eval "$1=( \$($__s) )";;
2) IFS=$'\n'; eval "$2=( \$($__s) )";;
*) return 1;;
esac
IFS="$__OIFS"
}
Last edited by ntubski; 07-20-2010 at 11:25 AM.
Reason: added function using sort
Well I am not sure about optimization because I stopped after the first statement:
Would not the first part of optimization be to throw away the useless 'else' clause??
Thanks for the rep.. Actually that's only for load time; after-all functions are only loaded and parsed once. They're converted to a more optimized form afterwards. What's important is really in the loop; it's what I'm trying to change.
Quote:
Originally Posted by ntubski
Find the best algorithm first, you're using quadratic time solutions.
Code:
function array_unique {
local -A __U
eval "for n in \"\${$1[@]}\" ; do
__U[\$n]=1
done"
case "$#" in
1) eval "$1=( \${!__U[@]} )";;
2) eval "$2=( \${!__U[@]} )";;
*) return 1;;
esac
}
Looks like it's only for associative arrays? But thanks. Maybe using that idea, a faster version can be implement in bash 4.0+. I can create another version that will be automatically used if bash is 4.0+.
.... The only possible overhead I can see with that is the reallocation of values to keys in flag associative arrays; but still it can be better.
Quote:
EDIT:
If you allow external programs, this one is faster
Code:
function array_unique {
local __OIFS="$IFS"
local __s='eval "echo \"\${$1[*]}\"" | sort -u'
case "$#" in
1) IFS=$'\n'; eval "$1=( \$($__s) )";;
2) IFS=$'\n'; eval "$2=( \$($__s) )";;
*) return 1;;
esac
IFS="$__OIFS"
}
Could be but there's the dependency.. and should it really be faster since it makes many subprocess calls? .. parent->(subshell echo, parent->(exec(sort)))
in both steps (echo and $() like call), a large memory is allocated for the storage of string/output.
Thanks again. I'll try to apply the first.
Last edited by konsolebox; 07-20-2010 at 10:03 PM.
As expected the version that uses associative arrays is the fastest. Thanks again ntubski. You did it again .
I also found something that makes the functions really slow: unset. It appears that builtin calls are really slow compared to function-embedded methods. It also appears that unsetting an element in an array also takes much time. So to make me not use unset, I guess I'll also have to try a variation whereby a temporary array variable will be used for storing unique elements before copying them to the input array. After running tests I hope on my next post it will already be the final form of the script. I'll wait some days just to make sure that the script gets 100.00001% stable. Of course new ideas are always welcome until then. Thanks.
Code:
function array_unique_a {
case "$#" in
1)
[[ $1 =~ ^[[:alpha:]_]+[[:alnum:]_]*$ ]] || return
local -A __F=(); local -i __I; local __A
eval "
for __I in \"\${!$1[@]}\"; do
__A=\${$1[__I]}
[[ -n \${__F[\$__A]} ]] && {
unset $1\\[__I\\]
continue
}
__F[\$__A]=.
done
"
;;
2)
[[ $1 =~ ^[[:alpha:]_]+[[:alnum:]_]*$ && $2 =~ ^[[:alpha:]_]+[[:alnum:]_]*$ ]] || return
local -A __F=(); local -i __I; local __A
eval "
$2=() || return
for __I in \"\${!$1[@]}\"; do
__A=\${$1[__I]}
[[ -n \${__F[\$__A]} ]] && continue
$2[__I]=\$__A
__F[\$__A]=.
done
"
;;
# *)
# echo "invalid number of arguments."
# exit 1
# ;;
esac
}
function array_unique_b {
case "$#" in
1)
eval "
local -a __T=(\"\${!$1[@]}\") || return
local -i __I=0 __J __C=\${#__T[@]} __D=0
for (( ; __I < __C; ++__I )); do
for (( __J = __I + 1; __J < __C; ++__J )); do
[[ \${$1[__T[__I]]} = \"\${$1[__T[__J]]}\" ]] && {
unset $1\\[__T\\[__J\\]\\] __T\\[__J\\]
(( ++__D ))
}
done
[[ __D -gt 0 ]] && {
__T=(\"\${__T[@]:__I + 1}\")
(( __C -= __D + __I + 1, __I = -1, __D = 0 ))
}
done
return 0
"
;;
2)
eval "
local -a __T=(\"\${!$1[@]}\") || return
local -i __I=0 __J __C=\${#__T[@]} __D=0
$2=() || return
for (( ; __I < __C; ++__I )); do
for (( __J = __I + 1; __J < __C; ++__J )); do
[[ \${$1[__T[__I]]} = \"\${$1[__T[__J]]}\" ]] && {
unset __T\\[__J\\]
(( ++__D ))
}
done
__J=__T[__I]
$2[__J]=\${$1[__J]} || return
[[ __D -gt 0 ]] && {
__T=(\"\${__T[@]:__I + 1}\")
(( __C -= __D + __I + 1, __I = -1, __D = 0 ))
}
done
return 0
"
;;
# *)
# echo "invalid number of arguments."
# exit 1
# ;;
esac
return 1
}
function array_unique_c {
case "$#" in
1)
eval "
local -i -a __T=(\"\${!$1[@]}\") __U=() || return
local -i __I=0 __J __C=\${#__T[@]} __P
for (( ; __I < __C; __I += __U[__I] + 1 )); do
for (( __P = __I, __J = __I + __U[__I] + 1; __J < __C; __J += __U[__J] + 1 )); do
[[ \${$1[__T[__I]]} = \"\${$1[__T[__J]]}\" ]] && {
(( __U[__P] += __U[__J] + 1 ))
unset $1\\[__T\\[__J\\]\\]
continue
}
(( __P = __J ))
done
done
return 0
"
;;
2)
eval "
local -i -a __T=(\"\${!$1[@]}\") __U=() || return
local -i __I=0 __J __C=\${#__T[@]} __P
$2=() || return
for (( ; __I < __C; __I += __U[__I] + 1 )); do
__J=__T[__I]
$2[__J]=\${$1[__J]} || return
for (( __P = __I, __J = __I + __U[__I] + 1; __J < __C; __J += __U[__J] + 1 )); do
[[ \${$1[\${__T[__I]}]} = \"\${$1[__T[__J]]}\" ]] && {
(( __U[__P] += __U[__J] + 1 ))
continue
}
(( __P = __J ))
done
done
return 0
"
;;
# *)
# echo "invalid number of arguments."
# exit 1
# ;;
esac
return 1
}
test script
Code:
source unique.sh
function array_copy {
local -i __I
eval "$2=() && for __I in \${!$1[@]}; do $2[__I]=\${$1[__I]}; done"
}
[[ -z $N ]] && N=10000
[[ -z $Z ]] && Z=100
[[ -z $U ]] && U=500
declare -a -i A=() B=()
for (( I = 0; I <= N; ++I )); do
(( A[I] = RANDOM % Z ))
done
for (( I = U; I; --I )); do
unset "A[$(( RANDOM % N ))]"
done
array_copy A C
echo "N = $N, Z =$Z, U = $U"
echo
echo "orignal data count: ${#A[@]}"
echo
echo '----- array_unique_a -----'
echo
echo "array_unique_a A"
echo
time array_unique_a A
echo
echo "after: $(set | grep ^A=)"
echo
echo "array_unique_a A B"
echo
time array_unique_a A B
echo
echo "after: $(set | grep ^B=)"
echo
array_copy C A
B=()
echo '----- array_unique_b -----'
echo
echo "array_unique_b A"
echo
time array_unique_b A
echo
echo "after: $(set | grep ^A=)"
echo
echo "array_unique_b A B"
echo
time array_unique_b A B
echo
echo "after: $(set | grep ^B=)"
echo
array_copy C A
B=()
echo '----- array_unique_c -----'
echo
echo "array_unique_c A"
echo
time array_unique_c A
echo
echo "after: $(set | grep ^A=)"
echo
echo "array_unique_c A B"
echo
time array_unique_c A B
echo
echo "after: $(set | grep ^B=)"
echo
result for first test
Code:
N = 10000, Z =100, U = 500
orignal data count: 9517
----- array_unique_a -----
array_unique_a A
real 0m0.597s
user 0m0.587s
sys 0m0.010s
after: A=([0]="87" [1]="73" [2]="47" ...) (same)
array_unique_a A B
real 0m0.006s
user 0m0.004s
sys 0m0.002s
after: B=([0]="87" [1]="73" [2]="47" ...) (same)
----- array_unique_b -----
array_unique_b A
real 2m27.084s
user 0m2.221s
sys 2m24.828s
after: A=([0]="87" [1]="73" [2]="47" ...) (same)
array_unique_b A B
real 0m0.295s
user 0m0.572s
sys 0m0.000s
after: B=([0]="87" [1]="73" [2]="47" ...) (same)
----- array_unique_c -----
array_unique_c A
real 3m22.593s
user 0m3.387s
sys 3m18.185s
after: A=([0]="87" [1]="73" [2]="47" ...) (same)
array_unique_c A B
real 0m0.458s
user 0m0.890s
sys 0m0.000s
after: B=([0]="87" [1]="73" [2]="47" ...) (same)
2nd test
Code:
N = 10000, Z =5, U = 500
orignal data count: 9515
----- array_unique_a -----
array_unique_a A
real 0m0.585s
user 0m0.582s
sys 0m0.004s
after: A=([0]="0" [2]="2" [3]="1" [4]="3" [5]="4")
array_unique_a A B
real 0m0.001s
user 0m0.001s
sys 0m0.000s
after: B=([0]="0" [2]="2" [3]="1" [4]="3" [5]="4")
----- array_unique_b -----
array_unique_b A
real 0m9.102s
user 0m9.011s
sys 0m0.086s
after: A=([0]="0" [2]="2" [3]="1" [4]="3" [5]="4")
array_unique_b A B
real 0m0.005s
user 0m0.004s
sys 0m0.000s
after: B=([0]="0" [2]="2" [3]="1" [4]="3" [5]="4")
----- array_unique_c -----
array_unique_c A
real 0m14.186s
user 0m14.102s
sys 0m0.081s
after: A=([0]="0" [2]="2" [3]="1" [4]="3" [5]="4")
array_unique_c A B
real 0m0.006s
user 0m0.006s
sys 0m0.000s
after: B=([0]="0" [2]="2" [3]="1" [4]="3" [5]="4")
Could be but there's the dependency.. and should it really be faster since it makes many subprocess calls? .. parent->(subshell echo, parent->(exec(sort)))
in both steps (echo and $() like call), a large memory is allocated for the storage of string/output.
It tested fastest for me. If you want pure bash, it's probably worth writing sort in bash: that still gives an O(n log n) algorithm vs. O(n^2), so it should be faster for large N.
EDIT:
Code:
real 3m22.593s
user 0m3.387s
sys 3m18.185s
It's strange that there is so much sys time used, maybe you started swapping?
Last edited by ntubski; 07-21-2010 at 08:52 AM.
Reason: noted user vs sys time
It tested fastest for me. If you want pure bash, it's probably worth writing sort in bash: that still gives an O(n log n) algorithm vs. O(n^2), so it should be faster for large N.
I already have array_sort() but it's implemented differently with array_unique(). How can that make things faster? Also does it save the indices?
Quote:
EDIT:
Code:
real 3m22.593s
user 0m3.387s
sys 3m18.185s
It's strange that there is so much sys time used, maybe you started swapping?
Not sure but why not (array_unique A B)?. Could it be that unsetting an element costs too much memory?.. I'll check it later.
I already have array_sort() but it's implemented differently with array_unique(). How can that make things faster? Also does it save the indices?
Your solutions with the nested loops go through each element N times, so commands within the nested loop are run N*N=N^2 times. There are sorting algorithms that take time proportional to NlogN which is much less than N^2 for large N.
Do you mean array_sort() uses array_unique()? I don't understand what you mean by "save the indices".
Quote:
Not sure but why not (array_unique A B)?. Could it be that unsetting an element costs too much memory?.. I'll check it later.
Only the user time represents CPU cycles, so it seems strange that the majority of the time it takes is sys time. Even if unsetting costs a lot of memory, I don't think it should cost that much.
Do you mean array_sort() uses array_unique()? I don't understand what you mean by "save the indices".
I'm sorry that I put it the wrong way. I use a different algorithm for array_sort() and it doesn't run with array_unique().
Here's the previous code that's not yet array_sort(). I didn't bring the new one so I just took it from playshell in sf.
Code:
include log.sh
include patterns.sh
include utils/copyarray.sh
#
# utils/sortarray.sh
#
# portable utility that provides a function that sorts an array
#
# author: konsolebox
# copyright free
# created and updated 2008-2010
#
# Credits have got to be given to the authors of the book "A Book on C 4th Ed."
# for this great algorithm. The algorithm was originally described by someone
# and was completely explained in the book with an implimentation that's written
# in C.
#
# I knew C from many sources but most of what I learned came from this book and
# I therefore recommend it to starters for a good start and also to experienced
# programmers for a good reference and new methods that they may discover from
# it.
# void utils_sortarray
# (
# ["from=<__ARRAY>"],
# ["type=<string|integer>"],
# ["to=<__ARRAY>"],
# ["as=<values|indices>"],
# ["--" [ SOURCEVALUES[@] ]]
# )
#
function utils_sortarray {
log_fcall "$@"
[[ $# -eq 0 ]] && return
local FROM TYPE TO AS
local -a __ARRAY
local -a -i POINTERS
while [[ $# -gt 0 ]]; do
case "$1" in
from=*)
FROM=${1#from=}
;;
type=*)
TYPE=${1#type=}
;;
to=*)
TO=${1#to=}
;;
as=*)
AS=${1#as=}
;;
--)
shift
break
;;
#beginsyntaxcheckblock
*)
log_criticalerror "unknown parameter: $1"
;;
#endsyntaxcheckblock
esac
shift
done
#beginsyntaxcheckblock
[[ -n $FROM && $FROM != $PATTERNS_ARRAYVARNAME ]] && \
log_criticalerror "variable name not valid for the source array: $FROM"
[[ -n $TYPE && $TYPE != @(string|integer) ]] && \
log_criticalerror "argument is not valid for type: $TYPE"
[[ -n $TO && $TO != $PATTERNS_ARRAYVARNAME ]] && \
log_criticalerror "variable name not valid for the target array: $TO"
[[ -n $AS && $AS != @(values|indices) ]] && \
log_criticalerror "argument is not valid for as: $AS"
[[ -z $FROM && $# -eq 0 ]] && \
log_criticalerror "a source should be specified either by 'from=<__ARRAY>' or '-- CONTENTS[@]'"
#endsyntaxcheckblock
if [[ $# -gt 0 ]]; then
__ARRAY=("$@")
elif [[ -n $FROM ]]; then
utils_copyarray "$FROM" __ARRAY
fi
[[ -z $TYPE ]] && TYPE='string'
[[ -z $TO ]] && TO='__'
[[ -z $AS ]] && AS='values'
POINTERS=(${!__ARRAY[@]})
if [[ ${#POINTERS[@]} -gt 1 ]]; then
case "$TYPE" in
string)
utils_sortarray_quicksortforstrings 0 $(( ${#POINTERS[@]} - 1 ))
;;
integer)
utils_sortarray_quicksortforintegers 0 $(( ${#POINTERS[@]} - 1 ))
;;
esac
fi
case "$AS" in
values)
local -i I J=0
unset "${TO}[*]"
eval "for I in \"\${POINTERS[@]}\"; do ${TO}[J++]=\${__ARRAY[I]}; done"
;;
indices)
eval "$TO=(\"\${POINTERS[@]}\")"
;;
esac
}
# void utils_sortarray_quicksortforstrings (uint LEFT, uint RIGHT)
#
function utils_sortarray_quicksortforstrings {
[[ $1 -lt $2 ]] || return
local -i LEFT=$1 RIGHT=$2 PIVOT PARTITION
if utils_sortarray_quicksortforstrings_findpivot; then
utils_sortarray_quicksortforstrings_partition
utils_sortarray_quicksortforstrings $LEFT $(( PARTITION - 1 ))
utils_sortarray_quicksortforstrings $PARTITION $RIGHT
fi
}
# bool utils_sortarray_quicksortforstrings_findpivot (void)
#
function utils_sortarray_quicksortforstrings_findpivot {
local -i A B C P MIDDLE
(( MIDDLE = LEFT + (RIGHT - LEFT) / 2 ))
(( A = POINTERS[LEFT], B = POINTERS[MIDDLE], C = POINTERS[RIGHT] ))
[[ ${__ARRAY[A]} > ${__ARRAY[B]} ]] && (( A = $B, B = $A ))
[[ ${__ARRAY[A]} > ${__ARRAY[C]} ]] && (( A = $C, C = $A ))
[[ ${__ARRAY[B]} > ${__ARRAY[C]} ]] && (( B = $C, C = $B ))
if [[ ${__ARRAY[A]} < ${__ARRAY[B]} ]]; then
PIVOT=$B
return 0
fi
if [[ ${__ARRAY[B]} < ${__ARRAY[C]} ]]; then
PIVOT=$C
return 0
fi
for (( P = LEFT + 1; P < MIDDLE; ++P )); do
if [[ ${__ARRAY[P]} > ${__ARRAY[A]} ]]; then
PIVOT=$P
return 0
fi
if [[ ${__ARRAY[P]} < ${__ARRAY[A]} ]]; then
PIVOT=$A
return 0
fi
done
for (( P = MIDDLE + 1; P < RIGHT; ++P )); do
if [[ ${__ARRAY[P]} > ${__ARRAY[A]} ]]; then
PIVOT=$P
return 0
fi
if [[ ${__ARRAY[P]} < ${__ARRAY[A]} ]]; then
PIVOT=$A
return 0
fi
done
return 1
}
# void utils_sortarray_quicksortforstrings_partition (void)
#
function utils_sortarray_quicksortforstrings_partition {
local -i L R T
local P=${__ARRAY[PIVOT]}
for (( L = LEFT, R = RIGHT; L <= R; )); do
while [[ ${__ARRAY[POINTERS[L]]} < $P ]]; do
(( ++L ))
done
until [[ ${__ARRAY[POINTERS[R]]} < $P ]]; do
(( --R ))
done
[[ L -lt R ]] && (( T = POINTERS[L], POINTERS[L] = POINTERS[R], POINTERS[R] = T, ++L, --R ))
done
(( PARTITION = L ))
}
# void utils_sortarray_quicksortforintegers (uint LEFT, uint RIGHT)
#
function utils_sortarray_quicksortforintegers {
[[ $1 -lt $2 ]] || return
local -i LEFT=$1 RIGHT=$2 PIVOT PARTITION
if utils_sortarray_quicksortforintegers_findpivot; then
utils_sortarray_quicksortforintegers_partition
utils_sortarray_quicksortforintegers $LEFT $(( PARTITION - 1 ))
utils_sortarray_quicksortforintegers $PARTITION $RIGHT
fi
}
# bool utils_sortarray_quicksortforintegers_findpivot (void)
#
function utils_sortarray_quicksortforintegers_findpivot {
local -i A B C P MIDDLE
(( MIDDLE = LEFT + (RIGHT - LEFT) / 2 ))
(( A = POINTERS[LEFT], B = POINTERS[MIDDLE], C = POINTERS[RIGHT] ))
[[ __ARRAY[A] -gt __ARRAY[B] ]] && (( A = $B, B = $A ))
[[ __ARRAY[A] -gt __ARRAY[C] ]] && (( A = $C, C = $A ))
[[ __ARRAY[B] -gt __ARRAY[C] ]] && (( B = $C, C = $B ))
if [[ __ARRAY[A] -lt __ARRAY[B] ]]; then
PIVOT=$B
return 0
fi
if [[ __ARRAY[B] -lt __ARRAY[C] ]]; then
PIVOT=$C
return 0
fi
for (( P = LEFT + 1; P < MIDDLE; ++P )); do
if [[ __ARRAY[P] -gt __ARRAY[A] ]]; then
PIVOT=$P
return 0
fi
if [[ __ARRAY[P] -lt __ARRAY[A] ]]; then
PIVOT=$A
return 0
fi
done
for (( P = MIDDLE + 1; P < RIGHT; ++P )); do
if [[ __ARRAY[P] -gt __ARRAY[A] ]]; then
PIVOT=$P
return 0
fi
if [[ __ARRAY[P] -lt __ARRAY[A] ]]; then
PIVOT=$A
return 0
fi
done
return 1
}
# void utils_sortarray_quicksortforintegers_partition (void)
#
function utils_sortarray_quicksortforintegers_partition {
local -i L R T P
for (( L = LEFT, R = RIGHT, P = __ARRAY[PIVOT]; L <= R; )); do
for (( ; __ARRAY[POINTERS[L]] < P; ++L )); do
continue
done
for (( ; __ARRAY[POINTERS[R]] >= P; --R )); do
continue
done
[[ L -lt R ]] && (( T = POINTERS[L], POINTERS[L] = POINTERS[R], POINTERS[R] = T, ++L, --R ))
done
(( PARTITION = L ))
}
Quote:
Originally Posted by ntubski
Your solutions with the nested loops go through each element N times, so commands within the nested loop are run N*N=N^2 times. There are sorting algorithms that take time proportional to NlogN which is much less than N^2 for large N.
Perhaps I can also consider this but I actually planned to make the functions not change the indices or save the element's current positions.
For example if we have an array like
Code:
A=([1]="A" [2]="B" [3]="A" [4]="B")
We can have them processed into two outputs.
One is
Code:
A=([1]="A" [2]="B")
and the other is
Code:
A=([0]="A" [1]="B")
And what I'm after is just the first.
If we do something like:
Code:
B=("${A[@]}"])
The indices of B will always be continuous from 0 no matter what the index/key structure of A is.
Quote:
Only the user time represents CPU cycles, so it seems strange that the majority of the time it takes is sys time. Even if unsetting costs a lot of memory, I don't think it should cost that much.
I just found out that it's naturally slow since there are many elements from the original but why it takes too much time on sys and not on user, that I don't know. Perhaps bash uses a dynamic method for handling arrays like everytime an element is removed, the whole map of the array is updated or changed. This might be for the sake of saving memory but if it's the case, it could be just the reason why removing an element is slow. For the reason why it's system time that's used, maybe because there are calls to malloc() and free(); that's casted not only for the element that's to be removed but on other related objects as well..? If it's already internal then I guess I can no longer do anything much about it.
I already made variations of array_unique_a() where I try to make a temporary copy first but the result was just a slower function. There was also a version were I just save the indices that will be deleted and then cast unset with them in one shot but it's also slower.
Anyhow I guess the last version I posted is already the fastest though I'm still hoping that there's a better way to do it. If only indices are mapped in a different way in bash (just like associative array), but perhaps it's really not that easy to make such an implementation knowing that nothingness is allowed in between elements in indexed arrays. Or maybe associative array like manipulation for indexed arrays will just be expensive.. don't know.
Last edited by konsolebox; 07-22-2010 at 12:05 AM.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.