LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - General (http://www.linuxquestions.org/questions/linux-general-1/)
-   -   ordering an array based on another array (http://www.linuxquestions.org/questions/linux-general-1/ordering-an-array-based-on-another-array-911475/)

nano2 11-02-2011 08:58 AM

ordering an array based on another array
 
I have the following scenario where by i have array1 with elements in order of preference and array2 with unsorted arrays that contains some of array1 elements.

Code:

array1=( dog cat lion cow)
array2=(lion rat mouse  dog zebra pet cat bird cow calf sheep ........... )

I want to be able to use the contents of array2 but I need to have to the first few elements ordered.

My desire result would be
dog cat lion cow rat mouse zebra pet .. ... ....

What i have
Code:

cnt=${#array1[@]}
for ((i = 0; i < cnt ; i++ )) # array1.
do
  echo ${array1[i]}
done


Juako 11-02-2011 10:59 AM

I thought at first of making a new array, but seems feasible to directly manipulate array2:

Code:

#!/bin/bash
array1=(dog cat lion cow)
array2=(lion rat mouse dog zebra pet cat bird cow calf sheep)

# swaps two elements of an array
# $1 = array, $2 = index, $3 = index
swap() {
    local ref1=$1[$2] ref2=$1[$3]
    local temp=${!ref1}
    eval "$ref1=\"${!ref2}\" $ref2=\"$temp\""
}

echo ${array1[@]}
echo ${array2[@]}

for (( i = 0, swap_idx = 0; i < ${#array2[@]}; i++ )); do
    [[ "${array1[@]//${array2[$i]}/}" != "${array1[@]}"  ]] && swap array2 $i $(( swap_idx++ ))
done

echo ${array2[@]}

The for loop:
The loop initializes $i and $swap_idx to 0, then iterates each member of array2 indexing it with $i, using parameter expansion (pattern substitution) to see if array1 contains the current examined element of array2 (the pattern in the left of the != removes the element of array1 by replacing it with "" if it matches, so the resulting array won't equal the original array1).

If the element is matched by the pattern, we swap two elements of array2: the current element, with the one indexed by swap_idx, which is at "the last position where an already sorted element has been inserted + 1" (it starts being zero, check the declare statement before the loops).

The swap() function:
This resorts to some trickery to do "indirect reference" to the arrays and elements based on its arguments: first we define two local variables containing the literal strings of some array[index] based on $1, $2 and $3.

Then we can use the ${!String} parameter substitution (names matching prefix), which expands to the real value of a variable named after String. We use that to save one of the elements in $temp, and then we make the swap. Why do I use eval? Well, I couldn't make ${!} work in the left side of the assignment, so I resorted to the literal values and made ALL THE SWAP OPERATION a string, then eval'ed it (had to use some escaping on the right side of the assignments).

edit: note that this will alter the order of elements in array2 that aren't in array1. If you want to keep those in order, you should change the for loops for these:
Code:

for (( i = 0, swap_idx = 0; i < ${#array2[@]}; i++ )); do
    [[ "${array1[@]//${array2[$i]}/}" != "${array1[@]}"  ]] && {
        for (( j = i; j > $swap_idx; j-- )); do
            swap array2 $j $j-1
        done
        (( swap_idx++ ))
    }
done

This is the same as before, but before incrementing swap_idx it "slides" the matching element towards swap_idx position, swapping it with the previous element until it's in the correct place. That way the order of non matching elements is preserved.

Let me know if something is confusing, english isn't my native language and sometimes I'm kludgy at it :/

berbae 11-02-2011 11:15 AM

My solution:
Code:

#!/bin/bash
array1=(dog cat lion cow)
array2=(lion rat mouse dog zebra pet cat bird cow calf sheep)
list2="${array2[*]}"
cnt=${#array1[@]}
for ((i=0; i <cnt ; i++ )); do
  pref=${array1[i]}
  if grep -wq $pref <<< "$list2"; then
      preflist2+="$pref "
      list2="${list2/$pref }"
  fi
done
array3=($preflist2$list2)
echo array1:
echo "${array1[*]}"
echo array2:
echo "${array2[*]}"
echo array3:
echo "${array3[*]}"

I supposed there are no double entries.

nano2 11-02-2011 12:40 PM

Thanks that works however i can't access the array3 elements one by

Code:


fn "${array3[@]}"
fn(){
shift
arr=( "$@" )
cnt=${#arr[@]}
echo "the count is " $cnt
for (( idx = 0 ; idx < cnt ; idx++ ))
do
  echo ${arr[idx]}
done

}

why is i can only see one element in the array3 ?

Juako 11-02-2011 02:21 PM

edit:
I've changed this version to avoid using eval. A few posts later in this thread you can find links with information on this command, many containing explanations of why it almost always pays to invest some more time and rethink code using this command.

This one rebuilds the sorted array every time there's a match (this is probably faster than the script I've posted earlier in this thread, that uses eval too).

Code:

#!/bin/bash
declare -a sortby=( $1 ) sorted=( $2 )
declare length=${#sorted[@]} i=${#sortby} matches

while (( i-- )); do
    [[ ! "${sorted[@]}" =~ ${sortby[$i]} ]] && continue
    sorted=( ${sorted[@]//$BASH_REMATCH/} )
    printf -v matches '%*s' $(( $length - ${#sorted[@]} ))
    sorted=( ${matches// /$BASH_REMATCH }${sorted[@]} )
done

echo ${sorted[@]}

me@localhost:~$ ./script "dog cat lion cow" "lion rat mouse dog zebra lion pet cat bird cow calf lion"
dog cat lion lion lion cow rat mouse zebra pet bird calf


Explanation:

1) The declare lines save the first series of words in the "sorter" array named $sortby, and the second series in $sorted, which is the array to be sorted. The variable $i is initialized to (length of $sortby - 1) (the last element -- array indices start with 0), and $length stores the length of $sorted. This is needed in the loop when $sorted is being rebuilt.

2) The loop uses $i to index $sortby from the end (as a for..in would return the ordered matches reversed). On each round, if $sorted doesn't contain the current inspected element of $sortby, just continues looping.

If there was a match, the continue won't be executed and in turn the next line in the loop executes. Note that the test '=~', in case the pattern on the right does match, stores that pattern in the shell variable $BASH_REMATCH (actually it is an array and the behaviour is a bit more flexible, but to our goal this is all we need to know).

3) The next line then, rebuilds $sorted removing all matches of the inspected element. So now we have to add back to the top of $sorted as many matches of that element as were found. This is done in two steps:

4) The printf line uses "-v matches" to print its output to the variable $matches, instead to standard output. The format string '%*s' is a string used by printf to format and make conversions on the data it will print. In this case '%' means "start reading instructions for conversions", '*' means "the first parameter will tell you the minimum width for printing the second parameter" and 's' means "the second parameter (what you have to print) is a string". The trick is to pass then ONLY the minimum-width parameter after this, and a null string-parameter (the arithmetic substitution that follows, which equals the number of matches of the inspected element, is thus the minimum width we require). Passing a null string (ie, not passing a second parameter) leaves only the spaces that printf uses to pad a string to the requested minimum width.

5) So the printf line stores in $matches a string of x spaces, where x is the number of matches of the inspected element. Now the $sorted array is rebuilt, adding to its front the replacement of each space in $matches by the inspected element, followed by $sorted itself, which already had this element removed. The result is that matched elements will successively "stack" at the front of $sorted, as they are deleted from the rest of it.

Since the "stacking at the front" necessarily puts newer matches before previous matches, we can't traverse $sortby from the first element to the last, so that's why $i is decremented from the last element of $sortby to the first.

berbae 11-03-2011 10:04 AM

Quote:

Originally Posted by nano2 (Post 4514238)
why is i can only see one element in the array3 ?

your code for the fn function should be:
Code:

fn(){
arr=( $@ )
cnt=${#arr[@]}
echo "the count is " $cnt
for (( idx = 0 ; idx < cnt ; idx++ ))
do
  echo ${arr[idx]}
done
}
fn "${array3[@]}"

with

arr=( "$@" )

there will be only one element in the array arr.

with

arr=( $@ )

each word in $@ is a separate element.

I made the same error in my solution above, so I changed the line:

array3=("$preflist2$list2")

to

array3=($preflist2$list2)

to have separate words.

nano2 11-13-2011 07:31 AM

Hi ,

Using the below solution
Code:


#!/bin/bash
fn2 ()
{
shift

array2=( $@ )
cnt=${#arr[@]}
echo "the count is " $cnt
for (( idx = 0 ; idx < cnt ; idx++ ))
do
  echo ${array2[idx]}
done

array1=(dog cat lion cow)
#######array2=(lion rat mouse dog zebra pet cat bird cow calf sheep)
list2="${array2[*]}"
cnt=${#array1[@]}
for ((i=0; i <cnt ; i++ )); do
  pref=${array1[i]}
  if grep -wq $pref <<< "$list2"; then
      preflist2+="$pref "
      list2="${list2/$pref }"
  fi
done
array3=($preflist2$list2)
echo array1:
echo "${array1[*]}"
echo array2:
echo "${array2[*]}"
echo array3:
echo "${array3[*]}"

}

fn2 "${array2[@]}"

array2=(lion rat mouse dog zebra pet cat bird cow calf sheep)

I am trying to pass array2 into the above fn2 but i am only getting one element in the array ...

anyone know whats going on ?

Thanks

David the H. 11-13-2011 09:03 AM

Quote:

Originally Posted by berbae (Post 4514883)
your code for the fn function should be:

with

arr=( "$@" )

there will be only one element in the array arr.

with

arr=( $@ )

each word in $@ is a separate element.

This is not true at all. $@ and ${array[@]} are designed so that when quoted, each element is treated as a separate quoted string.

Quoting them is generally a good idea, to avoid any word-splitting on spaces inside each element.

$* and ${array[*]} are the patterns that output the whole list as a single string, and will have the effect you mention if quoted.

http://mywiki.wooledge.org/BashGuide...Using_Arrays-1

David the H. 11-13-2011 09:39 AM

From reading the OP (but not looking deeply at the rest), I believe this does what you're asking for:

Code:

#!/bin/bash

listitems() {

        #declares (local to the function) an associative array, and two regular output arrays
        local -A defarray
        local -a prearray mainarray

        #set each default element with the number it should be ordered in
        defarray=( [dog]=0 [cat]=1 [lion]=2 [cow]=3 )

        #loop through the function input
        #if an entry matches a default entry, then set into a prefix array
        #with that element number.  Else, add it to the main array

        for entry in "$@"; do

                if [[ "${defarray[$entry]}" ]]; then
                        prefarray[${defarray[$entry]}]="$entry"
                else
                        mainarray+=( "$entry" )
                fi

        done

        #echo the prefix array, followed by the main array.
        echo "List: ${prefarray[@]} ${mainarray[@]}"

}

# here is the input list, in yet another array
array2=( lion rat mouse dog zebra pet cat bird cow calf sheep )

# call the function with the input list
listitems "${array2[@]}"

It stores the default list in an associative array, making it easier to match and order individual elements.

***
Edit: By the way, one drawback of the above is that if there are multiple input entries that match a default, it will be output only once. But non-default entries will print multiple times. e.g.

input: sheep cow dog zebra dog zebra
output: dog cow sheep zebra zebra

If this is a problem, tell us what the true output needs to be and we'll try to work that out.
***

And nano2, if I can suggest a tip, try for a more effective use of formatting and whitespace in your script. Indent your loops and tests, line up sections of the same depth, and separate blocks of code with empty lines. A few comments would be useful too. It helps make things readable and debuggable.

http://mywiki.wooledge.org/BashGuide...es#Readability

Juako 11-13-2011 05:13 PM

@David the H. :

I believe the one I had posted here does the trick. Have you checked it?

David the H. 11-14-2011 12:51 PM

As I said, I hadn't taken the time to closely inspect the previous posts. But yeah, looking at it now, it is indeed a pretty ingenious solution. I probably wouldn't have thought about removing elements myself. I guess I'm just too linear a thinker. ;)

I'm not at all happy with the idea of using eval though. How about this instead?

Code:

#!/bin/bash

array1=( dog cat lion cow )
array2=( "$@" )
length=${#array2[@]}

for (( i = ${#array1[@]} ; i >= 0; i-- )); do

    array2=( ${array2[@]//${array1[$i]}/} )
    (( matches = length - ${#array2[@]} ))

    for (( n=1 ; n <= matches ; n++ )); do
          array3+=( "${array1[i]}" )
    done

    array2=( "${array3[@]}" "${array2[@]}" )
    unset array3

done

echo "${array2[@]}"

exit 0

I've also modified my previous code to give the same output:

Code:

#!/bin/bash

listitems() {

    local -A defarray sortarray
    local -a prefarray mainarray

    defarray=( [dog]=100 [cat]=200 [lion]=300 [cow]=400 [sheep]=500 )

    for entry in "$@"; do

          if [[ "${defarray[$entry]}" ]]; then

              (( defarray[$entry]++ ))
              prefarray[${defarray[$entry]}]="$entry"

          else
              if [[ "${mainarray[*]}" =~ "$entry" ]]; then
                    (( sortarray[$entry]++ ))
              else
                    (( sortarray[$entry] = n+=100 ))
              fi
              mainarray[${sortarray[$entry]}]="$entry"
          fi

    done

    echo "${prefarray[@]} ${mainarray[@]}"

}

array2=( "$@" )

listitems "${array2[@]}"

exit 0

Each default entry is now separated by a big chunk of digits, to allow plenty of room to increment the index numbers for duplicates. And the mainarray now has its own assoc. array for tracking duplicates too, working in a similar manner.

Juako 11-14-2011 06:09 PM

Quote:

Originally Posted by David the H. (Post 4523824)
I'm not at all happy with the idea of using eval though. How about this instead?

I should perhaps go to eval-rehab :rolleyes:, btw: my first version uses eval to implement a swap() array elements function, which allows for more fun like in-place sorting and "sliding" array elements. But yeah, even though I hate to admit, eval is hard to trust in or know in advance about its outcomes. I have a feeling though, that my take 2 can get the effect of the eval with some substitution, which should keep avoiding a third array. I'll play a bit to find out.

nano2 11-15-2011 03:31 AM

David H.

I am running bash -versionGNU bash, version 3.2.25 doesn't support declaration arrays !

Code:

   
local -A defarray sortarray

local -a prefarray mainarray


David the H. 11-15-2011 12:22 PM

Sorry. Associative arrays were added in version 4. So it's the -A option throwing the error, and my solution won't work for you.

http://wiki.bash-hackers.org/scripting/bashchanges

4.0 was released almost 3 years ago. Now it's at 4.2 and even more new goodies have been added. Perhaps it's time to consider upgrading. ;)

Juako's code should run ok on version 3. With my alterations of course. ;)


@Juako: I look forward to seeing what you can come up with. You always post interesting solutions.

Actually, I'm not really quite as fanatical as I seem about eval. I just dislike seeing such things used unnecessarily, particularly if there are security issues. In the right hands and the right circumstances they're ok. But tools like this that have risks attached should not be treated lightly; rather you should try to exhaust your other options first.

Finally, It dawned on me that my script could be simplified quite a bit. The default list can act as the tracking list for everything, so we really only need two arrays (one still needs to be associative, though).

Code:

#!/bin/bash

listitems() {

        local -A sortarray
        local -a mainarray

        sortarray=( [dog]=100 [cat]=200 [lion]=300 [cow]=400 )
        n=400

        for entry in "$@"; do

                  if [[ "${sortarray[$entry]}" ]]; then
                        (( sortarray[$entry]++ ))
                else
                        (( sortarray[$entry] = n+=100 ))
                fi

                mainarray[${sortarray[$entry]}]="$entry"

        done

        echo "${mainarray[@]}"

}

array2=( "$@" )

listitems "${array2[@]}"

exit 0


Juako 11-15-2011 06:58 PM

Quote:

Originally Posted by David the H. (Post 4524757)
Juako's code should run ok on version 3. With my alterations of course. ;)

@Juako: I look forward to seeing what you can come up with. You always post interesting solutions.

An honour :hattip:

Allright. I've updated the post to avoid eval, hope I retained the functionality, specifically I didn't test it with arrays that contain elements with spaces, but in case that's required, a new revision can be made.

I'd like to add a few links on eval, FWIW:

http://en.wikipedia.org/wiki/Eval
http://mywiki.wooledge.org/BashFAQ/048
http://wiki.bash-hackers.org/rcwatson
http://wiki.bash-hackers.org/scripting/style#eval
Chapter on various forms of subshelling in "Beginning Portable Shell" (seems a nice book, haven't heard of it).

(I promess to read them, hehehe)


All times are GMT -5. The time now is 08:01 AM.