[SOLVED] How to skip a number even if it's a random form of it?

TheGeniusLOL · 11-09-2018, 03:55 PM

Hello astrogeek,

I understand the first step where we sort the contents from $line.

As for the remaining steps I am clueless.

Quote:

So your next task is to create an associative array, whose keys are the concatenated values from the sorted array, $inarr. I named this array processed in my first example (which included a give-away hint of how to do this!), but you can name yours anything you want.

I don't get how to create an array with concatenated keys.

I'm sorry for my lack of understanding. I am 50 years old, a late bloomer to all this shell scripting stuff. I read the article, but it's too cryptic, like Greek to me, and confusing.

I need the complete working script if you can provide it. I really would appreciate it.

Thanks in advance

lougavulin · 11-09-2018, 08:10 PM

This is not exactly how astrogeek did it. For me, it seems simpler that way :

Code:

# An associative array
# For instance, didarr['100']=1
# '100' is what we call the key and 1 is the value. Here only the key matters to us.
declare -A didarr
while read line; do
    #put sorted line digits into array, I'll give you one way to do it, see if you can find others...
    inarr=$(echo ${line} | tr " " "\n" | sort -n | tr -d "\n")
    # For instance, inarr='123', so does '123' exist in our array?
    if [ ${didarr[$inarr]} ]; then
         echo "Skipping ${line}"
    else
         echo "Processing ${line}"
         # Add the new processed line into the array as a key
         didarr[$inarr]=1
    fi
done< patterns.txt

TheGeniusLOL · 11-09-2018, 10:33 PM

Quote:

Originally Posted by lougavulin

This is not exactly how astrogeek did it. For me, it seems simpler that way :

WOW! Thanks lougavulin for the help. I couldn't do this in a million years if I tried. The script is still a bit cryptic for me to understand, but all that matters is that it works.

Also thanks astrogeek for your hints and detailed instructions. However, what you wanted me to do was beyond my understanding of arrays and bash scripting.

Lastly, thanks ondoho for the article. It was detailed and everything, but I got overwhelmed by the info and didn't grasp much of it.

Perhaps, if I were younger and smarter, I could have understood those concepts.

Thanks again to all.

danielbmartin · 11-09-2018, 10:44 PM

I wrote an awk solution but did not post it before now because the OP wanted a bash solution.

This code ...

Code:

echo; echo "Construct an InFile which consists of 5-character lines,"
echo "  each of which contains three random positive integers"
echo "  separated by blanks."
yes -- " "          \
|head -900          \
|nl -v100 -nln      \
|shuf               \
|sed 's/./& /g'     \
|sed 's/[ \t]*$//'  \
>$InFile

... builds a test InFile.

This annotated awk ...

Code:

awk '{  \
# As each InFile line is read, create a Signature for it which is
#  its own three integers in sorted order.
# The Signature for InFile line 4 7 2 is 2 4 7.
# The Signature for InFile line 7 2 4 is also 2 4 7.
# asort() is a gawk extension; sort the values once, then join them.
  n=split($0,w); asort(w); Sig=""; for(j=1;j<=n;j++) Sig=Sig w[j] " ";
# The array SeenBefore contains Signatures of integer triples already processed.
# If the SeenBefore array contains this Signature, we already processed it.
  if (SeenBefore[Sig]) print "Skipping",$0
# If the SeenBefore array does not contain this Signature,
#  we process the InFile line and add its Signature to the SeenBefore array.
      else {print "Processing",$0; SeenBefore[Sig]=1}}'  \
$InFile >$OutFile1

... generates an OutFile which is identical to that produced by the bash solution posted by lougavulin.

Daniel B. Martin


astrogeek · 11-10-2018, 06:44 PM

Quote:

Originally Posted by lougavulin

This is not exactly how astrogeek did it. For me, it seems simpler that way :

Nicely done!

I had considered using a string at first, but then thought of multi-digit values, for which simple string concatenation fails. Even though the original question only included three single-digit examples, it did not explicitly exclude multi-digit integers or lines longer than three values. I tried to anticipate this and prompt more thought on the OP's part by using the word "integer" in my last post.

Here is the full commented script I had written for the original example:

Code:

#!/bin/bash

infile=patterns.txt
declare -A procd
while read line; do

#Get sorted array of _integer values_ from input line
#Array preserves values whereas simple string may fail for multi-digit numbers
inarr=($(echo ${line}| tr " " "\n" | sort -n))

#Test whether we have seen this _sorted sequence_ before
if [ 0${procd[${inarr[*]}]} -eq 1 ]; then
        echo "Skipping $line"
else
        echo "processing $line"
        #Add to array of processed lines using concatenation of values as key
        procd[${inarr[*]}]=1
fi
done< $infile

Here are results of an example patterns.txt with two added lines, using your string example and my original array example:

Code:

cat patterns.txt
...
22 1 21
1 122 2

./lougavulin.sh
...
Processing 22 1 21
Skipping 1 122 2

./script.sh
...
processing 22 1 21
processing 1 122 2
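The difference between the two results can be reproduced in isolation. As a minimal sketch (the function names `key_joined` and `key_spaced` are made up for this illustration and do not come from either script), the first helper glues the sorted values together and the second keeps a space between them:

```shell
#!/bin/bash
# Hypothetical helpers, for illustration only.

# Sorted values glued together with no separator (the string approach).
key_joined() {
    echo "$1" | tr " " "\n" | sort -n | tr -d "\n"
}

# Sorted values kept apart by a single space.
key_spaced() {
    echo "$1" | tr " " "\n" | sort -n | tr "\n" " "
}

for line in "22 1 21" "1 122 2"; do
    printf '%-8s joined=%s  spaced=%s\n' "$line" "$(key_joined "$line")" "$(key_spaced "$line")"
done
```

Both lines produce the joined key 12122, which is why the string version skips the second one, while the spaced keys stay different.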

Thanks!

astrogeek · 11-10-2018, 06:56 PM

Quote:

Originally Posted by TheGeniusLOL

I'm sorry for my lack of understanding. I am 50 years old, a late bloomer to all this shell scripting stuff. I read the article, but it's too cryptic, like Greek to me, and confusing.

You youngsters and your quest for instant gratification! When I was your age, long ago...

Quote:

Originally Posted by TheGeniusLOL

WOW! Thanks lougavulin for the help. I couldn't do this in a million years if I tried. The script is still a bit cryptic for me to understand, but all that matters is that it works.

Then you should have said so up front instead of suggesting that you were trying to learn!

LQ is all about learning, sharing of knowledge, hence the approach most members take of trying to teach. If you are here only to get working code from others then you will likely find that you are in the wrong place!

Quote:

Originally Posted by TheGeniusLOL

Perhaps, if I were younger and smarter, I could have understood those concepts.

Don't sell yourself short! I can't remember what I had for breakfast most days! Most of what I know about bash I have learned since I really was your age! If I can do it then anybody and their cat can do it better, believe me!

Good luck!

lougavulin · 11-10-2018, 07:42 PM

@danielbmartin, thank you for your Awk version !

@astrogeek, you are right about multi-digit integers. So, just for fun, I replaced this line in my version:

Code:

inarr=$(echo ${line}| tr " " "\n" | sort -n | tr -d "\n")

with this line:

Code:

inarr=$(echo ${line}| tr " " "\n" | sort -n | tr "\n" "0")

With your new lines, it returns:

Code:

Processing 1 0 0
Processing 1 2 3
Processing 5 7 9
Skipping 3 2 1
Processing 0 0 7
Processing 22 1 21
Processing 1 122 2

@TheGeniusLOL, what is great about Bash (or Awk) is that you can take just a part of the code and run it again and again, changing small things, to understand what it does!
For example, take this line, which can seem cryptic:

Code:

inarr=$(echo ${line}| tr " " "\n" | sort -n | tr -d "\n")

You can test it and play with it piece by piece, like this:

Code:

line="3 2 1"
echo ${line} | tr " " "\n"
echo ${line} | tr " " "\n" | sort -n
echo ${line} | tr " " "\n" | sort -n | tr -d "\n"

This helps to understand what each command does:

Code:

$ echo ${line} | tr " " "\n"
3
2
1
$ echo ${line} | tr " " "\n" | sort -n
1
2
3
$ echo ${line} | tr " " "\n" | sort -n | tr -d "\n"
123
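In the same spirit, the associative-array half of the script can also be tried on its own. This is only a small sketch: the array name `seen` and the keys are arbitrary examples, and bash 4 or later is assumed.

```shell
#!/bin/bash
# Requires bash 4 or later for associative arrays.
declare -A seen          # "seen" is just an example name

seen[123]=1              # remember the key 123

# ${seen[key]:-no} expands to the stored value, or to "no" if nothing is stored
echo "123 -> ${seen[123]:-no}"
echo "456 -> ${seen[456]:-no}"

# list every key stored so far
echo "keys: ${!seen[@]}"
```

This prints `123 -> 1`, `456 -> no` and `keys: 123`, which shows how the script decides between "Skipping" and "Processing".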

danielbmartin · 11-11-2018, 11:15 AM

Quote:

Originally Posted by TheGeniusLOL

... I am 50 years old, a late bloomer to all this shell scripting stuff. ...

In 2010 I bought a used desktop computer and installed Linux. I was age 68 at that time. It was my first exposure to Linux. The learning curve was daunting and there were moments of discouragement. Having received help from LQ when I was a newbie, I now contribute when possible to "give back."

I dabble in programming for entertainment and to stave off old-age brain rot. My advice to you: Hang In There!

Daniel B. Martin


danielbmartin · 11-11-2018, 12:17 PM

The awk shown in post #19 works for character strings of variable lengths. To test, you may build an InFile this way ...

Code:

echo; echo "Construct an InFile which consists of three-word lines,"
echo "  each of which contains three random positive integers"
echo "  expressed by name (i.e. One, Two, etc.) separated by blanks."
yes -- " "          \
|head -900          \
|nl -v100 -nln      \
|shuf               \
|sed 's/./& /g'     \
|sed 's/[ \t]*$//'  \
|sed 's/0/Zero/g;
      s/1/One/g;  
      s/2/Two/g;  
      s/3/Three/g;
      s/4/Four/g;
      s/5/Five/g;  
      s/6/Six/g;
      s/7/Seven/g;
      s/8/Eight/g;  
      s/9/Nine/g'   \
>$InFile

Daniel B. Martin


MadeInGermany · 11-12-2018, 06:48 AM

Comments on post #17:
$line should be in "quotes" to avoid unwanted expansions.

Code:

inarr=$(echo "$line" | tr " " "\n" | sort -n | tr -d "\n")

I think it is not correct to remove the " " separators. For example, how would you distinguish between 12 13 14 and 121 31 4?
Better:

Code:

inarr=$(echo "$line" | tr " " "\n" | sort -n | tr "\n" " ")

or leave them newline-separated, as with the following variant

Code:

#!/bin/bash

# An associative array, requires bash4
# For instance, didarr['100']=1
# With variables: didarr[$key]=$val
declare -A didarr

# we read from an explicit file descriptor that is set at the end of the block
while read line <&3
do
    # quote command arguments to avoid expansions, i.e. "$line"
    skey=$(echo "$line" | tr " " "\n" | sort -n)
    # for instance, skey=1\n2\n3
    # The +word modifier allows to test for existence (not for non-empty)
    if [[ -n ${didarr[$skey]+isset} ]]
    then
         echo "Skipping $line"
    else
         echo "Processing $line"
         # Add the new processed line into the array as a key
         # We can omit the value
         didarr[$skey]=
    fi
# We set an explicit file descriptor for the while-do-done block
done 3< patterns.txt
# After the block the shell has closed the explicit descriptor
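The difference between testing for existence and testing for a non-empty value can be seen in a short sketch. The array `demo` and its keys are invented for this illustration; bash 4 or later is assumed.

```shell
#!/bin/bash
declare -A demo            # example array, bash 4+
demo[present]=             # the key exists, its value is empty

for key in present missing; do
    # non-empty test: fails for a key whose value is empty
    if [[ -n ${demo[$key]} ]]; then nonempty=yes; else nonempty=no; fi
    # existence test: ${...+isset} expands to "isset" whenever the key is set
    if [[ -n ${demo[$key]+isset} ]]; then exists=yes; else exists=no; fi
    echo "$key: non-empty=$nonempty exists=$exists"
done
```

This prints `present: non-empty=no exists=yes` and `missing: non-empty=no exists=no`, so only the `+isset` form recognises a key that was stored with an empty value.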

danielbmartin · 11-12-2018, 07:42 AM

Just for fun I embellished the awk solution shown in post #19. This version tells that an InFile line was already processed and, in addition, identifies that predecessor.

Construct an InFile ...

Code:

echo; echo "Construct an InFile which consists of 5-character lines,"
echo "  each of which contains three random positive integers"
echo "  separated by blanks."
yes -- " "          \
|head -900          \
|nl -v100 -nln      \
|shuf               \
|sed 's/./& /g'     \
|sed 's/[ \t]*$//'  \
>$InFile

... or use one of your own devising.

Bounce the InFile against this awk ...

Code:

awk '{  \
# As each InFile line is read, create a Signature for it which is
#  its own three integers in sorted order.
# The Signature for InFile line 4 7 2 is 2 4 7.
# The Signature for InFile line 7 2 4 is also 2 4 7.
# asort() is a gawk extension; sort the values once, then join them.
  n=split($0,w); asort(w); Sig=""; for(j=1;j<=n;j++) Sig=Sig w[j] " ";
# The array SeenBefore contains Signatures of integer triples already processed.
# If the SeenBefore array contains this Signature, we already processed it.
  if (SeenBefore[Sig]) print "Skipping",$0,"because it was processed at line",SeenBefore[Sig]
# If the SeenBefore array does not contain this Signature,
#  we process the InFile line and add its Signature to the SeenBefore array.
      else {print "Processing",$0; SeenBefore[Sig]=NR" ("$0")"}}'  \
$InFile >$OutFile3

This is a small part of the OutFile, chosen to show the added feature...

Code:

Processing 5 7 7
Processing 3 5 6
Processing 9 4 6
Processing 1 5 6
Processing 9 3 1
Processing 1 0 8
Skipping 5 6 1 because it was processed at line 29 (1 5 6)
Skipping 9 1 3 because it was processed at line 30 (9 3 1)
Processing 7 4 7
Processing 1 3 5
Processing 9 4 0
Skipping 5 3 6 because it was processed at line 27 (3 5 6)
Processing 9 9 5
Processing 7 0 5

Daniel B. Martin


Field95 · 11-13-2018, 12:48 PM

It seems to be solved, so here's a Python take, thrown in for good measure.
For each line it checks whether the line is already in a set of known orderings, prints it if not, and then adds every permutation of the line (including the line itself) to that set.

Code:

#!/usr/bin/env python3

import csv
import itertools
import sys

# Every ordering of every line seen so far
all_perm = set()
for file in sys.argv[1:]:
    with open(file) as f:
        # Note: expects comma-separated values; use delimiter=" " for space-separated input
        reader = csv.reader(f, delimiter=",")
        for line in reader:
            line = tuple(line)

            # Print the line only if no reordering of it was seen before
            if line not in all_perm:
                print(*line)

            # Remember all orderings; lines with repeated values produce
            # duplicate permutations, which the set simply absorbs
            all_perm.update(itertools.permutations(line))

Code:

./pattern.py patterns.txt
1 2 3
4 5 6
2 3 6

igadoter · 11-15-2018, 06:00 AM

It is already solved, but essentially you should use a hash here: a hash that gives the same result regardless of order. This way you can group entries by their hashes. The only thing left to do is to look into each group for different entries. The other point is to rethink how the data is represented. A bad choice always results in complicated algorithms. In your case it would be much easier to process the data if you first switched the columns and lines of the input.

lougavulin · 11-15-2018, 06:36 AM

Quote:

Originally Posted by igadoter

It is already solved, but essentially you should use a hash here: a hash that gives the same result regardless of order. This way you can group entries by their hashes. The only thing left to do is to look into each group for different entries. The other point is to rethink how the data is represented. A bad choice always results in complicated algorithms. In your case it would be much easier to process the data if you first switched the columns and lines of the input.

I guess that to get the same hash for a different order you need a specific kind of hash, because:

Code:

$ md5sum < <(echo "1 2 3")
f2b33fb7b3d0eb95090a16060e6a24f9  -
$ md5sum < <(echo "2 3 1")
b6e27c35b42671890dc567b86e7a6c69  -

$ sha256sum < <(echo "1 2 3")
1def07dbe06eeb097aafec8a40329937cd20c93a83634b8221ea2b41a894310c  -
$ sha256sum < <(echo "2 3 1")
8a0f00a6c372b85cfc5b19d375d054c3c763e1ea914415d36839e85a8109833e  -

Or I'm missing something...
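One possible reading of the suggestion, sketched here as an assumption rather than as what igadoter had in mind: sort the values before hashing, so that every ordering of a line feeds identical bytes to the checksum tool. The helper name `order_free_sum` is made up for this example.

```shell
#!/bin/bash
# order_free_sum is a hypothetical helper name.
# Sorting the values first means every ordering of a line feeds
# identical bytes to md5sum, so the checksum no longer depends on order.
order_free_sum() {
    echo "$1" | tr " " "\n" | sort -n | md5sum | cut -d " " -f 1
}

order_free_sum "1 2 3"
order_free_sum "2 3 1"    # prints the same checksum as the line above
order_free_sum "1 2 4"    # a different set of values
```

The sorted text itself already works as a key, as the scripts earlier in the thread show, so the checksum mainly helps when a fixed-length key is wanted.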

igadoter · 11-15-2018, 08:37 AM

Most hashes are not 1-1. Some time ago it was posted how to change a file without changing its md5 checksum. But of course my proposal was just an idea. How accurate it is I don't know, as there is no data range or specification. Small, medium, large, huge, bigger than the Universe: it all matters.

As for the trick of computing checksums on the fly, I mean

Code:

$ md5sum < <(echo foobar)

It is worth remembering.