LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 11-09-2018, 03:55 PM   #16
TheGeniusLOL
LQ Newbie
 
Registered: Oct 2018
Posts: 22

Original Poster
Rep: Reputation: Disabled

Hello astrogeek,

I understand the first step where we sort the contents from $line.

As for the remaining steps I am clueless.

Quote:
So your next task is to create an associative array, whose keys are the concatenated values from the sorted array, $inarr. I named this array processed in my first example (which included a give-away hint of how to do this!), but you can name yours anything you want.
I don't get how to create an array with concatenated keys.

I'm sorry for my lack of understanding. I am 50 years old, a late bloomer to all this shell scripting stuff. I read the article, but it's too cryptic (Greek to me) and confusing.

I need the complete working script if you can provide it. I really would appreciate it.

Thanks in advance
 
Old 11-09-2018, 08:10 PM   #17
lougavulin
Member
 
Registered: Jul 2018
Distribution: Slackware,x86_64,current
Posts: 279

Rep: Reputation: 100Reputation: 100
This is not exactly how astrogeek did it. For me, it seems simpler that way :
Code:
# An associative array
# For instance, didarr['100']=1
# '100' is what we call the key and 1 is the value. Here only the key matters to us.
declare -A didarr
while read line; do
    # Put the sorted line digits into a string. I'll give you one way to do it, see if you can find others...
    inarr=$(echo ${line} | tr " " "\n" | sort -n | tr -d "\n")
    # For instance, inarr='123', so does '123' already exist as a key in our array?
    if [ ${didarr[$inarr]} ]; then
         echo "Skipping ${line}"
    else
         echo "Processing ${line}"
         # Add the new processed line into the array as a key
         didarr[$inarr]=1
    fi
done< patterns.txt
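For anyone who finds the loop above cryptic, here is a minimal sketch of the associative-array part on its own, with the file reading and sorting left out. The keys 123 and 579 are made-up stand-ins for already-sorted lines, and check_key is a hypothetical helper name, not something from the script above.

```bash
#!/bin/bash
# Requires bash 4 or later for associative arrays.
declare -A didarr

# Hypothetical helper: report whether a key was seen, and remember it if not.
check_key() {
    local key=$1
    if [ "${didarr[$key]}" ]; then
        echo "seen $key"
    else
        echo "new $key"
        didarr[$key]=1
    fi
}

check_key 123   # prints "new 123"
check_key 579   # prints "new 579"
check_key 123   # prints "seen 123"
```

The first time a key shows up the lookup is empty, so the else branch runs and stores it. The second time the lookup returns 1, so the line is skipped.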
 
Old 11-09-2018, 10:33 PM   #18
TheGeniusLOL
LQ Newbie
 
Registered: Oct 2018
Posts: 22

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by lougavulin View Post
This is not exactly how astrogeek did it. For me, it seems simpler that way : ...

WOW! Thanks lougavulin for the help. I couldn't do this in a million years if I tried. The script is still a bit cryptic for me to understand, but all that matters is that it works.

Also thanks astrogeek for your hints and detailed instructions. However, what you wanted me to do was beyond my understanding of arrays and bash scripting.

Lastly, thanks ondoho for the article. It was detailed and everything, but I got overwhelmed by the info and didn't grasp much of it.

Perhaps, if I were younger and smarter, I could have understood those concepts.

Thanks again to all.
 
Old 11-09-2018, 10:44 PM   #19
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660Reputation: 660Reputation: 660Reputation: 660Reputation: 660Reputation: 660
I wrote an awk solution but did not post it before now because the OP wanted a bash solution.

This code ...
Code:
echo; echo "Construct an InFile which consists of 5-character lines,"
echo "  each of which contains three random positive integers"
echo "  separated by blanks."
yes -- " "          \
|head -900          \
|nl -v100 -nln      \
|shuf               \
|sed 's/./& /g'     \
|sed 's/[ \t]*$//'  \
>$InFile
... builds a test InFile.

This annotated awk ...
Code:
awk '{  \
# As each InFile line is read, create a Signature for it which is
#  its own three integers in sorted order.
# The Signature for InFile line 4 7 2 is 2 4 7.
# The Signature for InFile line 7 2 4 is also 2 4 7.
# asort() is a gawk extension; it is called once and its count reused.
  split($0,w); n=asort(w); Sig=""; for(j=1;j<=n;j++) Sig=Sig w[j] " ";
# The array SeenBefore contains Signatures of integer triples already processed.
# If the SeenBefore array contains this Signature, we already processed it.
  if (SeenBefore[Sig]) print "Skipping",$0
# If the SeenBefore array does not contain this Signature,
#  we process the InFile line and add its Signature to the SeenBefore array.
      else {print "Processing",$0; SeenBefore[Sig]=1}}'  \
$InFile >$OutFile1
... generates an OutFile which is identical to that produced by the bash solution posted by lougavulin.

Daniel B. Martin

.
 
Old 11-10-2018, 06:44 PM   #20
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,263
Blog Entries: 24

Rep: Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194
Quote:
Originally Posted by lougavulin View Post
This is not exactly how astrogeek did it. For me, it seems simpler that way : ...
Nicely done!

I had considered using a string at first but thought of multi-digit values, for which a simple string concatenation would fail. Even though the original question only included three single-digit examples, it did not explicitly exclude multi-digit integers or lines longer than three values. I tried to anticipate this and prompt more thought on the OP's part by using the word "integer" in my last post.

Here is the full commented script I had written for the original example:

Code:
#!/bin/bash

infile=patterns.txt
declare -A procd
while read line; do

    #Get sorted array of _integer values_ from input line
    #Array preserves values whereas simple string may fail for multi-digit numbers
    inarr=($(echo ${line}| tr " " "\n" | sort -n))

    #Test whether we have seen this _sorted sequence_ before
    if [ 0${procd[${inarr[*]}]} -eq 1 ]; then
        echo "Skipping $line"
    else
        echo "processing $line"
        #Add to array of processed lines using concatenation of values as key
        procd[${inarr[*]}]=1
    fi
done< $infile
Here are results of an example patterns.txt with two added lines, using your string example and my original array example:

Code:
cat patterns.txt
...
22 1 21
1 122 2

./lougavulin.sh
...
Processing 22 1 21
Skipping 1 122 2

./script.sh
...
processing 22 1 21
processing 1 122 2
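To make the difference visible, here is a small sketch that prints the key each approach builds for the two added lines. It relies on the fact that ${arr[*]} joins the array elements with the first character of IFS, a space by default. The helper names string_key and array_key are hypothetical and only used for this illustration.

```bash
#!/bin/bash
# Hypothetical helpers for illustration; not part of either posted script.

# Key as built by the string version: sorted values glued together.
string_key() { echo "$1" | tr " " "\n" | sort -n | tr -d "\n"; }

# Key as built by the array version: sorted values joined by a space.
array_key() {
    local arr
    arr=($(echo "$1" | tr " " "\n" | sort -n))
    echo "${arr[*]}"
}

for line in "22 1 21" "1 122 2"; do
    echo "line '$line': string key '$(string_key "$line")', array key '$(array_key "$line")'"
done
# line '22 1 21': string key '12122', array key '1 21 22'
# line '1 122 2': string key '12122', array key '1 2 122'
```

Both lines collapse to the string key 12122, which is why the second one is skipped by the string version, while the array keys stay distinct.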
Thanks!

Last edited by astrogeek; 11-10-2018 at 07:08 PM. Reason: Abbreviated example
 
Old 11-10-2018, 06:56 PM   #21
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,263
Blog Entries: 24

Rep: Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194
Quote:
Originally Posted by TheGeniusLOL View Post
I'm sorry for my lack of understanding. I am 50 years old, a late bloomer to all this shell scripting stuff. I read the article, but it's too cryptic (Greek to me) and confusing.
You youngsters and your quest for instant gratification! When I was your age, long ago...

Quote:
Originally Posted by TheGeniusLOL View Post
WOW! Thanks lougavulin for the help. I couldn't do this in a million years if I tried. The script is still a bit cryptic for me to understand, but all that matters is that it works.
Then you should have said so up front instead of suggesting that you were trying to learn!

LQ is all about learning, sharing of knowledge, hence the approach most members take of trying to teach. If you are here only to get working code from others then you will likely find that you are in the wrong place!

Quote:
Originally Posted by TheGeniusLOL View Post
Perhaps, if I were younger and smarter, I could have understood those concepts.
Don't sell yourself short! I can't remember what I had for breakfast most days! Most of what I know about bash I have learned since I really was your age! If I can do it then anybody and their cat can do it better, believe me!

Good luck!
 
Old 11-10-2018, 07:42 PM   #22
lougavulin
Member
 
Registered: Jul 2018
Distribution: Slackware,x86_64,current
Posts: 279

Rep: Reputation: 100Reputation: 100
@danielbmartin, thank you for your Awk version !

@astrogeek, you are right about multi-digit integers. So, just for fun, replacing this line in my version:
Code:
inarr=$(echo ${line}| tr " " "\n" | sort -n | tr -d "\n")
with this line:
Code:
inarr=$(echo ${line}| tr " " "\n" | sort -n | tr "\n" "0")
gives, with your new lines:
Code:
Processing 1 0 0
Processing 1 2 3
Processing 5 7 9
Skipping 3 2 1
Processing 0 0 7
Processing 22 1 21
Processing 1 122 2

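One caveat on the "0" separator, offered as a sketch: a digit that can also appear inside the numbers is not a safe delimiter, so two different sorted lines can still produce the same key. The function names key_zero and key_space and the two sample lines are made up for illustration; the space-separated variant is shown only for comparison.

```bash
#!/bin/bash
# Hypothetical helpers: build the key with "0" or with " " as separator.
key_zero()  { echo "$1" | tr " " "\n" | sort -n | tr "\n" "0"; }
key_space() { echo "$1" | tr " " "\n" | sort -n | tr "\n" " "; }

a="101 102 103"
b="1 1 1020103"

echo "zero  separator: '$(key_zero "$a")' vs '$(key_zero "$b")'"
echo "space separator: '$(key_space "$a")' vs '$(key_space "$b")'"
# zero  separator: '101010201030' vs '101010201030'
# space separator: '101 102 103 ' vs '1 1 1020103 '
```

With three values per line such a collision needs large numbers, so the sample file is not affected, but a separator that cannot occur in the data avoids the question entirely.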
@TheGeniusLOL, just to let you know: what is great with Bash (or Awk) is that you can take just part of the code and try it again and again, changing small things to understand what it does!
For example, this line can seem cryptic:
Code:
inarr=$(echo ${line}| tr " " "\n" | sort -n | tr -d "\n")
But you can test it and play with it, like:
Code:
line="3 2 1"
echo ${line} | tr " " "\n"
echo ${line} | tr " " "\n" | sort -n
echo ${line} | tr " " "\n" | sort -n | tr -d "\n"
which helps to understand what each command does.
Code:
$ echo ${line} | tr " " "\n"
3
2
1
$ echo ${line} | tr " " "\n" | sort -n
1
2
3
$ echo ${line} | tr " " "\n" | sort -n | tr -d "\n"
123
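In the same spirit of trying variations, here is one possible alternative way to build the sorted key, as a sketch only. It reads the values into an array with read -a, prints one per line with printf, and joins the sorted lines with paste. The function name sorted_key is hypothetical, and the key keeps a space between values rather than gluing them together.

```bash
#!/bin/bash
# Hypothetical helper name; builds a space-separated sorted key.
sorted_key() {
    local -a nums
    # Split the line into an array on whitespace.
    read -r -a nums <<< "$1"
    # One value per line, sort numerically, join the lines with a space.
    printf '%s\n' "${nums[@]}" | sort -n | paste -sd' ' -
}

sorted_key "3 2 1"     # prints "1 2 3"
sorted_key "22 1 21"   # prints "1 21 22"
```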
 
Old 11-11-2018, 11:15 AM   #23
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660Reputation: 660Reputation: 660Reputation: 660Reputation: 660Reputation: 660
Quote:
Originally Posted by TheGeniusLOL View Post
... I am 50 years old, a late bloomer to all this shell scripting stuff. ...
In 2010 I bought a used desktop computer and installed Linux. I was age 68 at that time. It was my first exposure to Linux. The learning curve was daunting and there were moments of discouragement. Having received help from LQ when I was a newbie, I now contribute when possible to "give back."

I dabble in programming for entertainment and to stave off old-age brain rot. My advice to you: Hang In There!

Daniel B. Martin

.
 
Old 11-11-2018, 12:17 PM   #24
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660Reputation: 660Reputation: 660Reputation: 660Reputation: 660Reputation: 660
The awk shown in post #19 works for character strings of variable lengths. To test, you may build an InFile this way ...
Code:
echo; echo "Construct an InFile which consists of three-word lines,"
echo "  each of which contains three random positive integers"
echo "  expressed by name (i.e. One, Two, etc.) separated by blanks."
yes -- " "          \
|head -900          \
|nl -v100 -nln      \
|shuf               \
|sed 's/./& /g'     \
|sed 's/[ \t]*$//'  \
|sed 's/0/Zero/g;
      s/1/One/g;  
      s/2/Two/g;  
      s/3/Three/g;
      s/4/Four/g;
      s/5/Five/g;  
      s/6/Six/g;
      s/7/Seven/g;
      s/8/Eight/g;  
      s/9/Nine/g'   \
>$InFile
Daniel B. Martin

.

Last edited by danielbmartin; 11-11-2018 at 01:21 PM. Reason: Improve comments; no change to the code.
 
Old 11-12-2018, 06:48 AM   #25
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,780

Rep: Reputation: 1198Reputation: 1198Reputation: 1198Reputation: 1198Reputation: 1198Reputation: 1198Reputation: 1198Reputation: 1198Reputation: 1198
Comments on post #17:
It should have $line in "quotes" to avoid unwanted expansions:
Code:
inarr=$(echo "$line" | tr " " "\n" | sort -n | tr -d "\n")
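A quick sketch of what "unwanted expansions" can mean in practice. It assumes a scratch directory created with mktemp that holds two made-up files, a.txt and b.txt, and a made-up input line containing a *.

```bash
#!/bin/bash
# Work in a scratch directory holding two known files.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch a.txt b.txt

line="1 * 2"
unquoted=$(echo ${line})    # the * is expanded to the file names
quoted=$(echo "$line")      # the * stays literal

echo "unquoted: $unquoted"  # unquoted: 1 a.txt b.txt 2
echo "quoted:   $quoted"    # quoted:   1 * 2

# Clean up the scratch directory.
cd / && rm -r "$tmp"
```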
I think it is not correct to remove the " " separators. For example, how would you distinguish between 22 1 21 and 1 122 2? After sorting, both collapse to 12122, as post #20 shows.
Better:
Code:
inarr=$(echo "$line" | tr " " "\n" | sort -n | tr "\n" " ")
or leave them newline-separated, as with the following variant
Code:
#!/bin/bash

# An associative array, requires bash4
# For instance, didarr['100']=1
# With variables: didarr[$key]=$val
declare -A didarr

# we read from an explicit file descriptor that is set at the end of the block
while read line <&3
do
    # quote command arguments to avoid expansions, i.e. "$line"
    skey=$(echo "$line" | tr " " "\n" | sort -n)
    # for instance, skey=1\n2\n3
    # The +word modifier allows to test for existence (not for non-empty)
    if [[ -n ${didarr[$skey]+isset} ]]
    then
         echo "Skipping $line"
    else
         echo "Processing $line"
         # Add the new processed line into the array as a key
         # We can omit the value
         didarr[$skey]=
    fi
# We set an explicit file descriptor for the while-do-done block
done 3< patterns.txt
# After the block the shell has closed the explicit descriptor
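The difference between "exists" and "non-empty" can be checked in isolation. This is a minimal sketch; the array name seen, its keys and the helper check are made up for the demonstration.

```bash
#!/bin/bash
# Requires bash 4 or later for associative arrays.
declare -A seen
seen[empty]=        # key exists, value is the empty string
seen[one]=1         # key exists, value is non-empty

# Hypothetical helper: run both kinds of test on one key.
check() {
    local key=$1
    if [ "${seen[$key]}" ]; then
        echo "$key: value is non-empty"
    else
        echo "$key: value is empty or key is missing"
    fi
    if [[ -n ${seen[$key]+isset} ]]; then
        echo "$key: key exists"
    else
        echo "$key: key is missing"
    fi
}

check empty
check one
check other
```

Only the +isset form tells the key "empty" apart from the key "other", which is what allows the script above to store keys without a value.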

Last edited by MadeInGermany; 11-12-2018 at 11:53 AM.
 
Old 11-12-2018, 07:42 AM   #26
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660Reputation: 660Reputation: 660Reputation: 660Reputation: 660Reputation: 660
Just for fun I embellished the awk solution shown in post #19. This version reports that an InFile line was already processed and, in addition, identifies its predecessor.

Construct an InFile ...
Code:
echo; echo "Construct an InFile which consists of 5-character lines,"
echo "  each of which contains three random positive integers"
echo "  separated by blanks."
yes -- " "          \
|head -900          \
|nl -v100 -nln      \
|shuf               \
|sed 's/./& /g'     \
|sed 's/[ \t]*$//'  \
>$InFile
... or use one of your own devising.

Bounce the InFile against this awk ...
Code:
awk '{  \
# As each InFile line is read, create a Signature for it which is
#  its own three integers in sorted order.
# The Signature for InFile line 4 7 2 is 2 4 7.
# The Signature for InFile line 7 2 4 is also 2 4 7.
# asort() is a gawk extension; it is called once and its count reused.
  split($0,w); n=asort(w); Sig=""; for(j=1;j<=n;j++) Sig=Sig w[j] " ";
# The array SeenBefore contains Signatures of integer triples already processed.
# If the SeenBefore array contains this Signature, we already processed it.
  if (SeenBefore[Sig]) print "Skipping",$0,"because it was processed at line",SeenBefore[Sig]
# If the SeenBefore array does not contain this Signature,
#  we process the InFile line and add its Signature to the SeenBefore array.
      else {print "Processing",$0; SeenBefore[Sig]=NR" ("$0")"}}'  \
$InFile >$OutFile3
This is a small part of the OutFile, chosen to show the added feature...
Code:
Processing 5 7 7
Processing 3 5 6
Processing 9 4 6
Processing 1 5 6
Processing 9 3 1
Processing 1 0 8
Skipping 5 6 1 because it was processed at line 29 (1 5 6)
Skipping 9 1 3 because it was processed at line 30 (9 3 1)
Processing 7 4 7
Processing 1 3 5
Processing 9 4 0
Skipping 5 3 6 because it was processed at line 27 (3 5 6)
Processing 9 9 5
Processing 7 0 5
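For readers following the bash solutions, the same idea can be sketched in bash by storing the line number and original text as the array value instead of 1. This is an illustration under assumptions, not a port of the awk above: the function name report_dupes is hypothetical, it reads from standard input, and the three input lines are made up.

```bash
#!/bin/bash
# Hypothetical function name; reads lines of integers from standard input.
# Requires bash 4 or later for associative arrays.
report_dupes() {
    local -A first_seen
    local line key n=0
    while read -r line; do
        n=$((n + 1))
        # Sorted values, space-separated, serve as the signature.
        key=$(echo "$line" | tr " " "\n" | sort -n | tr "\n" " ")
        if [[ -n ${first_seen[$key]+isset} ]]; then
            echo "Skipping $line because it was processed at line ${first_seen[$key]}"
        else
            echo "Processing $line"
            # Remember where this signature first appeared.
            first_seen[$key]="$n ($line)"
        fi
    done
}

report_dupes <<'EOF'
1 5 6
9 3 1
5 6 1
EOF
# Processing 1 5 6
# Processing 9 3 1
# Skipping 5 6 1 because it was processed at line 1 (1 5 6)
```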
Daniel B. Martin

.
 
Old 11-13-2018, 12:48 PM   #27
Field95
LQ Newbie
 
Registered: Sep 2018
Location: xmpp:zemri@dismail.de
Posts: 13

Rep: Reputation: Disabled
It seems to be solved, so here's a throwaway Python version.
It turns each line into a tuple and checks whether that tuple is already in a set. If it is not, the line is printed and all of its permutations (including the line itself) are added to the set.

Code:
#!/usr/bin/env python3

import csv
import itertools
import sys


all_perm = set()
for file in sys.argv[1:]:
    with open(file) as f:
        # patterns.txt in this thread is space-separated
        reader = csv.reader(f, delimiter=" ")
        for line in reader:
            line = tuple(line)

            # Check first, then record every ordering of this line.
            # Adding the permutations before the check would wrongly hide
            # lines with repeated values such as 1 0 0.
            if line not in all_perm:
                print(*line)
                all_perm.update(itertools.permutations(line))
Code:
./pattern.py patterns.txt
1 2 3
4 5 6
2 3 6
 
Old 11-15-2018, 06:00 AM   #28
igadoter
Senior Member
 
Registered: Sep 2006
Location: wroclaw, poland
Distribution: many, primary Slackware
Posts: 2,717
Blog Entries: 1

Rep: Reputation: 625Reputation: 625Reputation: 625Reputation: 625Reputation: 625Reputation: 625
It is already solved, but essentially you should use a hash here: one that gives the same result regardless of order. That way you can group entries by their hashes, and the only thing left is to look inside each group for entries that really differ. The other point is to rethink how the data is represented. A bad choice always results in complicated algorithms. In your case it would be much easier to process the data if you first swapped the columns and lines of the input.
 
Old 11-15-2018, 06:36 AM   #29
lougavulin
Member
 
Registered: Jul 2018
Distribution: Slackware,x86_64,current
Posts: 279

Rep: Reputation: 100Reputation: 100
Quote:
Originally Posted by igadoter View Post
It is already solved, but essentially you should use a hash here: one that gives the same result regardless of order. That way you can group entries by their hashes, and the only thing left is to look inside each group for entries that really differ. The other point is to rethink how the data is represented. A bad choice always results in complicated algorithms. In your case it would be much easier to process the data if you first swapped the columns and lines of the input.
I guess that to get the same hash for a different order you need a specific kind of hash, because:
Code:
$ md5sum < <(echo "1 2 3")
f2b33fb7b3d0eb95090a16060e6a24f9  -
$ md5sum < <(echo "2 3 1")
b6e27c35b42671890dc567b86e7a6c69  -

$ sha256sum < <(echo "1 2 3")
1def07dbe06eeb097aafec8a40329937cd20c93a83634b8221ea2b41a894310c  -
$ sha256sum < <(echo "2 3 1")
8a0f00a6c372b85cfc5b19d375d054c3c763e1ea914415d36839e85a8109833e  -
Or I'm missing something...
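One way to get an order-independent result from a standard checksum tool is to sort the values before hashing. This is only a sketch: the helper name sorted_md5 is hypothetical, and it assumes md5sum from GNU coreutils is installed.

```bash
#!/bin/bash
# Hypothetical helper; assumes md5sum from GNU coreutils is available.
# Sort the values numerically, then hash the sorted list.
sorted_md5() {
    echo "$1" | tr " " "\n" | sort -n | md5sum | cut -d" " -f1
}

if [ "$(sorted_md5 "1 2 3")" = "$(sorted_md5 "2 3 1")" ]; then
    echo "same signature"
else
    echo "different signature"
fi
# prints "same signature"
```

For short lines like these the sorted text itself already works as the key, so the hash adds nothing; it would only start to pay off for very long lines.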
 
Old 11-15-2018, 08:37 AM   #30
igadoter
Senior Member
 
Registered: Sep 2006
Location: wroclaw, poland
Distribution: many, primary Slackware
Posts: 2,717
Blog Entries: 1

Rep: Reputation: 625Reputation: 625Reputation: 625Reputation: 625Reputation: 625Reputation: 625
Most hashes are not 1-1. Some time ago it was posted how to change a file without changing its md5 checksum. But of course my proposal was just an idea. How accurate it is I don't know, as there is no data range or specification. Small, medium, large, huge, bigger than the Universe: it all matters.

But thanks for the trick of computing checksums on the fly. I mean
Code:
$ md5sum < <(echo foobar)
Worth remembering.

Last edited by igadoter; 11-15-2018 at 08:42 AM.
 
  


All times are GMT -5.