[SOLVED] How to skip a number even if it's a random form of it?
I understand the first step where we sort the contents from $line.
As for the remaining steps I am clueless.
Quote:
So your next task is to create an associative array, whose keys are the concatenated values from the sorted array, $inarr. I named this array processed in my first example (which included a give-away hint of how to do this!), but you can name yours anything you want.
I don't get how to create an array with concatenated keys.
I'm sorry for my lack of understanding. I am 50 years old, a late bloomer to all this shell scripting stuff. I read the article, but it's too cryptic, like Greek to me, and confusing.
I need the complete working script if you can provide it. I really would appreciate it.
This is not exactly how astrogeek did it. For me, it seems simpler that way :
Code:
# An associative array
# For instance, didarr['100']=1
# '100' is what we call the key and 1 is the value. Here only the key matters to us.
declare -A didarr
while read -r line; do
    # Put the sorted line digits into a string. I'll give you one way to do it, see if you can find others...
    inarr=$(echo "${line}" | tr " " "\n" | sort -n | tr -d "\n")
    # For instance, inarr='123', so does '123' already exist in our array?
    if [ -n "${didarr[$inarr]}" ]; then
        echo "Skipping ${line}"
    else
        echo "Processing ${line}"
        # Add the newly processed line into the array as a key
        didarr[$inarr]=1
    fi
done < patterns.txt
WOW! Thanks, lougavulin, for the help. I couldn't do this in a million years if I tried. The script is still a bit cryptic for me to understand, but all that matters is that it works.
Also thanks, astrogeek, for your hints and detailed instructions. However, what you wanted me to do was beyond my understanding of arrays and bash scripting.
Lastly, thanks, ondoho, for the article. It was detailed and everything, but I got overwhelmed by the info and didn't grasp much of it.
Perhaps, if I were younger and smarter, I could have understood those concepts.
I wrote an awk solution but did not post it before now because the OP wanted a bash solution.
This code ...
Code:
echo; echo "Construct an InFile which consists of 5-character lines,"
echo " each of which contains three random positive integers"
echo " separated by blanks."
yes -- " " \
|head -900 \
|nl -v100 -nln \
|shuf \
|sed 's/./& /g' \
|sed 's/[ \t]*$//' \
>$InFile
... builds a test InFile.
This annotated awk ...
Code:
awk '{ \
# As each InFile line is read, create a Signature for it which is
# its own three integers in sorted order.
# The Signature for InFile line 4 7 2 is 2 4 7.
# The Signature for InFile line 7 2 4 is also 2 4 7.
split($0,w); Sig=""; for(j=1;j<=asort(w);j++) Sig=Sig w[j] " ";
# The array SeenBefore contains Signatures of integer triples already processed.
# If the SeenBefore array contains this Signature, we already processed it.
if (SeenBefore[Sig]) print "Skipping",$0
# If the SeenBefore array does not contain this Signature,
# we process the InFile line and add its Signature to the SeenBefore array.
else {print "Processing",$0; SeenBefore[Sig]=1}}' \
$InFile >$OutFile1
... generates an OutFile which is identical to that produced by the bash solution posted by lougavulin.
Nicely done!
I had considered using a string at first but thought of multi-digit values for which a simple string concatenation would fail. Even though the original question only included three single digit examples, it did not explicitly exclude multi-digit integers or lines longer than three values. I tried to anticipate this and prompt more thought on the OP's part by using the word "integer" in my last post.
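To make that failure concrete, here is a minimal sketch. The helper names concat_key and joined_key are made up for illustration and are not from any script in this thread, and the standard paste utility is assumed to be available. It builds the key both ways for two lines that hold different integers.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only.

# Key built by plain concatenation of the sorted values.
concat_key() {
    echo "$1" | tr ' ' '\n' | sort -n | tr -d '\n'
}

# Key built by joining the sorted values with a single space.
joined_key() {
    echo "$1" | tr ' ' '\n' | sort -n | paste -sd' ' -
}

# Two lines with different integers.
for line in "12 1 11" "1 112 1"; do
    echo "line: $line"
    echo "  concatenated key: $(concat_key "$line")"
    echo "  joined key:       $(joined_key "$line")"
done
```

Both lines get the concatenated key 11112, so the second would be skipped by mistake, while the joined keys ("1 11 12" and "1 1 112") stay distinct.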
Here is the full commented script I had written for the original example:
Code:
#!/bin/bash
infile=patterns.txt
declare -A procd
while read line; do
    #Get sorted array of _integer values_ from input line
    #Array preserves values whereas simple string may fail for multi-digit numbers
    inarr=($(echo ${line}| tr " " "\n" | sort -n))
    #Test whether we have seen this _sorted sequence_ before
    if [ 0${procd[${inarr[*]}]} -eq 1 ]; then
        echo "Skipping $line"
    else
        echo "processing $line"
        #Add to array of processed lines using concatenation of values as key
        procd[${inarr[*]}]=1
    fi
done< $infile
Here are results of an example patterns.txt with two added lines, using your string example and my original array example:
Quote:
I'm sorry for my lack of understanding. I am 50 years old, a late bloomer to all this shell scripting stuff. I read the article, but it's too cryptic, like Greek to me, and confusing.
You youngsters and your quest for instant gratification! When I was your age, long ago...
Quote:
Originally Posted by TheGeniusLOL
WOW! Thanks, lougavulin, for the help. I couldn't do this in a million years if I tried. The script is still a bit cryptic for me to understand, but all that matters is that it works.
Then you should have said so up front instead of suggesting that you were trying to learn!
LQ is all about learning, sharing of knowledge, hence the approach most members take of trying to teach. If you are here only to get working code from others then you will likely find that you are in the wrong place!
Quote:
Originally Posted by TheGeniusLOL
Perhaps, if I were younger and smarter, I could have understood those concepts.
Don't sell yourself short! I can't remember what I had for breakfast most days! Most of what I know about bash I have learned since I really was your age! If I can do it then anybody and their cat can do it better, believe me!
@TheGeniusLOL, just to let you know: what is great about Bash (or Awk) is that you can take just one part of the code and run it again and again, changing small things, to understand what it does!
For example, this line which can seem cryptic :
... I am 50 years old, a late bloomer to all this shell scripting stuff. ...
In 2010 I bought a used desktop computer and installed Linux. I was age 68 at that time. It was my first exposure to Linux. The learning curve was daunting and there were moments of discouragement. Having received help from LQ when I was a newbie, I now contribute when possible to "give back."
I dabble in programming for entertainment and to stave off old-age brain rot. My advice to you: Hang In There!
or leave them newline-separated, as with the following variant
Code:
#!/bin/bash
# An associative array, requires bash4
# For instance, didarr['100']=1
# With variables: didarr[$key]=$val
declare -A didarr
# we read from an explicit file descriptor that is set at the end of the block
while read line <&3
do
    # quote command arguments to avoid expansions, i.e. "$line"
    skey=$(echo "$line" | tr " " "\n" | sort -n)
    # for instance, skey=1\n2\n3
    # The +word modifier lets us test for existence (rather than for non-empty)
    if [[ -n ${didarr[$skey]+isset} ]]
    then
        echo "Skipping $line"
    else
        echo "Processing $line"
        # Add the new processed line into the array as a key
        # We can omit the value
        didarr[$skey]=
    fi
# We set an explicit file descriptor for the while-do-done block
done 3< patterns.txt
# After the block the shell has closed the explicit descriptor
Last edited by MadeInGermany; 11-12-2018 at 11:53 AM.
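As a small aside on the +word test used above, here is a sketch of how it differs from a plain non-empty test. The array name seen and both function names are made up for illustration, and bash 4 or later is assumed.

```shell
#!/bin/bash
# Hypothetical names for illustration; associative arrays need bash 4+.
declare -A seen
seen[a]=        # key "a" exists, but its value is empty

# Plain non-empty test: cannot tell an empty value from a missing key.
plain_test() {
    if [[ -n ${seen[$1]} ]]; then echo yes; else echo no; fi
}

# +word test: expands to "isset" whenever the key exists, even if empty.
isset_test() {
    if [[ -n ${seen[$1]+isset} ]]; then echo yes; else echo no; fi
}

echo "plain test, key a: $(plain_test a)"
echo "plain test, key b: $(plain_test b)"
echo "+word test, key a: $(isset_test a)"
echo "+word test, key b: $(isset_test b)"
```

Only the +word test reports key a as present, which is why the value can be left empty when only the keys matter.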
Just for fun I embellished the awk solution shown in post #19. This version reports that an InFile line was already processed and, in addition, identifies its predecessor.
Construct an InFile ...
Code:
echo; echo "Construct an InFile which consists of 5-character lines,"
echo " each of which contains three random positive integers"
echo " separated by blanks."
yes -- " " \
|head -900 \
|nl -v100 -nln \
|shuf \
|sed 's/./& /g' \
|sed 's/[ \t]*$//' \
>$InFile
... or use one of your own devising.
Bounce the InFile against this awk ...
Code:
awk '{ \
# As each InFile line is read, create a Signature for it which is
# its own three integers in sorted order.
# The Signature for InFile line 4 7 2 is 2 4 7.
# The Signature for InFile line 7 2 4 is also 2 4 7.
split($0,w); Sig=""; for(j=1;j<=asort(w);j++) Sig=Sig w[j] " ";
# The array SeenBefore contains Signatures of integer triples already processed.
# If the SeenBefore array contains this Signature, we already processed it.
if (SeenBefore[Sig]) print "Skipping",$0,"because it was processed at line",SeenBefore[Sig]
# If the SeenBefore array does not contain this Signature,
# we process the InFile line and add its Signature to the SeenBefore array.
else {print "Processing",$0; SeenBefore[Sig]=NR" ("$0")"}}' \
$InFile >$OutFile3
This is a small part of the OutFile, chosen to show the added feature...
Code:
Processing 5 7 7
Processing 3 5 6
Processing 9 4 6
Processing 1 5 6
Processing 9 3 1
Processing 1 0 8
Skipping 5 6 1 because it was processed at line 29 (1 5 6)
Skipping 9 1 3 because it was processed at line 30 (9 3 1)
Processing 7 4 7
Processing 1 3 5
Processing 9 4 0
Skipping 5 3 6 because it was processed at line 27 (3 5 6)
Processing 9 9 5
Processing 7 0 5
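The same bookkeeping can probably be carried over to a bash version. The following is only a sketch: the function name report_duplicates and the array name first_seen are made up, bash 4 or later is assumed, and the input is read from standard input rather than from a named file.

```shell
#!/bin/bash
# Hypothetical function: reads lines of integers from standard input.
report_duplicates() {
    local -A first_seen    # signature -> "line number (original line)"
    local line sig n=0
    while read -r line; do
        n=$((n + 1))
        # An empty line has no signature, so skip it
        [[ -z $line ]] && continue
        # Signature: the values in sorted order, joined by spaces
        sig=$(echo "$line" | tr ' ' '\n' | sort -n | paste -sd' ' -)
        if [[ -n ${first_seen[$sig]+isset} ]]; then
            echo "Skipping $line because it was processed at line ${first_seen[$sig]}"
        else
            echo "Processing $line"
            first_seen[$sig]="$n ($line)"
        fi
    done
}

printf '%s\n' "1 5 6" "9 3 1" "5 6 1" | report_duplicates
```

Storing the line number and the original text as the value, instead of a bare 1, is what lets the "Skipping" message name the earlier line.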
It seems to be solved, so here's a python throw out.
It keeps a set holding every permutation of each line seen so far, and prints a line only if it is not already in that set. The test is done before the line's own permutations are added, so a line with repeated values (for which itertools.permutations yields the line itself more than once) is still printed the first time it appears.
Code:
#!/usr/bin/env python3
import itertools
import csv
import sys

# Every permutation of every line seen so far
all_perm = set()
for file in sys.argv[1:]:
    with open(file) as f:
        reader = csv.reader(f, delimiter=",")
        for line in reader:
            line = tuple(line)
            # Test first, before this line's permutations go into the set
            if line not in all_perm:
                print(*line)
            # Record the line in every possible order, including its own
            all_perm.update(itertools.permutations(line))
It is already solved, but essentially you should use a hash here: one that gives the same result regardless of order. That way you can group entries by their hashes, and the only thing left to do is to look inside each group for entries that actually differ. The other option is to rethink how the data is represented. A bad choice always results in complicated algorithms. In your case it would be much easier to process the data if you first switched the columns and lines of the input.
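One way to read that suggestion, as a rough sketch only: sort the values first and checksum the sorted list, so that every ordering of the same values lands in the same group. The function name order_free_sum is made up for illustration, and cksum is used simply because it is a standard utility. Any such checksum can collide, so entries within a group would still need to be compared directly.

```shell
#!/bin/bash
# Hypothetical helper: prints a checksum that ignores the order of the values.
order_free_sum() {
    # One value per line, sorted numerically, then checksummed.
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    echo "$1" | tr ' ' '\n' | sort -n | cksum | cut -d ' ' -f 1
}

order_free_sum "4 7 2"
order_free_sum "7 2 4"    # same values in another order: same checksum
order_free_sum "4 7 3"    # different values
```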
I guess to get the same hash result for different order, you need to use a specific hash, because :
Most hashes are not one-to-one. Some time ago it was posted how to change a file without changing its md5 checksum. Of course, my proposal was just an idea. How accurate it would be I don't know, as there is no data range or specification. Small, medium, large, huge, bigger than the Universe: it all matters.
But for trick to use control sums on the fly. I mean