[SOLVED] How to know if a variable is similar to another
Suppose I have two variables that are similar but not equal. How do I write code to detect this? I have been trying to figure out how to start, but I have no idea. Assuming:
Code:
var1="Yes i know it for sure"
var2="Yes i know is for real"
These two variables are very similar. How can this be detected without writing code specific to the text of these two variables?
Not sure what you're asking, here, or for what language. You're wanting to compare variables...in a program...*WITHOUT* writing code??? How, exactly, do you expect a program to work if you DON'T write code??
In the most simplistic sense, you would do an IF:
Code:
if ($var1 eq $var2) {
<do something>
}
That's it...what is IN var1 and 2 can come from anywhere, including hard-coding them. You can also (depending on the language and your actual goal), match in part of a string, be case sensitive, look for a particular word, string of characters, or even count the characters. Your question boils down to, "how can I write a program?", which is FAR too open ended. Been asking about such things for quite a while now.
I mean I code in bash, and if statements have no "similar" test, which makes this difficult to write.
The if you wrote checks whether the two are equal, which they are not, but they are similar.
You will need to define "similar". Can you give an example like the first two above that would not be similar? For example...
Code:
var1="Yes i know it for sure"
var2="Yes i know is for real"
var3="Yes i saw it for sure"
var4="Yet I know it was real"
For example, do they have a similar number of characters, similar number of syllables, similar verb or subject, similar in meaning, etc.
For things which "sound" similar you should search for the soundex algorithm, although I think that is a bit old. There is also a perl soundex type library - I am totally unfamiliar with it.
But again, try to define similar and non-similar in your use case first.
You can check whether two strings are the same or not, i.e. A = B, A != B
You can compare two strings by lexicographical (alphabetical) order, i.e. A < B or A > B
You can check if a string matches an expression (regex) i.e. A =~ <some expression>
You can check a string to see if it is empty or not.
You can check for a substring within a string.
But what exactly do you mean by similar? It is somewhat subjective. If you are asking how many characters are an exact match, then I do not know of a function for that, and you would have to write a bit of code.
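For reference, the checks listed above map onto bash's [[ ]] test roughly as follows. This is only a minimal sketch: the function name describe_pair, the pattern ^Yes and the substring "know" are made up for illustration.

```bash
#!/usr/bin/env bash
# describe_pair A B: run the basic string tests from the list above.
# The function name and the hard-coded pattern/substring are illustrative only.
describe_pair() {
    local a=$1 b=$2

    # Equality / inequality
    if [[ $a == "$b" ]]; then echo "equal"; else echo "not equal"; fi

    # Lexicographical order (uses the current locale's collation)
    if [[ $a < $b ]]; then echo "first sorts before second"; fi
    if [[ $a > $b ]]; then echo "first sorts after second"; fi

    # Regular-expression match: does A start with "Yes"?
    if [[ $a =~ ^Yes ]]; then echo "first starts with Yes"; fi

    # Empty or not
    if [[ -z $a ]]; then echo "first is empty"; else echo "first is not empty"; fi

    # Substring: does A contain "know"?
    if [[ $a == *know* ]]; then echo "first contains know"; fi
}

describe_pair "Yes i know it for sure" "Yes i know is for real"
```

None of these tests measures "how similar"; each one only answers yes or no.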
The only thing I can think of to make this work is to split the two variables into words and count how many words they share, something like this:
Code:
#!/bin/bash
rm tmp1.file >/dev/null 2>&1
rm tmp2.file >/dev/null 2>&1
eq="0"
# Write each sentence to its temp file, one word per line
echo "yes i know it for sure" | tr " " "\n" > tmp1.file
echo "no i know it for not" | tr " " "\n" > tmp2.file
var3=$(wc -l tmp1.file | awk '{print$1}')
for i in $(seq "$var3")
do
rdword=$(sed -n ${i}p tmp1.file)
chkword=$(grep -w "$rdword" tmp2.file)
if [[ ! -z "$chkword" ]]
then
eq=$((eq+1))
fi
done
echo "got $eq similar words of $var3"
But this is only a rough draft, because the words could be in different positions and it would still count them as similar, which they are not.
However, for what I want this will work. The only problem is deciding what share of matching words counts as similar, let's say:
7 of 10 = similar
3 of 10 = not similar
But I can have texts with 20 words or fewer, so determining these percentages could be a challenge in code.
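One way to make the cut-off independent of the sentence length is to compare a whole-number percentage against a threshold. The following is a minimal sketch; the function name is_similar and the default threshold of 70 are assumptions chosen to match the "7 of 10" example above.

```bash
#!/usr/bin/env bash
# is_similar MATCHED TOTAL [THRESHOLD]: succeed when MATCHED out of TOTAL,
# as a whole-number percentage, reaches THRESHOLD (default 70, an arbitrary choice).
is_similar() {
    local matched=$1 total=$2 threshold=${3:-70}

    # Guard against division by zero for an empty sentence.
    (( total > 0 )) || return 1

    # Multiply first so that integer division does not truncate to 0.
    (( 100 * matched / total >= threshold ))
}

if is_similar 7 10; then echo "7 of 10: similar"; else echo "7 of 10: not similar"; fi
if is_similar 3 10; then echo "3 of 10: similar"; else echo "3 of 10: not similar"; fi
if is_similar 14 20; then echo "14 of 20: similar"; else echo "14 of 20: not similar"; fi
```

Because bash only does integer arithmetic, the multiplication by 100 has to come before the division.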
Ok, so your version of 'similar' is how many words does each sentence have in common (if I have gleaned your script correctly)
So the next question would be, do you consider a single word as a match if it only appears once in one sentence but multiple times in the other? (as grep will match it always)
If above is not desired, you may have to also remove found words from the second sentence so you only match the count exactly.
Also, for someone who has been using bash, at least on this site, for as long as you have, you should realise that temp files and convoluted piping are not needed.
Simply place your sentences into arrays instead of temp files
Count in arrays is done using ${#arr[@]} so wc and awk definitely not needed
seq also not needed as just use 'for word in "${arr[@]}"'
grep is easier, but =~ in bash could do this sort of simple matching
you can test the return of grep with 'if' so '-z' test not required
eq=$((eq+1)) is more simply ((eq++))
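Put together, the suggestions above might look something like the sketch below. The function name count_common is hypothetical, and this version keeps grep rather than =~. It does not yet remove matched words from the second sentence, so a word that is repeated in the first sentence is counted every time.

```bash
#!/usr/bin/env bash
# count_common SENTENCE1 SENTENCE2: print how many words of the first
# sentence also occur in the second. The function name is illustrative.
count_common() {
    local -a words1 words2
    local word eq=0

    # Split each sentence on whitespace into an array (no temp files).
    read -ra words1 <<< "$1"
    read -ra words2 <<< "$2"

    # Loop over the words directly (no seq, no sed).
    for word in "${words1[@]}"; do
        # Test grep's exit status with 'if':
        # -q quiet, -x match the whole line, -F fixed string.
        if printf '%s\n' "${words2[@]}" | grep -qxF -- "$word"; then
            (( eq++ ))
        fi
    done

    # ${#words1[@]} is the number of words (no wc or awk).
    echo "got $eq similar words of ${#words1[@]}"
}

count_common "yes i know it for sure" "no i know it for not"
```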
Suppose I have two variables that are similar but not equal. How do I write code to detect this?
To be precise you want to compare the value of two variables and quantify their "sameness." There are well-documented mathematical ways to do this. To educate yourself, start here...
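One well-known measure of this kind, though not necessarily the one this post had in mind, is the Levenshtein edit distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. A minimal bash sketch, with a made-up function name, could look like this:

```bash
#!/usr/bin/env bash
# levenshtein A B: print the edit distance between two strings.
# Uses two rows of the dynamic-programming table; slow for long strings.
levenshtein() {
    local a=$1 b=$2
    local la=${#a} lb=${#b}
    local i j cost del ins sub min
    local -a prev curr

    # Distance from the empty prefix of A to each prefix of B.
    for (( j = 0; j <= lb; j++ )); do
        prev[j]=$j
    done

    for (( i = 1; i <= la; i++ )); do
        # Distance from the first i characters of A to the empty string.
        curr=( "$i" )
        for (( j = 1; j <= lb; j++ )); do
            if [[ ${a:i-1:1} == "${b:j-1:1}" ]]; then cost=0; else cost=1; fi
            del=$(( prev[j] + 1 ))        # delete a character from A
            ins=$(( curr[j-1] + 1 ))      # insert a character into A
            sub=$(( prev[j-1] + cost ))   # substitute (or keep) a character
            min=$del
            if (( ins < min )); then min=$ins; fi
            if (( sub < min )); then min=$sub; fi
            curr[j]=$min
        done
        prev=( "${curr[@]}" )
    done

    echo "${prev[lb]}"
}

levenshtein "kitten" "sitting"   # prints 3
```

A smaller distance means the strings are closer; dividing by the length of the longer string would turn it into a rough percentage.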
Code:
#!/bin/bash
rm tmp1.file
rm tmp2.file
eq="0"
echo -n "Write 1st Variable : "
read -r var1
echo -n "Write 2nd Variable : "
read -r var2
# Stop if either input is empty
if [[ -z "$var1" || -z "$var2" ]]
then
echo "Empty variables"
exit 0
fi
echo "$var1" | tr " " "\n" > tmp1.file
echo "$var2" | tr " " "\n" > tmp2.file
var3=$(wc -l tmp1.file | awk '{print$1}')
for i in $(seq "$var3")
do
rdword=$(sed -n ${i}p tmp1.file)
chkword=$(grep -w "$rdword" tmp2.file)
if [[ ! -z "$chkword" ]]
then
eq=$((eq+1))
fi
done
nmb=$(echo "$var3 / 2" | bc )
if [[ "$eq" -ge "$nmb" ]]
then
echo "Its Similar"
else
echo "Its not similar"
fi
Basically it halves the word count of the first variable, then searches file 2 for each word of file 1; if in the end 50% or more were found, then it is similar, else it is not.
I know this code can be refined; it was just written on the fly here.
As it was an interesting bash script to write, this is what I was thinking of:
Code:
#!/usr/bin/env bash
declare -a sent1 sent2
declare word1 word2 cnt perc
perc=70
read -rp "Write a sentence: " -a sent1
read -rp "Write a sentence: " -a sent2
if [[ -z "${sent1[0]}" && -z "${sent2[0]}" ]]
then
echo "Sentences are equal as both are empty"
exit
elif [[ -z "${sent1[0]}" || -z "${sent2[0]}" ]]
then
echo "Sentences are not equal as one is empty"
exit
fi
for word1 in "${sent1[@]}"
do
for word2 in "${!sent2[@]}"
do
if [[ "$word1" == "${sent2[$word2]}" ]]
then
# Blank the matched word and stop, so each word is counted at most once
sent2[$word2]=""
(( cnt++ ))
break
fi
done
done
if (( (100 * cnt) / ${#sent1[*]} >= perc ))
then
echo "Sentences are at least 70% similar"
else
echo "Sentences are less than 70% similar"
fi
Quote:
Originally Posted by pedropt
Suppose I have two variables that are similar but not equal. How do I write code to detect this?
I have been trying to figure out how to start, but I have no idea.
Assuming:
These two variables are very similar. How can this be detected without writing code specific to the text of these two variables?
If you're using Python, you might look at the "fuzzywuzzy" module (find it here if it's not available from your distribution's repository). I used it a while back to do "fuzzy" matches of user-entered text against entries in a database. It has been (or is in the process of being) ported to several other languages (check that link for a list).
As it was an interesting bash to write, this is what I was thinking of ...
Thank you for a useful piece of code. Useful, but with limitations. We may refer to the words of astrogeek in post #4...
Quote:
You will need to define "similar".
Consider these sentences:
Four score and seven years ago
Four_score_and_seven_years_ago
The human eye (and mind) might consider them equivalent. The meaning is understood, yet your solution produces this result:
Sentences are less than 70% similar
Beauty (and similarity) are in the eye of the beholder.
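If, for a given use case, underscores and letter case should not matter, one option is to normalise both sentences before splitting them into words. The sketch below assumes that definition of "similar"; the function name normalize is made up.

```bash
#!/usr/bin/env bash
# normalize TEXT: lower-case TEXT and treat every non-alphanumeric character
# as a word separator. This is one possible definition, not the only one.
normalize() {
    local text=$1
    local -a words

    # Replace anything that is not a letter or digit with a space.
    text=${text//[^[:alnum:]]/ }

    # Lower-case with tr so the sketch also works on older bash versions.
    text=$(tr '[:upper:]' '[:lower:]' <<< "$text")

    # Re-split on whitespace to collapse runs of spaces.
    read -ra words <<< "$text"
    echo "${words[*]}"
}

normalize "Four score and seven years ago"
normalize "Four_score_and_seven_years_ago"
```

Both calls print the same line, so a word-by-word comparison run on the normalised text would treat the two sentences as identical.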