[SOLVED] How to know if a variable is similar to another

pedropt · 10-31-2021, 06:27 PM

Suppose I have two variables that are similar but not equal. How would the code be written to detect this?

I have been trying to figure out how to start, but I have no idea.

Assuming:

Quote:

var1="Yes i know it for sure"
var2="Yes i know is for real"

These two variables have a lot in common. How can the similarity be detected without writing code specific to the text of these two variables?

TB0ne · 10-31-2021, 06:35 PM

Quote:

Originally Posted by pedropt

Suppose I have two variables that are similar but not equal. How would the code be written to detect this? I have been trying to figure out how to start, but I have no idea. Assuming:

Code:

var1="Yes i know it for sure"
var2="Yes i know is for real"

These two variables have a lot in common. How can the similarity be detected without writing code specific to the text of these two variables?

Not sure what you're asking, here, or for what language. You're wanting to compare variables...in a program...*WITHOUT* writing code??? How, exactly, do you expect a program to work if you DON'T write code??

In the most simplistic sense, you would do an IF:

Code:

if ($var1 eq $var2) {
   <do something>
}

That's it...what is IN var1 and 2 can come from anywhere, including hard-coding them. You can also (depending on the language and your actual goal), match in part of a string, be case sensitive, look for a particular word, string of characters, or even count the characters. Your question boils down to, "how can I write a program?", which is FAR too open ended. Been asking about such things for quite a while now.

pedropt · 10-31-2021, 06:41 PM

I mean in bash code. With if statements there is no "similar" test, which makes this difficult to write.
The if you wrote checks whether the two values are equal. They are not equal, but they are similar.

astrogeek · 10-31-2021, 06:55 PM

You will need to define "similar". Can you give an example like the first two above that would not be similar? For example...

Code:

var1="Yes i know it for sure"
var2="Yes i know is for real"
var3="Yes i saw it for sure"
var4="Yet I know it was real"

For example, do they have a similar number of characters, similar number of syllables, similar verb or subject, similar in meaning, etc.

For things which "sound" similar you should search for the soundex algorithm, although I think that is a bit old. There is also a perl soundex type library - I am totally unfamiliar with it.

But again, try to define similar and non-similar in your use case first.

michaelk · 10-31-2021, 06:59 PM

You can check whether two strings are the same or not, i.e. A = B, A != B

You can compare two strings by lexicographical (alphabetical) order, i.e. A < B or A > B

You can check if a string matches an expression (regex) i.e. A =~ <some expression>

You can check a string to see if it is empty or not.

You can check for a substring within a string.

But what exactly do you mean by similar? That is somewhat subjective. If you are asking how many characters are an exact match, then I do not know of a built-in function, and you would have to write a bit of code.
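
As a rough illustration of that last case, here is a minimal bash sketch that counts how many characters match at the same position in both strings. The function name count_matching_chars is made up for this example, and "exact match" is assumed to mean the same character at the same index, which is only one possible reading.

```bash
#!/usr/bin/env bash

# Hypothetical helper: count positions where both strings hold the same character.
count_matching_chars() {
    local a=$1 b=$2
    local len=${#a} i matches=0
    # Only compare up to the length of the shorter string
    (( ${#b} < len )) && len=${#b}
    for (( i = 0; i < len; i++ )); do
        if [[ "${a:i:1}" == "${b:i:1}" ]]; then
            (( matches++ ))
        fi
    done
    echo "$matches"
}

# Prints 17 (both strings are 22 characters long)
count_matching_chars "Yes i know it for sure" "Yes i know is for real"
```

A positional count like this breaks down as soon as one string has an extra character near the start, so it only suits strings that are expected to line up.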

pedropt · 10-31-2021, 07:08 PM

The only way I can think of to make this work is to split both variables into words and count how many words of the first also exist in the second, something like this:

Code:

#!/bin/bash
# Start from clean temp files (-f: no error if they do not exist)
rm -f tmp1.file tmp2.file
eq=0
# Write each sentence to a temp file, one word per line
echo "yes i know it for sure" | tr " " "\n" > tmp1.file
echo "no i know it for not" | tr " " "\n" > tmp2.file
# Number of words in the first sentence
var3=$(wc -l tmp1.file | awk '{print $1}')
for i in $(seq "$var3")
do
    # Read word number $i of the first sentence
    rdword=$(sed -n "${i}p" tmp1.file)
    # Look for it as a whole word in the second sentence
    chkword=$(grep -w -- "$rdword" tmp2.file)
    if [[ ! -z "$chkword" ]]
    then
        eq=$((eq+1))
    fi
done
echo "got $eq similar words of $var3"

But this is only a rough draft, because words that appear in a different position are still counted as a match, which is not really similar.
However, for what I want this will work. The only problem is defining what share of matching words counts as similar, let's say:
7 of 10 = similar
3 of 10 = not similar

But I can have texts with 20 words or fewer, so determining these percentages could be a challenge in code.
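
One way to make the cutoff independent of sentence length is to compare a percentage rather than a raw count. The sketch below assumes a 70% default threshold (taken from the "7 of 10" example above); the function name is_similar is made up for illustration.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeed when matched/total reaches a percentage threshold.
# Usage: is_similar MATCHED TOTAL [THRESHOLD_PERCENT]
is_similar() {
    local matched=$1 total=$2 threshold=${3:-70}
    # Avoid dividing by zero for an empty sentence
    (( total > 0 )) || return 1
    # Multiply first: bash arithmetic is integer-only, so 7/10 alone would be 0
    (( 100 * matched / total >= threshold ))
}

is_similar 7 10 && echo "7 of 10: similar"
is_similar 3 10 || echo "3 of 10: not similar"
is_similar 14 20 && echo "14 of 20: similar"
```

Because the function only sets an exit status, it can be used directly in an if statement, whatever the number of words.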

grail · 10-31-2021, 07:39 PM

Ok, so your version of 'similar' is how many words does each sentence have in common (if I have gleaned your script correctly)

So the next question would be, do you consider a single word as a match if it only appears once in one sentence but multiple times in the other? (as grep will match it always)

If above is not desired, you may have to also remove found words from the second sentence so you only match the count exactly.

Also, for someone who has been using bash, at least on this site, for as long as you have, you should realise that temp files and convoluted piping are not needed.
Simply place your sentences into arrays instead of temp files
Count in arrays is done using ${#arr[@]} so wc and awk definitely not needed
seq also not needed as just use 'for word in "${arr[@]}"'
grep is easier but =~ in bash could do this sort of simple matching
you can test the return of grep with 'if' so '-z' test not required
eq=$((eq+1)) is more simply ((eq++))
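
Put together, those suggestions might look roughly like the sketch below. The function name count_common_words is made up for illustration, and grep is fed the second sentence through a here-string rather than a file. A word that is repeated in the first sentence but appears only once in the second is still counted every time, so the duplicate question raised above remains open.

```bash
#!/usr/bin/env bash

# Sketch only: count words of the first sentence that also occur in the second.
count_common_words() {
    local -a words1
    local word eq=0
    # Split the first sentence into an array instead of a temp file
    read -ra words1 <<< "$1"
    for word in "${words1[@]}"; do
        # Test grep's exit status directly: -q quiet, -w whole word, -F fixed string
        if grep -qwF -- "$word" <<< "$2"; then
            (( eq++ ))
        fi
    done
    # ${#words1[@]} gives the word count, so wc and awk are not needed
    echo "got $eq similar words of ${#words1[@]}"
}

# Prints: got 4 similar words of 6
count_common_words "yes i know it for sure" "no i know it for not"
```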

michaelk · 10-31-2021, 07:43 PM

It sounds like you want something like a natural language parser.

danielbmartin · 10-31-2021, 08:08 PM

Quote:

Originally Posted by pedropt

Suppose I have two variables that are similar but not equal. How would the code be written to detect this?

To be precise you want to compare the value of two variables and quantify their "sameness." There are well-documented mathematical ways to do this. To educate yourself, start here...

String similarity — the basic know your algorithms guide!
by Mohit Mayank
https://itnext.io/string-similarity-...e-3de3d7346227
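
One widely used measure of this kind is the Levenshtein edit distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The following is only a sketch in plain bash, with a made-up function name, to show the idea; for long strings a compiled tool would be much faster.

```bash
#!/usr/bin/env bash

# Sketch: Levenshtein distance between two strings, keeping two rows of the table.
levenshtein() {
    local a=$1 b=$2
    local la=${#a} lb=${#b}
    local i j cost del ins sub min
    local -a prev curr
    # Distance from the empty prefix of a to each prefix of b
    for (( j = 0; j <= lb; j++ )); do prev[j]=$j; done
    for (( i = 1; i <= la; i++ )); do
        # Distance from the first i characters of a to the empty prefix of b
        curr=("$i")
        for (( j = 1; j <= lb; j++ )); do
            if [[ "${a:i-1:1}" == "${b:j-1:1}" ]]; then cost=0; else cost=1; fi
            del=$(( prev[j] + 1 ))        # delete a character from a
            ins=$(( curr[j-1] + 1 ))      # insert a character into a
            sub=$(( prev[j-1] + cost ))   # substitute (free if characters match)
            min=$del
            (( ins < min )) && min=$ins
            (( sub < min )) && min=$sub
            curr[j]=$min
        done
        prev=("${curr[@]}")
    done
    echo "${prev[lb]}"
}

# Classic textbook pair; prints 3
levenshtein "kitten" "sitting"
```

A smaller distance means more similar strings. Dividing the distance by the length of the longer string gives a length-independent score if one is needed.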

Daniel B. Martin

.

pedropt · 10-31-2021, 08:18 PM

I think I found a solution for this:

Code:

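#!/bin/bash
# Start from clean temp files (-f: no error if they do not exist)
rm -f tmp1.file tmp2.file
eq=0
echo -n "Write 1st Variable : "
read -r var1
echo -n "Write 2nd Variable : "
read -r var2

# Nothing to compare if either input is empty
if [[ -z "$var1" || -z "$var2" ]]
then
    echo "Empty variables"
    exit 0
fi
# Write each input to a temp file, one word per line
echo "$var1" | tr " " "\n" > tmp1.file
echo "$var2" | tr " " "\n" > tmp2.file
# Number of words in the first input
var3=$(wc -l tmp1.file | awk '{print $1}')
for i in $(seq "$var3")
do
    # Read word number $i of the first input and look for it in the second
    rdword=$(sed -n "${i}p" tmp1.file)
    chkword=$(grep -w -- "$rdword" tmp2.file)
    if [[ ! -z "$chkword" ]]
    then
        eq=$((eq+1))
    fi
done
# Half the word count, rounded up so that "50% or more" holds for odd counts
nmb=$(( (var3 + 1) / 2 ))
if [[ "$eq" -ge "$nmb" ]]
then
    echo "Its Similar"
else
    echo "Its not similar"
fi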

Basically, it halves the word count of the first variable, then searches file 2 for each word of the first variable. If in the end 50% or more of the words were found, the variables are similar; otherwise they are not.

I know this code can be refined; it was just written on the fly here.

dugan · 10-31-2021, 08:48 PM

Check this out:

http://fstrcmp.sourceforge.net/

Apparently, most distros have it in their standard repos.

grail · 10-31-2021, 09:44 PM

As it was an interesting bash to write, this is what I was thinking of:

Code:

#!/usr/bin/env bash

declare -a sent1 sent2
declare word1 word2 perc
declare -i cnt=0

# Minimum percentage of matching words to call the sentences similar
perc=70

# Read each sentence into an array, one word per element
read -rp "Write a sentence: " -a sent1
read -rp "Write a sentence: " -a sent2

if [[ -z "${sent1[0]}" && -z "${sent2[0]}" ]]
then
	echo "Sentences are equal as both are empty"
	exit
elif [[ -z "${sent1[0]}" || -z "${sent2[0]}" ]]
then
	echo "Sentences are not equal as one is empty"
	exit
fi

for word1 in "${sent1[@]}"
do
	# word2 iterates over the indices of sent2, not its values
	for word2 in "${!sent2[@]}"
	do
		if [[ "$word1" == "${sent2[$word2]}" ]]
		then
			# Blank the matched word so it cannot be matched again,
			# and stop so each word of sent1 counts at most once
			sent2[$word2]=""
			(( cnt++ ))
			break
		fi
	done
done

if (( (100 * cnt) / ${#sent1[*]} >= perc ))
then
	echo "Sentences are at least ${perc}% similar"
else
	echo "Sentences are less than ${perc}% similar"
fi

rnturn · 10-31-2021, 10:51 PM

Quote:

Originally Posted by pedropt

Suppose I have two variables that are similar but not equal. How would the code be written to detect this?

I have been trying to figure out how to start, but I have no idea.

Assuming:

These two variables have a lot in common. How can the similarity be detected without writing code specific to the text of these two variables?

If you're using Python, you might look at the "fuzzywuzzy" module (find it here if it's not available from your distribution's repository). I used it a while back to do "fuzzy" matches of user-entered text to entries in a database. It has been (or is in the process of being) ported to several other languages (check that link for a list).

HTH...

danielbmartin · 11-01-2021, 05:53 AM

Quote:

Originally Posted by grail

As it was an interesting bash to write, this is what I was thinking of ...

Thank you for a useful piece of code. Useful, but with limitations. We may refer to the words of astrogeek in post #4...

Quote:

You will need to define "similar".

Consider these sentences:
Four score and seven years ago
Four_score_and_seven_years_ago

The human eye (and mind) might consider them equivalent. The meaning is understood, yet your solution produces this result:
Sentences are less than 70% similar

Beauty (and similarity) are in the eye of the beholder.
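
If variants like these should count as equivalent, one option is to normalize both strings before comparing them. The sketch below assumes that case does not matter and that underscores, punctuation and whitespace are all just word separators; the function name normalize_words is made up for illustration.

```bash
#!/usr/bin/env bash

# Hypothetical helper: lowercase the text and turn every run of
# non-alphanumeric characters (underscores, punctuation, spaces) into one space.
normalize_words() {
    printf '%s\n' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | tr -cs '[:alnum:]' ' ' \
        | sed 's/^ //; s/ $//'
}

a=$(normalize_words "Four score and seven years ago")
b=$(normalize_words "Four_score_and_seven_years_ago")

if [[ "$a" == "$b" ]]; then
    echo "Equal after normalization"
fi
```

The normalized strings can then be passed to a word-by-word comparison, which would treat the two sentences above as identical.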

Daniel B. Martin

.

pan64 · 11-01-2021, 06:26 AM

There is something called a similarity index: https://stackoverflow.com/questions/...en-two-strings (the answers contain a lot of additional hints too)