LinuxQuestions.org - [SOLVED] How to know if a variable is similar to another

- Programming (https://www.linuxquestions.org/questions/programming-9/)

- - How to know if a variable is similar to another (https://www.linuxquestions.org/questions/programming-9/how-to-know-if-a-variable-is-similar-to-another-4175702908/)

pedropt

10-31-2021 06:27 PM

How to know if a variable is similar to another

Imagine that I have 2 variables that are similar but not equal. How would the code be written to detect this?

I have been trying to figure out how to start, but I have no idea.

Assuming:

Quote:

var1="Yes i know it for sure"
var2="Yes i know is for real"

There are strong similarities between these 2 variables. How can this be done without writing code specific to the text of these 2 variables?

TB0ne

10-31-2021 06:35 PM

Quote:

Originally Posted by pedropt (Post 6297401)

Imagine that I have 2 variables that are similar but not equal. How would the code be written to detect this? I have been trying to figure out how to start, but I have no idea. Assuming:

Code:

var1="Yes i know it for sure"
var2="Yes i know is for real"

There are strong similarities between these 2 variables. How can this be done without writing code specific to the text of these 2 variables?

Not sure what you're asking, here, or for what language. You're wanting to compare variables...in a program...*WITHOUT* writing code??? How, exactly, do you expect a program to work if you DON'T write code??

In the most simplistic sense, you would do an IF:

Code:

if ($var1 eq $var2) {
  <do something>
}

That's it...what is IN var1 and 2 can come from anywhere, including hard-coding them. You can also (depending on the language and your actual goal), match in part of a string, be case sensitive, look for a particular word, string of characters, or even count the characters. Your question boils down to, "how can I write a program?", which is FAR too open ended. Been asking about such things for quite a while now.

pedropt

10-31-2021 06:41 PM

I mean I code in bash, and its if statements have no "similar" test, which makes this difficult to write.
The if you wrote checks whether the values are equal, which they are not; they are only similar.

astrogeek

10-31-2021 06:55 PM

You will need to define "similar". Can you give an example like the first two above that would not be similar? For example...

Code:

var1="Yes i know it for sure"
var2="Yes i know is for real"
var3="Yes i saw it for sure"
var4="Yet I know it was real"

For example, do they have a similar number of characters, similar number of syllables, similar verb or subject, similar in meaning, etc.

For things which "sound" similar you should search for the soundex algorithm, although I think that is a bit old. There is also a perl soundex type library - I am totally unfamiliar with it.

But again, try to define similar and non-similar in your use case first.

michaelk

10-31-2021 06:59 PM

You can check whether two strings are the same or not, i.e. A = B, A != B

You can compare two strings by lexicographical (alphabetical) order, i.e. A < B or A > B

You can check whether a string matches an expression (regex), i.e. A =~ <some expression>

You can check a string to see if it is empty or not.

You can check for a substring within a string.

But what exactly do you mean by similar? It is somewhat subjective. If you are asking how many characters are an exact match, then I do not know of a function for that and you will have to write a bit of code.
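
The checks listed above map onto bash's [[ ]] test roughly as follows. This is a minimal sketch rather than a reference: the variables a and b and the helper same_position_chars are hypothetical names chosen for illustration, and the helper implements only one possible reading of "how many characters are an exact match" (the same character at the same position).

```shell
#!/usr/bin/env bash
# Sample values taken from the first post; the names a and b are illustrative.
a="Yes i know it for sure"
b="Yes i know is for real"

if [[ $a == "$b" ]]; then echo "same"; else echo "different"; fi
if [[ $a < $b ]]; then echo "a sorts before b"; else echo "a does not sort before b"; fi
if [[ $a =~ ^Yes[[:space:]] ]]; then echo "a matches the regex"; fi
if [[ -z $a ]]; then echo "a is empty"; else echo "a is not empty"; fi
if [[ $a == *"know"* ]]; then echo "a contains 'know'"; fi

# Hypothetical helper: count the positions at which both strings
# hold the same character (only up to the length of the shorter one).
same_position_chars() {
    local s1=$1 s2=$2 i n=0 len=${#1}
    if (( ${#s2} < len )); then len=${#s2}; fi
    for (( i = 0; i < len; i++ )); do
        if [[ ${s1:i:1} == "${s2:i:1}" ]]; then
            n=$(( n + 1 ))
        fi
    done
    echo "$n"
}

same_position_chars "$a" "$b"
```

For the two sample strings the helper reports 17 matching positions out of 22 characters. Whether that counts as "similar" still depends on the definition asked for above.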

pedropt

10-31-2021 07:08 PM

The only thing I can think of to make this work is to split the two variables into words and count how many of them appear in both, something like this:

Code:

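#!/bin/bash
# Count how many words of the first sentence also appear in the second.

# Remove temp files left over from a previous run (-f: no error if missing).
rm -f tmp1.file tmp2.file
eq="0"
# Write each sentence to a temp file, one word per line.
echo "yes i know it for sure" | tr " " "\n" > tmp1.file
echo "no i know it for not" | tr " " "\n" > tmp2.file
# Number of words (lines) in the first sentence.
var3=$(wc -l tmp1.file | awk '{print $1}')
for i in $(seq "$var3")
do
    # Read word number i of the first sentence and look for it in the second.
    rdword=$(sed -n "${i}p" tmp1.file)
    chkword=$(grep -w -- "$rdword" tmp2.file)
    if [[ ! -z "$chkword" ]]
    then
        eq=$((eq+1))
    fi
done
echo "got $eq similar words of $var3"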

But this is only a rough draft, because the words could be in different positions and it would still count them as similar, which they are not.
However, for what I want this will work. The only problem is defining the percentage of matches that counts as similar, let's say:
7 of 10 = similar
3 of 10 = not similar

But I can have texts with 20 words or fewer, so determining these percentages could be a challenge in code.
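
One way to keep the cut-off independent of the sentence length is to compare a percentage instead of a raw count. The following is a minimal sketch using bash integer arithmetic. The function name is_similar and the default threshold of 70 are assumptions made for illustration; the thread only gives 7 of 10 as similar and 3 of 10 as not similar.

```shell
#!/usr/bin/env bash
# Hypothetical helper: is_similar MATCHED TOTAL [THRESHOLD_PERCENT]
# Succeeds (exit status 0) when MATCHED/TOTAL reaches the threshold.
is_similar() {
    local matched=$1 total=$2 threshold=${3:-70}   # 70 is an assumed default
    if (( total <= 0 )); then
        return 1                                   # nothing to compare
    fi
    # Multiply before dividing so integer division keeps the precision.
    if (( 100 * matched / total >= threshold )); then
        return 0
    fi
    return 1
}

if is_similar 7 10;  then echo "7 of 10: similar";  else echo "7 of 10: not similar";  fi
if is_similar 3 10;  then echo "3 of 10: similar";  else echo "3 of 10: not similar";  fi
if is_similar 14 20; then echo "14 of 20: similar"; else echo "14 of 20: not similar"; fi
```

Because the comparison is done on the ratio, the same threshold works for 10 words, 20 words, or any other length.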

grail

10-31-2021 07:39 PM

Ok, so your version of 'similar' is how many words does each sentence have in common (if I have gleaned your script correctly)

So the next question would be, do you consider a single word as a match if it only appears once in one sentence but multiple times in the other? (as grep will match it always)

If above is not desired, you may have to also remove found words from the second sentence so you only match the count exactly.

Also, as someone who has been using bash, at least on this site, for as long as you have, you should realise that temp files and convoluted piping are not needed.
Simply place your sentences into arrays instead of temp files
Counting the elements of an array is done using ${#arr[@]}, so wc and awk are definitely not needed
seq is also not needed, as you can just use 'for word in "${arr[@]}"'
grep is easier, but =~ in bash could do this sort of simple matching
you can test the return of grep with 'if', so the '-z' test is not required
eq=$((eq+1)) is more simply ((eq++))
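
Taken together, the suggestions above could look something like the sketch below. It is only an illustration under stated assumptions: the function name count_common_words and its local variable names are hypothetical, and it deliberately does not remove matched words from the second sentence, so repeated words are still counted each time they occur in the first sentence.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many words of sentence 1 also occur in sentence 2.
count_common_words() {
    local -a first second
    local word other eq=0
    # Split each sentence into an array of words; no temp files needed.
    read -ra first <<< "$1"
    read -ra second <<< "$2"
    for word in "${first[@]}"; do
        for other in "${second[@]}"; do
            if [[ $word == "$other" ]]; then
                (( eq += 1 ))
                break              # one hit per word of the first sentence
            fi
        done
    done
    # ${#first[@]} is the word count, replacing wc and awk.
    echo "$eq of ${#first[@]}"
}

count_common_words "yes i know it for sure" "no i know it for not"
```

With the two sentences from the script above this prints "4 of 6".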

michaelk

10-31-2021 07:43 PM

It sounds like you want something like a natural language parser.

danielbmartin

10-31-2021 08:08 PM

Quote:

Originally Posted by pedropt (Post 6297401)

Imagine that I have 2 variables that are similar but not equal. How would the code be written to detect this?

To be precise you want to compare the value of two variables and quantify their "sameness." There are well-documented mathematical ways to do this. To educate yourself, start here...

String similarity — the basic know your algorithms guide!
by Mohit Mayank
https://itnext.io/string-similarity-...e-3de3d7346227

Daniel B. Martin

.
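
Edit distance (Levenshtein distance) is one widely used measure of this kind: the number of single-character insertions, deletions and substitutions needed to turn one string into the other. The sketch below is an illustration in bash, not code taken from the linked guide. The function names levenshtein and similarity_percent are hypothetical, and expressing the result as a percentage of the longer string's length is just one common convention.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Levenshtein distance between $1 and $2.
levenshtein() {
    local s=$1 t=$2
    local m=${#s} n=${#t} i j cost del ins sub min
    local -a prev curr
    # Row 0: turning the empty prefix of s into t[0..j] takes j insertions.
    for (( j = 0; j <= n; j++ )); do prev[j]=$j; done
    for (( i = 1; i <= m; i++ )); do
        curr=("$i")                      # column 0: i deletions
        for (( j = 1; j <= n; j++ )); do
            cost=1
            if [[ ${s:i-1:1} == "${t:j-1:1}" ]]; then cost=0; fi
            del=$(( prev[j] + 1 ))
            ins=$(( curr[j-1] + 1 ))
            sub=$(( prev[j-1] + cost ))
            min=$del
            if (( ins < min )); then min=$ins; fi
            if (( sub < min )); then min=$sub; fi
            curr[j]=$min
        done
        prev=("${curr[@]}")
    done
    echo "${prev[n]}"
}

# Hypothetical helper: similarity as a whole-number percentage of the longer length.
similarity_percent() {
    local d max=${#1}
    if (( ${#2} > max )); then max=${#2}; fi
    if (( max == 0 )); then echo 100; return; fi
    d=$(levenshtein "$1" "$2")
    echo $(( 100 * (max - d) / max ))
}

levenshtein "kitten" "sitting"
similarity_percent "Yes i know it for sure" "Yes i know is for real"
```

The first call prints 3, the textbook result for "kitten" and "sitting". Pure bash is slow for long strings, so this is only practical for short inputs such as the sentences in this thread.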

pedropt

10-31-2021 08:18 PM

I think I found the solution for this:

Code:

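#!/bin/bash
# Report whether at least half of the words in the first input
# also appear in the second.

# Remove temp files left over from a previous run (-f: no error if missing).
rm -f tmp1.file tmp2.file
eq="0"
echo -n "Write 1st Variable : "
read -r var1
echo -n "Write 2nd Variable : "
read -r var2

# Stop if either input is empty; there is nothing to compare.
if [[ -z "$var1" || -z "$var2" ]]
then
    echo "Empty variables"
    exit 1
fi
# Write each input to a temp file, one word per line.
echo "$var1" | tr " " "\n" > tmp1.file
echo "$var2" | tr " " "\n" > tmp2.file
var3=$(wc -l tmp1.file | awk '{print $1}')
for i in $(seq "$var3")
do
    rdword=$(sed -n "${i}p" tmp1.file)
    chkword=$(grep -w -- "$rdword" tmp2.file)
    if [[ ! -z "$chkword" ]]
    then
        eq=$((eq+1))
    fi
done
# Half of the word count, rounded up, so that "similar" means 50% or more.
nmb=$(( (var3 + 1) / 2 ))
if [[ "$eq" -ge "$nmb" ]]
then
    echo "It's similar"
else
    echo "It's not similar"
fi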

Basically it halves the word count of the first variable, then searches file 2 for each word of file 1. If in the end 50% or more of the words were found, it's similar; otherwise it's not.

I know this code can be refined; it was just written on the fly here.

dugan

10-31-2021 08:48 PM

Check this out:

http://fstrcmp.sourceforge.net/

Apparently, most distros have it in their standard repos.

grail

10-31-2021 09:44 PM

As it was an interesting bash script to write, this is what I was thinking of:

Code:

#!/usr/bin/env bash

declare -a sent1 sent2
declare word1 word2 cnt perc

# Minimum percentage of matching words for the sentences to count as similar.
perc=70
cnt=0

# Read each sentence straight into an array, one word per element.
read -rp "Write a sentence: " -a sent1
read -rp "Write a sentence: " -a sent2

if [[ -z "${sent1[0]}" && -z "${sent2[0]}" ]]
then
        echo "Sentences are equal as both are empty"
        exit
elif [[ -z "${sent1[0]}" || -z "${sent2[0]}" ]]
then
        echo "Sentences are not equal as one is empty"
        exit
fi

for word1 in "${sent1[@]}"
do
        # word2 iterates over the indices of sent2, not its values.
        for word2 in "${!sent2[@]}"
        do
                if [[ "$word1" == "${sent2[$word2]}" ]]
                then
                        # Blank the matched word so it cannot be matched again,
                        # and stop so one word counts at most once.
                        sent2[$word2]=""
                        (( cnt++ ))
                        break
                fi
        done
done

if (( (100 * cnt) / ${#sent1[*]} >= perc ))
then
        echo "Sentences are at least ${perc}% similar"
else
        echo "Sentences are less then ${perc}% similar"
fi

rnturn

10-31-2021 10:51 PM

Quote:

Originally Posted by pedropt (Post 6297401)

Imagine that I have 2 variables that are similar but not equal. How would the code be written to detect this?

I have been trying to figure out how to start, but I have no idea.

Assuming:

var1="Yes i know it for sure"
var2="Yes i know is for real"

There are strong similarities between these 2 variables. How can this be done without writing code specific to the text of these 2 variables?

If you're using Python, you might look at the "fuzzywuzzy" module (find it here if it's not available from your distribution's repository). I used it a while back to do "fuzzy" matches of user-entered text to entries in a database. It has been (or is in the process of being) ported to several other languages (check that link for a list).

HTH...

danielbmartin

11-01-2021 05:53 AM

Quote:

Originally Posted by grail (Post 6297443)

As it was an interesting bash to write, this is what I was thinking of ...

Thank you for a useful piece of code. Useful, but with limitations. We may refer to the words of astrogeek in post #4...

Quote:

You will need to define "similar".

Consider these sentences:
Four score and seven years ago
Four_score_and_seven_years_ago

The human eye (and mind) might consider them equivalent. The meaning is understood, yet your solution produces this result:
Sentences are less then 70% similar

Beauty (and similarity) are in the eye of the beholder.

Daniel B. Martin

.
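
If underscores are meant to count as word separators, one option is to normalise both sentences before splitting them into words. The sketch below assumes exactly that and nothing more. The function name normalize_separators is hypothetical, and treating "_" like a space is a choice made for this example, not something the scripts in this thread do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: treat underscores as spaces and collapse runs of
# whitespace, so that both spellings split into the same words.
normalize_separators() {
    local s=${1//_/ }          # replace every underscore with a space
    local -a words
    read -ra words <<< "$s"    # split on whitespace, dropping empty fields
    echo "${words[*]}"         # join the words with single spaces
}

one=$(normalize_separators "Four score and seven years ago")
two=$(normalize_separators "Four_score_and_seven_years_ago")

if [[ $one == "$two" ]]; then
    echo "equal after normalising"
else
    echo "still different"
fi
```

Which characters to fold together (underscores, hyphens, punctuation, letter case) is again part of defining "similar" for the use case at hand.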

pan64

11-01-2021 06:26 AM

There is something called a similarity index: https://stackoverflow.com/questions/...en-two-strings (the page contains a lot of additional hints too)

All times are GMT -5.