LinuxQuestions.org
Old 10-31-2021, 07:27 PM   #1
pedropt
Member
 
Registered: Aug 2014
Distribution: Devuan
Posts: 324

Rep: Reputation: Disabled
How to know if a variable is similar to another


Imagine that I have two variables that are similar but not equal. How do I write code to detect this?

I have been trying to figure out how to start, but I have no idea.

Assuming:

Quote:
var1="Yes i know it for sure"
var2="Yes i know is for real"
These two variables are very similar. How can this be detected without writing code specific to the text of these two variables?
 
Old 10-31-2021, 07:35 PM   #2
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 24,292

Rep: Reputation: 7138
Quote:
Originally Posted by pedropt View Post
Imagine that I have two variables that are similar but not equal. How do I write code to detect this? I have been trying to figure out how to start, but I have no idea. Assuming:
Code:
var1="Yes i know it for sure"
var2="Yes i know is for real"
These two variables are very similar. How can this be detected without writing code specific to the text of these two variables?
Not sure what you're asking, here, or for what language. You're wanting to compare variables...in a program...*WITHOUT* writing code??? How, exactly, do you expect a program to work if you DON'T write code??

In the most simplistic sense, you would do an IF:
Code:
if ($var1 eq $var2) {
   <do something>
}
That's it...what is IN var1 and var2 can come from anywhere, including hard-coding them. You can also (depending on the language and your actual goal) match part of a string, be case sensitive, look for a particular word or string of characters, or even count the characters. Your question boils down to, "how can I write a program?", which is FAR too open ended. You've been asking about such things for quite a while now.
 
Old 10-31-2021, 07:41 PM   #3
pedropt
Member
 
Registered: Aug 2014
Distribution: Devuan
Posts: 324

Original Poster
Rep: Reputation: Disabled
I mean in bash code. The if statement has no test for "similar", which makes this difficult to write.
The if you wrote checks whether the two values are equal, which they are not; they are only similar.
 
Old 10-31-2021, 07:55 PM   #4
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_12{.0|.1}
Posts: 5,777
Blog Entries: 23

Rep: Reputation: 3785
You will need to define "similar". Can you give an example like the first two above that would not be similar? For example...

Code:
var1="Yes i know it for sure"
var2="Yes i know is for real"
var3="Yes i saw it for sure"
var4="Yet I know it was real"
For example, do they have a similar number of characters, similar number of syllables, similar verb or subject, similar in meaning, etc.

For things which "sound" similar you should search for the soundex algorithm, although I think that is a bit old. There is also a perl soundex type library - I am totally unfamiliar with it.

But again, try to define similar and non-similar in your use case first.

Last edited by astrogeek; 10-31-2021 at 07:59 PM.
 
Old 10-31-2021, 07:59 PM   #5
michaelk
Moderator
 
Registered: Aug 2002
Posts: 22,081

Rep: Reputation: 4430
You can check whether two strings are the same or not, i.e. A = B, A != B

You can compare two strings by lexicographical (alphabetical) order, i.e. A < B or A > B

You can check if a string matches an expression (regex), i.e. A =~ <some expression>

You can check a string to see if it is empty or not.

You can check for a substring within a string.

But what exactly do you mean by similar? That is somewhat subjective. If you are asking how many characters are an exact match, then I do not know of a built-in function for that and you will have to write a bit of code.
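
A minimal sketch of the checks listed above, assuming bash and the two sample strings from the first post. The helper name same_chars is hypothetical (it is not a bash builtin); it only illustrates the "bit of code" needed to count characters that match position by position.

```bash
#!/usr/bin/env bash
# Sample values taken from the first post.
a="Yes i know it for sure"
b="Yes i know is for real"

if [[ $a == "$b" ]]; then echo "equal"; else echo "not equal"; fi
if [[ $a < $b ]]; then echo "a sorts before b"; fi
if [[ $a =~ ^Yes ]]; then echo "a matches the regex ^Yes"; fi
if [[ -z $a ]]; then echo "a is empty"; fi
if [[ $a == *know* ]]; then echo "a contains the substring know"; fi

# same_chars S1 S2 (hypothetical helper): count the positions at which
# both strings hold the same character.
same_chars() {
    local s1=$1 s2=$2 i n=0
    local len=$(( ${#s1} < ${#s2} ? ${#s1} : ${#s2} ))
    for (( i = 0; i < len; i++ )); do
        if [[ ${s1:i:1} == "${s2:i:1}" ]]; then
            n=$(( n + 1 ))
        fi
    done
    echo "$n"
}

same_chars "$a" "$b"
```

For the two sample strings the last line counts 17 matching positions out of 22. A position-by-position count like this falls apart as soon as one string has an extra character near the start, so it is only one possible definition of "similar".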
 
Old 10-31-2021, 08:08 PM   #6
pedropt
Member
 
Registered: Aug 2014
Distribution: Devuan
Posts: 324

Original Poster
Rep: Reputation: Disabled
The only thing I can think of to make this work is to split the two variables into words and count how many words they have in common, something like this:

Code:
#!/bin/bash
# Remove leftovers from a previous run; -f keeps rm quiet if they are missing
rm -f tmp1.file tmp2.file
eq="0"
# Write each sentence to a temp file, one word per line
echo "yes i know it for sure" | tr " " "\n" > tmp1.file
echo "no i know it for not" | tr " " "\n" > tmp2.file
# Number of words in the first sentence
var3=$(wc -l tmp1.file | awk '{print$1}')
for i in $(seq "$var3")
do
rdword=$(sed -n "${i}p" tmp1.file)
# -w matches whole words only, -F treats the word as a literal string
chkword=$(grep -wF -- "$rdword" tmp2.file)
if [[ ! -z "$chkword" ]]
then
eq=$((eq+1))
fi
done
echo "got $eq similar words of $var3"
But this is only a rough draft: the words could be in different positions and it would still treat the sentences as similar, which they are not.
However, for what I want this will work. The only problem is defining the percentage of matches that counts as similar, let's say:
7 of 10 = similar
3 of 10 = not similar

But I can have texts with 20 words or fewer, so determining these percentages in code could be a challenge.
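
One way to make the cut-off independent of the sentence length is to turn the counts into a percentage with integer arithmetic. This is only a sketch: the function names percent_match and is_similar are made up for illustration, and the default threshold of 70 is an assumption based on the "7 of 10" example above.

```bash
#!/usr/bin/env bash
# percent_match MATCHED TOTAL (hypothetical helper)
# Prints the share of matched words as an integer percentage, rounded down.
percent_match() {
    local matched=$1 total=$2
    if (( total <= 0 )); then
        echo 0          # avoid dividing by zero on empty input
        return
    fi
    echo $(( 100 * matched / total ))
}

# is_similar MATCHED TOTAL [THRESHOLD] (hypothetical helper)
# Succeeds when the percentage reaches the threshold (assumed default: 70).
is_similar() {
    local threshold=${3:-70}
    (( $(percent_match "$1" "$2") >= threshold ))
}

if is_similar 7 10; then echo "7 of 10: similar"; else echo "7 of 10: not similar"; fi
if is_similar 3 10; then echo "3 of 10: similar"; else echo "3 of 10: not similar"; fi
```

Multiplying by 100 before dividing matters here, because bash arithmetic is integer-only and 7 / 10 on its own would evaluate to 0.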

Last edited by pedropt; 10-31-2021 at 08:17 PM.
 
Old 10-31-2021, 08:39 PM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,852

Rep: Reputation: 3111
Ok, so your version of 'similar' is how many words does each sentence have in common (if I have gleaned your script correctly)

So the next question would be, do you consider a single word as a match if it only appears once in one sentence but multiple times in the other? (as grep will match it always)

If above is not desired, you may have to also remove found words from the second sentence so you only match the count exactly.

Also, for someone who has been using bash, at least on this site, for as long as you have, you should realise that temp files and convoluted piping are not needed.
Simply place your sentences into arrays instead of temp files
Count in arrays is done using ${#arr[@]} so wc and awk are definitely not needed
seq is also not needed, just use 'for word in "${arr[@]}"'
grep is easier but =~ in bash could do this sort of simple matching
you can test the return of grep with 'if' so the '-z' test is not required
eq=$((eq+1)) is more simply ((eq++))
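
A rough sketch of how those suggestions fit together, assuming bash and single-line sentences. The function name count_common is hypothetical. It keeps the behaviour of the script in post #6, so a word of the first sentence counts as found no matter how often it appears in the second.

```bash
#!/usr/bin/env bash
# count_common "SENTENCE 1" "SENTENCE 2" (hypothetical helper)
# Prints how many words of the first sentence also occur in the second.
count_common() {
    local -a first second
    local word eq=0
    # Split each sentence into an array of words; no temp files needed
    read -ra first  <<< "$1"
    read -ra second <<< "$2"
    for word in "${first[@]}"; do
        # Test grep's exit status directly: -q quiet, -x whole line, -F literal
        if printf '%s\n' "${second[@]}" | grep -qxF -- "$word"; then
            # Pre-increment so the arithmetic command itself returns success
            (( ++eq ))
        fi
    done
    # ${#first[@]} is the word count, so wc and awk are not needed
    echo "$eq of ${#first[@]}"
}

count_common "yes i know it for sure" "no i know it for not"
```

With the two sentences from post #6 this prints "4 of 6".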
 
Old 10-31-2021, 08:43 PM   #8
michaelk
Moderator
 
Registered: Aug 2002
Posts: 22,081

Rep: Reputation: 4430
It sounds like you want something like a natural language parser.
 
Old 10-31-2021, 09:08 PM   #9
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,841

Rep: Reputation: 649
Quote:
Originally Posted by pedropt View Post
Imagine that I have two variables that are similar but not equal. How do I write code to detect this?
To be precise, you want to compare the values of two variables and quantify their "sameness." There are well-documented mathematical ways to do this. To educate yourself, start here...

String similarity the basic know your algorithms guide!
by Mohit Mayank
https://itnext.io/string-similarity-...e-3de3d7346227
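
As one concrete example of such a measure, here is a sketch of the Levenshtein edit distance in plain bash. Whether the linked guide covers this particular algorithm is an assumption, and the function names levenshtein and similarity_percent are made up for illustration. Turning the distance into a percentage by dividing by the longer length is just one common convention.

```bash
#!/usr/bin/env bash
# levenshtein S T (hypothetical helper)
# Prints the minimum number of single-character insertions, deletions
# and substitutions needed to turn S into T.
levenshtein() {
    local s=$1 t=$2
    local m=${#s} n=${#t}
    local i j cost best
    local -a prev cur
    # Row 0: cost of building the first j characters of T from nothing
    for (( j = 0; j <= n; j++ )); do prev[j]=$j; done
    for (( i = 1; i <= m; i++ )); do
        cur=("$i")
        for (( j = 1; j <= n; j++ )); do
            if [[ ${s:i-1:1} == "${t:j-1:1}" ]]; then cost=0; else cost=1; fi
            best=$(( prev[j] + 1 ))                                        # deletion
            best=$(( cur[j-1] + 1 < best ? cur[j-1] + 1 : best ))          # insertion
            best=$(( prev[j-1] + cost < best ? prev[j-1] + cost : best ))  # substitution
            cur[j]=$best
        done
        prev=("${cur[@]}")
    done
    echo "${prev[n]}"
}

# similarity_percent S T (hypothetical helper)
# 100 means identical, 0 means nothing in common, relative to the longer string.
similarity_percent() {
    local s=$1 t=$2 max dist
    max=$(( ${#s} > ${#t} ? ${#s} : ${#t} ))
    if (( max == 0 )); then echo 100; return; fi
    dist=$(levenshtein "$s" "$t")
    echo $(( 100 * (max - dist) / max ))
}

levenshtein "kitten" "sitting"
similarity_percent "Yes i know it for sure" "Yes i know is for real"
```

The first call prints 3, the textbook result for that pair. The loop is quadratic in the string lengths, so in bash this is only practical for short strings.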

Daniel B. Martin

.
 
1 members found this post helpful.
Old 10-31-2021, 09:18 PM   #10
pedropt
Member
 
Registered: Aug 2014
Distribution: Devuan
Posts: 324

Original Poster
Rep: Reputation: Disabled
I think I found a solution for this:

Code:
#!/bin/bash
# -f keeps rm quiet when the files do not exist yet
rm -f tmp1.file tmp2.file
eq="0"
echo -n "Write 1st Variable : "
read -r var1
echo -n "Write 2nd Variable : "
read -r var2

# Stop if either variable is empty; there is nothing to compare
if [[ -z "$var1" || -z "$var2" ]]
then
echo "Empty variable"
exit 1
fi
echo "$var1" | tr " " "\n" > tmp1.file
echo "$var2" | tr " " "\n" > tmp2.file
var3=$(wc -l tmp1.file | awk '{print$1}')
for i in $(seq "$var3")
do
rdword=$(sed -n "${i}p" tmp1.file)
# -F treats the word as a literal string rather than a regex
chkword=$(grep -wF -- "$rdword" tmp2.file)
if [[ ! -z "$chkword" ]]
then
eq=$((eq+1))
fi
done
# Half the word count, rounded up, so odd counts still need at least 50%
nmb=$(( (var3 + 1) / 2 ))
if [[ "$eq" -ge "$nmb" ]]
then
echo "It's similar"
else
echo "It's not similar"
fi
Basically it halves the word count of the first variable, then searches file 2 for each word of the first variable. If in the end 50% or more of them were found, the two are similar; otherwise they are not.

I know this code can be refined; it was just written on the fly here.
 
Old 10-31-2021, 09:48 PM   #11
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 10,188

Rep: Reputation: 4752
Check this out:

http://fstrcmp.sourceforge.net/

Apparently, most distros have it in their standard repos.
 
1 members found this post helpful.
Old 10-31-2021, 10:44 PM   #12
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,852

Rep: Reputation: 3111
As it was an interesting bash script to write, this is what I was thinking of:
Code:
#!/usr/bin/env bash

declare -a sent1 sent2
declare word1 word2 cnt perc

perc=70

read -rp "Write a sentence: " -a sent1
read -rp "Write a sentence: " -a sent2

if [[ -z "${sent1[0]}" && -z "${sent2[0]}" ]]
then
	echo "Sentences are equal as both are empty"
	exit
elif [[ -z "${sent1[0]}" || -z "${sent2[0]}" ]]
then
	echo "Sentences are not equal as one is empty"
	exit
fi

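for word1 in "${sent1[@]}"
do
	# word2 iterates over the indices of sent2, not over its words
	for word2 in "${!sent2[@]}"
	do
		if [[ "$word1" == "${sent2[$word2]}" ]]
		then
			# blank the matched word so it cannot be matched again
			sent2[$word2]=""
			(( cnt++ ))
			# stop here so each word of sent1 is counted at most once
			break
		fi
	done
done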

if (( (100 * cnt) / ${#sent1[*]} >= perc ))
then
	echo "Sentences are at least 70% similar"
else
	echo "Sentences are less then 70% similar"
fi

Last edited by grail; 10-31-2021 at 10:46 PM.
 
1 members found this post helpful.
Old 10-31-2021, 11:51 PM   #13
rnturn
Senior Member
 
Registered: Jan 2003
Location: Illinois (SW Chicago 'burbs)
Distribution: openSUSE, Raspbian, Slackware. Older: Coherent, MacOS, Red Hat, Big Iron IXs: AIX, Solaris, Tru64
Posts: 2,523

Rep: Reputation: 509
Quote:
Originally Posted by pedropt View Post
Imagine that I have two variables that are similar but not equal. How do I write code to detect this?

I have been trying to figure out how to start, but I have no idea.

Assuming:

These two variables are very similar. How can this be detected without writing code specific to the text of these two variables?
If you're using Python, you might look at the "fuzzywuzzy" module (find it here if it's not available from your distribution's repository). I used it a while back to do "fuzzy" matches of user-entered text to entries in a database. It has been (or is in the process of being) ported to several other languages (check that link for a list).

HTH...
 
Old 11-01-2021, 06:53 AM   #14
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,841

Rep: Reputation: 649
Quote:
Originally Posted by grail View Post
As it was an interesting bash to write, this is what I was thinking of ...
Thank you for a useful piece of code. Useful, but with limitations. We may refer to the words of astrogeek in post #4...
Quote:
You will need to define "similar".
Consider these sentences:
Four score and seven years ago
Four_score_and_seven_years_ago


The human eye (and mind) might consider them equivalent. The meaning is understood, yet your solution produces this result:
Sentences are less then 70% similar

Beauty (and similarity) are in the eye of the beholder.
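
If sentences like these two should count as equivalent, one option is to normalize both before comparing them. The sketch below is only an illustration: the function name normalize is hypothetical, and treating every non-alphanumeric character as a word separator and ignoring case are assumptions about what "equivalent" means here. It needs bash 4 or later for the ${var,,} lowercase expansion and expects single-line input.

```bash
#!/usr/bin/env bash
# normalize STRING (hypothetical helper)
# Lowercases the text, turns every non-alphanumeric character into a
# space and collapses runs of spaces, so only the words remain.
normalize() {
    local s=${1,,}              # lowercase (bash 4+)
    local -a words
    s=${s//[^[:alnum:]]/ }      # underscores, punctuation, tabs -> space
    read -ra words <<< "$s"     # word splitting drops repeated spaces
    printf '%s\n' "${words[*]}"
}

one=$(normalize "Four score and seven years ago")
two=$(normalize "Four_score_and_seven_years_ago")

if [[ $one == "$two" ]]; then
    echo "equal after normalization"
else
    echo "still different"
fi
```

Both inputs normalize to "four score and seven years ago", so the comparison reports them as equal. The normalized strings could then be fed to a word-by-word comparison such as the one in post #12.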

Daniel B. Martin

.
 
1 members found this post helpful.
Old 11-01-2021, 07:26 AM   #15
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 17,202

Rep: Reputation: 5822
There is something called a similarity index: https://stackoverflow.com/questions/...en-two-strings (it contains a lot of additional hints too)
 
  

