How to know if a variable is similar to another
Suppose I have two variables that are similar but not equal. How would the code be written to detect this?
I have been trying to figure out how to start, but I have no idea. Assuming: Quote:
|
Quote:
In the most simplistic sense, you would do an IF: Code:
if ($var1 eq $var2) { |
I mean in bash code. With if statements there is no test for "similar", which makes this difficult to write.
The if you wrote checks whether the two are equal, which they are not, but they are similar. |
You will need to define "similar". Can you give an example like the first two above that would not be similar? For example...
Code:
var1="Yes i know it for sure"
For things which "sound" similar you should search for the soundex algorithm, although I think that is a bit old. There is also a perl soundex type library - I am totally unfamiliar with it. But again, try to define similar and non-similar in your use case first. |
You can check whether two strings are the same or not, i.e. A = B, A != B
You can check two strings by lexicographical (alphabetical) order, i.e. A < B or A > B
You can check if a string matches an expression (regex), i.e. A =~ <some expression>
You can check a string to see if it is empty or not.
You can check for a substring within a string.
But what exactly do you mean by similar? That is somewhat subjective. If you are asking how many characters are an exact match, then I do not know of a function for that and you would have to write a bit of code. |
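A minimal sketch of the checks listed above, assuming bash and its [[ ]] test. The values of a and b are only illustrative, and count_matching_chars is a hypothetical helper, not a bash builtin: it counts the positions where both strings hold the same character, which is one possible reading of "how many characters are an exact match".

```bash
#!/usr/bin/env bash
# Illustrative values only.
a="Yes i know it for sure"
b="Yes i know it for certain"

# Same or not the same
if [[ $a == "$b" ]]; then echo "equal"; else echo "not equal"; fi

# Lexicographical order (follows the current locale's collation)
if [[ $a < $b ]]; then echo "a sorts before b"; else echo "a does not sort before b"; fi

# Regex match
if [[ $a =~ ^Yes ]]; then echo "a starts with Yes"; fi

# Empty or not
if [[ -z $a ]]; then echo "a is empty"; else echo "a is not empty"; fi

# Substring (the quoted part is matched literally)
if [[ $a == *"know it"* ]]; then echo "a contains 'know it'"; fi

# Hypothetical helper: number of positions holding the same character,
# compared up to the length of the shorter string.
count_matching_chars() {
    local s1=$1 s2=$2 i n=0 len
    len=$(( ${#s1} < ${#s2} ? ${#s1} : ${#s2} ))
    for (( i = 0; i < len; i++ )); do
        if [[ ${s1:i:1} == "${s2:i:1}" ]]; then
            n=$(( n + 1 ))
        fi
    done
    echo "$n"
}

count_matching_chars "$a" "$b"
```

A position-by-position count like this breaks down as soon as one string has an extra character near the start, so it is only a starting point.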
The only thing I can think of to make this work is to split the two variables into words and count how many words they have in common, something like this:
Code:
#!/bin/bash
However, for what I want this will work. The only problem is defining the share of matching words that counts as similar, let's say 7 of 10 = similar and 3 of 10 = not similar. But my texts can have 20 words or fewer, so determining these percentages in code could be a challenge. |
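One way around the varying text length is to turn the count into a percentage before comparing it with the cut-off, so 7 of 10 and 14 of 20 are treated alike. A minimal sketch, assuming bash integer arithmetic; percent_similar and is_similar are hypothetical names, and the default threshold of 70 is just the "7 of 10" figure from the post above.

```bash
#!/usr/bin/env bash
# percent_similar COMMON TOTAL
# Prints the integer percentage of matching words (rounded down).
percent_similar() {
    local common=$1 total=$2
    if (( total == 0 )); then
        echo 0          # avoid dividing by zero for empty text
        return
    fi
    echo $(( common * 100 / total ))
}

# is_similar COMMON TOTAL [THRESHOLD]
# Succeeds (exit status 0) when the percentage reaches the threshold.
# The default of 70 is an assumption, not a recommended value.
is_similar() {
    local common=$1 total=$2 threshold=${3:-70}
    (( $(percent_similar "$common" "$total") >= threshold ))
}

if is_similar 7 10; then echo "similar"; else echo "not similar"; fi
if is_similar 3 10; then echo "similar"; else echo "not similar"; fi
```

Which total to divide by (the word count of the first text, the second, or the longer of the two) is a design choice that still has to be made for the use case.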
OK, so your version of 'similar' is how many words the two sentences have in common (if I have gleaned your script correctly).
So the next question would be: do you count a word as a match if it appears only once in one sentence but multiple times in the other? (grep will always match it.) If that is not desired, you may also have to remove found words from the second sentence so that each occurrence is matched only once.
Also, for someone who has been using bash, at least on this site, for as long as you have, you should realise that temp files and convoluted piping are not needed:
Simply place your sentences into arrays instead of temp files.
The count of elements in an array is ${#arr[@]}, so wc and awk are definitely not needed.
seq is also not needed; just use 'for word in "${arr[@]}"'.
grep is easier, but =~ in bash could do this sort of simple matching.
You can test the return of grep directly with 'if', so the '-z' test is not required.
eq=$((eq+1)) is more simply ((eq++)) |
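The suggestions above can be sketched as follows. This is only one possible shape, not the script from either post: common_words is a hypothetical function name, sentences are assumed to be single-line and split on whitespace, and a matched word is removed from the second array so a repeated word cannot be counted more often than it occurs there.

```bash
#!/usr/bin/env bash
# common_words SENTENCE1 SENTENCE2
# Prints how many words of SENTENCE1 also occur in SENTENCE2,
# using each word of SENTENCE2 at most once.
common_words() {
    local -a first second
    local word i count=0

    # Split on whitespace into arrays; no temp files needed.
    read -r -a first <<< "$1"
    read -r -a second <<< "$2"

    for word in "${first[@]}"; do
        for i in "${!second[@]}"; do
            if [[ ${second[i]} == "$word" ]]; then
                count=$(( count + 1 ))
                unset 'second[i]'   # consume the matched word
                break
            fi
        done
    done
    echo "$count"
}

s1="the cat sat on the mat"
s2="the dog sat on a mat"
read -r -a words <<< "$s1"
echo "$(common_words "$s1" "$s2") of ${#words[@]} words in common"
```

With the two sample sentences this reports 4 of 6, because the second "the" in s1 finds nothing left to match in s2. The comparison is case-sensitive and treats punctuation as part of a word.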
It sounds like you want something like a natural language parser.
|
Quote:
String similarity — the basic know your algorithms guide! by Mohit Mayank https://itnext.io/string-similarity-...e-3de3d7346227 Daniel B. Martin . |
I think I found the solution for this:
Code:
#!/bin/bash
I know this code can be refined; it was just written on the fly here. |
Check this out:
http://fstrcmp.sourceforge.net/ Apparently, most distros have it in their standard repos. |
As it was an interesting bash script to write, this is what I was thinking of:
Code:
#!/usr/bin/env bash |
Quote:
HTH... |
Quote:
Quote:
Four score and seven years ago
Four_score_and_seven_years_ago
The human eye (and mind) might consider them equivalent. The meaning is understood, yet your solution produces this result:
Sentences are less then 70% similar
Beauty (and similarity) are in the eye of the beholder.
Daniel B. Martin . |
There is something called a similarity index: https://stackoverflow.com/questions/...en-two-strings (which contains a lot of additional hints too)
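One common character-level similarity index is based on the Levenshtein (edit) distance. The sketch below is an assumption about what such an index could look like in pure bash, not necessarily the method the linked page describes; levenshtein and similarity_percent are hypothetical function names, and the percentage is taken relative to the longer string.

```bash
#!/usr/bin/env bash
# levenshtein S T
# Prints the minimum number of single-character insertions, deletions
# and substitutions needed to turn S into T (two-row dynamic programming).
levenshtein() {
    local s=$1 t=$2
    local m=${#s} n=${#t} i j cost del ins sub min
    local -a prev curr

    for (( j = 0; j <= n; j++ )); do prev[j]=$j; done

    for (( i = 1; i <= m; i++ )); do
        curr=("$i")
        for (( j = 1; j <= n; j++ )); do
            if [[ ${s:i-1:1} == "${t:j-1:1}" ]]; then cost=0; else cost=1; fi
            del=$(( prev[j] + 1 ))
            ins=$(( curr[j-1] + 1 ))
            sub=$(( prev[j-1] + cost ))
            min=$del
            if (( ins < min )); then min=$ins; fi
            if (( sub < min )); then min=$sub; fi
            curr[j]=$min
        done
        prev=("${curr[@]}")
    done
    echo "${prev[n]}"
}

# similarity_percent S T
# Prints 100 for identical strings, lower values for more edits.
similarity_percent() {
    local s=$1 t=$2 longest dist
    longest=$(( ${#s} > ${#t} ? ${#s} : ${#t} ))
    if (( longest == 0 )); then
        echo 100        # two empty strings are treated as identical
        return
    fi
    dist=$(levenshtein "$s" "$t")
    echo $(( (longest - dist) * 100 / longest ))
}

similarity_percent "Four score and seven years ago" "Four_score_and_seven_years_ago"
```

For the two sentences quoted earlier in the thread this prints 83, since only the five separators differ out of 30 characters. Pure bash is slow for this on long strings, so it is only practical for short inputs.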
|