LinuxQuestions.org

danielbmartin 04-26-2010 08:47 PM

Numerical encoding of text, by position
 
Hello.

I have a file of words and want to encode them in a numeric form, based on position. This is best explained by example:

PEOPLE ==> 123152

Reading left to right:
P was first encountered at position 1 so it is encoded as 1.
E was first encountered at position 2 so it is encoded as 2.
O was first encountered at position 3 so it is encoded as 3.
P (again) was first encountered at position 1 so it is encoded as 1.
L was first encountered at position 5 so it is encoded as 5.
E (again) was first encountered at position 2 so it is encoded as 2.

More examples:
SENSE => 12312
COMMITTEE => 123356688
POSITION => 12345428

I have done this encoding in REXX with the TRANSLATE function, but cannot figure out how to do it with a Linux command (or string of commands).

The desirable solution uses commands but not awk or Perl.

Thank you.

Daniel B. Martin
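
For reference, the rule described above (each letter is replaced by the 1-based position of its first occurrence in the word) can be sketched in plain bash using only parameter expansion. This is a minimal sketch, not the command-only answer being asked for: the function name encode_by_position is made up for illustration, and it assumes bash and a single word passed as an argument.

```shell
#!/usr/bin/env bash
# encode_by_position is a hypothetical helper name, not something from the thread.
encode_by_position() {
    local word=$1 out='' i ch prefix
    for (( i = 0; i < ${#word}; i++ )); do
        ch=${word:i:1}
        # Remove everything from the first occurrence of ch to the end;
        # what is left is the text that precedes that first occurrence.
        prefix=${word%%"$ch"*}
        # Its length plus one is the 1-based position of the first occurrence.
        out+=$(( ${#prefix} + 1 ))
    done
    printf '%s\n' "$out"
}

encode_by_position PEOPLE      # 123152
encode_by_position COMMITTEE   # 123356688
```

The positions are simply concatenated, as in the examples, so the output is only unambiguous for words of up to 9 letters.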

kurwongbah 04-27-2010 12:13 AM

Okay you asked for it, if I can't use perl or awk ;)

mystr="PEOPLE"; index=1; for (( i=0; i<${#mystr}; i++ )); do mychr=${mystr:$i:1}; if [ -z "${mychr//[a-zA-Z]}" ]; then mystr=`echo $mystr | tr ${mystr:$i:1} $index`; let index+=1; fi; done; echo $mystr

Have fun!
Jeroen

kurwongbah 04-27-2010 12:21 AM

Sorry cut and paste got me confused.
This does the trick.

mystr="PEOPLE"; for (( i=0; i<${#mystr}; i++ )); do let index=$i+1; mychr=${mystr:$i:1}; if [ -z "${mychr//[a-zA-Z]}" ]; then mystr=`echo $mystr | tr ${mystr:$i:1} $index`; fi; done; echo $mystr

grail 04-27-2010 02:07 AM

For those interested in an alternative:
Code:

echo "PEOPLE" | awk -F "" '{for(i=1;i<=NF;i++)if(!($i in _)){_[$i]=i;x=x""i}else x=x""_[$i]}END{print x}'

And for the bashites :)
Code:

var="PEOPLE";for x in $(seq 0 $((${#var}-1)));do echo -n $(expr index "$var" ${var:$x:1});done;echo

danielbmartin 04-28-2010 10:20 PM

Thank you all for the thought and suggestions. I've worked on this problem and made progress.

I want to avoid, if possible, a solution with explicit loops. I want to use, if possible, the tr command because it seems so similar to the REXX TRANSLATE built-in function. This is what I've got at present.

echo 'AARDVARK' | tr 'KRAVDRAA' '123456789abcdef' | tr '87654321' '123456789abcdef'

echo 'PEOPLE' | tr 'ELPOEP' '123456789abcdef' | tr '654321' '123456789abcdef'

Both examples generate the desired encoding. I'd like to generalize this solution to work for input words of any length. I've barely begun to learn about Regular Expressions, and think REs may be the key to a general solution. Ideas?

Daniel B. Martin
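
One possible way to generalize the two tr examples above without an explicit loop is to let rev build both "reversed" strings: the reversed word for the first tr, and the first N symbols in descending order for the second. The following is a sketch under several assumptions: the name encode_tr is invented here, rev (from util-linux) is available, tr lets the last duplicate in its first set win (the two examples above already depend on this; GNU tr behaves that way), and the word consists of at most 15 plain letters, since the symbol set 123456789abcdef has only 15 entries.

```shell
#!/usr/bin/env bash
# encode_tr is a hypothetical name. Assumes GNU tr (last duplicate in SET1 wins),
# rev from util-linux, and a word of at most 15 plain letters.
encode_tr() {
    local word=$1
    local symbols='123456789abcdef'   # same symbol set as in the examples above
    local n=${#word}
    local reversed_word descending

    # e.g. PEOPLE -> ELPOEP
    reversed_word=$(printf '%s\n' "$word" | rev)
    # first n symbols, reversed: e.g. n=6 -> 654321
    descending=$(printf '%s\n' "${symbols:0:n}" | rev)

    # First tr: each letter becomes the symbol of its LAST slot in the reversed
    # word, which corresponds to its FIRST slot in the original word.
    # Second tr: flip the reversed numbering back to left-to-right positions.
    printf '%s\n' "$word" | tr "$reversed_word" "$symbols" | tr "$descending" "$symbols"
}

encode_tr AARDVARK   # 11345138
encode_tr PEOPLE     # 123152
```

Characters that tr treats specially (such as - or \) would need escaping, and words longer than 15 letters would need a longer symbol set, so this is only a starting point.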

grail 04-29-2010 12:31 AM

Well I don't have a solution but I can see an issue with extending this.
If we break down the last two parts, as echo speaks for itself:

1. tr 'ELPOEP' '123456789abcdef' - the reversing is not too much of an issue, but what happens to words, rare as they may be, that are longer than the 15 characters in the replacement set?
2. tr '654321' '123456789abcdef' - firstly, this has the same word-length issue as above. Also, because the first string here depends on the length of the input word, I believe (but could well be wrong) that you will need some kind of loop to build it, and for lengths greater than 9 that loop will have to start pulling in letters of the alphabet as well.

Whilst this is expedient for the current small cases (and fine if you can guarantee the length won't be an issue), I believe some of the earlier offerings may be more prudent. I can see a problem with those too, though: the indexes, in mine for example, are simply printed one after another, so it would be hard to tell whether 11 means two letters at position 1 or a single letter at position 11.
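
One way around that ambiguity, if the output format is allowed to change, is to put a separator between the positions. Here is a rough sketch based on the expr index one-liner from earlier in the thread. The name encode_dashed and the choice of - as the separator are assumptions for illustration, and it assumes GNU expr plus uppercase words like the examples (a word that happens to be an expr keyword, such as index or length, would confuse expr).

```shell
#!/usr/bin/env bash
# encode_dashed is a hypothetical name; assumes GNU expr (for "index")
# and plain uppercase words as in the thread's examples.
encode_dashed() {
    local word=$1 out='' x pos
    for (( x = 0; x < ${#word}; x++ )); do
        # expr index prints the 1-based position of the first occurrence
        # of the given character in the word.
        pos=$(expr index "$word" "${word:x:1}")
        # Prepend a dash only when something has already been emitted.
        out+="${out:+-}$pos"
    done
    printf '%s\n' "$out"
}

encode_dashed PEOPLE         # 1-2-3-1-5-2
encode_dashed ABCDEFGHIJKA   # 1-2-3-4-5-6-7-8-9-10-11-1
```

With the dashes in place, 1-1 (two letters first seen at position 1) and 11 (one letter first seen at position 11) can no longer be confused.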

