LinuxQuestions.org
Old 04-26-2010, 08:47 PM   #1
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,101

Rep: Reputation: 288
Numerical encoding of text, by position


Hello.

I have a file of words and want to encode each word in numeric form, based on the position at which each letter first appears. This is best explained by example:

PEOPLE => 123152

Reading left to right:
P was first encountered at position 1 so it is encoded as 1.
E was first encountered at position 2 so it is encoded as 2.
O was first encountered at position 3 so it is encoded as 3.
P (again) was first encountered at position 1 so it is encoded as 1.
L was first encountered at position 5 so it is encoded as 5.
E (again) was first encountered at position 2 so it is encoded as 2.

More examples:
SENSE => 12312
COMMITTEE => 123356688
POSITION => 12345428

I have done this encoding in REXX with the TRANSLATE function, but cannot figure out how to do it with a Linux command (or string of commands).

The preferred solution uses standard commands, but not awk or Perl.

Thank you.

Daniel B. Martin
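
For checking candidate answers against the examples above, here is a minimal bash sketch of the encoding rule as stated. It uses an explicit loop and an associative array (bash 4 or later), so it is only a reference for the expected output, not the loop-free answer being asked for. The function name encode_word is hypothetical, and the input is assumed to consist of letters only.

```bash
# Reference sketch: encode each letter as the 1-based position of its
# first occurrence. encode_word is a made-up name; needs bash 4+ for
# associative arrays. Assumes the word contains letters only.
encode_word() {
    local word=$1 out='' ch i
    local -A first=()
    for (( i = 0; i < ${#word}; i++ )); do
        ch=${word:i:1}
        # Remember the position only the first time this letter is seen.
        if [[ -z ${first[$ch]:-} ]]; then
            first[$ch]=$(( i + 1 ))
        fi
        out+=${first[$ch]}
    done
    printf '%s\n' "$out"
}

encode_word PEOPLE      # prints 123152
encode_word COMMITTEE   # prints 123356688
```

Positions are simply concatenated, so a letter first seen at position 10 or later contributes two digits.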
 
Old 04-27-2010, 12:13 AM   #2
kurwongbah
Member
 
Registered: Apr 2010
Posts: 82

Rep: Reputation: 23
Okay, you asked for it, since I can't use Perl or awk:

mystr="PEOPLE"; index=1; for (( i=0; i<${#mystr}; i++ )); do mychr=${mystr:$i:1}; if [ -z "${mychr//[a-zA-Z]}" ]; then mystr=`echo $mystr | tr ${mystr:$i:1} $index`; let index+=1; fi; done; echo $mystr

Have fun!
Jeroen
 
Old 04-27-2010, 12:21 AM   #3
kurwongbah
Member
 
Registered: Apr 2010
Posts: 82

Rep: Reputation: 23
Sorry, cut and paste got me confused.
This does the trick:

mystr="PEOPLE"; for (( i=0; i<${#mystr}; i++ )); do index=$((i+1)); mychr=${mystr:$i:1}; if [ -z "${mychr//[a-zA-Z]}" ]; then mystr=$(echo "$mystr" | tr "$mychr" "$index"); fi; done; echo "$mystr"
 
Old 04-27-2010, 02:07 AM   #4
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,519

Rep: Reputation: 1896
For those interested in an alternative:
Code:
echo "PEOPLE" | awk -F "" '{x="";split("",seen);for(i=1;i<=NF;i++){if(!($i in seen))seen[$i]=i;x=x seen[$i]};print x}'
And for the bashites:
Code:
var="PEOPLE";for x in $(seq 0 $((${#var}-1)));do echo -n "$(expr index "$var" "${var:$x:1}")";done;echo

Last edited by grail; 04-27-2010 at 02:35 AM.
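
The expr index form above relies on the index operator, which is an extension found in GNU expr and is not part of POSIX, and it starts one process per letter. As a sketch of the same first-occurrence lookup using only bash parameter expansion (the function name first_positions is hypothetical, and letters-only input is assumed):

```bash
# Sketch: same result as the expr index loop, without external commands.
# first_positions is a made-up name; assumes letters-only input.
first_positions() {
    local word=$1 out='' ch prefix i
    for (( i = 0; i < ${#word}; i++ )); do
        ch=${word:i:1}
        # Strip everything from the first occurrence of ch onward;
        # what remains is the text before that first occurrence.
        prefix=${word%%"$ch"*}
        # The first occurrence sits one past the end of that prefix.
        out+=$(( ${#prefix} + 1 ))
    done
    printf '%s\n' "$out"
}

first_positions PEOPLE   # prints 123152
```

The quoted "$ch" inside the expansion keeps the letter from being treated as a glob pattern.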
 
Old 04-28-2010, 10:20 PM   #5
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,101

Original Poster
Rep: Reputation: 288
Thank you all for the thought and suggestions. I've worked on this problem and made progress.

I want to avoid, if possible, a solution with explicit loops. I want to use, if possible, the tr command because it seems so similar to the REXX TRANSLATE built-in function. This is what I've got at present.

echo 'AARDVARK' | tr 'KRAVDRAA' '123456789abcdef' | tr '87654321' '123456789abcdef'

echo 'PEOPLE' | tr 'ELPOEP' '123456789abcdef' | tr '654321' '123456789abcdef'

Both examples generate the desired encoding. I'd like to generalize this solution to work for input words of any length. I've barely begun to learn about regular expressions, and think they may be the key to a general solution. Ideas?

Daniel B. Martin
 
Old 04-29-2010, 12:31 AM   #6
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,519

Rep: Reputation: 1896
Well, I don't have a solution, but I can see an issue with extending this.
If we break down the last two parts (echo speaks for itself):

1. tr 'ELPOEP' '123456789abcdef' - the reversing is not too much of an issue, but what happens to words, albeit rare, that are longer than the 15 characters in the replacement string?
2. tr '654321' '123456789abcdef' - first, there is the same issue as above with the length of the word. But as the first string here depends on the length of the initial string, I believe (but could well be wrong) that you will need some kind of loop to create it. And for lengths greater than 9, the loop will also need to start using letters of the alphabet.

This is expedient for the current small scenarios, and if you can guarantee that the length won't be an issue then it is fine, but I believe some of the earlier offerings may be more prudent. That said, I can see an issue with those too: the indexes, in mine for example, are simply concatenated, so it would be hard to tell whether 11 means two letters at position 1 or a single letter at position 11.
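
One way to generalize the two tr calls from post #5 without a loop, sketched under some assumptions: build both helper strings from the word itself, using rev for the reversals and a substring of a fixed symbol alphabet cut to the word's length. Because every position maps to exactly one symbol (1-9, then a-z), position 11 shows up as b and cannot be confused with two 1s. The function name encode_tr and the 35-symbol alphabet are assumptions for illustration. The approach relies on tr letting the last duplicate in its first set win, as the examples in post #5 already do, and on input of at most 35 characters containing letters only.

```bash
# Sketch: loop-free generalization of the double-tr trick.
# encode_tr and the symbol alphabet are illustrative assumptions.
encode_tr() {
    local word=$1
    local symbols='123456789abcdefghijklmnopqrstuvwxyz'
    # One symbol per position, cut to the length of the word.
    local forward=${symbols:0:${#word}}
    local reversed_word reversed_symbols
    reversed_word=$(printf '%s\n' "$word" | rev)
    reversed_symbols=$(printf '%s\n' "$forward" | rev)
    # First tr: each letter gets the symbol of its LAST slot in the
    # reversed word, which corresponds to its FIRST slot in the word,
    # counted from the right. Second tr flips the count to the left.
    printf '%s\n' "$word" |
        tr "$reversed_word" "$forward" |
        tr "$reversed_symbols" "$forward"
}

encode_tr PEOPLE     # prints 123152
encode_tr AARDVARK   # prints 11345138
```

To process a file of words, the function can be called once per line, for example from xargs or a while read loop; the per-word string building is what removes the hand-written '654321' argument.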
 
Tags
encode, translate

