02-27-2013, 05:00 AM   #1
Cheah Boon Huat
LQ Newbie
Registered: Feb 2013
Posts: 10

Identify additional or missing characters at the ends of paired strings

Hi,
The following is an example of a file I have on Ubuntu. Column 2 contains the reference strings against which the strings in column 1 are compared. Each comparison is between the two strings in the same row, and the symbols that mark additional or missing characters may only be placed on the string in column 1.

Note:
1) . (a full stop) denotes one missing character at an end of the string.
2) .. (two full stops) denote two missing characters at an end of the string, and so on.
3) ' ' (a pair of apostrophes) encloses additional character(s) at an end of the string.
4) Additional and missing characters occur only at the ends of the strings (start or finish), never in the middle.
5) Number of additional/missing characters (in my actual file): 0 to 10
6) Length of the strings in my actual file: around 20

input:
Code:
Column 1     Column 2
PETER        PETER
PETER        PETERAB
PETER        PETERABC
JOHN         ABJOHN
JOHN         ABCJOHN
JOHNSON      JOHN
JOHNSON      JOH
JOHN         OHN
JOHN         HN     
JOHNSON      ABJOHN
ABJOHN       JOHNSON
Expected output
Code:
Column 1     Column 2
PETER        PETER
PETER..      PETERAB
PETER...     PETERABC
..JOHN       ABJOHN
...JOHN      ABCJOHN
JOHN'SON'    JOHN
JOH'NSON'    JOH
'J'OHN       OHN
'JO'HN       HN     
..JOHN'SON'  ABJOHN
'AB'JOHN...  JOHNSON
Can anyone provide a script to do this data processing? Thank you very much.
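
The notation rules and the expected output above can be sketched in Python. This is only an illustration, not something posted in the thread: the function name `annotate` is hypothetical, and the sketch assumes that the shared core of each pair is its longest common substring (found here with the standard library's `difflib.SequenceMatcher`), that dots go before quoted extras at the start of the string and after them at the finish, and that a pair with nothing in common is left unmarked.

```python
from difflib import SequenceMatcher


def annotate(candidate: str, reference: str) -> str:
    """Mark up `candidate` (column 1) against `reference` (column 2)."""
    matcher = SequenceMatcher(None, candidate, reference, autojunk=False)
    # Longest contiguous block shared by both strings.
    m = matcher.find_longest_match(0, len(candidate), 0, len(reference))
    if m.size == 0:
        # Assumption: nothing in common, so return the string unmarked.
        return candidate

    core = candidate[m.a:m.a + m.size]
    extra_head = candidate[:m.a]            # in column 1 only, before the core
    extra_tail = candidate[m.a + m.size:]   # in column 1 only, after the core
    missing_head = m.b                      # reference chars before the core
    missing_tail = len(reference) - (m.b + m.size)  # reference chars after it

    def quote(s: str) -> str:
        return f"'{s}'" if s else ""

    return ("." * missing_head + quote(extra_head) + core
            + quote(extra_tail) + "." * missing_tail)


rows = [("PETER", "PETERAB"), ("JOHNSON", "ABJOHN"), ("ABJOHN", "JOHNSON")]
for col1, col2 in rows:
    print(f"{annotate(col1, col2):<13}{col2}")
# PETER..      PETERAB
# ..JOHN'SON'  ABJOHN
# 'AB'JOHN...  JOHNSON
```

When several substrings tie for the longest length, `find_longest_match` returns the one that starts earliest in the first string, so this sketch is only reliable when the shared core is unambiguous.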
 
02-27-2013, 06:12 PM   #2
psionl0
Member
Registered: Jan 2011
Distribution: slackware_64 14.1
Posts: 722
That sounds like something that would be better handled with a programming language that has a good range of string handling functions rather than a script.

What sort of output would you expect from comparing POP with PREPPER?
 
02-27-2013, 07:54 PM   #3
Cheah Boon Huat
LQ Newbie
Registered: Feb 2013
Posts: 10
Original Poster
Hi psionl0,
Thanks for the very good example, which helps me sharpen my question. For your information, all the strings in my actual file are "biological" strings, so the paired strings share a high number of identical characters. As stated in my original post, the strings in my actual file are around 20 characters long. The "expected" degree of similarity between the paired strings is 16 identical characters; by "expected" I mean what I can confidently say about my actual file. Thank you again.
 
02-27-2013, 09:28 PM   #4
psionl0
Member
Registered: Jan 2011
Distribution: slackware_64 14.1
Posts: 722
It sounds like you want a function that finds the longest substring common to both strings, and that you can assume only one substring of that length exists.
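
Whether only one substring of that length exists can be checked rather than assumed. The following is a minimal sketch, not code from the thread: the function name `longest_common_alignments` is hypothetical, and it uses the textbook dynamic-programming approach to the longest common substring, recording every position pair at which a match of the maximum length starts.

```python
from typing import List, Tuple


def longest_common_alignments(a: str, b: str) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (length, starts), where `starts` lists every (index in a,
    index in b) at which a longest common substring begins."""
    best = 0
    starts: List[Tuple[int, int]] = []
    # prev[j] = length of the common run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                run = prev[j - 1] + 1
                curr[j] = run
                if run > best:
                    # New maximum: discard shorter alignments.
                    best = run
                    starts = [(i - run, j - run)]
                elif run == best:
                    # Tie: another alignment of the same length.
                    starts.append((i - run, j - run))
        prev = curr
    return best, starts


print(longest_common_alignments("JOHNSON", "ABJOHN"))
# (4, [(0, 2)])
length, starts = longest_common_alignments("POP", "PREPPER")
print(length, len(starts))
# 1 6
```

JOHNSON and ABJOHN share a single alignment of length 4, so the markup is well defined. POP and PREPPER share only single characters, in six different alignments, which is why that pair has no obvious answer.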
 
02-28-2013, 01:01 AM   #5
Cheah Boon Huat
LQ Newbie
Registered: Feb 2013
Posts: 10
Original Poster
Exactly. Thank you.
 
02-28-2013, 02:11 AM   #6
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007
So if you are requesting assistance, please show what you have tried and where you are stuck.
 
02-28-2013, 03:11 AM   #7
Cheah Boon Huat
LQ Newbie
Registered: Feb 2013
Posts: 10
Original Poster
Sorry grail,
I haven't picked up any programming language yet. In that case, I will ask for assistance another time.
 
  

