LinuxQuestions.org
Old 01-07-2015, 07:13 AM   #1
stf92
Senior Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware
Posts: 4,442

Rep: Reputation: 76
Pattern matching and replacement in a character stream.


Hi: I have a plain ASCII text file full of dates, each given as a three-digit year. I would like to subtract 753 from each of those years and substitute the result for the original. That is, where the text has '204', I would replace it with '-549', which is 204 - 753. One approach would be to read the file one character at a time, searching for three-digit sequences and producing an output file with the transformed input, using a three-character FIFO stack. Is there not a more straightforward way, using Linux commands?
 
Old 01-07-2015, 07:34 AM   #2
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
How is the year recognized? By position in the line? By being exactly three numbers surrounded by non-numerical characters?

Help us to help you. Provide a sample input file (10-15 lines will do). Construct a sample output file which corresponds to your sample input and post both samples here. With "InFile" and "OutFile" examples we can better understand your needs and also judge if our proposed solution fills those needs.

Daniel B. Martin
 
Old 01-07-2015, 08:13 AM   #3
stf92
Senior Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware
Posts: 4,442

Original Poster
Rep: Reputation: 76
Quote:
Originally Posted by danielbmartin
How is the year recognized? By position in the line? By being exactly three numbers surrounded by non-numerical characters?
By being exactly three numbers surrounded by non-numerical characters. Input file:
Quote:
powers contending for the possession of the island as only a temporary
accommodation; on both sides the rivals were ever renewing their
attempts to dispossess each other. Four several times--in 360 in the
time of Dionysius the elder; in 410 in that of Timoleon; in 445 in
that of Agathocles; in 476 in that of Pyrrhus--the Carthaginians were
masters of all Sicily excepting Syracuse, and were baffled by its
solid walls; almost as often the Syracusans, under able leaders, such
Output file:
Quote:
powers contending for the possession of the island as only a temporary
accommodation; on both sides the rivals were ever renewing their
attempts to dispossess each other. Four several times--in -393 in the
time of Dionysius the elder; in -343 in that of Timoleon; in -308 in
that of Agathocles; in -277 in that of Pyrrhus--the Carthaginians were
masters of all Sicily excepting Syracuse, and were baffled by its
solid walls; almost as often the Syracusans, under able leaders, such
 
Old 01-07-2015, 08:27 AM   #4
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,863
Blog Entries: 1

Rep: Reputation: 1869
Homework assignment? If so, bear in mind that there was no year 0: 1 BC was followed directly by AD 1.
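
To make the "no year 0" point concrete, here is a small Ruby sketch. It assumes that AUC 1 corresponds to 753 BC, so that AUC 753 is 1 BC and AUC 754 is AD 1; the helper name `auc_to_era` is made up for this illustration and is not from any library.

```ruby
# Hypothetical helper: convert a year ab urbe condita (AUC) to a BC/AD label.
# Assumption: AUC 1 = 753 BC, hence AUC 753 = 1 BC and AUC 754 = AD 1.
def auc_to_era(auc)
  offset = auc - 753            # the plain subtraction from the question
  if offset >= 1
    "AD #{offset}"              # 754 -> AD 1
  else
    "#{1 - offset} BC"          # 753 -> 1 BC, because there is no year 0
  end
end

[360, 753, 754].each { |year| puts "#{year} AUC = #{auc_to_era(year)}" }
```

Under that assumption the plain result 360 - 753 = -393 reads as 394 BC, so a bare negative number is one less, in absolute value, than the BC year.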
 
Old 01-07-2015, 08:30 AM   #5
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,846

Rep: Reputation: 7309
If every occurrence of three consecutive digits is a year and you always need to subtract 753 (no exceptions), and the three digits are never split across two lines, you can try a perl one-liner:
perl -ne 's/\d\d\d/$&-753/eg; print $_' filename
 
1 members found this post helpful.
Old 01-07-2015, 08:51 AM   #6
stf92
Senior Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware
Posts: 4,442

Original Poster
Rep: Reputation: 76
That worked fine! But what if I want to preserve the original year, say enclosed in curly brackets?
 
Old 01-07-2015, 09:04 AM   #7
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,846

Rep: Reputation: 7309
Is this your homework?
The general form you need is perl -ne 's/<search expression>/<substitute expression>/eg; print $_' filename.
My previous tip did not check the surrounding non-digit characters.
Here is an improved solution:
s/(\s)(\d\d\d)(\s)/$1.($2-753)."{$2}".$3/eg
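
The pattern above requires whitespace on both sides, so a year followed by punctuation (for example "476;") would be left alone. One possible way to accept any non-digit neighbour is to use lookarounds. The following Ruby sketch shows the idea; the helper name `annotate_years` and its `offset` parameter are made up for this illustration.

```ruby
# Hypothetical helper: rewrite every run of exactly three digits that has no
# digit directly before or after it, keeping the original year in braces.
def annotate_years(text, offset = 753)
  # (?<!\d) and (?!\d) only look at the neighbours; they consume nothing,
  # so two years separated by a single character are both matched.
  text.gsub(/(?<!\d)\d{3}(?!\d)/) { |year| "#{year.to_i - offset} {#{year}}" }
end

puts annotate_years("times--in 360 in the elder; in 410; (476) 1234")
```

Four-digit numbers such as 1234 are left untouched, because each three-digit window inside them has a digit on one side.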
 
1 members found this post helpful.
Old 01-07-2015, 09:19 AM   #8
stf92
Senior Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware
Posts: 4,442

Original Poster
Rep: Reputation: 76
That does a much nicer job, thanks indeed. I think I'll read some primer on perl, though it would be nice to do the same with more traditional Unix tools, perhaps sed combined with some other commands. To answer your question, it is no homework at all. I am reading Mommsen's History of Rome, where all dates are given in years ab urbe condita, so to get the year as we would write it nowadays I must subtract 753. Ab urbe condita years are counted from the (mythical) foundation of Rome, 753 BC.

I have thought of something like this before posting:
Code:
char stack[3]; /* this is a FIFO stack */

char push() {
   /* Reads the next input character, pushes it into
      stack and returns the popped (oldest) element */
}

   push;
   push;
   push;
loop:
   old_c = push;
   write old_c;      /* the popped character always goes to the output */
   if all elements in stack are digits
       write the transformed year;
       push;         /* discard the three digits and refill the stack */
       push;
       push;
   goto loop;
But I knew there should be a more direct way, as indeed there was.

Last edited by stf92; 01-07-2015 at 09:27 AM.
 
Old 01-07-2015, 09:25 AM   #9
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
This would identify three digits in a row:
Code:
[0-9][0-9][0-9]

Last edited by schneidz; 01-07-2015 at 09:27 AM.
 
Old 01-07-2015, 09:34 AM   #10
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,846

Rep: Reputation: 7309
Glad to help.
If you really want to say thanks, just press YES.
stack[3] is not enough if you want to check the delimiters too: you need the three digits plus the character on each side, so five characters.
In C you can use isdigit() to check the characters (including the non-digit delimiters), and the replacement string is easy to construct, so it looks feasible.
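
As a rough sketch of that character-by-character approach, here is one possible version in Ruby rather than C, where the comparison against "0" and "9" plays the role of isdigit(). Instead of a fixed five-character buffer it collects each run of digits and checks its length once the run ends, which takes care of the delimiters implicitly. The helper name `scan_years` is made up for this illustration.

```ruby
# Hypothetical helper: character-by-character scan, no regular expressions.
def scan_years(text, offset = 753)
  out = +""   # output built so far
  run = +""   # digits collected since the last non-digit character
  flush = lambda do
    # A run of exactly three digits is a year; any other run is copied as is.
    out << (run.length == 3 ? "#{run.to_i - offset} {#{run}}" : run)
    run = +""
  end
  text.each_char do |ch|
    if ch >= "0" && ch <= "9"   # stands in for isdigit()
      run << ch
    else
      flush.call                # a non-digit ends the current run
      out << ch
    end
  end
  flush.call                    # the input may end in the middle of a run
  out
end

puts scan_years("in 445 in that of Agathocles; in 476 in that of Pyrrhus")
```

A C version would follow the same shape: a small digit buffer, a counter, and a flush step run on every non-digit and at end of input.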
 
1 members found this post helpful.
Old 01-07-2015, 09:50 AM   #11
stf92
Senior Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware
Posts: 4,442

Original Poster
Rep: Reputation: 76
pan64, I had already pressed YES, but if you want one more I'll gladly give you another. isdigit()? Great! I'll try to do it in C too, given it's the language I am least ignorant of. Goodbye and regards.

@schneidz: I guess non-digit chars would be something like [A-z] or [@-z].
 
Old 01-07-2015, 10:14 AM   #12
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Quote:
I think I'll read some primer on perl, though it would be nice to do the same with more traditional Unix tools
Some might take offence at calling perl a non-traditional tool. I'm not sure I have ever used a distro without perl installed as standard.

As a non-standard solution, here is a little ruby:
Code:
ruby -pe '$_.gsub!(/\b\d{3}\b/){|m| "#{m.to_i - 753} {#{m}}" }' file
I would also mention that perl has a '-p' option too, so you could use it in place of '-n' and drop the final print altogether.
 
Old 01-07-2015, 12:11 PM   #13
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by grail
As a non-standard solution, here is a little ruby:
Code:
ruby -pe '$_.gsub!(/\b\d{3}\b/){|m| "#{m.to_i - 753} {#{m}}" }' file
This ruby solution is admirably concise.

For comparison, this is an awk solution ...

With this InFile ...
Code:
powers contending for the possession of the island as only a temporary
accommodation; on both sides the rivals were ever renewing their
attempts to dispossess each other. Four several times--in 360 in the
time of Dionysius the elder; in 410 in that of Timoleon; in 445 in
that of Agathocles; in 476 in that of Pyrrhus--the Carthaginians were
masters of all Sicily excepting Syracuse, and were baffled by its
solid walls; almost as often the Syracusans, under able leaders, such
... this awk ...
Code:
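awk '{for (j=1;j<=length($0)-2;j++)   # -2 so a year at the very end of a line is seen
  {if (substr($0,j,3) ~ /[0-9][0-9][0-9]/)
   {old=substr($0,j,3); yr=old-753; if (yr<1) yr--;   # there is no year 0
    rep=yr " {" old "}";
    $0=substr($0,1,j-1) rep substr($0,j+3);
    j+=length(rep)-1}} print}' "$InFile" >"$OutFile"   # skip past the inserted text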
... produced this OutFile ...
Code:
powers contending for the possession of the island as only a temporary
accommodation; on both sides the rivals were ever renewing their
attempts to dispossess each other. Four several times--in -394 {360} in the
time of Dionysius the elder; in -344 {410} in that of Timoleon; in -309 {445} in
that of Agathocles; in -278 {476} in that of Pyrrhus--the Carthaginians were
masters of all Sicily excepting Syracuse, and were baffled by its
solid walls; almost as often the Syracusans, under able leaders, such
Daniel B. Martin
 
Old 01-08-2015, 01:06 AM   #14
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
This is probably a little simpler:
Code:
awk '{for(i=1;i<=NF;i++)if($i + 0 > 0)$i = $i + 0 - 753 " {"$i"}"}1' file
 
1 members found this post helpful.
Old 01-08-2015, 01:59 AM   #15
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,846

Rep: Reputation: 7309
grail, I would add a length check: if ( (length($i) == 3) && ($i + 0 > 0) ) $i = $i - 753. Otherwise, a nice solution.
(The second +0 is not necessary.)

stf92, non-digit chars are: ! isdigit()
 
1 members found this post helpful.
  

