[SOLVED] Pattern matching and replacement in a character stream.
Hi: I have a plain-text ASCII file which is full of dates, each consisting of a three-digit year. I would like to subtract 753 from each of those dates and substitute the result for the original date. That is, if the text contains '204', I would replace it with '-549', which is 204 - 753. One approach could be to read the file one character at a time, searching for three-digit sequences and producing an output file with the transformed input, using a three-character-long FIFO stack. Is there not a more straightforward way, using Linux commands?
How is the year recognized? By position in the line? By being exactly three numbers surrounded by non-numerical characters?
Help us to help you. Provide a sample input file (10-15 lines will do). Construct a sample output file which corresponds to your sample input and post both samples here. With "InFile" and "OutFile" examples we can better understand your needs and also judge if our proposed solution fills those needs.
By being exactly three digits surrounded by non-numerical characters.
Input file:
Quote:
powers contending for the possession of the island as only a temporary
accommodation; on both sides the rivals were ever renewing their
attempts to dispossess each other. Four several times--in 360 in the
time of Dionysius the elder; in 410 in that of Timoleon; in 445 in
that of Agathocles; in 476 in that of Pyrrhus--the Carthaginians were
masters of all Sicily excepting Syracuse, and were baffled by its
solid walls; almost as often the Syracusans, under able leaders, such
Output file:
Quote:
powers contending for the possession of the island as only a temporary
accommodation; on both sides the rivals were ever renewing their
attempts to dispossess each other. Four several times--in -393 in the
time of Dionysius the elder; in -343 in that of Timoleon; in -308 in
that of Agathocles; in -277 in that of Pyrrhus--the Carthaginians were
masters of all Sicily excepting Syracuse, and were baffled by its
solid walls; almost as often the Syracusans, under able leaders, such
If every occurrence of three consecutive digits is a year and you need to subtract 753 from each one (so there are no exceptions), you can try a Perl one-liner, provided the three digits are never split across two lines:
perl -ne 's/\d\d\d/$&-753/eg; print $_' filename
is this your homework?
perl -ne 's/<search expression>/<substitute expression>/eg; print $_' filename is the general form you need to use.
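In my previous tip the surrounding non-digit chars were not checked.
Here is an improved substitution, which requires whitespace on both sides of the three digits and appends the original year in braces: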
s/(\s)(\d\d\d)(\s)/$1.($2-753)."{$2}".$3/eg
That does a much nicer job, thanks indeed. I think I'll read a primer on Perl, though it would be nice to do the same with more traditional Unix tools, perhaps sed combined with some other commands. To answer your question, it is not homework at all. I am reading Mommsen's History of Rome, where all dates are given in years ab urbe condita, so to get the year as we would write it nowadays I must subtract 753. Ab urbe condita years are counted from the (mythical) foundation of Rome in 753 BC.
I have thought of something like this before posting:
Code:
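#include <ctype.h>
#include <stdio.h>

/* Three-character FIFO over stdin. Delimiters are not checked, so any
   three consecutive digits are treated as a year. */
int main(void)
{
    int stack[3];   /* the FIFO; stack[0] is the oldest character */
    int n = 0;      /* number of characters currently held */
    int c;

    while ((c = getchar()) != EOF) {
        stack[n++] = c;                     /* push the next input character */
        if (n < 3)
            continue;
        if (isdigit(stack[0]) && isdigit(stack[1]) && isdigit(stack[2])) {
            /* write the transformed year and start again with an empty FIFO */
            printf("%d", (stack[0] - '0') * 100 + (stack[1] - '0') * 10
                         + (stack[2] - '0') - 753);
            n = 0;
        } else {
            putchar(stack[0]);              /* pop and write the oldest character */
            stack[0] = stack[1];
            stack[1] = stack[2];
            n = 2;
        }
    }
    for (int i = 0; i < n; i++)             /* flush what is left at end of input */
        putchar(stack[i]);
    return 0;
}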
But I knew there should be a more direct way, as effectively there was.
glad to help you
if you really want to say thanks just press YES.
stack[3] is not enough if you want to check delimiters too.
In C you can use isdigit() to check characters (including the non-digit delimiters); that is easy, and you can also easily construct the replacement string, so it looks feasible.
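For what it's worth, here is a minimal sketch of that isdigit() idea in C. It is only one way to do it, not a definitive implementation. The function name shift_years and its parameters are made up for this example, and it assumes the text is handed over as a NUL-terminated string (for instance one line at a time), so a year split across two lines would not be recognized. Instead of a fixed three-character buffer it measures each whole run of digits, so only runs of exactly three digits are replaced, and the characters on both sides are non-digits by construction.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy `in` to `out`, replacing every run of exactly
   three digits by (value - offset). Digit runs of any other length are
   copied unchanged. Returns the number of characters written (without the
   terminating NUL), or -1 if `out` is too small. */
int shift_years(const char *in, char *out, size_t outsize, int offset)
{
    size_t i = 0;   /* read position in `in` */
    size_t o = 0;   /* write position in `out` */

    assert(in != NULL && out != NULL);

    while (in[i] != '\0') {
        /* Measure the digit run starting at i (0 if in[i] is not a digit). */
        size_t run = 0;
        while (isdigit((unsigned char)in[i + run]))
            run++;

        if (run == 3) {
            int year = (in[i] - '0') * 100 + (in[i + 1] - '0') * 10
                     + (in[i + 2] - '0');
            int n = snprintf(out + o, outsize - o, "%d", year - offset);
            if (n < 0 || (size_t)n >= outsize - o)
                return -1;              /* replacement does not fit */
            o += (size_t)n;
            i += 3;
        } else {
            if (run == 0)
                run = 1;                /* a single non-digit character */
            if (o + run >= outsize)
                return -1;              /* no room left for text plus NUL */
            memcpy(out + o, in + i, run);
            o += run;
            i += run;
        }
    }
    if (o >= outsize)
        return -1;
    out[o] = '\0';
    return (int)o;
}
```

Called with an offset of 753 on the string "in 360 in", it fills the output buffer with "in -393 in"; a four-digit number such as 1234 is left alone. Wrapping it in a loop around fgets() and fputs() would turn it into a filter, with the caveat that the output buffer must be a little larger than the input line, because "-393" is one character longer than "360".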
pan64, I already had pressed YES, but if you want one more I'll gladly give you another one. Isdigit? Great! I'll try to do it in C too, given it's the language I ignore the least. Good bye and regards.
@schneidz: I guess non digit chars would be something like [A-z] or [@-z].
powers contending for the possession of the island as only a temporary
accommodation; on both sides the rivals were ever renewing their
attempts to dispossess each other. Four several times--in 360 in the
time of Dionysius the elder; in 410 in that of Timoleon; in 445 in
that of Agathocles; in 476 in that of Pyrrhus--the Carthaginians were
masters of all Sicily excepting Syracuse, and were baffled by its
solid walls; almost as often the Syracusans, under able leaders, such
powers contending for the possession of the island as only a temporary
accommodation; on both sides the rivals were ever renewing their
attempts to dispossess each other. Four several times--in -394 {360} in the
time of Dionysius the elder; in -344 {410} in that of Timoleon; in -309 {445} in
that of Agathocles; in -278 {476} in that of Pyrrhus--the Carthaginians were
masters of all Sicily excepting Syracuse, and were baffled by its
solid walls; almost as often the Syracusans, under able leaders, such