LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
05-21-2013, 05:41 PM   #1
Mr. Alex
Regexp script needed, please help


Hello people.
Please help me make a simple regexp script. There's a CSV file with lines of text. I need a script to find all "new line" breaks in this text file and change those that are not followed by at least three digits in a row to "<br>". I guess the script is one line and very simple; I just don't have the time or a real need to learn sed regexps or any other regexps.
 
05-21-2013, 06:08 PM   #2
unSpawn
Quote:
Originally Posted by Mr. Alex
Please help me make a simple regexp script.
Do make an effort to post something we can actually help you with, I'd say.
 
05-21-2013, 06:25 PM   #3
danielbmartin
Quote:
Originally Posted by Mr. Alex
Hello people.
Please help me make a simple regexp script. There's a CSV file with lines of text. I need a script to find all "new line" breaks in this text file and change those that are not followed by at least three digits in a row to "<br>". I guess the script is one line and very simple; I just don't have the time or a real need to learn sed regexps or any other regexps.
Help us to help you. Provide a sample input file (10-15 lines will do). Construct a sample output file which corresponds to your sample input and post both samples here. With "Before and After" examples we can better understand your needs and also judge if our proposed solution fills those needs.

Daniel B. Martin
 
05-22-2013, 01:41 AM   #4
grail
I am with the above ... without an example it would prove difficult to help.

Also, you mention <br>, which would seem to imply HTML; if so, this would be useful information.
 
05-22-2013, 03:06 AM   #5
pan64
Not to mention that you can find a simple CSV regexp on the net: http://dotnetslackers.com/Regex/re-7...commas_yo.aspx
 
05-23-2013, 09:22 AM   #6
Mr. Alex
Quote:
Also, you mention <br>, which would seem to imply HTML; if so, this would be useful information.
It is for saving as HTML, but the source files are CSV.

Quote:
Originally Posted by danielbmartin
Help us to help you. Provide a sample input file (10-15 lines will do). Construct a sample output file which corresponds to your sample input and post both samples here. With "Before and After" examples we can better understand your needs and also judge if our proposed solution fills those needs.

Daniel B. Martin
Code:
=== before ===

1234;aaa;bbb;ccc;ddd
6352;jncasdu;qweyghq;cnhjab;
234;ajbsndc;acyue;aciusdh;


ksjdhcaiuh,

asdnclius
3478;sadcbh;wdoehuiq;csanld
1290;asdhuoci;qdbhejkw;

1. cshdb
2. huscdio
3. csdhuioa
4. wduqhie
2763;chsduaio;wdebhk;scdhuao

=== after ===

1234;aaa;bbb;ccc;ddd
6352;jncasdu;qweyghq;cnhjab;
234;ajbsndc;acyue;aciusdh;<br><br><br>ksjdhcaiuh,<br><br>asdnclius
3478;sadcbh;wdoehuiq;csanld
1290;asdhuoci;qdbhejkw;<br><br>1. cshdb<br>2. huscdio<br>3. csdhuioa<br>4. wduqhie
2763;chsduaio;wdebhk;scdhuao
Quote:
Originally Posted by pan64
Not to mention that you can find a simple CSV regexp on the net: http://dotnetslackers.com/Regex/re-7...commas_yo.aspx
I don't need to split the CSV file.
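The before/after samples above can double as a quick regression check for any proposed command. A minimal sketch, assuming a POSIX shell with diff; the file names before.csv and after.html and the helper check_solution are hypothetical, not something from this thread:

```shell
#!/bin/sh
# Recreate the "before" sample from the post above (hypothetical file name).
cat > before.csv <<'EOF'
1234;aaa;bbb;ccc;ddd
6352;jncasdu;qweyghq;cnhjab;
234;ajbsndc;acyue;aciusdh;


ksjdhcaiuh,

asdnclius
3478;sadcbh;wdoehuiq;csanld
1290;asdhuoci;qdbhejkw;

1. cshdb
2. huscdio
3. csdhuioa
4. wduqhie
2763;chsduaio;wdebhk;scdhuao
EOF

# The expected "after" output from the same post (hypothetical file name).
cat > after.html <<'EOF'
1234;aaa;bbb;ccc;ddd
6352;jncasdu;qweyghq;cnhjab;
234;ajbsndc;acyue;aciusdh;<br><br><br>ksjdhcaiuh,<br><br>asdnclius
3478;sadcbh;wdoehuiq;csanld
1290;asdhuoci;qdbhejkw;<br><br>1. cshdb<br>2. huscdio<br>3. csdhuioa<br>4. wduqhie
2763;chsduaio;wdebhk;scdhuao
EOF

# check_solution CMD [ARGS...]: run the command with before.csv on stdin,
# show a unified diff against after.html, and succeed only on an exact match.
check_solution() {
    if "$@" < before.csv | diff -u after.html -; then
        echo "PASS: $*"
    else
        echo "FAIL: $*"
        return 1
    fi
}
```

A candidate that reads standard input is passed as the arguments; for example, check_solution cat should report FAIL, since it leaves every newline in place.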
 
05-23-2013, 10:10 AM   #7
danielbmartin
With this InFile ...
Code:
1234;aaa;bbb;ccc;ddd
6352;jncasdu;qweyghq;cnhjab;
234;ajbsndc;acyue;aciusdh;


ksjdhcaiuh,

asdnclius
3478;sadcbh;wdoehuiq;csanld
1290;asdhuoci;qdbhejkw;

1. cshdb
2. huscdio
3. csdhuioa
4. wduqhie
2763;chsduaio;wdebhk;scdhuao
... this brute-force code ...
Code:
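# Prefix each line that begins with three digits with a "~" marker,
# join all lines into one using "%" as the separator, strip the marker
# left at the very start, then turn each "%~" back into a newline and
# every remaining "%" into <br>. Assumes the data contains no "~" or "%".
 sed 's/\(^[0-9][0-9][0-9]\)/~\1/' "$InFile"  \
|paste -sd'%' -   \
|sed 's/^~//'     \
|sed 's/%~/\n/g'  \
|sed 's/%/<br>/g' \
> "$OutFile"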
... produced this OutFile ...
Code:
1234;aaa;bbb;ccc;ddd
6352;jncasdu;qweyghq;cnhjab;
234;ajbsndc;acyue;aciusdh;<br><br><br>ksjdhcaiuh,<br><br>asdnclius
3478;sadcbh;wdoehuiq;csanld
1290;asdhuoci;qdbhejkw;<br><br>1. cshdb<br>2. huscdio<br>3. csdhuioa<br>4. wduqhie
2763;chsduaio;wdebhk;scdhuao
Daniel B. Martin
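To see why the sentinel trick in the pipeline above works, it can help to run the stages on a tiny made-up input and look at the intermediate line. One portability point, as far as I know: POSIX leaves "\n" in a sed replacement unspecified; GNU sed turns it into a newline, but some other seds (for example older BSD/macOS sed) may not. The sketch below therefore writes the newline as a backslash followed by a literal newline, and uses "&" (the whole match) in place of the equivalent \(...\)\1 group. The input text and variable names are illustrative only:

```shell
#!/bin/sh
# Made-up input: two record lines, with a continuation line and a blank
# line between them.
# Stages 1-2: mark lines starting with three digits with "~", then join
# every line into one, using "%" as the separator.
marked=$(printf '123;a\nfoo\n\n456;b\n' \
    | sed 's/^[0-9][0-9][0-9]/~&/' \
    | paste -sd'%' -)
echo "$marked"    # prints ~123;a%foo%%~456;b

# Stages 3-5: drop the leading "~", turn each "%~" back into a real newline
# (backslash + newline in the replacement), then every other "%" into <br>.
result=$(printf '%s\n' "$marked" \
    | sed 's/^~//' \
    | sed 's/%~/\
/g' \
    | sed 's/%/<br>/g')
echo "$result"
```

The last echo prints 123;a<br>foo<br> on one line and 456;b on the next: the two newlines before non-numeric lines became <br>, and the one before 456;b survived.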
 
05-23-2013, 10:22 AM   #8
grail
Maybe something like:
Code:
awk 'NR > 1{ORS = /^[0-9]+;/?RT:"<br>"; print x}{x=$0}END{print x}' file
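How the one-liner above works, as I read it: each line is held in x and printed one record late, so by the time it is printed the following line is already known. ORS is set to RT (in gawk, the text that terminated the current record, normally the newline) when the following line starts with digits and a semicolon, and to <br> otherwise. RT is a gawk extension, and if I follow the logic correctly, when the last line does not match, END prints it with the leftover <br> and no final newline. A portable sketch for other awks (mawk, busybox awk, older gawk), following the original request literally (three digits, no semicolon required) and spelling the test as [0-9][0-9][0-9] so no interval expression is needed; br_join is a hypothetical name:

```shell
#!/bin/sh
# br_join (hypothetical name): read lines on stdin; keep a real newline only
# before a line that starts with three digits, otherwise join with "<br>".
# Portable sketch: no gawk-only RT and no {3,} interval expression.
br_join() {
    awk '
        NR > 1 {
            # Choose the separator for the PREVIOUS line by looking at this one.
            printf "%s%s", prev, (/^[0-9][0-9][0-9]/ ? "\n" : "<br>")
        }
        { prev = $0 }              # hold the current line until the next arrives
        END { if (NR) print prev } # the last line always ends with a newline
    '
}

printf '1290;x;\n\n1. a\n2. b\n2763;y\n' | br_join
```

The sample call prints 1290;x;<br><br>1. a<br>2. b on one line and 2763;y on the next. It is used the same way as the other answers, e.g. br_join < "$InFile" > "$OutFile".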
 
05-23-2013, 11:48 AM   #9
danielbmartin
Quote:
Originally Posted by grail
Maybe something like:
Code:
awk 'NR > 1{ORS = /^[0-9]+;/?RT:"<br>"; print x}{x=$0}END{print x}' file
Short and sweet!

Minor nitpick:

Quote:
I need a script to find all "new line" breaks in this text file and change those that are not followed by at least three digits in a row to "<br>".
Your awk doesn't handle the three-consecutive-digits test.

Daniel B. Martin
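To make the nitpick concrete: ^[0-9]+; accepts a line that starts with only two digits, such as 23;..., while ^[0-9]{3,}; does not. A quick check with grep -E, which supports interval expressions; the three sample lines are made up from the thread's data:

```shell
#!/bin/sh
sample='23;ajbsndc
234;ajbsndc
1. cshdb'

# Count lines that begin with one or more digits followed by ";".
plus=$(printf '%s\n' "$sample" | grep -cE '^[0-9]+;')
# Count lines that begin with at least three digits followed by ";".
three=$(printf '%s\n' "$sample" | grep -cE '^[0-9]{3,};')

echo "one-or-more: $plus, three-or-more: $three"
```

This prints one-or-more: 2, three-or-more: 1; the numbered-list line matches neither, because "1." is not followed by ";".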
 
05-23-2013, 12:39 PM   #10
grail
Fair enough ... change + to {3,}
 
05-23-2013, 12:56 PM   #11
danielbmartin
Quote:
Originally Posted by grail
Fair enough ... change + to {3,}
With this InFile ...
Code:
1234;aaa;bbb;ccc;ddd
6352;jncasdu;qweyghq;cnhjab;
23;ajbsndc;acyue;aciusdh;


ksjdhcaiuh,

asdnclius
3478;sadcbh;wdoehuiq;csanld
1290;asdhuoci;qdbhejkw;

1. cshdb
2. huscdio
3. csdhuioa
4. wduqhie
2763;chsduaio;wdebhk;scdhuao
... this awk ...
Code:
awk 'NR > 1{ORS = /^[0-9]{3,};/?RT:"<br>"; print x}
  {x=$0}END{print x}' "$InFile" > "$OutFile"
... produced this OutFile ...
Code:
1234;aaa;bbb;ccc;ddd<br>6352;jncasdu;qweyghq;cnhjab;<br>23;ajbsndc;acyue;aciusdh;<br><br><br>ksjdhcaiuh,<br><br>asdnclius<br>3478;sadcbh;wdoehuiq;csanld<br>1290;asdhuoci;qdbhejkw;<br><br>1. cshdb<br>2. huscdio<br>3. csdhuioa<br>4. wduqhie<br>2763;chsduaio;wdebhk;scdhuao<br>
... which looks wrong.

Daniel B. Martin
 
05-23-2013, 02:01 PM   #12
grail
I would guess you are using a pre-4.0 version of gawk? Otherwise it works fine.
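Some background, as far as I know: gawk releases before 4.0 did not treat {3,} as an interval expression unless --re-interval (or --posix) was given, so /^[0-9]{3,};/ never matched and every newline turned into <br>, which fits the output above. Some older mawk builds may lack interval support as well. A small probe, a sketch only, to check what a given awk does; awk_has_intervals is a hypothetical helper name:

```shell
#!/bin/sh
# awk_has_intervals (hypothetical helper): succeed if the awk named in $1
# (default: awk) treats {3} in a regex as "exactly three of the previous item".
awk_has_intervals() {
    echo 123 | "${1:-awk}" '/^[0-9]{3}$/ { found = 1 } END { exit (found ? 0 : 1) }'
}

if awk_has_intervals; then
    echo "interval expressions supported"
else
    echo "no interval support: try --re-interval or spell out [0-9][0-9][0-9]"
fi
```

Passing a specific binary, e.g. awk_has_intervals gawk, probes that awk instead of the default one.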
 
05-23-2013, 04:28 PM   #13
danielbmartin
Quote:
Originally Posted by grail
I would guess you are using a pre-4.0 version of gawk?
Ahh, yes.
Code:
daniel@daniel-desktop:~$ gawk --version
GNU Awk 3.1.6
Copyright (C) 1989, 1991-2007 Free Software Foundation.
Daniel B. Martin
 
05-24-2013, 03:04 AM   #14
grail
I haven't had to use it in a while, but from memory there is a --re-interval option which I think may help.
 
05-24-2013, 08:05 AM   #15
danielbmartin
Quote:
Originally Posted by grail
... there is a --re-interval ... which I think may help.
Yes!

This works with GNU awk 3.1.6 ...
Code:
awk --re-interval 'NR > 1{ORS = /^[0-9]{3,};/?RT:"<br>"; print x}
  {x=$0}END{print x}' "$InFile" > "$OutFile"
Daniel B. Martin
 
  


Similar Threads
Thread Thread Starter Forum Replies Last Post
[SOLVED] script needed thetiger2003 Linux - Newbie 6 09-21-2011 04:38 AM
Start-Up Script & Shutdown/Kill Script needed guggilamsandeep Red Hat 1 05-11-2011 08:58 AM
serious script help needed bino25 Linux - Newbie 8 02-18-2011 04:15 PM
[SOLVED] Script to remove text strings before a regexp on every other line? kmkocot Programming 10 07-12-2010 11:58 PM
Script needed ganeshinforum Programming 2 01-05-2007 07:29 AM


All times are GMT -5.
