LinuxQuestions.org
Old 02-22-2008, 09:51 AM   #16
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241

very simplistically

Code:
# more file
this is valid 1
<!--
     some text
     that is N lines long
-->
this is valid 2
# awk '/<!--/,/-->/{next}1' file
this is valid 1
this is valid 2
 
Old 02-22-2008, 10:12 AM   #17
jettachamp26
Member
 
Registered: Feb 2008
Location: Florida
Distribution: ubuntu
Posts: 30

Original Poster
Rep: Reputation: 15
Would you be so kind as to explain how that code works? I only started shell scripting about 2 weeks ago and haven't really been exposed to awk at all.
 
Old 02-22-2008, 10:16 AM   #18
jettachamp26
Member
 
Registered: Feb 2008
Location: Florida
Distribution: ubuntu
Posts: 30

Original Poster
Rep: Reputation: 15
Just an update for druuna: I had to modify the code in post #2. I caught it inserting h6 into the text between the brackets whenever a word happened to contain a capital P.

Code:
 sed -i '/Headline:/{n;n;n;s/<P>/<h6>/;s/<\/P>/<\/h6>/;}' infile

I've learned so much in the past 2 days it's crazy. Thanks again for everyone's (very speedy!) help!
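For anyone following along, here is a minimal sketch of what the modified command does. The sample input is hypothetical; the only assumption is that the paragraph to change sits three lines below the "Headline:" line, which is what the three n commands imply. Each n prints the current line and loads the next one, so the two substitutions only ever touch that third line, and only the literal tags <P> and </P> rather than any capital P.

```shell
# Hypothetical input: the <P> paragraph to convert is three lines after "Headline:".
input=$(printf '%s\n' 'Headline:' 'line one' 'line two' '<P>Plain Paragraph</P>' '<P>untouched</P>')

# n;n;n advances three lines past the match; the s commands then rewrite
# only the opening and closing P tags on that line.
result=$(printf '%s\n' "$input" | sed '/Headline:/{n;n;n;s/<P>/<h6>/;s/<\/P>/<\/h6>/;}')

printf '%s\n' "$result"
```

The second <P> line is left alone because it is not three lines below a "Headline:" match.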
 
Old 02-22-2008, 10:22 AM   #19
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
Code:
awk '/<!--/,/-->/{next}{print}' file
From a line matching <!-- through the next line matching -->, skip the line. Otherwise, print it.
For more information, read this
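A small self-contained sketch of the range pattern, using a temporary file with the same sample text as above (the file itself is created here only for the demonstration):

```shell
# Build a sample file mirroring the example from the thread.
sample=$(mktemp)
printf '%s\n' 'this is valid 1' '<!--' '     some text' '     that is N lines long' '-->' 'this is valid 2' > "$sample"

# /<!--/,/-->/ is a range pattern: it is true from a line matching <!--
# through the next line matching -->. "next" skips those lines.
# The trailing 1 is an always-true pattern whose default action is to print,
# so it is equivalent to writing {print}.
result=$(awk '/<!--/,/-->/{next}1' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Note that the range works on whole lines, so any other text sharing a line with <!-- or --> is dropped along with the comment.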
 
Old 02-22-2008, 10:39 AM   #20
jettachamp26
Member
 
Registered: Feb 2008
Location: Florida
Distribution: ubuntu
Posts: 30

Original Poster
Rep: Reputation: 15
Thanks. Good link too. I'll be sure to bookmark it.
 
Old 02-22-2008, 11:09 AM   #21
jettachamp26
Member
 
Registered: Feb 2008
Location: Florida
Distribution: ubuntu
Posts: 30

Original Poster
Rep: Reputation: 15
I tried using it, and tried tweaking it a little, but it wouldn't work inside my script.

Not really sure why it's not working though. I do know that I have awk on my system and that awk works.

Any help?
Here's the code, to make it clear what's not working.
Code:
awk '/<!--/,/-->/{next}{print}' infile
also tried

Code:
awk -i '/<!--/,/-->/{next}{print}' infile
but I believe the -i option only works with sed.
 
Old 02-22-2008, 11:44 AM   #22
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2373
Hi,

If I try the code on the sample input it seems to work:

Code:
$ cat infile 
this is valid 1
<!--
     some text
     that is N lines long
-->
this is valid 2
$
$
$ awk '/<!--/,/-->/{next}{print}' infile 
this is valid 1
this is valid 2
If you are more at home using sed you can do this: sed '/<!--/,/-->/d' infile. Result will be the same as the output of the awk command.

You mention that you needed to tweak the awk command, but you don't say what needs to be tweaked. If the above doesn't work, could you post the relevant part of the input and the desired output?

Hope this helps.

BTW: Glad to read that you have learned something, it's always good to read that the help given is actually helping.
 
Old 02-22-2008, 12:22 PM   #23
jettachamp26
Member
 
Registered: Feb 2008
Location: Florida
Distribution: ubuntu
Posts: 30

Original Poster
Rep: Reputation: 15
The sed command did the trick; I just added the -i option.

I think the reason the awk command doesn't work is that the actual functionality of the script isn't my code. I just adapted the code to do the things I need. I still need to sit down and take the time to learn how all the little bits and pieces work.

the original script I used is by Ian Spillane http://iantheteacher.blogspot.com
 
Old 02-22-2008, 10:46 PM   #24
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
If -i is all you need to get the thing working with druuna's sed suggestion, then in awk you just need to redirect the output to a new file and rename the new file back to the original. The -i in sed is just an in-place modification of the file.
Code:
awk '/..../{...}' file  > temp
mv temp file
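A slightly more defensive sketch of the same idea. The file contents and variable names are made up for the demonstration. mktemp avoids clobbering an existing file called temp, and && makes sure the original is only replaced if awk succeeded.

```shell
# Hypothetical input file for the demonstration.
file=$(mktemp)
printf '%s\n' 'keep 1' '<!--' 'drop me' '-->' 'keep 2' > "$file"

# Write the filtered output to a temporary file, then replace the original
# only if awk exited successfully.
tmp=$(mktemp)
awk '/<!--/,/-->/{next}{print}' "$file" > "$tmp" && mv "$tmp" "$file"

after=$(cat "$file")
printf '%s\n' "$after"

rm -f "$file"
```

One thing to watch: mktemp creates the file with owner-only permissions, and mv carries those over, so the result may not have the same mode as the original. If that matters, cat "$tmp" > "$file" followed by rm "$tmp" keeps the original file's permissions.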
 
Old 02-25-2008, 09:21 AM   #25
jettachamp26
Member
 
Registered: Feb 2008
Location: Florida
Distribution: ubuntu
Posts: 30

Original Poster
Rep: Reputation: 15
Hey,

Back again with a question. I've been reading up on regular expressions, but I'm not really getting anywhere.

The problem: I have some code where a link is wrapped in ugly <U></U> and <FONT></FONT> tags. I can clean up all of the other tags just fine, but when I get to the font tag and use the command in Code #2 below to remove it, it removes everything on the line except for the very last </a> tag. The <p></p> tags are left intact too.

Through process of elimination, I found the problem lies in the command in code #2.

I know that .* is greedy, but I thought the " and > after it would stop the match.
I also tried using .*? instead, but to no avail.

Code #1
Code:
<FONT COLOR="#000080"><U><A HREF="http://www.example.com/"></A><A HREF="http://www.example.com/"></A><A HREF="http://www.example.com/">"http://www.example.com/"</A><A HREF="http://www.example.com/"></A><A HREF="http://www.example.com/"></A></U></FONT>
</P>
<P STYLE="margin-bottom: 0in">Example2:
<FONT COLOR="#000080"><U><A HREF="http://www.example2.com/"></A><A HREF="http://www.example2.com/"></A><A HREF="http://www.example2.com/">http://www.example2.com/</A><A HREF="http://www.example2.com/"></A><A HREF="http://www.example2.com/"></A></U></FONT>
</P>
Through process of elimination, I narrowed it down to this one command.

Code #2
Code:
sed -i 's/<FONT .*">//g' "$f"
 
Old 02-25-2008, 09:27 AM   #26
jettachamp26
Member
 
Registered: Feb 2008
Location: Florida
Distribution: ubuntu
Posts: 30

Original Poster
Rep: Reputation: 15
I would just check for COLOR="#000080", but this is used across many different files with different colors, and the FONT tag is useless to me in all of them.
 
Old 02-25-2008, 09:40 AM   #27
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2373
Hi,

Like ghostdog74 and I said before: parsing html/xml files is tricky business

You need to find something unique to use in the reg-exp. This <FONT .*"> is not specific enough: because .* is greedy, it matches all the way to the last "> on the line, not the first one.

Looking at the example given, 000080 is unique, so <FONT .*000080"> will remove the first font tag on every line. But this only works if that color value (000080) is the only one used.
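To illustrate the greedy match, here is a minimal sketch on one shortened, hypothetical line shaped like the sample in post #25. It also shows an alternative that does not depend on the color value: the negated class [^>]* cannot cross a >, so the match has to stop at the end of the FONT tag. As for .*?, sed's regular expressions have no lazy quantifiers, which is why that attempt had no effect.

```shell
# One line shaped like the thread's sample (shortened, hypothetical).
line='<FONT COLOR="#000080"><U><A HREF="http://www.example.com/">link</A></U></FONT>'

# Greedy: .* runs to the LAST "> on the line, taking <U> and <A ...> with it.
greedy=$(printf '%s\n' "$line" | sed 's/<FONT .*">//g')

# Bounded: [^>]* stops at the first >, so only the FONT tag is removed.
bounded=$(printf '%s\n' "$line" | sed 's/<FONT [^>]*>//g')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
```

The bounded form assumes no attribute value inside the FONT tag contains a literal >, which holds for the sample but is not guaranteed for arbitrary HTML.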

Wouldn't it be less work if you would take a look at perl and the html/xml parsing modules that are already available?

EDIT

I just noticed your new post, which renders my answers useless (well, most of them).

/EDIT

Last edited by druuna; 02-25-2008 at 09:41 AM.
 
Old 02-25-2008, 10:05 AM   #28
jettachamp26
Member
 
Registered: Feb 2008
Location: Florida
Distribution: ubuntu
Posts: 30

Original Poster
Rep: Reputation: 15
I might just have to check out perl and its html/xml parsing because I am stumped.

EDIT
Do you happen to have any good links I can read up on it?
/EDIT

Last edited by jettachamp26; 02-25-2008 at 10:06 AM.
 
Old 02-25-2008, 10:12 AM   #29
jettachamp26
Member
 
Registered: Feb 2008
Location: Florida
Distribution: ubuntu
Posts: 30

Original Poster
Rep: Reputation: 15
Also,

Would you know if/how I can remove the duplicate empty <a></a> pairs, as in Code #1 of post #25?

Could I use something like this?

Code:
sed -i 's/<a href=".*"></a>//g' infile
Is there a way to tell it not to change the one that actually has text in between the <a href=""></a> tags?

Last edited by jettachamp26; 02-25-2008 at 10:13 AM.
 
Old 02-25-2008, 10:32 AM   #30
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2373
Hi again,

The font problem could probably be solved this way: ^<FONT .*#[0-9A-F]\{6\}"> (but I'm not 100% sure if this will exclude false positives).

This looks for <FONT at the beginning of a line (the leading ^), followed by a space and anything (.*), ending with a # followed by 6 characters, each of which must be 0 to 9 or A to F. The last two characters must be a " and a >.
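A minimal sketch of that pattern inside a sed substitution, run on a shortened, hypothetical line shaped like the sample in post #25. The six hex digits use the class [0-9A-F], which assumes uppercase color values as in the sample. The second expression, which removes the closing </FONT>, is an addition for the demonstration and not part of the suggestion above.

```shell
# One line shaped like the thread's sample (shortened, hypothetical).
line='<FONT COLOR="#000080"><U><A HREF="http://www.example.com/">link</A></U></FONT>'

# First expression: drop an opening FONT tag at the start of the line that
# ends in # plus six hex digits, then ">.
# Second expression (added here): drop the matching closing tag.
stripped=$(printf '%s\n' "$line" | sed -e 's/^<FONT .*#[0-9A-F]\{6\}">//' -e 's/<\/FONT>//g')

printf '%s\n' "$stripped"
```

Because .* is still greedy, a line with a second #RRGGBB"> further along would be cut up to that later occurrence, which is one source of the false positives mentioned above.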

Quote:
is there a way to tell it not to change the one that actually has text in between the <a href=""></a> tags?
Maybe, but this depends on the actual code (what is where, and is it unique enough).
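For the sample in post #25 specifically, one possible sketch is below. It assumes the empty anchors look exactly like the ones in that sample: uppercase tags, a single HREF attribute, and nothing at all between the opening and closing tag. The pattern requires </A> to follow the opening tag immediately, so an anchor with text inside never matches. Two details differ from the attempt in post #29: the / in </A> is escaped, since / is also the delimiter of the s command, and the tag names are uppercase, because the match is case-sensitive.

```shell
# Shortened, hypothetical line shaped like the sample in post #25.
line='<A HREF="http://www.example.com/"></A><A HREF="http://www.example.com/">http://www.example.com/</A><A HREF="http://www.example.com/"></A>'

# [^"]* keeps the match inside one HREF value; <\/A> must follow right away,
# so only anchors with no content are removed.
cleaned=$(printf '%s\n' "$line" | sed 's/<A HREF="[^"]*"><\/A>//g')

printf '%s\n' "$cleaned"
```

Anchors that contain only whitespace, or that carry extra attributes, would need a looser pattern.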
 
  

