Programming (LinuxQuestions.org): This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.
Would you be so kind as to explain how that code works? I only started shell scripting about 2 weeks ago and haven't really been exposed to awk at all.
Just an update for drunna: I had to modify the code in post #2. I caught it inserting h6 into the text between the tags whenever a word happened to contain a capital P.
Code:
sed -i '/Headline:/{n;n;n;s/<P>/<h6>/;s/<\/P>/<\/h6>/;}' infile
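For anyone wondering what that one-liner does step by step, here is a small sketch that runs it on a made-up file. The Headline/paragraph layout of the sample is an assumption, not the original poster's data, and `-i` without a suffix argument is GNU sed syntax (BSD/macOS sed wants `-i ''`).

```shell
#!/bin/sh
# Throwaway sample file; the layout is assumed for illustration only.
tmp=$(mktemp)
printf '%s\n' 'Headline: Example' 'line 1 after headline' 'line 2 after headline' \
  '<P>A Paragraph with a capital P</P>' '<P>not three lines after a headline</P>' > "$tmp"

# On a line matching Headline:, each n writes out the current line and loads
# the next one, so after n;n;n the pattern space holds the third line after
# the headline. Only the literal <P> and </P> tags on that line are replaced,
# so capital P's inside words are left alone.
sed -i '/Headline:/{n;n;n;s/<P>/<h6>/;s/<\/P>/<\/h6>/;}' "$tmp"

result=$(cat "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Only the fourth line changes (to `<h6>A Paragraph with a capital P</h6>`); the word "Paragraph" keeps its capital P, and the last `<P>` line is untouched because it is not three lines after a headline.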
I've learned so much in the past 2 days, it's crazy. Thanks again for everyone's (very speedy!) help!
If I try the code on the sample input it seems to work:
Code:
$ cat infile
this is valid 1
<!--
some text
that is N lines long
-->
this is valid 2
$
$
$ awk '/<!--/,/-->/{next}{print}' infile
this is valid 1
this is valid 2
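Since the earlier question was how the awk one-liner works: `/<!--/,/-->/` is a range pattern that is true from a line matching `<!--` up to and including the next line matching `-->`; `{next}` skips those lines so the `{print}` rule never sees them. The sketch below spells roughly the same logic out with an explicit flag variable (the name `inside` is just an illustrative choice) and checks that both versions agree on the sample input.

```shell
#!/bin/sh
# Same sample input as in the post above, written to a throwaway temp file.
tmp=$(mktemp)
printf '%s\n' 'this is valid 1' '<!--' 'some text' 'that is N lines long' '-->' 'this is valid 2' > "$tmp"

# The one-liner from the thread: a range pattern plus two rules.
range_out=$(awk '/<!--/,/-->/{next}{print}' "$tmp")

# Roughly the same logic with an explicit flag variable.
flag_out=$(awk '
  !inside && /<!--/ { inside = 1 }        # opening marker switches the range on
  inside { if (/-->/) inside = 0; next }  # skip range lines; closing marker switches it off
  { print }                               # every other line is printed unchanged
' "$tmp")

printf '%s\n' "$flag_out"
rm -f "$tmp"
```

Both print `this is valid 1` and `this is valid 2`. The flag version is longer, but it makes clear that the range is line-based: whole lines are dropped, not just the text between the markers.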
If you are more at home using sed, you can do this: sed '/<!--/,/-->/d' infile. For this input the result will be the same as the output of the awk command.
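One caveat on the sed/awk equivalence: awk checks the end pattern on the same line that opened the range, while sed only starts looking for the end pattern on the following line. The two commands therefore differ when a comment opens and closes on a single line. The sketch below uses a made-up input to show this.

```shell
#!/bin/sh
# Made-up input with one single-line comment and one multi-line comment.
tmp=$(mktemp)
printf '%s\n' 'keep 1' '<!-- one-line comment -->' 'keep 2' '<!--' 'multi-line' '-->' 'keep 3' > "$tmp"

# awk: the one-line comment matches both patterns, so its range ends on that line.
awk_out=$(awk '/<!--/,/-->/{next}{print}' "$tmp")

# sed: the range opened by the one-line comment runs until the NEXT line
# containing -->, so "keep 2" is deleted too.
sed_out=$(sed '/<!--/,/-->/d' "$tmp")

printf 'awk:\n%s\nsed:\n%s\n' "$awk_out" "$sed_out"
rm -f "$tmp"
```

Here awk keeps `keep 1`, `keep 2` and `keep 3`, while sed keeps only `keep 1` and `keep 3`.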
You mention that you needed to tweak the awk command, but you don't say what needed tweaking. If the above doesn't work, could you post the relevant part of the input and the desired output?
Hope this helps.
BTW: Glad to hear that you have learned something; it's always good to read that the help given is actually helping.
The sed command did the trick; I just added the -i option.
I think the reason the awk command doesn't work is that the actual functionality of the script isn't my code. I just adapted the code to do the things I need. I still need to sit down and take the time to learn how all the little bits and pieces work.
If -i is all you needed to get drunna's sed suggestion working, then with awk you just need to redirect the output to a new file and rename the new file back to the original. The -i in sed simply modifies the file in place.
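A minimal sketch of that redirect-and-rename approach, assuming a hypothetical file name `page.html` in a scratch directory. POSIX awk has no in-place option, so the output goes to a temporary file that only replaces the original if awk succeeded.

```shell
#!/bin/sh
# Scratch directory; page.html is a hypothetical file name.
dir=$(mktemp -d)
file="$dir/page.html"
printf '%s\n' 'keep me' '<!--' 'drop me' '-->' 'keep me too' > "$file"

# Write awk's output to a temp file next to the original and move it over
# the original only if every step succeeded (the && chain), so an awk error
# never leaves you with a truncated file.
tmp=$(mktemp "$file.XXXXXX") &&
  awk '/<!--/,/-->/{next}{print}' "$file" > "$tmp" &&
  mv "$tmp" "$file"

result=$(cat "$file")
printf '%s\n' "$result"
rm -rf "$dir"
```

Note that mktemp creates its file with mode 0600, so after the mv the rewritten file may have tighter permissions than the original. If GNU awk 4.1 or later is available, its bundled `inplace` extension (`gawk -i inplace '...' file`) is another option.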
Back again with a question. I've been reading up on regular expressions, but I'm not really getting anywhere.
The problem: I have some code where a link is wrapped in ugly <U></U> and <FONT></FONT> tags. I can clean up all of the tags just fine until I get to the font tag, where I use the command in code #2 below to try to remove it. What ends up happening is that it removes everything except the very last </a> tag. The <p></p> tags are left intact too.
Through a process of elimination, I found that the problem lies in the command in code #2.
I know that .* is greedy, but I thought that is what the " and > were for.
I also tried using .*? instead, but to no avail.
As ghostdog74 and I said before: parsing HTML/XML files is tricky business.
You need to find something unique to use in the regexp. <FONT .*"> is not specific enough: .* is greedy, so it matches up to the last "> on the line, not the first one.
Looking at the example given, 000080 is unique, so <FONT .*000080"> will remove the first font entry on every line. But this only works if that color value (000080) is the only one used.
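To make the greediness concrete, here is a sketch on a hypothetical input line (the URL is made up). sed uses POSIX regular expressions, which have no non-greedy `.*?`, so the usual workaround is a negated bracket expression: `[^>]*` cannot run past a `>`, which keeps the match inside the opening `<FONT ...>` tag. This is a swapped-in technique, not the one suggested in the thread.

```shell
#!/bin/sh
# Hypothetical line modelled on the problem description.
line='<p><U><FONT COLOR="#000080"><a href="http://example.com/">link</a></FONT></U></p>'

# Greedy: .* runs to the LAST "> on the line, so the <a href="..."> tag goes too.
greedy=$(printf '%s\n' "$line" | sed 's/<FONT .*">//')

# [^>]* stops at the first >, so only the opening FONT tag is removed;
# the closing </FONT> is removed by a second substitution.
tight=$(printf '%s\n' "$line" | sed 's/<FONT [^>]*>//; s/<\/FONT>//')

printf '%s\n' "$greedy" "$tight"
```

The greedy version leaves `<p><U>link</a></FONT></U></p>`, while the bracket version leaves `<p><U><a href="http://example.com/">link</a></U></p>`. Add the `g` flag to each substitution if a line can contain several font tags.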
Wouldn't it be less work to take a look at Perl and the HTML/XML parsing modules that are already available?
EDIT
I just noticed your new post, which renders my answers useless (well, most of them).
The font problem could probably be solved this way: ^<FONT .*#[0-9A-F]\{6\}"> (but I'm not 100% sure this will exclude all false positives).
This looks for <FONT at the beginning of a line (the ^), followed by a space and anything ( .*), and it should end with a # followed by 6 characters (the \{6\}), each of which should be A to F or 0 to 9. The last two characters should be a " and a >.
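A quick way to check what that anchored expression does is to run it through sed on two hypothetical lines (the URL and the text are made up): one that starts with the font tag and one where the font tag is not at the start of the line.

```shell
#!/bin/sh
# Two hypothetical input lines.
input='<FONT COLOR="#000080"><a href="http://example.com/">link</a></FONT>
<p><FONT COLOR="#000080">not at line start</FONT></p>'

# ^ anchors the match to the start of the line; #[0-9A-F]\{6\}"> requires a
# six-digit uppercase hex colour right before the closing ">.
out=$(printf '%s\n' "$input" | sed 's/^<FONT .*#[0-9A-F]\{6\}">//')

printf '%s\n' "$out"
```

The first line loses its opening `<FONT COLOR="#000080">` tag; the second is left unchanged because of the `^` anchor. Lowercase hex colours (for example `#00a0ff`) would need `[0-9A-Fa-f]` instead.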
Quote:
is there a way to tell it not to change the one that actually has text in between the <a href=""></a> tags?
Maybe, but that depends on the actual code (what is where, and whether it is unique enough).