LinuxQuestions.org
Old 10-01-2018, 06:07 PM   #1
wenzu
LQ Newbie
 
Registered: Oct 2018
Posts: 6

Rep: Reputation: Disabled
Removing lines from file


I have a list of files and want to remove the lines between
the "Cookies Consent Notice" lines.


Quote:
<!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) start -->

<script src="https://cdn.cookielaw.org/consent/bcd12cae-aa3c-4ffb-ac51-8d41462cdcb4.js" type="text/javascript" charset="UTF-8"></script>

<script type="text/javascript">

function OptanonWrapper() { }

</script>

<!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) end -->
 
Old 10-01-2018, 06:32 PM   #2
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,269
Blog Entries: 24

Rep: Reputation: 4196
Welcome to LQ!

The common way to do that would be using sed, although there are many ways to accomplish this task.

Using sed, something like this would do the trick (other scripting languages would use similar regular expressions)...

Code:
/Cookies[^>]* start/,/Cookies[^>]* end/{/Cookies/!d;}
Depending on the surrounding content you may need to make the expressions a bit more specific.
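
For anyone following along, here is a minimal sketch of how that expression might be run against a file. The sample contents, the `example.js` URL and the variable names are assumptions made up for illustration; only the sed expression itself comes from the post above.

```shell
# Build a small sample file (illustrative contents, not the OP's real page).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<p>before</p>
<!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) start -->
<script src="https://cdn.cookielaw.org/consent/example.js"></script>
function OptanonWrapper() { }
<!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) end -->
<p>after</p>
EOF

# Inside the start..end range, delete every line that does not contain "Cookies".
# The two marker comments contain "Cookies", so they survive.
result=$(sed '/Cookies[^>]* start/,/Cookies[^>]* end/{/Cookies/!d;}' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Note that the match is case sensitive, so a line containing only "cookielaw" is not treated as a marker.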

LQ does not operate like a help desk where you get complete solutions, so it is better if you try to solve a problem yourself, then ask for help when you get stuck, showing us what you have done first. This also tells others what your language preferences are and how you have framed the task in your own mind, which can help produce better answers for your case.

Please review the Site FAQ for guidance in posting your questions and general forum usage.

Good luck!

Last edited by astrogeek; 10-01-2018 at 06:43 PM. Reason: better grammar, typos
 
Old 10-01-2018, 06:46 PM   #3
wenzu
LQ Newbie
 
Registered: Oct 2018
Posts: 6

Original Poster
Rep: Reputation: Disabled
I have tried getting the line numbers where I get a match for "Cookies Consent Notice",
then using awk to get the two line numbers where the match occurs, then remove lines
between the two line numbers.

Quote:
grep -n 'Cookies Consent Notice' ./browse/ingredients.html
When I ran that, I got

Quote:
grep -n 'Cookies Consent Notice' ./browse/ingredients.html
306: <!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) start -->
316:<!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) end -->
However, when I tried the following, I only got the line number of the second match

Quote:
grep -n 'Cookies Consent Notice' ./browse/ingredients.html | awk -F ":" '/1/ {print $1}'
316
 
Old 10-01-2018, 06:57 PM   #4
wenzu
LQ Newbie
 
Registered: Oct 2018
Posts: 6

Original Poster
Rep: Reputation: Disabled
Ok, got it when I use

Quote:
grep -n 'Cookies Consent Notice' ./browse/ingredients.html | awk 'BEGIN { FS = ":" } ; { print $1 }'
306
316
 
Old 10-01-2018, 07:06 PM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,141

Rep: Reputation: 4123
You need to be very specific (and very correct) when asking for help. astrogeek gave an answer to what you asked, but probably not what you meant.
As for your own solution, it appears you are confusing a regex pattern with address specifiers. Why are you using a regex at all in that awk? Just print $1.

- too slow typing again ...
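
To make the point above concrete, here is a small sketch of why the `/1/` version printed only 316. The two input lines are copied from the grep output shown earlier in the thread; the variable names are just for illustration.

```shell
# Simulated "grep -n" output, copied from the lines shown earlier in the thread.
matches='306: <!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) start -->
316:<!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) end -->'

# /1/ is a regex filter, not a field selector: only lines that contain
# the character "1" somewhere reach the print action.
filtered=$(printf '%s\n' "$matches" | awk -F: '/1/ {print $1}')

# With no pattern at all, the action runs on every input line.
all=$(printf '%s\n' "$matches" | awk -F: '{print $1}')

printf 'with /1/:\n%s\n' "$filtered"
printf 'without a pattern:\n%s\n' "$all"
```

The first line happens to contain no "1" at all, which is why it was dropped.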
 
Old 10-01-2018, 07:29 PM   #6
wenzu
LQ Newbie
 
Registered: Oct 2018
Posts: 6

Original Poster
Rep: Reputation: Disabled
He asked what I had done, so I put some more things I had tried. Thanks.
 
Old 10-01-2018, 08:12 PM   #7
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,269
Blog Entries: 24

Rep: Reputation: 4196
Thanks for the response, but I am now confused about what your ultimate goal is.

Per your original question I think that you are trying to strip the 'Consent Notice' script lines from an HTML document.

If that is the case I would not use the line numbers at all as it is at least an unnecessary step and confusing as it may obscure what you are really trying to accomplish. That said, if the line numbers are important for some other reason then you should tell us what that is.

It is also unclear in your original question whether you want to remove the 'Consent Notice' lines themselves, or strictly the lines between them. My earlier example removes only the lines between, but can be easily modified to remove those lines as well.

One effective way to ask text processing questions is to provide an example input file, which you have done, and show what you would expect the actual result to look like.

Again, using your original question as the guide, this would be a possible example...

Code:
Source file looks like this...

<HTML>
<Other stuff> Goes here...
<!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) start -->

<script src="https://cdn.cookielaw.org/consent/bcd12cae-aa3c-4ffb-ac51-8d41462cdcb4.js" type="text/javascript" charset="UTF-8"></script>

<script type="text/javascript">

function OptanonWrapper() { }

</script>

<!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) end -->
</Other stuff>
</HTML>
Code:
Result output should look like this...

<HTML>
<Other stuff> Goes here...
<!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) start -->
<!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) end -->
</Other stuff>
</HTML>
Is that what you expect?

Last edited by astrogeek; 10-01-2018 at 08:39 PM. Reason: typo(s)
 
Old 10-02-2018, 12:59 AM   #8
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
use the correct tool for the job.
sed & awk & co. are not so suited for HTML and XML.
i would use something like xmlstarlet to correctly identify the script element (maybe through "src=*cdn.cookielaw.org*" or some such) and remove it.

but i don't know what the subsequent OptanonWrapper() is about. i know nothing about javascript.

PS: what happened to "show us what you tried"?
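
A rough sketch of the xmlstarlet idea, under some loud assumptions: the sample below is a made-up, well-formed XHTML fragment (real-world HTML often is not well formed and may need cleaning up first), the `example.js` URL is hypothetical, and xmlstarlet must be installed (some distributions install the binary as `xml`). It only removes the matching script element, not the surrounding comments.

```shell
# Hypothetical, well-formed sample page.
page=$(mktemp)
cat > "$page" <<'EOF'
<html>
<body>
<p>content</p>
<script src="https://cdn.cookielaw.org/consent/example.js" type="text/javascript"></script>
</body>
</html>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
    # "ed -d" deletes every node matched by the XPath expression.
    cleaned=$(xmlstarlet ed -d '//script[contains(@src, "cdn.cookielaw.org")]' "$page")
    printf '%s\n' "$cleaned"
else
    echo "xmlstarlet not installed; skipping" >&2
fi
rm -f "$page"
```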
 
Old 10-02-2018, 08:51 AM   #9
individual
Member
 
Registered: Jul 2018
Posts: 315
Blog Entries: 1

Rep: Reputation: 233
Quote:
Originally Posted by astrogeek
One effective way to ask text processing questions is to provide an example input file, which you have done, and show what you would expect the actual result to look like.
I was wondering about this at first, and then I did "View Page Source." That is the HTML source referenced in the OP.
 
Old 10-02-2018, 10:30 AM   #10
KenJackson
Member
 
Registered: Jul 2006
Location: Maryland, USA
Distribution: Fedora and others
Posts: 757

Rep: Reputation: 145
If astrogeek accurately described what you're trying to do, this little script does it:
Code:
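#!/usr/bin/awk -f
# On the "end" marker, turn deletion off (before the print rule, so the marker is kept).
/<!-- Cookies Consent Notice \(Prod.+www.test.com.+\) end -->/ { del = 0 }
# Print every line while deletion is off.
! del { print $0 }
# On the "start" marker, turn deletion on (after the print rule, so the marker is kept).
/<!-- Cookies Consent Notice \(Prod.+www.test.com.+\) start -->/ { del = 1 }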
You can either put the html file on the command line or pipe it in.

Notice that I used ".+" as a wildcard to match multiple characters. If you need to be more precise, you could replace them with actual text. I'm using "del" as a variable. All variables in awk are auto-initialized to zero.

If you want to delete the "Cookie Consent" comments also, you can swap the two long lines.
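
As a sketch of that swapped variant, here are the same three rules run inline against a made-up sample file (the sample contents and variable names are assumptions for illustration). Because the start rule now fires before the print rule and the end rule after it, both marker comments are dropped too.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
<p>before</p>
<!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) start -->
function OptanonWrapper() { }
<!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) end -->
<p>after</p>
EOF

# Start rule first, end rule last: markers and everything between are skipped.
result=$(awk '
/<!-- Cookies Consent Notice \(Prod.+www.test.com.+\) start -->/ { del = 1 }
! del { print $0 }
/<!-- Cookies Consent Notice \(Prod.+www.test.com.+\) end -->/ { del = 0 }
' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```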
 
Old 10-02-2018, 04:19 PM   #11
wenzu
LQ Newbie
 
Registered: Oct 2018
Posts: 6

Original Poster
Rep: Reputation: Disabled
I have tried the following bash code but the file is not being changed

The idea is to run the sed command

Quote:
sed -e '305,315d' ./browse/abaca.html > ./browse/abaca.html.fixed
However after running the bash script I am getting

Quote:
Search Pattern: Cookies Consent Notice
Pattern found in file: Yes
[nb,li,lf]: 2 305 315
sed -e '305,315d' ./browse/abaca.html > ./browse/abaca.html.fixed
sed: -e expression #1, char 1: unknown command: `''


Quote:
if [ -n "$li" ] || [ -n "$lf" ] ; then
    if (( nb == 1 )); then
        dptn="'${li}d'"
    elif (( nb == 2 )); then
        dptn="'${li},${lf}d'"
    fi

    echo "sed -e $dptn $f > $f.fixed"

    case $exec_sed in
        yes|Yes)
            sed -e $dptn $f > "$f.fixed"
            ;;
        no|No)
            echo "Execute: No"
            ;;
    esac

fi
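
A likely explanation for the "unknown command" error above, offered as a sketch rather than a full fix of the script: the single quotes stored inside `dptn` are literal characters, so sed receives `'305,315d'` with the quotes included and trips over the first `'`. Quotes typed on a command line are removed by the shell, but quotes that come out of a variable expansion are not. The five-line test file and the values of `li` and `lf` below are made up for illustration.

```shell
f=$(mktemp)
printf '%s\n' one two three four five > "$f"
li=2
lf=4

# Broken: the single quotes become part of the sed script itself,
# so sed sees "'" as its first command and rejects it.
bad="'${li},${lf}d'"
if ! sed -e $bad "$f" >/dev/null 2>&1; then
    echo "sed rejected: $bad"
fi

# Working: store only the sed script, and double-quote the expansions.
dptn="${li},${lf}d"
sed -e "$dptn" "$f" > "$f.fixed"
fixed=$(cat "$f.fixed")
printf '%s\n' "$fixed"
rm -f "$f" "$f.fixed"
```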
 
Old 10-02-2018, 04:28 PM   #12
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,269
Blog Entries: 24

Rep: Reputation: 4196
Before we begin to debug your bash code, please take the time to again read through the suggestions offered, and questions asked to this point.

If your questions are to be based around a certain input file, abaca.html for example, then please post the relevant contents of that file along with an example of what you expect the result to be.

Your bash code is incomplete and mostly irrelevant to the questions asked, so let's not go there until we know what it is you are trying to accomplish.

This is also the first indication that you want to write this into an interactive script which takes arguments to be used in the replacement operation. Please tell us clearly and precisely what you are trying to accomplish so that we can try to resolve one problem at a time - help us help you!

Please review the Site FAQ for guidance in asking well formed questions.

Last edited by astrogeek; 10-02-2018 at 05:00 PM. Reason: typo
 
Old 10-02-2018, 05:11 PM   #13
wenzu
LQ Newbie
 
Registered: Oct 2018
Posts: 6

Original Poster
Rep: Reputation: Disabled
I have got it working now.

I wanted to change the following file

Code:
<script src="https://cdn.cookielaw.org/consent/bcd12cae-aa3c-4ffb-ac51-8d41462cdcb4.js" type="text/javascript" charset="UTF-8"></script>
        <!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) start -->

<script src="https://cdn.cookielaw.org/consent/bcd12cae-aa3c-4ffb-ac51-8d41462cdcb4.js" type="text/javascript" charset="UTF-8"></script>

<script type="text/javascript">

function OptanonWrapper() { }

</script>

<!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) end -->
      </body>
    </html>
I want to get a file as follows

Code:
<script src="https://cdn.cookielaw.org/consent/bcd12cae-aa3c-4ffb-ac51-8d41462cdcb4.js" type="text/javascript" charset="UTF-8"></script>
      </body>
    </html>

Last edited by wenzu; 10-02-2018 at 05:14 PM.
 
Old 10-02-2018, 06:01 PM   #14
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,269
Blog Entries: 24

Rep: Reputation: 4196
Very good!

It is also helpful if you can post your solution here so that others who are looking for a solution to similar problems can benefit from the experience.

When you are satisfied that the problem is solved please use the Thread Tools list at top of the first post to mark the thread as SOLVED.

Thanks, and good luck!
 
Old 10-03-2018, 03:05 AM   #15
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 900

Rep: Reputation: 290
Quote:
Originally Posted by wenzu
I have a list of files and want to remove the lines between
the "Cookies Consent Notice" lines.
Quote:
Originally Posted by wenzu
I wanted to change the following file
Code:
<script src="https://cdn.cookielaw.org/consent/bcd12cae-aa3c-4ffb-ac51-8d41462cdcb4.js" type="text/javascript" charset="UTF-8"></script>
        <!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) start -->

<script src="https://cdn.cookielaw.org/consent/bcd12cae-aa3c-4ffb-ac51-8d41462cdcb4.js" type="text/javascript" charset="UTF-8"></script>

<script type="text/javascript">

function OptanonWrapper() { }

</script>

<!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) end -->
      </body>
    </html>
I want to get a file as follows

Code:
<script src="https://cdn.cookielaw.org/consent/bcd12cae-aa3c-4ffb-ac51-8d41462cdcb4.js" type="text/javascript" charset="UTF-8"></script>
      </body>
    </html>
Then it seems you've changed your requirement along the way, because now you don't want to keep the "Cookies Consent Notice" lines either.
Anyway, a very good starting point has been given in #2 as you only need to change a few characters to make it work for your case.
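
One way that small change might look, as a sketch: replacing the `{/Cookies/!d;}` block with a plain `d` deletes the whole range, marker comments included. The temporary directory, the file name `page.html`, and the `example.js` URL are assumptions for illustration; the `.fixed` suffix follows the naming used earlier in the thread.

```shell
dir=$(mktemp -d)
cat > "$dir/page.html" <<'EOF'
<script src="https://cdn.cookielaw.org/consent/example.js"></script>
        <!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) start -->
<script type="text/javascript">
function OptanonWrapper() { }
</script>
<!-- Cookies Consent Notice (Production CDN, www.test.com, en-US) end -->
      </body>
    </html>
EOF

# Delete the whole start..end range and write a .fixed copy next to each file.
for f in "$dir"/*.html; do
    sed '/Cookies[^>]* start/,/Cookies[^>]* end/d' "$f" > "$f.fixed"
done
fixed=$(cat "$dir/page.html.fixed")
printf '%s\n' "$fixed"
rm -rf "$dir"
```

Writing to a separate `.fixed` file keeps the original untouched, which makes it easy to compare before replacing anything.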
 
  

