LinuxQuestions.org
02-17-2021, 12:20 AM   #1
gregors
Member
 
Registered: Mar 2018
Posts: 177

Rep: Reputation: Disabled
awk anyone?


Hi there!

I want to make a news page (HTML with lots of pictures) easier to read. Since my cat|tr|grep chain didn't get me what I want, I think awk might be able to do the job.
The problem is that I don't just want single lines but combinations of corresponding lines.

To make a long story short: I want to look for a line with

<span class="teaser__headline">++ Japan beginnt mit Impfprogramm ++</span>

and take its (visible) text, combining it with the text from the following line that starts with

<p class="teaser__shorttext">Fünf Monate ...

So I need to look for "teaser__headline", take the text from that element, and follow it with the text from the line that contains "teaser__shorttext".

The result should look similar to

++ Japan beginnt mit Impfprogramm ++
Fünf Monate ...


The next (?) step would be to see how things are linked with <a> tags and use them to make my result clickable ...

If there's a better forum for my question, please let me know. And if you don't know awk, just like me: sorry to bother ...

TIA

Gregor

Last edited by gregors; 02-17-2021 at 12:22 AM.
 
02-17-2021, 12:54 AM   #2
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
First of all, I'd check whether that site has an RSS/Atom feed that might already contain a condensed version of the news.

Otherwise, I wouldn't use awk/grep etc. but a tool that is designed to deal with HTML (and similar markup like XML).
Two such tools are xmllint (part of libxml2) and xmlstarlet.
The thing you want to learn is "XPath queries". Yes, there's a small learning curve, but you'll soon appreciate working with the code you're parsing, not against it.

Just look around for a suitable tutorial.

If you give us example code we can help more.
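
As an illustration of the XPath route, here is a minimal sketch using xmllint. It assumes the class names from the snippet in post #1 and that each class attribute holds exactly that one value; `first_teaser` is a hypothetical helper name, and a real page may need `contains(@class, ...)` instead of the exact match.

```shell
# Hypothetical helper: print the first headline and the first short text
# found in an HTML file. The class names come from the snippet in post #1.
first_teaser() {
  # --html selects the lenient HTML parser; string(...) returns the text
  # content of the first matching node. Parser warnings go to stderr.
  xmllint --html --xpath 'string(//span[@class="teaser__headline"])' "$1" 2>/dev/null
  xmllint --html --xpath 'string(//p[@class="teaser__shorttext"])' "$1" 2>/dev/null
}
```

Calling `first_teaser page.html` (the file name is a placeholder) should print the first headline and then the first short text. The `2>/dev/null` only hides the parser's complaints about sloppy real-world HTML.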
 
02-17-2021, 01:00 AM   #3
gregors
Member
 
Registered: Mar 2018
Posts: 177

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by ondoho
First of all, I'd check whether that site has an RSS/Atom feed that might already contain a condensed version of the news.
Thanks a lot for this hint! In fact there is an RSS feed for that page.

Gregor
 
02-17-2021, 01:04 AM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
awk is awesome, and this is certainly doable, but you'll get a bunch of recommendations to use a "proper" tool that understands the format. pup is one such, and CPAN will have a few as well if you're into Perl.
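
For reference, a minimal sketch of what the awk route could look like. It assumes that each headline and each short text sits on a single line, as in the snippet in post #1, and that a headline always precedes its short text; `teasers` is a hypothetical helper name, not something from this thread.

```shell
# Hypothetical helper: print each headline followed by its short text.
# Assumes every teaser element fits on one line, as in post #1.
teasers() {
  awk '
    # Remove every HTML tag, then trim leading and trailing whitespace.
    function strip(s) {
      gsub(/<[^>]*>/, "", s)
      gsub(/^[ \t]+|[ \t]+$/, "", s)
      return s
    }
    # Remember the most recent headline until its short text shows up.
    /teaser__headline/ { headline = strip($0); next }
    # Print the pair, then a blank line to separate teasers.
    /teaser__shorttext/ && headline != "" {
      print headline
      print strip($0)
      print ""
      headline = ""
    }
  ' "$@"
}
```

`teasers page.html` reads a saved page (placeholder file name); with no argument it reads standard input, so it can sit at the end of a pipe. Markup that wraps across lines will defeat this, which is where the HTML-aware tools come in.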
 
  


All times are GMT -5.
