[SOLVED] how to change the content of lines in an html file using regex/grep
I would argue that the reason for needing local copies is evident, though, in fairness, maybe not to amateur musicians who don't play for dances.
Musicians playing for folk dancing often used to need to carry bags of sheet music so that they could comply with various callers' requirements for tune sets for dances.
For many, this is no longer necessary, as digital scores are more convenient.
So, the answer to 'Why?' is that you do not usually have a reliable internet connection when you are sitting on a farmer's wagon in a barn, playing for a dance.
Last edited by toothwright; 10-21-2019 at 05:46 AM.
I would argue that the reason is evident, though in fairness maybe not to amateur musicians who don't play for dances.
Musicians playing for folk dancing often used to carry bags of sheet music so that they could comply with the caller's wishes.
This is no longer necessary; digital scores are more convenient.
So, the answer to 'Why not leave it as it is?' is that you do not usually have an internet connection when you are sitting on a farmer's wagon at a barn dance.
Aha! So, you have copied the scores to your laptop and now want to be able to display them. So why use a web page to do that? Just open them from the file browser with the PDF viewer. Mayhaps you’re working harder than you need to.
The musicians I know and play with are very comfortable with their three-ring binders that contain only lyrics. Most of them don’t even read music, and a couple are complete Luddites who wouldn’t know how to turn a laptop on...
Hm. If I understand correctly, you want to edit that HTML? Browsers usually open pages in read-only mode, but obviously you can edit your own files (using an HTML editor).
Also: many talented amateur dance musicians seem to have an almost infinite memory for scores; others require an android pad.
The database is not as straightforward as it appears. Some entries bring up whole sets; many others are links to individual tunes.
I like to use HTML to arrange the tunes in the order that I want. The browser alone does not allow this personalisation.
OK. I get that.
But I don’t see how you would do that by changing “the content of lines in an html file using regexp...” I think you’ll just need to sit down and code it.
SciTE has a copy function: ctrl+d will duplicate a line (or what is selected), then the copy can be edited to change the page/document being linked to.
Hopefully you’ve been working on that since you posted in #22 and are almost done...
@pan64
Yes, that is exactly what I am doing, (using bluefish).
My original question resulted from a hope that automation of the repetitive parts of the edit to reduce manual interference would be possible.
It has proved difficult to analyse the HTML lines I need to change in order to develop a working regex, so I have returned to manual edits - 1000 lines to go!
Again, wget can download a page and all the links contained on that page, store them on the local disk/pendrive and also rewrite the links [on the downloaded page] to use the downloaded files. https://superuser.com/questions/8006...asnt-specified
On the other hand, you only need a bulk search/replace to modify the original URL; you do not need to do it line by line.
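A rough sketch of what such a bulk replace could look like with sed. The `http://example.invalid/mm/` prefix and the two sample lines are made up for illustration; the real file's links will differ.

```shell
# Hypothetical sample lines; the URL prefix is invented for this sketch.
sample='<li><a href="http://example.invalid/mm/MM245.htm">MM245</a></li>
<li><a href="http://example.invalid/mm/MM246.htm">MM246</a></li>'

# One substitution rewrites the shared prefix on every line of the input.
# The trailing "g" also catches several links on the same line.
result=$(printf '%s\n' "$sample" | sed 's@http://example\.invalid/mm/@Tunes/@g')

printf '%s\n' "$result"
```

Using `@` as the delimiter, as in the commands elsewhere in this thread, avoids having to escape every `/` in the URL.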
So, in the SciTE processed line for example, I would like to exchange the tune name "Ashokan farewell" with "MM245"
The problem is that both items vary in length across the tuples of the file, and the names do not always use the UK font, which is why I'm stuck.
I'll explore fzf next, thank you for the example
Last edited by toothwright; 10-22-2019 at 10:50 AM.
There was mention of names being duplicated with variations (the unique number),
so I pre-empted that.
Code:
sed 's@\(<li>\)\(.\+\)<a href="http.\+/[[:alpha:]]\+\([0-9]\+\)\(.\+\).htm.\+\([[:alpha:]]\{2\}\)\([0-9]\+\).\+@<li><a href="Tunes/\5\3\4.pdf">\5\3 \2</a></li>@'
notice that the early sample data is no longer touched
essentially the same
Code:
sed 's@\(<li>\)\(.\+\)\(<a href="\)http.\+/[[:alpha:]]\+\([0-9]\+\)\(.\+\).htm.\+\([[:alpha:]]\{2\}\)\([0-9]\+\)\(.\+\)@\1\3Tunes/\6\7\5.pdf">\6\4 \2\8@'
notice that \1 \2 \3 are the "chunks" wrapped in \(\)
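To see the "chunks" in isolation, here is a small sketch with only three groups. The input string is invented for the demonstration, and `\+` is a GNU sed extension (the commands in this thread already rely on it).

```shell
# Three groups: upper-case letters, digits, and everything after the underscore.
# The replacement refers to them by position and puts them in a new order.
reordered=$(echo 'MM245_ashokan_farewell' |
  sed 's@\([[:upper:]]\+\)\([0-9]\+\)_\(.\+\)@\3 is tune \2 in book \1@')

echo "$reordered"
# prints: ashokan_farewell is tune 245 in book MM
```

Groups are numbered by the position of their opening `\(`, counted from the left, so adding a group early in the pattern shifts the numbers of all the groups after it.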
Edit3
and this one is more like the original
Code:
<input sed 's@\(<li>\)\(.\+\)\(<a href="\)http.\+/[[:alpha:]]\+\([0-9]\+\)\(.\+\).htm\(.\+\)\([[:alpha:]]\{2\}\)\([0-9]\+\)\(.\+\)@\1\2\3Tunes/\7\4\5.pdf\6\7\8\9@'
sed -E 's@(<li>)(.+)(<a href=")http.+/[[:alpha:]]+([0-9]+)(.+).htm">([[:alpha:]]{2})([0-9]+)(.+)@\1\3Tunes/\6\7\5.pdf">\6\4 \2\8@'
Code:
sed -E 's@http.+/[[:alpha:]]+([0-9]+)(.+).htm(">)([[:alpha:]]{2})([0-9]+)@Tunes/\4\1\2.pdf\3\4\5@'
technically the MM bit should be ([[:upper:]]{2})
if it is not always two, then
(">)([[:upper:]]+)([0-9]+)
so that is UpperCase letter 1 or more times and digit 1 or more times
you may have noticed I corrected another bad habit, the use of *
that is the previous match zero or more times, and a lot of the time is used incorrectly
I don't use sed much these days, and I have still not got rid of the bad habits I picked up following early examples
? is the previous 0 or 1 times
it does seem very confusing, but once you get your head around it it does make perfect sense,
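A small sketch of how `*`, `+` and `?` differ, using GNU sed with `-E`. The test strings are invented; `&` in the replacement stands for whatever the pattern matched, so the angle brackets show exactly what was caught.

```shell
# "*" allows zero digits, so it succeeds with an empty match at the start of the line.
star=$(echo 'MM245' | sed -E 's@[0-9]*@<&>@')

# "+" needs at least one digit, so it skips ahead to the number.
plus=$(echo 'MM245' | sed -E 's@[0-9]+@<&>@')

# "?" makes the second M optional, so a single M followed by digits still matches.
opt=$(echo 'M245' | sed -E 's@MM?[0-9]+@<&>@')

echo "$star"   # prints: <>MM245
echo "$plus"   # prints: MM<245>
echo "$opt"    # prints: <M245>
```

The first result is the usual way `*` goes wrong: the pattern "matches", but it matched nothing.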
curl secretsiteplaceholder/mm/sheets.htm | \
sed -E '/sheet_list|^\//s@/mm/([[:upper:]]+[0-9]+.+).htm@Tunes/\1.pdf@' \
> sheetmusic.htm
that is much cleaner
curl is probably not installed by default
you don't *need* it, just run the sed on local copy
edit: if the local copy has http link in it
Code:
sed -E '/sheet_list|^\//s@http.+/mm/([[:upper:]]+[0-9]+.+).htm@Tunes/\1.pdf@'
real sample
Code:
<p class="sheet_list_title"><a name="MM9001"><a href="/mm/MM09001_pipes_from_the_world_of_bash.htm">MM9001 <span class="sheet_title">pipes from the world of bash</span></a></a></p>
<p class="sheet_list_tunes">The Chelsea Flower Show
/ <span class="disabled">Choo Choo</span>
/ Acid Burns <a href="/mm/MM09002_wooden_shoe_antlerpipes.htm">MM9002</a></p>
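As a check, the `/mm/` variant of the sed command can be run against the four sample lines above. This is only a sketch: it assumes GNU sed, and it relies on each line holding at most one `.htm` link, because `.+` is greedy.

```shell
# The sample lines from the thread, kept verbatim in a quoted here-document.
sample=$(cat <<'EOF'
<p class="sheet_list_title"><a name="MM9001"><a href="/mm/MM09001_pipes_from_the_world_of_bash.htm">MM9001 <span class="sheet_title">pipes from the world of bash</span></a></a></p>
<p class="sheet_list_tunes">The Chelsea Flower Show
/ <span class="disabled">Choo Choo</span>
/ Acid Burns <a href="/mm/MM09002_wooden_shoe_antlerpipes.htm">MM9002</a></p>
EOF
)

# Only lines containing "sheet_list" or starting with "/" are considered.
# On those, /mm/NAME.htm becomes Tunes/NAME.pdf.
converted=$(printf '%s\n' "$sample" |
  sed -E '/sheet_list|^\//s@/mm/([[:upper:]]+[0-9]+.+).htm@Tunes/\1.pdf@')

printf '%s\n' "$converted"
```

The first and fourth lines come out with `href="Tunes/MM09001_pipes_from_the_world_of_bash.pdf"` and `href="Tunes/MM09002_wooden_shoe_antlerpipes.pdf"`; the two lines without a link pass through unchanged.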
@Firerat
I should like to thank you for the analysis and programming suggestions.
It is very patient of you to try to lead me to a solution and I am trying to implement your technique.
I shall post when I make any progress.