Editing a very large HTML file (or extracting URLs from a file)
I've worked out the answer, but since I'd already written out this question, I'll post it anyway, in case someone finds it useful.
I have a file made up of concatenated HTML files. I was going to open it and do some sorting and search-and-replace work. (The aim is to make a tab-delimited file of URLs for a Google custom search.)
But now it won't open in gedit - too big at 600 KB, I guess. And if I try to open it in OpenOffice, it opens as HTML, in a semi-WYSIWYG mode rather than as source, in spite of the .txt suffix.
I can view the source by opening it in Firefox, but when I copy and paste, only part of the file is pasted (how much depends on which program I'm copying to).
Can I set OpenOffice to open it as text? Or is there another graphical editor that will let me edit a large file like this? (I'd rather not learn to use a terminal-based editor to do one simple task.)
Or (and perhaps this is more useful), is there a program or command-line tool that will let me extract just the URLs from the file?
Open the file in Opera and view the source. Choose Edit -> Select All.*
Copy, and paste into OpenOffice. No problem. No idea why, but it works.
* Ctrl-A doesn't work for some reason - several shortcuts don't work in Opera on Ubuntu; I don't know about other distros.
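
As for the other option raised in the question, extracting just the URLs, a short Python script is one possible approach. This is only a sketch, not something from the original post: it assumes the links appear as href attributes on <a> tags and that only absolute http/https URLs are wanted. The names LinkCollector and extract_urls are made up for illustration.

```python
import sys
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.urls.append(value)


def extract_urls(html_text):
    """Return the unique absolute http(s) URLs in html_text, sorted.

    Assumption: relative links and in-page anchors (e.g. "#top") are
    not wanted, so they are filtered out.
    """
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    wanted = ("http://", "https://")
    return sorted({url for url in parser.urls if url.startswith(wanted)})


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python3 extract_urls.py FILE", file=sys.stderr)
    else:
        # errors="replace" keeps going if the concatenated pages mix encodings.
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            for url in extract_urls(f.read()):
                print(url)
```

Saved under a hypothetical name such as extract_urls.py, it could be run as `python3 extract_urls.py bigfile.txt > urls.txt`, where bigfile.txt stands in for the concatenated file. The output is one URL per line, sorted and de-duplicated, which should be small enough to open in gedit; any extra columns needed for the tab-delimited format would still have to be added.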