editing a very large HTML file (or, extracting URLs from a file)
I've worked out the answer, but since I'd already written out this question, I'll post it anyway, in case someone finds it useful.
---
PROBLEM:
I have a file made up of concatenated HTML files. I was going to open it and do some sorting and search-and-replace work. (The aim is to make a tab-delimited file of URLs for a Google custom search.)
But now it won't open in gedit. It's too big at 600 KB, I guess. And if I try to open it in OpenOffice, it opens as HTML in a semi-WYSIWYG mode rather than as source, in spite of the .txt suffix.
I can view the source by opening it in Firefox, but when I copy and paste, only part of the file is pasted (how much depends on which program I'm pasting into).
Can I set OpenOffice to open it as text? Or is there another WYSIWYG program that will let me edit a large file like this? (I'd rather not learn to use a terminal-based editor to do one simple task.)
Or (and perhaps this is more useful) is there a program or command-line tool that will let me extract just the URLs from the file?
---
SOLUTION:
Open in Opera, view source. Choose Edit -> Select all.*
Copy, and paste into OpenOffice. No problem. No idea why, but it works.
* Ctrl-A doesn't work for some reason. Several shortcuts don't work in Opera on Ubuntu; I don't know about other distros.
Another obvious solution would have been to use the console: grep, sed and awk do a pretty good job of picking data out of large files, and tying it all together in a shell script usually makes it even better. There's no need to open big files in big applications; just feed the text to the basic Unix (in this case Linux) tools and have it done. Or, if you don't want to fiddle with it until it's perfect, they'll at least help you shrink the data a lot, so you can then open the result in gedit or something else and pick out what you wanted.
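For anyone who would rather use a short script than a grep/sed/awk pipeline, here is a rough sketch of the same idea in Python. It is not what the original poster used, and it makes a few assumptions: that the URLs of interest sit in quoted href attributes, that only absolute http/https links are wanted, and that duplicates should be dropped. The function name extract_urls and the file names in the usage comment are made up for illustration.

```python
import re
import sys

# Rough pattern for href="..." or href='...' attributes.
# This is a regular expression, not a real HTML parser, so unusual
# markup (unquoted attributes, for example) will be missed.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_urls(lines):
    """Return unique http(s) URLs found in href attributes, in first-seen order."""
    seen = set()
    urls = []
    for line in lines:
        for url in HREF_RE.findall(line):
            # Assumption: relative links and mailto: links are not wanted.
            if url.startswith(("http://", "https://")) and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python extract_urls.py bigfile.txt > urls.txt
    # The file is read line by line, so its size is not a problem.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for url in extract_urls(f):
            print(url)
```

The output is one URL per line, which is small enough to open in gedit or OpenOffice for the remaining sorting and search-and-replace work.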
To be honest, if you are going to stay with Linux, knowing one CLI editor is going to come in very handy. Sometimes a GUI tool isn't the right approach, e.g. if the GUI breaks.
Personally I use vim, and that definitely won't choke on 600 KB. I've gone into files of tens of MB, possibly larger; it just runs a bit slower as the file gets bigger.