Regex search in files with Python
Is it a non-trivial problem to do a regular expression search in a text file where the pattern spans multiple lines? I've been looking at the re module, and I do understand how to run regular expression searches and matches on a string; I've successfully used them in some experimental Python programs I've written. However, when you have a large text file (tens of megabytes), I really don't want to call 'file.readlines()' and then join the whole furshlugginer thing into one giant string just so I can run a regex search whose pattern may span several lines. Right now I can search line by line, but I can't come up with an easy way to match a pattern across multiple lines. I could write my own matching code specific to this problem, but I'd rather not, because that seems like more work than I should have to do!
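One common way to avoid building the giant string (a technique not mentioned in the question, offered here as a sketch) is to memory-map the file with the standard mmap module and hand the mmap object straight to re, which accepts any bytes-like buffer. The operating system pages the file in as the regex engine walks through it, and the pattern can cross line boundaries exactly as it would in one big string. The function name search_file and the BEGIN/END sample markers are hypothetical; the pattern has to be a bytes pattern (rb"..."), and re.DOTALL is needed if "." should also match newlines.

```python
import mmap
import os
import re
import tempfile


def search_file(path, pattern, flags=0):
    """Return (start, end, matched_bytes) for every match of a bytes regex in a file.

    The file is memory-mapped, so the OS pages it in on demand instead of
    Python building one giant str from readlines() first.
    """
    regex = re.compile(pattern, flags)
    with open(path, "rb") as f:
        # mmap refuses to map a zero-length file, so handle that case up front.
        if os.fstat(f.fileno()).st_size == 0:
            return []
        # Length 0 means "map the whole file"; ACCESS_READ keeps it read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # re accepts any bytes-like buffer, including an mmap, so the
            # pattern can span line breaks just as it would in one big string.
            return [(m.start(), m.end(), m.group()) for m in regex.finditer(mm)]


if __name__ == "__main__":
    # Small demo: a hypothetical BEGIN...END block that spans several lines.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b"header\nBEGIN\nfoo\nbar\nEND\ntrailer\n")
    try:
        for start, end, text in search_file(tmp.name, rb"BEGIN.*?END", re.DOTALL):
            print(start, end, text.decode("utf-8"))
    finally:
        os.remove(tmp.name)
```

Matches come back as bytes, so decode them if you need text. Prefer a non-greedy .*? with DOTALL, since a greedy .* can run all the way to the end of a multi-megabyte file. On a 32-bit Python the whole file also has to fit into the process's address space, which may be a limit for very large files.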
Eh... as with most things, the solution is probably so simple that I'm overlooking it... it's probably even in the Python Library Reference, even though I've combed through that thing many times over for everything related to regular expressions...
And on a semi-related note...
... in general, how would one replace a block of text somewhere in the middle of a largish text file, again without loading the whole thing into memory? The problem I see is that if I simply seek to the place where the replacement should start and write there, a replacement that is longer than the block being replaced will overwrite text that shouldn't be overwritten. I suppose one could record all the changes to the original file in a temporary file, in some arbitrary format (like XML or some other custom format), where each change stores the line number and the position in the line where it takes place, plus which lines will be replaced. For smallish files like configuration files, I suppose this approach is fine, but what if you have a text file some million-odd lines long, like a novel (if novels even aspire to that many lines)? And if the approach I've described is about the most sensible way to do it, what can be done to speed up the process?
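With ordinary file APIs you can't insert or delete bytes in the middle of a file in place, so the usual workaround is close to what the question describes, minus the change log: stream the original into a new temporary file, write the replacement where the old block was, then rename the new file over the old one. The sketch below assumes, hypothetically, that the block to replace is delimited by two marker lines; the function name replace_block and its parameters are illustrative and do not come from any library. Only one line is held in memory at a time.

```python
import os
import shutil
import tempfile


def replace_block(path, start_marker, end_marker, replacement, encoding="utf-8"):
    """Replace the first block running from a line equal to start_marker through
    a line equal to end_marker (both inclusive) with the text `replacement`.

    The file is streamed line by line into a temporary file that is then renamed
    over the original, so memory use stays flat regardless of file size.
    Returns True if a block was replaced, False if start_marker never appeared.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Put the temp file in the same directory so os.replace() is a plain rename
    # on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    in_block = replaced = False
    try:
        # newline="" keeps the original line endings untouched.
        with os.fdopen(fd, "w", encoding=encoding, newline="") as dst, \
                open(path, "r", encoding=encoding, newline="") as src:
            for line in src:
                content = line.rstrip("\r\n")
                if in_block:
                    if content == end_marker:
                        dst.write(replacement)  # emit the new block once
                        in_block, replaced = False, True
                    continue  # drop every line of the old block
                if not replaced and content == start_marker:
                    in_block = True
                    continue
                dst.write(line)  # everything outside the block is copied as-is
        if in_block:
            raise ValueError(f"{start_marker!r} found but no {end_marker!r} after it")
        if replaced:
            shutil.copymode(path, tmp_path)  # mkstemp creates the file as 0600
            os.replace(tmp_path, path)
        else:
            os.remove(tmp_path)
    except BaseException:
        # Clean up the temp file on any failure; the original is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return replaced
```

This costs one sequential read and one sequential write of the file, which for tens of megabytes is usually limited mainly by disk speed rather than by Python. If the replacement is exactly the same length in bytes as the text it replaces, you can skip the copy: open the file in "r+b" mode, seek to the offset, and overwrite in place. For simple line-by-line edits, the standard fileinput module's inplace=True option follows the same copy-then-swap idea.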
Although I probably won't come across a text file large enough to have a noticeable impact on processing speed, it would still be nice to know, especially since this problem, along with the multi-line regex search, has all but befuddled me...
Thanks in advance for any insights, or even successful search keywords! I've been looking for a while myself, but if the answer exists somewhere, I've probably been using bad keywords AND doing web searches before my brain has quite started functioning (me is a HORRIBLE morning person)...