LinuxQuestions.org
11-08-2003, 06:36 PM   #1
smokybobo
LQ Newbie
 
Registered: Feb 2003
Posts: 29

Rep: Reputation: 15
Regex search in files with python


Is it a non-trivial problem to do a regular expression search in a text file when the pattern spans multiple lines? I've been looking at the re module, and I understand how to search or match against a string; I've done so successfully in a few experimental Python programs. However, with a large text file (tens of megabytes), I really don't want to call 'file.readlines' and then join the whole furshlugginer thing into one giant string just to run a regex that might span several lines. I can already search line by line, but an easy way to match a pattern across multiple lines just isn't coming to me. I could hand-code the pattern matching for this specific problem, but I don't want to, because it seems like more work than I should be doing!

Eh... as with most things, the solution is probably so simple that I'm overlooking it. It's probably even in the Python library reference, even though I've combed through that thing many times for everything related to regular expressions...
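One approach that might fit the multi-line search is to memory-map the file and hand the map to the re module, so the operating system pages the data in on demand instead of Python building one giant string. The sketch below is only an illustration under a few assumptions: the function name `find_multiline` is made up for this example, the pattern must be a bytes pattern (because a memory map exposes bytes, not text), and the file must not be empty (mapping an empty file raises an error).

```python
import mmap
import re


def find_multiline(path, pattern, flags=0):
    """Return (start, end, matched_bytes) for every match in the file.

    `find_multiline` is a hypothetical helper name, not part of any library.
    `pattern` must be a bytes pattern, e.g. rb"BEGIN\n.*?\nEND".
    """
    regex = re.compile(pattern, flags)
    with open(path, "rb") as f:
        # Length 0 maps the whole file. The file is not read up front;
        # the OS loads pages as the regex engine touches them.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Collect results inside the `with` so the map is still open.
            return [(m.start(), m.end(), m.group(0))
                    for m in regex.finditer(mm)]
```

A call such as `find_multiline("big.txt", rb"BEGIN\n.*?\nEND", re.DOTALL)` would return byte offsets into the file (the file name here is just a placeholder). Note that `re.DOTALL` is what lets `.` cross line boundaries, and the offsets are byte positions rather than line numbers.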

And on a semi-related note...

... in general, how would one replace a block of text somewhere in the middle of a largish text file, again without reading the whole thing into memory? The problem I see is that if I just seek to the place where the replacement starts and write, a replacement longer than the block being replaced will overwrite text that shouldn't be touched. I suppose one could record all the changes to the original file in a temporary file, in some arbitrary format (XML or a custom format), where each change stores the line number and the position within the line where it starts, plus which lines it replaces. For smallish files like configuration files, I suppose this approach is fine, but what about a text file some odd million lines long, a novel for example (if novels even aspire to that many lines)? And if the approach I've described is about the most sensible, obvious way to do it, what can be done to speed up the process?
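A common alternative to a change log is to stream the original file into a temporary file, substituting the replacement at the right spot, and then swap the temporary file in for the original. The following is a minimal sketch of that idea, assuming the block to replace is already known as a byte range; the name `replace_span` and its parameters are invented for this example.

```python
import os
import shutil
import tempfile


def replace_span(path, start, end, replacement, chunk_size=1 << 20):
    """Replace bytes [start, end) of the file with `replacement` (bytes).

    `replace_span` is a hypothetical helper. Only one chunk is held in
    memory at a time, regardless of the file size.
    """
    # Create the temp file next to the original so the final rename
    # stays on the same filesystem.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with open(path, "rb") as src, os.fdopen(fd, "wb") as dst:
            # Copy everything before the block, chunk by chunk.
            remaining = start
            while remaining > 0:
                chunk = src.read(min(chunk_size, remaining))
                if not chunk:
                    break  # file is shorter than `start`
                dst.write(chunk)
                remaining -= len(chunk)
            # Write the new text, which may be longer or shorter.
            dst.write(replacement)
            # Skip the old block and copy the rest of the file.
            src.seek(end)
            shutil.copyfileobj(src, dst, chunk_size)
        # Swap the new file into place.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The cost is one full sequential copy of the file per call, so several edits are best applied in a single pass rather than one call each. The sketch also does not preserve the original file's permissions or ownership, since the temporary file is created with its own defaults.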

Although I probably won't come across a single text file large enough to affect processing speed, it would still be nice to know, especially since this problem, along with the regex search across multiple lines, has all but befuddled me...

Thanks in advance for any insights, or even successful search keywords! I've been looking for a while myself, but if such exists somewhere I've probably been using bad keywords AND have been trying to do web searches when my brain hasn't quite started functioning yet (me is a HORRIBLE morning person)...
 
  


All times are GMT -5.
