match text over multiple lines Python

donnied · 05-23-2009, 07:08 PM

I'm trying to find two separate bits of data. I have a list with multiple entries:
Name: Bob
ID: 123 (I want this)
...
W:
1
2
2
3 (I want this)

I want to save the ID and then get whatever appears on the 8th line after "W:". I'm not sure how to match text over multiple lines. I was thinking along the lines of:

Code:

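import re

def student_id(line):
        # Return the six-digit ID found on this line, or None if there is none.
        match = re.search(r'[0-9]{6}', line)
        if match:
                return match.group()

def clp(line):
        # Return the whole line if it contains 'WR', otherwise None.
        if re.search(r'WR.*', line):
                return re.search(r'.*', line).group()

# Each call only ever sees one line, so nothing here can look ahead to later lines.
for line in open('workfile'):
        print(student_id(line))
        print(clp(line))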

ghostdog74 · 05-23-2009, 07:39 PM

If you want to use the re module, compile your pattern with re.M|re.DOTALL to match across multiple lines. However, looking at your case, there is really no need to use a regex.
One way:

Code:

with open("file") as f:
  for line in f:
    if "ID" in line:
       student_id=line.split()[-1].strip()
    if "W:" in line:
       # next(f) advances the same iterator the outer loop is reading from
       for i in range(8): line=next(f)
       print("Eighth line after W: is " + line.strip())
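The skip-ahead works because the inner loop pulls lines from the same iterator that the outer for loop is consuming, so the outer loop picks up again after the skipped lines. Here is a minimal sketch of that idea, assuming a made-up record layout (the sample text, the helper name eighth_after_marker, and its parameters are all hypothetical; the real file is not shown in the thread). io.StringIO stands in for the open file.

```python
import io

# Hypothetical sample record; the real file layout is not shown in the thread.
SAMPLE = """Name: Bob
ID: 123
W:
a
b
c
d
e
f
g
target
Name: Ann
"""

def eighth_after_marker(lines, marker="W:", offset=8):
    """Return (id, text) pairs, where text is the line `offset` lines after `marker`."""
    it = iter(lines)          # one shared iterator for both loops
    results = []
    student_id = None
    for line in it:
        if "ID" in line:
            student_id = line.split()[-1]
        if marker in line:
            # next(it) consumes from the iterator the for loop is using,
            # so the outer loop resumes after the skipped lines.
            for _ in range(offset):
                line = next(it, "")   # "" if the input ends early
            results.append((student_id, line.strip()))
    return results

pairs = eighth_after_marker(io.StringIO(SAMPLE))
print(pairs)
```

The explicit iter() call means the helper also accepts a plain list of lines, not just a file object.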

A second way is to use indexing: if your file is not too big, read everything into memory.

Code:

data=open("file").read().split("\n")
for n,line in enumerate(data):
    if "ID" in line:
        student_id=line.split()[-1]  # get your id
    if "W:" in line:
        # the target sits 8 entries further down the list
        print(data[n+8])

donnied · 05-23-2009, 08:00 PM

Wow. With your code I got done in 5 minutes what I'd been tinkering with for hours. Thank you.

If I did want to use the re.M|re.DOTALL as mentioned how would I do that? (I'm curious for future reference and I think it might help the discrete chunking of things into modules.)

ghostdog74 · 05-23-2009, 08:19 PM

Code:

regex=re.compile("<pattern here>",re.M|re.DOTALL)

Please read the docs, as well as the Python Regular Expression HOWTO (google it).
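For a concrete picture of what the two flags do, here is a rough sketch rather than a recommended solution. The sample text, the pattern, and the names TEXT, LINES_TO_SKIP and RECORD are made up for illustration; it grabs the fourth line after "W:" as in the sample in the first post, so adjust LINES_TO_SKIP for a different offset. re.M lets ^ and $ match at every line boundary, and re.DOTALL lets . match newlines, which is what allows .*? to span the lines between the ID and the "W:" marker.

```python
import re

# Hypothetical sample; labels and values are made up for illustration.
TEXT = """Name: Bob
ID: 123456
W:
1
2
2
3
Name: Ann
ID: 654321
W:
4
4
1
2
"""

LINES_TO_SKIP = 3  # capture the line after this many skipped lines

# ^ID:\s*(\d+)$   the ID line (re.M makes ^ and $ work per line)
# .*?             anything up to the next marker (re.DOTALL lets . cross "\n")
# ^W:\n           the marker line
# (?:[^\n]*\n){N} skip N whole lines
# ([^\n]*)$       capture the next line
RECORD = re.compile(
    r"^ID:\s*(\d+)$.*?^W:\n(?:[^\n]*\n){" + str(LINES_TO_SKIP) + r"}([^\n]*)$",
    re.M | re.DOTALL,
)

matches = RECORD.findall(TEXT)
print(matches)
```

Because the pattern has two groups, findall returns one (id, value) tuple per record. The whole text has to be in a single string for this to work, which is why the regex route needs read() rather than a line-by-line loop.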

donnied · 05-23-2009, 08:56 PM

I read the docs, but I don't really get it until I've used it (after I've seen a concrete example).

Thanks again.

donnied · 05-24-2009, 09:18 AM

When I used the indexing solution, I got rows that didn't match. For example:
Joe Brown 123456 3 4 1 3.5
Joe Brown 123456 3 4 1 4
or
Jane Doe 654321 1 3 1 2
Jane Doe 654321 1 3 1 3

Oops. The READING-WRITING line also contains 'WRITING', so the "WRITING" test matched it too.
I'll probably go with something like:

Code:

# data is the list of lines read earlier; wcm2 is an output file already opened for writing
for n,line in enumerate(data):
        if "Name" in line:
                name=line.split()[-1].strip()
                print(name)

        if "ID" in line:
                student_id=line.split()[-1].strip()
                print(student_id)

        if "ORAL" in line:
                print(data[n+8])
                oral = data[n+8]

        if "READING" in line:
                print(data[n+8])
                reading = data[n+8]

        if "BROAD" in line:
                print(data[n+8])
                broad = data[n+8]

        if "WRITING" in line:
                if "READING" in line:
                        # this is the READING-WRITING line, so skip it
                        print('not what we want')
                else:
                        print(data[n+8])
                        writing = data[n+8]
                        calpstring = name + ' ' + student_id + oral + reading + broad + writing
                        # write only here, once the real WRITING score is known
                        wcm2.write(calpstring + '\n')

wcm2.close()
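The READING / READING-WRITING collision comes from testing with "in", which matches substrings. One possible way around it is to compare the stripped line against the exact header names instead. This is only a sketch, and it assumes each header sits alone on its own line, which the thread does not confirm. The header names come from the post above; HEADERS, block, and scores_after_headers are hypothetical names, and the data is invented.

```python
HEADERS = ("ORAL", "READING", "BROAD", "WRITING")

def scores_after_headers(data, offset=8):
    """Map each exactly matching header to the line `offset` entries below it."""
    found = {}
    for n, line in enumerate(data):
        label = line.strip()
        # Exact comparison: "READING-WRITING" is not equal to "WRITING".
        if label in HEADERS and n + offset < len(data):
            found[label] = data[n + offset].strip()
    return found

def block(header, score):
    """Made-up section: a header, seven filler lines, then the score."""
    return [header] + ["x"] * 7 + [score]

data = (block("ORAL", "3") + block("READING", "4")
        + block("READING-WRITING", "9") + block("BROAD", "1")
        + block("WRITING", "3.5"))

result = scores_after_headers(data)
print(result)
```

The bounds check also avoids an IndexError when a header appears fewer than eight lines from the end of the file.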