LinuxQuestions.org


donnied 01-17-2009 12:25 PM

Python: How to use the re module?
 
In Python, how would I use the re module on the contents of a file? I'm imagining opening the file and reading it in with readlines:

Code:

for line in open('file1.csv').readlines():
        lines = [line.rstrip()]
        m = re.match('the string I want to match', lines)
        m.group()

but it doesn't work. I'm basically trying to emulate grep '^[0-9](6)'

And what would be the Python equivalent of sed 's/cat/dog/g'?
(I don't want to use sed or the OS module.)

ntubski 01-17-2009 05:26 PM

Maybe you are overthinking this.

Code:

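import re

for line in open('in.txt').readlines():
    # re.match only matches at the start of the string
    if re.match(r'^[0-9]\(6\)', line):
        # trailing comma: the line already ends with a newline
        print line,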

Note that Python's regex syntax is like egrep's rather than grep's: unescaped parentheses form a group, so matching literal parentheses takes backslashes.
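
To make the difference concrete, here is a small sketch in Python 3 syntax comparing three readings of the pattern. The sample strings and the helper name matching are made up for illustration. If six digits in a row were what was intended, the quantifier is written with braces, {6}.

```python
import re

# Three patterns that look similar but mean different things in Python.
literal_parens = re.compile(r'^[0-9]\(6\)')  # one digit, then the literal text "(6)"
grouped = re.compile(r'^[0-9](6)')           # one digit, then a 6 captured as group 1
six_digits = re.compile(r'^[0-9]{6}')        # exactly six digits at the start


def matching(regex, samples):
    """Return the sample strings that regex matches at their start."""
    return [s for s in samples if regex.match(s)]


# Made-up sample lines, one for each interpretation.
samples = ['1(6) text', '16 text', '123456 text']

for name, regex in [('literal_parens', literal_parens),
                    ('grouped', grouped),
                    ('six_digits', six_digits)]:
    print(name, matching(regex, samples))
```

Each pattern picks out a different sample line, so it is worth checking which of the three the original grep command was meant to express.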

Quote:

And what would be the Python equivalent of sed 's/cat/dog/g'?

Code:

for line in open('in.txt').readlines():
    print re.sub('cat', 'dog', line),

Both of these examples should probably call re.compile once, outside the loop, for efficiency.
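
A rough sketch of what that could look like, written in Python 3 syntax. The function names grep_lines and sed_lines are hypothetical, chosen only to mirror the two commands. The re module also caches recently used patterns internally, so the gain from compiling by hand may be modest on a small file.

```python
import re

# Compile each pattern once, before any looping, instead of handing the
# pattern string to re.match / re.sub for every line.
ID_RE = re.compile(r'^[0-9]\(6\)')
CAT_RE = re.compile('cat')


def grep_lines(lines, regex):
    """Keep only the lines that regex matches at their start (like grep '^...')."""
    return [line for line in lines if regex.match(line)]


def sed_lines(lines, regex, repl):
    """Replace every match of regex in every line (like sed 's/.../.../g')."""
    return [regex.sub(repl, line) for line in lines]


if __name__ == '__main__':
    # 'in.txt' is the file name used in the examples above.
    with open('in.txt') as f:
        lines = f.readlines()
    for line in grep_lines(lines, ID_RE):
        print(line, end='')
    for line in sed_lines(lines, CAT_RE, 'dog'):
        print(line, end='')
```

Keeping the compiled patterns at module level also gives them a name, which helps once a script grows past a couple of expressions.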

donnied 01-19-2009 12:26 PM

Thank you. I ended up using:
Code:

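import re

g4 = open('workfile4', 'wb')
studentid = None  # most recent student ID seen so far

for line in open('workfile3').readlines():
    m = re.match(r'^[0-9]{6}', line)
    if m:
        # a line starting with six digits carries the student ID
        studentid = m.group()
    else:
        # any other line is tagged with the most recent student ID
        print studentid, ",", line
        # write plain text; str() on a tuple would write its repr instead
        g4.write('%s,%s' % (studentid, line))

g4.close()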


To clean up the file(s), I used a lot of calls like:
Code:

scriptutil.freplace('.', shellglobs=('workfile3',),regexl=((r'\\$',r'', None),))
scriptutil.freplace('.', shellglobs=('example',),regexl=((r'([A-Z])\,([A-Z])',r'\1\2', None),))

I'm not sure how efficient the scriptutil function is, but I am only working with about one thousand lines of text.
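
For a file of that size, a similar cleanup can also be sketched with nothing but the standard re module. This is not how scriptutil works internally. It is a minimal Python 3 sketch that assumes the whole file fits in memory and applies the substitution line by line, and the helper name sub_in_file is made up.

```python
import re


def sub_in_file(path, pattern, repl):
    """Rewrite the file at path, applying re.sub(pattern, repl) to every line.

    Hypothetical helper: reads the whole file into memory first, which is
    fine for about a thousand lines.
    """
    regex = re.compile(pattern)
    with open(path) as f:
        lines = f.readlines()
    with open(path, 'w') as f:
        for line in lines:
            f.write(regex.sub(repl, line))


if __name__ == '__main__':
    # The same two patterns as in the calls above.
    sub_in_file('workfile3', r'\\$', '')                   # drop a trailing backslash
    sub_in_file('example', r'([A-Z]),([A-Z])', r'\1\2')    # drop a comma between capitals
```

Because the file is rewritten in place, it is probably wise to try this on a copy first.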


All times are GMT -5.