LinuxQuestions.org


General 03-15-2010 07:30 AM

Python: replacing all words found on a list
 
I've simplified the code, for the purpose of this example:

Code:

#!/usr/bin/env python
# coding=utf-8-sig

import re

nouns = ['cow','cowboy']

text = 'thecowatethecowboy'

for x in nouns:
        if re.search(x, text):
                text = text.replace(x, 'NOUN')

print text

The result:

theNOUNatetheNOUNboy

Whereas I want:

theNOUNatetheNOUN

The fix I found was:

Code:

nouns = ['cowboy','cow']

This works in my short example, but for some reason I have been unable to discover, when implemented in my full code the shorter items are still replaced first, so I get the 'NOUNboy' problem.

In other words, I can't get this fix to work in my actual code, so I think I need a more robust solution. How can I guarantee that it will replace longer items first?

troop 03-15-2010 07:35 AM

Code:

def bylength(word1, word2): return len(word2) - len(word1)
nouns.sort(cmp=bylength)
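
The same longest-first ordering can also be expressed with a key function instead of a comparison function, which matters on Python 3, where list.sort() no longer accepts cmp=. A minimal sketch, reusing the nouns and text from the first post; the function name replace_longest_first and the placeholder parameter are made up for illustration:

```python
def replace_longest_first(text, words, placeholder='NOUN'):
    # sorted() returns a new list, so the caller's list is left untouched.
    # Longest word first, so 'cowboy' is replaced before 'cow'.
    for word in sorted(words, key=len, reverse=True):
        text = text.replace(word, placeholder)
    return text

nouns = ['cow', 'cowboy']
result = replace_longest_first('thecowatethecowboy', nouns)
print(result)  # theNOUNatetheNOUN
```

Sorting inside the function means the result does not depend on how the caller happened to order the list.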


General 03-15-2010 07:50 AM

I tried implementing your suggestion, then found why I couldn't reproduce the problem: nouns is a dictionary.

Code:

#!/usr/bin/env python
# coding=utf-8-sig

import re

nouns = {'carport': 1,
        'car': 2}

text = 'thecarwentintothecarport'

for x in nouns:
        if re.search(x, text):
                text = text.replace(x, 'NOUN')

print text

It seems the keys are always processed smallest to largest, regardless of the order in which they are written in the dictionary.
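
A likely explanation, assuming a Python version in which dictionaries do not preserve insertion order (this is the case for Python 2): iterating over a dict yields its keys in an arbitrary order, so the order in which the keys are written in the literal has no effect on the loop. Sorting the keys by length before looping sidesteps this. A sketch using the same data as above:

```python
nouns = {'carport': 1,
         'car': 2}

text = 'thecarwentintothecarport'

# Iterating a dict yields its keys; sorted() puts the longest key first,
# so 'carport' is replaced before 'car' whatever the dict's own order is.
for word in sorted(nouns, key=len, reverse=True):
    text = text.replace(word, 'NOUN')

print(text)  # theNOUNwentintotheNOUN
```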

grail 03-15-2010 09:57 AM

How about:

Code:

#!/usr/bin/env python
# coding=utf-8-sig

mylist = ['cow','cowboy']

text = 'thecowatethecowboy'

for x in reversed(mylist):
    text = text.replace(x, 'NOUN')

print text
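
Note that reversed() only helps when the list happens to be ordered shortest first. A different technique, not taken from the posts above, is to do all replacements in a single pass with one regular expression built from the words, longest first. The sketch below assumes the words should be matched literally (hence re.escape); the helper name replace_words is hypothetical:

```python
import re

def replace_words(text, words, placeholder='NOUN'):
    # An empty pattern would match at every position, so bail out early.
    if not words:
        return text
    # The regex engine tries alternatives left to right, so putting the
    # longest words first makes 'cowboy' win over 'cow'.
    ordered = sorted(words, key=len, reverse=True)
    # re.escape guards against words that contain regex metacharacters.
    pattern = re.compile('|'.join(re.escape(w) for w in ordered))
    return pattern.sub(placeholder, text)

result = replace_words('thecowatethecowboy', ['cow', 'cowboy'])
print(result)  # theNOUNatetheNOUN
```

Because the text is scanned only once, an inserted 'NOUN' is never itself matched by a later word in the list, which can happen with repeated str.replace() calls.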



All times are GMT -5.