LinuxQuestions.org


elliot01 11-26-2010 04:22 AM

Using sed with wildcards
 
Hi all,

Can someone please help me with sed and deleting text using wildcards?

I have, for instance, the following text:
Code:

Nov 24 14:27:40 FGT50B123456789 date=2010-11-24,time=14:27:45,devname=FGT50B123456789,device_id=FGT50B123456789,log_id=0315012544,type=webfilter,subtype=urlfilter,pri=warning,urlfilter_idx=185,urlfilter_list="Main List",vd="root",policyid=1,identidx=0,serial=14512345,user="N/A",group="N/A",src=10.11.22.144,sport=53706,src_port=53706,src_int="internal",dst=64.154.84.32,dport=80,dst_port=80,dst_int="wan1",service="http",hostname="peach.bskyb.com",profiletype="Webfilter_Profile",profilegroup="N/A",profile="Main",status="blocked",req_type="referral",url="/HG?hc=&hb=DM57021769NA84EN3%3BDM53111942ZM&cd=1&hv=6&n=/SportingLife%20-%20Racing%20-%20Results%20-%20Fast%20-%",msg="URL was blocked because it is in the URL filter list"
This is one of thousands of lines in a text file.

I am trying to use sed to delete the field and value 'urlfilter_idx=185' (the value may be different on each line).

I have tried:
sed -i 's/,urlfilter_idx=*,//' temp2.txt (does nothing)
sed -i 's/,urlfilter_idx=.*,//' temp2.txt (deletes everything up to the last comma)

How can I get sed to delete 'urlfilter_idx=' and any following value up to the next comma?

Thanks in advance

druuna 11-26-2010 04:39 AM

Hi,

sed's regular expressions are greedy by nature, which is why your second attempt removes everything up to the last comma. A stricter regular expression is needed.

It looks like urlfilter_idx= is always followed by a number. If that is the case, try this:

sed 's/urlfilter_idx=[0-9]*,//' infile

If there are multiple instances of urlfilter_idx= on one line, you need to add sed's g flag:

sed 's/urlfilter_idx=[0-9]*,//g' infile
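
To see what the g flag changes, here is a minimal sketch on a made-up line. The sample line and the variable names are assumptions for illustration only; real log lines in the thread contain the field just once.

```shell
# Hypothetical sample line with the field appearing twice (not from the real log)
line='urlfilter_idx=185,a=1,urlfilter_idx=7,b=2'

# Without g: only the first match on the line is removed
first_only=$(printf '%s\n' "$line" | sed 's/urlfilter_idx=[0-9]*,//')

# With g: every match on the line is removed
all_matches=$(printf '%s\n' "$line" | sed 's/urlfilter_idx=[0-9]*,//g')

printf '%s\n' "$first_only" "$all_matches"
```

The first result still contains urlfilter_idx=7, the second does not.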

Hope this helps.

syg00 11-26-2010 04:42 AM

Quote:

Originally Posted by elliot01 (Post 4171592)
I have tried:
sed -i 's/,urlfilter_idx=*,//' temp2.txt (does nothing)
sed -i 's/,urlfilter_idx=.*,//' temp2.txt (deletes everything up to the last comma)

Do you understand why these produced the results they did?
Think about the issue: you need to match all non-comma characters up to the next comma. If you're lucky and they are always digits, that makes it much easier. Regex is (generally) greedy and will look for the longest possible match, so be as specific as possible.
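
A short sketch of why the two attempts behave that way, run on a shortened, made-up line rather than the full log record (the line and variable names are assumptions):

```shell
# Shortened, hypothetical sample line
line='a=1,urlfilter_idx=185,b=2,c=3'

# '=*' means "zero or more '=' characters", so a comma would have to follow
# the '=' directly. It never does, so nothing matches and the line is unchanged.
attempt1=$(printf '%s\n' "$line" | sed 's/,urlfilter_idx=*,//')

# '.*' is greedy: it matches as much as it can, up to the LAST comma on the line
attempt2=$(printf '%s\n' "$line" | sed 's/,urlfilter_idx=.*,//')

printf '%s\n' "$attempt1" "$attempt2"
```

The first command prints the line untouched; the second leaves only a=1c=3, because everything from ,urlfilter_idx= through the final comma is removed.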

Damn I type slowwww .... ;)

elliot01 11-26-2010 05:04 AM

Hi guys,

sed 's/urlfilter_idx=[0-9]*,//' infile

works a treat for that particular field, but there are other fields I would also want to delete, for instance 'urlfilter_list="Main List"'.

This is why it would have been ideal to have sed match 'myfieldname*,' (i.e. up to the next comma, not the last!). For the other example, based on your suggestions, I seem to be getting the following to work:

sed -i 's/urlfilter_list=[a-zA-Z" ]*,//' temp2.txt

Does this seem ok?

druuna 11-26-2010 05:15 AM

Hi,

The principle I mentioned in post #2 stays the same: sed is greedy, and you need to create a regular expression that catches only what is needed.

All fields in the posted line are separated by a comma (and I assume that none of the actual values contain a comma). Thus the comma can be used to stop sed's greedy match at the end of the field.

BTW: Although [a-Z] is recognized by some commands, you should not use it (sed complains about an invalid range). Use [a-zA-Z] or [[:alpha:]]

sed -i 's/urlfilter_list=[[:alpha:]" ]*,//' temp2.txt

BTW2: You can string sed commands together: sed -e 's/urlfilter_idx=[0-9]*,//' -e 's/urlfilter_list=[[:alpha:]" ]*,//' infile
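
As a rough illustration of chaining with -e, here is a sketch on a shortened line. The sample line and variable names are assumptions; the bracket expression includes the double quote and the space because the value in the posted log line is quoted and contains a space.

```shell
# Shortened, hypothetical excerpt of the log line from the first post
line='pri=warning,urlfilter_idx=185,urlfilter_list="Main List",vd="root"'

# Each -e expression is applied in order to every input line
cleaned=$(printf '%s\n' "$line" | sed \
    -e 's/urlfilter_idx=[0-9]*,//' \
    -e 's/urlfilter_list=[[:alpha:]" ]*,//')

printf '%s\n' "$cleaned"
```

Both fields are gone and the neighbouring fields are left intact.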

Hope this helps.

elliot01 11-26-2010 05:18 AM

I revised it to equal:
sed -i 's/urlfilter_list=[a-Z0-9" ]*,//' temp2.txt

Thanks for your help guys :)

EDIT: Just read your update, I will use :alpha: etc :)

syg00 11-26-2010 05:23 AM

That will work for all the characters you can think of (and include). It is better to match everything except a comma. Try this:
Code:

sed -i 's/urlfilter_list=[^,]*,//' temp2.txt
(untested)
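
Since [^,]* does not care what the value looks like, the same pattern can be reused for any field. Below is a minimal sketch, assuming field names are plain words with no regex special characters and that no value contains a comma. drop_field is a hypothetical helper name, not something from this thread.

```shell
# drop_field NAME: hypothetical helper that removes "NAME=value," from stdin.
# The name is interpolated into the regex unescaped and is not anchored, so it
# would also match inside a longer field name that ends in NAME.
drop_field() {
    sed "s/$1=[^,]*,//"
}

# Shortened, hypothetical excerpt of the log line from the first post
line='pri=warning,urlfilter_idx=185,urlfilter_list="Main List",vd="root"'

cleaned=$(printf '%s\n' "$line" | drop_field urlfilter_idx | drop_field urlfilter_list)

printf '%s\n' "$cleaned"
```

Note that the pattern needs a trailing comma, so it would not remove a field that is the last one on the line.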


All times are GMT -5.