Using sed with wildcards
Hi all,
Can someone please help me with sed and deleting text using wildcards? I have, for instance, the following text:
Code:
Nov 24 14:27:40 FGT50B123456789 date=2010-11-24,time=14:27:45,devname=FGT50B123456789,device_id=FGT50B123456789,log_id=0315012544,type=webfilter,subtype=urlfilter,pri=warning,urlfilter_idx=185,urlfilter_list="Main List",vd="root",policyid=1,identidx=0,serial=14512345,user="N/A",group="N/A",src=10.11.22.144,sport=53706,src_port=53706,src_int="internal",dst=64.154.84.32,dport=80,dst_port=80,dst_int="wan1",service="http",hostname="peach.bskyb.com",profiletype="Webfilter_Profile",profilegroup="N/A",profile="Main",status="blocked",req_type="referral",url="/HG?hc=&hb=DM57021769NA84EN3%3BDM53111942ZM&cd=1&hv=6&n=/SportingLife%20-%20Racing%20-%20Results%20-%20Fast%20-%",msg="URL was blocked because it is in the URL filter list"
I am trying to use sed to delete the field and value 'urlfilter_idx=185' (the value may be different on each line). I have tried:
sed -i 's/,urlfilter_idx=*,//' temp2.txt (does nothing)
sed -i 's/,urlfilter_idx=.*,//' temp2.txt (deletes everything up to the last comma)
How can I get sed to delete 'urlfilter_idx=' and any following value up to the next comma?
Thanks in advance
Hi,
Sed's regular expressions are greedy by nature, which is why your second example removes everything up to the last comma. A stricter regular expression is needed. It looks like the value after urlfilter_idx= is always a number; if that is the case, try this:
sed 's/urlfilter_idx=[0-9]*,//' infile
If there are multiple instances of urlfilter_idx= on one line, you need to add sed's g flag:
sed 's/urlfilter_idx=[0-9]*,//g' infile
Hope this helps.
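To see the difference between the three patterns side by side, here is a small sketch. The sample line is a shortened, made-up stand-in for the log entry in the first post, and the variable names are only for illustration.

```shell
# Shortened, made-up sample line (not the full log entry from the first post).
line='type=webfilter,urlfilter_idx=185,urlfilter_list="Main List",vd="root"'

# '=*' means "zero or more '=' characters", so this pattern needs a comma
# right after the '=' signs. There is none, so nothing is replaced.
attempt1=$(printf '%s\n' "$line" | sed 's/,urlfilter_idx=*,//')

# '.*' matches as much as possible, so the match runs to the LAST comma.
attempt2=$(printf '%s\n' "$line" | sed 's/,urlfilter_idx=.*,//')

# '[0-9]*' can only match digits, so the match ends at the comma
# that directly follows the number.
fixed=$(printf '%s\n' "$line" | sed 's/urlfilter_idx=[0-9]*,//')

printf '%s\n' "$attempt1" "$attempt2" "$fixed"
```

The first result is the unchanged line, the second has lost every field between type= and vd=, and only the third removes just the urlfilter_idx field.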
Think about the issue: you need to match all non-comma characters up to the comma. If you're lucky and they are always digits, that makes it much easier. Regexes are (generally) greedy and will look for the longest match, so be as specific as possible. Damn, I type slowwww .... ;)
Hi guys,
sed 's/urlfilter_idx=[0-9]*,//' infile works a treat for that particular field, but there are other fields I would also want to delete, for instance 'urlfilter_list="Main List"'. This is why it would have been ideal to have sed match 'myfieldname*,' (i.e. up to the next comma, not the last!). For the other example, based on your suggestions, I seem to be getting the following to work:
sed -i 's/urlfilter_list=[a-zA-Z" ]*,//' temp2.txt
Does this seem ok?
Hi,
The principle I mentioned in post #2 stays the same: sed is greedy, and you need to create a regular expression that matches only what is needed. All fields in the posted line are separated by a comma (and I assume that none of the actual values contain a comma). Thus the comma can be used to stop the match early. BTW: Although [a-Z] is recognized by some commands, you should not use it (sed complains about an invalid range). Use [a-zA-Z] or [[:alpha:]]; the double quotes and the space in the value still have to be listed inside the brackets:
sed -i 's/urlfilter_list=[[:alpha:]" ]*,//' temp2.txt
BTW2: You can string sed commands together:
sed -e 's/urlfilter_idx=[0-9]*,//' -e 's/urlfilter_list=[[:alpha:]" ]*,//' infile
Hope this helps.
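As a quick check of the chained form, here is a sketch on a shortened, made-up sample line (the variable names are only for illustration). Note that a value such as "Main List" contains double quotes and a space, so a bracket expression with only [[:alpha:]] in it does not match that value.

```shell
# Shortened, made-up sample line (not the full log entry from the first post).
line='pri=warning,urlfilter_idx=185,urlfilter_list="Main List",vd="root"'

# [[:alpha:]] on its own cannot match the opening double quote,
# so this substitution leaves the line unchanged.
alpha_only=$(printf '%s\n' "$line" | sed 's/urlfilter_list=[[:alpha:]]*,//')

# Two -e expressions run one after the other on every line.
# The second bracket expression also lists the double quote and the space.
cleaned=$(printf '%s\n' "$line" |
    sed -e 's/urlfilter_idx=[0-9]*,//' \
        -e 's/urlfilter_list=[[:alpha:]" ]*,//')

printf '%s\n' "$alpha_only" "$cleaned"
```

The second line printed is pri=warning,vd="root", with both fields removed.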
I revised it to:
sed -i 's/urlfilter_list=[a-Z0-9" ]*,//' temp2.txt
Thanks for your help guys :) EDIT: Just read your update, I will use :alpha: etc :)
That will work only for the characters you can think of (and include in the brackets). It is better to match everything except a comma. Try this:
Code:
sed -i 's/urlfilter_list=[^,]*,//' temp2.txt
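Since [^,]* does not care what the value looks like, the same pattern can be reused for any field. Below is a rough sketch of that idea; strip_field is a made-up helper name, and it assumes that values never contain a comma and that the field name has no regex-special characters in it.

```shell
# strip_field NAME: hypothetical helper, not a standard command.
# Reads lines on stdin and removes the first "NAME=value," from each one.
# Assumes values contain no comma and NAME has no regex-special characters.
strip_field() {
    sed "s/$1=[^,]*,//"
}

# Shortened, made-up sample line (not the full log entry from the first post).
line='pri=warning,urlfilter_idx=185,urlfilter_list="Main List",vd="root"'

result=$(printf '%s\n' "$line" | strip_field urlfilter_idx | strip_field urlfilter_list)
printf '%s\n' "$result"
```

Two caveats with this sketch: the last field on a line has no trailing comma, so it would not be matched, and a short name can also match the tail of a longer one (id= would also hit device_id=), so use the full field name.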