LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

11-26-2010, 04:22 AM   #1
elliot01
Member
 
Registered: Jun 2009
Location: UK
Distribution: CentOS / RedHat
Posts: 89

Rep: Reputation: 16
Using sed with wildcards


Hi all,

Can someone please help me with sed and deleting text using wildcards?

I have, for instance, the following text:
Code:
Nov 24 14:27:40 FGT50B123456789 date=2010-11-24,time=14:27:45,devname=FGT50B123456789,device_id=FGT50B123456789,log_id=0315012544,type=webfilter,subtype=urlfilter,pri=warning,urlfilter_idx=185,urlfilter_list="Main List",vd="root",policyid=1,identidx=0,serial=14512345,user="N/A",group="N/A",src=10.11.22.144,sport=53706,src_port=53706,src_int="internal",dst=64.154.84.32,dport=80,dst_port=80,dst_int="wan1",service="http",hostname="peach.bskyb.com",profiletype="Webfilter_Profile",profilegroup="N/A",profile="Main",status="blocked",req_type="referral",url="/HG?hc=&hb=DM57021769NA84EN3%3BDM53111942ZM&cd=1&hv=6&n=/SportingLife%20-%20Racing%20-%20Results%20-%20Fast%20-%",msg="URL was blocked because it is in the URL filter list"
This is one of thousands of lines in a text file.

I am trying to use sed to delete the field and value 'urlfilter_idx=185' (the value may be different on each line).

I have tried:
sed -i 's/,urlfilter_idx=*,//' temp2.txt (does nothing)
sed -i 's/,urlfilter_idx=.*,//' temp2.txt (deletes everything up to the last comma)

How can I get sed to delete 'urlfilter_idx=' and any following value up to the next comma?

Thanks in advance

Last edited by elliot01; 11-26-2010 at 04:33 AM.
 
11-26-2010, 04:39 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

Sed is greedy by nature, which is why your second example removes everything up to the last comma. A stricter regular expression is needed.

It looks like there is always a number after urlfilter_idx=. If that is the case, try this:

sed 's/urlfilter_idx=[0-9]*,//' infile

If there are multiple instances of urlfilter_idx= on one line, you need to add sed's g flag:

sed 's/urlfilter_idx=[0-9]*,//g' infile
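
A quick way to see the difference the g flag makes is to run both commands on a short line. This is only a sketch: the sample line below is made up (shortened, with a second urlfilter_idx field added purely for illustration), it is not taken from the real log.

```shell
# Made-up sample line; the second urlfilter_idx field exists only to show the g flag.
line='type=webfilter,urlfilter_idx=185,vd="root",urlfilter_idx=7,policyid=1'

# Without g: only the first match on the line is removed.
first_only=$(printf '%s\n' "$line" | sed 's/urlfilter_idx=[0-9]*,//')

# With g: every match on the line is removed.
all=$(printf '%s\n' "$line" | sed 's/urlfilter_idx=[0-9]*,//g')

printf '%s\n%s\n' "$first_only" "$all"
```

The first result still contains urlfilter_idx=7, the second does not.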

Hope this helps.
 
11-26-2010, 04:42 AM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,140

Rep: Reputation: 4123
Quote:
Originally Posted by elliot01
I have tried:
sed -i 's/,urlfilter_idx=*,//' temp2.txt (does nothing)
sed -i 's/,urlfilter_idx=.*,//' temp2.txt (deletes everything up to the last comma)
Do you understand why these produce the results they do?
Think about the issue: you need to match all non-comma characters up to the next comma. If you're lucky and they are always digits, that makes it much easier. Regex is (generally) greedy and will look for the longest possible match, so be as specific as possible.
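
For anyone following along, here is a small sketch of what the two quoted attempts actually do. The sample line is a shortened, made-up stand-in for the real log line, and the variable names are just for illustration.

```shell
# Shortened, made-up stand-in for the real log line.
line='pri=warning,urlfilter_idx=185,vd="root",policyid=1'

# '=*' means "zero or more = characters", so the pattern needs a comma
# directly after the = sign. "=185," never matches and the line is unchanged.
attempt1=$(printf '%s\n' "$line" | sed 's/,urlfilter_idx=*,//')

# '.*' is greedy: it runs to the LAST comma on the line, taking vd="root" with it.
attempt2=$(printf '%s\n' "$line" | sed 's/,urlfilter_idx=.*,//')

printf '%s\n%s\n' "$attempt1" "$attempt2"
```

The first attempt prints the line untouched. The second glues pri=warning straight onto policyid=1, because everything between the first matching comma and the last comma is gone.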

Damn I type slowwww ....

Last edited by syg00; 11-26-2010 at 04:43 AM.
 
1 members found this post helpful.
11-26-2010, 05:04 AM   #4
elliot01
Member
 
Registered: Jun 2009
Location: UK
Distribution: CentOS / RedHat
Posts: 89

Original Poster
Rep: Reputation: 16
Hi guys,

sed 's/urlfilter_idx=[0-9]*,//' infile

works a treat for that particular field, but there are other fields I would also want to delete, for instance 'urlfilter_list="Main List"'.

This is why it would have been ideal to have sed match 'myfieldname*,' (i.e. up to the next comma, not the last!). For the other example, based on your suggestions, I seem to be getting the following to work:

sed -i 's/urlfilter_list=[a-zA-Z" ]*,//' temp2.txt

Does this seem ok?

Last edited by elliot01; 11-26-2010 at 05:05 AM.
 
11-26-2010, 05:15 AM   #5
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

The principle I mentioned in post #2 stays the same: sed is greedy and you need to create a regular expression that matches only what is needed.

All fields in the posted line are separated by a comma (and I assume that none of the actual values contain a comma). Thus the comma can be used to stop sed's match from running on greedily.

BTW: Although [a-Z] is recognized by some commands, you should not use it (sed complains about an invalid range). Use [a-zA-Z] or [[:alpha:]], and keep the double quote and the space in the bracket expression, since the value "Main List" contains both:

sed -i 's/urlfilter_list=[[:alpha:]" ]*,//' temp2.txt

BTW2: You can string sed commands together: sed -e 's/urlfilter_idx=[0-9]*,//' -e 's/urlfilter_list=[[:alpha:]" ]*,//' infile
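
As a rough illustration of chaining, here are both substitutions run on a shortened, made-up line. Note that the bracket expression for urlfilter_list here lists the double quote and the space as well as [:alpha:], because the value "Main List" contains both; with letters alone the pattern would not match.

```shell
# Shortened, made-up stand-in for the real log line.
line='pri=warning,urlfilter_idx=185,urlfilter_list="Main List",vd="root"'

# The two -e expressions are applied in order to each input line.
cleaned=$(printf '%s\n' "$line" |
  sed -e 's/urlfilter_idx=[0-9]*,//' -e 's/urlfilter_list=[[:alpha:]" ]*,//')

printf '%s\n' "$cleaned"
```

Once the output looks right on a test line like this, the same -e expressions can be used with -i on the real file.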

Hope this helps.
 
1 members found this post helpful.
11-26-2010, 05:18 AM   #6
elliot01
Member
 
Registered: Jun 2009
Location: UK
Distribution: CentOS / RedHat
Posts: 89

Original Poster
Rep: Reputation: 16
I revised it to:
sed -i 's/urlfilter_list=[a-Z0-9" ]*,//' temp2.txt

Thanks for your help, guys.

EDIT: Just read your update, I will use [:alpha:] etc.

Last edited by elliot01; 11-26-2010 at 05:19 AM.
 
11-26-2010, 05:23 AM   #7
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,140

Rep: Reputation: 4123
That will work only for the characters you can think of (and include). It is better to match everything except a comma. Try this:
Code:
sed -i 's/urlfilter_list=[^,]*,//' temp2.txt
(untested)
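
Since the same pattern works for any field, it could be wrapped in a small shell function. This is only a sketch: drop_field is a hypothetical helper name (not something sed provides), the sample line is shortened and made up, and it assumes the field name contains no regex metacharacters or slashes.

```shell
# drop_field NAME: hypothetical helper that deletes "NAME=value," from each
# line on stdin, where value is everything up to the next comma.
# Assumes NAME contains no regex metacharacters or slashes.
drop_field() {
  sed "s/$1=[^,]*,//"
}

# Shortened, made-up stand-in for the real log line.
line='pri=warning,urlfilter_idx=185,urlfilter_list="Main List",vd="root"'

cleaned=$(printf '%s\n' "$line" | drop_field urlfilter_idx | drop_field urlfilter_list)
printf '%s\n' "$cleaned"
```

A few caveats with this approach: it needs a trailing comma, so it will not remove the last field on a line; a quoted value that itself contains a comma will be cut short; and a short name can match inside a longer one (for example, port= would also match inside sport=).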
 
  


All times are GMT -5. The time now is 01:06 AM.