Linux - Newbie (LinuxQuestions.org): This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's, this is the place!
11-26-2010, 04:22 AM (#1)
elliot01, Member. Registered: Jun 2009. Location: UK. Distribution: CentOS / RedHat. Posts: 55
Using sed with wildcards
Hi all,
Can someone please help me with sed and deleting text using wildcards?
I have, for instance, the following text:
Code:
Nov 24 14:27:40 FGT50B123456789 date=2010-11-24,time=14:27:45,devname=FGT50B123456789,device_id=FGT50B123456789,log_id=0315012544,type=webfilter,subtype=urlfilter,pri=warning,urlfilter_idx=185,urlfilter_list="Main List",vd="root",policyid=1,identidx=0,serial=14512345,user="N/A",group="N/A",src=10.11.22.144,sport=53706,src_port=53706,src_int="internal",dst=64.154.84.32,dport=80,dst_port=80,dst_int="wan1",service="http",hostname="peach.bskyb.com",profiletype="Webfilter_Profile",profilegroup="N/A",profile="Main",status="blocked",req_type="referral",url="/HG?hc=&hb=DM57021769NA84EN3%3BDM53111942ZM&cd=1&hv=6&n=/SportingLife%20-%20Racing%20-%20Results%20-%20Fast%20-%",msg="URL was blocked because it is in the URL filter list"
This is one of thousands of lines in a text file.
I am trying to use sed to delete the field and value 'urlfilter_idx=185' (the value may be different on each line).
I have tried:
sed -i 's/,urlfilter_idx=*,//' temp2.txt (does nothing)
sed -i 's/,urlfilter_idx=.*,//' temp2.txt (deletes everything up to the last comma)
How can I get sed to delete 'urlfilter_idx=' and whatever value follows it, up to the next comma?
Thanks in advance
Last edited by elliot01; 11-26-2010 at 04:33 AM.
11-26-2010, 04:39 AM (#2)
LQ Veteran. Registered: Sep 2003. Location: the Netherlands. Distribution: lfs, debian, rhel. Posts: 8,705
Hi,
Regular expressions in sed are greedy (.* matches the longest string it can); that is why your second example removes everything up to the last comma. A stricter regular expression is needed.
It looks like there is always a number after urlfilter_idx=; if that is the case, try this:
sed 's/urlfilter_idx=[0-9]*,//' infile
If there are multiple instances of urlfilter_idx= on one line, you need to add sed's g (global) flag:
sed 's/urlfilter_idx=[0-9]*,//g' infile
Hope this helps.
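A minimal way to see what the g flag changes, assuming a POSIX shell and sed. The input is a shortened, made-up line with urlfilter_idx in it twice; the real log line above contains it only once.

```shell
#!/bin/sh
# Made-up, shortened line with two urlfilter_idx fields (demo input only)
line='pri=warning,urlfilter_idx=185,vd="root",urlfilter_idx=7,policyid=1'

# Without g, s/// replaces only the first match on each line
first_only=$(printf '%s\n' "$line" | sed 's/urlfilter_idx=[0-9]*,//')

# With g, every non-overlapping match on the line is replaced
all=$(printf '%s\n' "$line" | sed 's/urlfilter_idx=[0-9]*,//g')

printf '%s\n' "$first_only" "$all"
```

The first result still contains urlfilter_idx=7; the second has both fields removed.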
11-26-2010, 04:42 AM (#3)
syg00, LQ Veteran. Registered: Aug 2003. Location: Australia. Distribution: Lots ... Posts: 11,228
Quote:
Originally Posted by elliot01
I have tried:
sed -i 's/,urlfilter_idx=*,//' temp2.txt (does nothing)
sed -i 's/,urlfilter_idx=.*,//' temp2.txt (deletes everything up to the last comma)
Do you understand why these produced the results they did?
Think about the issue: you need to match all non-comma characters up to the comma. If you're lucky and they are always digits, that makes it much easier. Regex matching is (generally) greedy and will look for the longest fit, so be as specific as possible.
Damn I type slowwww .... 
Last edited by syg00; 11-26-2010 at 04:43 AM.
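To make the question above concrete, here is a sketch that runs both original attempts (without -i) on a shortened copy of the sample line, assuming a POSIX shell and sed:

```shell
#!/bin/sh
# Shortened copy of the sample log line
line='pri=warning,urlfilter_idx=185,urlfilter_list="Main List",vd="root"'

# Attempt 1: '=*' means "zero or more '=' characters", so the pattern only
# matches ',urlfilter_idx' followed by any number of '=' and then ','.
# Here '=' is followed by '185', so nothing matches and the line is unchanged.
attempt1=$(printf '%s\n' "$line" | sed 's/,urlfilter_idx=*,//')

# Attempt 2: '.*' is greedy, so it runs to the LAST comma on the line.
attempt2=$(printf '%s\n' "$line" | sed 's/,urlfilter_idx=.*,//')

printf '%s\n' "$attempt1" "$attempt2"
```

The second result is pri=warningvd="root": everything between the first and last comma is gone, and because both commas were in the pattern, the remaining fields are glued together. That is why the other replies in this thread keep only the trailing comma in the pattern.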
11-26-2010, 05:04 AM (#4)
elliot01, Member (Original Poster). Registered: Jun 2009. Location: UK. Distribution: CentOS / RedHat. Posts: 55
Hi guys,
sed 's/urlfilter_idx=[0-9]*,//' infile
works a treat for that particular field, but there are other fields I also want to delete, for instance 'urlfilter_list="Main List"'.
This is why it would be ideal to have sed match 'myfieldname*,' (i.e. up to the next comma, not the last!). For the other example, based on your suggestions, I seem to have the following working:
sed -i 's/urlfilter_list=[a-zA-Z" ]*,//' temp2.txt
Does this seem ok?
Last edited by elliot01; 11-26-2010 at 05:05 AM.
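A sketch for checking the bracket-expression approach, assuming a POSIX shell and sed. It runs the command on the sample value and on a hypothetical value containing a digit and a hyphen ("List-2" is invented for the check, not taken from the log):

```shell
#!/bin/sh
# Use the C locale so that [a-zA-Z] means ASCII letters only
LC_ALL=C; export LC_ALL

# Sample value from the log line
sample='pri=warning,urlfilter_list="Main List",vd="root"'
# Hypothetical value with a digit and a hyphen (invented for this check)
other='pri=warning,urlfilter_list="List-2",vd="root"'

# The bracket expression only allows letters, double quotes and spaces
ok=$(printf '%s\n' "$sample" | sed 's/urlfilter_list=[a-zA-Z" ]*,//')
missed=$(printf '%s\n' "$other" | sed 's/urlfilter_list=[a-zA-Z" ]*,//')

printf '%s\n' "$ok" "$missed"
```

The first line loses the field as intended; the second comes back unchanged because '-' and '2' are not in the bracket expression, which is the limitation post #7 addresses.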
11-26-2010, 05:15 AM (#5)
LQ Veteran. Registered: Sep 2003. Location: the Netherlands. Distribution: lfs, debian, rhel. Posts: 8,705
Hi,
The principle I mentioned in post #2 stays the same: sed's regular expressions are greedy, so you need to write an expression that catches only what is needed.
All fields in the posted line are separated by a comma (and I assume that none of the actual values contain a comma). Thus the comma can be used to stop the greedy match at the end of the field.
BTW: Although [a-Z] is accepted by some commands, you should not use it (sed complains about an invalid range). Use [a-zA-Z] or [[:alpha:]] instead. Because this value also contains double quotes and a space, those need to be in the bracket expression too:
sed -i 's/urlfilter_list=[[:alpha:]" ]*,//' temp2.txt
BTW2: You can chain sed commands with multiple -e options:
sed -e 's/urlfilter_idx=[0-9]*,//' -e 's/urlfilter_list=[[:alpha:]" ]*,//' infile
Hope this helps.
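A sketch checking both points, assuming a POSIX shell and sed. It shows that [[:alpha:]] on its own does not get past the opening double quote of "Main List", and that two -e expressions (with '"' and ' ' added to the second bracket expression) remove both fields in one pass.

```shell
#!/bin/sh
line='pri=warning,urlfilter_idx=185,urlfilter_list="Main List",vd="root"'

# [[:alpha:]]* cannot match the opening '"' of "Main List", and the pattern
# requires a ',' right after the run of letters, so nothing is removed.
alpha_only=$(printf '%s\n' "$line" | sed 's/urlfilter_list=[[:alpha:]]*,//')

# Two -e expressions are applied in order to each input line;
# the second bracket expression also allows '"' and ' '.
chained=$(printf '%s\n' "$line" \
  | sed -e 's/urlfilter_idx=[0-9]*,//' -e 's/urlfilter_list=[[:alpha:]" ]*,//')

printf '%s\n' "$alpha_only" "$chained"
```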
11-26-2010, 05:18 AM (#6)
elliot01, Member (Original Poster). Registered: Jun 2009. Location: UK. Distribution: CentOS / RedHat. Posts: 55
I revised it to:
sed -i 's/urlfilter_list=[a-Z0-9" ]*,//' temp2.txt
Thanks for your help, guys.
EDIT: Just read your update; I will use [[:alpha:]] etc.
Last edited by elliot01; 11-26-2010 at 05:19 AM.
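For the [a-Z] question, a small check, assuming GNU sed or another sed that rejects reversed ranges in the C locale (where 'a' sorts after 'Z'). It also uses [[:alnum:]] as a portable stand-in for the letters-and-digits set the revised command was aiming at.

```shell
#!/bin/sh
# In the C locale 'a' (0x61) comes after 'Z' (0x5A), so [a-Z] is a reversed
# range; GNU sed reports "Invalid range end" and exits with a non-zero status.
if printf 'x\n' | LC_ALL=C sed 's/[a-Z]//' >/dev/null 2>&1; then
  range_ok=yes
else
  range_ok=no
fi

# [[:alnum:]] covers letters and digits without relying on range order
portable=$(printf '%s\n' 'pri=warning,urlfilter_list="Main List",vd="root"' \
  | sed 's/urlfilter_list=[[:alnum:]" ]*,//')

printf 'a-Z accepted: %s\n%s\n' "$range_ok" "$portable"
```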
11-26-2010, 05:23 AM (#7)
LQ Veteran. Registered: Aug 2003. Location: Australia. Distribution: Lots ... Posts: 11,228
That will only work for the characters you thought of (and included). It is better to match everything except a comma. Try this:
Code:
sed -i 's/urlfilter_list=[^,]*,//' temp2.txt
(untested)
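Building on the [^,]* idea, here is a sketch of a small shell function that removes several fields in one sed pass. delete_fields is a hypothetical helper, not something from the thread. It anchors on the comma before the field name so that, for example, port= cannot match inside sport= or src_port=, and, like post #5, it assumes no value contains a comma. Because of that anchoring it will not remove a field that is first or last on the line. It reads standard input and writes standard output, so you can check the result before switching to sed -i.

```shell
#!/bin/sh
# delete_fields NAME... : hypothetical helper (not from the thread).
# Replaces ",NAME=value," (value = any run of non-comma characters) with ","
# so the neighbouring fields stay separated.
# Assumes field names contain no regex metacharacters or '/' and values contain no commas.
delete_fields() {
  script=""
  for name in "$@"; do
    # Append one substitution per field, joined with ';'
    script="${script:+$script;}s/,${name}=[^,]*,/,/"
  done
  sed "$script"
}

line='pri=warning,urlfilter_idx=185,urlfilter_list="Main List",vd="root",policyid=1'
printf '%s\n' "$line" | delete_fields urlfilter_idx urlfilter_list
```

On the whole file this would be something like delete_fields urlfilter_idx urlfilter_list < temp2.txt > temp3.txt (temp3.txt is just an example name).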
All times are GMT -5.