LinuxQuestions.org
Old 03-04-2013, 05:01 PM   #1
CaptainDerp
LQ Newbie
 
Registered: Mar 2013
Posts: 12

Rep: Reputation: Disabled
Remove All Objects after the first whitespace on each line using Sed? Help Please.


I have a list of URLs with comments on each line. I need to clean all that junk out of the txt file so only the domains remain. I figured the easiest way to do this is with sed/awk/grep, but I really stink at using these tools right now.

For example:

somesite.com blah blah/blah some blah/html
anothersiter.ru dfsdf/rewer /r/wer er wr/
website.bg crap more crap /morecrap.html
anotherfrickingsite.fr randomstuff

Required Output:

somesite.com
anothersiter.ru
website.bg
anotherfrickingsite.fr


So does anyone know how to remove all objects after the first whitespace on each line using sed?

I've googled hard and only found how to remove leading and trailing whitespace, which is not helping.


Also, how the heck do you remove a forward or backward slash with sed??

sed 's/somecrap//g' crap.txt > crap.out works for objects, but not with a forward slash.

Last edited by CaptainDerp; 03-04-2013 at 05:08 PM. Reason: title was misleading
 
Old 03-04-2013, 05:36 PM   #2
Kustom42
Senior Member
 
Registered: Mar 2012
Distribution: Red Hat
Posts: 1,568

Rep: Reputation: 411
I'm about to blow your mind....

Code:
cut -d " " -f1
Or, if you want to use awk:

Code:
awk '{print $1}'

Either of those should work just fine.
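For anyone who wants to check this, here is a rough sketch that runs both commands on the sample data from post #1. The tab-separated line is a made-up extra (it is not in the original data), added only to show where the two differ: cut -d ' ' splits on a single space character, while awk splits on any run of spaces or tabs by default.

```shell
# Sample data from post #1.
sample=$(printf '%s\n' \
  'somesite.com blah blah/blah some blah/html' \
  'anothersiter.ru dfsdf/rewer /r/wer er wr/' \
  'website.bg crap more crap /morecrap.html' \
  'anotherfrickingsite.fr randomstuff')

# Hypothetical extra line where the comment follows a tab instead of a space.
tabbed=$(printf 'tabbed.example\tcomment after a tab')

# cut: keep field 1, using exactly one space as the delimiter.
cut_out=$(printf '%s\n' "$sample" | cut -d ' ' -f1)

# awk: keep field 1, splitting on any run of spaces or tabs.
awk_out=$(printf '%s\n' "$sample" | awk '{print $1}')

# On the tab-separated line the two tools disagree.
cut_tab=$(printf '%s\n' "$tabbed" | cut -d ' ' -f1)
awk_tab=$(printf '%s\n' "$tabbed" | awk '{print $1}')

printf '%s\n' "$cut_out"
```

On the original sample both give the same four domains. If the real file might contain tabs or leading spaces, awk is probably the safer of the two.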
 
Old 03-04-2013, 05:37 PM   #3
rigor
Member
 
Registered: Sep 2011
Posts: 135

Rep: Reputation: Disabled
When I put your sample data in a file named data.txt, and run this command:

Code:
sed  's/\(^[^ ]\+\)\(.\)\+/\1/'  <  data.txt
I get your sample output. The following is a way to replace all forward slashes in a line, with the word "barf":

Code:
sed  's/\//barf/g' <  data.txt
It's aaaaaaall about escaping. If you use slashes to delimit the parts of a command, then you have to escape any slash inside the pattern, as in the second command. You can also use characters other than slashes as delimiters, and then there's no need to escape the slash for pattern matching:

sed 's{/{barf{g' < data.txt

BTW, since this is a non-Linux forum, I don't know what environment you are using. But if you are using an environment that has the info command, which is available, for example, for Cygwin running under MS-Windows, then from a Cygwin shell environment, running the command:

info sed

will allow you to go through the info document about sed, which is much more detailed than the typical manual page for sed. In particular, for what you are doing, you might focus on the section for the "s" command and the section of examples, possibly using the section on regular expressions for anything that isn't sufficiently explained in the other two sections.

Yes, cut will very simply do the exact task you've mentioned: cutting out everything after the first whitespace. If you find you need something more involved, you might want to use sed or awk.

If you encounter a more complicated data line, where it's not as simple to isolate the URL, with sed or awk you can use more complicated patterns that should recognize a URL almost no matter where in a line it is.
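As a rough illustration of that idea, the sketch below uses awk to print the first field on each line that looks like a domain, wherever it sits in the line. The input lines and the "dot-separated labels" pattern are assumptions made for this example, not a real URL validator.

```shell
# Hypothetical input where the domain is not always the first field.
messy=$(printf '%s\n' \
  'see somesite.com for details' \
  'website.bg crap more crap /morecrap.html' \
  'no domain on this line')

# Print the first field made of dot-separated labels; lines without one print nothing.
found=$(printf '%s\n' "$messy" | awk '{
  for (i = 1; i <= NF; i++)
    if ($i ~ /^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$/) { print $i; break }
}')

printf '%s\n' "$found"
```

Note that a plain word such as file.txt would also match this pattern, so it would need tightening for messier data.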

Hope this helps.

Last edited by rigor; 03-04-2013 at 05:54 PM.
 
Old 03-04-2013, 05:38 PM   #4
Kustom42
Senior Member
 
Registered: Mar 2012
Distribution: Red Hat
Posts: 1,568

Rep: Reputation: 411
Quote:
Originally Posted by CaptainDerp

Also, how the heck do you remove a forward or backward slash with sed??

sed 's/somecrap//g' crap.txt > crap.out works for objects, but not with a forward slash.


You need to escape the forward slash so it is interpreted as a literal slash rather than as the delimiter.

Just put a \ in front of it so it looks like \/ as opposed to just /. Inside a sed expression, the "\" tells sed to take the next character as is and ignore its special meaning as the delimiter.
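Since the question also mentioned backslashes, here is a small sketch covering both kinds of slash. The test line is made up for the example. The single quotes keep the shell from touching the backslashes, so sed and tr see them exactly as typed.

```shell
# Hypothetical line containing both kinds of slash.
line='some/path\with\both/kinds'

# Forward slashes only: escape the delimiter inside the pattern.
no_fwd=$(printf '%s\n' "$line" | sed 's/\///g')

# Backslashes only: a literal backslash in a sed pattern is written as \\.
no_back=$(printf '%s\n' "$line" | sed 's/\\//g')

# Both at once: tr -d deletes every character in the set (here / and \).
no_both=$(printf '%s\n' "$line" | tr -d '/\\')

printf '%s\n' "$no_fwd" "$no_back" "$no_both"
```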
 
Old 03-04-2013, 05:40 PM   #5
Kustom42
Senior Member
 
Registered: Mar 2012
Distribution: Red Hat
Posts: 1,568

Rep: Reputation: 411
Quote:
Originally Posted by rigor
When I put your sample data in a file named data.txt, and run this command:

Code:
sed  's/\(^[^ ]\+\)\(.\)\+/\1/'  <  data.txt
I get your sample output. The following is a way to replace all forward slashes in a line, with the word "barf":

Code:
sed  's/\//barf/g' <  data.txt
It's aaaaaaall about escaping. If you use slashes to delimit the parts of a command, then you have to escape any slash inside the pattern, as in the second command. You can also use characters other than slashes as delimiters, and then there's no need to escape the slash for pattern matching:

sed 's{/{barf{g' < data.txt

Hope this helps.
That is way overkill. Yes, you could use sed to do this, but it's always best to use the proper tool for the job. Awk and cut can do this with half the typing and half the possibility for user error.
 
Old 03-04-2013, 05:43 PM   #6
CaptainDerp
LQ Newbie
 
Registered: Mar 2013
Posts: 12

Original Poster
Rep: Reputation: Disabled
wow

you guys are frickin amazing, this site is amazing.

THANK YOU!
 
Old 03-04-2013, 10:16 PM   #7
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,269

Rep: Reputation: 2028
Quote:
Also, how the heck do you remove a forward or backward slash with sed??
Use another separator, e.g.
Code:
# instead of 
s/\/this/\/that/

# use ':'
s:/this:/that:
http://www.grymoire.com/Unix/Sed.html
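To see that the two forms are equivalent, here is a quick sketch run on a made-up input line (the line is an assumption for the example, not from the thread).

```shell
# Hypothetical input line containing paths.
line='/this/file and /this/dir'

# Slash as delimiter: every slash in the pattern and replacement is escaped.
escaped=$(printf '%s\n' "$line" | sed 's/\/this/\/that/g')

# Colon as delimiter: the slashes need no escaping.
colon=$(printf '%s\n' "$line" | sed 's:/this:/that:g')

printf '%s\n' "$escaped" "$colon"
```

Both commands print the same result. The catch is that whichever delimiter you pick has to be escaped if it shows up in the pattern or replacement, so choose one that does not appear in your data.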
 
Old 03-05-2013, 08:04 AM   #8
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,516

Rep: Reputation: 1896
If you have to use sed, I am not sure why it has to be so hard:
Code:
sed -n 's/ .*//p' file
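One thing to watch with the -n and p combination: it prints only the lines where the substitution actually happened, so a line with no comment after the domain would be dropped. The sketch below shows the difference, using one sample line from post #1 plus made-up lines (bare.example and tabbed.example are assumptions for the example). It also shows [[:space:]], which matches tabs as well as spaces.

```shell
# One sample line from post #1 plus a hypothetical line with no comment.
input=$(printf '%s\n' \
  'somesite.com blah blah/blah some blah/html' \
  'bare.example')

# With -n and the p flag, only lines where the substitution happened are printed.
with_p=$(printf '%s\n' "$input" | sed -n 's/ .*//p')

# Without them, every line is printed, changed or not.
without_p=$(printf '%s\n' "$input" | sed 's/ .*//')

# [[:space:]] matches tabs too, not just spaces.
tab_ok=$(printf 'tabbed.example\tcomment\n' | sed 's/[[:space:]].*//')

printf '%s\n' "$without_p" "$tab_ok"
```

For the data in post #1 every line has a comment, so both forms give the required output there.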
 
  




All times are GMT -5. The time now is 05:42 AM.
