LinuxQuestions.org

CaptainDerp 03-04-2013 05:01 PM

Remove All Objects after the first whitespace on each line using Sed? Help Please.
 
I have a list of URLs with comments on each line. I need to clean all that junk out of the .txt file so only the domains remain. I figured the easiest way to do this is with sed/awk/grep, but I really stink at using these tools right now.

For example:

somesite.com blah blah/blah some blah/html
anothersiter.ru dfsdf/rewer /r/wer er wr/
website.bg crap more crap /morecrap.html
anotherfrickingsite.fr randomstuff

Required Output:

somesite.com
anothersiter.ru
website.bg
anotherfrickingsite.fr


So does anyone know how to remove all objects after the first whitespace on each line using sed?

I've googled hard and only found how to remove leading and trailing whitespace, which isn't helping.


Also, how the heck do you remove a forward or backward slash with sed??

sed 's/somecrap//g' crap.txt > crap.out works for objects, but not with a forward slash.

Kustom42 03-04-2013 05:36 PM

I'm about to blow your mind....

Code:

cut -d " " -f1
Or, if you want to use awk:

Code:

awk '{print $1}'

Either of those should work just fine.
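The two one-liners are not quite interchangeable. Here is a minimal sketch, assuming a POSIX shell (the function names first_word_cut and first_word_awk are hypothetical), of where they differ: cut -d " " splits on every single space, so a tab is not a separator and a line with leading spaces has an empty first field, while awk's default field splitting skips leading blanks and treats any run of spaces or tabs as one separator.

```shell
#!/bin/sh
# Hypothetical wrappers around the two one-liners above.
first_word_cut() { cut -d ' ' -f1; }     # field 1 = text before the first single space
first_word_awk() { awk '{print $1}'; }   # field 1 = first run of non-blank characters

# Sample input: a tab after the domain, and a line with leading spaces.
sample=$(printf 'somesite.com\tblah blah/blah\n   website.bg crap more crap')

echo "cut:"
printf '%s\n' "$sample" | first_word_cut
echo "awk:"
printf '%s\n' "$sample" | first_word_awk
```

On the sample data from the first post (single spaces, no leading blanks) both produce the same four domains; the difference only shows up on messier input.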

rigor 03-04-2013 05:37 PM

When I put your sample data in a file named data.txt, and run this command:

Code:

sed  's/\(^[^ ]\+\)\(.\)\+/\1/'  <  data.txt
I get your sample output. The following is a way to replace all forward slashes in a line, with the word "barf":

Code:

sed  's/\//barf/g' <  data.txt
It's aaaaaaall about escaping. If you use slashes to delimit the parts of the command, as in the second command, then you have to escape any slash you want to match. Alternatively, you can use a character other than the slash as the delimiter; then there's no need to escape the slash for pattern matching:

sed 's{/{barf{g' < data.txt
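For the original question of deleting slashes rather than replacing them, the same two approaches work with an empty replacement. A minimal sketch; the function names are hypothetical, and | is used as the alternate delimiter here, though { as above works the same way:

```shell
#!/bin/sh
# Hypothetical helpers that delete every forward slash on each line.

# Default / delimiter: the slash to match must be escaped as \/
del_slash_escaped() { sed 's/\///g'; }

# Alternate delimiter |: the slash in the pattern needs no escaping.
del_slash_pipe() { sed 's|/||g'; }

printf '%s\n' 'anothersiter.ru dfsdf/rewer /r/wer er wr/' | del_slash_escaped
printf '%s\n' 'anothersiter.ru dfsdf/rewer /r/wer er wr/' | del_slash_pipe
```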

BTW, since this isn't a Linux-only forum, I don't know what environment you are using. But if your environment has the info command (it's available, for example, for Cygwin running under MS-Windows), then from a Cygwin shell, running the command:

info sed

will let you read the info documentation for sed, which is much more detailed than the typical man page for sed. For what you are doing, you might focus on the section on the "s" command and the section of examples, and use the section on regular expressions for anything the other two don't explain well enough.

Yes, cut will very simply do the exact task you've mentioned: cutting out everything after the first space. If you find you need something more involved, you might want to use sed or awk.
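One caveat on the first sed command in this post: \+ is a GNU extension to basic regular expressions (not guaranteed by POSIX), and because \(.\)\+ must match at least one character, a line with nothing after the domain loses its final character. A minimal sketch of a portable alternative, assuming POSIX sed (the function names are hypothetical), where .* is allowed to match nothing:

```shell
#!/bin/sh
# The pattern from this post (GNU sed only, because of \+).
first_word_gnu() { sed 's/\(^[^ ]\+\)\(.\)\+/\1/'; }

# Portable variant: keep the leading run of non-spaces, drop everything after it.
# Because .* can match the empty string, lines without a space pass through intact.
first_word_posix() { sed 's/^\([^ ]*\).*/\1/'; }

printf '%s\n' 'anotherfrickingsite.fr' | first_word_gnu     # prints anotherfrickingsite.f (GNU sed)
printf '%s\n' 'anotherfrickingsite.fr' | first_word_posix   # prints the line unchanged
```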

If you encounter more complicated data lines where it's not as simple to isolate the URL, sed or awk let you write more elaborate patterns that can recognize a URL almost anywhere in the line.
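As a rough illustration of that idea, here is a minimal awk sketch that prints, for each line, the first whitespace-separated field that looks like a bare domain name. The function name first_domain and the pattern are assumptions: it is a heuristic rather than a real URL parser, it would also accept tokens such as morecrap.html, and it prints nothing for a line with no match.

```shell
#!/bin/sh
# Hypothetical helper: for each line, print the first field that looks like
# a bare domain (dot-separated labels of letters, digits and hyphens,
# ending in an alphabetic label of two or more letters).
first_domain() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z][A-Za-z]+$/) {
        print $i
        next          # stop at the first match on this line
      }
  }'
}

printf '%s\n' 'see notes at website.bg for more' 'comment: somesite.com blah/html' | first_domain
```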

Hope this helps.

Kustom42 03-04-2013 05:38 PM

Quote:

Originally Posted by CaptainDerp (Post 4904649)

Also, how the heck do you remove a forward or backward slash with sed??

sed 's/somecrap//g' crap.txt > crap.out works for objects, but not with a forward slash.



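You need to escape the forward slash with a backslash so that sed treats it as a literal slash instead of as the end of the pattern.

Just put a \ in front of it so it looks like \/ as opposed to just /. Inside a sed s command, a backslash before the delimiter character tells sed to treat it as an ordinary character. (With the command wrapped in single quotes, as in your example, bash passes the backslash through to sed unchanged.)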
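The same escaping idea covers the backslash half of the original question, with one extra wrinkle: depending on the quoting, the shell may process backslashes before sed ever sees them. A minimal sketch, assuming a POSIX shell; the function names are hypothetical:

```shell
#!/bin/sh
# Single quotes: the shell passes the text through untouched, so sed
# receives s/\\//g, where \\ matches one literal backslash.
del_backslash_single() { sed 's/\\//g'; }

# Double quotes: the shell turns each \\ into \ before running sed,
# so four backslashes are needed for sed to receive the same s/\\//g.
del_backslash_double() { sed "s/\\\\//g"; }

printf '%s\n' 'C:\temp\file.txt' | del_backslash_single
printf '%s\n' 'C:\temp\file.txt' | del_backslash_double
```

Single quotes are usually the simpler choice for sed scripts, since only one layer of escaping is involved.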

Kustom42 03-04-2013 05:40 PM

Quote:

Originally Posted by rigor (Post 4904670)
When I put your sample data in a file named data.txt, and run this command:

Code:

sed  's/\(^[^ ]\+\)\(.\)\+/\1/'  <  data.txt

That is way overkill. Yes, you could use sed to do this, but it's always best to use the proper tool for the job. awk and cut can do this with half the typing and half the chance of user error.

CaptainDerp 03-04-2013 05:43 PM

wow
 
you guys are frickin amazing, this site is amazing.

THANK YOU!

chrism01 03-04-2013 10:16 PM

Quote:

Also, how the heck do you remove a forward or backward slash with sed??
Use another separator, e.g.:
Code:

# instead of
sed 's/\/this/\/that/' file

# use ':' as the separator
sed 's:/this:/that:' file

http://www.grymoire.com/Unix/Sed.html

grail 03-05-2013 08:04 AM

If you had to use sed, I'm not sure why it has to be so hard:
Code:

sed -n 's/ .*//p' file
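Note that -n turns off sed's automatic printing and the p flag prints only lines where the substitution actually happened, so a line that is just a domain with no comment after it is dropped. If such lines can occur, leaving out -n and p keeps them. A minimal comparison; the function names are hypothetical:

```shell
#!/bin/sh
# grail's command: print only lines where " .*" matched (and was removed).
changed_only() { sed -n 's/ .*//p'; }

# Same substitution, but every line is printed, changed or not.
every_line() { sed 's/ .*//'; }

printf '%s\n' 'somesite.com blah blah/blah' 'bare-domain.fr' | changed_only
printf '%s\n' 'somesite.com blah blah/blah' 'bare-domain.fr' | every_line
```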


All times are GMT -5.