Remove All Objects after the first whitespace on each line using Sed? Help Please.
I have a list of URLs with comments on each line. I need to clean all that junk out of the txt file so only the domains remain. I figured the easiest way to do this is with sed/awk/grep, but I really stink at using these tools right now.
For example:
somesite.com blah blah/blah some blah/html
anothersiter.ru dfsdf/rewer /r/wer er wr/
website.bg crap more crap /morecrap.html
anotherfrickingsite.fr randomstuff
Required output:
somesite.com
anothersiter.ru
website.bg
anotherfrickingsite.fr
So does anyone know how to remove all objects after the first whitespace on each line using sed? I've googled hard and only found how to remove leading and trailing whitespace, which is not helping.
Also, how the heck do you remove a forward or backward slash with sed? sed 's/somecrap//g' crap.txt > crap.out works for objects, but not with a forward slash. |
I'm about to blow your mind....
Code:
cut -d " " -f1
Code:
awk '{print $1}'
Either of those should work just fine. |
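To see both one-liners side by side, here is a minimal sketch that feeds them the sample lines from the question. The variable names (data, cut_out, awk_out) are only there for illustration.

```shell
# Sample lines from the question, held in a variable instead of a file.
data='somesite.com blah blah/blah some blah/html
anothersiter.ru dfsdf/rewer /r/wer er wr/
website.bg crap more crap /morecrap.html
anotherfrickingsite.fr randomstuff'

# cut: treat a single space as the delimiter and keep field 1.
cut_out=$(printf '%s\n' "$data" | cut -d ' ' -f1)

# awk: split on runs of blanks (spaces or tabs) and print field 1.
awk_out=$(printf '%s\n' "$data" | awk '{print $1}')

printf '%s\n' "$cut_out"
printf '%s\n' "$awk_out"
```

One difference worth knowing: cut splits on every single space, so a line that starts with a space yields an empty first field, while awk skips leading blanks and also splits on tabs.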
When I put your sample data in a file named data.txt, and run this command:
Code:
sed 's/\(^[^ ]\+\)\(.\)\+/\1/' < data.txt
Code:
sed 's/\//barf/g' < data.txt
sed 's{/{barf{g' < data.txt
BTW, since this is a non-Linux forum, I don't know what environment you are using. But if you are using an environment that has the info command (it is available, for example, in Cygwin running under MS-Windows), then running the command info sed from a shell will let you go through the info document about sed, which is much more detailed than the typical manual page for sed. In particular, for what you are doing, you might focus on the section on the "s" command and the section of examples, falling back on the section on regular expressions for anything that isn't sufficiently explained in the other two.
Yes, cut will very simply do the exact task you've mentioned: cutting out everything after the first whitespace. If you find you need something more involved, you might want to use sed or awk. If you encounter a more complicated data line, where it's not as simple to isolate the URL, with sed or awk you can use more complicated patterns that should recognize a URL almost no matter where in a line it is.
Hope this helps. |
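As a rough sketch of the "more complicated pattern" idea, the awk snippet below scans every field and prints the first one that looks like a bare domain, wherever it sits on the line. The regular expression is only a heuristic assumed for illustration (letters, digits, and hyphens separated by dots), and the sample lines and variable names are made up.

```shell
# Hypothetical messy input where the domain is not always the first word.
messy='see somesite.com for details
crap website.bg /morecrap.html
nothing useful here'

# Print the first field on each line that looks like a bare domain.
domains=$(printf '%s\n' "$messy" | awk '{
  for (i = 1; i <= NF; i++)
    if ($i ~ /^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$/) { print $i; break }
}')

printf '%s\n' "$domains"
```

Lines with no matching field produce no output at all. A bare word such as morecrap.html would also match this pattern, so the expression would need tightening for real data.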
Quote:
You need to escape your forward slash so it is interpreted as an actual slash rather than as the delimiter of the s command. Just put a \ in front of it so it looks like \/ as opposed to just /. In sed, the "\" says to take the next character as is and ignore its special meaning. |
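Here is a small sketch of the slash handling, using one of the sample lines from the question. The variable names and the a\b test string are made up for illustration.

```shell
line='dfsdf/rewer /r/wer er wr/'

# Escape the forward slash so sed does not read it as the s delimiter.
escaped=$(printf '%s\n' "$line" | sed 's/\///g')

# Or pick a different delimiter so no escaping is needed.
alt=$(printf '%s\n' "$line" | sed 's|/||g')

# A literal backslash is matched by doubling it inside the pattern.
no_backslash=$(printf '%s\n' 'a\b' | sed 's/\\//g')

printf '%s\n' "$escaped" "$alt" "$no_backslash"
```

Using another delimiter such as | or # tends to be easier to read once a pattern contains several slashes, as paths and URLs usually do.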
wow
you guys are frickin amazing, this site is amazing.
THANK YOU! |
Quote:
Code:
# instead of |
If you have to use sed, I am not sure why it has to be so hard:
Code:
sed -n 's/ .*//p' file |
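One thing to watch with the -n ... p form: it prints only lines where the substitution actually happened, so a line that is already just a bare domain is silently dropped. A minimal sketch of the difference follows; the sample lines (including bare.org and tabbed.net) and variable names are invented for illustration.

```shell
data='somesite.com blah blah
bare.org
website.bg crap'

# -n plus the p flag: only lines that contained a space are printed.
with_n=$(printf '%s\n' "$data" | sed -n 's/ .*//p')

# Without -n and p: every line is printed, changed or not.
without_n=$(printf '%s\n' "$data" | sed 's/ .*//')

# [[:space:]] also covers tabs, in case the comments are tab-separated.
tab_out=$(printf 'tabbed.net\tcomment\n' | sed 's/[[:space:]].*//')

printf '%s\n' "$with_n" "$without_n" "$tab_out"
```

If every line in the file is guaranteed to have a comment, the two forms give the same result.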