Linux - Newbie This Linux forum is for members that are new to Linux.
04-10-2015, 08:07 PM   #1
LQ Newbie
Registered: Apr 2015
Posts: 5

Rep: Reputation: Disabled
deleting extra text within a line... (tearing out remaining hair..!)

Hi all,
I'm trying to tidy up a web scrape text file so it just shows the URLs.
As below, most lines are down to just the URL, but I'm struggling to work out how to delete extra text within a line without deleting the whole line.
i.e. a command that says: delete from this word to the end of the line, or delete everything between word1 and word2.

//" //" alt="New Zealand Trails" /></a></div>
//" st_title="" class="st_sharethis_button" displayText="sharethis"></span>
//" webkitallowfullscreen="" width="500px"></iframe></div>

Thanks in advance

Last edited by pulsar1279; 04-10-2015 at 08:23 PM.
04-10-2015, 08:59 PM   #2
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,988

Rep: Reputation: 2217
That would (normally) be sed, but constructing the regex gets interesting if the text varies.
If you (always) want to lose all the text after the blank, use cut (or awk if there is more processing to be done).
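As a rough sketch of the cut and awk suggestion: the sample line below is made up (the URL is a placeholder, not from the original scrape), and it assumes the URL is the first space-separated field. Note that cutting at the first blank leaves the trailing double quote in place; the last command assumes you would rather split on the quote itself.

```shell
# Hypothetical sample line; the URL is invented for illustration.
sample='http://example.com/trails" alt="New Zealand Trails" /></a></div>'

# Keep only the first space-delimited field.
with_cut=$(printf '%s\n' "$sample" | cut -d' ' -f1)

# Same idea with awk, which splits on whitespace by default.
with_awk=$(printf '%s\n' "$sample" | awk '{print $1}')

# Using the double quote as the field separator drops the quote as well.
no_quote=$(printf '%s\n' "$sample" | awk -F'"' '{print $1}')

printf '%s\n' "$with_cut" "$with_awk" "$no_quote"
```

To run either command over a whole file, pass the file name instead of piping a single line, for example cut -d' ' -f1 scrape.txt.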

Maybe better to start again: have a look at what "lynx -dump ..." produces; it yanks all the URLs for you in a group. Minimal editing to get the lot.
04-10-2015, 09:03 PM   #3
Senior Member
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 4,680

Rep: Reputation: 1566
Welcome to LQ!

This is an example where the 'sed' command can be used.
Inspecting your file suggests that you want to delete from the double quote character to the end of the line. The regular expression for that is ".* (a double quote followed by any run of characters).
Using this with sed's substitute command gives
sed 's/".*//g' scrape.txt
where scrape.txt stands for the name of your scrape file.
It is important to put single quotes around the sed expression to protect it from being interpreted by the shell.
To save the result, redirect the output to a file by adding '> output.txt' to the end of the above command (use a name different from the input file).
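For anyone wanting to try this out, here is a small sketch covering both things the original question asked for. The sample lines and URLs are invented, and START and END are hypothetical marker words standing in for word1 and word2. The second sed command keeps the two markers and removes what lies between them; because .* is greedy, it matches up to the last END on the line.

```shell
# Build a throwaway sample file; the URLs are made up for illustration.
tmp=$(mktemp)
printf '%s\n' \
  'http://example.com/trails" alt="New Zealand Trails" /></a></div>' \
  'http://example.com/video" webkitallowfullscreen="" width="500px"></iframe></div>' \
  > "$tmp"

# Delete from the first double quote to the end of each line.
urls=$(sed 's/".*//' "$tmp")
printf '%s\n' "$urls"

# Delete everything between two marker words, keeping the markers.
between=$(printf '%s\n' 'keep START drop this END keep' |
  sed 's/\(START\).*\(END\)/\1\2/')
printf '%s\n' "$between"

rm -f "$tmp"
```

The g flag is left out here because ".* already runs to the end of the line, so there is at most one match per line; adding it does no harm.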


All times are GMT -5.