LinuxQuestions.org
04-10-2015, 08:07 PM   #1
pulsar1279
LQ Newbie
 
Registered: Apr 2015
Posts: 5

Rep: Reputation: Disabled
deleting extra text within a line... (tearing out remaining hair..!)


Hi all,
I'm trying to tidy up a web-scrape text file so that it shows just the URLs.
As shown below, most lines already contain only the URL, but I'm struggling to work out how to delete extra text within a line without deleting the whole line,
i.e. a command that says "delete from this word to the end of the line" or "delete everything between word1 and word2".

//newzealandtrails.com/sites/all/themes/nztrails/css/print.css?nmkrag");
//newzealandtrails.com/sites/all/themes/nztrails/css/tabs.css?nmkrag");
//newzealandtrails.com/sites/default/files/ctools
//newzealandtrails.com/sites/default/files/New20Zealand%20Trails.png" //newzealandtrails.com/sites/default/files/nztrails-logo_0_0.png" alt="New Zealand Trails" /></a></div>
//newzealandtrails.com/welcome-new-zealand-trails" st_title="" class="st_sharethis_button" displayText="sharethis"></span>
//player.vimeo.com/video/71298207" webkitallowfullscreen="" width="500px"></iframe></div>
//w.sharethis.com/button/buttons.js"></script>

Thanks in advance

Last edited by pulsar1279; 04-10-2015 at 08:23 PM.
 
04-10-2015, 08:59 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 14,843

Rep: Reputation: 1823
That would normally be sed, but constructing the regex gets interesting if the text varies.
If you always want to lose all the text after the blank, use cut (or awk if there is more processing to be done).

It may be better to start again: have a look at what "lynx -dump ..." produces; it lists all the URLs for you in a group, so only minimal editing is needed to get the lot.
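
As a rough sketch of the cut and awk suggestion above: this assumes the scrape is saved as scrape.txt (a made-up file name) and that the URL is always the first space-separated field. The sample lines are copied from the first post. Note that cut and a plain awk print leave the trailing double quote in place; the last awk command is one possible way to strip it.

```shell
#!/bin/sh
# Sample lines copied from the first post; scrape.txt is a hypothetical name.
cat > scrape.txt <<'EOF'
//newzealandtrails.com/sites/default/files/ctools
//newzealandtrails.com/welcome-new-zealand-trails" st_title="" class="st_sharethis_button" displayText="sharethis"></span>
//player.vimeo.com/video/71298207" webkitallowfullscreen="" width="500px"></iframe></div>
EOF

# cut: keep only the first space-delimited field of each line.
# The double quote that follows the URL is part of that field and is kept.
cut -d' ' -f1 scrape.txt

# awk: same idea; $1 is the first whitespace-delimited field.
awk '{ print $1 }' scrape.txt

# awk with extra processing: also remove the first double quote
# and anything after it from the first field before printing.
awk '{ sub(/".*/, "", $1); print $1 }' scrape.txt
```

If a line holds more than one URL, all of these keep only the first one.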
 
04-10-2015, 09:03 PM   #3
allend
Senior Member
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 4,433

Rep: Reputation: 1353
Welcome to LQ!

This is an example where the 'sed' command can be used.
Inspecting your file suggests that you want to delete from the first double quote character to the end of the line. That span matches the regular expression ".*
Using this with sed's substitute command gives
Code:
sed 's/".*//' scrape.txt
(replace scrape.txt with the name of your file).
The single quotes around the sed expression are important: they protect it from being interpreted by the shell.
To save the output to a file, add '> output.txt' to the above command.
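
The two general cases asked about in the first post ("delete from this word to the end of the line" and "delete everything between word1 and word2") can be sketched with the same substitute command. The sample line and the marker words START and END below are made up for illustration; swap in your own words.

```shell
#!/bin/sh
# Hypothetical sample line; START and END are made-up marker words.
line='keep this START junk to drop END keep this too'

# Delete from a given word to the end of the line (the word goes too).
printf '%s\n' "$line" | sed 's/START.*//'

# Delete everything between two words, keeping the words themselves.
printf '%s\n' "$line" | sed 's/START.*END/START END/'

# Delete both words and everything between them.
printf '%s\n' "$line" | sed 's/START.*END//'
```

Bear in mind that .* is greedy: if END appears more than once on a line, the match runs to the last one. Spaces around the deleted text are left as they were, so you may end up with a trailing or doubled space.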
 
  


All times are GMT -5.