LinuxQuestions.org
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

08-25-2014, 06:38 PM   #1
Lucien Lachance
Member
 
Registered: May 2013
Posts: 82

Rep: Reputation: Disabled
Removing Unwanted Scraped Text


How can I match this string? I would like to remove the first three words and the pipe characters between them, i.e., "Cookbook | Ingredients | Recipes".

"Cookbook | Ingredients | Recipes Ingredients"


So far I have:

Code:
line.text.gsub(/\t|\[edit\]/, '') # removes tabs and edit hyperlink tags
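One way to target that prefix with `gsub` alone is to anchor a pattern at the start of the string that consumes a run of "word |" pairs plus the word after the last pipe. This is a minimal sketch, assuming the unwanted breadcrumb always sits at the very start of the line and contains only word characters; `clean_line` and `BREADCRUMB` are hypothetical names, not from the thread.

```ruby
# Hypothetical pattern: at the start of the string (\A), one or more
# "Word | " pairs, then the word after the last pipe and any spaces.
BREADCRUMB = /\A(?:\w+\s*\|\s*)+\w+\s*/

# Hypothetical helper; assumes the breadcrumb is at the start of the line.
def clean_line(text)
  without_junk = text.gsub(/\t|\[edit\]/, '')  # tabs and "[edit]", as in the thread
  without_junk.gsub(BREADCRUMB, '')            # then the leading "A | B | C " run
end

puts clean_line("Cookbook | Ingredients | Recipes Ingredients")  # prints "Ingredients"
```

The tab and "[edit]" cleanup runs first so that a leading tab cannot keep the `\A` anchor from matching the breadcrumb.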

Last edited by Lucien Lachance; 08-26-2014 at 01:09 PM.
 
08-26-2014, 10:53 AM   #2
evo2
LQ Guru
 
Registered: Jan 2009
Location: Japan
Distribution: Mostly Debian and CentOS
Posts: 6,724

Rep: Reputation: 1705
Hi,

grep can be used for string matching. If that answer is not what you are looking for, then perhaps you could provide a little more information about what you are trying to do.

Evo2.
 
08-26-2014, 11:21 AM   #3
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by evo2
grep can be used for string matching. If that answer is not what you are looking for, then perhaps you could provide a little more information about what you are trying to do.
OP had a detailed example. I was prepared to respond with a grep -v one-liner, but then he edited his post into something different.

Daniel B. Martin
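For comparison, Ruby's closest counterpart to a `grep -v` filter is `Enumerable#grep_v` (Ruby 2.3+), which keeps the elements that do not match a pattern. A minimal sketch, assuming the scraped text is available as an array of lines and that any line containing a pipe is unwanted; the sample lines are made up for illustration.

```ruby
# Hypothetical scraped lines; the first one is the unwanted breadcrumb.
lines = [
  "Cookbook | Ingredients | Recipes Ingredients",
  "2 cups flour",
  "1 tsp salt"
]

# Like `grep -v '|'`: keep only the lines that contain no pipe character.
kept = lines.grep_v(/\|/)
puts kept
```

Note that this drops whole lines, so it only fits if nothing on the breadcrumb line needs to be kept.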
 
08-26-2014, 01:04 PM   #4
Lucien Lachance
Member
 
Registered: May 2013
Posts: 82

Original Poster
Rep: Reputation: Disabled
I apologize; the tag should have been ruby. I can't use grep: I can only use a regex pattern in the gsub method. I just don't know how to begin targeting that text. I thought about targeting any word followed by a pipe symbol, like so:
Code:
 line.text.gsub(/\t|\[edit\]|\w+\s*\|\s*/, '') # \w+ is a word; \s*\|\s* is the pipe plus surrounding spaces
but there might be a better way of grabbing this.
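In a Ruby regex, `w+` matches only literal `w` characters; the word-character class is `\w+`. A quick sketch comparing the pattern as first posted with a `\w+\s*\|\s*` version on the sample line from the first post (`sample`, `unchanged`, and `result` are just illustrative variable names):

```ruby
sample = "Cookbook | Ingredients | Recipes Ingredients"

# As posted: w+ needs literal "w" characters, and the sample has none,
# so nothing matches and the string comes back unchanged.
unchanged = sample.gsub(/w+.\|/, '')

# With \w+: each "Word | " pair is removed independently.
result = sample.gsub(/\w+\s*\|\s*/, '')
puts result  # prints "Recipes Ingredients"
```

The word after the last pipe has no pipe of its own, so "Recipes" survives; a pattern that should also drop it has to consume one extra word after the final pipe.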
 
08-26-2014, 01:39 PM   #5
evo2
LQ Guru
 
Registered: Jan 2009
Location: Japan
Distribution: Mostly Debian and CentOS
Posts: 6,724

Rep: Reputation: 1705
Hi,

Why do you need to use a regex? What is wrong with using standard string substitution? E.g.:
Code:
line.text.sub('Cookbook | Ingredients | Recipes ', '')
Evo2.
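Plain string methods avoid escaping the pipes altogether. A small sketch, assuming the unwanted text is always the literal prefix "Cookbook | Ingredients | Recipes " (trailing space included): `String#sub` with a String argument matches it literally, and `String#delete_prefix` (Ruby 2.5+) removes it only when the line starts with it. `PREFIX` is a hypothetical constant.

```ruby
PREFIX = 'Cookbook | Ingredients | Recipes '  # assumed literal prefix
line = 'Cookbook | Ingredients | Recipes Ingredients'

# A String pattern is matched literally, so "|" needs no escaping.
via_sub = line.sub(PREFIX, '')

# delete_prefix strips the text only if the line starts with it.
via_prefix = line.delete_prefix(PREFIX)

puts via_sub     # prints "Ingredients"
puts via_prefix  # prints "Ingredients"
```

`delete_prefix` is the stricter of the two: `sub` would also remove the same text if it appeared in the middle of a line.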
 
08-26-2014, 03:01 PM   #6
Lucien Lachance
Member
 
Registered: May 2013
Posts: 82

Original Poster
Rep: Reputation: Disabled
Nothing, I guess. I was hoping there was a way to keep it short, because I may have to add more substitutions, which would make the pattern quite long. I guess I overthought this...
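If more substitutions pile up, one way to keep the `gsub` call short is to list the unwanted fragments in an array and let `Regexp.union` build the alternation. Strings passed to it are escaped automatically, so `|` and `[` are taken literally. A sketch with a hypothetical `UNWANTED` list:

```ruby
# Hypothetical list of unwanted fragments; extend it as needed.
# Regexp.union escapes Strings; Regexp objects would be used as-is.
UNWANTED = [
  "\t",
  '[edit]',
  'Cookbook | Ingredients | Recipes '
].freeze

PATTERN = Regexp.union(UNWANTED)

text = "\tCookbook | Ingredients | Recipes Ingredients[edit]"
puts text.gsub(PATTERN, '')  # prints "Ingredients"
```

Adding a substitution then means adding one array entry rather than growing a hand-written regex.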
 
08-27-2014, 07:36 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
On its own, I'm not sure I see the relevance of the question. Either evo2's suggestion or simply sub(/.*/,'') would do... as you can see, without any context, most solutions will work.
 
  



Tags
jquery, ruby




All times are GMT -5.