LinuxQuestions.org
Old 10-31-2011, 06:01 PM   #1
sktimmy
LQ Newbie
 
Registered: Oct 2011
Posts: 1

replace space with newline and save result in a variable


Hi there!
My script is supposed to do the following:
1) wget google search result for $MYSEARCH
2) grep interesting parts with regexp (URIs in this case) $URIPATTERN

And I want this to happen in variables, so that my HDD is not writing small chunks of data all the time. (Other parts of my script download about 100-300 rather small files just to scan their content, but I hope to adapt the solution to those parts by myself.)

Here's what I've got:
Code:
DATAPB=`wget -O - -q -U "Mozilla" 'http://www.google.com/search?q='$MYSEARCH'&tbs=qdr:d'`
DATAPB=`echo $DATAPB | grep -o -e $URIPATTERN `
DATAPB=`echo $DATAPB | sed 's/ http/\nhttp/g'`
My problem is that the URIs in $DATAPB are separated by a single space, but I want them on separate lines. So I came up with line 3. But the result is not saved properly.

Code:
echo $DATAPB | sed 's/ http/\nhttp/g'
#Output:
http://uri1.result.com/abc
http://uri2.result.com/cde
http://uri3.result.com/fgh
works fine.
Code:
DATAPB=`echo $DATAPB | sed 's/ http/\nhttp/g'`
#Output:
echo $DATAPB
http://uri1.result.com/abc http://uri2.result.com/cde http://uri3.result.com/fgh
does nothing. URIs are still divided by spaces.

Do I need to double-escape the \n or something?

Thanks for your help.

PS:
none of the following works either:
Code:
DATAPB=$(echo $DATAPB | sed 's/ http/\nhttp/g')
DATAPB=$(echo -e $DATAPB | tr " " "\\n")
DATAPB=${DATAPB//http/\nhttp}
 
Old 11-01-2011, 03:09 AM   #2
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

What does the output of echo "$DATAPB" look like? Without the quotes, bash splits the expansion on whitespace (spaces, tabs and newlines) and echo prints the resulting words separated by single spaces, so the newlines disappear.
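A minimal sketch of the effect, using a made-up two-URI string instead of real wget output (the sample value and the names unquoted and quoted are assumptions for illustration). tr does the same job here as the sed line from the first post. The command substitution keeps the newlines; they only vanish when the variable is expanded without quotes.

```bash
#!/usr/bin/env bash
# Made-up stand-in for the grep output: URIs separated by single spaces.
DATAPB='http://uri1.result.com/abc http://uri2.result.com/cde'

# Turn every space into a newline and store the result back in the variable.
DATAPB=$(echo "$DATAPB" | tr ' ' '\n')

unquoted=$(echo $DATAPB)    # word splitting: echo gets two arguments and joins them with a space
quoted=$(echo "$DATAPB")    # a single argument, so the newline survives

echo "unquoted: $unquoted"
echo "quoted:"
echo "$quoted"
```

So the assignment in the first post was probably fine all along, and only the final echo needed quotes.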
 
Old 11-01-2011, 04:37 AM   #3
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Hi,

I ran the following command:
Code:
wget -O - -q -U "Mozilla" 'http://www.google.com/search?q='linux'&tbs=qdr:d'
Here is an excerpt of the data that I got in return:
Code:
<a href="http://www.howtoforge.com/installing-windows-software-with-wine-linux-mint-11" class=l>Installing Windows Software With Wine (<em>Linux</em> Mint 11) | HowtoForge <b>...</b></a></h3><div class="esc" id="poS12" style="display:none">You +1'd this publicly.&nbsp;<a href="#">Undo</a></div><div class="s"><span class="f std" >17 hours ago</span> - Installing Windows Software With Wine (<em>Linux</em> Mint 11)<br><span class=f><cite><span class=bc>www.howtoforge.com &rsaquo; <a href="/url?q=http://www.howtoforge.com/howtos&amp;sa=X&amp;ei=ArqvTpa7JY2KhQeEp-nXAg&amp;ved=0CFoQ6QUoADAM&amp;usg=AFQjCNFSiAx5T6Yz-VGC-q5t2g3fKcbTsA">Howtos</a> &rsaquo; <a href="/url?q=http://www.howtoforge.com/howtos/linux&amp;sa=X&amp;ei=ArqvTpa7JY2KhQeEp-nXAg&amp;ved=0CFsQ6QUoATAM&amp;usg=AFQjCNGZqNywPfdWyS1VJZe67rTKkG3lRQ">Linux</a> &rsaquo;
As you can see, the data does not look the way you described it. Another problem is your 'grep'. You are using the '-o' option, which means that 'grep' will print only the matching part of each line. With a pattern such as 'http', the result looks like this after 'grep':
Code:
http
http
http
The only unique pattern I see is '<a href=...>' to identify the results. So what you actually might want is something like:
Code:
DATAPB=$(wget -O - -q -U "Mozilla" 'http://www.google.com/search?q='linux'&tbs=qdr:d'|sed -nr 's/<a href="/\n&/gp'|grep 'http')
Notice that in this case we pipe to 'grep' after 'sed' has run. Normally there is no need for such a combination. However, since 'sed' produces additional lines here, it is easier to pipe to 'grep' than to have 'sed' handle the newly created lines.
Also, have a look at this link here on why you should avoid backticks for command substitution.
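To try that pipeline without querying Google, here is a sketch that feeds a small made-up HTML fragment through the same 'sed' and 'grep' stages. The sample variable and the final 'grep -o' step are assumptions added for illustration, not part of the command above. It relies on GNU sed, which accepts \n in the replacement text and -r for extended regular expressions.

```bash
#!/usr/bin/env bash
# Made-up HTML standing in for the wget output (all on one line, like the real page).
sample='<h3><a href="http://example.com/one" class=l>One</a></h3><p>text</p><a href="/local">x</a><a href="http://example.com/two">Two</a>'

# sed: start a new line in front of every <a href=" ; grep: keep only lines that mention http.
links=$(printf '%s\n' "$sample" | sed -nr 's/<a href="/\n&/gp' | grep 'http')
echo "$links"

# Extra step (assumption): cut the bare URL out of each remaining line.
urls=$(printf '%s\n' "$links" | grep -o 'http://[^"]*')
echo "$urls"
```

Each line in links still carries the markup that follows the anchor, which is why a further extraction step of some kind is needed before the URLs are usable.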

Hope this helps.
 
Old 11-01-2011, 09:58 AM   #4
huge
LQ Newbie
 
Registered: Sep 2005
Location: Inverness, Scotland
Distribution: Fedora Core
Posts: 6

You need a "here file"

Basically shell variables are not designed to take newlines.
You might want to use something like this:

DATAPB=$(wget -O - -q -U "Mozilla" 'http://www.google.com/search?q='$MYSEARCH'&tbs=qdr:d')

for x in $DATAPB
do
echo $x
done | some-command

You can define a shell function some-command and make it as complex as you like.
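A sketch of what such a function could look like, assuming it should read one word per line from standard input. The name some_command, the numbering it does and the sample string are all made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical consumer: reads one item per line from stdin and numbers it.
some_command() {
    local n=0 line
    while IFS= read -r line; do
        n=$((n + 1))
        printf '%d: %s\n' "$n" "$line"
    done
}

# Made-up stand-in for the downloaded data.
DATAPB='http://uri1.result.com/abc http://uri2.result.com/cde'

# $DATAPB is unquoted on purpose: the loop relies on word splitting to get one URI per pass.
result=$(for x in $DATAPB; do echo "$x"; done | some_command)
echo "$result"
```

Note that the unquoted $DATAPB in the loop is also subject to filename expansion, so this only behaves well when the words contain no glob characters.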
 
Old 11-01-2011, 11:44 AM   #5
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Quote:
Originally Posted by huge
Basically shell variables are not designed to take newlines.
AFAIK there was no such design intention. Many common bash usages put newlines in variables, including bash's own $IFS.
 
Old 11-01-2011, 11:56 AM   #6
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Quote:
Originally Posted by huge
Basically shell variables are not designed to take newlines.
This is simply not true. A variable can contain anything except a null byte. What you have to worry about is that newlines (ASCII character 012 [octal]) are often considered special syntax by the shell and other commands. The shell processes them as either whitespace or command terminators, and other commands like grep and sed use them as input string delimiters.

Code:
$ variable='foobar
> foobar
> foobar'

$ echo "$variable"
foobar
foobar
foobar

$ echo $variable
foobar foobar foobar
Since I didn't quote the variable in the last command, the shell treated the newlines as whitespace and removed them during the word-splitting process. Each word was then passed as a separate argument to echo. Double-quoting preserves them as literal newlines.
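To make the splitting visible, here is a small sketch that counts how many arguments a command actually receives. The helper count_args is made up for this illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reports how many arguments it was called with.
count_args() {
    echo "$#"
}

variable='foobar
foobar
foobar'

unquoted_count=$(count_args $variable)      # split on the newlines: three separate words
quoted_count=$(count_args "$variable")      # kept together: one argument containing newlines

echo "unquoted: $unquoted_count argument(s)"
echo "quoted:   $quoted_count argument(s)"
```

The variable itself is identical in both calls; only the way it is expanded differs.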

See here for more on how the shell handles arguments and whitespace:
http://mywiki.wooledge.org/Arguments
http://mywiki.wooledge.org/WordSplitting
http://mywiki.wooledge.org/Quotes

To change spaces into newlines inside a variable, you can use a simple parameter expansion.

Code:
$ variable='foobar foobar foobar'

$ echo "${variable// /$'\n'}"
foobar
foobar
foobar
All space characters are simply replaced with newline characters as the variable is expanded. It uses the $'..' ANSI-C quoting form, which expands certain backslash-escaped sequences like \n (newline) and \t (tab) into their literal ASCII equivalents. See the QUOTING section of the bash man page.
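Applied to the original question, the same expansion can store its result back in the variable. This is a sketch using the three sample URIs from the first post as stand-in data; the line_count check is only there to confirm the result.

```bash
#!/usr/bin/env bash
# Stand-in for the space-separated grep result from the first post.
DATAPB='http://uri1.result.com/abc http://uri2.result.com/cde http://uri3.result.com/fgh'

# Replace every space with a real newline and save the result in the same variable.
DATAPB=${DATAPB// /$'\n'}

# Quote the expansion, or word splitting will turn the newlines back into spaces.
echo "$DATAPB"

# Count the lines to confirm there is now one URI per line.
line_count=$(grep -c '' <<< "$DATAPB")
echo "lines: $line_count"
```

No external command is needed for the replacement itself, so nothing is written to disk and no subshell is started for it.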

BUT FINALLY, in cases like this, you really should be using an ARRAY to store the urls, rather than a scalar variable. Then you won't have to worry about dealing with newlines at all.

http://mywiki.wooledge.org/BashGuide/Arrays
http://mywiki.wooledge.org/BashFAQ/005

Assuming you can get your wget/grep command to output a clean list of urls, separated by whitespace of some kind (space-tab-newline), then you set them into an array like this:

Code:
urls=( $( wget.... | grep .... )  )

for i in "${urls[@]}" ; do
	mycommand "$i"
done
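One caveat with the unquoted $( ... ) inside the array assignment: besides word splitting, it also undergoes filename expansion, so a word containing * or ? could be replaced by matching file names. Below is a sketch of an alternative, assuming bash 4 or later for the mapfile builtin and assuming the command prints one url per line. The function fetch_urls is a made-up stand-in for the wget/grep pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the wget | grep pipeline: prints one url per line.
fetch_urls() {
    printf '%s\n' 'http://uri1.result.com/abc' 'http://uri2.result.com/cde?x=*'
}

# mapfile (bash 4+) stores each input line in one array element; -t drops the trailing newline.
mapfile -t urls < <(fetch_urls)

echo "number of urls: ${#urls[@]}"
for i in "${urls[@]}"; do
    echo "url: $i"
done
```

Because each line goes into the array untouched, the ? and * in the second url stay literal.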




P.S. @huge; Please use [code][/code] tags around your code, to preserve formatting and to improve readability.

Last edited by David the H.; 11-01-2011 at 12:08 PM. Reason: modified example & formatting
 
  


All times are GMT -5. The time now is 04:00 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration