LinuxQuestions.org

sktimmy 10-31-2011 06:01 PM

replace space with newline and save result in a variable
 
Hi there!
My script is supposed to do the following:
1) wget google search result for $MYSEARCH
2) grep interesting parts with regexp (URIs in this case) $URIPATTERN

I want to do this with variables so that my HDD is not constantly writing small files. (Other parts of my script download about 100-300 rather small files just to scan their contents, but I hope to adapt the solution to those parts myself.)

Here's what I've got:
Code:

DATAPB=`wget -O - -q -U "Mozilla" 'http://www.google.com/search?q='$MYSEARCH'&tbs=qdr:d'`
DATAPB=`echo $DATAPB | grep -o -e $URIPATTERN `
DATAPB=`echo $DATAPB | sed 's/ http/\nhttp/g'`

My problem is that the URIs in $DATAPB are separated by a single space, but I want each one on its own line. So I came up with line 3, but the result is not saved properly.

Code:

echo $DATAPB | sed 's/ http/\nhttp/g'
#Output:
http://uri1.result.com/abc
http://uri2.result.com/cde
http://uri3.result.com/fgh

works fine.
Code:

DATAPB=`echo $DATAPB | sed 's/ http/\nhttp/g'`
echo $DATAPB
#Output:
http://uri1.result.com/abc http://uri2.result.com/cde http://uri3.result.com/fgh

does nothing. URIs are still divided by spaces.

Do I need to double-escape the \n or something?

Thanks for your help.

PS:
none of the following works either:
Code:

DATAPB=$(echo $DATAPB | sed 's/ http/\nhttp/g')
DATAPB=$(echo -e $DATAPB | tr " " "\\n")
DATAPB=${DATAPB//http/\nhttp}


catkin 11-01-2011 03:09 AM

What does the output of echo "$DATAPB" look like? Without the quotes, bash splits the value on whitespace (spaces, tabs and newlines), and echo then joins the resulting words with single spaces.
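
One way to check this without relying on echo alone is to count the lines stored in the variable. The following is a minimal sketch, assuming GNU sed (which accepts \n in the replacement) and made-up sample URIs in place of the real wget/grep output:

```shell
#!/bin/bash
# Sample data standing in for the grep result (hypothetical URIs).
DATAPB='http://uri1.result.com/abc http://uri2.result.com/cde http://uri3.result.com/fgh'

# Same substitution as in the original script; GNU sed turns \n into a newline.
DATAPB=$(printf '%s\n' "$DATAPB" | sed 's/ http/\nhttp/g')

# Unquoted: word splitting treats the newlines as separators,
# so echo receives three arguments and prints them on one line.
echo $DATAPB

# Quoted: the newlines stored in the variable are printed as they are.
echo "$DATAPB"

# Count the lines actually stored in the variable.
line_count=$(printf '%s\n' "$DATAPB" | wc -l)
echo "lines: $line_count"
```

If the count is 3, the assignment worked and only the unquoted echo was hiding the newlines.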

crts 11-01-2011 04:37 AM

Hi,

I ran the following command:
Code:

wget -O - -q -U "Mozilla" 'http://www.google.com/search?q='linux'&tbs=qdr:d'

Here is an excerpt of the data that I got in return:
Code:

<a href="http://www.howtoforge.com/installing-windows-software-with-wine-linux-mint-11" class=l>Installing Windows Software With Wine (<em>Linux</em> Mint 11) | HowtoForge <b>...</b></a></h3><div class="esc" id="poS12" style="display:none">You +1'd this publicly.&nbsp;<a href="#">Undo</a></div><div class="s"><span class="f std" >17 hours ago</span> - Installing Windows Software With Wine (<em>Linux</em> Mint 11)<br><span class=f><cite><span class=bc>www.howtoforge.com &rsaquo; <a href="/url?q=http://www.howtoforge.com/howtos&amp;sa=X&amp;ei=ArqvTpa7JY2KhQeEp-nXAg&amp;ved=0CFoQ6QUoADAM&amp;usg=AFQjCNFSiAx5T6Yz-VGC-q5t2g3fKcbTsA">Howtos</a> &rsaquo; <a href="/url?q=http://www.howtoforge.com/howtos/linux&amp;sa=X&amp;ei=ArqvTpa7JY2KhQeEp-nXAg&amp;ved=0CFsQ6QUoATAM&amp;usg=AFQjCNGZqNywPfdWyS1VJZe67rTKkG3lRQ">Linux</a> &rsaquo;

As you can see, the data does not look the way you described it. Another problem is your 'grep'. You are using the '-o' option, which means that 'grep' will return only the matching part, i.e. the result looks like this after 'grep':
Code:

http
http
http

The only unique pattern I see is '<a href=...>' to identify the results. So what you actually might want is something like:
Code:

DATAPB=$(wget -O - -q -U "Mozilla" 'http://www.google.com/search?q='linux'&tbs=qdr:d'|sed -nr 's/<a href="/\n&/gp'|grep 'http')

Notice that in this case we pipe to 'grep' after 'sed' has run. Normally, there is no need for such a combination. However, since 'sed' produces additional lines, it is easier to pipe to 'grep' than to have 'sed' handle the newly created lines.
Also, have a look at this link here on why you should avoid backticks for command substitution.
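
To make the sed-then-grep idea concrete, here is a small sketch that runs on a made-up HTML fragment instead of live wget output. It assumes GNU sed (for -r and \n in the replacement), and the last sed step, which trims each line down to the URL, is an addition for illustration and not part of the command above:

```shell
#!/bin/bash
# Sample HTML standing in for the wget output (made-up fragment).
html='<h3><a href="http://www.example.com/one" class=l>One</a></h3><a href="#">Undo</a><a href="http://www.example.org/two">Two</a>'

# Step 1: start a new line before every '<a href="' (GNU sed).
# Step 2: keep only the lines that contain 'http'.
# Step 3: cut each remaining line down to the text between the double quotes.
links=$(printf '%s\n' "$html" \
    | sed -nr 's/<a href="/\n&/gp' \
    | grep 'http' \
    | sed 's/^<a href="\([^"]*\)".*/\1/')

# Quote the variable so the newlines between the links survive.
printf '%s\n' "$links"
```

Real search result pages are messier than this fragment, so the patterns would likely need adjusting.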

Hope this helps.

huge 11-01-2011 09:58 AM

You need a "here file"
 
Basically shell variables are not designed to take newlines.
You might want to use something like this:

DATAPB=$(wget -O - -q -U "Mozilla" 'http://www.google.com/search?q='$MYSEARCH'&tbs=qdr:d')

for x in $DATAPB
do
    echo "$x"
done | some-command

You can define a shell function some-command and make it as complex as you like.

catkin 11-01-2011 11:44 AM

Quote:

Originally Posted by huge (Post 4513253)
Basically shell variables are not designed to take newlines.

AFAIK there was no such design intention. Many uses of bash variables put newlines in them, including bash's own $IFS.
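
A quick way to see this for oneself, as a sketch assuming an unmodified IFS in a fresh bash session (the variable names are only for illustration):

```shell
#!/bin/bash
# %q prints the value in a shell-quoted form, which makes the
# whitespace characters in IFS visible.
ifs_quoted=$(printf '%q' "$IFS")
echo "IFS is: $ifs_quoted"

# The default IFS holds three characters: space, tab and newline.
ifs_length=${#IFS}
echo "IFS length: $ifs_length"
```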

David the H. 11-01-2011 11:56 AM

Quote:

Originally Posted by huge (Post 4513253)
Basically shell variables are not designed to take newlines.

This is simply not true. A variable can contain anything except a null byte. What you have to worry about is that newlines (ASCII character 012 octal) are often treated as special syntax by the shell and other commands. The shell processes them as either whitespace or command terminators, and other commands like grep and sed use them as input string delimiters.

Code:

$ variable='foobar
+ foobar
+ foobar'

$ echo "$variable"
foobar
foobar
foobar

$ echo $variable
foobar foobar foobar

Since I didn't quote the variable in the last command, the shell treated the newlines as whitespace and removed them during the word-splitting process. Each word was then passed as a separate argument to echo. Double-quoting preserves them as literal newlines.

See here for more on how the shell handles arguments and whitespace:
http://mywiki.wooledge.org/Arguments
http://mywiki.wooledge.org/WordSplitting
http://mywiki.wooledge.org/Quotes

To change spaces into newlines inside a variable, you can use a simple parameter expansion.

Code:

$ variable='foobar foobar foobar'

$ echo "${variable// /$'\n'}"
foobar
foobar
foobar

All space characters are simply substituted with newline characters as the variable is expanded. It uses the $'..' ANSI-C quoting form, which expands certain backslash-escaped patterns like \n (newline) and \t (tab) into their literal ASCII equivalents. See the QUOTING section of the bash man page.
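
This also suggests why a plain \n in the replacement part of the expansion does not help. A short comparison, as a sketch with hypothetical variable names:

```shell
#!/bin/bash
variable='foobar foobar foobar'

# Plain \n in the replacement: bash does not turn it into a newline here,
# so the result stays on a single line.
with_backslash_n=${variable// /\n}

# $'\n' is expanded by bash to a real newline before the substitution happens.
with_newline=${variable// /$'\n'}

printf '%s\n' "$with_backslash_n"
printf '%s\n' "$with_newline"
```

The second form can be assigned back to the variable in the same way, as long as later uses are double-quoted.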

BUT FINALLY, in cases like this, you really should be using an ARRAY to store the urls, rather than a scalar variable. Then you won't have to worry about dealing with newlines at all.

http://mywiki.wooledge.org/BashGuide/Arrays
http://mywiki.wooledge.org/BashFAQ/005

Assuming you can get your wget/grep command to output a clean list of urls, separated by whitespace of some kind (space-tab-newline), then you set them into an array like this:

Code:

urls=( $( wget.... | grep .... )  )

for i in "${urls[@]}" ; do
        mycommand "$i"
done
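
Note that the unquoted $( ... ) in the array assignment is split on whitespace and is also subject to filename expansion, so a URL containing * or ? could in principle be altered. A hedged alternative, assuming bash 4 or newer and exactly one URL per line of output, is mapfile. In this sketch list_urls is a hypothetical stand-in for the real wget/grep pipeline, and the URLs are made up:

```shell
#!/bin/bash
# Hypothetical stand-in for the wget/grep pipeline: prints one URL per line.
list_urls() {
    printf '%s\n' \
        'http://uri1.result.com/abc' \
        'http://uri2.result.com/cde?q=*' \
        'http://uri3.result.com/fgh'
}

# mapfile -t stores each input line as one array element and strips the
# trailing newline. The lines are not word-split or glob-expanded.
mapfile -t urls < <(list_urls)

echo "number of urls: ${#urls[@]}"
for url in "${urls[@]}"; do
    printf 'url: %s\n' "$url"
done
```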

P.S. @huge; Please use [code][/code] tags around your code, to preserve formatting and to improve readability.

