[SOLVED] echo of a string variable is broken into two lines

amateurscripter · 05-07-2012, 02:53 PM

Hello, I have a file delimited by pipes. In the script, I first get a list of values from the 11th column: the first part (LIST) gets the last ";"-delimited field in the 11th column. The second part (LIST2) breaks each matching line down the same way, by the 11th column and then by ";", and then echoes any values associated with the first search. What I'm trying to do is not important. I'm just trying to figure out why the script breaks "L-US LIST" into two different lines when it's one item. I'm thinking that when more than one value is stored in the variable (LIST2), the values are delimited by spaces, so when the for loop goes through them it interprets L-US LIST as two entities. Is this so?

Here's a sample file:

cat testfile.txt
INDY|SBF|||||N|N|N|TIME|4:L-MAKER;L-US LIST;L-PLAIN;L-INVT|||R|0
INDY|SIRI|||||N|N|N|TIME||||C|0
INDY|SORF|||||N|N|N|TIME|4:L-US LIST;L-PLAIN;L-INVS|||C|0
INDY|STANDARD|||||N|N|N|TIME|4:L-MAKER;L-US LIST;L-PLAIN;L-INVD|||A|0
INDY|SWIB|||||N|N|N|TIME||||F|0
INDY|SWIS|||||N|N|N|TIME||||A|0
INDY|TRUS|||||N|N|N|TIME|4:L-MAKER;L-PLAIN;L-INVD|||K|0
INDY|TUDOR|||||N|N|N|TIME||||C|0
INDY|TURNER|||||N|N|N|TIME|4:L-MAKER;L-US LIST;L-PLAIN;L-INVK|||A|0
INDY|UST|||||N|N|N|TIME|4:L-MAKER;L-PLAIN;L-INVK|||D|0

And here's the script:

cat probscript.sh
#!/bin/bash

sesID="INDY"

DIR="/home/testing/files"
LIST="`grep $sesID $DIR/testfile.txt|awk -F'|' '{print $11}'|cut -d':' -f2|awk -F';' '{print $NF}'`"

for sess in $LIST
do
LIST2="`grep $sesID $DIR/testfile.txt|grep -w "$sess"|awk -F'|' '{print $11}'|cut -d':' -f2|tr ";" "\n"`"

for sess2 in $LIST2

do
if [ $sess != $sess2 ]
then
#print out the sessions
echo "$sess $sess2"
fi
done

done

And here's part of the output; notice that L-US LIST is broken into two:
./probscript.sh
L-INVT L-MAKER
L-INVT L-US
L-INVT LIST
L-INVT L-PLAIN
L-INVS L-US
L-INVS LIST
L-INVS L-PLAIN
L-INVD L-MAKER

Can anybody tell me how to fix this in the variable (LIST2)? Thx for checking.

Tinkster · 05-07-2012, 03:11 PM

Can you please slap code tags around the script to make it readable?

That said: you're relying on ':' for a cut, but 4 lines of your input data don't have one.
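A related detail about cut, in case it helps: by default, cut prints a line that contains no delimiter unchanged instead of skipping it; the -s option suppresses such lines. A minimal sketch with made-up input strings (not taken from testfile.txt):

```shell
#!/bin/bash
# cut passes through lines that lack the delimiter, unless -s is given.
# The two input strings are hypothetical examples.
printf '%s\n' '4:L-MAKER;L-PLAIN' 'NO-COLON-HERE' | cut -d':' -f2
# prints:
#   L-MAKER;L-PLAIN
#   NO-COLON-HERE

printf '%s\n' '4:L-MAKER;L-PLAIN' 'NO-COLON-HERE' | cut -s -d':' -f2
# prints:
#   L-MAKER;L-PLAIN
```

In the sample file, the lines without a colon have an empty 11th field, so they only produce empty lines either way, which an unquoted for loop then ignores.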

amateurscripter · 05-07-2012, 03:38 PM

Thx. If there's no colon (:), it would just skip it and not return anything. As you can see, it works fine for all entries that don't have a space in the searched string. Do you know if a variable stores strings delimited by a space, or if the for loop sees "L-US LIST" and breaks it into two strings rather than one?

suicidaleggroll · 05-07-2012, 03:50 PM

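"for" will loop over spaces: an unquoted variable is split on every space, tab, and newline. Quoting the variable avoids the split altogether (but then the loop runs only once, over the whole string), so to split on newlines only, temporarily change your IFS variable to a newline.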
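A minimal sketch of the IFS approach, assuming bash. The function name split_on_newlines is made up for illustration, and LIST2 is hard-coded to the 11th-column values of the first sample line instead of being built with grep/awk:

```shell
#!/bin/bash
# Hard-coded stand-in for LIST2 (one value per line), from the first sample line.
LIST2=$'L-MAKER\nL-US LIST\nL-PLAIN\nL-INVT'

# Hypothetical helper: print each newline-separated item of $1 on its own line.
split_on_newlines() {
    local IFS=$'\n'     # inside this function, only newlines separate words
    local item
    set -f              # turn off globbing so an item like "*" is not expanded
    for item in $1; do  # deliberately unquoted: split on IFS (newlines only)
        printf '%s\n' "$item"
    done
    set +f              # turn globbing back on
}

split_on_newlines "$LIST2"
```

Declaring IFS with local keeps the change confined to the function, so the rest of the script still splits on the default whitespace.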

Tinkster · 05-07-2012, 04:04 PM

Code:

cat list.awk
BEGIN{
  FS="|"
}
sess && /:/ {
  a=gensub(/^[0-9]+:([^;]+).*/, "\\1","1",$11)
  b=gensub(/^[0-9]+:.*;([^;]+)$/, "\\1","1",$11)
  print b"\t"a
}

Code:

awk --re-interval -v sess=INDY -f list.awk testfile.txt
L-INVT	L-MAKER
L-INVS	L-US LIST
L-INVD	L-MAKER
L-INVD	L-MAKER
L-INVK	L-MAKER
L-INVK	L-MAKER

EDIT: Hmmm ... not quite like your desired result ... can you try
and describe what you're doing in English?

Cheers,
Tink

amateurscripter · 08-22-2012, 11:44 AM

OK, I'll provide a file and the script and maybe you can tell me what I'm doing wrong.

Let me explain what it does. It looks at the file and first extracts the first column (the session IDs), delimited by pipes (|). Then it goes back into the file and extracts column 2 (LIST). Then it uses the column 2 values to extract everything in column 24 (LIST2). Then it compares each LIST2 value (without the "L-") to the value in column 2. If they don't match, it echoes the session, the LIST value (column 2), and then the LIST2 value.

Now everything works well until there's a space inside a LIST2 value, i.e. line 4 (L-TEMP LIS) and line 8 (L-MY LIST). For some reason, each one is broken into two, as you can see from the output. I tried no quotes, double quotes, and single quotes but still can't get it to stop separating those two values. Any suggestions? It's driving me nuts; I can't figure out why it's treating it as two values.

FILE:
$more /tmp/listfile.txt
SESSION_A|BZ|||||Z|N|Z|PAMM||Z||||Z||Z|0|Z|0|Z|N|2:L-EUROPE;L-BZ|||A|
SESSION_A|BRN|||||Z|N|Z|JAME||Z||||Z||Z|0|Z|0|Z|N|2:L-GPT;L-BRN|||A
SESSION_A|CAA|||||Z|N|Z|CARR||Z||||Z||Z|0|Z|0|Z|N|2:L-MASTER;L-CAA|||A
SESSION_A|NTR|||||Z|N|Z|SRVT||Z||||Z||Z|0|Z|0|Z|N|3:L-MASTER;L-TEMP LIS;L-NTR|||A
SESSION_B|DIA|||||Z|N|Z|DAVV||Z||||Z||Z|0|Z|0|Z|N|2:L-MASTER;L-DIA|||A
SESSION_B|EEX|||||Z|N|Z|RBB||Z||||Z||Z|0|Z|0|Z|N|2:L-MASTER;L-EEX|||A
SESSION_B|FRI|||||Z|N|Z|ABC||Z||||Z||Z|0|Z|0|Z|N|3:L-MASTER;L-VANILLA;L-FRI|||A
SESSION_B|VAR|||||Z|N|Z|TIM||Z||||Z||Z|0|Z|0|Z|N|3:L-MASTER;L-MY LIST;L-VAR|||A
SESSION_B|DEER|||||Z|N|Z|YEE||Z||||Z||Z|0|Z|0|Z|N|2:L-MASTER;L-DEER|||A
$

The script:
$ cat spacetest.sh
#!/bin/bash

DIR="/tmp"

sesIDs="`cat $DIR/listfile.txt |grep SESSION| awk -F'|' '{print $1}'|grep -v "#"|sort -u`"

for sessvalues in $sesIDs
do
LIST="`grep $sessvalues $DIR/listfile.txt |awk -F'|' '{print $2}'|grep -v "#"|sort -u`"

for alllistvalues in $LIST
do
LIST2="`grep $sessvalues $DIR/listfile.txt|grep -w "$alllistvalues"|awk -F'|' '{print $24}'|cut -d':' -f2|tr ";" "\n"`"

for listValue in $LIST2
do
NO_L="`echo $listValue |cut -d'-' -f2`"
if [ "$alllistvalues" != "$NO_L" ]
then
echo "$sessvalues $alllistvalues $listValue"
fi

done
done
done

$

The OUTPUT:

$ ./spacetest.sh
SESSION_A BRN L-GPT
SESSION_A BZ L-EUROPE
SESSION_A CAA L-MASTER
SESSION_A NTR L-MASTER
SESSION_A NTR L-TEMP
SESSION_A NTR LIS
SESSION_B DEER L-MASTER
SESSION_B DIA L-MASTER
SESSION_B EEX L-MASTER
SESSION_B FRI L-MASTER
SESSION_B FRI L-VANILLA
SESSION_B VAR L-MASTER
SESSION_B VAR L-MY
SESSION_B VAR LIST
$

David the H. · 08-22-2012, 10:59 PM

Did you actually try to read the responses so far?

First, you were asked to please use [code][/code] tags around your code and data, to preserve formatting and to improve readability. Please do not use quote tags, bolding, colors, or other fancy formatting.

Second, you were advised about the Useless Use Of Cat, not to mention the useless use of grep. You are using an unnecessary chain of individual commands when a single awk script can do most of the work at once.

Now, I haven't looked deeply at all of your code yet, but I believe your big problem comes from the way you are storing the lists of values and looping through them:

Code:

LIST2="`grep $sessvalues $DIR/listfile.txt|grep -w "$alllistvalues"|awk -F'|' '{print $24}'|cut -d':' -f2|tr ";" "\n"`"

for listValue in $LIST2
do

The first line is storing the whole list in a single, scalar variable, which means that you have to have some way to break it up again in the loop. By not quoting the variable in the loop, you are relying on the shell's word-splitting to do the job. However, the shell splits on all whitespace by default (spaces, tabs, and newlines, the default value of IFS), not just on newlines, so "L-TEMP LIS" becomes two separate words.
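A small demonstration of that default splitting, assuming bash; count_words is a throwaway helper made up for this example, and LIST2 is hard-coded to the values from line 4 of the sample file:

```shell
#!/bin/bash
# LIST2 as the original script would build it for line 4 (one value per line).
LIST2=$'L-MASTER\nL-TEMP LIS\nL-NTR'

# Hypothetical helper: print how many arguments it received.
count_words() { echo "$#"; }

count_words $LIST2     # unquoted: split on spaces, tabs and newlines -> 4
count_words "$LIST2"   # quoted: no splitting at all -> 1
```

The unquoted expansion yields L-MASTER, L-TEMP, LIS and L-NTR, which is exactly the broken output shown above.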

That's why this page exists:
Don't Read Lines With For

Your best choice is to process the output in a while+read loop, instead of for. Another option would be to store the values in an array first (hint: you can use the new mapfile built-in for this), which can then be easily processed by for.
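A minimal sketch of both options, assuming bash 4 or later for mapfile. LIST2 is hard-coded here to the line 4 values; in the real script, the while loop would more likely read the grep/awk output directly, for example via done < <(...):

```shell
#!/bin/bash
# Stand-in for LIST2: the line 4 values, one per line.
LIST2=$'L-MASTER\nL-TEMP LIS\nL-NTR'

# 1) while + read: each iteration reads one whole line.
#    IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
while IFS= read -r listValue; do
    printf '[%s]\n' "$listValue"
done <<< "$LIST2"

# 2) mapfile: store each line as one array element, then loop over the array.
#    -t strips the trailing newline from each element.
mapfile -t values <<< "$LIST2"
for listValue in "${values[@]}"; do
    printf '[%s]\n' "$listValue"
done
```

Both loops print [L-MASTER], [L-TEMP LIS] and [L-NTR]; the quoted "${values[@]}" expands to one word per array element, so no further splitting happens.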

But in any case, it looks like you're going about it the hard way. This whole thing would be better served entirely in awk. It looks to me like you could just do something similar to this:

Code:

awk -F '[|:;]' '/[^#]/ { print $1 , $2 , $25 }' inputfile
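To see what that field separator does: with a bracket expression as the separator, each |, : and ; ends a field, so the old field 24 is spread over several fields and $25 is only the first item of the list. A quick check on line 4 of the sample data, assuming any POSIX awk:

```shell
#!/bin/bash
# Line 4 of the sample data.
line='SESSION_A|NTR|||||Z|N|Z|SRVT||Z||||Z||Z|0|Z|0|Z|N|3:L-MASTER;L-TEMP LIS;L-NTR|||A'

# Print fields 24 to 27, using |, : and ; as separators.
echo "$line" | awk -F '[|:;]' '{ for (i = 24; i <= 27; i++) print i ": " $i }'
# prints:
#   24: 3
#   25: L-MASTER
#   26: L-TEMP LIS
#   27: L-NTR
```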

If there are some variations involved that this can't handle, then give them to us and we'll see if we can work around them.

=====
Edit: I'm trying to read the description more carefully and work out an exact solution. It's rather confusing. Give me some time.

Edit2: Actually, it's still completely unclear to me what the desired output should be. Please post exactly how you want the output to look, and explain what each output field needs to be and what its relationship is to the rest of the input.
=====

And here are a few useful awk references:
http://www.grymoire.com/Unix/Awk.html
http://www.gnu.org/software/gawk/man...ode/index.html
http://www.pement.org/awk/awk1line.txt
http://www.catonmat.net/blog/awk-one...ined-part-one/

Finally, $(..) is highly recommended over `..`
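A short illustration of one reason for that recommendation: $(...) nests cleanly, with normal quoting inside, while backticks need escaping to nest (so people usually split the work into steps). The path is an arbitrary example value:

```shell
#!/bin/bash
path="/tmp/some dir/listfile.txt"

# $(...) nests directly; the inner quotes do not clash with the outer ones.
parent=$(basename "$(dirname "$path")")

# With backticks, the usual workaround is two separate steps.
inner=`dirname "$path"`
parent_bt=`basename "$inner"`

printf '%s\n' "$parent" "$parent_bt"   # both print: some dir
```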

David the H. · 08-23-2012, 02:10 AM

Ok, I think I've finally figured out what you are trying to do, and I have two solutions for you, in bash and awk. As I suspected, you were making it much too complicated. All that's required is a single loop.

First the bash solution:

Code:

#!/bin/bash

file="inputfile"

#loop over every line of the file, reading the fields into an array.
#(remember that bash arrays index from 0)
while IFS='|' read -a line; do

	#skip lines with # in them.
	[[ "${line[*]}" == *#* ]] && continue

	#split field 23 into an array of its own, removing the part before the colon first.
	IFS=';' read -a fields <<<"${line[23]#*:}"

	#loop through the "fields" array.
	for i in "${fields[@]}"; do

		#if the entry doesn't match field 1 (minus L-), then print the output line.
		if [[ ${line[1]} != ${i#L-} ]]; then

			echo "${line[0]} ${line[1]} $i"

		fi

	done

# input is from the file, output of the loop passes through sort
done <"$file" | sort

exit 0

And here's the awk script:

Code:

#!/usr/bin/awk -f

BEGIN{ FS="|" }					#set field separator

! /#/{						#ignores lines with # in them

sub( /^[^:]+:/ , "" , $24 )			#remove the "n:" part from field 24
n = split( $24 , a , ";" )			#split field 24 into an array; n = number of items

for ( i = 1 ; i <= n ; i++ )			#loop through field 24 in order
	{

	if ( "L-"$2 != a[i] )			#if the field doesn't match $2, print
		{ print $1 , $2 , a[i] }	
	
	}
}

The awk solution will likely be much faster, particularly on large files, but as it stands it does not sort the output. You'll have to run it through sort as you execute it.

Code:

/path/to/awkscript inputfile | sort

The script could be modified to do the sorting itself, but it would increase the complexity. It's easier this way overall.

Both of the above produce this output, given the input data in post #6.

Code:

SESSION_A BRN L-GPT
SESSION_A BZ L-EUROPE
SESSION_A CAA L-MASTER
SESSION_A NTR L-MASTER
SESSION_A NTR L-TEMP LIS
SESSION_B DEER L-MASTER
SESSION_B DIA L-MASTER
SESSION_B EEX L-MASTER
SESSION_B FRI L-MASTER
SESSION_B FRI L-VANILLA
SESSION_B VAR L-MASTER
SESSION_B VAR L-MY LIST

amateurscripter · 08-23-2012, 03:33 PM

Thanks David. Actually, I did figure out later on yesterday that it was because of the space delimiting, when I echoed LIST2. But thank you for explaining it very clearly. I checked online (for a while, actually) but couldn't find anything to read on your while loop syntax. I would love to understand what it means so that I can use it later on. Do you know where I can read up on the following:

[[ "${line[*]}" == *#* ]] && continue
#I guess the asterisks around the pound sign mean NOT. Where can I read up on that?

if [[ ${line[1]} != ${i#L-} ]];
#Here too, the pound is removing L-. I would like to understand how/why.

while IFS='|' read -a line;
#Now how would I know this means to read the contents of the file defined above? There's no mention of the file anywhere in the loop. I understand this is going to be an array based on the file's content.

Again, thx a lot for this.

David the H. · 08-23-2012, 05:53 PM

Those tests are using a combination of parameter substitution (the bash manual calls it parameter expansion) and globbing.

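The first one compares the entire array, joined into a single string, to the globbing pattern "*#*" ("*" means zero or more of any character). So it is true for any line that contains a "#" anywhere; the asterisks don't mean NOT, it's the "&& continue" that skips those lines. The second one removes the string "L-" from the front of the $i value (${var#pattern} strips the shortest leading match of the pattern) before comparing it to the value of array entry 1.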
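A few examples of that prefix-removal syntax, assuming bash (these particular forms also work in POSIX sh). The values are taken from line 4 of the sample data:

```shell
#!/bin/bash
i="L-TEMP LIS"
field="3:L-MASTER;L-TEMP LIS;L-NTR"

no_l=${i#L-}             # shortest leading match of "L-" removed  -> TEMP LIS
after_colon=${field#*:}  # shortest leading match of "*:" removed  -> L-MASTER;L-TEMP LIS;L-NTR
last_item=${field##*;}   # longest leading match of "*;" removed   -> L-NTR

printf '%s\n' "$no_l" "$after_colon" "$last_item"
```

The ${line[23]#*:} in the script above uses the same "#" form to drop the "n:" count in front of the list.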

Note that this kind of pattern matching inside a test requires the improved [[..]] test, which is only available in advanced shells like bash, ksh, and zsh; plain POSIX sh doesn't have it.
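A minimal sketch of the same "*#*" test written two ways; the function names has_hash and has_hash_posix are made up for illustration. In [[ ]], an unquoted right-hand side of == is a pattern, and in plain sh the usual substitute is a case statement:

```shell
#!/bin/bash
# Hypothetical helper: succeed if $1 contains a '#' anywhere (bash/ksh/zsh).
# Quoting the right-hand side ("*#*") would make it a literal string instead.
has_hash() {
    [[ $1 == *#* ]]
}

# Hypothetical portable version for POSIX sh: case also matches glob patterns.
has_hash_posix() {
    case $1 in
        *#*) return 0 ;;
        *)   return 1 ;;
    esac
}

has_hash "#SESSION_A|BZ" && echo "skip this line"
has_hash_posix "SESSION_A|BZ" || echo "keep this line"
```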

As for the while loop, see here:
http://mywiki.wooledge.org/BashFAQ/001

The file input is at the end of the loop, after the done keyword: that redirection applies to the whole loop, so each read pulls the next line from the file. You'll also find the loop output being piped into sort there.
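A minimal sketch of that layout, assuming bash; print_fields is a made-up helper, and the records are shortened, hypothetical versions of the sample lines written to a temporary file:

```shell
#!/bin/bash
# Hypothetical helper: read a |-delimited file and print the first three fields of each line, sorted.
print_fields() {
    while IFS='|' read -r -a line; do     # -a: split each line on | into the array "line"
        printf '%s %s %s\n' "${line[0]}" "${line[1]}" "${line[2]}"
    done < "$1" | sort                    # input: the file named by $1; output: piped to sort
}

# Shortened, made-up records in the same style as the sample file.
tmpfile=$(mktemp)
printf '%s\n' 'SESSION_B|VAR|L-MY LIST' 'SESSION_A|NTR|L-TEMP LIS' > "$tmpfile"
print_fields "$tmpfile"
# prints:
#   SESSION_A NTR L-TEMP LIS
#   SESSION_B VAR L-MY LIST
rm -f "$tmpfile"
```

The IFS='|' prefix applies only to that read command, so the rest of the loop body still uses the default IFS.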

Here are a few useful bash scripting references for you. I particularly recommend reading through the first ones:

http://mywiki.wooledge.org/BashGuide
http://mywiki.wooledge.org/BashFAQ
http://mywiki.wooledge.org/BashPitfalls
http://wiki.bash-hackers.org/scripting/newbie_traps
http://www.linuxcommand.org/index.php
http://tldp.org/LDP/Bash-Beginners-G...tml/index.html
http://www.tldp.org/LDP/abs/html/index.html
http://www.gnu.org/software/bash/manual/bashref.html
http://wiki.bash-hackers.org/start
http://ss64.com/bash/

(And how many times do we have to mention [code][/code] tags?)

amateurscripter · 08-24-2012, 03:07 PM

Thx again David, I'll read all of those links, appreciate it.