[SOLVED] Bash newbie

CodyK479 · 03-26-2011, 10:49 AM

Using awk I pull the first field of a random line from my datafile.

myvar1=$(awk -F"\t" 'NR=='"$randline"' {printf "%s\n", $1}' myfile)
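Here is a slightly more robust version of the same command. It is a minimal sketch. It uses `$(...)` instead of backticks and hands the line number to awk with `-v` rather than splicing a shell variable into the awk program. The file name `myfile` comes from the post. The demo file contents and the `$RANDOM`-based choice of `randline` are assumptions for illustration, since the post doesn't say how `randline` is chosen.

```shell
#!/usr/bin/env bash
# Hypothetical demo file: two tab-separated lines with Unix (LF) endings.
myfile=$(mktemp)
printf 'alpha\t1\nbeta\t2\n' > "$myfile"

# Assumption: pick a random line number between 1 and the line count.
lines=$(wc -l < "$myfile")
randline=$(( RANDOM % lines + 1 ))

# -v copies the shell value into the awk variable n; no quoting gymnastics.
myvar1=$(awk -F'\t' -v n="$randline" 'NR == n { print $1 }' "$myfile")
echo "line $randline: [$myvar1]"

rm -f "$myfile"
```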

This works fine. The problem is that there will be empty lines at the end of the file. Rather than just using awk
to filter out blank lines, I would like to understand what is going on first.

So I set $randline to the number of a line I know is blank and test $myvar1 for an empty string:

test -z "$myvar1" && echo "true" || echo "false"

But this returns "false", so the string is not zero-length. Why? It's a tab-separated file. Is awk storing the tab with the $1 field or something?

This is where I get a headache. I tried echoing my variable to see what it looks like.

echo "$myvar1"
outputs: nothing
echo "My variable is [$myvar1]"
outputs: ]y variable is [

Why is the closing bracket at the beginning? What character could be stored in $myvar1 that would do such a thing, and how did it get there?
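This symptom can be reproduced as a minimal sketch. It assumes that the "blank" line really holds a lone carriage return (CR, `\r`), which is what a Windows CRLF line leaves behind once awk has consumed the LF. Piping through `cat -v` makes the hidden character visible as `^M`.

```shell
#!/usr/bin/env bash
# Assumption: awk returned a single CR for the "blank" line.
myvar1=$'\r'

test -z "$myvar1" && echo "true" || echo "false"   # false: length is 1
echo "length: ${#myvar1}"

# On a terminal the CR returns the cursor to column 0, so "]" overwrites
# the "M" of "My". cat -v shows the CR explicitly instead.
echo "My variable is [$myvar1]" | cat -v   # My variable is [^M]
```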

kurumi · 03-26-2011, 10:54 AM

To filter out empty lines, use NF:

Code:

awk -F"\t" 'NR=='$randline' && NF {printf "%s\n", $1}' myfile
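One caveat applies when the file turns out to have CRLF endings. The `NF` test alone still matches a "blank" line, because the leftover `\r` counts as one field. This minimal sketch uses a hypothetical demo file. It strips a trailing CR first with `sub()`. Modifying `$0` makes awk re-split the record, so `NF` drops to 0 on those lines.

```shell
#!/usr/bin/env bash
# Hypothetical CRLF demo file: one data line, then a "blank" line ("\r\n").
myfile=$(mktemp)
printf 'alpha\t1\r\n\r\n' > "$myfile"

# NF alone: line 2 is "\r", which is one field, so it is still printed.
awk -F'\t' 'NR == 2 && NF { printf "%s\n", $1 }' "$myfile" | cat -v   # ^M

# Strip the trailing CR on every line; awk then recomputes NF (0 for line 2).
result=$(awk -F'\t' -v n=2 '{ sub(/\r$/, "") } NR == n && NF { print $1 }' "$myfile")
echo "result: [$result]"   # result: []

rm -f "$myfile"
```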

CodyK479 · 03-26-2011, 11:41 AM

Thank you. But I would still like to understand why my strings are behaving this way.
Why wouldn't my variable be a zero-length string if it was pulled from a blank line?

Besides, I'm not sure that will work, since $randline might be a blank line. Wouldn't I be filtering out the line I told awk to print?

If it helps, and if the following command is correct, then awk does not see any blank lines.
It outputs 134, when there are really only 132 non-blank lines. The other two lines certainly look blank to me.

awk 'NF!=0 {print}' myfile | wc -l

CodyK479 · 03-26-2011, 11:52 AM

Actually, the following outputs 132, so my two blank lines are each considered to have 1 field. Is that normal for a blank line? If so, I don't like it. And it still doesn't explain the behavior when I try to echo it.

awk 'NF!=1 {print}' myfile | wc -l
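The field counts can be checked directly with this short sketch. A truly empty line has `NF == 0`. A line that contains only a CR has `NF == 1`, because the CR is a one-character field. The last command counts genuinely blank lines after stripping the CR. Its input is hypothetical demo data.

```shell
#!/usr/bin/env bash
printf '\n'   | awk -F'\t' '{ print NF }'   # 0: empty record, no fields
printf '\r\n' | awk -F'\t' '{ print NF }'   # 1: the CR is a field

# Hypothetical CRLF data: one data line and two "blank" lines.
printf 'a\tb\r\n\r\n\r\n' | awk -F'\t' '{ sub(/\r$/, "") } NF == 0' | wc -l
```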

grail · 03-26-2011, 12:16 PM

My stab in the dark here, without the original file, is that it was written and saved on a Windows machine, and that the square bracket
showing up at the front is a direct consequence of that. Try running dos2unix or similar to make sure the file is clean, then try your scripts again.
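If dos2unix isn't installed, `tr` can do a similar cleanup. This is a minimal sketch on a hypothetical CRLF file. It writes to a new file and leaves the original untouched. Note that `tr -d '\r'` deletes every CR in the file, not only those at line ends. That is usually fine for plain text.

```shell
#!/usr/bin/env bash
# Hypothetical CRLF input standing in for the Windows-made myfile.
myfile=$(mktemp)
printf 'alpha\t1\r\n\r\n' > "$myfile"

# Delete all CR characters and write a Unix-style copy.
tr -d '\r' < "$myfile" > "$myfile.unix"

# cat -v would show any remaining CR as ^M; none should be left.
converted=$(cat -v "$myfile.unix")
printf '%s\n' "$converted"

rm -f "$myfile" "$myfile.unix"
```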

CodyK479 · 03-26-2011, 12:58 PM

Quote:

Originally Posted by grail

My stab in the dark here, without the original file, is that it is written and saved on a Windows machine

That's it, thanks. It was created in Windows; I was afraid of that. It is supposed to be a standard ASCII text file.

Are there no bash native commands that could work?

David the H. · 03-27-2011, 06:55 AM

It probably is a standard ASCII file. It's just that DOS/Windows and Unix use different line-ending characters: Unix text files use only LineFeed (LF, \n), while DOS files use CarriageReturn+LineFeed (CR LF, \r\n). So you have to remove that extra CR before most *nix tools will work reliably.
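On the "native bash" question, here is a minimal sketch. It assumes the extra CR is the only problem. Bash parameter expansion can remove one trailing CR from a variable (`${var%$'\r'}`, where `$'\r'` is bash's ANSI-C quoting). The same trick works inside a `while read` loop. The sample data is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical CRLF data standing in for myfile.
data=$'alpha\t1\r\nbeta\t2\r\n\r\n'

myvar1=$'\r'                # what awk returned for a "blank" CRLF line
myvar1=${myvar1%$'\r'}      # remove one trailing CR, if present
test -z "$myvar1" && echo "true" || echo "false"   # true

# Pure-bash line loop that drops the CR before testing each line.
count=0
while IFS= read -r line; do
  line=${line%$'\r'}
  [[ -n $line ]] && count=$((count + 1))
done <<< "$data"
echo "non-blank lines: $count"   # non-blank lines: 2
```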

There are also other line-ending conventions, such as the one used by older Apple (classic Mac OS) systems, which is CR only.
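Files with CR-only line endings need a translation rather than a deletion. Each CR becomes an LF. This short sketch uses hypothetical sample text.

```shell
#!/usr/bin/env bash
# Hypothetical text with old Mac (CR-only) line endings.
mac_text=$'alpha\rbeta\r'

# Translate every CR into LF to get Unix line endings.
unix_text=$(printf '%s' "$mac_text" | tr '\r' '\n')
printf '%s\n' "$unix_text"
```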

http://en.wikipedia.org/wiki/Newline

carltm · 03-27-2011, 07:18 AM

Quote:

Originally Posted by CodyK479

Are there no bash native commands that could work?

Yes. The problem is that you're not seeing the non-printing characters.
Run "cat -vt myfile" and you'll see ctrl-M characters (shown as ^M) on
the blank lines.
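Here is what `cat -vt` shows on a small hypothetical CRLF sample. `-v` displays the CR as `^M`, and `-t` displays the tab as `^I`. The second command counts lines ending in CR with `grep`, using bash's `$'\r'` quoting to put a literal CR into the pattern.

```shell
#!/usr/bin/env bash
# Hypothetical CRLF sample: a tab-separated data line, then a "blank" line.
sample=$'alpha\t1\r\n\r\n'

printf '%s' "$sample" | cat -vt
# alpha^I1^M
# ^M

printf '%s' "$sample" | grep -c $'\r$'   # 2 lines end with a CR
```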

As mentioned, this is normal for a DOS file, and running dos2unix on the
file will fix it. If you don't want to touch the original file, you
can pipe it through "sed 's/^M$//'" and into your command. The ^M is
created by typing ctrl-V and then ctrl-M; it is not a caret and a capital M.
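If typing ctrl-V ctrl-M is awkward, bash's `$'...'` quoting can put a real CR into the sed script instead. GNU sed also accepts `sed 's/\r$//'` directly. This minimal sketch combines that filter with the original awk lookup. The demo file and `randline=2` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical CRLF file: line 2 is a "blank" line holding only "\r".
myfile=$(mktemp)
printf 'alpha\t1\r\n\r\n' > "$myfile"
randline=2

# Strip trailing CRs on the fly; the original file is left untouched.
myvar1=$(sed $'s/\r$//' "$myfile" | awk -F'\t' -v n="$randline" 'NR == n { print $1 }')
test -z "$myvar1" && echo "true" || echo "false"   # true

rm -f "$myfile"
```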