[SOLVED] grep won't return a number at end of line

jgrizich · 10-13-2012, 07:06 PM

I'm sure this is probably trivial, but it's driving me crazy. It is an exercise for a Linux installation/administration class using Fedora 14. The object is to find lines of text ending in 700 from a text file named datebook. All the lines contain text and numbers ending in a 5 digit number. For this exercise the entry "grep '700$' datebook" returns nothing. I've tried this on a number of text files with lines ending in a character and it works fine. I've also captured 15 lines of lottery numbers and grep will not return any output for an end of line search either. Can anyone give me a clue?

suicidaleggroll · 10-13-2012, 07:09 PM

Your command works fine for me:

Code:

> cat datebook
dsaf
dsag
fdsagfdsagtret
sdff700
sdflsd4
dsagfdf89efsdf
sfdsf0s0sa
sdfds801
sdfds700
sdfldsihflds811

> grep '700$' datebook
sdff700
sdfds700

Which leads me to believe it might be improper line terminators causing the problem. What is the output of

Code:

file datebook

?

jgrizich · 10-13-2012, 09:42 PM

Thanks for the quick response. file datebook returns:

datebook: ASCII text, with CRLF line terminators

The test file I downloaded, which is numbers only, returns:

test1: ASCII English text

grep doesn't return the end-of-line numbers from that file either. I would expect grep to have no problem with the datebook file. I'm still confused. Here are a few lines copied from datebook:

Fred Fardbarkle:674-843-1385:20 Parak Lane, DeLuth, MN 23850:4/12/23:780900
Lori Gortz:327-832-5728:3465 Mirlo Street, Peabody, MA 34756:10/2/65:35200
Paco Gutierrez:835-365-1284:454 Easy Street, Decatur, IL 75732:2/28/53:123500
Ephram Hardy:293-259-5395:235 CarltonLane, Joliet, IL 73858:8/12/20:56700
James Ikeda:834-938-8376:23445 Aster Ave., Allentown, NJ 83745:12/1/38:45000
Barbara Kertz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/1/46:268500
Lesley Kirstin:408-456-1234:4 Harvard Square, Boston, MA 02133:4/22/62:52600

suicidaleggroll · 10-13-2012, 09:47 PM

Quote:

Originally Posted by jgrizich

file datebook returns: datebook: ASCII text, with CRLF line terminators

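It has DOS-style line terminators, so every line ends in a carriage return (CR) just before the newline. grep treats that CR as an ordinary character, so '700$' never matches: each of those lines actually ends in 700 followed by CR.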

Run "dos2unix datebook", and then retry your grep.
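If dos2unix isn't installed, deleting the carriage returns with tr should have roughly the same effect. The following is a minimal sketch, not a replacement for the advice above: the two-line sample file and the names sample_crlf and sample_lf are made up for illustration, standing in for datebook.

```shell
# Build a small CRLF sample in a scratch directory (hypothetical stand-in for datebook).
tmpdir=$(mktemp -d)
printf 'Ephram Hardy:56700\r\nJames Ikeda:45000\r\n' > "$tmpdir/sample_crlf"

# The CR sits between "700" and the newline, so '700$' finds nothing.
# grep -c exits non-zero on zero matches, hence the "|| true".
before=$(grep -c '700$' "$tmpdir/sample_crlf" || true)

# Delete every carriage return and write a Unix-style copy.
tr -d '\r' < "$tmpdir/sample_crlf" > "$tmpdir/sample_lf"
after=$(grep -c '700$' "$tmpdir/sample_lf" || true)

echo "matches before: $before, after: $after"
rm -rf "$tmpdir"
```

Note that tr removes every CR in the file, not only those at line ends, which is normally what you want for a plain text file like this one.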

jgrizich · 10-13-2012, 10:06 PM

Success!! So the instructor probably built the file in a Windows environment and copied it into the filesystem. At my level the distinction flew right over my head. My reaction would be to think that plain text isn't platform dependent. Obviously that's not correct. Thank you for the lesson.

grail · 10-14-2012, 04:34 AM

Please mark the thread as SOLVED once you have a working solution.

David the H. · 10-14-2012, 09:34 AM

Quote:

Originally Posted by jgrizich

My reaction would be to think that plain text isn't platform dependent. Obviously that's not correct. Thank you for the lesson.

A "text" file is really just a series of bytes that the system is told to interpret as text characters according to some character encoding scheme, such as ASCII, UTF-8 (the most common Unicode encoding and the Linux standard; ASCII text is also valid UTF-8), or CP-1252 (the standard pre-Unicode Windows encoding for English).

If you try to read a file created in one encoding with a program set for a different encoding, you tend to get "mojibake", garbled characters. The "interpretation" ends up wrong.
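As a small illustration of that garbling, this sketch takes the two UTF-8 bytes for "é" and deliberately decodes them as CP-1252. It assumes iconv is installed and recognises the name CP1252; the sample character is chosen here for illustration and has nothing to do with the datebook file.

```shell
# The UTF-8 encoding of "é" is the two bytes 0xC3 0xA9 (octal 303 251).
# Decoding those bytes as CP-1252 treats each byte as its own character.
garbled=$(printf '\303\251' | iconv -f CP1252 -t UTF-8)
echo "$garbled"
```

On a UTF-8 terminal this prints the two characters Ã© instead of é, which is the classic look of mojibake.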

But the problem here is different. Even when the correct encoding is used, DOS/Windows and Unix have traditionally inserted different characters to indicate the end of a "line" of text, in human terms. Unix uses the ASCII LINE FEED character (LF, octal 012, usually written as the escape sequence '\n'). DOS uses the combination CARRIAGE RETURN+LINE FEED (CRLF, octal 015+012, '\r\n').

Therefore, when a Unix program reads a DOS file, it ends up with an extra invisible CR at the end of each line, and when a DOS program reads a Unix file, it sees it as having no line endings at all, but with many non-printing LF characters interspersed in the file.
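To make that invisible CR visible, something like the sketch below can help. It dumps one DOS-style line with od -c, then runs a grep whose pattern tolerates an optional CR before the end of the line. The sample strings are invented for illustration, and the tolerant pattern is just one possible workaround, not what the exercise asks for.

```shell
# Dump one DOS-style line byte by byte; od -c shows CR as \r and LF as \n.
dump=$(printf '700\r\n' | od -c)
echo "$dump"

# Hold a literal carriage return in a variable (portable across sh and bash).
cr=$(printf '\r')

# Count lines ending in 700, with or without a CR before the newline.
hits=$(printf 'a700\r\nb700\nc701\r\n' | grep -cE "700${cr}?\$" || true)
echo "lines ending in 700: $hits"
```

Here the first two sample lines count as matches and the third does not, regardless of which line-ending style each one uses.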

Incidentally, pre-OS X Apple systems used CR only for their line endings, meaning that both DOS and Unix would see such a file as a single continuous line with invisible CR characters sprinkled throughout it. But with OS X they switched to the Unix-style line format.

So, bottom line: first make sure you're using the correct encoding, and if the file could have come from a different system, check that it has the correct line endings.

Habitual · 10-15-2012, 12:20 PM

Quote:

Originally Posted by jgrizich

...the instructor probably built the file in a windows environment ....

There may be "extra points" for pointing that out to the instructor. (quietly and non-publicly).