LinuxQuestions.org
Old 07-04-2017, 09:46 AM   #1
andrew.comly
Member
 
Registered: Dec 2012
Distribution: Trisquel-Mini 7.0, Lubuntu 14.04, Debian lxde 8.0
Posts: 311
Blog Entries: 2

Rep: Reputation: 16
Question Advanced Bash Scripting, Ch 16 ex 23


TITLE
ABS: How do I create a Unix-formatted text file? CR-LF is missing from Unix-formatted text files. How can I view the bytes and check that every line ends in LF for Unix, or CR-LF for DOS?

INTRO
I am reading the Advanced Bash-Scripting Guide (ABS) by Mendel Cooper. Chapter 16 of this eBook, "External Filters, Programs and Commands", contains Example 16-23, "du: DOS to UNIX text file conversion".

In this example the text gives a briefing on how lines end differently in DOS-format and UNIX-format text files:
Quote:
CR='\015' # Carriage return. CR-LF
# 015 is octal ASCII code for CR. CR-LF
# Lines in a DOS text file end in CR-LF. CR-LF
# Lines in a UNIX text file end in LF only. CR-LF
I find this quite strange, since in this ASCII table
Code:
$ wget -c --no-check-certificate https://www.co.tt/files/asciitable.pdf
$ clamscan asciitable.pdf 
asciitable.pdf: OK
----------- SCAN SUMMARY -----------
Known viruses: 6299214
Engine version: 0.99.2
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.24 MB
Data read: 0.17 MB (ratio 1.44:1)
Time: 87.444 sec (1 m 27 s)
it gives the following Numeric Values:
  1. Line feed
    Decimal: 10
    Hex: 0A
  2. Carriage return
    Decimal: 13
    Hex: 0D

Wikipedia agreed as well, which made it seem that Advanced Bash Scripting erred by stating that "CR='\015'".

However, I then found a table on bluesock.org and read the part under the heading "common ascii codes to know". There I realized that ABS gives the value of CR in octal, whereas everywhere else I had looked gave the value in decimal. So even though decimal and hex are the notations usually used, octal is sometimes used as well.
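The three notations are only different spellings of the same byte value. A minimal sketch, assuming a POSIX shell with a standard printf; the helper name show_bases is made up for this illustration and does not come from ABS:

```shell
#!/bin/bash
# show_bases: hypothetical helper that prints one character code
# in decimal, hexadecimal and octal.
show_bases() {
    printf 'dec=%d hex=%02X oct=%03o\n' "$1" "$1" "$1"
}

show_bases 13   # carriage return (CR): dec=13 hex=0D oct=015
show_bases 10   # line feed (LF):       dec=10 hex=0A oct=012
```

So the '\015' in ABS and the 13 / 0D in the ASCII table name the same character.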

Revision Suggestion01
I really think ABS could've first introduced the concept of dec/hex/oct to us.

Next I attempted to solve example 23. First, since I am on a Linux platform, I imagine all I can create is a Unix-format text file.
Attempt01
  1. touch textfile.txt
  2. I copied some text from a news article into that newly created file, as below:

Quote:
Gore Is Bigger Than Ever!
By Ben Smith
A crowd of nearly 500 in the Library Theater in Park City, Utah, stayed on through a standing ovation and into the question-and-answer session as Al Gore—one Kentucky reporter actually addressed him as “Mr. President,” to laughs and cheers—restated his warnings about the “planetary emergency,” global warming.
Mr. Gore is the star of a documentary entered in the Sundance Film Festival, An Inconvenient Truth, and all the questions were for him. The director, Davis Guggenheim, stood quietly to his left; a bit farther away was the woman who made the film happen, Hollywood Democratic power player Laurie David, the wife of comedian Larry David.
I then tried to convert it with example 23 from a Unix-format text file to a DOS-format text file:
Code:
tr -d "CR-LF" < $1 > $NEWFILENAME
The result was the exact same text as above, and no lines end with "CR-LF". Was this the right way of doing it?

I then opened this file in vim and ran :%!xxd to view the hex, and I saw the following data:
Code:
0000000: 5361 6d70 6c65 2d20 5365 6c66 2d6d 6164  Sample- Self-mad
0000010: 6520 444f 532d 666f 726d 6174 2074 6578  e DOS-format tex
0000020: 7420 6669 6c65 2e09 0909 0909 0909 0909  t file..........
0000030: 0909 0909 0943 522d 4c46 0a09 0909 0909  .....CR-LF......
0000040: 0909 0909 0909 0909 0909 0909 0909 0909  ................
0000050: 0909 0909 0909 0909 0909 0909 4352 2d4c  ............CR-L
0000060: 460a 4352 3d27 5c30 3135 2709 0923 2043  F.CR='\015'..# C
0000070: 6172 7269 6167 6520 7265 7475 726e 2e09  arriage return..
0000080: 0909 0909 0909 0909 0909 0909 0909 0909  ................
0000090: 0943 522d 4c46 0a09 0909 0909 0923 2030  .CR-LF.......# 0
00000a0: 3135 2069 7320 6f63 7461 6c20 4153 4349  15 is octal ASCI
00000b0: 4920 636f 6465 2066 6f72 2043 522e 0909  I code for CR...
00000c0: 0909 0909 0909 0909 0943 522d 4c46 0a09  .........CR-LF..
00000d0: 0909 0909 0923 204c 696e 6573 2069 6e20  .....# Lines in
00000e0: 6120 444f 5320 7465 7874 2066 696c 6520  a DOS text file
00000f0: 656e 6420 696e 2043 522d 4c46 2e09 0909  end in CR-LF....
0000100: 0909 0909 4352 2d4c 460a 0909 0909 0909  ....CR-LF.......
0000110: 2320 4c69 6e65 7320 696e 2061 2055 4e49  # Lines in a UNI
0000120: 5820 7465 7874 2066 696c 6520 656e 6420  X text file end
0000130: 696e 204c 4620 6f6e 6c79 2e09 0909 0909  in LF only......
0000140: 0943 522d 4c46 0a09 0909 0909 0909 0909  .CR-LF..........
0000150: 0909 0909 0909 0909 0909 0909 0909 0909  ................
0000160: 0909 0909 0909 0909 4352 2d4c 460a 4920  ........CR-LF.I
0000170: 646f 6e27 7420 6265 6c65 6976 6520 696e  don't beleive in
0000180: 2053 616e 7461 2043 6c61 7573 2e09 0909   Santa Claus....
0000190: 0909 0909 0909 0909 0909 0909 0909 0943  ...............C
00001a0: 522d 4c46 0a44 6f20 444f 532d 666f 726d  R-LF.Do DOS-form
00001b0: 6174 2074 6578 7420 6669 6c65 7320 6578  at text files ex
00001c0: 6973 743f 2020 0909 0909 0909 0909 0909  ist?  ..........
00001d0: 0909 0909 0909 0943 522d 4c46 0a43 616e  .......CR-LF.Can
00001e0: 2074 6865 2068 756d 616e 2065 7965 2073   the human eye s
00001f0: 6565 2074 6865 2064 6966 6665 7265 6e63  ee the differenc
0000200: 6520 6265 7477 6565 6e20 6120 444f 532d  e between a DOS-
0000210: 666f 726d 6174 2074 6578 7420 6669 6c65  format text file
0000220: 2061 6e64 2061 2055 4e49 582d 666f 726d   and a UNIX-form
0000230: 6174 2074 6578 7420 6669 6c65 3f09 0909  at text file?...
0000240: 4352 2d4c 460a 0a47 6f72 6520 4973 2042  CR-LF..Gore Is B
0000250: 6967 6765 7220 5468 616e 2045 7665 7221  igger Than Ever!
0000260: 0a42 7920 4265 6e20 536d 6974 680a 4120  .By Ben Smith.A
0000270: 6372 6f77 6420 6f66 206e 6561 726c 7920  crowd of nearly
0000280: 3530 3020 696e 2074 6865 204c 6962 7261  500 in the Libra
0000290: 7279 2054 6865 6174 6572 2069 6e20 5061  ry Theater in Pa
00002a0: 726b 2043 6974 792c 2055 7461 682c 2073  rk City, Utah, s
00002b0: 7461 7965 6420 6f6e 2074 6872 6f75 6768  tayed on through
00002c0: 2061 2073 7461 6e64 696e 6720 6f76 6174   a standing ovat
00002d0: 696f 6e20 616e 6420 696e 746f 2074 6865  ion and into the
00002e0: 2071 7565 7374 696f 6e2d 616e 642d 616e   question-and-an
00002f0: 7377 6572 2073 6573 7369 6f6e 2061 7320  swer session as
0000300: 416c 2047 6f72 65e2 8094 6f6e 6520 4b65  Al Gore...one Ke
0000310: 6e74 7563 6b79 2072 6570 6f72 7465 7220  ntucky reporter
0000320: 6163 7475 616c 6c79 2061 6464 7265 7373  actually address
0000330: 6564 2068 696d 2061 7320 e280 9c4d 722e  ed him as ...Mr.
0000340: 2050 7265 7369 6465 6e74 2ce2 809d 2074   President,... t
0000350: 6f20 6c61 7567 6873 2061 6e64 2063 6865  o laughs and che
0000360: 6572 73e2 8094 7265 7374 6174 6564 2068  ers...restated h
0000370: 6973 2077 6172 6e69 6e67 7320 6162 6f75  is warnings abou
0000380: 7420 7468 6520 e280 9c70 6c61 6e65 7461  t the ...planeta
0000390: 7279 2065 6d65 7267 656e 6379 2ce2 809d  ry emergency,...
00003a0: 2067 6c6f 6261 6c20 7761 726d 696e 672e   global warming.
00003b0: 0a4d 722e 2047 6f72 6520 6973 2074 6865  .Mr. Gore is the
00003c0: 2073 7461 7220 6f66 2061 2064 6f63 756d   star of a docum
00003d0: 656e 7461 7279 2065 6e74 6572 6564 2069  entary entered i
00003e0: 6e20 7468 6520 5375 6e64 616e 6365 2046  n the Sundance F
00003f0: 696c 6d20 4665 7374 6976 616c 2c20 416e  ilm Festival, An
0000400: 2049 6e63 6f6e 7665 6e69 656e 7420 5472   Inconvenient Tr
0000410: 7574 682c 2061 6e64 2061 6c6c 2074 6865  uth, and all the
0000420: 2071 7565 7374 696f 6e73 2077 6572 6520   questions were
0000430: 666f 7220 6869 6d2e 2054 6865 2064 6972  for him. The dir
0000440: 6563 746f 722c 2044 6176 6973 2047 7567  ector, Davis Gug
0000450: 6765 6e68 6569 6d2c 2073 746f 6f64 2071  genheim, stood q
0000460: 7569 6574 6c79 2074 6f20 6869 7320 6c65  uietly to his le
0000470: 6674 3b20 6120 6269 7420 6661 7274 6865  ft; a bit farthe
0000480: 7220 6177 6179 2077 6173 2074 6865 2077  r away was the w
0000490: 6f6d 616e 2077 686f 206d 6164 6520 7468  oman who made th
00004a0: 6520 6669 6c6d 2068 6170 7065 6e2c 2048  e film happen, H
00004b0: 6f6c 6c79 776f 6f64 2044 656d 6f63 7261  ollywood Democra
00004c0: 7469 6320 706f 7765 7220 706c 6179 6572  tic power player
00004d0: 204c 6175 7269 6520 4461 7669 642c 2074   Laurie David, t
00004e0: 6865 2077 6966 6520 6f66 2063 6f6d 6564  he wife of comed
00004f0: 6961 6e20 4c61 7272 7920 4461 7669 642e  ian Larry David.
0000500: 0a
So where are the 0012 and 0015 we were promised at the end of each line? The only pattern I see anywhere near the end of each line is "0943 522d". What does this mean?

TEXT FILE TYPE
I even checked what kind of text file the textfile was:
Code:
$ file DOS-format_sampleTextFile.txt
DOS-format_sampleTextFile.txt: UTF-8 Unicode text, with very long lines
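For a file that really has DOS line endings, file typically adds a note such as "with CRLF line terminators"; its absence above suggests the file has LF endings only. An independent check is to count the lines whose last character is a carriage return. A minimal sketch, assuming a POSIX shell and grep; the helper name count_crlf_lines and the two sample files are invented for this illustration:

```shell
#!/bin/bash
# count_crlf_lines: hypothetical helper; prints how many lines of a file
# end in CR (i.e. have a CR immediately before the LF).
count_crlf_lines() {
    cr=$(printf '\r')            # a literal carriage return
    grep -c "${cr}\$" "$1"       # count lines matching "CR at end of line"
}

workdir=$(mktemp -d)
printf 'first\r\nsecond\r\n' > "$workdir/dos.txt"    # DOS style: CR-LF
printf 'first\nsecond\n'     > "$workdir/unix.txt"   # Unix style: LF only

echo "dos.txt:  $(count_crlf_lines "$workdir/dos.txt") line(s) end in CR-LF"
echo "unix.txt: $(count_crlf_lines "$workdir/unix.txt") line(s) end in CR-LF"
```

A count of 0 means no line ends in CR-LF, which is what one would expect for a file typed or pasted into an editor on Linux with default settings.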
Attempt02:
Code:
od -tx1 DOSUNIX_textfile.txt >DOSUNIX_textfile_oct.txt
DOSUNIX_textfile_oct.txt
Code:
000000   S   a   m   p   l   e   -  sp   S   e   l   f   -   m   a   d
        53  61  6d  70  6c  65  2d  20  53  65  6c  66  2d  6d  61  64
        53  61  6d  70  6c  65  2d  20  53  65  6c  66  2d  6d  61  64
000010   e  sp   D   O   S   -   f   o   r   m   a   t  sp   t   e   x
        65  20  44  4f  53  2d  66  6f  72  6d  61  74  20  74  65  78
        65  20  44  4f  53  2d  66  6f  72  6d  61  74  20  74  65  78
000020   t  sp   f   i   l   e   .  ht  ht  ht  ht  ht  ht  ht  ht  ht
        74  20  66  69  6c  65  2e  09  09  09  09  09  09  09  09  09
        74  20  66  69  6c  65  2e  09  09  09  09  09  09  09  09  09
000030  ht  ht  ht  ht  ht   C   R   -   L   F  nl  ht  ht  ht  ht  ht
        09  09  09  09  09  43  52  2d  4c  46  0a  09  09  09  09  09
        09  09  09  09  09  43  52  2d  4c  46  0a  09  09  09  09  09
000040  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht
        09  09  09  09  09  09  09  09  09  09  09  09  09  09  09  09
        09  09  09  09  09  09  09  09  09  09  09  09  09  09  09  09
000050  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht   C   R   -   L
        09  09  09  09  09  09  09  09  09  09  09  09  43  52  2d  4c
        09  09  09  09  09  09  09  09  09  09  09  09  43  52  2d  4c
000060   F  nl   C   R   =   '   \   0   1   5   '  ht  ht   #  sp   C
        46  0a  43  52  3d  27  5c  30  31  35  27  09  09  23  20  43
        46  0a  43  52  3d  27  5c  30  31  35  27  09  09  23  20  43
000070   a   r   r   i   a   g   e  sp   r   e   t   u   r   n   .  ht
        61  72  72  69  61  67  65  20  72  65  74  75  72  6e  2e  09
        61  72  72  69  61  67  65  20  72  65  74  75  72  6e  2e  09
000080  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht
        09  09  09  09  09  09  09  09  09  09  09  09  09  09  09  09
        09  09  09  09  09  09  09  09  09  09  09  09  09  09  09  09
000090  ht   C   R   -   L   F  nl  ht  ht  ht  ht  ht  ht   #  sp   0
        09  43  52  2d  4c  46  0a  09  09  09  09  09  09  23  20  30
        09  43  52  2d  4c  46  0a  09  09  09  09  09  09  23  20  30
0000a0   1   5  sp   i   s  sp   o   c   t   a   l  sp   A   S   C   I
        31  35  20  69  73  20  6f  63  74  61  6c  20  41  53  43  49
        31  35  20  69  73  20  6f  63  74  61  6c  20  41  53  43  49
0000b0   I  sp   c   o   d   e  sp   f   o   r  sp   C   R   .  ht  ht
        49  20  63  6f  64  65  20  66  6f  72  20  43  52  2e  09  09
        49  20  63  6f  64  65  20  66  6f  72  20  43  52  2e  09  09
0000c0  ht  ht  ht  ht  ht  ht  ht  ht  ht   C   R   -   L   F  nl  ht
        09  09  09  09  09  09  09  09  09  43  52  2d  4c  46  0a  09
        09  09  09  09  09  09  09  09  09  43  52  2d  4c  46  0a  09
0000d0  ht  ht  ht  ht  ht   #  sp   L   i   n   e   s  sp   i   n  sp
        09  09  09  09  09  23  20  4c  69  6e  65  73  20  69  6e  20
        09  09  09  09  09  23  20  4c  69  6e  65  73  20  69  6e  20
0000e0   a  sp   D   O   S  sp   t   e   x   t  sp   f   i   l   e  sp
        61  20  44  4f  53  20  74  65  78  74  20  66  69  6c  65  20
        61  20  44  4f  53  20  74  65  78  74  20  66  69  6c  65  20
0000f0   e   n   d  sp   i   n  sp   C   R   -   L   F   .  ht  ht  ht
        65  6e  64  20  69  6e  20  43  52  2d  4c  46  2e  09  09  09
        65  6e  64  20  69  6e  20  43  52  2d  4c  46  2e  09  09  09
000100  ht  ht  ht  ht   C   R   -   L   F  nl  ht  ht  ht  ht  ht  ht
        09  09  09  09  43  52  2d  4c  46  0a  09  09  09  09  09  09
        09  09  09  09  43  52  2d  4c  46  0a  09  09  09  09  09  09
000110   #  sp   L   i   n   e   s  sp   i   n  sp   a  sp   U   N   I
        23  20  4c  69  6e  65  73  20  69  6e  20  61  20  55  4e  49
        23  20  4c  69  6e  65  73  20  69  6e  20  61  20  55  4e  49
000120   X  sp   t   e   x   t  sp   f   i   l   e  sp   e   n   d  sp
        58  20  74  65  78  74  20  66  69  6c  65  20  65  6e  64  20
        58  20  74  65  78  74  20  66  69  6c  65  20  65  6e  64  20
000130   i   n  sp   L   F  sp   o   n   l   y   .  ht  ht  ht  ht  ht
        69  6e  20  4c  46  20  6f  6e  6c  79  2e  09  09  09  09  09
        69  6e  20  4c  46  20  6f  6e  6c  79  2e  09  09  09  09  09
000140  ht   C   R   -   L   F  nl  ht  ht  ht  ht  ht  ht  ht  ht  ht
        09  43  52  2d  4c  46  0a  09  09  09  09  09  09  09  09  09
        09  43  52  2d  4c  46  0a  09  09  09  09  09  09  09  09  09
000150  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht
        09  09  09  09  09  09  09  09  09  09  09  09  09  09  09  09
        09  09  09  09  09  09  09  09  09  09  09  09  09  09  09  09
000160  ht  ht  ht  ht  ht  ht  ht  ht   C   R   -   L   F  nl   I  sp
        09  09  09  09  09  09  09  09  43  52  2d  4c  46  0a  49  20
        09  09  09  09  09  09  09  09  43  52  2d  4c  46  0a  49  20
000170   d   o   n   '   t  sp   b   e   l   e   i   v   e  sp   i   n
        64  6f  6e  27  74  20  62  65  6c  65  69  76  65  20  69  6e
        64  6f  6e  27  74  20  62  65  6c  65  69  76  65  20  69  6e
000180  sp   S   a   n   t   a  sp   C   l   a   u   s   .  ht  ht  ht
        20  53  61  6e  74  61  20  43  6c  61  75  73  2e  09  09  09
        20  53  61  6e  74  61  20  43  6c  61  75  73  2e  09  09  09
000190  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht   C
        09  09  09  09  09  09  09  09  09  09  09  09  09  09  09  43
        09  09  09  09  09  09  09  09  09  09  09  09  09  09  09  43
0001a0   R   -   L   F  nl   D   o  sp   D   O   S   -   f   o   r   m
        52  2d  4c  46  0a  44  6f  20  44  4f  53  2d  66  6f  72  6d
        52  2d  4c  46  0a  44  6f  20  44  4f  53  2d  66  6f  72  6d
0001b0   a   t  sp   t   e   x   t  sp   f   i   l   e   s  sp   e   x
        61  74  20  74  65  78  74  20  66  69  6c  65  73  20  65  78
        61  74  20  74  65  78  74  20  66  69  6c  65  73  20  65  78
0001c0   i   s   t   ?  sp  sp  ht  ht  ht  ht  ht  ht  ht  ht  ht  ht
        69  73  74  3f  20  20  09  09  09  09  09  09  09  09  09  09
        69  73  74  3f  20  20  09  09  09  09  09  09  09  09  09  09
0001d0  ht  ht  ht  ht  ht  ht  ht   C   R   -   L   F  nl   C   a   n
        09  09  09  09  09  09  09  43  52  2d  4c  46  0a  43  61  6e
        09  09  09  09  09  09  09  43  52  2d  4c  46  0a  43  61  6e
0001e0  sp   t   h   e  sp   h   u   m   a   n  sp   e   y   e  sp   s
        20  74  68  65  20  68  75  6d  61  6e  20  65  79  65  20  73
        20  74  68  65  20  68  75  6d  61  6e  20  65  79  65  20  73
0001f0   e   e  sp   t   h   e  sp   d   i   f   f   e   r   e   n   c
        65  65  20  74  68  65  20  64  69  66  66  65  72  65  6e  63
        65  65  20  74  68  65  20  64  69  66  66  65  72  65  6e  63
000200   e  sp   b   e   t   w   e   e   n  sp   a  sp   D   O   S   -
        65  20  62  65  74  77  65  65  6e  20  61  20  44  4f  53  2d
        65  20  62  65  74  77  65  65  6e  20  61  20  44  4f  53  2d
000210   f   o   r   m   a   t  sp   t   e   x   t  sp   f   i   l   e
        66  6f  72  6d  61  74  20  74  65  78  74  20  66  69  6c  65
        66  6f  72  6d  61  74  20  74  65  78  74  20  66  69  6c  65
000220  sp   a   n   d  sp   a  sp   U   N   I   X   -   f   o   r   m
        20  61  6e  64  20  61  20  55  4e  49  58  2d  66  6f  72  6d
        20  61  6e  64  20  61  20  55  4e  49  58  2d  66  6f  72  6d
000230   a   t  sp   t   e   x   t  sp   f   i   l   e   ?  ht  ht  ht
        61  74  20  74  65  78  74  20  66  69  6c  65  3f  09  09  09
        61  74  20  74  65  78  74  20  66  69  6c  65  3f  09  09  09
000240   C   R   -   L   F  nl  nl   G   o   r   e  sp   I   s  sp   B
        43  52  2d  4c  46  0a  0a  47  6f  72  65  20  49  73  20  42
        43  52  2d  4c  46  0a  0a  47  6f  72  65  20  49  73  20  42
000250   i   g   g   e   r  sp   T   h   a   n  sp   E   v   e   r   !
        69  67  67  65  72  20  54  68  61  6e  20  45  76  65  72  21
        69  67  67  65  72  20  54  68  61  6e  20  45  76  65  72  21
000260  nl   B   y  sp   B   e   n  sp   S   m   i   t   h  nl   A  sp
        0a  42  79  20  42  65  6e  20  53  6d  69  74  68  0a  41  20
        0a  42  79  20  42  65  6e  20  53  6d  69  74  68  0a  41  20
000270   c   r   o   w   d  sp   o   f  sp   n   e   a   r   l   y  sp
        63  72  6f  77  64  20  6f  66  20  6e  65  61  72  6c  79  20
        63  72  6f  77  64  20  6f  66  20  6e  65  61  72  6c  79  20
000280   5   0   0  sp   i   n  sp   t   h   e  sp   L   i   b   r   a
        35  30  30  20  69  6e  20  74  68  65  20  4c  69  62  72  61
        35  30  30  20  69  6e  20  74  68  65  20  4c  69  62  72  61
000290   r   y  sp   T   h   e   a   t   e   r  sp   i   n  sp   P   a
        72  79  20  54  68  65  61  74  65  72  20  69  6e  20  50  61
        72  79  20  54  68  65  61  74  65  72  20  69  6e  20  50  61
0002a0   r   k  sp   C   i   t   y   ,  sp   U   t   a   h   ,  sp   s
        72  6b  20  43  69  74  79  2c  20  55  74  61  68  2c  20  73
        72  6b  20  43  69  74  79  2c  20  55  74  61  68  2c  20  73
0002b0   t   a   y   e   d  sp   o   n  sp   t   h   r   o   u   g   h
        74  61  79  65  64  20  6f  6e  20  74  68  72  6f  75  67  68
        74  61  79  65  64  20  6f  6e  20  74  68  72  6f  75  67  68
0002c0  sp   a  sp   s   t   a   n   d   i   n   g  sp   o   v   a   t
        20  61  20  73  74  61  6e  64  69  6e  67  20  6f  76  61  74
        20  61  20  73  74  61  6e  64  69  6e  67  20  6f  76  61  74
0002d0   i   o   n  sp   a   n   d  sp   i   n   t   o  sp   t   h   e
        69  6f  6e  20  61  6e  64  20  69  6e  74  6f  20  74  68  65
        69  6f  6e  20  61  6e  64  20  69  6e  74  6f  20  74  68  65
0002e0  sp   q   u   e   s   t   i   o   n   -   a   n   d   -   a   n
        20  71  75  65  73  74  69  6f  6e  2d  61  6e  64  2d  61  6e
        20  71  75  65  73  74  69  6f  6e  2d  61  6e  64  2d  61  6e
0002f0   s   w   e   r  sp   s   e   s   s   i   o   n  sp   a   s  sp
        73  77  65  72  20  73  65  73  73  69  6f  6e  20  61  73  20
        73  77  65  72  20  73  65  73  73  69  6f  6e  20  61  73  20
000300   A   l  sp   G   o   r   e   b nul dc4   o   n   e  sp   K   e
        41  6c  20  47  6f  72  65  e2  80  94  6f  6e  65  20  4b  65
        41  6c  20  47  6f  72  65  e2  80  94  6f  6e  65  20  4b  65
000310   n   t   u   c   k   y  sp   r   e   p   o   r   t   e   r  sp
        6e  74  75  63  6b  79  20  72  65  70  6f  72  74  65  72  20
        6e  74  75  63  6b  79  20  72  65  70  6f  72  74  65  72  20
000320   a   c   t   u   a   l   l   y  sp   a   d   d   r   e   s   s
        61  63  74  75  61  6c  6c  79  20  61  64  64  72  65  73  73
        61  63  74  75  61  6c  6c  79  20  61  64  64  72  65  73  73
000330   e   d  sp   h   i   m  sp   a   s  sp   b nul  fs   M   r   .
        65  64  20  68  69  6d  20  61  73  20  e2  80  9c  4d  72  2e
        65  64  20  68  69  6d  20  61  73  20  e2  80  9c  4d  72  2e
000340  sp   P   r   e   s   i   d   e   n   t   ,   b nul  gs  sp   t
        20  50  72  65  73  69  64  65  6e  74  2c  e2  80  9d  20  74
        20  50  72  65  73  69  64  65  6e  74  2c  e2  80  9d  20  74
000350   o  sp   l   a   u   g   h   s  sp   a   n   d  sp   c   h   e
        6f  20  6c  61  75  67  68  73  20  61  6e  64  20  63  68  65
        6f  20  6c  61  75  67  68  73  20  61  6e  64  20  63  68  65
000360   e   r   s   b nul dc4   r   e   s   t   a   t   e   d  sp   h
        65  72  73  e2  80  94  72  65  73  74  61  74  65  64  20  68
        65  72  73  e2  80  94  72  65  73  74  61  74  65  64  20  68
000370   i   s  sp   w   a   r   n   i   n   g   s  sp   a   b   o   u
        69  73  20  77  61  72  6e  69  6e  67  73  20  61  62  6f  75
        69  73  20  77  61  72  6e  69  6e  67  73  20  61  62  6f  75
000380   t  sp   t   h   e  sp   b nul  fs   p   l   a   n   e   t   a
        74  20  74  68  65  20  e2  80  9c  70  6c  61  6e  65  74  61
        74  20  74  68  65  20  e2  80  9c  70  6c  61  6e  65  74  61
000390   r   y  sp   e   m   e   r   g   e   n   c   y   ,   b nul  gs
        72  79  20  65  6d  65  72  67  65  6e  63  79  2c  e2  80  9d
        72  79  20  65  6d  65  72  67  65  6e  63  79  2c  e2  80  9d
0003a0  sp   g   l   o   b   a   l  sp   w   a   r   m   i   n   g   .
        20  67  6c  6f  62  61  6c  20  77  61  72  6d  69  6e  67  2e
        20  67  6c  6f  62  61  6c  20  77  61  72  6d  69  6e  67  2e
0003b0  nl   M   r   .  sp   G   o   r   e  sp   i   s  sp   t   h   e
        0a  4d  72  2e  20  47  6f  72  65  20  69  73  20  74  68  65
        0a  4d  72  2e  20  47  6f  72  65  20  69  73  20  74  68  65
0003c0  sp   s   t   a   r  sp   o   f  sp   a  sp   d   o   c   u   m
        20  73  74  61  72  20  6f  66  20  61  20  64  6f  63  75  6d
        20  73  74  61  72  20  6f  66  20  61  20  64  6f  63  75  6d
0003d0   e   n   t   a   r   y  sp   e   n   t   e   r   e   d  sp   i
        65  6e  74  61  72  79  20  65  6e  74  65  72  65  64  20  69
        65  6e  74  61  72  79  20  65  6e  74  65  72  65  64  20  69
0003e0   n  sp   t   h   e  sp   S   u   n   d   a   n   c   e  sp   F
        6e  20  74  68  65  20  53  75  6e  64  61  6e  63  65  20  46
        6e  20  74  68  65  20  53  75  6e  64  61  6e  63  65  20  46
0003f0   i   l   m  sp   F   e   s   t   i   v   a   l   ,  sp   A   n
        69  6c  6d  20  46  65  73  74  69  76  61  6c  2c  20  41  6e
        69  6c  6d  20  46  65  73  74  69  76  61  6c  2c  20  41  6e
000400  sp   I   n   c   o   n   v   e   n   i   e   n   t  sp   T   r
        20  49  6e  63  6f  6e  76  65  6e  69  65  6e  74  20  54  72
        20  49  6e  63  6f  6e  76  65  6e  69  65  6e  74  20  54  72
000410   u   t   h   ,  sp   a   n   d  sp   a   l   l  sp   t   h   e
        75  74  68  2c  20  61  6e  64  20  61  6c  6c  20  74  68  65
        75  74  68  2c  20  61  6e  64  20  61  6c  6c  20  74  68  65
000420  sp   q   u   e   s   t   i   o   n   s  sp   w   e   r   e  sp
        20  71  75  65  73  74  69  6f  6e  73  20  77  65  72  65  20
        20  71  75  65  73  74  69  6f  6e  73  20  77  65  72  65  20
000430   f   o   r  sp   h   i   m   .  sp   T   h   e  sp   d   i   r
        66  6f  72  20  68  69  6d  2e  20  54  68  65  20  64  69  72
        66  6f  72  20  68  69  6d  2e  20  54  68  65  20  64  69  72
000440   e   c   t   o   r   ,  sp   D   a   v   i   s  sp   G   u   g
        65  63  74  6f  72  2c  20  44  61  76  69  73  20  47  75  67
        65  63  74  6f  72  2c  20  44  61  76  69  73  20  47  75  67
000450   g   e   n   h   e   i   m   ,  sp   s   t   o   o   d  sp   q
        67  65  6e  68  65  69  6d  2c  20  73  74  6f  6f  64  20  71
        67  65  6e  68  65  69  6d  2c  20  73  74  6f  6f  64  20  71
000460   u   i   e   t   l   y  sp   t   o  sp   h   i   s  sp   l   e
        75  69  65  74  6c  79  20  74  6f  20  68  69  73  20  6c  65
        75  69  65  74  6c  79  20  74  6f  20  68  69  73  20  6c  65
000470   f   t   ;  sp   a  sp   b   i   t  sp   f   a   r   t   h   e
        66  74  3b  20  61  20  62  69  74  20  66  61  72  74  68  65
        66  74  3b  20  61  20  62  69  74  20  66  61  72  74  68  65
000480   r  sp   a   w   a   y  sp   w   a   s  sp   t   h   e  sp   w
        72  20  61  77  61  79  20  77  61  73  20  74  68  65  20  77
        72  20  61  77  61  79  20  77  61  73  20  74  68  65  20  77
000490   o   m   a   n  sp   w   h   o  sp   m   a   d   e  sp   t   h
        6f  6d  61  6e  20  77  68  6f  20  6d  61  64  65  20  74  68
        6f  6d  61  6e  20  77  68  6f  20  6d  61  64  65  20  74  68
0004a0   e  sp   f   i   l   m  sp   h   a   p   p   e   n   ,  sp   H
        65  20  66  69  6c  6d  20  68  61  70  70  65  6e  2c  20  48
        65  20  66  69  6c  6d  20  68  61  70  70  65  6e  2c  20  48
0004b0   o   l   l   y   w   o   o   d  sp   D   e   m   o   c   r   a
        6f  6c  6c  79  77  6f  6f  64  20  44  65  6d  6f  63  72  61
        6f  6c  6c  79  77  6f  6f  64  20  44  65  6d  6f  63  72  61
0004c0   t   i   c  sp   p   o   w   e   r  sp   p   l   a   y   e   r
        74  69  63  20  70  6f  77  65  72  20  70  6c  61  79  65  72
        74  69  63  20  70  6f  77  65  72  20  70  6c  61  79  65  72
0004d0  sp   L   a   u   r   i   e  sp   D   a   v   i   d   ,  sp   t
        20  4c  61  75  72  69  65  20  44  61  76  69  64  2c  20  74
        20  4c  61  75  72  69  65  20  44  61  76  69  64  2c  20  74
0004e0   h   e  sp   w   i   f   e  sp   o   f  sp   c   o   m   e   d
        68  65  20  77  69  66  65  20  6f  66  20  63  6f  6d  65  64
        68  65  20  77  69  66  65  20  6f  66  20  63  6f  6d  65  64
0004f0   i   a   n  sp   L   a   r   r   y  sp   D   a   v   i   d   .
        69  61  6e  20  4c  61  72  72  79  20  44  61  76  69  64  2e
        69  61  6e  20  4c  61  72  72  79  20  44  61  76  69  64  2e
000500  nl
        0a
        0a
000501
Even though I am closer, this is simply the worst-designed problem I have encountered in ABS!

I can't tell from the hex/dec/oct view whether the data was in fact converted from a DOS-format text file to a Unix-format text file. What's worse, how am I supposed to create a DOS-format text file, given that we have to learn Linux/Unix by doing and I am using a Linux platform to do this? This problem simply doesn't make any sense. It may run without any errors, but there is no way of checking it.

Last edited by andrew.comly; 07-04-2017 at 10:04 AM. Reason: incomplete
 
Old 07-04-2017, 09:51 AM   #2
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 21 MATE
Posts: 8,048
Blog Entries: 5

Rep: Reputation: 2925
In the hex view, 43 52 2d 4c 46 is "CR-LF" in ASCII.

The 0a is the *nix LF.

You have misunderstood exercise 23.
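To see this for yourself, the hex pairs from the dump can be turned back into characters. A minimal sketch, assuming a POSIX shell; the helper name hex_to_text is invented for this illustration. It converts each hex pair to octal first, because octal escapes are the form printf understands portably:

```shell
#!/bin/bash
# hex_to_text: hypothetical helper; prints the characters for a list of
# two-digit hex byte values.
hex_to_text() {
    for byte in "$@"; do
        # e.g. 43 (hex) -> 103 (octal) -> printf '\103' -> "C"
        printf "\\$(printf '%03o' "0x$byte")"
    done
    printf '\n'
}

hex_to_text 43 52 2d 4c 46    # the five bytes seen before each 0a in the dump
```

The output is the literal text CR-LF: five ordinary characters, not a carriage return. A real carriage return would appear in the dump as the single byte 0d.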

Last edited by hydrurga; 07-04-2017 at 10:02 AM.
 
1 members found this post helpful.
Old 07-04-2017, 10:25 AM   #3
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,862
Blog Entries: 1

Rep: Reputation: 1869
@OP: Could you please compress your question into a single sentence? You want to know the code of CR and LF?
CR: 13 (base 10), 0x0D (base 16), 015 (base 8), \r (escape sequence)
LF: 10 (base 10), 0x0A (base 16), 012 (base 8), \n (escape sequence)
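That the escape sequences and the octal codes name the same bytes can be checked from the terminal. A minimal sketch, assuming a POSIX shell with od and tr; the helper name bytes_hex is invented for this illustration:

```shell
#!/bin/bash
# bytes_hex: hypothetical helper; prints stdin as a compact string of hex bytes.
bytes_hex() {
    od -An -tx1 | tr -d ' \n'
}

printf '\r\n'     | bytes_hex; echo    # escape sequences  -> 0d0a
printf '\015\012' | bytes_hex; echo    # octal codes       -> 0d0a
```

Both lines print 0d0a, so '\r' and '\015' are interchangeable wherever a tool such as printf or tr interprets these escapes.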
 
1 members found this post helpful.
Old 07-04-2017, 10:58 AM   #4
michaelk
Moderator
 
Registered: Aug 2002
Posts: 25,700

Rep: Reputation: 5895
In bash, a number with a leading zero is octal and a number preceded by 0x is hex. So, as you indicated, CR = 13 decimal = 015 octal = 0xD hex. The ASCII table you linked does not include octal numbers.
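Shell arithmetic shows these prefix rules directly. A minimal sketch, assuming bash or any POSIX shell; the variable names are chosen only for this illustration:

```shell
#!/bin/bash
from_octal=$((015))     # leading zero: read as octal
from_hex=$((0x0D))      # 0x prefix: read as hexadecimal
from_decimal=$((13))    # no prefix: read as decimal

echo "$from_octal $from_hex $from_decimal"    # prints: 13 13 13
```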

Code:
tr -d "CR-LF" < $1 > $NEWFILENAME
Did you create a script? Are you looking at the original or the new file? The above command should have removed the "CR-LF" characters from the file without actually changing the line feed, which is the 0x0A in the included hex dump.

Example 23 from the ABS does use the octal code.
tr -d "\015" < oldfile > newfile

0943 522d are groups of four hex digits (16 bits each), and every pair of digits is one 8-bit ASCII code. 09 is a tab, which is non-printing and so is shown as a ".", but the rest is:
0943 522d 4c46
. C R - L F

You need to create a DOS text file then run the command. You can do that in any number of ways. Then try running the above command.
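One of those ways works entirely from the terminal: printf writes exactly the bytes it is given, including carriage returns. A minimal sketch, assuming a POSIX shell; the file names and sample text are invented for this illustration:

```shell
#!/bin/bash
workdir=$(mktemp -d)
dosfile="$workdir/sample.txt"

# Create a DOS-format file: every line ends in CR (\r) followed by LF (\n).
printf 'line one\r\nline two\r\n' > "$dosfile"

# The conversion used by the ABS example: delete every CR (octal 015).
tr -d '\015' < "$dosfile" > "$dosfile.unx"

# Compare the raw bytes before and after the conversion.
od -c "$dosfile"
od -c "$dosfile.unx"
```

In the first dump each line should end in \r \n; in the second only \n remains, and the converted file is two bytes shorter (one CR per line).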
 
1 members found this post helpful.
Old 07-04-2017, 11:39 AM   #5
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 21 MATE
Posts: 8,048
Blog Entries: 5

Rep: Reputation: 2925
Just to add a comment, there are a few things that don't ring true with the sequence of events that you describe, including:

. The command:

Code:
tr -d "CR-LF" < $1 > $NEWFILENAME
should produce the error "tr: range-endpoints of 'R-L' are in reverse collating sequence order". That is because the command is not doing what you think it is: tr reads "CR-LF" as a set of characters containing the range R-L, not as a carriage return followed by a line feed.

You need to look back at example 23 - it achieved the translation from Windows to *nix line endings by deleting the CR. You are trying to do the opposite. How is a delete the opposite of a delete? You should be adding in CR's, not trying to delete anything.
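The way tr treats a hyphen can be seen with a valid range. A minimal sketch, assuming a POSIX shell and tr; the sample strings and the helper name strip_cr are invented for this illustration:

```shell
#!/bin/bash
# 'C-F' is a range: tr deletes C, D, E and F,
# not the three characters "C", "-" and "F".
printf 'ABCDEFG\n' | tr -d 'C-F'          # prints: ABG

# strip_cr: hypothetical helper; deletes real carriage returns,
# which is what the ABS example does with '\015'.
strip_cr() {
    tr -d '\r'
}

printf 'text\r\n' | strip_cr | od -An -c  # no \r remains before the \n
```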

. You say that you pasted in the Ben Smith text to a new file and ran your script on it. Then you open it with a hex editor and the text is now a whole load of other stuff (including Santa Claus) before the Ben Smith article.

If you don't mind me saying, it's all in a bit of a mess. You need to go back to square one and start this example again.
 
4 members found this post helpful.
Old 07-05-2017, 08:41 PM   #6
andrew.comly
Member
 
Registered: Dec 2012
Distribution: Trisquel-Mini 7.0, Lubuntu 14.04, Debian lxde 8.0
Posts: 311

Original Poster
Blog Entries: 2

Rep: Reputation: 16
Question I'm clueless.

Quote:
Originally Posted by NevemTeve View Post
@OP: Could you please compress your question into a single sentence? You want to know the code of CR and LF?
I only know that I don't know, but I don't know what exactly I don't know.

Where am I? Where is this place "Don't know"? How do I get from wherever I am now to the place "Don't know"? Only once I get there can I experience it, and only then can I fully describe "Don't know" to others.

The 1st 3 random question that comes into my mind are:
  1. After running this program on a file (*.txt) that you copied text into in trisquel 7.0, how do I really know it worked? Where is the proof?
  2. Wouldn't creating a file with text in it on Trisquel 7.0, since "Trisquel" is linux, create a unix format text file?
  3. How do I adjust the code
    Code:
    tr -d $CR < $1 > $NEWFILENAME
    to go the other way and convert a unix file with text in it to a DOS file with text in it?
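
For questions 1 and 2, one possible check is to count the lines that end in a carriage return. This is only a sketch: count_cr_lines and the file names are hypothetical. A file typed on Linux would normally report 0, which would mean the converter has nothing to remove.

```shell
#!/bin/bash
# Hypothetical helper: count the lines of a file that end in a carriage return.
count_cr_lines() {
    cr=$(printf '\r')            # a real CR byte (octal 015)
    grep -c "${cr}\$" "$1"       # -c prints the number of matching lines
}

# Two throwaway sample files: one with DOS endings, one with UNIX endings.
tmpdir=$(mktemp -d)
printf 'See the dog.\r\nSee the bird.\r\n' > "$tmpdir/dos.txt"
printf 'See the dog.\nSee the bird.\n'     > "$tmpdir/unix.txt"

count_cr_lines "$tmpdir/dos.txt"     # every line ends in CR-LF
count_cr_lines "$tmpdir/unix.txt"    # no line ends in CR
```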

Last edited by andrew.comly; 07-06-2017 at 03:59 AM. Reason: Dao
 
Old 07-05-2017, 09:07 PM   #7
andrew.comly
Member
 
Registered: Dec 2012
Distribution: Trisquel-Mini 7.0, Lubuntu 14.04, Debian lxde 8.0
Posts: 311

Original Poster
Blog Entries: 2

Rep: Reputation: 16
Question 1) script: Yes; 2) create DOS text file

Quote:
Originally Posted by michaelk View Post
Did you create a script?
Yes, for your reference the script I emulated (from ABS) is below:
Code:
#!/bin/bash
# GNU bash, version 4.3.11(1)-release (i686-pc-linux-gnu)
# Du.sh: DOS to UNIX text file converter

E_WRONGARGS=85

# Verify that the user supplied a filename argument.
if [ -z "$1" ]
then
	echo "Usage: $(basename "$0") filename-to-convert"
	exit $E_WRONGARGS
fi

NEWFILENAME=$1.unx

CR='\015'	# Carriage return.
		# 015 is octal ASCII code for CR.
		# Lines in a DOS text file end in CR-LF.
		# Lines in a UNIX text file end in LF only.

# Quote the variables so that filenames containing spaces still work.
tr -d "$CR" < "$1" > "$NEWFILENAME"
# Delete CR's and write to new file.

echo "Original DOS text file is \"$1\"."
echo "Converted UNIX text file is \"$NEWFILENAME\"."

exit 0

Quote:
Originally Posted by michaelk View Post
You need to create a DOS text file then run the command. You can do that in any number of ways. Then try running the above command.
How does one create a DOS text file on linux? Can you do it from the terminal?

Last edited by andrew.comly; 07-05-2017 at 09:10 PM. Reason: clarity
 
Old 07-05-2017, 09:26 PM   #8
michaelk
Moderator
 
Registered: Aug 2002
Posts: 25,700

Rep: Reputation: 5895
Depends on the text editor.

For example, the gedit save window has a line-ending pull-down box where you can select unix or dos. In vim you can set the file format using the command :set ff=dos.
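
It can also be done straight from the terminal. A minimal sketch, assuming nothing beyond printf and od (the file name dos_sample.txt is made up): printf understands \r and \n, so the line endings can be written out explicitly.

```shell
#!/bin/bash
# Hypothetical file name, created in a scratch directory.
tmpdir=$(mktemp -d)
dosfile="$tmpdir/dos_sample.txt"

# printf interprets \r (CR) and \n (LF), so each line ends in CR-LF.
printf 'See the dog.\r\nSee the bird.\r\n' > "$dosfile"

# Show the bytes: each line should finish with 0d 0a.
od -An -tx1 "$dosfile"
```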
 
1 members found this post helpful.
Old 07-06-2017, 12:05 AM   #9
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,862
Blog Entries: 1

Rep: Reputation: 1869
Yes 'tr -d $CR' can remove CR characters; I think 'sed' can add them (untested):
Code:
sed 's/$/'$CR'/' inputfile >outputfile

Last edited by NevemTeve; 07-06-2017 at 12:06 AM.
 
1 members found this post helpful.
Old 07-06-2017, 03:53 AM   #10
andrew.comly
Member
 
Registered: Dec 2012
Distribution: Trisquel-Mini 7.0, Lubuntu 14.04, Debian lxde 8.0
Posts: 311

Original Poster
Blog Entries: 2

Rep: Reputation: 16
Lightbulb Now I can prove that ex 23 truly works.

Quote:
Originally Posted by michaelk View Post
For example, the gedit save window has a line-ending pull-down box where you can select unix or dos. In vim you can set the file format using the command :set ff=dos.
From geany, I made a new file containing the text:
Quote:
See the dog.
See the bird.
Smoke coal cigarettes.
Hat
shirt
socks
shoes
I then converted it into DOS:
  1. Menu Bar (Hold down Alt)
  2. Document
  3. Set Line Endings
  4. Convert and Set to CR/LF (WIN)
  5. Save As DOSUNIX_textfileMS.txt

Code:
$ od DOSUNIX_textfileMS.txt 
000000   S   e   e  sp   t   h   e  sp   d   o   g   .  cr  nl   S   e
        53  65  65  20  74  68  65  20  64  6f  67  2e  0d  0a  53  65
000010   e  sp   t   h   e  sp   b   i   r   d   .  cr  nl   S   m   o
        65  20  74  68  65  20  62  69  72  64  2e  0d  0a  53  6d  6f
000020   k   e  sp   c   o   a   l  sp   c   i   g   a   r   e   t   t
        6b  65  20  63  6f  61  6c  20  63  69  67  61  72  65  74  74
000030   e   s   .  cr  nl   H   a   t  cr  nl   s   h   i   r   t  cr
        65  73  2e  0d  0a  48  61  74  0d  0a  73  68  69  72  74  0d
000040  nl   s   o   c   k   s  cr  nl   s   h   o   e   s  cr  nl
        0a  73  6f  63  6b  73  0d  0a  73  68  6f  65  73  0d  0a
00004f
$ ./Ch16-23DosToUnix_TextConv.sh DOSUNIX_textfileMS.txt 
Original DOS text file is "DOSUNIX_textfileMS.txt".
Converted UNIX text file is "DOSUNIX_textfileMS.txt.unx".
$ od DOSUNIX_textfileMS.txt.unx 
000000   S   e   e  sp   t   h   e  sp   d   o   g   .  nl   S   e   e
        53  65  65  20  74  68  65  20  64  6f  67  2e  0a  53  65  65
000010  sp   t   h   e  sp   b   i   r   d   .  nl   S   m   o   k   e
        20  74  68  65  20  62  69  72  64  2e  0a  53  6d  6f  6b  65
000020  sp   c   o   a   l  sp   c   i   g   a   r   e   t   t   e   s
        20  63  6f  61  6c  20  63  69  67  61  72  65  74  74  65  73
000030   .  nl   H   a   t  nl   s   h   i   r   t  nl   s   o   c   k
        2e  0a  48  61  74  0a  73  68  69  72  74  0a  73  6f  63  6b
000040   s  nl   s   h   o   e   s  nl
        73  0a  73  68  6f  65  73  0a
000048
One can clearly see from the above od output that every line ending was converted from "cr nl" to "nl".
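
The two dumps also allow a quick arithmetic cross-check: the DOS file ends at offset 0x4f (79 bytes), the converted file at 0x48 (72 bytes), and the difference of 7 matches the seven lines, one removed CR per line. A minimal sketch of that check, with made-up file names and a hypothetical helper removed_bytes:

```shell
#!/bin/bash
# Hypothetical helper: how many bytes smaller is file $2 than file $1?
removed_bytes() {
    echo $(( $(wc -c < "$1") - $(wc -c < "$2") ))
}

tmpdir=$(mktemp -d)
printf 'Hat\r\nshirt\r\nsocks\r\n' > "$tmpdir/sample.txt"        # DOS endings
tr -d '\r' < "$tmpdir/sample.txt" > "$tmpdir/sample.txt.unx"     # same step as the script

lines=$(( $(wc -l < "$tmpdir/sample.txt.unx") ))
removed=$(removed_bytes "$tmpdir/sample.txt" "$tmpdir/sample.txt.unx")
echo "lines=$lines removed=$removed"   # the two numbers should agree
```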

Thanks so much michaelk!!!

Last edited by andrew.comly; 07-06-2017 at 04:32 AM. Reason: misinterpreted "." to be octal rather than text
 
Old 07-06-2017, 04:17 AM   #11
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 21 MATE
Posts: 8,048
Blog Entries: 5

Rep: Reputation: 2925
Nice one, Andrew.

Do you want to try doing the reverse now, converting *nix line endings back into DOS/Windows line endings?

By the way, if you want to know why CR and LF are used, it comes from using typewriters, where at the end of a line you would push the piece that carries the paper (the "carriage") back to the start (CR) then rotate it ready for the next line (LF), often in the same movement.

Also, if you're interested, those 09s you had earlier in the thread were tabs.
 
1 members found this post helpful.
Old 07-06-2017, 08:12 AM   #12
andrew.comly
Member
 
Registered: Dec 2012
Distribution: Trisquel-Mini 7.0, Lubuntu 14.04, Debian lxde 8.0
Posts: 311

Original Poster
Blog Entries: 2

Rep: Reputation: 16
Arrow can reverse, but don't understand solution found

sed is a stream editor for filtering and transforming text, but judging by the initial trials below, it cannot match a byte when I spell out its hex or octal digits as text. Again, the file A.TXT is a UNIX-formatted text file.

Attempt01
Code:
$ sed s/0d/'0d  0a'/g $(od A.TXT)
sed: can't read 000000: No such file or directory
sed: can't read S: No such file or directory
sed: can't read e: No such file or directory
sed: can't read e: No such file or directory
sed: can't read sp: No such file or directory
sed: can't read t: No such file or directory
sed: can't read h: No such file or directory
sed: can't read e: No such file or directory
sed: can't read sp: No such file or directory
sed: can't read d: No such file or directory
sed: can't read o: No such file or directory
sed: can't read g: No such file or directory
sed: can't read nl: No such file or directory
sed: can't read s: No such file or directory
sed: can't read e: No such file or directory
sed: can't read e: No such file or directory
sed: can't read sp: No such file or directory
sed: can't read 53: No such file or directory
sed: can't read 65: No such file or directory
sed: can't read 65: No such file or directory
sed: can't read 20: No such file or directory
sed: can't read 74: No such file or directory
sed: can't read 68: No such file or directory
sed: can't read 65: No such file or directory
sed: can't read 20: No such file or directory
sed: can't read 64: No such file or directory
sed: can't read 6f: No such file or directory
sed: can't read 67: No such file or directory
sed: can't read 0a: No such file or directory
...
Attempt02
Code:
$ grep '0a' A.TXT 
$ grep 'bird' A.TXT 
see the bird
Attempt03
Code:
$ sed s/0d/'0d  0a'/g A.TXT >C.TXT  
$ od C.TXT 
000000   S   e   e  sp   t   h   e  sp   d   o   g  nl   s   e   e  sp
        53  65  65  20  74  68  65  20  64  6f  67  0a  73  65  65  20
000010   t   h   e  sp   s   o   c   k   s  nl   s   e   e  sp   t   h
        74  68  65  20  73  6f  63  6b  73  0a  73  65  65  20  74  68
000020   e  sp   b   i   r   d  nl   s   e   e  sp   t   h   e  sp   c
        65  20  62  69  72  64  0a  73  65  65  20  74  68  65  20  63
000030   o   a   l  nl   s   e   e  sp   D   o   n   a   l   d  sp   T
        6f  61  6c  0a  73  65  65  20  44  6f  6e  61  6c  64  20  54
000040   r   u   m   p  nl
        72  75  6d  70  0a
000045
But then after a second internet search I found this tutorial post where they use sed.

Attempt04
Code:
$ sed 's/$'"/`echo \\\r`/" A.TXT > output.txt
$ od output.txt 
000000   S   e   e  sp   t   h   e  sp   d   o   g  cr  nl   s   e   e
        53  65  65  20  74  68  65  20  64  6f  67  0d  0a  73  65  65
000010  sp   t   h   e  sp   s   o   c   k   s  cr  nl   s   e   e  sp
        20  74  68  65  20  73  6f  63  6b  73  0d  0a  73  65  65  20
000020   t   h   e  sp   b   i   r   d  cr  nl   s   e   e  sp   t   h
        74  68  65  20  62  69  72  64  0d  0a  73  65  65  20  74  68
000030   e  sp   c   o   a   l  cr  nl   s   e   e  sp   D   o   n   a
        65  20  63  6f  61  6c  0d  0a  73  65  65  20  44  6f  6e  61
000040   l   d  sp   T   r   u   m   p  cr  nl
        6c  64  20  54  72  75  6d  70  0d  0a
00004a
$ od A.TXT 
000000   S   e   e  sp   t   h   e  sp   d   o   g  nl   s   e   e  sp
        53  65  65  20  74  68  65  20  64  6f  67  0a  73  65  65  20
000010   t   h   e  sp   s   o   c   k   s  nl   s   e   e  sp   t   h
        74  68  65  20  73  6f  63  6b  73  0a  73  65  65  20  74  68
000020   e  sp   b   i   r   d  nl   s   e   e  sp   t   h   e  sp   c
        65  20  62  69  72  64  0a  73  65  65  20  74  68  65  20  63
000030   o   a   l  nl   s   e   e  sp   D   o   n   a   l   d  sp   T
        6f  61  6c  0a  73  65  65  20  44  6f  6e  61  6c  64  20  54
000040   r   u   m   p  nl
        72  75  6d  70  0a
000045
From the above, you can see that it worked. But the sourceforge site mentioned no explanation of why the sed command worked.
What is the secret code "\\\r"? It's not in the sed manual file. Can we replace this with something non-secret in the ascii table? The following commands that utilize non-secret codes for CR-NL fail:

Attempt01: Oct
Code:
$ sed 's/$'"/`echo "015  012"`/" A.TXT > output1.txt
$ od output1.txt 
000000   S   e   e  sp   t   h   e  sp   d   o   g   0   1   5  sp  sp
        53  65  65  20  74  68  65  20  64  6f  67  30  31  35  20  20
000010   0   1   2  nl   s   e   e  sp   t   h   e  sp   s   o   c   k
        30  31  32  0a  73  65  65  20  74  68  65  20  73  6f  63  6b
000020   s   0   1   5  sp  sp   0   1   2  nl   s   e   e  sp   t   h
        73  30  31  35  20  20  30  31  32  0a  73  65  65  20  74  68
000030   e  sp   b   i   r   d   0   1   5  sp  sp   0   1   2  nl   s
        65  20  62  69  72  64  30  31  35  20  20  30  31  32  0a  73
000040   e   e  sp   t   h   e  sp   c   o   a   l   0   1   5  sp  sp
        65  65  20  74  68  65  20  63  6f  61  6c  30  31  35  20  20
000050   0   1   2  nl   s   e   e  sp   D   o   n   a   l   d  sp   T
        30  31  32  0a  73  65  65  20  44  6f  6e  61  6c  64  20  54
000060   r   u   m   p   0   1   5  sp  sp   0   1   2  nl
        72  75  6d  70  30  31  35  20  20  30  31  32  0a
00006d
Attempt02: Hex
Code:
$ sed 's/$'"/`echo 0d  0a`/" A.TXT > output2.txt
$ od output2.txt 
000000   S   e   e  sp   t   h   e  sp   d   o   g   0   d  sp   0   a
        53  65  65  20  74  68  65  20  64  6f  67  30  64  20  30  61
000010  nl   s   e   e  sp   t   h   e  sp   s   o   c   k   s   0   d
        0a  73  65  65  20  74  68  65  20  73  6f  63  6b  73  30  64
000020  sp   0   a  nl   s   e   e  sp   t   h   e  sp   b   i   r   d
        20  30  61  0a  73  65  65  20  74  68  65  20  62  69  72  64
000030   0   d  sp   0   a  nl   s   e   e  sp   t   h   e  sp   c   o
        30  64  20  30  61  0a  73  65  65  20  74  68  65  20  63  6f
000040   a   l   0   d  sp   0   a  nl   s   e   e  sp   D   o   n   a
        61  6c  30  64  20  30  61  0a  73  65  65  20  44  6f  6e  61
000050   l   d  sp   T   r   u   m   p   0   d  sp   0   a  nl
        6c  64  20  54  72  75  6d  70  30  64  20  30  61  0a
00005e
Both failures are really bad: the new file is in neither DOS format nor UNIX format.
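
A likely reason both attempts fail: echo "015  012" prints the digit characters themselves (0x30 0x31 0x35 ...), not the bytes they name, and sed inserts that text as-is. Also, sed's $ matches just before the existing LF, so only a CR has to be added. A minimal sketch that lets printf turn the octal code into a real byte (append_cr is a hypothetical helper name):

```shell
#!/bin/bash
# echo prints the digits as text: the bytes are 30 31 35 20 20 30 31 32 plus a newline.
echo "015  012" | od -An -tx1

# printf interprets the octal escapes and emits one byte each: 0d 0a.
printf '\015\012' | od -An -tx1

# Hypothetical helper: append a real CR to every line read from stdin.
# The LF is already there, so only the CR is inserted before it.
append_cr() {
    sed "s/\$/$(printf '\015')/"
}

printf 'dog\n' | append_cr | od -An -tx1
```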

Anyway, even though this command works, it utilizes the secret code \\\r, and
sed 's/$'"/`echo \\\r`/" A.TXT > output.txt is quite a messy statement. My analysis is below:
  1. The first "'" (strong quote). What is it doing here?? (I understand this)
  2. The first "/", naturally following the sed s/str1/str2 syntax, introduces "str1", or the string we want to replace. (I understand this)
  3. Even though the first "$" clearly means the end of a line, so we know the object that we want to replace is the end of the line "$", but just look at the sloppy mess after this! It doesn't even have a second forward slash"/" (I understand this)
  4. I can understand the `echo ...` syntax, but why the weak quotes("")?
  5. Why the ending "/" (forward slash)?
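
One possible reading of the command, assuming bash and GNU sed: the shell glues the single-quoted piece 's/$' and the double-quoted piece "/`echo \\\r`/" into a single argument, and the backticks are replaced by echo's output. In bash, echo prints the two characters \r unchanged, so sed receives the script s/$/\r/, and it is GNU sed that reads \r as a carriage return. The "missing" second slash is the one that opens the double-quoted piece, and the final / just closes the replacement part of s/regex/replacement/. The sketch below compares the original spelling with a plain one; both function names are made up.

```shell
#!/bin/bash
# Assumes GNU sed, which understands \r in the replacement text.

# The command from the thread, reading stdin instead of A.TXT.
convoluted() {
    sed 's/$'"/`echo \\\r`/"
}

# The same substitution written as one single-quoted script.
plain() {
    sed 's/$/\r/'
}

# Print the script text the shell builds from the two quoted pieces.
printf '%s\n' 's/$'"/`echo \\\r`/"

# Both should turn LF line endings into CR-LF.
printf 'dog\nbird\n' | convoluted | od -An -tx1
printf 'dog\nbird\n' | plain      | od -An -tx1
```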

Last edited by andrew.comly; 07-07-2017 at 11:41 AM. Reason: expand
 
Old 07-07-2017, 03:12 PM   #13
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,882
Blog Entries: 13

Rep: Reputation: 4930
I see this all jumping around left, right, and various directions.

What do I mean by that? The original misperception of how the value is represented, Hex, Octal, and the fact that something showing in printable ASCII is not the actual values of the data.

Then I see tr, and sed. But also see that you seem to be still attempting this.

My recommendations are to take a step back, stop worrying about a consensus.

First is that you have figured out an affirmative method to verify the contents of a file, using octal dump. Great. Please keep doing that.

Next, you fully understand the various conventions, and also understand the agreed upon definitions of the control characters, as well as understand the format requirements for Unix/Linux versus DOS. Nothing ever changes there either, therefore no need to second guess.

And you are trying various conversion techniques.

One of which you found that you put in the actual strings of CR and LF and found that you got the strings for "CR" and "LF" versus modifications to the correct ASCII control codes you expected. However, please realize that you can verify that, did verify that, and further did learn that the shell or the tools you've used accept different conventions for octal, hex, or decimal values for these.

That's all there really is here by the way. Just understanding how the "language" uses and interprets the non-printable control characters.

And this is something I run up against all the time. I do embedded programming on 8-bit processors. The UARTs, the SPI, and the I2C interfaces all act in certain ways when I receive an actual ASCII stream, but for a bit-level protocol I have to "compress". If the data really is supposed to be 0x01, meaning 0000 0001 in binary, that is not printable, so many times a serial protocol will represent it in ASCII as "01", which in computer storage is 0x30 0x31, or 0011 0000 0011 0001.
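
The same distinction can be seen from the shell. A minimal sketch (hex_to_byte is a hypothetical helper, and going through printf's octal escapes is just one way to do it): the text "01" is two bytes, 0x30 0x31, while the value it names is the single byte 0x01.

```shell
#!/bin/bash
# The printable text "01" occupies two bytes: 0x30 0x31.
printf '01' | od -An -tx1

# Hypothetical helper: turn two hex digits of text into the single byte they name.
hex_to_byte() {
    # Inner printf converts the hex text to three octal digits (e.g. 0d -> 015);
    # outer printf interprets the resulting \015-style escape as one byte.
    printf "\\$(printf '%03o' "0x$1")"
}

hex_to_byte 01 | od -An -tx1    # one byte: 01
hex_to_byte 0d | od -An -tx1    # one byte: 0d (carriage return)
```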

So therefore I have to understand how the data is coming, how I need to send any data out, and be consistent.

Same problem here. Figure out the requirements for tr, the requirements for sed, and how BASH treats this data. In the script part you may have to test using one set of syntax or guidelines, and when you give those arguments to the tools you may have to use a different set.

Perhaps you know of Makefiles? For instance a BASH script does not like actual TAB characters, they seem to cause a lot of problems. Meanwhile the syntax of a Makefile requires TAB characters, and if you choose to use 8-white spaces, it really doesn't work. A typical thing which sometimes happens is when copying or editing a file on a different system and passing it back, the tabs get converted to spaces, and then suddenly the make will not work.

Just some thoughts on how I approach something like this. I convince myself that it is not rocket science, for starters. I next confirm my test validation methods. And piecemeal I verify each portion of it all. I.e. I determine and confirm the arguments I need to give to sed or tr and then never change them until I can test alternatives safely. I get the whole solution working, and then vary one little item at a time.
 
3 members found this post helpful.
Old 07-07-2017, 03:35 PM   #14
wpeckham
LQ Guru
 
Registered: Apr 2010
Location: Continental USA
Distribution: Debian, Ubuntu, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, VSIDO, tinycore, Q4OS,Manjaro
Posts: 5,622

Rep: Reputation: 2695
Agree with last post, this is getting to be a mess.

May I suggest that, rather than try to reinvent the horse, someone mention unix2dos and dos2unix, the paired utilities for doing the text conversion mentioned CORRECTLY.
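
A rough sketch of how they might be used as filters, assuming the dos2unix package may or may not be installed (the wrapper names to_unix and to_dos are made up, and the tr/sed fallbacks are only a naive stand-in):

```shell
#!/bin/bash
# Hypothetical wrappers: prefer the dedicated tools, fall back to tr/sed if absent.
to_unix() {
    if command -v dos2unix >/dev/null 2>&1; then
        dos2unix                     # with no file argument it filters stdin to stdout
    else
        tr -d '\r'                   # fallback: deletes every CR
    fi
}

to_dos() {
    if command -v unix2dos >/dev/null 2>&1; then
        unix2dos
    else
        sed "s/\$/$(printf '\r')/"   # naive fallback: appends a CR to every line
    fi
}

printf 'dog\nbird\n' | to_dos | od -An -tx1             # CR-LF endings
printf 'dog\nbird\n' | to_dos | to_unix | od -An -tx1   # round trip back to LF
```

Running the hand-written script and the dedicated tool on the same input and comparing the od output gives the control group.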

If you MUST reinvent the horse, at least use the correct tools to generate a control group to make sure you have not accidentally invented the goat.

This is pretty easy, but confusion is making it much harder than it should be.
 
Old 07-07-2017, 03:49 PM   #15
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 21 MATE
Posts: 8,048
Blog Entries: 5

Rep: Reputation: 2925
Quote:
Originally Posted by andrew.comly View Post
But the sourceforge site mentioned no explanation of why the sed command worked. What is the secret code "\\\r"? It's not in the sed manual file. Can we replace this with something non-secret in the ascii table?
An explanation of sorts can be found here:

https://unix.stackexchange.com/quest...his-regex-mean
 
1 members found this post helpful.
  



Tags
binary, text processing, unix


