"file" command not working properly for UTF8 files
I have some UTF-8 files whose first line is blank. The "file" command reports each of them as a data file. If I remove the blank line, it reports a UTF-8 text file instead. I need to pick out the displayable text files from a large set of files, which is why I am using the "file" command.
Does anyone know how to do this with some other command? Note: I do not have permission to attach files, so I cannot attach a sample file here.
You don't supply any details about the system (versions etc.). I would assume this is on a Linux system?
I am working on a Linux system. Here is the full hexdump of the file:
0000000 470a 4f4c 4547 2052 202b 4942 4b52 2045
0000010 4d47 4842 5320 5359 4554 544d 4345 4e48
0000020 4b49 450a 5058 504f 5241 204b 3131 310a
0000030 3235 3633 4a20 4341 424f 4453 524f 0a46
0000040 6547 6d72 6e61 0a79 3032 3930 312d 2d30
0000050 3732 570a 6369 7468 6769 2065 6e49 6f66
0000060 6d72 7461 6f69 656e 206e c366 72bc 7520
0000070 736e 7265 2065 754b 646e 6e65 0a3a 530a
0000080 6369 6568 6872 6965 7374 6164 6574 626e
0000090 c36c 74a4 6574 2072 6567 c36d c3a4 209f
00000a0 6556 6f72 6472 756e 676e 2820 4745 2029
00000b0 724e 202e 3931 3730 322f 3030 2e36 0a0a
00000c0 6553 7268 6720 6565 7268 6574 4420 6d61
00000d0 6e65 7520 646e 4820 7265 6572 2c6e 0a0a
00000e0 6553 7268 6720 6565 7268 6574 2072 754b
00000f0 646e 2c65 7720 7269 6420 6e61 656b 206e
0000100 6849 656e 206e 6562 7473 6e65 2073 c366
0000110 72bc 4920 7268 2065 6542 7473 6c65 756c
0000120 676e 202e 6953 2065 7265 6168 746c 6e65
0000130 6620 bcc3 0a72 656a 6564 2073 7250 646f
0000140 6b75 2074 6965 206e 6953 6863 7265 6568
0000150 7469 6473 7461 6e65 6c62 7461 2c74 6420
0000160 7361 4920 7268 6e65 6e20 7461 6f69 616e
0000170 656c 206e 6547 6573 7a74 6e65 6520 746e
0000180 7073 6972 6863 2e74 0a0a 6144 2073 6953
0000190 6863 7265 6568 7469 6473 7461 6e65 6c62
00001a0 7461 2074 6e65 6874 a4c3 746c 7720 6369
00001b0 7468 6769 2065 6e49 6f66 6d72 7461 6f69
00001c0 656e 206e 757a 206d 6547 7573 646e 6568
00001d0 7469 7373 6863 7475 207a 6e75 0a64 757a
00001e0 2072 7250 646f 6b75 7374 6369 6568 6872
00001f0 6965 2e74 5720 7269 7220 7461 6e65 202c
0000200 6f76 2072 6945 736e 7461 207a 6564 2073
0000210 614d 6574 6972 6c61 2073 6164 2073 6144
0000220 6574 626e 616c 7474 6520 6e69 6567 6568
0000230 646e 7a0a 2075 656c 6573 206e 6e75 2064
0000240 7365 6120 206e 6c61 656c 5020 7265 6f73
0000250 656e 2c6e 6420 6569 6d20 7469 6420 6d65
0000260 5020 6f72 7564 746b 7520 676d 6865 6e65
0000270 7520 646e 6620 bcc3 2072 6573 6e69 6e65
0000280 450a 6e69 6173 7a74 7620 7265 6e61 7774
0000290 726f 6c74 6369 2068 6973 646e 202c 6577
00002a0 7469 7265 757a 656c 7469 6e65 0a2e 530a
00002b0 6c6f 746c 6e65 5320 6569 7520 736e 7265
00002c0 2065 7250 646f 6b75 6574 7720 6965 6574
00002d0 7672 7265 616b 6675 6e65 202c 6973 646e
00002e0 5320 6569 7620 7265 ef70 82ac 6369 7468
00002f0 7465 202c 6849 6572 206e 754b 646e 6e65
0000300 6520 6e69 0a65 6f4b 6970 2065 6964 7365
0000310 7365 5320 6369 6568 6872 6965 7374 6164
0000320 6574 626e 616c 7474 7365 7a20 2075 bcc3
0000330 6562 6c72 7361 6573 2e6e 0a0a 6942 7474
0000340 2065 6168 6562 206e 6953 2065 6556 7372
0000350 c374 6ea4 6e64 7369 202c 6164 7373 6420
0000360 6569 6573 2073 6353 7268 6965 6562 206e
0000370 616d 6373 6968 656e 6c6c 6520 7372 6574
0000380 6c6c 2074 7577 6472 2065 6e75 0a64 6f73
0000390 696d 2074 696e 6863 2074 6e75 6574 7372
00003a0 6863 6972 6265 6e65 6920 7473 0a2e 4d0a
00003b0 7469 6620 6572 6e75 6c64 6369 6568 206e
00003c0 7247 bcc3 9fc3 6e65 490a 7268 4820 7265
00003d0 7473 6c65 656c 2f72 694c 6665 7265 6e61
00003e0 0a74 0a0a
00003e4
By coincidence the first three characters in the file (0Ah 47h 4Ch = "<lf>GL") are the same as the header that was used on the Apple II binary files, which is why it is being identified as an Apple II binary file. If I remember rightly, Apple used a <cr> as their line separator in text files, so it would not have been ambiguous (whereas Unix-like systems use <lf>).
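One thing that can make the dump harder to read: it is printed as 16-bit words on a little-endian machine, so the first word "470a" is really the bytes 0Ah 47h on disk. A minimal sketch for checking the leading bytes in their true order is below. The helper name first_bytes and the sample text are only for illustration, not part of "file" or the original post.

```shell
# first_bytes FILE [N]: print the first N bytes (default 3) of FILE as hex,
# one byte at a time, in on-disk order. Hypothetical helper for illustration.
first_bytes() {
    # -An: no offset column, -tx1: one-byte hex units, -N: read only N bytes
    bytes=$(od -An -tx1 -N "${2:-3}" "$1")
    # Unquoted on purpose: word splitting collapses od's padding to single spaces
    echo $bytes
}

# Sample file that starts like the one in the dump: a blank line, then "GL..."
sample=$(mktemp)
printf '\nGLOGER + BIRKE\n' > "$sample"
first_bytes "$sample"    # prints: 0a 47 4c
rm -f "$sample"
```

Running it on one of the misidentified files should show the same 0a 47 4c prefix if this explanation applies to it.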
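The brute force solution is to turn off checking for all the sequences in the magic file:
Code:
file -esoft *
A more targeted fix is to work from a copy of the magic file:
Code:
cp /usr/share/file/magic magic
Remove this entry from the copy:
Code:
0 string \x0aGL Binary II (apple ][) data
Then tell "file" to use the edited copy:
Code:
file -mmagic *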
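If the goal is only to pick out files that are valid UTF-8, another option that avoids the magic file altogether is to let iconv try to decode each file. This is a rough sketch, assuming iconv is installed. The helper name is_utf8 is made up for illustration. Note that it only tests whether the bytes decode as UTF-8, which is not quite the same as "displayable text": a pure ASCII file containing control characters would also pass.

```shell
# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
# Hypothetical helper; it relies on iconv exiting non-zero on invalid input.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}

# List the regular files in the current directory that pass the check.
for f in *; do
    [ -f "$f" ] || continue
    if is_utf8 "$f"; then
        printf '%s\n' "$f"
    fi
done
```

Because the check does not look at magic numbers, a leading blank line followed by "GL" makes no difference to the result.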