How to check if files are binaries?
I am wondering if there is a way to check whether a number of files in a directory are binaries or not?
For example a folder with a mix of pictures, programs and music. |
something like this:
Quote:
|
Try the file command, e.g.:
Code:
file <filename>
If it's text, the output will include the word "text", like this:
Code:
[prompt]$ file a
a: ASCII text
(In this case it was a bash shell script, but still text of course.) Or for a "binary" file:
Code:
[prompt]$ file a
a: ELF 32-bit LSB executable, ARM, version 1 (SYSV), for GNU/Linux 2.6.16, dynamically linked (uses shared libs), not stripped
Anyway, since it doesn't include the word "text", it's "binary". |
Say there were about 200 files, how would you discard the ones that are binaries?
Don't media files qualify as non-text files too? |
AAMOI MrTux, what is the purpose of this exercise? That might make it easier.
BTW I checked a video and I got: Quote:
|
Rewording your requirement to "show only text files", try this...
Code:
file path/to/check/* | sed -n '/text/p'
|
Code:
for f in /path/to/* |
Quote:
Remaining files should be fewer, and shall be quicker for a searching script to handle. You can also manually examine/compare among remainder files using:
Code:
--$ file <filename>
Code:
--$ strings <filename>
Code:
--$ ./<filename>
Code:
--$ cat <filename>
Hope that helps. Good luck. m.m. |
You could also look into using the --mime-type / --mime-encoding options for file.
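As a rough sketch of the --mime-encoding idea: file prints "binary" as the encoding for data it does not recognise as text, so that one word can be tested directly. The helper names is_binary and list_binaries are made up for this example, and note that an empty file is also reported as binary.

```shell
#!/bin/sh
# Sketch only: is_binary and list_binaries are hypothetical helper names.

# Succeeds if file(1) reports the MIME encoding of "$1" as "binary".
# -b drops the "filename:" prefix from the output.
is_binary() {
    [ "$(file -b --mime-encoding -- "$1")" = "binary" ]
}

# Print every regular file in a directory (default: current directory)
# that file(1) reports as binary. Does not descend into subdirectories.
list_binaries() {
    for f in "${1:-.}"/*; do
        [ -f "$f" ] || continue
        if is_binary "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling list_binaries /path/to/check prints one path per line, so the result can be reviewed before anything is deleted.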
|
The purpose of it is to understand a bloody question I was asked on a test, which was to delete all files that are not binaries going through 3 folders and making sure everything else is still in place.
Also to search inside the files and check if they have text inside containing "password" or "CONFIDENTIAL" and delete the ones that have CONFIDENTIAL text in them. |
Well, if that is the question, it is a little vague, in that a file may be neither a binary nor a text file. Taken at face value, you can use the file command to test first whether a file is not a binary,
and then, if it is a text file, search it for the chosen words; otherwise delete the file. |
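A minimal sketch of that two-step check, assuming "text" means that the file output contains the word "text" (the heuristic used earlier in the thread). It only lists candidates and deletes nothing; find_text_with_keyword is a made-up name.

```shell
#!/bin/sh
# Sketch only: find_text_with_keyword is a hypothetical helper name.
# Prints the text files in directory "$1" that contain the fixed string "$2".
# This is a dry run: nothing is deleted.
find_text_with_keyword() {
    dir=$1
    keyword=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        case "$(file -b -- "$f")" in
            *text*) ;;        # description mentions "text": go on to the keyword check
            *) continue ;;    # anything else is treated as binary and skipped
        esac
        # -F matches the keyword literally; -q only sets the exit status.
        if grep -q -F -- "$keyword" "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, find_text_with_keyword /path/to/check CONFIDENTIAL prints the matching text files; the list can then be checked by hand before removing anything with rm -i.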
Well, technically all files are binary, with text files being those that contain only ASCII or Unicode characters, i.e. what we consider printable (including CR/LF etc.). So although the file command will print out text information, as illustrated above for a video file, the file is still binary. I classify files as either binary or text.
I assume that part 2 of the question was to search the remaining binary files and delete those that contained the keywords password etc. |
Quote:
|
I am wondering if the intent of the question is to probe understanding of the 'grep' command.
If a text file is defined as containing only characters from the ASCII character set, then
Code:
LC_ALL=C grep -I -l -d skip '[[:alnum:]]' ./*
Using the '-a' option to grep allows searching for a text string in a binary file.
Code:
rm $(grep -a -l -d skip -i 'CONFIDENTIAL' ./*)
|
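One caveat with the rm $(grep ...) form is that the shell splits the file list on whitespace, so a name like "my file.txt" would be passed to rm as two arguments. A hedged alternative is a quoted loop; delete_matching is a made-up name, and -a is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# Sketch only: delete_matching is a hypothetical helper name.
# Deletes every regular file in directory "$1" that contains the fixed
# string "$2", ignoring case. -a makes grep search binary files as text.
# Quoting "$f" keeps file names with spaces intact.
delete_matching() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        if grep -a -q -i -F -- "$2" "$f"; then
            rm -- "$f"
        fi
    done
}
```

Usage would be delete_matching /path/to/check CONFIDENTIAL. Since this really deletes files, it is worth trying on a copy of the directory first.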
Would you consider a PDF file to be text or binary?
Especially one with embedded images? How about a text file, but written in a non-Latin encoding? |