LinuxQuestions.org


MrTux 02-08-2016 09:20 PM

How to check if files are binaries?
 
I am wondering if there is a way to check whether a number of files in a directory are binaries or not?

For example a folder with a mix of pictures, programs and music.

timl 02-08-2016 09:27 PM

something like this:

Quote:

[tim@dragon ~]$ file /usr/bin/gcc
/usr/bin/gcc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=a29efcfd6b3e61e694a2f96057c349727be25cc0, stripped

fanfoot 02-08-2016 09:37 PM

Try the file command, e.g.:

file <filename>

If it's text, the output will include the word "text", like this:

[prompt]$ file a
a: ASCII text

(in this case it was a bash shell script, but still text of course). Or for a "binary" file:

[prompt]$ file a
a: ELF 32-bit LSB executable, ARM, version 1 (SYSV), for GNU/Linux 2.6.16, dynamically linked (uses shared libs), not stripped

Anyway, since it doesn't include the word "text", it's "binary".

MrTux 02-08-2016 09:43 PM

Say there were like 200 files, how would you discard the ones that are binaries?

Don't media files qualify as non-text files too?

timl 02-08-2016 11:42 PM

AAMOI MrTux, what is the purpose of this exercise? That might make it easier.

BTW I checked a video and I got:

Quote:

RIFF (little-endian) data, AVI, 640 x 272, 25.00 fps, video: XviD, audio: MPEG-1 Layer 3 (stereo, 48000 Hz)
So neither text nor binary

astrogeek 02-09-2016 12:53 AM

Rewording your requirement to "show only text files", try this...

Code:

file path/to/check/* | sed -n '/text/p'
But as a previous poster said, a better description of what you actually want to do would help.
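
One catch with filtering the whole output line: a file whose name contains "text" matches too. A minimal sketch that looks only at the description, assuming the usual file(1) with its -b (brief) option; the function name list_text_files is made up for illustration:

```shell
# list_text_files DIR: print regular files in DIR that file(1) describes as text.
# Hypothetical helper; file -b leaves the file name out of the output,
# so only the description is tested.
list_text_files() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue            # skip directories and unmatched globs
    case "$(file -b "$f")" in
      *text*) printf '%s\n' "$f" ;;    # description mentions "text"
    esac
  done
}
```

Call it as list_text_files path/to/check to get one file name per line.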

MadeInGermany 02-09-2016 08:28 AM

Code:

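for f in /path/to/*
do
  # Perl's -T file test guesses whether a file is text.
  # Exit 0 (shell success) for a regular text file, 1 otherwise.
  if perl -e 'exit((-f $ARGV[0] && -T $ARGV[0]) ? 0 : 1)' "$f"
  then
    echo "$f is a text file"
  fi
done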


malekmustaq 02-09-2016 10:14 AM

Quote:

I am wondering if there is a way to check whether a number of files in a directory are binaries or not?
If you truly mean "a number", only you can tell the count after segregating all the files in the folder. Your task requires identifying and segregating the files, then counting them. If I were facing the same problem, I would start by isolating the files that are easy to recognize. First, I would use an image browser to collect all thumbnailed image, video and audio files and put them into a separate folder. Next, collect all archive files like *tgz, *zip, *gz, *odt, *odf, etc. and put them into another folder.

The remaining files should be fewer, and quicker for a search script to handle.

You can also manually examine/compare among remainder files using:
Code:

--$ file <filename>
or
Code:

--$ strings <filename>
Normally an executable binary contains the names of the shared libraries (*.so) it loads at run time, as well as the strings it prints. This way you can tell whether it is a binary. There are, though, other files you can hardly identify that way, but there is another method: run it or cat it
Code:

--$ ./<filename>
or
Code:

--$ cat <filename>
Finally, there are many fora where bash experts gather; go and ask there for a script that does this.

Hope that helps. Good luck.

m.m.

grail 02-09-2016 10:40 AM

You could also look into using the --mime-type / --mime-encoding options for file.
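
A rough sketch of how that could look, assuming the common file(1) implementation, where --mime-encoding prints a charset such as us-ascii or utf-8 for text and the literal word binary otherwise. The names is_binary and classify_dir are made up for illustration:

```shell
# is_binary FILE: succeed if file(1) reports the encoding as "binary".
# Hypothetical helper; --brief leaves the file name out of the output.
is_binary() {
  [ "$(file --brief --mime-encoding "$1")" = "binary" ]
}

# classify_dir DIR: label every regular file in DIR as binary or text.
classify_dir() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue            # skip directories and unmatched globs
    if is_binary "$f"; then
      printf 'binary: %s\n' "$f"
    else
      printf 'text:   %s\n' "$f"
    fi
  done
}
```

Run classify_dir /path/to/folder to see which side each file falls on before deciding what to do with it.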

MrTux 02-09-2016 01:20 PM

The purpose of it is to understand a bloody question I was asked on a test, which was to delete all files that are not binaries going through 3 folders and making sure everything else is still in place.

Also to search inside the files and check if they have text inside containing "password" or "CONFIDENTIAL" and delete the ones that have CONFIDENTIAL text in them.

grail 02-09-2016 02:12 PM

Well, if that is the question, it is a little vague, in that a file may be neither a binary nor a text file. So, taken at face value, you can use the file command to test first whether a file is not a binary and then, if it is a text file, search it for the chosen words; otherwise delete the file.
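
Since the question is vague, here is only a dry-run sketch of one possible reading, not a definitive answer. Assumptions: "not a binary" means file --mime-encoding reports something other than binary, and any remaining file containing the string CONFIDENTIAL also goes. The name plan_cleanup is made up, and nothing is actually deleted:

```shell
# plan_cleanup DIR...: dry run; prints what one reading of the task would delete.
# Assumed reading: remove every non-binary file, plus any remaining file that
# contains the string CONFIDENTIAL. Only prints; it never calls rm.
# Limitation: file names containing newlines are not handled.
plan_cleanup() {
  find "$@" -type f | while IFS= read -r f; do
    enc=$(file --brief --mime-encoding "$f")
    if [ "$enc" != "binary" ]; then
      printf 'would delete (not binary): %s\n' "$f"
    elif LC_ALL=C grep -a -q 'CONFIDENTIAL' "$f"; then
      printf 'would delete (CONFIDENTIAL): %s\n' "$f"
    fi
  done
}
```

Something like plan_cleanup folder1 folder2 folder3 lets you review the list first; only once it looks right would you replace the printf lines with rm.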

michaelk 02-09-2016 03:08 PM

Well, technically all files are binary, with text files being those that contain only ASCII or Unicode characters, i.e. what we consider printable (including CR/LF etc.). So although the file command prints descriptive information for a video file, as illustrated above, it is still binary. I classify files as either binary or text.
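
If you want a binary-or-text split without relying on file, one crude heuristic (an assumption on my part, not a formal definition) is that ordinary ASCII or UTF-8 text never contains a NUL byte. A minimal sketch; the name has_nul is made up:

```shell
# has_nul FILE: succeed if FILE contains at least one NUL byte.
# Compares the byte count before and after stripping NULs with tr.
has_nul() {
  total=$(wc -c < "$1")
  stripped=$(tr -d '\000' < "$1" | wc -c)
  [ $((total - stripped)) -gt 0 ]
}
```

For example, has_nul somefile && echo "probably binary". Note that UTF-16 text is full of NUL bytes, so this heuristic would call it binary.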

I assume that part 2 of the question was to search the remaining binary files and delete those that contained the keywords password etc.

jpollard 02-09-2016 05:43 PM

Quote:

Originally Posted by timl (Post 5497395)
AAMOI MrTux, what is the purpose of this exercise? That might make it easier.

BTW I checked a video and I got:

So neither text nor binary

No - those are binary files.

allend 02-09-2016 07:37 PM

I am wondering if the intent of the question is to probe understanding of the 'grep' command.
If a text file is defined as containing only characters from the ASCII character set, then
Code:

LC_ALL=C grep -I -l -d skip '[[:alnum:]]' ./*
would list the text files in the current directory. (I have included LC_ALL=C as a safeguard).

Using the '-a' option to grep allows searching for a text string in a binary file.
Code:

rm $(grep -a -l -d skip -i 'CONFIDENTIAL' ./*)
would delete all files containing "CONFIDENTIAL" regardless of case.
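
The command substitution above splits on whitespace, so file names with spaces break it, and rm complains when nothing matches. A more defensive sketch, assuming GNU grep and xargs (-Z, -0 and -r are not universal); the name delete_matching is made up:

```shell
# delete_matching PATTERN FILE...: remove files whose contents match PATTERN,
# ignoring case. Assumes GNU grep and xargs: -Z and -0 pass file names
# NUL-terminated, and -r skips rm entirely when nothing matched.
delete_matching() {
  pattern=$1
  shift
  grep -a -l -Z -d skip -i -e "$pattern" -- "$@" | xargs -0 -r rm --
}
```

Usage would be delete_matching 'CONFIDENTIAL' ./* and it is worth running the grep part alone first to see what would be removed.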

JeremyBoden 02-09-2016 07:56 PM

Would you consider a PDF file to be text or binary?
Especially one with embedded images?

How about a text file, but written in a non-Latin encoding?


All times are GMT -5.