[SOLVED] ASCII clarification

Ronin-8 · 04-27-2010, 09:00 PM

Hello all,

I'm not sure if this is an appropriate question to be posting on this site. It's not a Linux-specific question, but since I use Linux I thought it would be okay.

I'm reading up on ASCII and was wondering if someone would be able to tell me if I have it right. I'm not sure if I'm 100% correct but this is what I've picked up so far:

A file is stored as a long list of bytes.

A byte has 256 combinations

A single byte represents any one of the 256 characters

Each character is a byte, so:

"A" 01000001 is a byte
"?" 00111111 is a byte
"3" 00000011 is a byte

A file name can be up to 256 characters, so that means up to 256 bytes.

If a file contained only one word like "Hello" then the size of that file would be 5 bytes.

If I'm right about what I've learned so far, I guess what I would like to know is: where do decimal, hexadecimal and octal numbers come into the picture?

Sorry if this is obvious but I'm just starting to learn both Linux and computers and would like to have a clear understanding of how this works.

Thank You.

David the H. · 04-27-2010, 11:07 PM

AIUI, the octal, decimal and hexadecimal entries are simply alternate ways to represent the byte sequences in more human-accessible forms. Depending on the programming environment, one base can be more convenient to use than another, but otherwise they are all equivalent. The Wikipedia entry on hexadecimal explains it like this:

Quote:

Each hexadecimal digit represents four binary digits (bits) (also called a "nibble"), and the primary use of hexadecimal notation is as a human-friendly representation of binary coded values in computing and digital electronics. For example, byte values can range from 0 to 255 (decimal) but may be more conveniently represented as two hexadecimal digits in the range 00 through FF. Hexadecimal is also commonly used to represent computer memory addresses.

It's fairly easy to convert byte sequences between binary, octal, and hexadecimal bases, which is why they're all commonly used in programming. But converting to and from decimal is a bit trickier, and it's mostly used when something needs to be human-readable.
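
A minimal Python sketch of the same idea, assuming an example value of 65 (any value from 0 to 255 would do). It prints one byte value in all four bases and shows why binary and hex convert so easily: each hex digit covers exactly four bits.

```python
# One byte value, written four ways. The value itself never changes;
# only the notation used to display it does.
value = 65  # example value chosen for illustration (it is ASCII "A")

as_binary = format(value, "08b")   # eight binary digits
as_octal = format(value, "03o")    # three octal digits
as_decimal = str(value)            # ordinary decimal
as_hex = format(value, "02X")      # two hexadecimal digits

print(as_binary, as_octal, as_decimal, as_hex)

# Each hex digit maps to exactly four bits (one nibble), so binary -> hex
# can be done one nibble at a time without any arithmetic on the whole value.
high_nibble, low_nibble = as_binary[:4], as_binary[4:]
hex_from_nibbles = format(int(high_nibble, 2), "X") + format(int(low_nibble, 2), "X")
print(hex_from_nibbles)
```

Parsing any of the four strings back with the matching base (`int(text, 2)`, `int(text, 8)`, and so on) gives the same number again.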

pixellany · 04-27-2010, 11:31 PM

As you learn things, keep focussed on the groupings of definitions. For example, these are all names for number systems:
binary
octal
decimal
hexadecimal

None of these has anything to do with the definitions of
bit
nibble
byte

And neither group has anything to do with:
ascii
unicode
ebcdic
and other character encoding schemes

To take one cut thru this, let's first define a "byte" by its number of bits:
1000 in binary
10 in octal
8 in decimal or hex

but its **meaning** may be different in ascii, unicode, or ebcdic

So there are at least 3 ways to define something:
What is it?
How is it measured?
What does it do?
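
To make the three groupings concrete, here is a small Python sketch. It assumes Python's built-in `cp037` codec as a stand-in for EBCDIC (one EBCDIC variant among several) and UTF-16 as one example of a Unicode encoding. The first part writes the number of bits in a byte in each number system; the second part shows that the same character is stored as different byte values under different encoding schemes.

```python
# Number systems: the count "eight" written in four notations.
bits_per_byte = 8
bits_in_each_base = {
    "binary": format(bits_per_byte, "b"),
    "octal": format(bits_per_byte, "o"),
    "decimal": format(bits_per_byte, "d"),
    "hex": format(bits_per_byte, "x"),
}
print(bits_in_each_base)

# Character encodings: the same character, stored differently.
char = "A"
ascii_bytes = char.encode("ascii")        # ASCII
ebcdic_bytes = char.encode("cp037")       # assumed EBCDIC variant (IBM code page 037)
utf16_bytes = char.encode("utf-16-be")    # a Unicode encoding; two bytes for this character

for name, raw in [("ascii", ascii_bytes), ("ebcdic", ebcdic_bytes), ("utf-16", utf16_bytes)]:
    print(name, raw.hex(), len(raw) * bits_per_byte, "bits")
```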

MTK358 · 04-28-2010, 07:17 AM

binary, hex, etc. are just ways of representing numbers. Remember, it's still the same value, just represented in a different way.

bit, byte, etc. have to do with the way computers store numbers (computers use binary):
a "bit" is a binary digit, a 1 or 0.
a "byte" is an 8-bit binary number. A lot of the computer's design is byte-centric. The RAM is basically an array of byte-size storage cells. Your hard drive stores data in units of bytes. Your CPU's word size is a multiple of 8, to make it easier to process bytes.

A file on the hard drive is an array of bytes.

H_TeXMeX_H · 04-28-2010, 01:12 PM

Quote:

Originally Posted by Ronin-8

A file name can be up to 256 characters, so that means up to 256 bytes.

If a file contained only one word like "Hello" then the size of that file would be 5 bytes.

If I'm right about what I've learned so far, I guess what I would like to know is: where do decimal, hexadecimal and octal numbers come into the picture?

Sorry if this is obvious but I'm just starting to learn both Linux and computers and would like to have a clear understanding of how this works.

Thank You.

Yes, try:

Code:

bash-3.1$ printf Hello > te
bash-3.1$ stat -c %s te
5

5 bytes in size, if you add a newline it will be 6.
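
The same check can be sketched in Python, for anyone who prefers it to the shell. The file name `te` is reused from the commands above, and the temporary directory is only there so the sketch cleans up after itself.

```python
import os
import tempfile

# Write "Hello" with and without a trailing newline and compare the sizes
# the filesystem reports.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "te")

    with open(path, "wb") as f:
        f.write(b"Hello")
    size_without_newline = os.path.getsize(path)

    with open(path, "wb") as f:
        f.write(b"Hello\n")
    size_with_newline = os.path.getsize(path)

print(size_without_newline, size_with_newline)
```

The files are opened in binary mode (`"wb"`) so that exactly the bytes given are written, with no newline translation.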

This table is useful in understanding ASCII:
http://www.cs.utk.edu/~pham/ascii.html

johnsfine · 04-28-2010, 01:59 PM

Quote:

Originally Posted by Ronin-8

A file is stored as a long list of bytes.

Basically yes, but there are some definitional quibbles possible.

Quote:

A byte has 256 combinations

Yes.

Quote:

A single byte represents any one of the 256 characters

Larger definitional quibbles on that one.

Quote:

Each character is a byte

In some representations of some character sets that is true (except that it still depends on what the meaning of "is" is).

Quote:

"A" 01000001 is a byte
"?" 00111111 is a byte

Yes.

Quote:

"3" 00000011 is a byte

No. Ascii '3' is not binary 3.

Quote:

A file name can be up to 256 characters, so that means up to 256 bytes.

Depends on the filesystem. I don't know the limit for common filesystems in Linux.

Quote:

If a file contained only one word like "Hello" then the size of that file would be 5 bytes.

The filesystem might keep track of 5 as the nominal size of the file, but the physical size of the file would be rounded up to some allocation unit.
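
A rough Python sketch of that last distinction, assuming a Unix-like system where `os.stat` exposes `st_blocks` (counted in 512-byte units). The allocated figure depends entirely on the filesystem, so no particular value should be expected for it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "te")
    with open(path, "wb") as f:
        f.write(b"Hello")
    info = os.stat(path)

# Nominal size: the number of content bytes the filesystem records.
nominal_size = info.st_size

# Allocated size: st_blocks counts 512-byte units on Unix-like systems.
# The attribute is missing on some platforms, so fall back to None there.
blocks = getattr(info, "st_blocks", None)
allocated_size = blocks * 512 if blocks is not None else None

print("nominal:", nominal_size, "allocated:", allocated_size)
```

On many Linux filesystems the allocated figure comes out as a whole allocation unit (often 4096), but some report something else for very small or freshly written files.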

MTK358 · 04-28-2010, 03:12 PM

Quote:

Originally Posted by johnsfine

No. Ascii '3' is not binary 3.

Yes, good point.

Remember, the number 3 and the ASCII character "3" are completely different.
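
A short Python sketch of the difference: `ord` gives the code of a character, and `format(..., "08b")` shows the bit pattern of a number.

```python
number = 3        # the numeric value three
character = "3"   # the ASCII character "3"

code = ord(character)                 # the character's ASCII code
number_bits = format(number, "08b")   # bit pattern of the number 3
character_bits = format(code, "08b")  # bit pattern of the character "3"

print(number_bits)      # 00000011
print(character_bits)   # 00110011

# The ASCII digits are consecutive, so subtracting the code of "0"
# recovers the numeric value of a digit character.
digit_value = code - ord("0")
print(digit_value)
```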

Ronin-8 · 04-28-2010, 05:11 PM

Hey everyone, thank you for all your replies.

Okay I can now see how using hex is easier than dealing directly with binary. So in what situation would you be writing or reading hex? Are there specific files that have to be written in hex?

Whoops: ASCII '3' in binary is '00110011'.

MTK358 · 04-28-2010, 05:13 PM

Remember, files are not "stored in hex" or "stored in binary". Do you understand that?

Ronin-8 · 04-28-2010, 05:53 PM

No I'm sorry I don't understand. I thought that hardware can only understand binary or "sequences of on's and off's", and since files are stored in hardware it would have to be in binary?

Do you mean that files are not "stored in binary" in the sense that binary is just a human interpretation of the on and off sequences?

jiml8 · 04-28-2010, 06:14 PM

Try man ascii.

And files ARE stored in binary.

pixellany · 04-28-2010, 07:06 PM

Files are not stored in hex, or binary (but see the discussion to follow), or decimal, or octal. Those are all number systems.

Digital storage is in bits. We have seen some definitions here, and you can look it up also. A "bit" is a way of describing an element which can have two states. To be sure, the word "binary" is sometimes used in reference to this 2-state paradigm. Personally, I think it is better to make the distinction between:
Analog data: stored or transmitted as a continuum of voltage or current states
Digital data: stored or transmitted as a series of bits (or bytes, where 1 byte = 8 bits)

Isn't semantics fun?.......

MTK358 · 04-28-2010, 07:23 PM

Technically files are stored in binary, but that is intrinsic to the file system. It is transparent to the user. It's impossible to have "a file in hex" or "a file in decimal".

Imagine a file as an array of numbers, each of which can be an integer from 0 to 255 (inclusive).
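
Here is that picture as a minimal Python sketch. The bytes literal stands in for the contents of a file holding "Hello" plus a newline (what `open(path, "rb").read()` would return for such a file). The same stored values are then displayed in decimal, hex and binary.

```python
# Stand-in for the raw contents of a small text file.
data = b"Hello\n"

# Iterating over bytes yields integers from 0 to 255.
as_numbers = list(data)
print(as_numbers)

# The same values, shown in hex and in binary. Nothing about the
# stored data changes; only the way it is printed does.
as_hex = " ".join(format(b, "02x") for b in data)
as_binary = " ".join(format(b, "08b") for b in data)
print(as_hex)
print(as_binary)
```

This is also where hex tends to show up in practice: hex dump tools print file contents this way because two hex digits per byte are much easier to read than eight binary digits.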

Ronin-8 · 04-28-2010, 07:24 PM

Lol, yes semantics is very fun!

Okay so then files are stored in bits. And number systems such as binary, hex, oct and decimal are ways to represent the state of these bits.

MTK358 · 04-28-2010, 07:26 PM

Quote:

Originally Posted by Ronin-8

Okay so then files are stored in bits. And number systems such as binary, hex, oct and decimal are ways to represent the state of these bits.

Exactly.

I recommend you try programming, especially working with files. That will clear up many things.

(but I still can't imagine the amount of confusion it takes to think that files can be stored in "different number systems"...)