Hi. I'm jon.404, a Unix/Linux/Database/Openstack/Kubernetes Administrator, AWS/GCP/Azure Engineer, mathematics enthusiast, and amateur philosopher. This is where I rant about that which upsets me, laugh about that which amuses me, and jabber about that which holds my interest most: *nix.

assert(technology != magic);

Posted 08-12-2011 at 12:37 PM by rocket357

Some years ago I happened upon an uncomfortable truth regarding source code. I was programming in an interpreted language (the only language I knew when I was 8 years old), and I made the unfortunate realization that without the interpreter, my code was useless. Somehow that seemed disheartening to me, so I vowed I would learn everything there was to know about computers. I vowed that I would understand the machine.

I didn't really know where to turn (I didn't have any "computer geek" friends yet) and I didn't really have any resources, so opening executable files in a text editor was about the best I could do. As you can imagine, that only reinforced how impossibly complex it all seemed, and I eventually gave up in frustration. As I met more and more computer geeks over the years, I would ask the same questions again and again..."how do programs run?", "what do the individual bits in an executable file mean?", "how do compilers turn source code into machine code?" I was routinely met with "just think of it as a black box, man...it's easier that way and you won't go mad trying to figure it out." Bah! Nonsense. I am many things, but "incapable of understanding" is not one of them!

Nothing, for the record, is more frustrating than being told "it's too complicated for you to understand. Just accept that it's magic".

I let the issue slide for years, and when I went to college (take two) I felt that enough time had been wasted...I would understand how code works...not just how to use a compiler or assembler...but how it *really* works internally. I got through college with ease, but felt that so much core knowledge was swept under the ever-present "black box". I asked for an operating system design course to be taught (it was offered, but only taught when enough students collectively requested it), and set off to gather the four other students required to get the course going.

Out of the entire Computer Science Department at my university (which happened to rely heavily on Java and .NET), I found ONE other student who wanted to take the course. Without the remaining three seats filled, the class wouldn't be offered. I tried bribing other students into signing up for the course so they could drop it afterwards, but no one was interested. I begged the Dean to teach the class anyway, to pursue a waiver of some sort, to do WHATEVER...just teach the class...and I was met with a resounding "NO". I got what I could from my education, but realized they only taught you how to wear a suit and tie and show up to write worthless business apps all day in your cubicle. They taught how to make a paycheck, nothing more (well, I got a lot out of the "Data Structures" and "Algorithms" classes). Real understanding wouldn't come from some professor...I'd have to hunt it down on my own.

I decided immediately to learn C (something I'd been too busy learning Java and .NET to do), so I went through the Linux kernel source. Yikes. Not a good starting point for a newbie to C. I then tried to read GCC's code. Ugh...perhaps I *should* just look at it as a black box and forget it. Maybe it *would* be easier that way...

I started looking at the OpenBSD codebase a bit later, and realized that C code *can* be written cleanly and in a readable form. I started building a library of books that were of the highest caliber, and started studying relentlessly. I picked up a copy of the Dragon Book and read it from front to back...and really didn't get it. I read it again and still didn't really get it. I picked up a copy of "The C Programming Language" and "Advanced Programming in the Unix Environment", and things started to make a bit of sense. I went back and re-read the Dragon Book, and parsing started to make sense. I studied assembly on a few different architectures (and acquired many books on the topic, including the Intel Software Developer's Manuals), and started to see the big picture. I now understood how C source became assembly source. At that point, I started looking at assemblers.

How does an assembler, after all of the layers of conversion and analysis are done, convert assembly instructions into machine code? If you've ever studied Intel machine code, you know why this is a big deal...machine code instructions are encoded with the operands in the instruction. In other words, "addl $4, %ecx" is not encoded the same as "addl $8, %ecx". There is a method to the madness, but it is a complex, twisted route of business decisions and compatibility measures. I had always envisioned some exotically complex block of logic that outputs 1's and 0's in whatever seemingly random pattern was required to get the processor to do what the assembly called for...
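
If you've never seen it, here's what I mean by the operands being baked right into the instruction bytes. These are the bytes I *believe* gas emits when it picks its default shortest form (opcode 0x83, ADD r/m32 with a sign-extended 8-bit immediate), so treat the exact values as my own notes rather than gospel:

Code:
#include <stdio.h>

/* Opcode 0x83 is ADD r/m32, sign-extended imm8.  The ModRM byte selects
 * the destination register, and the final byte is the immediate operand
 * itself -- the operands are literally part of the instruction. */
static const unsigned char addl_4_ecx[] = { 0x83, 0xC1, 0x04 };  /* addl $4, %ecx */
static const unsigned char addl_8_ecx[] = { 0x83, 0xC1, 0x08 };  /* addl $8, %ecx */
static const unsigned char addl_4_eax[] = { 0x83, 0xC0, 0x04 };  /* addl $4, %eax */

int main(void)
{
    printf("%02x %02x %02x\n", addl_4_ecx[0], addl_4_ecx[1], addl_4_ecx[2]);
    printf("%02x %02x %02x\n", addl_8_ecx[0], addl_8_ecx[1], addl_8_ecx[2]);
    printf("%02x %02x %02x\n", addl_4_eax[0], addl_4_eax[1], addl_4_eax[2]);
    return 0;
}
Change the register and only the ModRM byte changes; change the immediate and only the last byte changes...the "randomness" I had imagined is really just operand fields.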

So how does an assembler work? I studied nasm and gas to see...and discovered tragically long tables of instruction formats! There's no magic to it. An assembly instruction is mapped to a machine code hex string that gets written to the output file. All of the "business decisions" and "compatibility measures" are modeled in the instruction lookup table. It really is *that* simple. (All of my exceptionally knowledgeable friends went "umm, yeah" when I brought this realization to their attention.)
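
Here's a toy sketch of the idea. It looks nothing like nasm's or gas's real tables (which are enormous and cover every operand form); the struct, the made-up mnemonic strings, and the assemble() helper are purely my own simplification of the principle:

Code:
#include <stdio.h>
#include <string.h>

/* One row per instruction form: the "assembler" is mostly a lookup that
 * maps a form to its fixed opcode/ModRM bytes, then appends the operand. */
struct insn_format
{
    const char   *mnemonic;     /* made-up internal name for the form */
    unsigned char opcode;       /* fixed opcode byte                  */
    unsigned char modrm;        /* fixed ModRM byte                   */
};

static const struct insn_format table[] =
{
    { "addl_imm8_ecx", 0x83, 0xC1 },
    { "subl_imm8_ecx", 0x83, 0xE9 },
};

/* Find the matching row and emit opcode, ModRM, then the immediate byte. */
static int assemble(const char *form, unsigned char imm, unsigned char out[3])
{
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
    {
        if (strcmp(table[i].mnemonic, form) == 0)
        {
            out[0] = table[i].opcode;
            out[1] = table[i].modrm;
            out[2] = imm;
            return 0;
        }
    }
    return -1;
}

int main(void)
{
    unsigned char bytes[3];

    if (assemble("addl_imm8_ecx", 4, bytes) == 0)
        printf("%02x %02x %02x\n", bytes[0], bytes[1], bytes[2]);  /* 83 c1 04 */

    return 0;
}
The smarts live almost entirely in the table; the code just finds the matching row and appends the operand bytes.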

So my next question, which an ancient Unix hacker who works in the QA department of my company so kindly answered for me, was "what is the meaning of the individual instruction formats...rather, what does each bit do in terms of turning circuits in the processor on and off?" The answer? "It's a black box. Intel isn't going to give up that kind of info about the exact implementation." Considering this dude worked for Intel back before the world wide web came into being, I think I'll (finally) accept that as an authoritative answer.

Except that I bought a book on MIPS (some architectures are more open than others. yay). Still not 100% there, but once I learn Verilog I'll be set.

Attempting to stay ahead of the game, I googled "how do transistors work?" I got this lovely bit of knowledge:

Quote:
The purist might argue that current flows from emitter to collector - dependent on whether we are discussing electron flow or "hole" flow. I don't want to get involved in the physics of current flow. You don't need to know this to design a circuit.
Sigh. Off to amazon for more books, I suppose...

Comments

  1.
    Heh, I see you've been caught in the ever-descending spiral of lowering abstraction levels. Apparently you need to be a physics major to understand how printf("Hello, World!\n"); works, LOL.

    Actually, when I first read the bit about interpreted languages, I was almost expecting a rant about compiled vs. interpreted languages, software bloat/overhead, etc., but I guess I was wrong, heh.

    I'm kind of the same way WRT learning about how stuff works on a software level, but I'm maybe not quite as zealous; I don't always go out looking for reading materials on absolutely every detail/component of the system (I don't expect to know everything about the Linux kernel after reading for 1 day ), and if I do go searching Wikipedia or what have you for low-level tech info (I remember briefly reading through the pages on the Translation Lookaside Buffer (TLB) and the various CPU caches), I usually either forget most of it, or I only take from it what I see as potentially helpful to what I'm doing. For instance, knowing how the CPU cache works (at least on a "higher" level) helps a lot with writing efficient, elegant C programs (or any other compiled language, for that matter ).

    To be honest though, I myself haven't done that much WRT programming lately; I've been kinda lonely/depressed over the past week or so, plus other things have come up which I won't mention in this post, but which have (seemingly) affected my ability to stay focused/interested in computers/Linux/programming lately. (I think part of it is just a lack of genuine human social interaction; I practically "live" on the internet. :( )
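
    (Just to illustrate the cache remark above with a contrived sketch of my own; the array size is made up, the point is purely the access pattern:)

    Code:
    #include <stdio.h>

    /* Walking the matrix row-by-row touches adjacent addresses, which the
     * cache handles well; walking it column-by-column strides N ints at a
     * time and tends to miss the cache once N is large enough. */
    #define N 1024

    static int m[N][N];

    static long sum_row_major(void)
    {
        long sum = 0;

        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += m[i][j];     /* sequential: cache-friendly */

        return sum;
    }

    static long sum_col_major(void)
    {
        long sum = 0;

        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += m[i][j];     /* strided: cache-hostile */

        return sum;
    }

    int main(void)
    {
        printf("%ld %ld\n", sum_row_major(), sum_col_major());
        return 0;
    }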

    TL;DR: My philosophy is basically along the lines of, "learn as much as you really think you'll need to, but if you find yourself forgetting what you've learned, or finding that you're not really interested in it, you probably shouldn't bother." In my case, probably the only reason I don't go looking up absolutely everything I can about how my system works on a low level is due to low self-esteem (i.e. I don't feel like I can "know everything" in that sense). Either that, or I've just been lazy, I dunno… :-\

    Yeah, I'm feeling a little "off" today; it feels like I almost don't sound like myself in this comment…

    EDIT: As far as "clean code" goes, I've found Neverball's source to be easy on the eyes (and the head, heh). In particular, what makes it easy on the eyes for me is that it uses a format/style for C I like to call "BNL" (for "Brace on New Line"), where the opening brace for a block of code (e.g. a function definition, an if/else/(switch/case) structure, loops, etc.) goes on its own line:

    Code:
    void blah(int param)
    {
        …
    }
    …as opposed to:

    Code:
    void blah(int param) {
        …
    }
    What I find interesting/unfortunate (IMO) is that the latter style (that is, having the opening brace on the same line) seems to be more common among C/C++ programmers, at least in the code that I've read (the Linux kernel uses this style AFAICT).

    BNL is easier for me to read because it makes it more visually obvious where code blocks/nests begin and end, whereas when you put the opening braces on the same lines as the declarations, you have to really look at the code's semantics more than its visual structure, I guess. :-\

    Anyways, now I'm really done, LOL. Sorry for this monster of a comment; it's practically big enough to be its own entry…
    Posted 08-12-2011 at 03:06 PM by MrCode
    Updated 08-12-2011 at 03:26 PM by MrCode
  2.
    I *was* a physics/mathematics student before switching to CS. =D

    The knowledge quest I find myself on...it's a personality flaw, I think. I've been OCD about figuring out how stuff works for as long as I can remember. I briefly gave up on learning everything for a time, but it wasn't natural...it felt artificial to intentionally ignore the drive to learn just so I would have time to do other stuff. I dunno. I don't necessarily actively remember everything I read, but I do walk away with a thorough understanding of the *concept* behind the operation I'm studying. I might not be able to recite the exact number of registers an UltraSPARC IIi has, for instance, but I know that it has many more physical registers than "logical" registers, and that it has a sliding register "window" that "masks off" physical registers, so calling into and returning from procedures is considerably faster than having to store/load each register at every call. Stuff like that...that's what I aim for.

    I dunno. It's weird, but that's just the way my brain works haha.
    Posted 08-12-2011 at 03:22 PM by rocket357
  3.
    I agree completely, MrCode. The most hideous of all is the second form you mentioned, and then the closing brace indented. Blarg! I just can't get into that style for some reason.
    Posted 08-12-2011 at 03:30 PM by rocket357
  4.
    I'm still learning much of this myself... I love assembly, especially RISC archs like MIPS. I haven't had any close contact with SPARC or ARM yet, but they all look exciting. x86 just scares me. gl & hf with this hacking
    Posted 08-14-2011 at 10:28 AM by Web31337
  5.
    Absolutely, Web31337...x86 is what I like to call "organically grown"...there was considerable planning and forward-looking design, I suppose, but backwards-compatibility really made the arch a mess. It works, but it's difficult to wrap your head around. MIPS is insanely simple and consistent compared to Intel, and even SPARC is easier to digest, IMHO.

    Of course, it helps that MIPS and SPARC were designed from the ground up and didn't evolve forward from 8-bit roots, but that doesn't change the fact that x86 is a mess (and amd64 inherits that mess to a large degree).
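
    To put a little meat on "insanely simple and consistent": every MIPS instruction is a fixed 32-bit word with the fields in the same places, so encoding one is just shifts and ORs. A rough sketch from my own notes (double-check the opcode/register numbers against the MIPS manuals before trusting them):

    Code:
    #include <stdio.h>
    #include <stdint.h>

    /* MIPS I-type layout: opcode(6) | rs(5) | rt(5) | immediate(16).
     * No sprawling table of special cases -- just pack the fields. */
    static uint32_t mips_itype(uint32_t opcode, uint32_t rs, uint32_t rt,
                               uint32_t imm16)
    {
        return (opcode << 26) | (rs << 21) | (rt << 16) | (imm16 & 0xFFFF);
    }

    int main(void)
    {
        /* addi $t0, $t0, 4 -- opcode 0x08, $t0 is register 8 (if my notes
         * are right, this should print 0x21080004). */
        printf("0x%08x\n", (unsigned)mips_itype(0x08, 8, 8, 4));
        return 0;
    }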
    Posted 08-14-2011 at 12:40 PM by rocket357
    Updated 08-14-2011 at 12:44 PM by rocket357
 

  


