Aren't text-based interfaces inherently inefficient? (like procfs)

hydraMax · 12-14-2011, 04:55 AM

A certain aspect of the Linux system paradigm doesn't quite make sense to me: In the procfs and sys directories, there are many files that provide various statistics on the system, in text format; e.g., /proc/meminfo will tell you all about your system memory, and /proc/cpuinfo will tell you all about your CPUs. And the expectation is that programs are supposed to parse these files to get said statistics. (At least from the articles I have read.) Indeed, I am aware of a number of programs that depend heavily on procfs.

Nevertheless, is not a text-based interface an inherently inefficient way to receive information from the system? For example, let's say I was developing a program that needed to know the amount of free memory: Not only would the program have to go to the trouble of skipping over all the ASCII for "MemFree:" (and a bunch of spaces) but then it would have to read the number in ASCII. Now, if the amount was 319956 (as it happens to be in my system) it would need to process a total of 6 bytes (not including a necessary delimiter) to understand a number that, in binary, could be represented in 3. (The problem being more marked when dealing with larger numbers like 34359385596; though perhaps, depending on the design, we might use a number of bits more convenient for our system architecture.) Furthermore, the application is likely going to need to convert the ASCII number to binary, for the purposes of more efficient calculation or storage. (This is the greatest evil.)
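
A minimal Python sketch of the parsing steps and byte counts described above. It assumes a /proc/meminfo-style layout; the sample text and the function names are hypothetical, chosen only for illustration.

```python
def parse_memfree_kb(meminfo_text):
    """Return the MemFree value from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemFree:"):
            # Skip the label and padding, then convert ASCII digits to an int.
            return int(line.split()[1])
    raise KeyError("MemFree")


def encoded_sizes(value):
    """Return (ASCII digit count, minimum whole bytes needed in binary)."""
    ascii_bytes = len(str(value))
    binary_bytes = max(1, (value.bit_length() + 7) // 8)
    return ascii_bytes, binary_bytes


# Made-up sample in the style of /proc/meminfo (an assumption, not real output).
SAMPLE = "MemTotal:        2058028 kB\nMemFree:          319956 kB\n"

if __name__ == "__main__":
    print(parse_memfree_kb(SAMPLE))    # 319956
    print(encoded_sizes(319956))       # (6, 3)
    print(encoded_sizes(34359385596))  # (11, 5)
```

A real binary interface would more likely use a fixed 4- or 8-byte field than the minimal byte count, which is the "more convenient for our system architecture" caveat above.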

Now, one could answer that the text-based representation makes development easier for the programmer. But the reply would be that nobody really cares about the programmer, once the compiling is over, because the performance is what everyone has to continue living with. Furthermore, programmers can always provide for themselves development libraries which hide the complexities of dealing with a binary-based interface.

Cedrik · 12-14-2011, 05:30 AM

The beauty of using ASCII chars to present system info is that you can use simple text tools like cat, grep, cut, awk to display it, in any format you want. Why would you need hyperspeed performance to read system info anyway?
A binary format means a specific program to read it, so more programs to install.
I don't agree that the application is necessarily going to convert ASCII to binary. For display purposes, in most cases it is ASCII in > ASCII out.

H_TeXMeX_H · 12-14-2011, 06:22 AM

I can see that you are using Gentoo...

Well, all I can say is that:
1) It is more important that the info be available in a readable format than for it to be available in a format that is most efficient for programmers.
2) Using procfs is not the only way to get this info, at least some of it.

syg00 · 12-14-2011, 06:31 AM

Huh ???
You think it is "more efficient" to have to write code to reformat data so I can work out how much memory I have - or process ids, or ...?

Feel free to write a kernel module to acquire the data in any form you want.
BTW, do you happen to use X - and still have the audacity to raise a query about efficiency?

Balderdash.

dugan · 12-14-2011, 01:35 PM

Storing them in text format makes them human-readable as well as machine-readable. That's an advantage.

Sed_Awk · 12-15-2011, 08:56 AM

Another reason for text interfaces is that some people don't install X servers/clients on their servers. They log in via ssh and use text tools to get system info.

Cedrik · 12-15-2011, 09:30 AM

Also, performance-wise, reading chars directly takes fewer processor cycles than reading binary numbers and converting them to chars before sending them to stdout. I mean, if every byte counts, let's take processor cycles into account.

Sed_Awk · 12-15-2011, 09:39 AM

Quote:

Originally Posted by Cedrik

I mean, if every byte counts, let's take processor cycles into account.

Yeah, why waste precious computer cycles drawing GUI objects?

DavidMcCann · 12-15-2011, 12:07 PM

This sums up what happens if you rely on unreadable code:

http://www.linuxquestions.org/questi...ml#post4550801

And think of all the problems people have with the Windows registry.

hydraMax · 12-16-2011, 12:32 AM

Many, I dare say, have missed the point of my original post. I never wrote that it was bad to be able to get a text-based representation of system information. What I wrote was that it is inherently inefficient for programs to get system information via a text-based representation of it. Yet, this is the general expectation and ideal held out for us. I could not pull the references off of the top of my head, but I have read several articles in which programmers were encouraged to get system information from the proc and sys files. It should also be noted that many proc and sys files, despite being text-based, are in fact not at all formatted for easy human reading. For an example, run cat /proc/1/stat. One article I read encouraged users not to read proc files directly, but rather to always view the information through an intermediate program (ps for example).
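
To illustrate why a file like /proc/1/stat is awkward to read by eye or by program, here is a minimal Python sketch that picks a few fields out of such a line. The field positions follow the proc(5) man page; the sample line is made up and the function name is hypothetical.

```python
def parse_stat_line(stat_line):
    """Extract a few fields from a /proc/[pid]/stat-style line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split around the LAST closing parenthesis.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    rest = stat_line[close_paren + 1:].split()
    # rest[0] is field 3 (state), so field k of the file is rest[k - 3].
    return {
        "pid": pid,
        "comm": comm,
        "state": rest[0],
        "ppid": int(rest[1]),
        "utime": int(rest[11]),  # field 14, in clock ticks
        "stime": int(rest[12]),  # field 15, in clock ticks
    }


# Made-up sample line, truncated after the stime field (an assumption).
SAMPLE = "1 (init) S 0 1 1 0 -1 4202752 100 200 3 4 15 27"

if __name__ == "__main__":
    print(parse_stat_line(SAMPLE))
```

Nothing in the line itself labels the fields, which is why tools such as ps act as the intermediate layer mentioned above.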

Rebuking me for the "inefficiency" of using X11 is ridiculous. Cycles spent drawing GUI objects are not wasted because they provide me with a direct service. My point was that having our programs interact with the system through an ASCII interface wastes cycles to no benefit; except perhaps for the programmer; though as I mentioned the programmer's comfort is no longer a factor once the program has been compiled.

Drawing a parallel to text configuration files is also ridiculous. Text configuration files exist solely to provide a human interface to the configuration of a program or system. Furthermore, configuration files need (generally speaking) to be read only once by the program upon execution, or, as in the example cases of certain postfix configuration files, are translated into a binary database or hash format before first being used.

Cedrik's point seems to be more relevant to the discussion, but I believe it can be answered: First of all, we are not taking into account the fact that, in order to provide the text interface, the system itself must first translate the system data from binary format to ASCII format. Unless, of course, the system is storing and processing all those numbers, UIDs, and so forth, in ASCII format, which I truly hope is not the case! And to answer the principal objection, we should recognize that, although there are some programs that simply take the system data and output it raw to stdout, this cannot be assumed to be the case. (One could give many counter-examples.) The data should be provided in its raw form; it should be up to the application layer to decide how it wishes the data to be displayed (if at all) and to format it for its own purposes.

And finally, I will respond to the question of "why would you need hyperspeed performance to read system infos?" by stating that hyper-speed performance is an inherent good, and that, being as system information is quite often read, this would seem like a very sensible place for optimization.

Cedrik · 12-16-2011, 04:26 AM

The "system" (the kernel) converts binary into ASCII chars once to produce the /proc info, in most cases, while the text utilities that access it can be used more often.

Standard output uses ASCII chars, and the info is usually displayed on stdout, so in order to display it you need ASCII chars... Efficiency-wise, you cannot do better than reading ASCII chars and outputting ASCII chars.

You can always get this info in raw form if you want, e.g. the syscall for getting the uid returns a binary number:

Code:

mov     eax,24  ; i386 syscall 24 = getuid
int     80h     ; trap into the kernel
; eax now holds the uid as a 32-bit binary number

But to display this binary number, you have to write a function that converts the 32-bit binary number into a few 8-bit chars = more CPU cycles = slower than reading 8-bit chars and displaying 8-bit chars.
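
The conversion described above can be sketched in Python as follows. This is only an illustration of the repeated divide-by-10 technique, assuming the raw value arrives as 4 little-endian bytes; the function name is hypothetical.

```python
import struct


def u32_to_decimal_ascii(raw):
    """Convert 4 little-endian bytes into ASCII decimal digits."""
    (value,) = struct.unpack("<I", raw)
    digits = bytearray()
    while True:
        # Peel off the least significant decimal digit on each pass.
        value, remainder = divmod(value, 10)
        digits.append(ord("0") + remainder)
        if value == 0:
            break
    digits.reverse()  # digits were produced lowest-first
    return bytes(digits)


if __name__ == "__main__":
    print(u32_to_decimal_ascii(struct.pack("<I", 1000)))  # b'1000'
```

Each output digit costs one division, so the work grows with the number of digits to display.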

[edit]
Sorry, the uid example is actually irrelevant, there is no uid info in /proc (ok yes, there is, if you cat /proc/self/status...)
But the theory stands, and the more info you have to read, the more binary-to-char conversions you have to make if it is provided in binary...
To sum up, accessing the info in binary may speed up reading, but it slows down displaying, makes the CPU work more, needs a specific program to access it, etc.

jlinkels · 12-16-2011, 04:34 AM

Yes, it is inefficient. But the inefficiency is far outweighed by the advantage that this is readable text, accessible by anything.

Text-based configuration and interfaces are part of the Unix philosophy, later adopted by Linux to ensure openness.

Processing the /proc information might be 10 times as inefficient as binary, but since you use this information so seldom (maybe a few times per second) the cost is 10 times nothing.
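
The "10 times nothing" estimate can be checked with a rough timing sketch like the one below. It is only an assumption-laden illustration: the sample record, the 8-byte binary layout, the polling rate and the function names are all hypothetical, and the measured numbers will vary by machine.

```python
import struct
import timeit

# Hypothetical records: one text line and the same value as 8 binary bytes.
SAMPLE_TEXT = "MemFree:          319956 kB\n"
SAMPLE_BINARY = struct.pack("<Q", 319956)


def read_text(buf=SAMPLE_TEXT):
    """Parse the number out of the text record."""
    return int(buf.split()[1])


def read_binary(buf=SAMPLE_BINARY):
    """Unpack the number from the binary record."""
    return struct.unpack("<Q", buf)[0]


def seconds_per_call(func, calls=100_000):
    """Average wall-clock seconds for one call of func."""
    return timeit.timeit(func, number=calls) / calls


if __name__ == "__main__":
    polls_per_second = 5  # assumed polling rate ("a few times per second")
    for name, func in (("text", read_text), ("binary", read_binary)):
        cost = seconds_per_call(func)
        share = cost * polls_per_second
        print(f"{name}: {cost:.2e} s per read, {share:.2e} s of each second")
```

This times only the parsing step in user space; it does not measure the kernel's side of the work or the read system call itself.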

jlinkels

sundialsvcs · 12-16-2011, 07:15 AM

The "inefficiency" is irrelevant. Computers routinely execute hundreds of millions of instructions per second now, sometimes "per CPU," and what really matters most is that such information is exceptionally easy to get to by any sort of procedure that you care to use. Including the cat command. (Meow.)

dugan · 12-16-2011, 11:49 AM

Quote:

Originally Posted by jlinkels

Processing the /proc information might be 10 times as inefficient as binary, but since you use this information so seldom (maybe a few times per second) the cost is 10 times nothing.

That is correct. There would be absolutely no benefit to this optimization. At all. Whereas the cost, making this information inaccessible to Unix's standard toolchain, would be massive.

Quote:

being as system information is quite often read

Is it? It looks to me as if the exact opposite is true.

Now, have you ever seen a source code profiler detect a delay in the part of a computer program that reads this information? No? That's what I thought. That means that if you think that this very costly optimization would bring you "hyper-speed performance", or any performance increase at all, you're lying to yourself.

hydraMax · 12-16-2011, 04:15 PM

Quote:

Originally Posted by Cedrik

The "system" (the kernel) converts binary into ASCII chars once to produce the /proc info, in most cases, while the text utilities that access it can be used more often.

So, the kernel actually stores all this data in ASCII format in volatile memory, and then simply copies it out on request? Or does it make a binary-to-ASCII conversion whenever a call is made to read data from a proc file? You seem to be affirming the former, but as it is an important technical point I was hoping for clarification.