How does the command interpreter understand the meaning of a command?
The shell will first parse the string that you entered. In your example it will say
"hey, user wants to run the command cat"
"next, do I know '>'? Yes: the user wants to redirect the output to a file"
"wait, what does redirection require? Oh, a file, so f_1 must be the file where the user wants to store the result"
So in the end it has a command, a redirect instruction and the file to write the output to. It also keeps track of the arguments that you specified; arguments are basically the pieces of text on the command line that the interpreter itself does not interpret (like -A, a filename, etc.).
Next, the shell checks whether it knows the requested command internally (as a builtin).
If it finds it, it executes the internal function and passes it the arguments (-A, filename, etc.).
If it does not find it, it checks the directories listed in the PATH variable for a file (program) called 'cat'. If it finds one, it executes that file with the arguments that you passed (-A, filename, etc.).
If it can't find the file, the shell informs you with an error message.
The executed program processes the arguments (-A, filename, etc.) and does whatever it is supposed to do, writing its output to stdout. If no redirection was requested, stdout is still connected to the screen, so the result appears there; otherwise (in your case) the shell attached stdout to the file before starting the program, so the output ends up in the file.
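To make that sequence concrete, here is a toy sketch in Python of what the shell is doing (tokenize, spot the redirect, search PATH, run the program). This is an illustration only, not how any real shell is implemented, and run_simple_command is just a name I made up:

```python
import shlex
import shutil
import subprocess

def run_simple_command(line):
    """Handle a line like "cat -A filename > f_1", very simplified."""
    tokens = shlex.split(line)          # break the line into words

    # Look for a redirection; everything before '>' is the command.
    outfile = None
    if ">" in tokens:
        i = tokens.index(">")
        outfile = tokens[i + 1]         # the file after '>' receives stdout
        tokens = tokens[:i]

    cmd, args = tokens[0], tokens[1:]

    # Search the directories in PATH for the program.
    path = shutil.which(cmd)
    if path is None:
        raise FileNotFoundError(f"{cmd}: command not found")

    # Run it; stdout goes to the screen unless a redirect was requested,
    # and the redirection is set up *before* the program starts.
    if outfile:
        with open(outfile, "w") as f:
            subprocess.run([path] + args, stdout=f)
    else:
        subprocess.run([path] + args)
```

Note that a real shell also handles builtins, pipes, quoting rules and much more; the point here is only the order of the steps.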
Last edited by Wim Sturkenboom; 08-20-2012 at 01:47 AM.
It sounds like you are interested in the underlying mechanism for parsing the commandline string. One method that may be used in modern shells is based on tools such as lex and yacc (or in GNU/Linux, flex and bison). These are tools that are used to create parsers. Using these tools, parsers can be created that do their work in two parts.
One part is called a lexical analyzer. The lexical analyzer simply breaks up the input stream (the commandline, in the case of a shell) into tokens. The tokens have 'types' and values, but do not have semantic properties. For instance, a token may be a 'word' consisting of those characters that would be valid as a program name. Or a token may be numeric, either integer format, some real number format, perhaps even imaginary. A token may be an 'operator', such as arithmetic operators, assignments, etc. Tokens are identified by a lexical analyzer without any contextual relationships. In other words, an operator is just an operator, regardless of its position in the input stream. The same goes for any other type of token. The lexical analyzer only identifies the type and value of the tokens.
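For illustration (this is a toy, not the lexer of any real shell), a lexical analyzer along those lines can be written in a few lines of Python; it tags each token with a type and a value and assigns no meaning to their order:

```python
import re

# Each token type is tried in order; the first match wins.
TOKEN_SPEC = [
    ("NUMBER",   r"\d+"),          # integer literals
    ("WORD",     r"[\w./-]+"),     # things that could name a program or file
    ("OPERATOR", r"[><|&=;]"),     # redirection, pipes, assignment, ...
    ("SPACE",    r"\s+"),          # separators, discarded below
]

def tokenize(text):
    """Return (type, value) pairs; no context, no semantics."""
    tokens = []
    pos = 0
    while pos < len(text):
        for ttype, pattern in TOKEN_SPEC:
            m = re.match(pattern, text[pos:])
            if m:
                if ttype != "SPACE":
                    tokens.append((ttype, m.group()))
                pos += m.end()
                break
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
    return tokens
```

For example, tokenize("cat -A f_1 > out") produces WORD, WORD, WORD, OPERATOR, WORD; the lexer has no idea that the '>' means redirection. That interpretation belongs to the grammar.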
The other part of a parser is the grammar. The grammar is a rigorous specification that defines how sequences of tokens can be interpreted. The grammar specifies one or more general expressions, made up of sequences of tokens of specified types. Also associated with an expression is a potential action to be performed when such an expression is recognized in the input stream. The grammar may use the values of some of the tokens in order to perform the action.
In a shell commandline interpreter, an example of an expression might be a simple command to be executed. The grammar would define this as simply a token in isolation, containing only those characters that can be used to compose a filename. The shell, having recognized such an expression, would take the value of the token (a string containing the token itself), and use it as an argument to an exec() function. Another example of an expression in the grammar of a shell commandline interpreter would be a comment. The grammar would describe this as something like 'zero or more whitespace characters, followed by the '#' character, followed by zero or more of any characters.' The action to perform upon recognizing such an expression would be to do nothing at all.
Yet another example of an expression would be an assignment: a token of type 'variable', followed by the '=' character, followed by some other expression. The action would be to evaluate the expression (remembering the value), create an instance of the variable (the grammar doesn't specify how this is done), and associate the instance of the variable with the value of the expression already evaluated.
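A minimal sketch of such grammar rules with actions, written in plain Python rather than a real parser generator (the rule set and the function name are mine, purely for illustration):

```python
import re

def interpret(line, variables):
    """Toy grammar: comment | assignment | simple command.
    Each rule has an action; the return value describes what was done."""
    stripped = line.lstrip()            # 'zero or more whitespace characters'

    # Rule: '#' followed by zero or more of any characters -> a comment.
    # The action is to do nothing at all.
    if stripped.startswith("#") or stripped == "":
        return ("comment",)

    # Rule: a 'variable' token, then '=', then some expression.
    # The action: evaluate the expression and bind the variable to it.
    m = re.match(r"([A-Za-z_]\w*)=(.*)$", stripped)
    if m:
        name, expr = m.groups()
        variables[name] = expr          # 'evaluation' is just the literal text here
        return ("assignment", name)

    # Rule: a token in isolation -> a command name to hand to exec().
    return ("command", stripped.split())
```

So with env = {}, interpret("GREETING=hello", env) stores hello under GREETING, while interpret("   # a note", env) does nothing at all, exactly as the grammar rule for comments prescribes.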
The grammar used by a shell commandline interpreter would of course be fairly complex. You can use the tools flex and bison to create parsers for applications of your own, and for some of us, it is a subject of some fascination. There are also other types of parsers that use other techniques, and other tools that generate other types of parsers. I have no knowledge of whether modern and common shells do use flex & bison to generate their parsers, but I strongly suspect that some formal parser generators are used in at least some of them.
The exact parsing order is still a bit unclear to me in a few details, particularly the first few steps, but I'm pretty sure it goes something like this:
1) The line is first broken up into words/tokens, separated by whitespace. Quotation marks and escape backslashes will be processed to determine the exact tokens, and removed from the line.
2) The line is parsed to look for command list separators and multi-line constructs. If necessary, additional lines will be grabbed, and/or it will be broken up into multiple commands for separate processing.
3) The first word of each recognized simple command is inspected and checked against the defined aliases. If one matches, the shell substitutes it, and the resulting command line is re-processed (minus recursive alias checks). Somewhere in here any environment parameters that are being passed to the command will also be processed and saved for the execution step.
4) The command is scanned for redirection patterns. The necessary file descriptors will be set up, and those tokens removed from the line.
5) Brace and tilde expansions are performed on any tokens that were not initially protected by quotes. This may result in new tokens being created.
6) Each token is processed in left-to-right order, with variable substitutions, command substitutions, process substitutions, and arithmetic expansions completed and substituted, unless they were protected by single quotes/escapes.
7) Word splitting is also performed on any expanded tokens that were not initially protected by double-quotes. Unlike the original tokenizing in the first step, this word splitting is done based on the setting of the IFS variable, which may not be whitespace.
9) Pathname (globbing) expansions are done, again on unquoted patterns, which also may result in new tokens being created. Note that since this is done after the word-splitting step, spaces in the matched filenames do not result in new tokens being created.
10) The first token becomes the command name (function/executable/keyword/whatever), and its location is determined (i.e. the PATH is checked), so that the final command can be assembled and error-checked accordingly.
10) The final command is assembled and passed to the system for execution.
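To see why the ordering of steps 6-8 matters, here is a much-simplified Python model (the names and details are mine, and real bash does far more): variable expansion first, then IFS word splitting on the expanded text, then globbing on each resulting word:

```python
import glob
import re

def expand(tokens, variables, ifs=" \t\n"):
    """Toy model of steps 6-8: expand variables, split on IFS, then glob."""
    result = []
    for tok in tokens:
        # step 6: substitute $NAME occurrences with the variable's value
        expanded = re.sub(r"\$(\w+)",
                          lambda m: variables.get(m.group(1), ""), tok)
        # step 7: split the *expanded* text on IFS characters
        words = [w for w in re.split("[" + re.escape(ifs) + "]+", expanded) if w]
        # step 8: glob each word; a space inside a matched filename is safe,
        # because splitting has already happened
        for w in words:
            matches = glob.glob(w)
            result.extend(sorted(matches) if matches else [w])
    return result
```

Because splitting happens after expansion, an unquoted $FILES containing "one two" becomes two words; because globbing happens after splitting, a filename with a space that comes out of a glob stays one word.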