LinuxQuestions.org


lorebett 09-01-2006 02:42 AM

detect shell script language
 
Hi

I need to detect the actual programming language of a script.

A way of detecting it is to examine the first line searching for the "sha-bang" (#!), e.g.,

#!/bin/bash

or

#!/usr/bin/perl
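A minimal sketch of the first-line check described above, written in Python for illustration. The function name `interpreter_from_shebang` is hypothetical, and the handling of `/usr/bin/env` is an assumption about what a detector would want, not something stated in this thread.

```python
import posixpath


def interpreter_from_shebang(first_line):
    """Return the interpreter name from a '#!' line, or None if there is none."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    name = posixpath.basename(parts[0])
    # '#!/usr/bin/env perl' names the real interpreter in a later word;
    # skip env options such as '-S' (assumed handling).
    if name == "env":
        args = [p for p in parts[1:] if not p.startswith("-")]
        if not args:
            return None
        name = posixpath.basename(args[0])
    return name
```

For example, `interpreter_from_shebang("#!/usr/bin/perl")` gives `"perl"`, while a line that does not start with `#!` gives `None`.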

However, there are cases where this is not enough: a script whose first line is #!/bin/sh may actually be written (and interpreted) in another language, e.g., Tcl.

So my question is, is there another way of detecting the actual language? I mean, another convention?

Another guy told me that a possible way is to use, on the second line, a pattern like

-*- <interpreter> -*-

and I found some tcl scripts that look like this.

Is there a specific standard convention?

many thanks in advance

druuna 09-01-2006 05:11 AM

Hi,

Did you take a look at the file command?

Example:

cat a01 a02 a03
#!/bin/bash
#!/usr/bin/perl
#!/bin/csh

file a01 a02 a03
a01: Bourne-Again shell script text executable
a02: perl script text executable
a03: C shell script text executable

I don't know if this solves your tcl/interpreter problem, but you can give it a try.

Hope this helps.

jlliagre 09-01-2006 06:13 AM

Quote:

Originally Posted by lorebett
I need to detect the actual programming language of a script.
A way of detecting it is to examine the first line searching for the "sha-bang" (#!), e.g.,

#!/bin/bash

or

#!/usr/bin/perl

This is a good hint.

Quote:


However, there are cases where this is not enough: a script whose first line is #!/bin/sh may actually be written (and interpreted) in another language, e.g., Tcl.
As you noticed, a single script can have parts written in different languages, so your question doesn't have a single answer.

Also, a script can be written in such a way that it is compatible with more than one language.
Quote:

So my question is, is there another way of detecting the actual language? I mean, another convention?

Another guy told me that a possible way is to use, on the second line, a pattern like

-*- <interpreter> -*-

and I found some tcl scripts that look like this.

Is there a specific standard convention?
There is no real standard beyond the shebang and file extensions.

lorebett 09-01-2006 09:45 AM

Quote:

Originally Posted by jlliagre
This is a good hint.
There is no real standard beyond the shebang and file extensions.

thanks for your information! :)

so I can only use some heuristics (like the -*-) that are not guaranteed to work every time...

I need these heuristics to determine the source language in order to use the correct highlighting in the software I maintain, GNU Source-highlight http://www.gnu.org/software/src-highlite/

I noticed that emacs somehow detects the language pretty correctly (without an extension and with a "wrong" shebang), so I'll have to check what it's doing.

Lorenzo

makyo 09-01-2006 11:36 AM

Hi.

This is a question that I have considered from time to time.

If there were a single syntax for calling, say, a standard script and getting the result, one could look at the processes and try to find the parent (or at least one ancestor) that was responsible for the current (part of a) script.

I have not had much luck. To depend on the sh-bang, internal comments, etc., is unreliable, but probably better than nothing ... cheers, makyo

lorebett 09-01-2006 11:51 AM

Quote:

Originally Posted by lorebett
Another guy told me that a possible way is to use, on the second line, a pattern like

-*- <interpreter> -*-

by the way, this is the Emacs convention:

http://www.phys.ufl.edu/docs/emacs/emacs_201.html
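A rough sketch of reading that marker, in Python for illustration. It assumes the two common forms, `-*- tcl -*-` and `-*- mode: tcl; ... -*-`, and that the marker sits on the first line, or on the second line when the first is a `#!` line. The name `emacs_mode` is hypothetical, and this covers only a small part of what Emacs itself does.

```python
import re

# Text between a pair of '-*-' markers on one line
MODE_LINE = re.compile(r"-\*-\s*(.*?)\s*-\*-")


def emacs_mode(text):
    """Return the mode named in an Emacs '-*- ... -*-' line, or None."""
    lines = text.splitlines()[:2]
    # Only look at the second line when the first one is a '#!' line.
    if len(lines) == 2 and not lines[0].startswith("#!"):
        lines = lines[:1]
    for line in lines:
        m = MODE_LINE.search(line)
        if not m:
            continue
        body = m.group(1)
        if ":" not in body:
            # Short form: '-*- tcl -*-'
            return body.lower() or None
        # Long form: '-*- mode: tcl; coding: utf-8 -*-'
        for item in body.split(";"):
            key, _, value = item.partition(":")
            if key.strip().lower() == "mode":
                return value.strip().lower()
    return None
```

A line such as `# -*- coding: utf-8 -*-` carries no mode, so the function returns `None` for it.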

lorebett 09-01-2006 11:54 AM

Quote:

Originally Posted by druuna
Hi,

Did you take a look at the file command?

Example:

cat a01 a02 a03
#!/bin/bash
#!/usr/bin/perl
#!/bin/csh

file a01 a02 a03
a01: Bourne-Again shell script text executable
a02: perl script text executable
a03: C shell script text executable

I don't know if this solves your tcl/interpreter problem, but you can give it a try.

Hope this helps.

Unfortunately, this doesn't work, for instance, for a tcl script starting like this:

Code:

#!/bin/sh
# Tcl ignores the next line -*- tcl -*- \
exec wish "$0" -- "$@"

which is reported as a shell script instead of a Tcl script.
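One heuristic for this particular case, sketched in Python: when the first line names a Bourne-type shell, look in the next few lines for an `exec <program> ... $0` that hands the same file to another interpreter. The function name `reexec_language`, the ten-line window, and the small interpreter table are all assumptions made for illustration.

```python
import posixpath
import re

# Assumed, incomplete mapping from interpreter names to language names
INTERPRETER_LANGUAGE = {
    "wish": "tcl",
    "tclsh": "tcl",
    "perl": "perl",
    "python": "python",
}

SHELL_SHEBANG = re.compile(r"#!\s*\S*/(sh|bash|ksh)\b")
# 'exec <program> ... $0': the script passes itself to another interpreter
EXEC_SELF = re.compile(r"^\s*exec\s+(\S+)\s+.*\$0")


def reexec_language(text, max_lines=10):
    """Guess the real language of a shell script that re-executes itself."""
    lines = text.splitlines()[:max_lines]
    if not lines or not SHELL_SHEBANG.match(lines[0]):
        return None
    for line in lines[1:]:
        m = EXEC_SELF.match(line)
        if m:
            program = posixpath.basename(m.group(1))
            return INTERPRETER_LANGUAGE.get(program)
    return None
```

For the three-line script quoted above, this returns `"tcl"`. A script that never re-executes itself returns `None`, so the caller can fall back to the plain first-line result.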

makyo 09-01-2006 12:29 PM

Hi.

My recollection is that statements like:
Code:

exec wish "$0" -- "$@"
were used to get around problems in older versions of shells.

You can try (adjusting the path for your system):
Code:

If you create a Tcl script in a file whose first line is
              #!/usr/local/bin/tclsh
      then you can invoke the script file directly from  your  shell  if  you
      mark  the  file  as  executable. -- man tclsh

but in looking through my scripts, it appears that I have written at most 2 such scripts.

If that can be done, then file will be able to report something useful ... cheers, makyo

lorebett 09-02-2006 07:55 AM

actually, as I said above, I need this recognition for a program that highlights the syntax of other programs, which I don't write myself...

makyo 09-02-2006 10:36 AM

Hi, Lorenzo.
Quote:

Originally Posted by lorebett
actually, as I said above, I need this recognition for a program that highlights the syntax of other programs, which I don't write myself...

Ah, yes, thanks for pointing that out -- I skipped over that part.
Quote:

Originally Posted by lorebett
I need these heuristics to determine the source language in order to use the correct highlighting in the software I maintain, GNU Source-highlight http://www.gnu.org/software/src-highlite/

I noticed that emacs somehow detects the language pretty correctly (without an extension and with a "wrong" shebang), so I'll have to check what it's doing.

I tend to use vi, so I have not seen that ability of emacs, but that is useful information.

I looked at the src-highlight web page -- s.h. seems to know a lot about a lot of languages.

The very flexibility of scripts being semi-structured is the cause of us not being able to easily tell the language, especially when you can have the shell call tclsh, awk, perl, etc. I don't see any way other than looking for key strings, pieces of syntax, etc., that are peculiar to one language or another. You could allow the caller to specify the language if you cannot guess it correctly.
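A small sketch of that idea in Python: count matches of a few language-specific patterns and let the caller override the guess. The `SIGNATURES` table and the name `guess_language` are hypothetical, and the table is far too small for real use.

```python
import re

# Hypothetical, deliberately tiny signature table
SIGNATURES = {
    "tcl": [r"^\s*proc\s+\w+\s*\{", r"^\s*set\s+\w+\s", r"\[expr\s"],
    "perl": [r"^\s*use\s+strict\b", r"\bmy\s+[$@%]\w+", r"^\s*sub\s+\w+\s*\{"],
    "sh": [r"^\s*fi\s*$", r"^\s*esac\s*$", r"\bthen\s*$"],
}


def guess_language(text, forced=None):
    """Return the caller's choice if given, else the best-scoring language."""
    if forced:
        return forced
    scores = {}
    for lang, patterns in SIGNATURES.items():
        scores[lang] = sum(
            len(re.findall(p, text, re.MULTILINE)) for p in patterns
        )
    best = max(scores, key=scores.get)
    # No pattern matched at all: admit that the language is unknown.
    return best if scores[best] > 0 else None
```

Returning `None` when nothing matches leaves room for the caller to specify the language, as suggested above.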

I like that kind of work, and I have had a bit of exposure to parsing -- long ago I wrote a SNOBOL program that converted PL/1 into Fortran. Lately, I used a parser that filtered language elements with the use of a truth table -- one looked at a token, checked the table, and the result would exclude a number of possibilities, then the table entry might link to another entry, etc. That was part of a restructurizer, a pretty-printer. I think the University of Colorado was known for work like that. Fordham has also produced some tools along that line. However, I don't know of any specific place these days.
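The elimination idea can be sketched in a few lines of Python. This is only a guess at the general shape: the `EXCLUDES` table and the name `eliminate` are hypothetical, and the linking from one table entry to another is left out.

```python
# Hypothetical table: seeing the token rules out the listed languages
EXCLUDES = {
    "esac": {"tcl", "perl"},
    "fi": {"tcl", "perl"},
    "proc": {"sh", "perl"},
    "my": {"sh", "tcl"},
    "elsif": {"sh", "tcl"},
}


def eliminate(tokens, candidates=("sh", "tcl", "perl")):
    """Narrow the candidate languages by excluding some on each known token."""
    remaining = set(candidates)
    for token in tokens:
        narrowed = remaining - EXCLUDES.get(token, set())
        if narrowed:  # never throw away the last candidate
            remaining = narrowed
        if len(remaining) == 1:
            break
    return remaining
```

Unlike a scoring approach, this returns every language that is still possible, so an input made only of unknown tokens leaves all candidates in place.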

Interesting problem, and I'll keep an eye on src-highlight to track the progress. Best wishes ... cheers, makyo

lorebett 09-02-2006 11:15 AM

Quote:

Originally Posted by makyo
I looked at the src-highlight web page -- s.h. seems to know a lot about a lot of languages.

thanks :)

however, even though I'm adding more languages, there are still many others out there :D

Quote:

Originally Posted by makyo
You could allow the caller to specify the language if you cannot guess it correctly.

yes that's the standard behavior actually;

I've started to add "language inference" in the current development version, and it's pretty useful for highlighting, e.g., an entire directory, or when used in less http://www.gnu.org/software/src-high...ight-with-less

Quote:

Originally Posted by makyo
I like that kind of work, and I have had a bit of exposure to parsing -- long ago I wrote a SNOBOL program that converted PL/1 into Fortran. Lately, I used a parser that filtered language elements with the use of a truth table -- one looked at a token, checked the table, and the result would exclude a number of possibilities, then the table entry might link to another entry, etc. That was part of a restructurizer, a pretty-printer. I think the University of Colorado was known for work like that. Fordham has also produced some tools along that line. However, I don't know of any specific place these days.

Then I'll have to take a look at it! :cool:

Quote:

Originally Posted by makyo
Interesting problem, and I'll keep an eye on src-highlight to track the progress. Best wishes ... cheers, makyo

Thanks! and hope you're gonna use source-highlight someday! :)


All times are GMT -5.