I need to detect the actual programming language of a script.
A way of detecting it is to examine the first line searching for the "sha-bang" (#!), e.g.,
#!/bin/bash
or
#!/usr/bin/perl
However, there are cases where this is not enough, since the script, although it starts with #!/bin/sh, is actually written (and interpreted) in another language, e.g., Tcl.
So my question is, is there another way of detecting the actual language? I mean, another convention?
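Another guy told me that a possible way is to use, on the second line a pattern like
-*- <interpreter> -*-
and I found some tcl scripts that look like this.
Is there a specific standard convention?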
Quote:
Originally Posted by lorebett
I need to detect the actual programming language of a script.
A way of detecting it is to examine the first line searching for the "sha-bang" (#!), e.g.,
#!/bin/bash
or
#!/usr/bin/perl
This is a good hint.
Quote:
However, there are cases where this is not enough, since the script, although it starts with #!/bin/sh, is actually written (and interpreted) in another language, e.g., Tcl.
As you noticed, a single script can have parts written in different languages, so your question doesn't have a single answer.
Also, a script can be written in such a way that it is valid in more than one language.
Quote:
So my question is, is there another way of detecting the actual language? I mean, another convention?
Another guy told me that a possible way is to use, on the second line a pattern like
-*- <interpreter> -*-
and I found some tcl scripts that look like this.
Is there a specific standard convention?
There are no real standards beyond the shebang and file extensions.
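As a rough illustration of those two hints, here is a minimal Python sketch that tries the shebang first and falls back to the file extension. The tables INTERPRETERS and EXTENSIONS and the function names are made up for this example and are far from complete; it is not how any particular tool does it.

```python
import os

# Hypothetical lookup tables for this sketch; extend as needed.
INTERPRETERS = {"sh": "sh", "bash": "sh", "perl": "perl",
                "tclsh": "tcl", "python": "python"}
EXTENSIONS = {".sh": "sh", ".pl": "perl", ".tcl": "tcl", ".py": "python"}


def language_from_shebang(first_line):
    """Return a language name for a '#!' line, or None if unknown."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    name = os.path.basename(parts[0])
    # '#!/usr/bin/env perl' names the real interpreter in the next word.
    if name == "env" and len(parts) > 1:
        name = os.path.basename(parts[1])
    return INTERPRETERS.get(name)


def guess_language(filename, first_line):
    """Try the shebang first, then fall back to the file extension."""
    extension = os.path.splitext(filename)[1].lower()
    return language_from_shebang(first_line) or EXTENSIONS.get(extension)
```

Note that this is exactly the approach that gets a #!/bin/sh script written in Tcl wrong, so it is only a first guess.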
Quote:
This is a good hint.
There are no real standards beyond the shebang and file extensions.
thanks for your information!
So I can only use some heuristics (like the -*-), which are not guaranteed to work every time...
I need these heuristics to determine the source language, in order to use the correct highlighting in the software I maintain, GNU Source-highlight http://www.gnu.org/software/src-highlite/
I noticed that emacs somehow detects the language pretty correctly (without an extension and with a "wrong" shebang), so I'll have to check what it's doing.
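For what it's worth, the -*- pattern is the Emacs file-variables line: Emacs reads it from the first line, or from the second line when the first is a shebang. A minimal Python sketch of that heuristic could look like the following; the function name is invented here, and it only handles the short form (-*- tcl -*-) and the "mode: ..." field of the long form.

```python
import re

# Matches '-*- tcl -*-' as well as '-*- mode: tcl; coding: utf-8 -*-'.
MODE_LINE = re.compile(r"-\*-(.*?)-\*-")


def language_from_mode_line(text):
    """Look for an Emacs-style mode line in the first two lines of text."""
    for line in text.splitlines()[:2]:
        match = MODE_LINE.search(line)
        if not match:
            continue
        body = match.group(1).strip()
        if ":" not in body:
            # Short form: the whole body is the mode name.
            return body.lower() or None
        # Long form: semicolon-separated 'key: value' pairs.
        for field in body.split(";"):
            key, _, value = field.partition(":")
            if key.strip().lower() == "mode":
                return value.strip().lower() or None
    return None
```

The returned string is the Emacs mode name, which would still have to be mapped onto whatever language names the highlighter uses.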
This is a question that I have considered from time to time.
If there were a single syntax for calling, say, a standard script and getting the result, one could look at the running processes and try to find the one (at least one parent) that was responsible for the current (part of a) script.
I have not had much luck. To depend on the sh-bang, internal comments, etc., is unreliable, but probably better than nothing ... cheers, makyo
was used to get around problems in older versions of shells.
You can try (adjusting the path for your system):
Code:
If you create a Tcl script in a file whose first line is
#!/usr/local/bin/tclsh
then you can invoke the script file directly from your shell if you
mark the file as executable. -- man tclsh
but in looking through my scripts, it appears that I have written at most 2 such scripts.
If that can be done, then the file command will be able to report something useful ... cheers, makyo
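For scripts that keep #!/bin/sh and then hand themselves over to another interpreter, one possible heuristic is to look for a re-exec line near the top. The tclsh man page documents this idiom (a line such as exec tclsh "$0" "$@" right after the shebang). The Python sketch below assumes the scripts in question use that idiom; the function name and the 10-line limit are arbitrary choices made for this example.

```python
import os
import re

# Matches lines such as:
#   exec tclsh "$0" "$@"
#   exec /usr/local/bin/wish "$0" ${1+"$@"}
REEXEC = re.compile(r'^\s*exec\s+(\S+)\s+"?\$0"?')


def reexec_interpreter(text, max_lines=10):
    """Return the interpreter a script re-execs itself with, or None."""
    for line in text.splitlines()[:max_lines]:
        match = REEXEC.match(line)
        if match:
            # Keep only the program name, e.g. 'tclsh' from '/usr/bin/tclsh'.
            return os.path.basename(match.group(1))
    return None
```

A hit here should probably take precedence over the shebang, since it names the interpreter that actually runs the rest of the file.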
actually, as I said above, I need this recognition for a program that highlights the syntax of other programs, which I don't write myself...
Ah, yes, thanks for pointing that out -- I skipped over that part.
Quote:
Originally Posted by lorebett
I need these heuristics to determine the source language, in order to use the correct highlighting in the software I maintain, GNU Source-highlight http://www.gnu.org/software/src-highlite/
I noticed that emacs somehow detects the language pretty correctly (without an extension and with a "wrong" shebang), so I'll have to check what it's doing.
I tend to use vi, so I have not seen that ability of emacs, but that is useful information.
I looked at the src-highlight web page -- s.h. seems to know a lot about a lot of languages.
The very flexibility of scripts being semi-structured is the cause of us not being able to easily tell the language, especially when you can have the shell call tclsh, awk, perl, etc. I don't see any way other than looking for key strings, pieces of syntax, etc., that are peculiar to one language or another. You could allow the caller to specify the language if you cannot guess it correctly.
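To make the "key strings" idea concrete, here is a minimal Python sketch that counts language-specific patterns and lets the caller override the guess. The SIGNATURES table is a deliberately tiny, made-up example; the patterns are characteristic of each language but not unique to it, so a real table would need many more entries and some tuning.

```python
import re

# Hypothetical signature table: regexes that are typical of each language.
SIGNATURES = {
    "tcl": [r"^\s*proc\s+\w+\s*\{", r"^\s*puts\s", r"^\s*set\s+\w+\s"],
    "perl": [r"^\s*use\s+strict\b", r"^\s*my\s+[$@%]\w+", r"^\s*sub\s+\w+\s*\{"],
    "sh": [r"^\s*fi\s*$", r"^\s*esac\s*$", r"^\s*echo\s", r"\bthen\s*$"],
}


def guess_by_content(text, override=None):
    """Return the language with the most pattern hits, or None.

    If the caller passes 'override', it wins without any guessing.
    """
    if override:
        return override
    scores = {}
    for lang, patterns in SIGNATURES.items():
        scores[lang] = sum(
            len(re.findall(pattern, text, re.MULTILINE)) for pattern in patterns
        )
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Ties go to whichever language comes first in the table, which is one more reason to keep the override available.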
I like that kind of work, and I have had a bit of exposure to parsing -- long ago I wrote a SNOBOL program that converted PL/1 into Fortran. Lately, I used a parser that filtered language elements with the use of a truth table -- one looked at a token, checked the table, and the result would exclude a number of possibilities, then the table entry might link to another entry, etc. That was part of a restructurizer, a pretty-printer. I think the University of Colorado was known for work like that. Fordham has also produced some tools along that line. However, I don't know of any specific place these days.
Interesting problem, and I'll keep an eye on src-highlight to track the progress. Best wishes ... cheers, makyo
Quote:
Originally Posted by makyo
I looked at the src-highlight web page -- s.h. seems to know a lot about a lot of languages.
Thanks! However, although I keep adding more languages, there are still many others out there.
Quote:
Originally Posted by makyo
You could allow the caller to specify the language if you cannot guess it correctly.
yes that's the standard behavior actually;
I've started to add "language inference" in the current development version, and it's pretty useful for highlighting, e.g., an entire directory, or when used in less http://www.gnu.org/software/src-high...ight-with-less
Quote:
Originally Posted by makyo
I like that kind of work, and I have had a bit of exposure to parsing -- long ago I wrote a SNOBOL program that converted PL/1 into Fortran. Lately, I used a parser that filtered language elements with the use of a truth table -- one looked at a token, checked the table, and the result would exclude a number of possibilities, then the table entry might link to another entry, etc. That was part of a restructurizer, a pretty-printer. I think the University of Colorado was known for work like that. Fordham has also produced some tools along that line. However, I don't know of any specific place these days.
Then I'll have to take a look at it!
Quote:
Originally Posted by makyo
Interesting problem, and I'll keep an eye on src-highlight to track the progress. Best wishes ... cheers, makyo
Thanks! and hope you're gonna use source-highlight someday!