LinuxQuestions.org
Old 09-01-2006, 02:42 AM   #1
lorebett
Member
 
Registered: May 2004
Location: Italy
Distribution: Ubuntu, Gentoo
Posts: 57

Rep: Reputation: 16
detect shell script language


Hi

I need to detect the actual programming language of a script.

A way of detecting it is to examine the first line searching for the "sha-bang" (#!), e.g.,

#!/bin/bash

or

#!/usr/bin/perl
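
A minimal sketch of that first-line check, in Python. The function name `interpreter_from_shebang` is made up for illustration, and the handling of `env` is an assumption about what is wanted (it skips options such as `-S` but does not handle `VAR=value` arguments).

```python
import posixpath
import shlex


def interpreter_from_shebang(first_line):
    """Return the interpreter name from a '#!' line, or None if there is none."""
    if not first_line.startswith("#!"):
        return None
    try:
        parts = shlex.split(first_line[2:].strip())
    except ValueError:
        return None  # unbalanced quotes on the line
    if not parts:
        return None
    name = posixpath.basename(parts[0])
    # '#!/usr/bin/env perl' names the real interpreter in a later argument;
    # skip env options such as '-S' (assumption: no VAR=value arguments).
    if name == "env":
        rest = [p for p in parts[1:] if not p.startswith("-")]
        if not rest:
            return None
        name = posixpath.basename(rest[0])
    return name
```

Calling it on `#!/bin/bash` gives `bash`, and on `#!/usr/bin/perl` gives `perl`; it says nothing about what the script does after that first line.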

However, there are cases where this is not enough, since the script, although it has #!/bin/sh, is actually written (and interpreted) in another language, e.g., Tcl.

So my question is, is there another way of detecting the actual language? I mean, another convention?

Another guy told me that a possible way is to use, on the second line a pattern like

-*- <interpreter> -*-

and I found some tcl scripts that look like this.

Is there a specific standard convention?

many thanks in advance
 
Old 09-01-2006, 05:11 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

Did you take a look at the file command?

Example:

cat a01 a02 a03
#!/bin/bash
#!/usr/bin/perl
#!/bin/csh

file a01 a02 a03
a01: Bourne-Again shell script text executable
a02: perl script text executable
a03: C shell script text executable

I don't know if this solves your tcl/interpreter problem, but you can give it a try.

Hope this helps.
 
Old 09-01-2006, 06:13 AM   #3
jlliagre
Moderator
 
Registered: Feb 2004
Location: Outside Paris
Distribution: Solaris 11.4, Oracle Linux, Mint, Debian/WSL
Posts: 9,789

Rep: Reputation: 492
Quote:
Originally Posted by lorebett
I need to detect the actual programming language of a script.
A way of detecting it is to examine the first line searching for the "sha-bang" (#!), e.g.,

#!/bin/bash

or

#!/usr/bin/perl
This is a good hint.

Quote:

However, there are cases where this is not enough, since the script, although it has #!/bin/sh, is actually written (and interpreted) in another language, e.g., Tcl.
As you noticed, a single script can have parts written in different languages, so your question doesn't have a single answer.

Also, a script can be written in such a way that it is compatible with more than one language.
Quote:
So my question is, is there another way of detecting the actual language? I mean, another convention?

Another guy told me that a possible way is to use, on the second line a pattern like

-*- <interpreter> -*-

and I found some tcl scripts that look like this.

Is there a specific standard convention?
There are no real standards beyond the shebang and file extensions.
 
Old 09-01-2006, 09:45 AM   #4
lorebett
Member
 
Registered: May 2004
Location: Italy
Distribution: Ubuntu, Gentoo
Posts: 57

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by jlliagre
This is a good hint.
There are no real standards beyond the shebang and file extensions.
thanks for your information!

So I can only use some heuristics (like the -*-), which are not guaranteed to work every time...

I need these heuristics to determine the source language so that I can use the correct highlighting in the software I maintain, GNU Source-highlight http://www.gnu.org/software/src-highlite/

I noticed that emacs somehow detects the language pretty correctly (without an extension and with a "wrong" shebang), so I'll have to check what it's doing.

Lorenzo
 
Old 09-01-2006, 11:36 AM   #5
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 735

Rep: Reputation: 76
Hi.

This is a question that I have considered from time to time.

If there were a single syntax for calling, say, a standard script and getting the result, one could look at the processes and try to find the one (at least one parent) that was responsible for the current (part of a) script.

I have not had much luck. To depend on the sh-bang, internal comments, etc., is unreliable, but probably better than nothing ... cheers, makyo
 
Old 09-01-2006, 11:51 AM   #6
lorebett
Member
 
Registered: May 2004
Location: Italy
Distribution: Ubuntu, Gentoo
Posts: 57

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by lorebett
Another guy told me that a possible way is to use, on the second line a pattern like

-*- <interpreter> -*-
by the way, this is the Emacs convention:

http://www.phys.ufl.edu/docs/emacs/emacs_201.html
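
A rough sketch of reading that mode line, in Python. As far as I know, Emacs looks at the first line, or at the second when the first is a #! line; this sketch simply checks the first two lines. It handles the short form (`-*- tcl -*-`) and the long form (`-*- mode: tcl; coding: utf-8 -*-`). The name `emacs_mode` is made up for illustration.

```python
import re

# Matches '-*- ... -*-' anywhere in a line.
_MODE_LINE = re.compile(r"-\*-(.*?)-\*-")


def emacs_mode(text):
    """Return the lower-cased mode named in the first two lines, or None."""
    for line in text.splitlines()[:2]:
        match = _MODE_LINE.search(line)
        if not match:
            continue
        body = match.group(1).strip()
        if ":" not in body:
            # Short form: '-*- tcl -*-'
            if body:
                return body.lower()
            continue
        # Long form: '-*- mode: tcl; coding: utf-8 -*-'
        for field in body.split(";"):
            key, _, value = field.partition(":")
            if key.strip().lower() == "mode" and value.strip():
                return value.strip().lower()
    return None
```

The returned string is an Emacs mode name, which usually but not always matches a language name, so a highlighter would still need its own table to map it.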
 
Old 09-01-2006, 11:54 AM   #7
lorebett
Member
 
Registered: May 2004
Location: Italy
Distribution: Ubuntu, Gentoo
Posts: 57

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by druuna
Hi,

Did you take a look at the file command?

Example:

cat a01 a02 a03
#!/bin/bash
#!/usr/bin/perl
#!/bin/csh

file a01 a02 a03
a01: Bourne-Again shell script text executable
a02: perl script text executable
a03: C shell script text executable

I don't know if this solves your tcl/interpreter problem, but you can give it a try.

Hope this helps.
Unfortunately, this doesn't work, for instance, for a tcl script starting like this:

Code:
#!/bin/sh
# Tcl ignores the next line -*- tcl -*- \
exec wish "$0" -- "$@"
which is reported as shell script, instead of a tcl script
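
One possible heuristic for this particular case, sketched in Python: when the shebang names a Bourne-style shell, look in the first few lines for an `exec <program> ... $0` line that hands the same file to another interpreter. The names `reexec_target` and `language_of`, the 10-line limit, and the small interpreter table are all assumptions for illustration, not something `file` or source-highlight is known to do.

```python
import posixpath
import re

# A line such as: exec wish "$0" -- "$@"
_EXEC_SELF = re.compile(r"^\s*exec\s+(\S+)\s.*\$0")
_SHELL_SHEBANG = re.compile(r"#!\s*\S*/(sh|bash|ksh|dash)\b")

# Hypothetical table; a real one would cover many more interpreters.
INTERPRETER_LANGUAGE = {"wish": "tcl", "tclsh": "tcl"}


def reexec_target(text, max_lines=10):
    """Return the program a shell script re-executes itself with, or None."""
    lines = text.splitlines()
    if not lines or not _SHELL_SHEBANG.match(lines[0]):
        return None
    for line in lines[1:max_lines]:
        match = _EXEC_SELF.match(line)
        if match:
            name = posixpath.basename(match.group(1))
            # Drop a version suffix, e.g. 'wish8.4' -> 'wish'.
            return re.sub(r"[\d.]+$", "", name)
    return None


def language_of(text):
    """Map the re-exec target to a language name, or None if unknown."""
    return INTERPRETER_LANGUAGE.get(reexec_target(text))
```

On the three-line script above this yields `wish` and then `tcl`. It would miss variants such as `exec /usr/bin/env wish "$0"`, so it is only one more hint to combine with the others.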
 
Old 09-01-2006, 12:29 PM   #8
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 735

Rep: Reputation: 76
Hi.

My recollection is that statements like:
Code:
exec wish "$0" -- "$@"
were used to get around problems in older versions of shells.

You can try (adjusting the path for your system):
Code:
If you create a Tcl script in a file whose first line is
    #!/usr/local/bin/tclsh
then you can invoke the script file directly from your shell if you
mark the file as executable. -- man tclsh
but in looking through my scripts, it appears that I have written at most 2 such scripts.

If that can be done, then file will be able to report something useful ... cheers, makyo
 
Old 09-02-2006, 07:55 AM   #9
lorebett
Member
 
Registered: May 2004
Location: Italy
Distribution: Ubuntu, Gentoo
Posts: 57

Original Poster
Rep: Reputation: 16
actually, as I said above, I need this recognition for a program that highlights the syntax of other programs, which I don't write myself...
 
Old 09-02-2006, 10:36 AM   #10
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 735

Rep: Reputation: 76
Hi, Lorenzo.
Quote:
Originally Posted by lorebett
actually, as I said above, I need this recognition for a program that highlights the syntax of other programs, which I don't write myself...
Ah, yes, thanks for pointing that out -- I skipped over that part.
Quote:
Originally Posted by lorebett
I need these heuristics to determine the source language so that I can use the correct highlighting in the software I maintain, GNU Source-highlight http://www.gnu.org/software/src-highlite/

I noticed that emacs somehow detects the language pretty correctly (without an extension and with a "wrong" shebang), so I'll have to check what it's doing.
I tend to use vi, so I have not seen that ability of emacs, but that is useful information.

I looked at the src-highlight web page -- s.h. seems to know a lot about a lot of languages.

The very flexibility of scripts being semi-structured is the cause of us not being able to easily tell the language, especially when you can have the shell call tclsh, awk, perl, etc. I don't see any way other than looking for key strings, pieces of syntax, etc., that are peculiar to one language or another. You could allow the caller to specify the language if you cannot guess it correctly.
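
A toy sketch of that idea in Python: count matches of a few language-specific patterns and pick the language with the most hits, while letting the caller force the answer. The `MARKERS` table and the name `guess_language` are hypothetical; real patterns would need far more care (comments and strings are not skipped here, and ties go to whichever language is listed first).

```python
import re

# Hypothetical marker patterns; a real table would be larger and tuned.
MARKERS = {
    "tcl": [r"^\s*proc\s+\w+\s*\{", r"^\s*package\s+require\b", r"\bputs\s"],
    "perl": [r"^\s*use\s+strict\b", r"\bmy\s+[$@%]\w+", r"^\s*sub\s+\w+\s*\{"],
    "sh": [r"^\s*fi\s*$", r"^\s*esac\s*$", r"\bthen\s*$", r"\$\{\w+"],
}


def guess_language(text, forced=None):
    """Return the caller's choice if given, else the best-scoring language."""
    if forced is not None:
        return forced
    scores = {}
    for lang, patterns in MARKERS.items():
        scores[lang] = sum(
            len(re.findall(pattern, text, flags=re.MULTILINE))
            for pattern in patterns
        )
    best = max(scores, key=scores.get)
    # No marker matched at all: admit that we cannot tell.
    return best if scores[best] > 0 else None
```

Returning `None` when nothing matches leaves room for the fallback mentioned above, where the caller specifies the language.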

I like that kind of work, and I have had a bit of exposure to parsing -- long ago I wrote a SNOBOL program that converted PL/1 into Fortran. Lately, I used a parser that filtered language elements with the use of a truth table -- one looked at a token, checked the table, and the result would exclude a number of possibilities, then the table entry might link to another entry, etc. That was part of a restructurizer, a pretty-printer. I think the University of Colorado was known for work like that. Fordham has also produced some tools along that line. However, I don't know of any specific place these days.
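
A very loose Python sketch of the table-driven elimination idea, not a reconstruction of that parser: each known token maps to the set of languages still possible after seeing it, and the candidate set shrinks as tokens are read. The table `ALLOWED_AFTER`, the three-language universe, and the name `narrow` are invented for illustration, and the linking from one table entry to another is left out.

```python
import re

ALL_LANGUAGES = frozenset({"sh", "tcl", "perl"})

# Hypothetical table: token -> languages still possible after seeing it.
# Tokens missing from the table exclude nothing.
ALLOWED_AFTER = {
    "fi": {"sh"},
    "esac": {"sh"},
    "elif": {"sh"},
    "proc": {"tcl"},
    "elseif": {"tcl"},
    "my": {"perl"},
    "elsif": {"perl"},
    "foreach": {"tcl", "perl"},
}


def narrow(text):
    """Return the set of languages not excluded by any token in the text."""
    candidates = set(ALL_LANGUAGES)
    for token in re.findall(r"[A-Za-z_]\w*", text):
        allowed = ALLOWED_AFTER.get(token)
        if allowed is None:
            continue
        remaining = candidates & allowed
        if remaining:  # ignore a token that would exclude everything
            candidates = remaining
        if len(candidates) == 1:
            break
    return candidates
```

Because it looks at bare words, a token such as `my` inside a comment or string would mislead it; a real tool would tokenize more carefully.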

Interesting problem, and I'll keep an eye on src-highlight to track the progress. Best wishes ... cheers, makyo
 
Old 09-02-2006, 11:15 AM   #11
lorebett
Member
 
Registered: May 2004
Location: Italy
Distribution: Ubuntu, Gentoo
Posts: 57

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by makyo
I looked at the src-highlight web page -- s.h. seems to know a lot about a lot of languages.
thanks

However, although I keep adding more languages, there are still many others out there.

Quote:
Originally Posted by makyo
You could allow the caller to specify the language if you cannot guess it correctly.
Yes, that's actually the standard behavior;

I've started to add "language inference" in the current development version, and it's pretty useful for highlighting, e.g., an entire directory, or when used in less http://www.gnu.org/software/src-high...ight-with-less

Quote:
Originally Posted by makyo
I like that kind of work, and I have had a bit of exposure to parsing -- long ago I wrote a SNOBOL program that converted PL/1 into Fortran. Lately, I used a parser that filtered language elements with the use of a truth table -- one looked at a token, checked the table, and the result would exclude a number of possibilities, then the table entry might link to another entry, etc. That was part of a restructurizer, a pretty-printer. I think the University of Colorado was known for work like that. Fordham has also produced some tools along that line. However, I don't know of any specific place these days.
Then I'll have to take a look at it!

Quote:
Originally Posted by makyo
Interesting problem, and I'll keep an eye on src-highlight to track the progress. Best wishes ... cheers, makyo
Thanks! and hope you're gonna use source-highlight someday!
 
  


All times are GMT -5. The time now is 08:32 PM.
