LinuxQuestions.org


kswapnadevi 12-02-2010 08:07 PM

shell scripting testing
 
I have an input file named 'out' (10000 lines) in the format shown below. Each pair of lines represents one structure. I feed this file to an awk script, 'process.awk' (given below), which should create 5000 directories, one for each chromosome entry, and run the 'mfold SEQ=input' command in each of them. I run it with 'awk -f process.awk out'. The script created 1020 directories and then failed with this error:
awk: process.awk:6: (FILENAME=out FNR=1022) fatal: can't redirect to `dir1021/input' (No such file or directory)
I have not been able to fix this. Any help would be highly appreciated. Thanks in advance.

Quote:

Input file 'out':
>Chr5:26236034-26236054
ACCGCCGCCGCCTGCCGCGTA
>Chr25:2622217-2622237
TGATTCTCGCTTTGGGTGCGA
>Chr10:23813143-23813163
AGTTAGTCTTTGTTTTTTGTT
>Chr23:24400416-24400436
AAACACTCAGCTCCCGATCTG
>Chr14:68746745-68746765
TCACATTCTAAGATTTTGCTG
>Chr29:3473120-3473140
CAAATACCATGGTTTCTACAG
>ChrX:62081589-62081609
ACGGGGGGGCGCCGGGGGCCT
>Chr18:31139220-31139240
AAGGGATTGGGAGAGTAGGAT

process.awk:

Quote:

BEGIN {
    FS=">"; RS=">"; ORS="";
}
$NF {
    d++
    system("mkdir dir"d);
    print ">"$0 > ("dir"d"/input");
    system("cd dir"d"; mfold SEQ=input");
    system("cd dir"d"; /home/rsankar/bin/mfold SEQ=input");
}

Tinkster 12-02-2010 08:44 PM

And did you check whether dir1021/input exists?

Maybe the script creating the dirs had a partial failure?



Cheers,
Tink

GrapefruiTgirl 12-02-2010 09:12 PM

I'm curious if awk is running out of file descriptors (too many open files) since there are no close() calls in the script. We don't know what OS this is running on (do we?); GNU gawk apparently has no limits (within reason?) but other awk implementations may have limits.

This page may be helpful if lack of file descriptors is a problem:
http://www.gnu.org/manual/gawk/html_...And-Pipes.html
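
For anyone following along, here is a minimal sketch for checking the descriptor limit discussed above. It assumes a Linux system with /proc mounted, and `count_fds` is a hypothetical helper name, not something from this thread. On many Linux systems the default soft limit is 1024, which would be consistent with a failure near the 1021st output file, though that remains a guess.

```shell
#!/bin/bash
# Soft limit on open file descriptors for processes started from this shell.
fd_limit=$(ulimit -n)
echo "open-file limit: $fd_limit"

# Count the descriptors a process currently holds.
# Assumption: Linux with /proc; prints 0 where /proc/<pid>/fd is unavailable.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

echo "this shell holds $(count_fds $$) descriptors"
```

To watch a running awk instead, pass its process ID to `count_fds` from a second terminal.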

kswapnadevi 12-02-2010 09:38 PM

shell script testing
 
I am working on Linux. How should I modify the above awk program? Please help.


GrapefruiTgirl 12-03-2010 12:06 PM

Have you made progress, or done further investigation yet? If so, what did you find?

I am not sure if what I suggest above is correct; but the way I figure it, your script is trying to create fd #1021, which would be the 1022nd fd created based upon the input data; the `mfold` system command's fd would account for the 1023rd fd; the input file would account for the 1024th fd; the awk script itself *might* account for the 1025th fd, or maybe each of your `mfold` commands accounts for its own fd -- either way makes for a total of 1024 open fd's and an attempt being made to open another one, which fails. The failure hints at a limit of 1024 open fd's for your awk version.

If this is all correct (again, I do not know if it is), then the solution would be to issue the close() command once for every open file descriptor that gets created, after the script is finished using that descriptor. So if it were me, I would use close("fd name here") after every `print` command you use.

Keep us posted! :)
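
A small demonstration of the idea, as a sketch rather than a fix for the script in this thread: awk writes 200 files under a deliberately low descriptor limit, closing each one right after the print. The limit of 32, the file names, and the temporary directory are all made up for illustration.

```shell
#!/bin/bash
# Demo: write many small files from awk under a deliberately low fd limit.
demo_dir=$(mktemp -d)

(
    cd "$demo_dir" || exit 1
    ulimit -n 32          # lower the soft limit for this subshell only
    awk 'BEGIN {
        for (i = 1; i <= 200; i++) {
            file = "part" i ".txt"
            print "record " i > file
            close(file)   # without this, each file stays open until awk exits
        }
    }'
)

file_count=$(ls "$demo_dir" | wc -l | tr -d ' ')
echo "files written: $file_count"
```

The string passed to close() has to be exactly the same string that was used in the redirection, which is why it is kept in a variable here.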

kswapnadevi 12-04-2010 01:35 AM

Shell scripting testing
 
I am still investigating this around the clock. Please modify the given script by adding the close() statements, madam. I will try that too.
Thanks in advance.



GrapefruiTgirl 12-04-2010 08:22 AM

With the information I have given thus far, plus the link given above, and based on the snippets of code you have posted in the past, I am reasonably confident that you yourself are capable of adding a single close() statement with the right stuff inside the brackets, after the print statement. I have used numerous close() statements in my awk code over here:
http://www.linuxquestions.org/questi...2/#post4126234
Have a look and see what I did. Modify your code. Test it. Show us the results if it fails (copy + paste the errors of execution of your program) and show us the code again, with the close() statements added.

Please allow me to remind you again though: I do not know if this will address the issue, or if the number of fd's is even the problem, but if it were me, I would be doing exactly what I'm suggesting you try: close() statements.

Good luck!
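
One way the suggestion could look when applied to the script from the first post, offered as an untested-against-mfold sketch. Assumptions: the two sample records and the temporary directory are for illustration only, `mkdir -p` replaces plain `mkdir`, and the mfold step is skipped when mfold is not on PATH. Whether this cures the original error depends on the descriptor limit really being the cause.

```shell
#!/bin/bash
# Sketch: the thread's process.awk with close() added after the print.
workdir=$(mktemp -d)

# Two sample records in the format of the 'out' file from the first post.
printf '%s\n' '>Chr5:26236034-26236054' 'ACCGCCGCCGCCTGCCGCGTA' \
              '>Chr25:2622217-2622237' 'TGATTCTCGCTTTGGGTGCGA' > "$workdir/out"

cat > "$workdir/process.awk" <<'EOF'
BEGIN { RS = ">"; ORS = "" }
$0 != "" {
    d++
    file = "dir" d "/input"
    system("mkdir -p dir" d)
    print ">" $0 > file
    close(file)   # flush the data and release the descriptor
    # Assumption: mfold is on PATH; the step is skipped when it is not.
    system("cd dir" d " && command -v mfold >/dev/null 2>&1 && mfold SEQ=input")
}
EOF

# Run inside the work directory so the relative dirN paths land there.
(cd "$workdir" && awk -f process.awk out)
```

Closing the file before the mfold call also makes sure the data is on disk by the time mfold reads it.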

grail 12-04-2010 09:30 AM

As I have said in other posts from the same OP on the same topic, why not just use bash, seeing all the calls to system?
A simple while loop could easily read from the file and then issue all the commands you have inside the system calls.

Something along the lines of:
Code:

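#!/bin/bash

# Open the input file on file descriptor 3
exec 3<f2

d=1

# Read the input two lines at a time: header line, then sequence line
while read -u 3 -r fline
do
    read -u 3 -r sline

    DIR="dir$((d++))"
    mkdir "$DIR"
    echo "$fline" > "$DIR/input"
    echo "$sline" >> "$DIR/input"
    ...
done

# Close file descriptor 3
exec 3>&-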

Seems pretty simple if you ask me.

catkin 12-04-2010 09:41 AM

Even simpler with (untested but I saw it on LQ the other day!):
Code:

#!/bin/bash

d=1

while read -u 3 -r fline
do
    ...
done 3<f2
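
A filled-in version of the loop, as a sketch under stated assumptions rather than what either poster wrote: the sample data, the temporary directory, and the guarded mfold call are made up for illustration, and `read -r ... <&3` is used as a portable spelling of bash's `read -u 3 -r`.

```shell
#!/bin/bash
# Sketch: pair-of-lines loop from the posts above, filled in under assumptions.
work=$(mktemp -d)
infile="$work/out"      # hypothetical path; the posts above read from f2
printf '%s\n' '>Chr5:26236034-26236054' 'ACCGCCGCCGCCTGCCGCGTA' \
              '>Chr25:2622217-2622237' 'TGATTCTCGCTTTGGGTGCGA' > "$infile"

d=0
# Read header and sequence lines in pairs from file descriptor 3.
while read -r fline <&3 && read -r sline <&3
do
    d=$((d + 1))
    dir="$work/dir$d"
    mkdir -p "$dir"
    printf '%s\n%s\n' "$fline" "$sline" > "$dir/input"
    # Assumption: mfold is on PATH; skipped otherwise.
    if command -v mfold >/dev/null 2>&1; then
        (cd "$dir" && mfold SEQ=input)
    fi
done 3<"$infile"

dirs_made=$d
echo "created $dirs_made directories"
```

Reading from descriptor 3 leaves standard input free, so a command inside the loop cannot accidentally swallow the remaining lines of the input file. Each output file is also closed as soon as its redirection finishes, so the number of open descriptors stays constant.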



All times are GMT -5.