Shell scripting
I have a data file in the format shown below. The file contains 3 structures, each starting with 'Contig', and each structure has substructures starting with '-', such as -150.90, -150.70, and -149.70. A substructure may contain a range of values or no values at all. In -150.70 there are three ranges, and of those, 672:693 and 679:700 are continuous. I need a shell program that combines such ranges (672:700 in this case) and removes the substructures that have no ranges. The required output is shown below.
INPUT
Contig1430
-150.90:
-150.70:
672:693 711:732 679:700
-149.70:
Contig1439
-134.80
20:41 42:63 55:76
Contig1454
-178.40:
536:557 648:669 546:567 551:572 554:575 561:582 567:588 572:593 579:600 583:604 591:612 601:622 607:628 614:635 617:638

OUTPUT
Contig1430
-150.70:
672:700 711:732
Contig1439
-134.80
20:76
Contig1454
-178.40:
536:638
Are you saying that two ranges are continuous when the last field of one range is greater than or equal to the first field of another?
Code:
-150.70:
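The reading asked about here can be written as a one-line test. This is only a sketch of that interpretation, not something posted in the thread: `overlaps` is a made-up helper name, and it assumes each range is a `start:end` pair of non-negative integers.

```shell
#!/bin/sh
# overlaps A B: succeed when range B starts at or before the end of range A.
# (Hypothetical helper; ranges are assumed to look like start:end.)
overlaps() {
    a_end=${1#*:}       # text after the colon in A
    b_start=${2%%:*}    # text before the colon in B
    [ "$b_start" -le "$a_end" ]
}

if overlaps 672:693 679:700; then
    echo "672:693 and 679:700 can be combined"
fi
```

With this test 672:693 and 679:700 combine, while 679:700 and 711:732 do not. If a range that starts exactly one past the previous end should also count, compare against `$((a_end + 1))` instead.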
Quote:
Comparing two numerical values is trivial. Since you've (as you've said) been working on shell scripts "round the clock" (http://www.linuxquestions.org/questi...esting-848051/) since last year, this should be child's play to you.
Well, looking at your data, I believe your suggested output is actually incorrect. The following seems to work (and can probably be shortened if you look through it):
Code:
#!/usr/bin/awk -f
Code:
Contig1430
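For anyone who wants something to experiment with, here is a minimal sketch of one way to do the merge in awk. It is not the script from the post above, just an illustration under some assumptions: `merge_ranges` is a made-up name, tokens are separated by whitespace (so it does not matter whether the ranges share a line), a range that starts at most one past the previous end is treated as continuous (inferred from 20:41 and 42:63 being combined in the requested output), and the merged ranges are printed on one line per substructure.

```shell
#!/bin/sh
# merge_ranges: read contig data on stdin, print the merged result on stdout.
# Hypothetical helper name; uses only POSIX awk features.
merge_ranges() {
    awk '
    # Print the pending substructure if it collected any ranges.
    function flush(   i, j, s, e, cs, ce, out) {
        if (n == 0) return
        # Insertion sort by start position (portable, no gawk asort needed).
        for (i = 2; i <= n; i++) {
            s = lo[i]; e = hi[i]
            for (j = i - 1; j >= 1 && lo[j] > s; j--) {
                lo[j + 1] = lo[j]; hi[j + 1] = hi[j]
            }
            lo[j + 1] = s; hi[j + 1] = e
        }
        # The contig name is printed once, before its first kept substructure.
        if (contig != "") { print contig; contig = "" }
        print label
        cs = lo[1]; ce = hi[1]; out = ""
        for (i = 2; i <= n; i++) {
            if (lo[i] <= ce + 1) {            # overlaps or directly adjoins
                if (hi[i] > ce) ce = hi[i]
            } else {                          # gap: close the current range
                out = out cs ":" ce " "
                cs = lo[i]; ce = hi[i]
            }
        }
        print out cs ":" ce
        n = 0
    }
    {
        for (f = 1; f <= NF; f++) {
            if ($f ~ /^Contig/)   { flush(); contig = $f; label = "" }
            else if ($f ~ /^-/)   { flush(); label = $f }
            else if ($f ~ /^[0-9]+:[0-9]+$/) {
                split($f, p, ":")
                n++; lo[n] = p[1] + 0; hi[n] = p[2] + 0
            }
        }
    }
    END { flush() }
    '
}
```

Run it as `merge_ranges < data.txt` (the file name is just a placeholder). On the sample input, the -178.40: substructure comes out as 536:638 648:669 rather than 536:638 alone, because 648 is more than one past 638. Substructures without ranges, such as -150.90: and -149.70:, are dropped.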
I'll also say this: don't send a shell script (for any shell) to do a Camel's job.
Shell scripts (with the slight exception of the Korn shell, which for some reason does embed a full programming language) were not really designed for "heavy duty programming." But all of them do support the #! ("shebang...") construct, which allows the script to be written in any language. And there are many of them at your fingertips: Perl, PHP, Ruby, Python, and many more. The task that you describe, while not entirely trivial, is the work of a simple Perl program that would be no more than a hundred lines or so, if that. (No, I'm not going to write it for you.)