2- Uncompress the downloaded file
- $ gunzip x.vcf.gz
3- Extract lines 40-60 from the uncompressed file and generate an output file
- $ sed -n 40,61p x.vcf > output-file-1
4- Based on the output from step 3, extract the first 4 columns and generate an output file
- $ cut -c 1-5 output-file-1 > output-file-2
5- Based on the output from step 4, remove the lines whose ID (in the 2nd column) starts with the string DEL and generate an output file
- $ sed -i '/DEL/2d' output-file-2 > output-file-3
6- Show the content of the final output
- $ cat output-file-3
The result is empty. In fact, when I search for the string DEL in output-file-2 or output-file-3, I don't find anything.
Step 4. cut -c extracts based on character count, not field (column) count. Look at cut -f instead, with -d if necessary. Alternatively, you could use awk.
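To make the difference concrete, here is a minimal sketch on a one-line, tab-separated sample. The file name sample.tsv and its contents are made up for illustration and are not taken from x.vcf.

```shell
# Build a tiny tab-separated sample (hypothetical data, not from x.vcf).
printf 'chr1\t100\trs1\tA\tG\n' > sample.tsv

# -c counts characters: this prints only the first 5 characters of each line.
cut -c 1-5 sample.tsv

# -f counts fields. TAB is cut's default delimiter, so -d is not needed here.
cut -f 1-4 sample.tsv
```

If your file uses a different separator, such as a comma, pass it with -d, for example cut -d, -f 1-4.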
Also, in step #5, you have accidentally added the -i option to sed, and it gets in the way. With -i, sed edits output-file-2 in place and writes nothing to standard output, so the redirection has nothing to capture and output-file-3 ends up empty.
Code:
man sed
Myself, I see the -i option on sed as more of a misfeature and do it the way you have it, with a redirect of sed's output to a new file.
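A rough sketch of the redirect approach, using a made-up three-line file (in.txt and out.txt are hypothetical names). Note that the delete command is just d placed right after the /DEL/ address. This version drops any line containing DEL anywhere, which may be broader than what you want for a single column.

```shell
# Hypothetical three-line input for illustration.
printf 'keep one\nDEL two\nkeep three\n' > in.txt

# Without -i, sed leaves in.txt alone and writes the filtered text to stdout,
# so the redirect has something to capture.
sed '/DEL/d' in.txt > out.txt

# Show the result: the line containing DEL is gone.
cat out.txt
```

The original in.txt is untouched, so you can compare the two files if the result is not what you expected.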
user@trisquel:~$ ls -l
total 0
user@trisquel:~$ wget https://github.com/vgteam/vg/raw/master/test/order/x.vcf.gz
--2017-[...ton of msgs...] saved [2906/2906]
user@trisquel:~$ ls -l
total 4
-rw-rw-r-- 1 user user 2906 Feb 10 14:40 x.vcf.gz
user@trisquel:~$ gunzip x.vcf.gz
user@trisquel:~$ ls -l
total 16
-rw-rw-r-- 1 user user 16167 Feb 10 14:40 x.vcf
user@trisquel:~$ sed -n 40,61p x.vcf > output-file-1
user@trisquel:~$ ls -l
total 20
-rw-rw-r-- 1 user user 1137 Feb 10 14:43 output-file-1
-rw-rw-r-- 1 user user 16167 Feb 10 14:40 x.vcf
user@trisquel:~$ wc output-file-1
22 22 1137 output-file-1
user@trisquel:~$ cut -c 1-5 output-file-1 > output-file-2
user@trisquel:~$ ls -l
total 24
-rw-rw-r-- 1 user user 1137 Feb 10 14:43 output-file-1
-rw-rw-r-- 1 user user 132 Feb 10 14:43 output-file-2
-rw-rw-r-- 1 user user 16167 Feb 10 14:40 x.vcf
user@trisquel:~$ head -1 output-file-2; tail -1 output-file-2 #sed -n '1p;$p' #awk 'NR==1;END{print}'
##con
##con
user@trisquel:~$ sed -i '/DEL/2d' output-file-2 > output-file-3
sed: -e expression #1, char 6: unknown command: `2'
cat -n x.vcf
will show line numbers; also (in the vcf I guessed at):
grep -n DEL x.vcf
93:##ALT=<ID=DEL,Description="Deletion">
So, my step #3 would have -different- line numbers.
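Here is a small sketch of checking line numbers before picking a range for sed. The file tiny.vcf is a made-up stand-in with three header lines and two data lines. It is not the real x.vcf.

```shell
# Hypothetical stand-in for x.vcf: 3 header lines, then 2 data lines.
printf '##h1\n##ALT=<ID=DEL>\n#CHROM\tPOS\nx\t1\nx\t2\n' > tiny.vcf

# Show every line with its number.
cat -n tiny.vcf

# Report which numbered lines contain DEL.
grep -n DEL tiny.vcf

# Find the first line that does not start with '#', with its line number.
grep -n -v '^#' tiny.vcf | head -n 1
```

Knowing where the header stops tells you whether a range like 40,61 actually lands on data lines in your copy of the file.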
[thoughts added later:]
A key concept here is debugging=troubleshooting what's happening.
Another way is: make a tiny/simple test case, and dig-deeply thru it,
to get each step working and understood.
(a bit like a new car, with buttons/switches that [may] do something 'good')
-I- actually didn't know what the sed -i switch does, so I
do a 1-minute scan of the [100page] manual, section sed, switch -i
IF that doesn't clarify it, I try a web-search (including 'goal'), like:
sed examples remove lines that have|match a string
Yea! 1st 'hit': http://stackoverflow.com/questions/5...pecific-string
(I thought this would be a 10+minute 'project' but it turned out <2!)
Deeper search: sed "delete 2 lines": seq 5|sed /2/,+2d COOL!
Anyway, best wishes! You'll be a Linux Master in no time at all!
Hi again! #1 doesn't look like a valid URL! Look for the file with: ls -l
Yeah, that was the first thing that struck me too, Jjanel, but I reckoned that the OP would probably have mentioned that there was an error at that stage. I assumed therefore that the OP had judiciously edited this portion of the original post to delete the actual URL used.
You need to look at each step, and the output from each step, separately and check that the output matches the format and content that you want it to be in. If it doesn't match your requirements then try changing the command and/or options, read the man files for the command in question, search the internet for answers. Then, if you still can't get the output to match what you want, come back on here and post the command and output in question, along with an example of how you want the output to be.
So, starting from the first step, and working downwards step by step, where do you hit your first problem?
I suspect that my problem starts with step 4 as I'm not sure what I'm extracting, columns or characters. The final output should be a gene sequencing file, something like : CCGTCGAACCA.
An internet search for "vcf format" produced the following as the first result:
You can't get there from here. You'll have to back up and replace step #3 and onwards. I'd recommend awk to extract the 4th column, if the 4th column consists of a larger number of A, C, G, and T.
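A minimal sketch of the awk idea, assuming the file is tab-separated and that header lines start with #, as in the VCF layout where the fixed columns run CHROM, POS, ID, REF, ALT. The file mini.vcf and its records are invented for the example.

```shell
# Hypothetical mini-VCF: two header lines and two tab-separated records.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nx\t9\t.\tG\tA\nx\t10\t.\tC\tT\n' > mini.vcf

# Skip lines starting with '#', then print the 4th tab-separated field.
awk -F'\t' '!/^#/ { print $4 }' mini.vcf
```

Whether the 4th column is really what the assignment wants is a separate question. This only shows the mechanics.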
What are you really supposed to extract from the file?
I added a bit to my #4post. Have a peek at this web-search: Genomes bio-linux
Also (| is OR here [not 'pipe'!]): "internationalgenome" linux|awk|perl
Oh, and (I'm addicted to web-searching!): book|.pdf linux for bioinformatics
(I'm curious as to what computer and Linux 'distro' you use ... just curious)
You'll 'get there', tho maybe in a different awk-mobile. Linux is INFINITE!
I use a supercomputer at our campus (Karst), remotely from my MacBook. I have no idea which Linux it runs. I'm new to all of this.
A pipe is a fairly simple way to chain commands: it takes the stdout of the command on the left and feeds it to the stdin of the command on the right. That might sound a bit strange, so here is an example:
Code:
echo "hello" | sed 's/hello/hello world/' | cat
hello world
The echo outputs hello, which is passed to sed. sed replaces hello with hello world.
The output of sed is then passed to cat on the right, which prints "hello world".
You seem to be using a lot of files unnecessarily where pipes probably would have been a better choice.
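As a rough sketch of how the intermediate files could be replaced by one pipeline: the file demo.tsv, its five records, and the line range 2-4 are all made up for illustration. The filter assumes the ID sits in the 3rd tab-separated field, which is where the VCF layout puts it. Adjust the field number if your file differs.

```shell
# Hypothetical 5-line tab-separated input standing in for x.vcf.
printf 'a\t1\tid1\tA\tx\nb\t2\tDEL1\tC\tx\nc\t3\tid3\tG\tx\nd\t4\tid4\tT\tx\ne\t5\tid5\tA\tx\n' > demo.tsv

# Take lines 2-4, keep the first four fields, then drop records
# whose 3rd field starts with DEL. Only the final result is written to a file.
sed -n '2,4p' demo.tsv | cut -f 1-4 | awk -F'\t' '$3 !~ /^DEL/' > result.tsv

cat result.tsv
```

While you are still debugging, you can run the pipeline one stage at a time (first just the sed, then sed piped to cut, and so on) to see what each step produces.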
Last edited by r3sistance; 02-11-2017 at 07:24 PM.
Reason: Thanks ntubski