LinuxQuestions.org
Old 02-10-2017, 04:21 PM   #1
pizzarist
LQ Newbie
 
Registered: Feb 2017
Posts: 9

Rep: Reputation: Disabled
Please help me with these basic commands


Hello,

I'm struggling with this homework. I have tried everything, and my final output is always empty. Here are the questions and my commands:

1- Download file http://x.vcf.gz
- $ wget http://x.vcf.gz

2- Uncompress the downloaded file
- $ gunzip x.vcf.gz

3- Extract lines 40-60 from the uncompressed file and generate an output file
- $ sed -n 40,61p x.vcf > output-file-1

4- Based on the output from step 3, extract the first 4 columns and generate an output file
- $ cut -c 1-5 output-file-1 > output-file-2

5- Based on the output from step 4, remove the lines whose ID (in the 2nd column) starts with the string DEL, and generate an output file
- $ sed -i '/DEL/2d' output-file-2 > output-file-3

6- Show the contents of the final output
- $ cat output-file-3

The result is empty. In fact, when I search for the string DEL in output-file-2 or output-file-3, I find nothing.

Your help would be very much appreciated.

Last edited by pizzarist; 02-10-2017 at 04:24 PM.
 
Old 02-11-2017, 04:58 AM   #2
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 20 MATE
Posts: 8,048
Blog Entries: 5

Rep: Reputation: 2918
Step 4. cut -c extracts based on character count, not field (column) count. Look at cut -f instead, with -d if necessary. Alternatively, you could use awk.
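To make the difference concrete, here is a minimal sketch on a single made-up, tab-separated line (the values are invented for illustration and are not taken from the real file). Note that cut already uses the tab character as its default field delimiter, so -d is only needed for other separators.

```shell
# A made-up tab-separated record (hypothetical values, for illustration only).
line="$(printf '22\t16050075\trs587697622\tA\tG\t100\tPASS')"

# -c counts characters: this keeps only the first 5 characters of the line,
# and a tab counts as one character.
by_chars="$(printf '%s\n' "$line" | cut -c 1-5)"

# -f counts fields; tab is cut's default delimiter, so no -d is needed here.
by_fields="$(printf '%s\n' "$line" | cut -f 1-4)"

printf 'cut -c 1-5: %s\n' "$by_chars"
printf 'cut -f 1-4: %s\n' "$by_fields"
```

If the file turned out to be separated by something other than tabs, that separator would go after -d.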
 
Old 02-11-2017, 05:29 AM   #3
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 5,655
Blog Entries: 3

Rep: Reputation: 2901
Also, in step #5, you have added the -i option to sed, and it gets in the way. It makes sed edit the input file in place, so sed writes nothing to standard output. The redirection therefore has nothing to capture, and the second file ends up empty.

Code:
man sed
Myself, I see the -i option on sed as more of a misfeature and would do it the way you otherwise have it: drop -i and redirect sed's output to a new file.
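The effect is easy to reproduce on throwaway data. The sketch below works in a scratch directory with invented file names (in.txt, in2.txt, out1.txt, out2.txt are all hypothetical) and uses a plain /DEL/d command only to have something to run; it says nothing about what the homework answer should be.

```shell
# Work in a scratch directory so nothing real is touched.
workdir="$(mktemp -d)"
printf 'keep me\nDEL drop me\nkeep me too\n' > "$workdir/in.txt"
cp "$workdir/in.txt" "$workdir/in2.txt"

# With -i, sed rewrites in.txt itself and writes nothing to stdout,
# so the redirect creates an empty out1.txt.
# (-i.bak also keeps a backup copy, in.txt.bak.)
sed -i.bak '/DEL/d' "$workdir/in.txt" > "$workdir/out1.txt"

# Without -i, the input is left alone and the result goes to stdout,
# where the redirect can capture it.
sed '/DEL/d' "$workdir/in2.txt" > "$workdir/out2.txt"

wc -c < "$workdir/out1.txt"   # size in bytes of the file redirected from sed -i
cat "$workdir/out2.txt"       # the two lines that survived the delete
```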
 
Old 02-11-2017, 05:37 AM   #4
Jjanel
Member
 
Registered: Jun 2016
Distribution: any&all, in VBox; Ol'UnixCLI; NO GUI resources
Posts: 999
Blog Entries: 12

Rep: Reputation: 363
Hi again! #1 doesn't look like a valid URL! Look for the file with: ls -l
Maybe: https://github.com/vgteam/vg/raw/mas...order/x.vcf.gz
Code:
user@trisquel:~$ ls -l
total 0
user@trisquel:~$ wget https://github.com/vgteam/vg/raw/master/test/order/x.vcf.gz
--2017-[...ton of msgs...] saved [2906/2906]
user@trisquel:~$ ls -l
total 4
-rw-rw-r-- 1 user user 2906 Feb 10 14:40 x.vcf.gz
user@trisquel:~$ gunzip x.vcf.gz
user@trisquel:~$ ls -l
total 16
-rw-rw-r-- 1 user user 16167 Feb 10 14:40 x.vcf
user@trisquel:~$ sed -n 40,61p x.vcf > output-file-1
user@trisquel:~$ ls -l
total 20
-rw-rw-r-- 1 user user  1137 Feb 10 14:43 output-file-1
-rw-rw-r-- 1 user user 16167 Feb 10 14:40 x.vcf
user@trisquel:~$ wc output-file-1
  22   22 1137 output-file-1
user@trisquel:~$ cut -c 1-5 output-file-1 > output-file-2
user@trisquel:~$ ls -l
total 24
-rw-rw-r-- 1 user user  1137 Feb 10 14:43 output-file-1
-rw-rw-r-- 1 user user   132 Feb 10 14:43 output-file-2
-rw-rw-r-- 1 user user 16167 Feb 10 14:40 x.vcf
user@trisquel:~$ head -1 output-file-2; tail -1 output-file-2 #sed -n '1p;$p' #awk 'NR==1;END{print}'
##con
##con
user@trisquel:~$ sed -i '/DEL/2d' output-file-2 > output-file-3
sed: -e expression #1, char 6: unknown command: `2'
cat -n x.vcf
will show line numbers. Also, in the VCF file I guessed at:
grep -n DEL x.vcf
93:##ALT=<ID=DEL,Description="Deletion">
So DEL appears only on line 93 of that file, outside the 40-61 range; for it, step #3 would need -different- line numbers.
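On the sed error itself: sed reads /DEL/ as an address and then expects a single command letter, so the 2 in '/DEL/2d' is rejected. sed has no notion of columns; one possible way to test a specific column is awk. The sketch below uses invented rows, and col=3 is an assumption (in the standard VCF layout the ID is the third column, while the assignment text says the second), so adjust it to whichever column actually holds the ID.

```shell
# Made-up rows in a four-column, tab-separated layout (hypothetical values).
rows="$(printf '22\t100\trs1\tA\n22\t200\tDEL_x\tAT\n22\t300\trs2\tC')"

# sed form: address, then one command. This deletes every line that
# contains DEL anywhere, regardless of column.
with_sed="$(printf '%s\n' "$rows" | sed '/DEL/d')"

# awk form: keep only rows whose chosen column does NOT start with DEL.
# col=3 is an assumption; change it to match the real file.
with_awk="$(printf '%s\n' "$rows" | awk -F'\t' -v col=3 '$col !~ /^DEL/')"

printf '%s\n' "$with_awk"
```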

[thoughts added later:]
A key concept here is debugging (troubleshooting): working out what's actually happening.
Another way is to make a tiny, simple test case and dig deeply through it,
getting each step working and understood.
(a bit like a new car, with buttons/switches that [may] do something 'good')
-I- actually didn't know what the sed -i switch does, so I
did a 1-minute scan of the [100-page] manual, section sed, switch -i.
If that doesn't clarify it, I try a web search (including the 'goal'), like:
sed examples remove lines that have|match a string
Yea! 1st 'hit': http://stackoverflow.com/questions/5...pecific-string
(I thought this would be a 10+ minute 'project' but it turned out to take <2!)
Deeper search: sed "delete 2 lines": seq 5|sed /2/,+2d COOL! (with GNU sed that removes the line matching 2 plus the 2 lines after it)

Anyway, best wishes! You'll be a Linux Master in no time at all!

Last edited by Jjanel; 02-11-2017 at 08:17 PM.
 
Old 02-11-2017, 05:45 AM   #5
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 20 MATE
Posts: 8,048
Blog Entries: 5

Rep: Reputation: 2918
Quote:
Originally Posted by Jjanel View Post
Hi again! #1 doesn't look like a valid URL! Look for the file with: ls -l
Yeah, that was the first thing that struck me too, Jjanel, but I reckoned that the OP would probably have mentioned that there was an error at that stage. I assumed therefore that the OP had judiciously edited this portion of the original post to delete the actual URL used.
 
1 members found this post helpful.
Old 02-11-2017, 08:45 AM   #6
pizzarist
LQ Newbie
 
Registered: Feb 2017
Posts: 9

Original Poster
Rep: Reputation: Disabled
I'm terribly sorry, I just made that link up as an example. This is the actual link :
http://ftp.1000genomes.ebi.ac.uk/vol...notypes.vcf.gz
 
Old 02-11-2017, 09:16 AM   #7
pizzarist
LQ Newbie
 
Registered: Feb 2017
Posts: 9

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by hydrurga View Post
Step 4. cut -c extracts based on character count, not field (column) count. Look at cut -f instead, with -d if necessary. Alternatively, you could use awk.
Do you mean like this :

$ cut -d ' ' -f 1-5 output-file-1 > output-file-2
 
Old 02-11-2017, 09:23 AM   #8
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 20 MATE
Posts: 8,048
Blog Entries: 5

Rep: Reputation: 2918
Quote:
Originally Posted by pizzarist View Post
Do you mean like this :

$ cut -d ' ' -f 1-5 output-file-1 > output-file-2
You need to look at each step, and the output from each step, separately and check that the output matches the format and content that you want it to be in. If it doesn't match your requirements then try changing the command and/or options, read the man files for the command in question, search the internet for answers. Then, if you still can't get the output to match what you want, come back on here and post the command and output in question, along with an example of how you want the output to be.

So, starting from the first step, and working downwards step by step, where do you hit your first problem?
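As one way of checking a step's output, the sketch below counts lines and makes invisible characters visible. It runs on stand-in input generated with seq rather than on the real x.vcf, so only the technique carries over, not the data.

```shell
# Stand-in input: 100 numbered lines (the real x.vcf is not assumed here).
sample="$(seq 1 100)"

# Step 3 as posted selects lines 40 through 61, which is 22 lines...
posted="$(printf '%s\n' "$sample" | sed -n '40,61p' | wc -l)"
# ...whereas lines 40 through 60 inclusive are 21 lines.
wanted="$(printf '%s\n' "$sample" | sed -n '40,60p' | wc -l)"
echo "40,61p -> $posted lines; 40,60p -> $wanted lines"

# 'sed -n l' prints lines unambiguously: a tab shows up as \t and the
# end of each line as $, which reveals what the column separator really is.
visible="$(printf 'a\tb\n' | sed -n 'l')"
echo "$visible"
```

Running the same two checks (wc -l and sed -n l) on each intermediate output file shows whether it has the expected number of lines and whether its columns are separated by tabs or spaces.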

Last edited by hydrurga; 02-11-2017 at 09:24 AM.
 
Old 02-11-2017, 10:45 AM   #9
pizzarist
LQ Newbie
 
Registered: Feb 2017
Posts: 9

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by hydrurga View Post
You need to look at each step, and the output from each step, separately and check that the output matches the format and content that you want it to be in. If it doesn't match your requirements then try changing the command and/or options, read the man files for the command in question, search the internet for answers. Then, if you still can't get the output to match what you want, come back on here and post the command and output in question, along with an example of how you want the output to be.

So, starting from the first step, and working downwards step by step, where do you hit your first problem?
I suspect that my problem starts with step 4 as I'm not sure what I'm extracting, columns or characters. The final output should be a gene sequencing file, something like : CCGTCGAACCA.
 
Old 02-11-2017, 10:55 AM   #10
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 20 MATE
Posts: 8,048
Blog Entries: 5

Rep: Reputation: 2918
Quote:
Originally Posted by pizzarist View Post
I suspect that my problem starts with step 4 as I'm not sure what I'm extracting, columns or characters. The final output should be a gene sequencing file, something like : CCGTCGAACCA.
An internet search for "vcf format" produced the following as the first result:

http://www.internationalgenome.org/w...alysis/vcf4.0/

Have a read of the section entitled "Data lines".
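In the VCF layout, lines beginning with # are meta-information or the column header, and the tab-separated data lines come after them. A minimal sketch for telling the two apart, using a tiny invented file (hypothetical content, far shorter than a real one):

```shell
# A tiny made-up VCF-like file held in a variable (all values invented).
vcf="$(printf '##fileformat=VCFv4.0\n##contig=<ID=x>\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nx\t9\t.\tG\tA\t99\tPASS\t.\nx\t10\t.\tC\tT\t99\tPASS\t.')"

# Lines starting with '#' are meta-information or the column header.
header_count="$(printf '%s\n' "$vcf" | grep -c '^#')"

# Everything else is a data line; -n prefixes each match with its line
# number, so the first match tells where the data starts.
first_data="$(printf '%s\n' "$vcf" | grep -n -v '^#' | head -n 1 | cut -d: -f1)"

echo "header lines: $header_count, first data line: $first_data"
```

The same grep commands on the real file would show whether lines 40-60 are still inside the header block.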
 
Old 02-11-2017, 10:56 AM   #11
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 5,655
Blog Entries: 3

Rep: Reputation: 2901
Quote:
Originally Posted by pizzarist View Post
I suspect that my problem starts with step 4 as I'm not sure what I'm extracting, columns or characters. The final output should be a gene sequencing file, something like : CCGTCGAACCA.
You can't get there from here. You'll have to back up and redo everything from step #3 onwards. I'd recommend awk to extract the 4th column, if the 4th column is the one made up of runs of A, C, G, and T.

What are you really supposed to extract from the file?
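For reference, a minimal sketch of pulling out one column with awk, assuming tab-separated data lines. The rows are invented, and whether the 4th column is the right one depends on what the assignment really asks for.

```shell
# Made-up tab-separated data lines (hypothetical values).
data="$(printf 'x\t9\t.\tG\tA\nx\t10\t.\tC\tT\nx\t14\t.\tACG\tA')"

# -F'\t' splits each line on tabs; $4 is the fourth field.
# Lines beginning with '#' (headers) are skipped.
ref_column="$(printf '%s\n' "$data" | awk -F'\t' '!/^#/ { print $4 }')"

printf '%s\n' "$ref_column"
```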
 
Old 02-11-2017, 06:14 PM   #12
Jjanel
Member
 
Registered: Jun 2016
Distribution: any&all, in VBox; Ol'UnixCLI; NO GUI resources
Posts: 999
Blog Entries: 12

Rep: Reputation: 363
I added a bit to my #4post. Have a peek at this web-search: Genomes bio-linux
Also (| is OR here [not 'pipe'!]): "internationalgenome" linux|awk|perl
Oh, and (I'm addicted to web-searching!): book|.pdf linux for bioinformatics
(I'm curious as to what computer and Linux 'distro' you use ... just curious)
You'll 'get there', tho maybe in a different awk-mobile. Linux is INFINITE!

Last edited by Jjanel; 02-11-2017 at 06:42 PM.
 
Old 02-11-2017, 06:33 PM   #13
pizzarist
LQ Newbie
 
Registered: Feb 2017
Posts: 9

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Jjanel View Post
I added a bit to my #4post. Have a peek at this web-search: Genomes bio-linux
Also (| is OR): "internationalgenome" linux|awk|perl
(I'm curious as to what computer and Linux 'distro' you use ... just curious)
You'll 'get there', tho maybe in a different awk-mobile. Linux is INFINITE!
I use a supercomputer on our campus (Karst), remotely from my MacBook. I have no idea which Linux it runs. I'm new to all of this.
 
Old 02-11-2017, 06:54 PM   #14
r3sistance
Senior Member
 
Registered: Mar 2004
Location: UK
Distribution: CentOS 6/7
Posts: 1,375

Rep: Reputation: 217
Quote:
Originally Posted by pizzarist View Post
I use a supercomputer on our campus (Karst), remotely from my MacBook. I have no idea which Linux it runs. I'm new to all of this.
A pipe is a fairly simple way to chain commands: it takes the stdout of the command on the left and feeds it to the stdin of the command on the right. That might sound a bit strange, so here is an example:

Code:
echo "hello" | sed 's/hello/hello world/' | cat
hello world
So the output of the echo is hello, which is passed to sed. sed replaces hello with hello world.
The output of sed is then passed to the cat on the right, which outputs "hello world".

You seem to be using a lot of files unnecessarily where pipes probably would have been a better choice.
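Applied to this thread, steps 3 to 5 could be chained roughly as below. This is only a sketch on generated stand-in data: make_rows is a hypothetical helper, every value is invented, and the choice of the 3rd column for the ID is an assumption to adjust against the real file.

```shell
# Stand-in for the uncompressed file: 100 made-up tab-separated rows.
# Every 5th row gets an ID starting with DEL; all values are invented.
make_rows() {
  awk 'BEGIN {
    for (i = 1; i <= 100; i++) {
      printf "x\t%d\t%s%d\tA\tG\tPASS\n", i, (i % 5 == 0 ? "DEL" : "rs"), i
    }
  }'
}

# Steps 3-5 as one pipeline, with no intermediate files:
#   lines 40-60 -> first 4 columns -> drop rows whose 3rd column starts with DEL
result="$(make_rows | sed -n '40,60p' | cut -f 1-4 | awk -F'\t' '$3 !~ /^DEL/')"

printf '%s\n' "$result" | head -n 3
printf '%s\n' "$result" | wc -l
```

Intermediate files are still handy while debugging, since each one can be inspected on its own; the pipeline form is mainly a tidier final version.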

Last edited by r3sistance; 02-11-2017 at 07:24 PM. Reason: Thanks ntubski
 
Old 02-11-2017, 07:09 PM   #15
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,604

Rep: Reputation: 1946
Quote:
Originally Posted by r3sistance View Post
The output of sed is then passed to the echo on the right which outputs "hello world"
echo ignores its input, doesn't it? Maybe you meant to put cat there instead?
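A quick way to see the difference is to pipe the same text into both commands:

```shell
# echo never reads standard input, so the piped text is discarded and
# only echo's own output (an empty line) appears.
from_echo="$(printf 'hello world\n' | echo)"

# cat copies standard input to standard output.
from_cat="$(printf 'hello world\n' | cat)"

printf 'echo gave: [%s]\n' "$from_echo"
printf 'cat gave:  [%s]\n' "$from_cat"
```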
 
1 members found this post helpful.
  


All times are GMT -5. The time now is 08:51 AM.
