Hello,
I got a foo.fastq.gz file as output from a sequencer (Windows 7 machine). It is 1.9 GB. After transferring it to a RHEL 5.5 box I tried gunzip with every option I could find. Each time it exited with an error about halfway through and deleted the temporary file, which had reached about 5.4 GB:
Code:
[yaximik@G5NNJN1 MiSeq]$ gunzip -t SC2T252P15_S1_L001_R1_001.fastq.gz
gunzip: SC2T252P15_S1_L001_R1_001.fastq.gz: invalid compressed data--crc error
gunzip: SC2T252P15_S1_L001_R1_001.fastq.gz: invalid compressed data--length error
[yaximik@G5NNJN1 MiSeq]$ gunzip -q SC2T252P15_S1_L001_R1_001.fastq.gz
gunzip: SC2T252P15_S1_L001_R1_001.fastq.gz: invalid compressed data--crc error
gunzip: SC2T252P15_S1_L001_R1_001.fastq.gz: invalid compressed data--length error
[yaximik@G5NNJN1 MiSeq]$ gunzip -dq SC2T252P15_S1_L001_R1_001.fastq.gz
gunzip: SC2T252P15_S1_L001_R1_001.fastq.gz: invalid compressed data--crc error
gunzip: SC2T252P15_S1_L001_R1_001.fastq.gz: invalid compressed data--length error
Using the GUI (Archive Manager) I saw that both errors are caused by one particular line entry, which is not shown on stdout. Finally, I was able to salvage everything up to this line entry using
Code:
[yaximik@G5NNJN1 MiSeq]$ gunzip -c SC2T252P15_S1_L001_R1_001.fastq.gz > ./SC2T252P15_S1_L001_R1_001.fastq
gunzip: SC2T252P15_S1_L001_R1_001.fastq.gz: invalid compressed data--crc error
gunzip: SC2T252P15_S1_L001_R1_001.fastq.gz: invalid compressed data--length error
This still exited with an error, but I got a 5.2 GB foo.fastq text file.
Here is the listing of the archive contents:
Code:
[yaximik@G5NNJN1 MiSeq]$ gunzip -cl SC2T252P15_S1_L001_R1_001.fastq.gz >./content.txt
[yaximik@G5NNJN1 MiSeq]$ cat content.txt
compressed uncompressed ratio uncompressed_name
2086259746 1331149161 -56.7% SC2T252P15_S1_L001_R1_001.fastq
[yaximik@G5NNJN1 MiSeq]$
Is anything wrong with the content?
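The negative ratio may not indicate damage by itself. As far as I understand the gzip format (RFC 1952), the trailer stores the uncompressed size modulo 2^32, so the listing wraps around for anything over 4 GiB. Here is a minimal sketch of that arithmetic. `candidate_size` is a hypothetical helper name, and the number of wraparounds is my assumption:

```shell
# candidate_size REPORTED WRAPS
#   REPORTED = "uncompressed" value printed by gunzip -l
#   WRAPS    = assumed number of times the 32-bit size field overflowed
candidate_size() {
    echo $(( $1 + $2 * 4294967296 ))   # 4294967296 = 2^32
}

# Value from the listing above, assuming a single wraparound
candidate_size 1331149161 1
```

With one wraparound this gives 5626116457 bytes, about 5.2 GiB, which is roughly in line with the size of the file I salvaged.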
But my main question is this: is there a utility that would let me edit the original foo.fastq.gz file, remove the offending line, and re-extract the healthy data? The data are too valuable to lose.
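In the meantime, one thing I could do with the salvaged text file is drop any incomplete record at its end. This is only a sketch, not a repair of the .gz itself. It assumes every read occupies exactly four lines (no wrapped sequence lines) and that any damage is confined to the tail of the output. `trim_fastq` and the file names are hypothetical:

```shell
# trim_fastq INPUT OUTPUT
#   Copies INPUT to OUTPUT, keeping only whole 4-line FASTQ records.
#   Assumption: unwrapped FASTQ, i.e. exactly 4 lines per read.
trim_fastq() {
    total=$(wc -l < "$1")          # number of complete (newline-terminated) lines
    keep=$(( total / 4 * 4 ))      # round down to a multiple of 4
    head -n "$keep" "$1" > "$2"
}

# Example call (hypothetical file names):
# trim_fastq salvaged.fastq salvaged.trimmed.fastq
```

This does not check that the remaining records are intact, only that the file ends on a record boundary.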