LinuxQuestions.org
Old 05-07-2017, 04:50 PM   #1
onthetopo
LQ Newbie
 
Registered: May 2017
Posts: 11

Rep: Reputation: Disabled
Grep all files recursively in directory, and if file contains two patterns on the same line, bzip2 it while preserving directory structure


Hello
I have a directory structure like the following; each subfolder contains about 10 .txt files:

filings/000104/1042391
filings/000104/1042392
filings/000105/1052391
filings/000105/1052222
.......


How do I recursively grep through all files in the filings directory and bzip2 the matching ones, while preserving the directory structure in the resulting archive?

1. I want to find all files that contain two strings on the same line, 'FORM TYPE' and '99D'. Right now the following line does the job for me:

grep 'FORM TYPE' *.txt |grep '99D'

2. I want to tar and bzip2 all of the above files with maximal compression, but I don't know how to incorporate the piped grep commands from step 1. Right now I am using the following:

tar cvf - ./11a.txt | bzip2 -v9 - > ~/myname.tar.bz2

3. I want to preserve the directory structure when the archive is extracted, so that the filings/000105/1052222 directory is created on extraction.
 
Old 05-07-2017, 05:03 PM   #2
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 23,950

Rep: Reputation: 7029
You can write a script to do this. Personally, I'd approach it in exactly the steps you laid out:
  1. Since you have a top level to start with, you can run a find on it for any .txt files. That gives you the path and file name, so you can...
  2. ..feed it into the grep statements, and if it's found....
  3. ..put the file name (the variable you found going in) into an array, to feed into...
  4. ..the tar command.
That said, read the man page on tar...you don't need to pipe it to bzip (see the -j option). What have you written so far?
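
For reference, the four steps above might be sketched roughly as follows. This is only a sketch under assumptions: it needs bash (for arrays and process substitution) and a find that supports -print0; the function name archive_matching_filings, the fixed pattern, and the argument order are made up for illustration and do not come from this thread.

```shell
#!/usr/bin/env bash
# Sketch of the find -> grep -> array -> tar approach (hypothetical helper).
# Usage: archive_matching_filings TOP_DIR OUTPUT.tar.bz2
archive_matching_filings() {
    local top=$1 out=$2 f
    local -a matches=()

    # Step 1: find every .txt file under the top-level directory.
    while IFS= read -r -d '' f; do
        # Step 2: keep the file only if a single line contains 'FORM TYPE'
        # followed later by '99D' (this order is an assumption).
        if grep -q -e 'FORM TYPE.*99D' -- "$f"; then
            # Step 3: remember the path exactly as find reported it.
            matches+=("$f")
        fi
    done < <(find "$top" -type f -name '*.txt' -print0)

    # Nothing matched: report it instead of calling tar with no files.
    if [ "${#matches[@]}" -eq 0 ]; then
        echo "no matching files under $top" >&2
        return 1
    fi

    # Step 4: -j lets tar run bzip2 itself; the stored paths keep the
    # directory structure, so extraction recreates the directories.
    tar -cjf "$out" "${matches[@]}"
}
```

Run from the directory that contains filings, a call such as `archive_matching_filings filings ~/99D.tar.bz2` would store paths like filings/000105/1052222/name.txt. With a very large number of matches, passing the list through tar's -T option is likely safer than a long argument list.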
 
Old 05-07-2017, 05:16 PM   #3
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_12{.0|.1}
Posts: 5,661
Blog Entries: 11

Rep: Reputation: 3706
Welcome to LQ!

You can do that all in one line with only small variation from what you have shown.

First, no need for two greps if 'FORM TYPE' and '99D' are on the same line.

Next, you can use grep to only produce the matching file names, see man grep, -l option.

Same to make it recursive, man grep, -r.

Then you can pipe that directly into the tar command (man tar, -T, reading the file list from stdin). You can also have tar do the bzip2 compression itself and redirect the result to your file.

So something like this should do it...

Code:
grep -rl YOUR_EXPRESSION | tar [other options] -T - >file.tar.bz2
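
With the placeholders filled in, the pipeline above might look like the sketch below. It assumes GNU grep and GNU tar; the helper name bzip_matches and its arguments are illustrative only. The -Z and --null options are an addition here, so that unusual file names survive the pipe.

```shell
# Hypothetical helper. Usage: bzip_matches DIR PATTERN OUTPUT.tar.bz2
bzip_matches() {
    # grep -r: recurse into DIR, -l: print only the names of matching files,
    #      -Z: end each name with a NUL byte instead of a newline.
    # tar  -c: create, -j: compress through bzip2,
    #      --null -T -: read the NUL-separated file list from stdin,
    #      -f -: write the archive to stdout, which is redirected to OUTPUT.
    grep -rlZ -e "$2" -- "$1" | tar -cj --null -T - -f - > "$3"
}
```

For example, `bzip_matches filings 'FORM TYPE.*99D' ~/99D.tar.bz2`, run from the directory that contains filings. According to the bzip2 man page, the default block size is already the largest one (the same as -9), so -j should give the same compression as piping through bzip2 -9.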
 
Old 05-07-2017, 08:29 PM   #4
onthetopo
LQ Newbie
 
Registered: May 2017
Posts: 11

Original Poster
Rep: Reputation: Disabled
Thanks for the reply. But I need two greps because there could be spaces or other characters between 'FORM TYPE' and '99D'.
How do I handle that with one grep?

Mind posting the complete answer please, including how to bzip2 with maximal compression? Thanks.

 
Old 05-07-2017, 08:41 PM   #5
onthetopo
LQ Newbie
 
Registered: May 2017
Posts: 11

Original Poster
Rep: Reputation: Disabled
Sir, I know only the bare minimum of bash scripting. Would you mind giving me the answer, please?
I am not competent enough to write this on my own.
Quote:
Originally Posted by TB0ne View Post
You can write a script to do this. Personally, I'd approach it in exactly the steps you laid out:
  1. Since you have a top level to start with, you can run a find on it for any .txt files. That gives you the path and file name, so you can...
  2. ..feed it into the grep statements, and if it's found....
  3. ..put the file name (the variable you found going in) into an array, to feed into...
  4. ..the tar command.
That said, read the man page on tar...you don't need to pipe it to bzip (see the -j option). What have you written so far?
 
Old 05-07-2017, 09:09 PM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 19,788

Rep: Reputation: 3574
Quote:
Originally Posted by onthetopo View Post
But I need two greps because there could be spaces or other characters between 'FORM TYPE' and '99D'.
How do I handle that with one grep?
Not true - grep accepts regex; a basic regex will suffice if 'FORM TYPE' always comes before '99D' on the line. Like so:
Code:
grep -rl "FORM TYPE.*99D" *.txt
Using the hints given above you should be able to resolve this yourself.
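
A small sketch of the single-pattern search on a tree like the one in the first post. One caveat worth knowing: a bare *.txt is expanded by the shell in the current directory only, so it does not select .txt files in subdirectories. The sketch uses --include instead, which is a GNU grep option; the helper name list_99d_filings is made up.

```shell
# Hypothetical helper: list the .txt files under DIR in which one line
# contains 'FORM TYPE' followed, anywhere later on that line, by '99D'.
# Usage: list_99d_filings DIR
list_99d_filings() {
    # '.*' matches any run of characters (including none) between the
    # two strings, so spaces, tabs, or other text in between are fine.
    # --include='*.txt' restricts the recursive search to .txt files.
    grep -rl --include='*.txt' -e 'FORM TYPE.*99D' -- "$1"
}
```

A file that has 'FORM TYPE' on one line and '99D' on a different line is not listed, because grep tests each line separately.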
 
Old 05-07-2017, 09:29 PM   #7
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 5,672

Rep: Reputation: 2220
If you are concerned that your two search terms can occur in either order in a line, then you can search for both with
Code:
grep -rl -e "FORM TYPE.*99D" -e "99D.*FORM TYPE"
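
As a quick way to try this out, the two -e patterns can be wrapped in a tiny function. This is a sketch only; the name list_either_order is hypothetical, and the directory to search is passed explicitly rather than relying on grep's default.

```shell
# Hypothetical helper: list files under DIR where some line contains both
# strings, in either order.
# Usage: list_either_order DIR
list_either_order() {
    # Each -e adds one pattern; a line matches if it matches either one.
    grep -rl -e 'FORM TYPE.*99D' -e '99D.*FORM TYPE' -- "$1"
}
```

This should list the same files as the original two-step grep 'FORM TYPE' | grep '99D' would select, but it prints file names instead of matching lines.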
 
Old 05-07-2017, 09:41 PM   #8
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_12{.0|.1}
Posts: 5,661
Blog Entries: 11

Rep: Reputation: 3706
Quote:
Originally Posted by onthetopo View Post
Thanks for the reply. But I need two greps because there could be spaces or other characters between 'FORM TYPE' and '99D'.
How do I handle that with one grep?

Mind posting the complete answer please, including how to bzip2 with maximal compression? Thanks.
Let's see if you can figure it out yourself from the hints given first! LQ members are always happy to help you learn, but you learn best by doing.

And in a way, we have given you the complete answer already, all you need to do is think about what has been posted, maybe read the referenced man pages for more complete details (but the answer is already given).

Grep simply matches regular expressions in the files, so all you need is a regular expression to match the text you are looking for.

Regular expressions are just a kind of shorthand notation for matching arbitrary strings of characters, and you are already using it to match each expression. All you need is to modify your regular expression to match both expressions. Others have already given the basic hint, '.*', which matches zero or more of any character.

There are many regular expression tutorials online, I found this one with a quick search, and it includes the ability to test your knowledge by submitting your own regular expressions and testing the result.

The man page for tar will provide all the information needed to tweak that part of the command pipeline to your exact requirements.

Give it a try and tell us what you figure out!
 
Old 05-07-2017, 11:41 PM   #9
onthetopo
LQ Newbie
 
Registered: May 2017
Posts: 11

Original Poster
Rep: Reputation: Disabled
My attempt follows. I removed *.txt since I want to search all subdirectories and there is no .txt file in the current directory. I am not sure whether I can use tar -cv instead of tar -cvf; is the missing 'f' a problem?

grep -rl ".*FORM TYPE.*99D" | tar -T - -cv | bzip2 -v9 - > ~/99D.tar.bz2 >~/log.txt

I got an ambiguous output error; it actually worked without the final redirection to log.txt. How can I keep the output in a log.txt logfile in this situation?


Last edited by onthetopo; 05-08-2017 at 12:08 AM.
 
Old 05-08-2017, 01:10 AM   #10
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 16,636

Rep: Reputation: 5614
You added two redirections of stdout, which cannot work. What do you want to put into log.txt?
You probably need to redirect stderr to log.txt.
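
A sketch of what redirecting stderr could look like here. It assumes a Bourne-style shell such as bash or sh (an "ambiguous output" message is typically what csh-family shells print for a doubled >, and those shells do not support 2>). The helper name archive_with_log is made up.

```shell
# Hypothetical helper: archive the files named on stdin (one per line)
# into OUTPUT and collect the verbose messages in LOGFILE.
# Usage: grep -rl PATTERN DIR | archive_with_log OUTPUT.tar.bz2 LOGFILE
archive_with_log() {
    # stdout (fd 1) can only go to one place, and here it carries the archive.
    # When the archive is written to stdout, tar -v prints the file names on
    # stderr (fd 2), and bzip2 -v reports on stderr as well. The braces let
    # one 2> capture the messages of both commands.
    { tar -cvf - -T - | bzip2 -v9; } > "$1" 2> "$2"
}
```

The sketch writes -f - explicitly: without -f, GNU tar falls back to a default archive destination that depends on how it was built and on the TAPE environment variable, so tar -cv alone may or may not write to stdout.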
 
Old 05-08-2017, 01:37 AM   #11
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_12{.0|.1}
Posts: 5,661
Blog Entries: 11

Rep: Reputation: 3706
Quote:
Originally Posted by onthetopo View Post
My attempt follows. I removed *.txt since I want to search all subdirectories and there is no .txt file in the current directory. I am not sure whether I can use tar -cv instead of tar -cvf; is the missing 'f' a problem?

grep -rl ".*FORM TYPE.*99D" | tar -T - -cv | bzip2 -v9 - > ~/99D.tar.bz2 >~/log.txt

I got an ambiguous output error; it actually worked without the final redirection to log.txt. How can I keep the output in a log.txt logfile in this situation?
That is progress, nice to see!

If you want to include the bzip without the final pipe, see the -j option to tar as mentioned by TB0ne above. But the way you have it is OK too, and gives you all the bzip2 options if you want them.

If you want the file list to go to log.txt you can use tee with the log filename (see man tee as usual). The basic form would be grep ...| tee ... | tar ...

On the other hand, if you want error messages redirect stderr into log.txt (hint: 2>log.txt).

If you want the list of files and all status and errors, you should be able to use tee as described above and append stderr to log.txt at the end (or send to a separate file, errs.txt). I haven't tested tee+append but you can do so easily, right!
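
The tee idea might be sketched like this, using a separate file for errors rather than appending to the same log. It assumes a Bourne-style shell and a tar that supports -j and -T (GNU tar does); the helper name log_and_archive and its argument order are hypothetical.

```shell
# Hypothetical helper.
# Usage: log_and_archive DIR OUTPUT.tar.bz2 LIST_LOG ERR_LOG
log_and_archive() {
    # tee writes a copy of the matching file names to LIST_LOG and passes
    # the same list on to tar unchanged.
    # 2> applies to the last command only, so ERR_LOG collects tar's error
    # messages (including those of the bzip2 it starts for -j).
    grep -rl -e 'FORM TYPE.*99D' -- "$1" \
        | tee "$3" \
        | tar -cjf - -T - > "$2" 2> "$4"
}
```

For example, `log_and_archive filings ~/99D.tar.bz2 ~/log.txt ~/errs.txt`, run from the directory that contains filings.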

Last edited by astrogeek; 05-08-2017 at 02:08 AM. Reason: clarify
 
Old 05-08-2017, 06:55 AM   #12
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 23,950

Rep: Reputation: 7029
Quote:
Originally Posted by onthetopo View Post
Sir, I know only the bare minimum of bash scripting. Would you mind giving me the answer, please? I am not competent enough to write this on my own.
And you never will be until you actually do it. Read the "Question Guidelines" link in my posting signature. We are all happy to help you if you're stuck, but we aren't going to write your scripts/do your work for you, and hand it to you.

There are THOUSANDS of easily found scripting guides you can find with a very brief internet search...exactly like the one you did to find this site. Start there.
 
Old 05-08-2017, 11:25 AM   #13
onthetopo
LQ Newbie
 
Registered: May 2017
Posts: 11

Original Poster
Rep: Reputation: Disabled
well then, thanks for nothing, I got it on my own.
 
Old 05-08-2017, 12:22 PM   #14
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_12{.0|.1}
Posts: 5,661
Blog Entries: 11

Rep: Reputation: 3706
Quote:
Originally Posted by onthetopo View Post
well then, thanks for nothing, I got it on my own.
Please try to be respectful of the time and effort provided by others such as TB0ne to help you learn to resolve your issues. LQ is all about learning from others!

If you feel entitled to free handouts and are scornful of those trying to help in other ways, then you may be in the wrong place. From the link in TB0ne's sig, the LQ Welcome page:

Quote:
Please understand that LQ is not a help desk, customer service line for a product you purchased or willing to do your homework (although we are happy to assist you with specifics, if you show some effort of your own!). We're a 100% volunteer organization that wants to help you help yourself.
Please consider the spirit of that statement as the basis of LQ participation, and try to be respectful of those offering help, if not thankful for the guidance received.

We look forward to your continuing participation in that same spirit, and hope that you will also learn along the way so that you may also help others in your turn!
 
Old 05-08-2017, 01:57 PM   #15
onthetopo
LQ Newbie
 
Registered: May 2017
Posts: 11

Original Poster
Rep: Reputation: Disabled
My apologies. I must have misread TB0ne's comment as extremely condescending. My appreciation to all those who actually had substantive input.


Last edited by onthetopo; 05-08-2017 at 08:18 PM.
 
  

