LinuxQuestions.org

georgi 10-11-2013 08:56 AM

Combine several commands which write their output to different files
 
Hi all,

I have large files of URLs, each ending in "|<number>", where the number is the site's Page Rank (PR). For example:

Code:

http://www.machinokairo.com/2012/05/post-39.html|2
I am using grep to filter the URLs step by step: first remove all lines ending in "|0" and write the output to a file, then remove all lines ending in "|1" from that file and write the output to a new file, and so on up to "|5". Each step removes one PR value and leaves the rest in a separate file. For now I use the following commands to do that:

Code:

grep --invert-match "|0" sitelist >> sitelist_PR1.txt
grep --invert-match "|1" sitelist_PR1.txt >> sitelist_PR2.txt
.
.
grep --invert-match "|5" sitelist_PR5.txt >> sitelist_PR6.txt

I would greatly appreciate it if someone could show me how to automate this process from the command line or with a bash script.

sundialsvcs 10-11-2013 09:00 AM

Just ">>" them to the same filename. ">>" means append, whereas ">" means replace.

Or, if for other reasons you want them to remain separate, run the commands then "cat file1 file2 file3 > outputfile"
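A minimal sketch of the difference between the two redirections, using a throwaway file in a temporary directory (the file name out.txt is made up for illustration):

```shell
#!/bin/bash
# Scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

echo "first"  >  "$workdir/out.txt"   # ">" creates or truncates the file
echo "second" >> "$workdir/out.txt"   # ">>" appends to what is already there
lines_after_append=$(grep -c '' "$workdir/out.txt")

echo "third"  >  "$workdir/out.txt"   # ">" again: the earlier lines are gone
lines_after_replace=$(grep -c '' "$workdir/out.txt")

echo "after append: $lines_after_append line(s), after replace: $lines_after_replace line(s)"
```

One consequence for the commands in the first post: since they use ">>", running them a second time would append the same lines to the existing output files again.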

georgi 10-11-2013 09:05 AM

Quote:

Originally Posted by sundialsvcs (Post 5043996)
Just ">>" them to the same filename. ">>" means append, whereas ">" means replace.

Or, if for other reasons you want them to remain separate, run the commands then "cat file1 file2 file3 > outputfile"

Yes, I would like to keep each of the files separate and do the whole process with a single run of a script or command line.

TenTenths 10-11-2013 09:13 AM

Code:

#!/bin/bash

for PR in {0..9} ; do
 grep "|${PR}$" /path/to/input/sitelist > /path/to/output/sitelist_PR${PR}.txt
done

This loops through the numbers 0 to 9.

On each pass, grep looks for a bar followed by that number at the end of each line in the input file and writes the matching lines to the corresponding output file.

georgi 10-11-2013 01:05 PM

Quote:

Originally Posted by TenTenths (Post 5044004)
Code:

for PR in {0..9} ; do
This loops through the numbers 0 to 9.

On each pass, grep looks for a bar followed by that number at the end of each line in the input file and writes the matching lines to the corresponding output file.

Ten, your script might be the one I need, but when I first looked at it I noticed that you wrote PR in the pattern. I have "|<number>" in my text files, as in the example
Code:

|2
at the end. The number is the actual PR of that site, so I need to deal with the "|".

TenTenths 10-11-2013 01:48 PM

In my example, PR is a variable set by the "for" loop; the grep line refers to it as ${PR}. The first time round, grep looks for "|0$", where $ tells grep to match at the end of the line. The next time through the loop the pattern becomes "|1$", and so on.
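To see the expansion for yourself, here is a small sketch that prints the pattern grep receives on each pass and how many lines it matches. The three sample URLs are placeholders, not real data:

```shell
#!/bin/bash
# Scratch copy of a tiny sitelist; the URLs are made up for illustration.
workdir=$(mktemp -d)
printf '%s\n' 'http://a.example/|0' 'http://b.example/|2' 'http://c.example/|2' > "$workdir/sitelist"

for PR in 0 1 2; do
    # Inside double quotes ${PR} is replaced by the loop value;
    # the escaped \$ stays a literal $ and anchors the match at end of line.
    pattern="|${PR}\$"
    count=$(grep -c "$pattern" "$workdir/sitelist")
    echo "pattern ${pattern} matches ${count} line(s)"
done
```

So nothing in the input file needs to contain the letters PR; the name only exists inside the script.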

georgi 10-11-2013 01:56 PM

The script creates only empty files.

TenTenths 10-12-2013 07:40 AM

It worked fine for me when I created an input file in the format you gave.

georgi 10-12-2013 07:55 AM

Quote:

Originally Posted by TenTenths (Post 5044424)
It worked fine for me when I created an input file in the format you gave.

I still get empty files, although they are numbered correctly.
Anyway, thanks for your effort and time.

ntubski 10-12-2013 10:26 AM

TenTenths' grep expression is a bit stricter than the original. I can think of a couple of cases where this could cause failure:

Does "sitelist" use DOS/Windows line endings (post the output of "file sitelist")?

Can the page rank be more than one digit?
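A rough sketch of the first case, assuming a Linux-style grep that does not strip carriage returns. It builds a two-line sample with CRLF endings (placeholder URLs), shows that the end-of-line pattern misses, then strips the carriage returns with tr and tries again:

```shell
#!/bin/bash
workdir=$(mktemp -d)
# Sample input with DOS (CRLF) line endings; the URLs are made up.
printf 'http://a.example/|1\r\nhttp://b.example/|2\r\n' > "$workdir/sitelist"

# Each line really ends in "|1" + carriage return, so "|1$" finds nothing.
before=$(grep -c '|1$' "$workdir/sitelist")

# Delete every carriage return, writing a cleaned copy, and search that.
tr -d '\r' < "$workdir/sitelist" > "$workdir/sitelist.unix"
after=$(grep -c '|1$' "$workdir/sitelist.unix")

echo "matches before: $before, after: $after"
```

The unanchored pattern from the first post ("|1" with no $) would match either way, which is why the original commands were not affected.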

georgi 10-12-2013 01:21 PM

Quote:

Originally Posted by ntubski (Post 5044477)
TenTenths' grep expression is a bit stricter than the original. I can think of a couple of cases where this could cause failure:

Does "sitelist" use DOS/Windows line endings (post the output of "file sitelist")?

Can the page rank be more than one digit?

The sitelist file was converted from Windows, so your point is excellent. I have now reformatted it properly and the script works.
The remaining problem is that the script produces output different from what I want: each output file contains only the URLs whose PR matches its name, e.g.
sitelist_PR1.txt contains only URLs with PR 1.
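For comparison, the chained filtering described in the first post could be sketched with grep alone, assuming single-digit page ranks and Unix line endings. Each pass reads the file written by the previous pass; the sample URLs and the scratch directory are placeholders:

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Tiny made-up sitelist: one URL each with PR 0, 1 and 3.
printf '%s\n' 'http://a.example/|0' 'http://b.example/|1' 'http://c.example/|3' > sitelist

input=sitelist
for PR in 0 1 2 3 4 5; do
    output="sitelist_PR$((PR + 1)).txt"
    # -v (same as --invert-match) keeps every line that does NOT end in "|<PR>".
    grep -v "|${PR}\$" "$input" > "$output"
    # The next pass filters what this pass left over.
    input=$output
done
```

With this sample, sitelist_PR1.txt keeps the PR 1 and PR 3 lines, sitelist_PR2.txt keeps only the PR 3 line, and the files from sitelist_PR4.txt onward are empty.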

ntubski 10-12-2013 01:52 PM

Untested:

Code:

awk -F\| '{for (i=1; i <= 5; i++) { if ($2 >= i) print > ("sitelist_PR" i ".txt") }}' sitelist
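The same one-liner spread over several lines with comments, run against a made-up three-line sitelist in a scratch directory so the effect is easy to inspect. This is only a sketch; the sample URLs are placeholders:

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Made-up sample: one URL each with PR 0, 1 and 3.
printf '%s\n' 'http://a.example/|0' 'http://b.example/|1' 'http://c.example/|3' > sitelist

# -F"|" splits each line at the bar, so $1 is the URL and $2 is the page rank.
awk -F'|' '
{
    # A line with rank N is written to sitelist_PR1.txt ... sitelist_PRN.txt,
    # so file i ends up holding every URL with rank i or higher.
    for (i = 1; i <= 5; i++) {
        if ($2 >= i) print > ("sitelist_PR" i ".txt")
    }
}' sitelist

ls sitelist_PR*.txt
```

Note that awk only creates a file when at least one line is printed to it, so with this sample there is no sitelist_PR4.txt or sitelist_PR5.txt at all, rather than empty ones.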

georgi 10-12-2013 01:58 PM

Quote:

Originally Posted by ntubski (Post 5044558)
Untested:

Code:

awk -F\| '{for (i=1; i <= 5; i++) { if ($2 >= i) print > ("sitelist_PR" i ".txt") }' sitelist

Code:

awk: cmd. line:1: {for (i=1; i <= 5; i++) { if ($2 >= i) print > ("sitelist_PR" i ".txt") }
awk: cmd. line:1:                                                                          ^ unexpected newline or end of string


ntubski 10-12-2013 03:41 PM

Ah, missed the closing brace (fixed in edit).

georgi 10-12-2013 04:46 PM

Quote:

Originally Posted by ntubski (Post 5044558)
Untested:

Code:

awk -F\| '{for (i=1; i <= 5; i++) { if ($2 >= i) print > ("sitelist_PR" i ".txt") }}' sitelist

This did the job. Thank you.


All times are GMT -5.