LinuxQuestions.org
Old 10-11-2013, 08:56 AM   #1
georgi
LQ Newbie
 
Registered: Apr 2012
Location: Bulgaria
Distribution: openSuSE
Posts: 15

Rep: Reputation: Disabled
Combine several commands which write their output to different files


Hi all,

I have large files of URLs, each ending in "|<number>", where the number is the Page Rank, for example:

Code:
http://www.machinokairo.com/2012/05/post-39.html|2
I am using "grep" to filter the URLs in stages: first I remove all lines ending in "|0" and write the output to a file, then I remove all lines ending in "|1" from that result and write the output to a new file, and so on up to "|5". Each step removes one PR value and keeps the rest in a separate file. For now I use the following commands to do that:

Code:
grep --invert-match "|0" sitelist >> sitelist_PR1.txt
grep --invert-match "|1" sitelist_PR1.txt >> sitelist_PR2.txt
.
.
grep --invert-match "|5" sitelist_PR5.txt >> sitelist_PR6.txt
I would greatly appreciate it if someone could point out how to automate this process from the command line or with a bash script.
 
Old 10-11-2013, 09:00 AM   #2
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,633
Blog Entries: 4

Rep: Reputation: 3931
Just ">>" them to the same filename. ">>" means append, whereas ">" means replace.

Or, if for other reasons you want them to remain separate, run the commands then "cat file1 file2 file3 > outputfile"
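
To make the difference between the two operators concrete, here is a minimal sketch. The file name demo.txt and the throwaway directory are assumptions made up for this illustration; they do not come from the thread.

```shell
#!/bin/bash
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "first"  > demo.txt    # ">" creates the file, or empties it first if it exists
echo "second" > demo.txt    # truncates again, so only "second" remains
echo "third" >> demo.txt    # ">>" appends, so the file now holds two lines

cat demo.txt
```

One consequence for the commands in the first post: because they use ">>", running them a second time adds the same lines to the output files again instead of replacing them.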

Last edited by sundialsvcs; 10-11-2013 at 09:01 AM.
 
Old 10-11-2013, 09:05 AM   #3
georgi
LQ Newbie
 
Registered: Apr 2012
Location: Bulgaria
Distribution: openSuSE
Posts: 15

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by sundialsvcs
Just ">>" them to the same filename. ">>" means append, whereas ">" means replace.

Or, if for other reasons you want them to remain separate, run the commands then "cat file1 file2 file3 > outputfile"
Yes, I would like to keep the files separate and run the whole process with a single script or command line.
 
Old 10-11-2013, 09:13 AM   #4
TenTenths
Senior Member
 
Registered: Aug 2011
Location: Dublin
Distribution: Centos 5 / 6 / 7
Posts: 3,469

Rep: Reputation: 1553
Code:
#!/bin/bash

for PR in {0..9} ; do
 grep "|${PR}$" /path/to/input/sitelist > /path/to/output/sitelist_PR${PR}.txt
done
This loops through the numbers 0 to 9.

On each pass, grep looks for lines in the input file that end with a bar followed by that number, and writes them to the corresponding output file.
 
Old 10-11-2013, 01:05 PM   #5
georgi
LQ Newbie
 
Registered: Apr 2012
Location: Bulgaria
Distribution: openSuSE
Posts: 15

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by TenTenths
Code:
for PR in {0..9} ; do
This loops through the numbers 0 to 9.

On each pass, grep looks for lines in the input file that end with a bar followed by that number, and writes them to the corresponding output file.
Ten, your script might be the one I need, but when I first looked at it I noticed that you wrote PR inside the pattern. My text files have "|<number>", like in the example
Code:
|2
at the end. The number is the actual PR of that site, so I need to deal with the "|".

Last edited by georgi; 10-11-2013 at 01:33 PM.
 
Old 10-11-2013, 01:48 PM   #6
TenTenths
Senior Member
 
Registered: Aug 2011
Location: Dublin
Distribution: Centos 5 / 6 / 7
Posts: 3,469

Rep: Reputation: 1553
In my example, PR is a variable in the "for" loop; it is referred to in the main line as ${PR}. So the first time round, the grep command looks for "|0$", where $ tells grep to match at the end of the line. The next time through the loop it becomes "|1$", and so on.
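
A minimal sketch of a single pass of that loop, to show what the pattern expands to. The three example.com lines are hypothetical sample data in the "url|PR" format from the first post, and PR is fixed at 2 here instead of being set by the loop.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: ranks 0, 2 and 20.
printf '%s\n' \
  'http://example.com/a|0' \
  'http://example.com/b|2' \
  'http://example.com/c|20' > sitelist

PR=2
# "|${PR}$" expands to the pattern |2$ . In grep's default (basic) syntax
# the bar is an ordinary character, and the trailing $ anchors the match
# to the end of the line, so the line ending in |20 is not selected.
grep "|${PR}$" sitelist > "sitelist_PR${PR}.txt"

cat "sitelist_PR${PR}.txt"
```

Without the trailing $, the pattern "|2" would also select the line ending in "|20", which is one reason the anchored form is stricter than the commands in the first post.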
 
Old 10-11-2013, 01:56 PM   #7
georgi
LQ Newbie
 
Registered: Apr 2012
Location: Bulgaria
Distribution: openSuSE
Posts: 15

Original Poster
Rep: Reputation: Disabled
The script creates only empty files.

Last edited by georgi; 10-11-2013 at 02:12 PM.
 
Old 10-12-2013, 07:40 AM   #8
TenTenths
Senior Member
 
Registered: Aug 2011
Location: Dublin
Distribution: Centos 5 / 6 / 7
Posts: 3,469

Rep: Reputation: 1553
It worked fine for me when I created an input file in the format you gave.
 
Old 10-12-2013, 07:55 AM   #9
georgi
LQ Newbie
 
Registered: Apr 2012
Location: Bulgaria
Distribution: openSuSE
Posts: 15

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by TenTenths
It worked fine for me when I created an input file in the format you gave.
I still get empty files, although they are numbered correctly.
Anyway, thanks for your efforts and time.
 
Old 10-12-2013, 10:26 AM   #10
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Rep: Reputation: 2081
TenTenths's grep expression is a bit stricter than the original. I can think of a couple of cases where this could cause failure:

Does "sitelist" use DOS/Windows line endings (please post the output of "file sitelist")?

Can the page rank be more than one digit?
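
On the first question: a file saved with DOS/Windows line endings has a carriage return in front of every newline, so a pattern anchored with $ never sees the digit as the last character of the line. A minimal sketch of the effect and one way to clean the file follows; the names sitelist_dos and sitelist_unix and the sample URLs are made up for the illustration.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical two-line sample written with CRLF (\r\n) line endings.
printf 'http://example.com/a|0\r\nhttp://example.com/b|2\r\n' > sitelist_dos

# The carriage return sits between the digit and the newline, so the
# end-of-line anchor finds nothing. grep -c still prints the count (0),
# but exits non-zero, hence the "|| true".
before=$(grep -c '|0$' sitelist_dos || true)

# Delete every carriage return and write a cleaned copy.
tr -d '\r' < sitelist_dos > sitelist_unix
after=$(grep -c '|0$' sitelist_unix || true)

echo "matches before: $before, after: $after"
```

Where it is installed, the dos2unix utility does the same conversion in place.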
 
Old 10-12-2013, 01:21 PM   #11
georgi
LQ Newbie
 
Registered: Apr 2012
Location: Bulgaria
Distribution: openSuSE
Posts: 15

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by ntubski
TenTenths's grep expression is a bit stricter than the original. I can think of a couple of cases where this could cause failure:

Does "sitelist" use DOS/Windows line endings (please post the output of "file sitelist")?

Can the page rank be more than one digit?
The sitelist file was converted from Windows, so your point is excellent. I have now converted the line endings and the script works.
However, the script gives me output in a different form than I wanted: each output file contains only the URLs with the PR in its name.
For example, sitelist_PR1.txt contains only URLs with PR1.
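
For comparison, the layout described in the first post (each file holds everything that is left after the lower ranks have been removed) could be sketched by feeding each pass's output into the next pass. This is only a sketch under some assumptions: Unix line endings, single-digit ranks 0 to 5, the output names from the first post, and hypothetical example.com sample data.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input, one URL for each rank 0..3.
printf '%s\n' \
  'http://example.com/a|0' \
  'http://example.com/b|1' \
  'http://example.com/c|2' \
  'http://example.com/d|3' > sitelist

input=sitelist
for PR in {0..5}; do
  output="sitelist_PR$((PR + 1)).txt"
  # -v inverts the match: keep every line that does NOT end in |PR.
  # grep exits non-zero when no lines are left, which is not an error here.
  # ">" is used instead of ">>" so that a re-run starts from clean files.
  grep -v "|${PR}$" "$input" > "$output" || true
  input=$output   # the next pass filters the result of this pass
done

wc -l sitelist_PR*.txt
```

With this sample, sitelist_PR1.txt keeps the URLs ranked 1 to 3, sitelist_PR2.txt keeps those ranked 2 and 3, and the files from sitelist_PR4.txt onward come out empty.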

Last edited by georgi; 10-12-2013 at 01:38 PM.
 
Old 10-12-2013, 01:52 PM   #12
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Rep: Reputation: 2081
Untested:

Code:
awk -F\| '{for (i=1; i <= 5; i++) { if ($2 >= i) print > ("sitelist_PR" i ".txt") }}' sitelist

Last edited by ntubski; 10-12-2013 at 03:41 PM. Reason: missed closing brace
 
Old 10-12-2013, 01:58 PM   #13
georgi
LQ Newbie
 
Registered: Apr 2012
Location: Bulgaria
Distribution: openSuSE
Posts: 15

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by ntubski
Untested:

Code:
awk -F\| '{for (i=1; i <= 5; i++) { if ($2 >= i) print > ("sitelist_PR" i ".txt") }' sitelist
Code:
awk: cmd. line:1: {for (i=1; i <= 5; i++) { if ($2 >= i) print > ("sitelist_PR" i ".txt") }
awk: cmd. line:1:                                                                          ^ unexpected newline or end of string
 
Old 10-12-2013, 03:41 PM   #14
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Rep: Reputation: 2081
Ah, missed the closing brace (fixed in edit).
 
Old 10-12-2013, 04:46 PM   #15
georgi
LQ Newbie
 
Registered: Apr 2012
Location: Bulgaria
Distribution: openSuSE
Posts: 15

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by ntubski
Untested:

Code:
awk -F\| '{for (i=1; i <= 5; i++) { if ($2 >= i) print > ("sitelist_PR" i ".txt") }}' sitelist
This did the job. Thank you.
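
For readers less familiar with awk, here is the same one-liner spread over several lines with comments, run against a small sample. The example.com lines and the temporary directory are assumptions for the illustration; the awk program itself is unchanged apart from layout and quoting the separator as '|' instead of \|.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input with ranks 0, 1 and 3.
printf '%s\n' \
  'http://example.com/a|0' \
  'http://example.com/b|1' \
  'http://example.com/c|3' > sitelist

# -F'|' splits each line at the bar, so $1 is the URL and $2 is the rank.
awk -F'|' '{
  for (i = 1; i <= 5; i++) {
    # A line with rank N is written to every file PR1..PRN, so
    # sitelist_PRi.txt collects all URLs whose rank is at least i.
    if ($2 >= i) print > ("sitelist_PR" i ".txt")
  }
}' sitelist

wc -l sitelist_PR*.txt
```

Two things to keep in mind: awk only creates a file when at least one line qualifies for it, so with this sample there is no sitelist_PR4.txt or sitelist_PR5.txt; and if a URL could itself contain a bar, $2 would no longer be the rank, in which case $NF (the last field) would be the safer choice.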
 
  


All times are GMT -5.
