LinuxQuestions.org > Forums > Linux Forums > Linux - General
Old 01-04-2019, 08:59 AM   #1
sagardauti123
LQ Newbie
 
Registered: Jan 2019
Posts: 4

Rep: Reputation: Disabled
Question: working with a for loop / alternate ways welcome


I have 3 files named log.2018-12-10.gz, log.2018-12-11.gz, log.2018-12-13.gz.

These 3 files contain records in date/time format (the date matches the file name).

The aim is to get hour-wise record counts (hours 08 to 22).

I have used the command below on UNIX, and the output is:

Command:

for i in `ls -1 za.log.2018-12-1[0-3]*`; do zcat $i|grep -i abcd|cut -c 5-6|egrep "0[8-9]|1[0-9]|2[0-2]"|sort|uniq -c;done


Output:

473 08
765 09
957 10
1085 11
1220 12
1205 13
1143 14
1035 15
920 16
752 17
653 18
526 19
389 20
153 21
130 22
395 08
642 09
877 10
1055 11
1163 12
1130 13
935 14
986 15
929 16
724 17
578 18
537 19
317 20
169 21
119 22

Note: here the first column is the count and the second one is the hour.

I want the resulting output arranged column-wise as below:

473 8 395 8 462 8
765 9 642 9 704 9
957 10 877 10 906 10
1085 11 1055 11 953 11
1220 12 1163 12 1180 12
1205 13 1130 13 628 13
1143 14 935 14 645 14
1035 15 986 15 899 15
920 16 929 16 896 16
752 17 724 17 679 17
653 18 578 18 689 18
526 19 537 19 492 19
389 20 317 20 391 20
153 21 169 21 138 21
130 22 119 22 107 22


Workaround: I have created separate output files and then used the paste command, but I want to do it in a single command.

(There are more than 3 files.)

Last edited by sagardauti123; 01-04-2019 at 09:01 AM.
 
Old 01-04-2019, 09:20 AM   #2
berndbausch
LQ Addict
 
Registered: Nov 2013
Location: Tokyo
Distribution: Mostly Ubuntu and Centos
Posts: 6,316

Rep: Reputation: 2002
Quote:
Originally Posted by sagardauti123 View Post
I have 3 files named- log.2018-12-10.gz,log.2018-12-11.gz,log.2018-12-13.gz.

These 3 files contains records in date/time format. (date is according to file name).

Aim is to sum hourwise (08 AM to 22 PM) of total records.

I have used below command in UNIX and output is as-

Command-

for i in `ls -1 za.log.2018-12-1[0-3]*`; do zcat $i|grep -i abcd|cut -c 5-6|egrep "0[8-9]|1[0-9]|2[0-2]"|sort|uniq -c;done
Not sure what your question is. Is the output incorrect? Do you want a better way?

It's hard to say anything without knowing the input and the expected output, but the script can be improved as follows:
  • Why do you use ls to list the files? Just say for i in za.*.
  • You most likely have a command named zgrep that allows you to remove the zcat.
  • sort has an option -u. No need to pipe the output to uniq.
For more feedback, provide the data. And please, add code tags (see below) to commands, input and output. Much easier to read.
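As a sketch, the first two suggestions combined might look like the following (assuming zgrep is available; the cut range is kept from the original command and must match wherever the hour actually sits in the logs; sort | uniq -c is kept because the per-hour counts are needed):

```shell
# A sketch combining the suggestions above (assumes zgrep is installed and
# that the hour occupies characters 5-6, as in the original command):
for i in za.log.2018-12-1[0-3]*; do       # glob instead of `ls`
    # zgrep replaces zcat | grep; grep -E replaces the deprecated egrep;
    # sort | uniq -c stays, because the per-hour counts are needed
    zgrep -i abcd "$i" | cut -c 5-6 | grep -E "0[8-9]|1[0-9]|2[0-2]" | sort | uniq -c
done
```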
 
Old 01-04-2019, 09:26 AM   #3
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,945

Rep: Reputation: 7325
Yes, we need to know more (an example input file would probably be better).
It looks like a single awk/perl/python script can do this (and will be faster).
 
Old 01-04-2019, 09:45 AM   #4
sagardauti123
LQ Newbie
 
Registered: Jan 2019
Posts: 4

Original Poster
Rep: Reputation: Disabled
Lightbulb

Just adding a response to the points above.

I have used ls because there are other files in my directory, and I used sort | uniq -c to get hour-wise counts; sort -u will not work here.

Just adding more clarity to my question:
I want to work on structured output.
As you can see, my output is in a one-column format (count, hour).
I have shown the required output above.

There are other records/entries in my file, so I have used the cut command to extract the hour field (hh).

Last edited by sagardauti123; 01-04-2019 at 09:47 AM.
 
Old 01-04-2019, 01:11 PM   #5
berndbausch
LQ Addict
 
Registered: Nov 2013
Location: Tokyo
Distribution: Mostly Ubuntu and Centos
Posts: 6,316

Rep: Reputation: 2002
If the files whose names start with za are regular files, and their names contain no funny characters, then this:
Code:
for i in $(ls -1 za*)
is the same as
Code:
for i in za*
If the filenames contain blanks or other characters that the shell interprets as separators, the “ls” solution will not produce the desired result, but the second solution might.

I now understand what you want: Sum of all records for hour 8 for file 1, then file 2, then file 3 in the first line. Then the same for hour 9 in the second line, and so on.

The problem is that the pipeline inside the for loop first produces output for file 1, then file 2, then file 3 sequentially, so it can’t create the columns you want. Instead, you need to collect the data for all files, then display the count when you have reached the last line in file 3. This is not that easy, and I would say awk is the right tool for it, as pan64 hinted.

To do this, we still need to know the input format, at least the date/time format in the input.

Last edited by berndbausch; 01-04-2019 at 01:19 PM.
 
Old 01-04-2019, 09:09 PM   #6
sagardauti123
LQ Newbie
 
Registered: Jan 2019
Posts: 4

Original Poster
Rep: Reputation: Disabled
OK, below is sample file record data, where the hour field starts at the 12th character (in my command I used cut -c 5-6 for temporary data; here it would be cut -c 12-13).

Sample File-
log.2018-12-10.gz :
10-12-2018 00:01:15 abcd ......
10-12-2018 03:12:17 abcd ......
.
.
.
10-12-2018 08:16:14 abcd .....
10-12-2018 10:12:01 abcd .....
.
.
.
10-12-2018 22:12:12 abcd .....
10-12-2018 23:01:01 abcd .....

The same goes for the other files; you can say there are day-wise files for every month.
We require the sum of records for hours 08-22 for specific files, hence I filtered with grep and a wildcard.

My command's output is one column of hour-wise sums for all files, and I need file1's output in the 1st column, file2's in the 2nd, file3's in the 3rd, and so on.

Actual output format-

File1
(Sum hour)
1234 08
1232 09

File2
1243 08
1263 22

File3
5423 08
3456 12
3453 22

Expected output:
File1 in the 1st column,
File2 in the 2nd column,
File3 in the 3rd column.
 
Old 01-04-2019, 10:15 PM   #7
berndbausch
LQ Addict
 
Registered: Nov 2013
Location: Tokyo
Distribution: Mostly Ubuntu and Centos
Posts: 6,316

Rep: Reputation: 2002
How well do you know awk?

This is what I came up with, running the following awk program with three file arguments (log.*):
Code:
$ awk '{count[substr($2,1,2)] += 1 }
ENDFILE { for (c in count) result[c] = result[c] " " count[c] " " c; delete count }
END { for (r in result) print result[r] }' log.*
 1 08 1 08 1 08
 2 00 3 00 2 00
 2 10 2 10 2 10
 1 03 1 03 1 03
 1 22 1 22 1 22
 1 23 1 23 2 23
The program is based on one of awk's most powerful features, associative arrays.

The first line collects the lines for each hour.
The hour is used as the index of the array count; to get the hour, I use the substr() function to peel the first two characters off the second field ($2) in each line.

Second line: ENDFILE only exists in the GNU version of awk, which is normally the version in Linux distros. It may or may not work on BSD or other UNIXes. ENDFILE signals that the end of a file has been reached. At this point, I go through the count array and add the result to another array named result. c is an hour, count[c] is the number of times the hour occurred in a file.
After that, I delete the count array so that I can start from scratch with a new file.

Third line: END signals the overall end of input. At that point, I dump out the result array. Unfortunately, associative arrays are not sorted in any way.

Exercises: Correct sorting (I'd pipe the output into the sort command), and labeling the columns with the file names (needs to be added to the awk program I think).

I warmly recommend the awk guide referenced in my signature.

EDIT: I wrote the program based on your comment in the original post:
Quote:
I want the resulting output in column wise as below-

473 8 395 8 462 8
765 9 642 9 704 9
957 10 877 10 906 10
which is not what you say in post 6.

Last edited by berndbausch; 01-04-2019 at 10:17 PM.
 
Old 01-05-2019, 02:31 AM   #8
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,806

Rep: Reputation: 1207
Postprocess your loop with awk (or perl, or bash 4 builtins), which provide associative arrays. The array is indexed by column 2, and each element collects the string of counts; the array is printed at the end.
Code:
for i in za.log.2018-12-1[0-3]*.gz; do zcat "$i"|grep -i abcd|cut -c 5-6|egrep "0[8-9]|1[0-9]|2[0-2]"|sort|uniq -c;done | awk '{A[$2]=(A[$2] " " $1)} END {for (i in A) print i,A[i]}'
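To see what the appended awk stage does in isolation, it can be run on a fixed two-column sample (count hour), abbreviated here from the output in the original post; sort -n fixes the line order, since for-in iteration order is unspecified in awk:

```shell
# Demonstration of the awk collection stage alone, on an abbreviated
# "count hour" sample (first two hours of two files):
printf '473 08\n765 09\n395 08\n642 09\n' |
    awk '{ A[$2] = A[$2] " " $1 }            # append each count to its hour
         END { for (i in A) print i, A[i] }' |
    sort -n
```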
 
Old 01-05-2019, 03:59 AM   #9
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,945

Rep: Reputation: 7325
I do not really know if we need that for loop.
Also zcat and grep can be combined, so:
Code:
zgrep -i abcd za.log.2018-12-1[0-3]*.gz # should work; remember, with multiple files grep adds the filename to the output
Also, the two greps can probably be combined:
Code:
zgrep -Ei "201[89] (0[8-9]|1[0-9]|2[0-2]):.*abcd" za.log.2018-12-1[0-3]*.gz # or something similar; even cut can be eliminated this way
And now sort and uniq are completely superfluous, because awk can sum up what you need without them;
you just need to use associative arrays, as mentioned:
Code:
awk ' BEGIN { FS="[: ]" }   # set field separator to something convenient
                            # now $1 is the filename and $3 is the hour (if I didn't miss something)
      { A[$3][$1]++ }       # this is the sort/uniq in one (arrays of arrays: gawk 4+ only)
      END {
        for (hour in A) {
           printf "%s:", hour
           for (file in A[hour]) {
               printf " %s", A[hour][file]
           }
           printf "\n"
        }
     }                     # print the required result (something like this)
This double loop on the array A is quite similar to the MULTI used here: https://stackoverflow.com/questions/...nsional-arrays, see the last comment.
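As a hedged aside: the A[$3][$1] construct above needs gawk 4 or later. A portable sketch for any POSIX awk uses a compound (hour, file) key instead; the zgrep invocation and file names are assumed from the earlier posts, and the input is expected to be zgrep's multi-file output, i.e. lines of the form "filename:dd-mm-yyyy hh:mm:ss ...":

```shell
# Portable sketch: a compound (hour, file) key works in any POSIX awk,
# unlike the gawk-4-only A[$3][$1].
zgrep -i abcd za.log.2018-12-1[0-3]*.gz |
    awk 'BEGIN { FS = "[: ]" }              # $1 = filename, $3 = hour
         { A[$3, $1]++ }                    # count per (hour, file) pair
         END { for (k in A) { split(k, p, SUBSEP)
                              print p[1], p[2], A[k] } }' |
    sort -n
```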
 
Old 01-08-2019, 03:30 AM   #10
sagardauti123
LQ Newbie
 
Registered: Jan 2019
Posts: 4

Original Poster
Rep: Reputation: Disabled
Trying to simplify my original post.

I need hour-wise record counts, one column per file (the counts for each new file go in the next column), as shown below. The hour starts at the 12th character position.


My file has below data-

File1
2019-01-04 00:00:19
2019-01-04 00:00:19
2019-01-04 00:00:19
2019-01-04 01:07:38
2019-01-04 01:07:38
2019-01-04 01:07:38
2019-01-04 08:00:39
2019-01-04 08:02:27

File2

2019-01-04 00:00:19
2019-01-04 01:00:19
2019-01-04 02:00:19
2019-01-04 02:07:38
2019-01-04 02:07:38
2019-01-04 10:07:38
2019-01-04 10:00:39
2019-01-04 13:02:27

File3

2019-01-04 08:00:19
2019-01-04 09:00:19
2019-01-04 09:00:19
2019-01-04 10:07:38
2019-01-04 12:07:38
2019-01-04 12:07:38
2019-01-04 19:00:39
2019-01-04 19:02:27


Output

0 3 0 1 8 1
1 3 1 1 9 2
8 2 2 3 10 1
10 2 12 2
13 1 19 2


Here each file's hour/count pairs form one column: File1's in the 1st, File2's in the 2nd, File3's in the 3rd (the original post colour-coded them: red for File1, blue for File2, black for File3).
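A hedged sketch for this simplified example, as a single command (assumes bash for process substitution; count_hours is a hypothetical helper name, not from the thread):

```shell
# count_hours is a hypothetical helper: take the hour from characters 12-13,
# reorder uniq -c output to "hour count", and let $2+0 strip the leading zero.
count_hours() { cut -c 12-13 "$1" | sort | uniq -c | awk '{ print $2+0, $1 }'; }

# paste with process substitution puts each file's pairs in its own column:
paste <(count_hours File1) <(count_hours File2) <(count_hours File3)
```

With uneven hour lists per file, paste simply leaves the exhausted columns empty, which matches the ragged expected output above.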

Last edited by sagardauti123; 01-08-2019 at 03:39 AM.
 
Old 01-08-2019, 03:40 AM   #11
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,945

Rep: Reputation: 7325
Did you try any of the posted scripts?
 
  

