LinuxQuestions.org

-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   File manipulation? (https://www.linuxquestions.org/questions/linux-newbie-8/file-manipulation-807331/)

hadimotamedi 05-12-2010 04:33 AM

File manipulation?
 
1 Attachment(s)
Dear All
From my previous posts, I learned how to modify my text file to filter out undesired records. Now I need to know how to find the number of occurrences of distinct records in my text file. Please find my text file attached. Can you please show me what Linux can do for this kind of file manipulation? Actually, I want to find the number of occurrences of distinct CallIds in my logfile.

PMP 05-12-2010 04:46 AM

From what I understood:

This will count the number of unique CallIds:
Code:

cut -d" " -f2 <log_filename>  | sort -u | wc -l

grail 05-12-2010 05:01 AM

Code:

awk 'END{print NR}' logfile4.txt

PMP 05-12-2010 05:04 AM

@ grail

This will not give you the unique count. As I understood it, the OP wants the count of unique CallIds.
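To see PMP's point on a small scale: `awk 'END{print NR}'` prints the total number of lines, while the `sort -u | wc -l` pipeline counts distinct values of field 2. A minimal sketch on a hypothetical four-line sample (not the OP's actual attachment):

```shell
#!/bin/sh
# Hypothetical sample log: four records, but only two distinct CallId values.
log=$(mktemp)
printf '%s\n' \
  'CallId 400 State TK' \
  'CallId 3 State TK' \
  'CallId 3 State TK' \
  'CallId 3 State TK' > "$log"

# NR is the number of the current record; in END it equals the total line count.
total=$(awk 'END { print NR }' "$log")

# Distinct values of field 2: sort -u drops duplicates, wc -l counts what is left.
distinct=$(awk '{ print $2 }' "$log" | sort -u | wc -l)

echo "total=$total distinct=$distinct"
rm -f "$log"
```

With GNU tools this prints `total=4 distinct=2`.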

hadimotamedi 05-12-2010 05:52 AM

I am getting two different results from the following two commands:
#awk '{print $2}' logfile4 | sort -u | wc -l
#cut -d " " -f2 logfile4 | sort -u | wc -l
In your opinion, which one is correct?

PMP 05-12-2010 06:17 AM

What is the difference?

__raHulk 05-12-2010 06:22 AM

The correct answer is the one with awk,
i.e. "awk '{print $2}' logfile4 | sort -u | wc -l"

The way cut is used above fails on lines like the ones below, where the second field starts after two blank spaces instead of one. awk, by contrast, treats consecutive blank spaces as a single separator when splitting the columns.

CallId 400 State TK bt 2 bt 0 Tr (2 0x0d) E (3 0 1) Tr (0 0 2)
CallId 3 State TK bt 7 bt 2 Tr (13 0x0f) E (4 1 11) Tr (0 2 0)
CallId 3 State TK bt 7 bt 2 Tr (13 0x0f) E (4 1 11) Tr (0 2 0)
CallId 3 State TK bt 7 bt 2 Tr (13 0x0f) E (4 1 11) Tr (0 2 0)
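Because the forum post swallowed the extra spaces, here is a sketch that reproduces the kind of discrepancy the OP saw. The sample lines are made up; the last two deliberately have two spaces after `CallId`:

```shell
#!/bin/sh
# Hypothetical sample: the last two lines have TWO spaces after "CallId".
log=$(mktemp)
printf '%s\n' \
  'CallId 3 State TK' \
  'CallId 7 State TK' \
  'CallId  400 State TK' \
  'CallId  401 State TK' > "$log"

# cut splits on every single space, so "CallId  400" yields an empty field 2,
# and both double-spaced lines collapse into the same empty value.
cut_count=$(cut -d ' ' -f2 "$log" | sort -u | wc -l)

# awk's default field splitting treats a run of blanks as one separator.
awk_count=$(awk '{ print $2 }' "$log" | sort -u | wc -l)

echo "cut=$cut_count awk=$awk_count"
rm -f "$log"
```

With GNU coreutils this prints `cut=3 awk=4`: cut sees the values 3, 7 and an empty string, so it undercounts.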

catkin 05-12-2010 06:24 AM

Quote:

Originally Posted by hadimotamedi (Post 3965595)
I am getting two different results from the following two commands:
#awk '{print $2}' logfile4 | sort -u | wc -l
#cut -d " " -f2 logfile4 | sort -u | wc -l
In your opinion, which one is correct?

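The awk one. By default awk treats any run of spaces (or tabs) as a single field separator. cut treats each single space as a field delimiter, so where two spaces sit next to each other it sees an empty field between them. That means awk always gets the CallId value, while cut sometimes gets an empty string. Here's an illustration: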
Code:

c@CW8:~$ echo 'CallId  9  State TK' | cut -d " " -f2

c@CW8:~$ echo 'CallId 9  State TK' | cut -d " " -f2
9
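The difference comes from awk's default field separator. As a sketch (the `-F '[ ]'` variant is added here for illustration, it was not posted in the thread), forcing awk to split on exactly one space makes it behave like `cut -d " "`, and a line with leading blanks shows one more case where the two differ:

```shell
#!/bin/sh
line='CallId  9  State TK'

# Default awk field splitting: runs of blanks count as one separator,
# and leading/trailing blanks are ignored.
default_fs=$(printf '%s\n' "$line" | awk '{ print $2 }')

# A bracket expression makes FS match exactly one space, which reproduces
# the cut -d " " behaviour: the double space yields an empty second field.
single_fs=$(printf '%s\n' "$line" | awk -F '[ ]' '{ print $2 }')

# Leading blanks: awk still returns 9 as field 2 ...
lead_awk=$(printf '%s\n' '  CallId 9' | awk '{ print $2 }')
# ... while cut counts the empty text before each leading space as a field.
lead_cut=$(printf '%s\n' '  CallId 9' | cut -d ' ' -f2)

printf 'default=[%s] single=[%s] lead_awk=[%s] lead_cut=[%s]\n' \
  "$default_fs" "$single_fs" "$lead_awk" "$lead_cut"
```

This prints `default=[9] single=[] lead_awk=[9] lead_cut=[]`.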


__raHulk 05-12-2010 06:25 AM

Oops...
The extra space is not visible in the post, although it is present in the file.
Just open the file in vi, go to line 1393, and you will be able to see it.

hadimotamedi 05-12-2010 06:59 AM

Thank you very much. So I will base my calculations on the awk output.

grail 05-12-2010 07:14 AM

Thanks PMP, missed that bit :( This should do:
Code:

awk '!_[$2]++{uniq++}END{print uniq}' logfile4.txt
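grail's one-liner avoids `sort` entirely by remembering each CallId in an array (named `_` there, renamed `seen` below for readability). A commented sketch of the same idiom on a hypothetical sample, with `uniq + 0` added so that empty input prints `0` instead of a blank line:

```shell
#!/bin/sh
# Hypothetical three-line sample with two distinct CallId values.
log=$(mktemp)
printf '%s\n' \
  'CallId 400 State TK' \
  'CallId 3 State TK' \
  'CallId 3 State TK' > "$log"

count=$(awk '
  # seen[$2]++ yields the previous count for this CallId (0 the first time),
  # so !seen[$2]++ is true only on the first occurrence of each value.
  !seen[$2]++ { uniq++ }
  # uniq + 0 forces a number, so no input prints 0 rather than an empty line.
  END { print uniq + 0 }
' "$log")

# The same program on empty input.
empty=$(awk '!seen[$2]++ { uniq++ } END { print uniq + 0 }' /dev/null)

echo "distinct=$count empty=$empty"
rm -f "$log"
```

This prints `distinct=2 empty=0`. Unlike the `sort -u` pipelines, it reads the file once and keeps only the distinct values in memory.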

MTK358 05-12-2010 07:56 AM

I'm confused about what the OP wants.

schneidz 05-12-2010 02:31 PM

Code:

awk '{print $2}' logfile.txt | sort | uniq | wc -l
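MTK358's confusion is understandable: "the number of occurrences of distinct records" could also mean how many times each CallId appears, rather than how many distinct CallIds there are. If that is what the OP wants, a sketch along the same lines (again on a made-up sample) uses `uniq -c`:

```shell
#!/bin/sh
# Hypothetical sample log (not the real logfile4).
log=$(mktemp)
printf '%s\n' \
  'CallId 400 State TK' \
  'CallId 3 State TK' \
  'CallId 3 State TK' \
  'CallId 3 State TK' > "$log"

# uniq -c prefixes each distinct CallId with the number of times it occurred;
# uniq only collapses adjacent duplicates, hence the first sort.
# The final sort -rn lists the most frequent CallId first.
per_id=$(awk '{ print $2 }' "$log" | sort | uniq -c | sort -rn)

printf '%s\n' "$per_id"
rm -f "$log"
```

Each output line is a count followed by a CallId, here 3 for CallId 3 and 1 for CallId 400.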

MTK358 05-12-2010 02:50 PM

Code:

tr --squeeze-repeats ' ' < logfile4 | cut --delimiter=' ' --fields=2 | sort --unique | wc --lines


All times are GMT -5.