LinuxQuestions.org
Old 05-12-2010, 05:33 AM   #1
hadimotamedi
Member
 
Registered: Aug 2009
Posts: 228

Rep: Reputation: 30
File manipulation?


Dear All
From my previous posts, I learned how to modify my text file to filter out undesired records. Now I need to know how to find the number of occurrences of distinct records in my text file. Please find my text file attached. Can you please show me the power of Linux for file manipulation? Specifically, I want to find the number of occurrences of distinct CallIds in my logfile.
Attached Files
File Type: txt logfile4.txt (152.1 KB, 14 views)
 
Old 05-12-2010, 05:46 AM   #2
PMP
Member
 
Registered: Apr 2009
Location: ~
Distribution: RHEL, Fedora
Posts: 381

Rep: Reputation: 58
From what I understood:

This will count the number of unique CallIds:
Code:
cut -d" " -f2 <log_filename>  | sort -u | wc -l
 
Old 05-12-2010, 06:01 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,552

Rep: Reputation: 2898
Code:
awk 'END{print NR}' logfile4.txt
 
Old 05-12-2010, 06:04 AM   #4
PMP
Member
 
Registered: Apr 2009
Location: ~
Distribution: RHEL, Fedora
Posts: 381

Rep: Reputation: 58
@ grail

This will not give you the unique count. As I understood it, the OP wants the count of unique CallIds.
 
Old 05-12-2010, 06:52 AM   #5
hadimotamedi
Member
 
Registered: Aug 2009
Posts: 228

Original Poster
Rep: Reputation: 30
I am getting two different results from the following two commands:
#awk '{print $2}' logfile4 | sort -u | wc -l
#cut -d " " -f2 logfile4 | sort -u | wc -l
In your opinion, which one is correct?
 
Old 05-12-2010, 07:17 AM   #6
PMP
Member
 
Registered: Apr 2009
Location: ~
Distribution: RHEL, Fedora
Posts: 381

Rep: Reputation: 58
What is the difference?
 
Old 05-12-2010, 07:22 AM   #7
__raHulk
Member
 
Registered: Apr 2010
Location: Mumbai
Distribution: RHEL, Debian, Fedora, Ubuntu
Posts: 39
Blog Entries: 1

Rep: Reputation: 16
The correct answer is the one with awk,
i.e. ~~>"awk '{print $2}' logfile4 | sort -u | wc -l"<~~

The way cut is used above fails for lines like the ones below, where the second field starts two blank spaces after the first field ends. cut treats each single space as a delimiter and so returns an empty second field, whereas awk treats a run of consecutive blanks as one separator.

CallId 400 State TK bt 2 bt 0 Tr (2 0x0d) E (3 0 1) Tr (0 0 2)
CallId 3 State TK bt 7 bt 2 Tr (13 0x0f) E (4 1 11) Tr (0 2 0)
CallId 3 State TK bt 7 bt 2 Tr (13 0x0f) E (4 1 11) Tr (0 2 0)
CallId 3 State TK bt 7 bt 2 Tr (13 0x0f) E (4 1 11) Tr (0 2 0)
 
Old 05-12-2010, 07:24 AM   #8
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195
Quote:
Originally Posted by hadimotamedi View Post
I am getting two different results from the following two commands:
#awk '{print $2}' logfile4 | sort -u | wc -l
#cut -d " " -f2 logfile4 | sort -u | wc -l
In your opinion, which one is correct?
The awk one. awk treats any run of spaces as a single field separator. cut takes each single space as a field delimiter, so awk always gets the CallId value and cut sometimes gets an empty field. Here's an illustration:
Code:
c@CW8:~$ echo 'CallId  9  State TK' | cut -d " " -f2

c@CW8:~$ echo 'CallId 9  State TK' | cut -d " " -f2
9

Last edited by catkin; 05-12-2010 at 08:28 AM. Reason: Speeling
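
To see how the two pipelines from post #5 can disagree on a whole file, here is a minimal sketch. It assumes a tiny made-up sample in place of logfile4.txt; the three sample lines and the variable names are illustrative only, not taken from the real log.

```shell
# Hypothetical sample data: the second line has two spaces after "CallId".
sample=$(mktemp)
printf '%s\n' \
  'CallId 3 State TK' \
  'CallId  3 State TK' \
  'CallId 400 State TK' > "$sample"

# cut: every single space is a delimiter, so line 2 yields an empty field 2,
# and that empty string is counted as one more "distinct" value.
cut_count=$(cut -d ' ' -f2 "$sample" | sort -u | wc -l)

# awk: a run of blanks is one separator, so field 2 is always the CallId.
awk_count=$(awk '{print $2}' "$sample" | sort -u | wc -l)

echo "cut: $cut_count  awk: $awk_count"
rm -f "$sample"
```

On this sample cut reports one value more than awk, because the empty field is counted as a distinct value alongside 3 and 400.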
 
Old 05-12-2010, 07:25 AM   #9
__raHulk
Member
 
Registered: Apr 2010
Location: Mumbai
Distribution: RHEL, Debian, Fedora, Ubuntu
Posts: 39
Blog Entries: 1

Rep: Reputation: 16
Oops...
The extra space is not visible in the post, although it is present in the file.
Just open the file in vi and go to line number 1393 and you will be able to see it.
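
As an alternative to paging through the file in vi, grep can list the lines where "CallId" is followed by two spaces. This is only a sketch on made-up sample data (the file contents and the variable name double_spaced are assumptions); on the real file you would pass logfile4.txt instead of the temporary file.

```shell
# Hypothetical sample data: only the second line has the double space.
sample=$(mktemp)
printf '%s\n' \
  'CallId 3 State TK' \
  'CallId  400 State TK' \
  'CallId 7 State TK' > "$sample"

# -n prefixes each matching line with its line number in the file.
double_spaced=$(grep -n 'CallId  ' "$sample")

echo "$double_spaced"
rm -f "$sample"
```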
 
Old 05-12-2010, 07:59 AM   #10
hadimotamedi
Member
 
Registered: Aug 2009
Posts: 228

Original Poster
Rep: Reputation: 30
Thank you very much. So I will base my calculations on the result of 'awk' output.
 
Old 05-12-2010, 08:14 AM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,552

Rep: Reputation: 2898
Thanks PMP, missed that bit. This should do:
Code:
awk '!_[$2]++{uniq++}END{print uniq}' logfile4.txt
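
For anyone puzzled by the one-liner, here is the same idea spelled out. It is a sketch on made-up sample data, and the array name seen, the counter n and the variable uniq_count are my own naming, not part of the original command.

```shell
# Hypothetical sample data: two distinct CallIds (3 and 400), one line double-spaced.
sample=$(mktemp)
printf '%s\n' \
  'CallId 3 State TK' \
  'CallId  3 State TK' \
  'CallId 400 State TK' > "$sample"

uniq_count=$(awk '
  # seen[$2]++ evaluates to 0 the first time a CallId appears and to a
  # positive number afterwards, so the negated test is true once per CallId.
  !seen[$2]++ { n++ }
  # Adding 0 prints 0 instead of an empty line when the file has no lines.
  END { print n + 0 }
' "$sample")

echo "$uniq_count"
rm -f "$sample"
```

Because awk does the field splitting, the double-spaced line is handled correctly, and no sort is needed since the array remembers which CallIds were already seen.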
 
Old 05-12-2010, 08:56 AM   #12
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 721
I'm confused about what the OP wants.
 
Old 05-12-2010, 03:31 PM   #13
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 5,150

Rep: Reputation: 887
Code:
awk '{print $2}' logfile.txt | sort | uniq | wc -l
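
If the question was instead how many times each CallId occurs (the original post can be read either way), replacing the final wc -l with uniq -c gives a count per CallId. A sketch on made-up sample data follows; the trailing awk only swaps the columns to "CallId count", and the variable name per_id is an assumption.

```shell
# Hypothetical sample data: CallId 3 occurs twice, CallId 400 once.
sample=$(mktemp)
printf '%s\n' \
  'CallId 3 State TK' \
  'CallId  3 State TK' \
  'CallId 400 State TK' > "$sample"

# sort groups equal CallIds together, uniq -c prefixes each with its count,
# and the last awk prints "CallId count" without the padding uniq -c adds.
per_id=$(awk '{print $2}' "$sample" | sort | uniq -c | awk '{print $2, $1}')

echo "$per_id"
rm -f "$sample"
```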
 
Old 05-12-2010, 03:50 PM   #14
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 721
Code:
cut --delimiter=' ' --fields=2 logfile4.txt | sort --unique | wc --lines
 
  


All times are GMT -5.