LinuxQuestions.org
Old 02-29-2012, 02:52 AM   #1
kshabbir
LQ Newbie
 
Registered: Feb 2012
Posts: 3

Rep: Reputation: Disabled
sorting & finding unique in python logs


Hi,

I want to extract the paragraphs that start with ^Traceback, sort them, and find the unique ones. The problem is that the block endings can differ and the block lengths vary too.

Any help would be of great use.

====================================================
Traceback (most recent call last):
File "/home/shabbir/apps/django/core/handlers/base.py", line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/home/shabbir/apps/web/views/sbf_views.py", line 213, in search
ctxt = search_browser_filter(request)
File "/home/shabbir/apps/web/views/sbf_views.py", line 560, in search_browser_filter
score=True, sort=sort, sort_order=sort_order, operation='/spell', request=request, **params)
File "/home/shabbir/apps/utils/solrutils.py", line 103, in solr_search
response = s.query(q, fields, highlight, score, sort, sort_order, **params)
File "/home/shabbir/apps/solr/core.py", line 495, in query
request, self.form_headers)
File "/home/shabbir/apps/solr/core.py", line 746, in _post
return check_response_status(self.conn.getresponse())
File "/home/shabbir/apps/solr/core.py", line 994, in check_response_status
raise ex
SolrException: HTTP code=500, reason=Internal Server Error
Traceback (most recent call last):
File "/home/shabbir/apps/django/core/handlers/base.py", line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/home/shabbir/apps/django/views/decorators/cache.py", line 79, in _wrapped_view_func
response = view_func(request, *args, **kwargs)
File "/home/shabbir/apps/payments/views.py", line 64, in process_payment_hdfc
return payment_status(request,payment_attempt)
File "/home/shabbir/apps/payments/views.py", line 548, in payment_status
order.update_inventory(request, action='add', delta=delta_oi)
File "/home/shabbir/apps/orders/models.py", line 2355, in update_inventory
raise exp
InventoryError
====================================================
 
Old 03-06-2012, 07:50 PM   #2
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 735

Rep: Reputation: 76
Hi.

Welcome to the forum.

Best and quickest answers are provided when you post representative sample input and expected output between CODE and /CODE tags, the symbols being surrounded by [ ] -- see the guide in the signature below:
Code:
this is a code block
Telling us what you have tried so far helps too.

Best wishes ... cheers, makyo
 
Old 03-07-2012, 01:20 AM   #3
kshabbir
LQ Newbie
 
Registered: Feb 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
So far I have had a little success with the code below:

Code:
awk '/^Traceback/ && j {f="file" k; k++; for(i=0;i<j;i++) print a[i] > f; close(f); j=0} {a[j++]=$0} END {if(j){f="file" k; for(i=0;i<j;i++) print a[i] > f}}' k=1 filename
But the problem is that it creates one file for every paragraph it finds in the input. Can someone help me find all the unique paragraphs and sort them by number of occurrences?

The sample output would look like this:

Code:
=========================Below error appeared 1 time=======================
Traceback (most recent call last):
File "/home/shabbir/apps/django/core/handlers/base.py", line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/home/shabbir/apps/web/views/sbf_views.py", line 213, in search
ctxt = search_browser_filter(request)
File "/home/shabbir/apps/web/views/sbf_views.py", line 560, in search_browser_filter
score=True, sort=sort, sort_order=sort_order, operation='/spell', request=request, **params)
File "/home/shabbir/apps/utils/solrutils.py", line 103, in solr_search
response = s.query(q, fields, highlight, score, sort, sort_order, **params)
File "/home/shabbir/apps/solr/core.py", line 495, in query
request, self.form_headers)
File "/home/shabbir/apps/solr/core.py", line 746, in _post
return check_response_status(self.conn.getresponse())
File "/home/shabbir/apps/solr/core.py", line 994, in check_response_status
raise ex
SolrException: HTTP code=500, reason=Internal Server Error
=========================Below error appeared 1 time=======================
Traceback (most recent call last):
File "/home/shabbir/apps/django/core/handlers/base.py", line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/home/shabbir/apps/django/views/decorators/cache.py", line 79, in _wrapped_view_func
response = view_func(request, *args, **kwargs)
File "/home/shabbir/apps/payments/views.py", line 64, in process_payment_hdfc
return payment_status(request,payment_attempt)
File "/home/shabbir/apps/payments/views.py", line 548, in payment_status
order.update_inventory(request, action='add', delta=delta_oi)
File "/home/shabbir/apps/orders/models.py", line 2355, in update_inventory
raise exp
InventoryError

Thanks in Advance,
Shabbir
 
Old 03-07-2012, 08:56 AM   #4
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 735

Rep: Reputation: 76
Hi.

Thanks for reposting.

Comments:

1) I did not find your data sample to be representative, so I created a different one that seems more appropriate in that it contains several instances of both duplicate and unique blocks.

2) I'm glad you tried something, but I didn't understand what you would finally do if you were able to create all the files that your solution produced.

3) Among the *nix commands, uniq can eliminate duplicates. However, it works on single lines, whereas you have blocks of lines, and it only removes adjacent duplicates, so the input must be sorted first. That need is common enough that sort can do both at once (sort -u).

4) So one approach is to join each of your blocks into one long line, sort, and eliminate duplicates. That can be done with standard commands, as the example below shows. This may require a few extra files (some are only for illustration), but it does not need much memory and is very general. The steps are:
a) create the long lines (done with awk here), using some character to take the place of the embedded newlines,
b) sort, eliminate duplicates (sort -u),
c) expand blocks to original separated lines (tr).

5) Another approach is to write code that counts occurrences, using the entire contents of each block as a key. The awk and perl languages both have associative arrays (hashes) that make that easy. However, that approach is memory-intensive.
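
A minimal sketch of approach 5), assuming every block starts at a line matching ^Traceback and that all unique blocks fit in memory. The function name count_blocks_hash is made up for illustration, and the header line only imitates the format requested in post #3. It uses plain awk features; the most-frequent-first ordering is done with a simple repeated scan, which is fine for a modest number of unique blocks. Blocks with equal counts come out in no particular order.

```shell
#!/bin/sh
# Hypothetical helper: count identical traceback blocks with an awk hash.
# Usage: count_blocks_hash logfile
count_blocks_hash() {
  awk '
    # Record the finished block (if any) under its full text as the key.
    function save() { if (block != "") count[block]++; block = "" }

    /^Traceback/ { save(); block = $0; next }
    block != ""  { block = block "\n" $0 }   # lines before the first Traceback are ignored

    END {
      save()
      # Print blocks from most to least frequent.
      while (1) {
        best = ""; max = 0
        for (b in count) if (count[b] > max) { max = count[b]; best = b }
        if (max == 0) break
        printf "=====Below error appeared %d time%s=====\n%s\n", max, (max == 1 ? "" : "s"), best
        delete count[best]
      }
    }
  ' "$1"
}
```

Called as count_blocks_hash data1, this would print each unique block once under a header line giving its count.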

The script below is long because it shows the context, the intermediate results, and finally compares the result to the expected output. Concentrate on the inner part that obtains the solution.
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate obtaining unique instances of multi-line blocks of text.

# Section 1, setup, pre-solution, $Revision: 1.25 $".
# Infrastructure details, environment, debug commands for forum posts. 
# Uncomment export command to run script as external user.
# export PATH="/usr/local/bin:/usr/bin:/bin" HOME=""
set +o nounset
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
db() { : ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
C=$HOME/bin/context && [ -f $C ] && $C awk sort tr
set -o nounset
pe

FILE=${1-data1}
E="expected-output.txt"

# Display sample of data file, with head & tail as a last resort.
db " Section 1: display of input data."
pe " || start sample [ specimen first:middle:last ] $FILE"
specimen 10 $FILE $E  2>/dev/null
pe " || end"

# Section 2, solution.
db " Section 2: solution."
pl " Place all line in a block into a single long line:"
awk '
BEGIN	{ block = "" }
# /^Traceback/	{
/^Traceback \(most recent call last\):/	{
	if ( NR > 1 ) print block
	block = $0 ; next 
	}
	{ block = block "@" $0 }
END	{ print block }
' $FILE |
tee t1

# Sort the file, remove duplicates.
pl " Sort the long lines, leaving only unique lines:"
sort -u t1 |
tee t2

# Expand the blocks into separate lines.
pl " Separate the long lines into individual lines:"
tr '@' '\n' < t2 |
tee f1

# Section 3, post-solution, check results, clean-up, etc.
v1=$(wc -l <expected-output.txt)
v2=$(wc -l < f1)
pl " Comparison of $v2 created lines with $v1 lines of desired results:"
db " Section 3: validate generated calculations with desired results."

pl " Comparison with desired results:"
if [ ! -f expected-output.txt -o ! -s expected-output.txt ]
then
  pe " Comparison file \"expected-output.txt\" zero-length or missing."
  exit
fi
if cmp expected-output.txt f1
then
  pe " Succeeded -- files have same content."
else
  pe " Failed -- files not identical -- detailed comparison follows."
  if diff -b expected-output.txt f1
  then
    pe " Succeeded by ignoring whitespace differences."
  fi
fi

exit 0
producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
awk GNU Awk 3.1.5
sort (GNU coreutils) 6.10
tr (GNU coreutils) 6.10

 db,  Section 1: display of input data.
 || start sample [ specimen first:middle:last ] data1
Whole: 10:0:10 of 20 lines in file "data1"
Traceback (most recent call last):
Luci
commediene
extraordinaire
Traceback (most recent call last):
Desi
Traceback (most recent call last):
Fred
grouch
Traceback (most recent call last):
Luci
commediene
extraordinaire
Traceback (most recent call last):
Fred
grouch
Traceback (most recent call last):
Ethel
Traceback (most recent call last):
Little Ricki

Whole: 10:0:10 of 13 lines in file "expected-output.txt"
Traceback (most recent call last):
Desi
Traceback (most recent call last):
Ethel
Traceback (most recent call last):
Fred
grouch
Traceback (most recent call last):
Little Ricki
Traceback (most recent call last):
Luci
commediene
extraordinaire
 || end
 db,  Section 2: solution.

-----
 Place all line in a block into a single long line:
Traceback (most recent call last):@Luci@commediene@extraordinaire
Traceback (most recent call last):@Desi
Traceback (most recent call last):@Fred@grouch
Traceback (most recent call last):@Luci@commediene@extraordinaire
Traceback (most recent call last):@Fred@grouch
Traceback (most recent call last):@Ethel
Traceback (most recent call last):@Little Ricki

-----
 Sort the long lines, leaving only unique lines:
Traceback (most recent call last):@Desi
Traceback (most recent call last):@Ethel
Traceback (most recent call last):@Fred@grouch
Traceback (most recent call last):@Little Ricki
Traceback (most recent call last):@Luci@commediene@extraordinaire

-----
 Separate the long lines into individual lines:
Traceback (most recent call last):
Desi
Traceback (most recent call last):
Ethel
Traceback (most recent call last):
Fred
grouch
Traceback (most recent call last):
Little Ricki
Traceback (most recent call last):
Luci
commediene
extraordinaire

-----
 Comparison of 13 created lines with 13 lines of desired results:
 db,  Section 3: validate generated calculations with desired results.

-----
 Comparison with desired results:
 Succeeded -- files have same content.
Adapt as you need to for your data. See man pages for details.
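
One possible adaptation, since post #3 asked for occurrence counts: replace sort -u with sort | uniq -c | sort -rn, so that each unique long line is prefixed with its count and the most frequent blocks come first. This is only a sketch under a few assumptions: the function name count_blocks_sorted and the header text are invented for illustration, and the separator is the control character \001 rather than "@", because "@" can appear in real Python tracebacks (decorators, e-mail addresses) and would then be turned into a spurious line break.

```shell
#!/bin/sh
# Hypothetical helper: print each unique block once, most frequent first,
# preceded by a line giving its number of occurrences.
# Usage: count_blocks_sorted logfile
count_blocks_sorted() {
  awk '
    /^Traceback/ { if (block != "") print block; block = $0; next }
    block != ""  { block = block "\001" $0 }   # \001 stands in for the newline
    END          { if (block != "") print block }
  ' "$1" |
  LC_ALL=C sort | LC_ALL=C uniq -c | LC_ALL=C sort -rn |
  awk '{
    n = $1                         # count written by uniq -c
    sub(/^ *[0-9]+ /, "")          # strip that count from the line
    gsub("\001", "\n")             # restore the original line breaks
    printf "== %d occurrence(s) ==\n%s\n", n, $0
  }'
}
```

No intermediate files are needed here because the stages are connected by pipes; sort still spills to temporary files on its own if the input is large.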

Best wishes ... cheers, makyo
 
Old 03-08-2012, 02:05 AM   #5
kshabbir
LQ Newbie
 
Registered: Feb 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
makyo,

Thanks for the detailed explanation.
I will try your solution and get back to you.

Thanks again.

Regards,
Shabbir
 
  


All times are GMT -5.