I want to extract the paragraphs that start with ^Traceback, sort them, and keep only the unique ones. The problem is that the blocks can end in different ways and also differ in length.
Any help would be much appreciated.
====================================================
Traceback (most recent call last):
File "/home/shabbir/apps/django/core/handlers/base.py", line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/home/shabbir/apps/web/views/sbf_views.py", line 213, in search
ctxt = search_browser_filter(request)
File "/home/shabbir/apps/web/views/sbf_views.py", line 560, in search_browser_filter
score=True, sort=sort, sort_order=sort_order, operation='/spell', request=request, **params)
File "/home/shabbir/apps/utils/solrutils.py", line 103, in solr_search
response = s.query(q, fields, highlight, score, sort, sort_order, **params)
File "/home/shabbir/apps/solr/core.py", line 495, in query
request, self.form_headers)
File "/home/shabbir/apps/solr/core.py", line 746, in _post
return check_response_status(self.conn.getresponse())
File "/home/shabbir/apps/solr/core.py", line 994, in check_response_status
raise ex
SolrException: HTTP code=500, reason=Internal Server Error
Traceback (most recent call last):
File "/home/shabbir/apps/django/core/handlers/base.py", line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/home/shabbir/apps/django/views/decorators/cache.py", line 79, in _wrapped_view_func
response = view_func(request, *args, **kwargs)
File "/home/shabbir/apps/payments/views.py", line 64, in process_payment_hdfc
return payment_status(request,payment_attempt)
File "/home/shabbir/apps/payments/views.py", line 548, in payment_status
order.update_inventory(request, action='add', delta=delta_oi)
File "/home/shabbir/apps/orders/models.py", line 2355, in update_inventory
raise exp
InventoryError
====================================================
Best and quickest answers are provided when you post representative sample input and expected output between CODE and /CODE tags, the symbols being surrounded by [ ] -- see the guide in the signature below:
But the problem is that it creates as many files as there are matching paragraphs in the file. Can someone help me find all the unique paragraphs and sort them by number of occurrences?
The sample output would be like below:
Code:
=========================Below error appeared 1 time=======================
Traceback (most recent call last):
File "/home/shabbir/apps/django/core/handlers/base.py", line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/home/shabbir/apps/web/views/sbf_views.py", line 213, in search
ctxt = search_browser_filter(request)
File "/home/shabbir/apps/web/views/sbf_views.py", line 560, in search_browser_filter
score=True, sort=sort, sort_order=sort_order, operation='/spell', request=request, **params)
File "/home/shabbir/apps/utils/solrutils.py", line 103, in solr_search
response = s.query(q, fields, highlight, score, sort, sort_order, **params)
File "/home/shabbir/apps/solr/core.py", line 495, in query
request, self.form_headers)
File "/home/shabbir/apps/solr/core.py", line 746, in _post
return check_response_status(self.conn.getresponse())
File "/home/shabbir/apps/solr/core.py", line 994, in check_response_status
raise ex
SolrException: HTTP code=500, reason=Internal Server Error
=========================Below error appeared 1 time=======================
Traceback (most recent call last):
File "/home/shabbir/apps/django/core/handlers/base.py", line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/home/shabbir/apps/django/views/decorators/cache.py", line 79, in _wrapped_view_func
response = view_func(request, *args, **kwargs)
File "/home/shabbir/apps/payments/views.py", line 64, in process_payment_hdfc
return payment_status(request,payment_attempt)
File "/home/shabbir/apps/payments/views.py", line 548, in payment_status
order.update_inventory(request, action='add', delta=delta_oi)
File "/home/shabbir/apps/orders/models.py", line 2355, in update_inventory
raise exp
InventoryError
1) I did not find your data sample to be representative, so I created a different one that seems more appropriate in that it contains several instances of both duplicate and unique blocks.
2) I'm glad you tried something, but I didn't understand what you would finally do if you were able to create all the files that your solution produced.
3) Among the *nix commands, uniq can eliminate duplicates. However, it works on single lines, whereas you have blocks of lines, and uniq also requires its input to be sorted. That combination is needed often enough that sort can do both in one step (sort -u).
4) So one approach is to create one long line for each of your blocks, sort, and eliminate duplicates. That can be done with standard commands, as the example below shows. This may require a few extra files (some for illustration), but it does not need much memory and is very general. The steps are:
a) create the long lines (done with awk here), using some character to take the place of the embedded newlines,
b) sort, eliminate duplicates (sort -u),
c) expand blocks to original separated lines (tr).
5) Another approach is to use code that tracks the number of occurrences with the entire contents of the block as a key. The awk and perl languages both have associative arrays (hashes) that make that easy. However, that is memory-intensive, because every distinct block is held in memory.
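As a rough illustration of the associative-array idea in item 5, here is a minimal sketch, not a tested production script. It assumes every block starts at a line beginning with "Traceback (most recent call last):" and that any lines before the first such line can be ignored. The function name count_blocks and the header wording are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: count identical traceback blocks with an awk associative array.
# Assumptions (not from the thread): each block starts at a line beginning with
# "Traceback (most recent call last):", and lines before the first such line
# are ignored. The name count_blocks is hypothetical.

count_blocks() {
  awk '
    # Record the block collected so far, if there is one.
    function save_block() { if (block != "") count[block]++ }

    /^Traceback \(most recent call last\):/ { save_block(); block = $0; next }
    block != "" { block = block "\n" $0 }

    END {
      save_block()
      n = 0
      for (b in count) key[++n] = b
      # Insertion sort: highest count first, ties in ascending text order.
      for (i = 2; i <= n; i++) {
        k = key[i]
        for (j = i - 1; j >= 1 && (count[key[j]] < count[k] || (count[key[j]] == count[k] && key[j] > k)); j--)
          key[j + 1] = key[j]
        key[j + 1] = k
      }
      for (i = 1; i <= n; i++)
        printf "=========================Below error appeared %d time(s)=======================\n%s\n", count[key[i]], key[i]
    }
  ' "$1"
}
```

Usage would be something like `count_blocks logfile` after sourcing the function. Only the distinct blocks are stored, so memory use depends on how many different tracebacks there are, not on how many times each one repeats.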
The script below is long because it shows the context, the intermediate results, and finally compares the result to the expected output. Concentrate on the inner part that obtains the solution.
Code:
#!/usr/bin/env bash
# @(#) s1 Demonstrate obtaining unique instances of multi-line blocks of text.
# Section 1, setup, pre-solution, $Revision: 1.25 $.
# Infrastructure details, environment, debug commands for forum posts.
# Uncomment export command to run script as external user.
# export PATH="/usr/local/bin:/usr/bin:/bin" HOME=""
set +o nounset
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
db() { : ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
C=$HOME/bin/context && [ -f $C ] && $C awk sort tr
set -o nounset
pe
FILE=${1-data1}
E="expected-output.txt"
# Display sample of data file, with head & tail as a last resort.
db " Section 1: display of input data."
pe " || start sample [ specimen first:middle:last ] $FILE"
specimen 10 $FILE $E 2>/dev/null
pe " || end"
# Section 2, solution.
db " Section 2: solution."
pl " Place all line in a block into a single long line:"
awk '
BEGIN { block = "" }
# /^Traceback/ {
/^Traceback \(most recent call last\):/ {
if ( NR > 1 ) print block
block = $0 ; next
}
{ block = block "@" $0 }
END { print block }
' $FILE |
tee t1
# Sort the file, remove duplicates.
pl " Sort the long lines, leaving only unique lines:"
sort -u t1 |
tee t2
# Expand the blocks into separate lines.
pl " Separate the long lines into individual lines:"
tr '@' '\n' < t2 |
tee f1
# Section 3, post-solution, check results, clean-up, etc.
v1=$(wc -l <expected-output.txt)
v2=$(wc -l < f1)
pl " Comparison of $v2 created lines with $v1 lines of desired results:"
db " Section 3: validate generated calculations with desired results."
pl " Comparison with desired results:"
if [ ! -f expected-output.txt -o ! -s expected-output.txt ]
then
pe " Comparison file \"expected-output.txt\" zero-length or missing."
exit
fi
if cmp expected-output.txt f1
then
pe " Succeeded -- files have same content."
else
pe " Failed -- files not identical -- detailed comparison follows."
if diff -b expected-output.txt f1
then
pe " Succeeded by ignoring whitespace differences."
fi
fi
exit 0
producing:
Code:
% ./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian GNU/Linux 5.0.8 (lenny)
bash GNU bash 3.2.39
awk GNU Awk 3.1.5
sort (GNU coreutils) 6.10
tr (GNU coreutils) 6.10
db, Section 1: display of input data.
|| start sample [ specimen first:middle:last ] data1
Whole: 10:0:10 of 20 lines in file "data1"
Traceback (most recent call last):
Luci
commediene
extraordinaire
Traceback (most recent call last):
Desi
Traceback (most recent call last):
Fred
grouch
Traceback (most recent call last):
Luci
commediene
extraordinaire
Traceback (most recent call last):
Fred
grouch
Traceback (most recent call last):
Ethel
Traceback (most recent call last):
Little Ricki
Whole: 10:0:10 of 13 lines in file "expected-output.txt"
Traceback (most recent call last):
Desi
Traceback (most recent call last):
Ethel
Traceback (most recent call last):
Fred
grouch
Traceback (most recent call last):
Little Ricki
Traceback (most recent call last):
Luci
commediene
extraordinaire
|| end
db, Section 2: solution.
-----
Place all line in a block into a single long line:
Traceback (most recent call last):@Luci@commediene@extraordinaire
Traceback (most recent call last):@Desi
Traceback (most recent call last):@Fred@grouch
Traceback (most recent call last):@Luci@commediene@extraordinaire
Traceback (most recent call last):@Fred@grouch
Traceback (most recent call last):@Ethel
Traceback (most recent call last):@Little Ricki
-----
Sort the long lines, leaving only unique lines:
Traceback (most recent call last):@Desi
Traceback (most recent call last):@Ethel
Traceback (most recent call last):@Fred@grouch
Traceback (most recent call last):@Little Ricki
Traceback (most recent call last):@Luci@commediene@extraordinaire
-----
Separate the long lines into individual lines:
Traceback (most recent call last):
Desi
Traceback (most recent call last):
Ethel
Traceback (most recent call last):
Fred
grouch
Traceback (most recent call last):
Little Ricki
Traceback (most recent call last):
Luci
commediene
extraordinaire
-----
Comparison of 13 created lines with 13 lines of desired results:
db, Section 3: validate generated calculations with desired results.
-----
Comparison with desired results:
Succeeded -- files have same content.
Adapt as you need to for your data. See man pages for details.
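One possible adaptation toward the output originally asked for (each unique block with its number of occurrences, most frequent first) is sketched below. It keeps the long-line idea but replaces sort -u with sort | uniq -c and a second sort on the count. Treat it as a sketch under assumptions: the function name count_blocks_pipeline is invented here, the control character \001 is used as the join character instead of "@" on the assumption that it never occurs in the log, and lines before the first Traceback line are skipped.

```shell
#!/usr/bin/env bash
# Sketch only: unique blocks with occurrence counts, most frequent first.
# Assumptions (not from the thread): the byte \001 never appears in the data,
# and lines before the first "Traceback" line can be ignored.
# The name count_blocks_pipeline is hypothetical.

count_blocks_pipeline() {
  local sep
  sep=$(printf '\001')

  # a) Join each block into one long line, using sep in place of newlines.
  awk -v sep="$sep" '
    /^Traceback \(most recent call last\):/ {
      if (block != "") print block
      block = $0
      next
    }
    block != "" { block = block sep $0 }
    END { if (block != "") print block }
  ' "$1" |
  # b) Group identical lines, count them, then order by count (descending).
  LC_ALL=C sort |
  uniq -c |
  LC_ALL=C sort -k1,1nr -k2 |
  # c) Strip the count that uniq -c prepended, print a header, expand the block.
  awk -v sep="$sep" '
    {
      n = $1
      sub(/^ *[0-9]+ /, "")
      gsub(sep, "\n")
      printf "=========================Below error appeared %d time(s)=======================\n%s\n", n, $0
    }
  '
}
```

It would be called as `count_blocks_pipeline logfile`. Any trailing non-traceback lines (separator rows, for example) end up attached to the last block, so the log may need cleaning first.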