LinuxQuestions.org
Old 12-16-2005, 03:14 PM   #1
labelle7152
LQ Newbie
 
Registered: Dec 2005
Posts: 7

Rep: Reputation: 0
problem with sort in linux and sed


Help! The group I work with just switched from a Unix-based computer to a Linux-based computer. Everything works pretty much the same way except sort. On Unix I could do a merge and sort on a huge monthly SLR file and it would put it in chronological order without much trouble. On Linux it runs into the blanks and gives me day 1-10, then day 2-20, etc.

I know I can change all the blanks to zeros, but I don't want to do that because it would change the standardized format the data is in. What I need to know is: is there a way in sed to replace any blanks that occur in the first 27 columns of every line with a zero?
Any advice on sed or sort would be greatly appreciated.
Ruth
 
Old 12-16-2005, 03:36 PM   #2
sirclif
Member
 
Registered: Sep 2004
Location: south texas
Distribution: fedora core 3,4; gentoo
Posts: 192

Rep: Reputation: 30
what exactly do the strings you want to sort look like? you can pass sort the -t option to tell it what to use as a field delimiter, then use -k to specify which field to sort by first, second and so on, and -n to use numeric sort.

if your strings are as simple as xx-xx, like:

2-20
1-10
3-23

then sort -t- -k1,1n -k2,2n file.txt

would do what i think you want.
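
A small sketch of the -t / -k / -n combination described above, run on made-up "a-b" pairs. The function name sort_pairs and the extra 1-9 sample line are only for illustration. Each field gets its own numeric key, because a single numeric key spanning both fields stops reading at the "-" and so never looks at the second number.

```shell
# Sample data in the "xx-xx" shape from the post (1-9 added to show the tie-break).
sample='2-20
1-10
3-23
1-9'

# -t- splits each line into fields on "-".
# -k1,1n sorts numerically on field 1; -k2,2n breaks ties numerically on field 2.
sort_pairs() {
    LC_ALL=C sort -t- -k1,1n -k2,2n
}

printf '%s\n' "$sample" | sort_pairs
```

With this input, 1-9 comes out ahead of 1-10, which a plain text comparison would get the other way round.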
 
Old 12-16-2005, 09:19 PM   #3
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Have you checked the man page for sort?
Also, I suggest the Bash Guide for Beginners by Machtelt Garrels--it's on TLDP
 
Old 12-19-2005, 09:50 AM   #4
labelle7152
LQ Newbie
 
Registered: Dec 2005
Posts: 7

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by pixellany
Have you checked the man page for sort?
Also, I suggest the Bash Guide for Beginners by Machtelt Garrels--it's on TLDP
I have checked out the man pages and I also have Linux in a Nutshell,
plus I have bugged anyone I can find who knows anything about Linux. What's TLDp Mean?

sirclif-
Here is an example of a line in the files I am trying to sort.
(This is all one line, wrapped)

860610103 1677821583010578208201 475289216872 10644867574 3155320 80172816 74 128587 0 1573 107243 001

These are monthly Satellite Laser Ranging (SLR) files. The first seven
columns are the satellite number. Columns 8-27 are the date. That is the only area where I need to replace any blanks that occur with zeros. Then I can sort it using a general numeric sort and it will sort the right way.
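
For reference, the replacement asked about here (blanks to zeros, but only in the leading columns) can be sketched with awk, which handles fixed character positions more directly than sed. This is only an illustration: the function name zero_fill_prefix is made up, and the column count is passed in as an argument (27 for the layout described above).

```shell
# Replace blanks with zeros in the first N characters of every line,
# leaving the rest of the line untouched.
# usage: zero_fill_prefix N < input > output
zero_fill_prefix() {
    awk -v n="$1" '{
        head = substr($0, 1, n)    # the first n characters
        tail = substr($0, n + 1)   # everything after them, kept as-is
        gsub(/ /, "0", head)       # blanks -> zeros in the prefix only
        print head tail
    }'
}

# Tiny demonstration with a 4-character prefix.
printf '%s\n' 'ab c d e' | zero_fill_prefix 4
```

The demonstration prints "ab0c d e": the blank inside the first four characters is filled, the later ones are not. Note that this does change the data, so it suits a throwaway copy used only for sorting.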
 
Old 12-19-2005, 12:46 PM   #5
sirclif
Member
 
Registered: Sep 2004
Location: south texas
Distribution: fedora core 3,4; gentoo
Posts: 192

Rep: Reputation: 30
ok, i'm sorry, but i'm having a hard time figuring out what you mean by 'column'. is each number a column, or each group of numbers separated by a space?

does this number represent the date --> 03 16778215830105782?

if so, how? i'm sure we can find a regular expression to do what you need, but i'm having trouble understanding the data format.
 
Old 12-19-2005, 07:52 PM   #6
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Quote:
Originally Posted by labelle7152
What's TLDp Mean?.
The Linux Documentation Project---sorry
 
Old 12-19-2005, 08:18 PM   #7
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
I'm with Sirclif, no idea what the criteria are you're trying
to sort by, nor can I see anything that I'd pick to be the
representation of a date in your sample.

If the problem is caused by the space (or the alignment of numerical
values) you can use -g to tell it to sort those by value rather than
by ASCII codes.

E.g.
Code:
cat sortem.txt
1
25
3
4
19
7
8
23
9
10
11
28
12
2
14
17
5
18
20
21
22
6
13
24
16
26
15
27
29
Code:
sort sortem.txt
1
10
11
12
13
14
15
16
17
18
19
2
20
21
22
23
24
25
26
27
28
29
3
4
5
6
7
8
9
But what you want is
Code:
sort -g sortem.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29

Cheers,
Tink
 
Old 12-20-2005, 09:22 AM   #8
labelle7152
LQ Newbie
 
Registered: Dec 2005
Posts: 7

Original Poster
Rep: Reputation: 0
sirclif
that is the date: it's the year 03, the julian day 167, and the time in
hundreds of seconds ending at 10. The 5782 is not part of the time; it's the beginning of the ground station number where the data is collected. So to clarify, the date is (03 1677821583010). This is the only part of the line in which I need the blanks replaced by zeros. It is the part I sort by.

Tinkster
I tried using -g. It runs into the blanks and sorts
1
11
12
13
14
15
16
17
18
19
2
20
etc.
If I use -u to get rid of any duplicate lines, it runs into the blanks,
thinks all the other lines are the same, and gives me only one line.
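
The single-line result from -u fits how sort reads numeric keys. With -g and no -k, each whole line is read as one number, and reading stops at the first blank. Every record that begins with the same satellite number therefore compares equal, and -u keeps only one line from each group of equal lines. A sketch with shortened, made-up records:

```shell
# Three shortened records that share the same leading number.
lines='860610103  4 8390
860610103 11 7431
860610103 12 3988'

# Numeric comparison: all three parse as 860610103, so -u keeps just one.
printf '%s\n' "$lines" | LC_ALL=C sort -g -u

# Plain text comparison: the lines differ, so all three survive.
printf '%s\n' "$lines" | LC_ALL=C sort -u
```

The first command prints one line and the second prints three, so for de-duplicating these files -u is safer without -g.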
 
Old 12-20-2005, 11:18 AM   #9
sirclif
Member
 
Registered: Sep 2004
Location: south texas
Distribution: fedora core 3,4; gentoo
Posts: 192

Rep: Reputation: 30
ok i think i understand your problem now. i think this will work.

Code:
$ cat file.SLR | sed "s/^.\{7\}\(.\{16\}\).*/\1 X\0/" | sort -gk1,2 | sed "s/^.*X//"
this will...

display contents of file, send it to sed

sed then skips the first 7 characters (the date starts at character 8), saves the next 16 characters (03 1677821583010 is 16 characters), then matches the rest of the line.

sed replaces the line with a line that has the 16 important characters at the front, then a space, then an 'X', then the original line, so...

860610103 1677821583010578208201 475289216872 10644867574 3155320 80172816 74 128587 0 1573 107243 001

becomes...

03 1677821583010 X860610103 1677821583010578208201 475289216872 10644867574 3155320 80172816 74 128587 0 1573 107243 001

this is passed to sort, which does a general numeric sort (-g) on a key that runs from column 1 ("03") through column 2 ("1677821583010"), as specified by -k1,2. This may be overkill, simply using sort worked for me, but this will make sure for you.

the sorted list is passed back to sed, which removes the string we put at the beginning of the line by matching all characters up to an 'X' and replacing them with nothing. note that .* is greedy, so it matches up to the last 'X' on the line; if an 'X' can appear in the data itself, use "s/^[^X]*X//" instead, which stops at the first one.

hope this is what you're looking for.

also, Tinkster may have some suggestions on making the sed command more readable, it is cluttered with '\' to escape the metacharacters.
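
A variant of the same decorate, sort, undecorate idea, sketched with awk and cut. It is an illustration under stated assumptions, not a tested fix for the real files: the function name sort_by_padded_date is made up, the sort field is assumed to sit in columns 8-27 as described earlier in the thread, and a tab separates the temporary key from the record. Blanks are zero-filled only in the temporary key, so the records themselves come out unchanged.

```shell
# Sort fixed-width records on columns 8-27 without altering them.
# usage: sort_by_padded_date < input > output
sort_by_padded_date() {
    awk '{
        key = substr($0, 8, 20)   # columns 8-27 (assumed position of the date)
        gsub(/ /, "0", key)       # zero-fill blanks in the key copy only
        print key "\t" $0         # decorate: key, tab, original line
    }' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 |   # bytewise sort on the key alone
    cut -f2-                                    # undecorate: drop the key
}

# Shortened, made-up records for days 10, 4 and 24.
printf '%s\n' '860610103 10185275 A' '860610103  4 839077 B' '860610103 24 918634 C' |
    sort_by_padded_date
```

With these three records the day-4 line comes out first, then day 10, then day 24, each with its original blanks intact.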
 
Old 12-20-2005, 11:37 AM   #10
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Ruth,

I'm still curious: which Unix were you using, and what was
the original invocation of the sort on that platform?

And can you please post an example of more than one row, and
what the sort should give you as well as what you get?

I've tried (based on the description of what you and Sirclif
were talking about ) with a little dummy file (copied your one
line repeatedly, and edited a few of the julian days), and a
plain sort without any further parameters seems to do what I'd
have expected.


Cheers,
Tink

P.S.: Sirclif: unless you stick your sed-command in a file
there's no way that I'm aware of to escape the escapes ;}
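
One way to cut down on the backslashes, assuming a sed that accepts -E for extended regular expressions (current GNU and BSD sed do; older GNU sed spelled it -r). Single quotes keep the shell from touching the pattern, and & stands for the whole match. The function name decorate is only for illustration.

```shell
# Same decoration step as the sed command discussed above, written with
# extended regexes: {7} and ( ) need no backslashes under -E.
decorate() {
    sed -E 's/^.{7}(.{16}).*/\1 X&/'
}

printf '%s\n' '860610103 1677821583010578208201 475289216872' | decorate
```

This prints the 16-character date, a space, an X, and then the original line, the same result as the escaped version.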
 
Old 12-21-2005, 09:39 AM   #11
labelle7152
LQ Newbie
 
Registered: Dec 2005
Posts: 7

Original Poster
Rep: Reputation: 0
Tinkster
here is my original Unix script to merge and sort the daily files
for the glonass89 satellite into a monthly file.
Quote:
#!/bin/csh -f

set kt = 0501
set lt = glonass89
set lt2 = glonass89
set kt_a = $kt"_a"
set kt_b = $kt"_b"
set lt_a = $lt"_a"

cd ~labelle/slr/slrfr

cp /slr/data/fr/$lt2/daily/*/7*_$kt* ~labelle/slr/slrfr

uncompress 7*_$kt*.$lt.Z
set error = $status
if ($error == 1) then
goto next1
endif

sort -u -m -o $lt_a.$kt *$kt*.$lt

compress $lt_a.$kt

"slrmerg6" 30L, 455C 3,1 Top

If I just wanted to re-sort a file after correcting bad data I would use
sort -u -o glonass89_b1.0305 glonass89_b.0305
One of my smaller files is 75000 lines long.
Here is an example of the results of the program above on Linux:

Quote:
020600105 25681157525912709005131681338541305134001854807 5320 98022895 8919549 3723 73 162593 0 470 2311110021A
020600105 25681160025908709005131681305541318134001404517 5320 98022895 8919549 3723 81 162593 0 470 2311110021A
020600105 25681165025912709005131681241541344134000503457 5320 98022895 8919548 3723 81 162593 0 470 2311110021A
020600105 25681172525904709005131681144541383133999152677 5320 98022895 8919547 3723 58 162593 0 470 2311110021A
020600105 25681177525914709005131681079541409133998252137 5320 98022895 8919547 3723 62 162593 0 470 2311110021A
020600105 2605132525964709005132454563356259141928052211 5320 98152880 8727194 3723 79 162609 -5 410 2311110021A
020600105 2605202525958709005132453802356423141919784111 5320 98152880 8727183 3723 56 162609 -5 410 2311110021A
020600105 2605237525964709005132453422356505141915655091 5320 98152880 8727178 3723 73 162609 -5 410 2311110021A
020600105 2605242525960709005132453368356517141915065441 5320 98152880 8727177 3723 52 162609 -5 410 2311110021A
020600105 2605375025946709005132451927356826141899465361 5320 98152880 8727156 3723 96 162609 -5 410 2311110021A
020600105 2605407525968709005132451574356902141895646061 5320 98152880 8727151 3723 51 162609 -5 410 2311110021A
@
When correctly sorted with Unix, this is the top of the file:
Quote:
020600105 2605132525964709005132454563356259141928052211 5320 98152880 8727194 3723 79 162609 -5 410 2311110021A
020600105 2605202525958709005132453802356423141919784111 5320 98152880 8727183 3723 56 162609 -5 410 2311110021A
020600105 2605237525964709005132453422356505141915655091 5320 98152880 8727178 3723 73 162609 -5 410 2311110021A
020600105 2605242525960709005132453368356517141915065441 5320 98152880 8727177 3723 52 162609 -5 410 2311110021A
020600105 2605375025946709005132451927356826141899465361 5320 98152880 8727156 3723 96 162609 -5 410 2311110021A
020600105 2605407525968709005132451574356902141895646061 5320 98152880 8727151 3723 51 162609 -5 410 2311110021A
020600105 2605417525958709005132451465356925141894471391 5320 98152880 8727150 3723 33 162609 -5 410 2311110021A
020600105 2605422525962709005132451411356937141893884071 5320 98152880 8727149 3723 41 162609 -5 410 2311110021A
020600105 2605427525956709005132451356356949141893296901 5320 98152880 8727148 3723 70 162609 -5 410 2311110021A
020600105 2605435026008709005132451275356966141892416301 5320 98152880 8727147 3723 51 162609 -5 410 2311110021A
020600105 2605467525956709005132450921357042141888602391 5320 98152880 8727142 3723 56 162609 -5 410 2311110021A
This is the middle of the file with the 25th day correctly sorted
Quote:
020600105 24747090025926709005131888220653672130696084270 5320 98162925 6617456 3723 127 162600 4 460 2311110021A
020600105 24747147525946709005131887976654215130683028920 5320 98162925 6617448 3723 50 162600 4 460 2311110021A
020600105 24747152525954709005131887955654262130681894630 5320 98162925 6617448 3723 109 162600 4 460 2311110021A
020600105 24747155025922709005131887944654286130681327710 5320 98162925 6617447 3723 72 162600 4 460 2311110021A
020600105 24747160026002709005131887923654333130680193960 5320 98162925 6617447 3723 52 162600 4 460 2311110021A
020600105 25644875025894709005132048411354472142086354647 5320 98012903 8527276 3723 50 162583 9 450 2311110021A
020600105 25644882525900709005132048347354508142084520737 5320 98012903 8527274 3723 30 162583 9 450 2311110021A
020600105 25644950025902709005132047767354829142068016287 5320 98012903 8527253 3723 38 162583 9 450 2311110021A
020600105 25644960025892709005132047681354877142065571947 5320 98012903 8527249 3723 40 162583 9 450 2311110021A
020600105 25644970025926709005132047595354924142063126417 5320 98012903 8527246 3723 46 162583 9 450 2311110021A
020600105 25644975025894709005132047552354948142061904177 5320 98012903 8527245 3723 91 162583 9 450 2311110021A
@
sirclif
the problem with your suggestion is that moving the date and satellite number corrupts the format of the data. Also, depending on the month and the seconds,
such as at 12:01 AM, there can be more than one blank in the time.
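
One thing the thread has not tried is forcing the C locale for the sort. This is an assumption about the cause, not something confirmed here: many Linux locales give blanks little or no weight when comparing text, while the C locale compares byte by byte, and a blank sorts before every digit. If the old Unix system sorted in the C locale, right-aligned, blank-padded numbers would have come out in numeric order without any rewriting. A sketch with shortened, made-up records (the function name sort_bytewise is only for illustration):

```shell
# Shortened records: days 25, 2 and 24, right-aligned with blank padding.
records='020600105 25681157525912 A
020600105  2605132525964 B
020600105 24747090025926 C'

# LC_ALL=C makes sort compare raw bytes, so the extra blank in the
# day-2 record puts it ahead of the two-digit days.
sort_bytewise() {
    LC_ALL=C sort -u
}

printf '%s\n' "$records" | sort_bytewise
```

Here the day-2 record comes first, then day 24, then day 25. For the merge in the script above, the same prefix would go in front of the existing command, as in LC_ALL=C sort -u -m -o ..., provided the daily files were themselves sorted in that order.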
 
Old 12-21-2005, 10:55 AM   #12
sirclif
Member
 
Registered: Sep 2004
Location: south texas
Distribution: fedora core 3,4; gentoo
Posts: 192

Rep: Reputation: 30
the format is only "corrupted" when it is sent to sort, then "uncorrupted" by passing through sed again. so what is printed to the screen looks just like what you have shown. you can also modify the regular expression to be more general to account for such cases.
 
Old 12-21-2005, 02:10 PM   #13
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
I hope that I finally understood
Code:
$ sort -b -g -k1.8,1.9 -k2.1,2.3 sortem.txt 
860610103  4 83907765675740579011994593814105 10116725553    2044235 99782901 66    0     0    0  107639     1  710   12421111123A
860610103 11 74316042112740579012396120345812 15081516190 2224235100002914 59    0     0    0  107604    18  770   12421111123A
860610103 12 39881575180740579012734740235109 18284224508    2274235 99932901 63    0     0    0  107584    11  670   12421111123A
860610103 24 91863414314740579012167798335675 15324178587 2414235100092935 45    0     0    0  107857     4  850   12421111123A
860610103 10185275575500740579011247287318719 15680040163    2404235 99972878 70    0     0    0  107599     0  550   12421111123A
860610103 10185279575530740579011246675318425 15687625889    2404235 99972878 70    0     0    0  107599     0  550   12421111123A
860610103 10185282575490740579011246218318204 15693323275    2404235 99972878 70    0     0    0  107599     0  550   12421111123A
860610103 10185283575479740579011246065318130 15695224140    2404235 99972878 70    0     0    0  107599     0  550   12421111123A
860610103 10185288575461740579011245304317761 15704739005    2404235 99972878 70    0     0    0  107599     0  550   12421111123A
860610103 10185289575478740579011245151317688 15706644242    2404235 99972878 70    0     0    0  107599     0  550   12421111123A
860610103 10185290575511740579011244999317614 15708550818    2404235 99972878 70    0     0    0  107599     0  550   12421111123A


Cheers,
Tink
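
For anyone reading the command above: -kF.C addresses character C of field F, so -k1.8,1.9 is characters 8-9 of the first field (the two-digit year) and -k2.1,2.3 starts at the first character of the second field. -b skips the leading blanks of a field before counting, and -g compares what it finds as numbers. A sketch on shortened, made-up records (the function name sort_year_day is only for illustration):

```shell
# Shortened records for days 24, 4 and 11 of year 03.
sample='860610103 24 9186
860610103  4 8390
860610103 11 7431'

# Key 1: characters 8-9 of field 1 (the year).
# Key 2: up to three characters of field 2, leading blanks skipped by -b.
sort_year_day() {
    LC_ALL=C sort -b -g -k1.8,1.9 -k2.1,2.3
}

printf '%s\n' "$sample" | sort_year_day
```

The records come out in day order: 4, 11, 24. This relies on the day being separated from the next number by a blank; where the digits run together, the key picks up whatever three characters are there.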
 
Old 12-22-2005, 08:41 AM   #14
labelle7152
LQ Newbie
 
Registered: Dec 2005
Posts: 7

Original Poster
Rep: Reputation: 0
sirclif
I tried this command

cat ajisai_e.0301 | sed "s/^.\{7\}\(.\{16\}\).*/\1 X\0/" | sort -gk1,2 | sed "s/^.*X//" > ajisai_e2.0301
and got these results
Quote:
860610103 10185275575500740579011247287318719 15680040163 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185279575530740579011246675318425 15687625889 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185282575490740579011246218318204 15693323275 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185283575479740579011246065318130 15695224140 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185288575461740579011245304317761 15704739005 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185289575478740579011245151317688 15706644242 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185290575511740579011244999317614 15708550818 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185291575486740579011244847317540 15710456947 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185292575493740579011244695317466 15712364660 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185293575499740579011244543317393 15714273367 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185275575500740579011247287318719 15680040163 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
It starts the file at day 10 and not 3. It seems to still be sorting 10,11,12,13,14,15,16,17,18,19,2,20, etc.

Last edited by labelle7152; 12-22-2005 at 08:43 AM.
 
Old 12-22-2005, 10:05 AM   #15
sirclif
Member
 
Registered: Sep 2004
Location: south texas
Distribution: fedora core 3,4; gentoo
Posts: 192

Rep: Reputation: 30
ok, i'm seeing your problem now. so how do you know how many digits the day takes up? in the above example, how can you tell that it's day 10 and not day 101? does the length of the string in the second column change? the year is the last two digits of the first column's string, right? but the day could be either the first, first two, or first three digits of the second column's string?

sorry if i'm being a little slow here, but i don't fully understand the format you are requiring.
 
  

