12-16-2005, 03:14 PM
|
#1
|
LQ Newbie
Registered: Dec 2005
Posts: 7
Rep:
|
problem with sort in linux and sed
Help! The group I work with just switched from a Unix-based computer to a Linux-based computer. Everything works pretty much the same way except sort. On Unix I could do a merge and sort on a huge monthly SLR file and it would put it in chronological order without much trouble. On Linux it runs into the blanks and gives me day 1, 10, then day 2, 20, etc.
I know I can change all the blanks to zeros, but I don't want to do that because it would change the standardized format the data is in. What I need to know is: is there a way in sed to replace any blanks that occur in the first 27 columns of every line with a zero?
Any advice on sed or sort would be greatly appreciated.
Ruth
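A minimal sketch of the sed substitution being asked about, assuming a POSIX-style sed and fixed-width lines. The function name zero_fill_head and the sample record are made up for illustration. The loop replaces one blank at a time until none is left within the first 27 characters; everything from column 28 onward is left alone.

```shell
# zero_fill_head (hypothetical name): replace each blank in the first
# 27 characters of every line with "0", leaving the rest of the line alone.
zero_fill_head() {
    # ^\(.\{0,26\}\) followed by a blank matches a blank at column 27 or earlier.
    # \10 is back-reference \1 followed by a literal "0".
    # "ta" jumps back to label "a" for as long as a substitution was made.
    sed -e ':a' -e 's/^\(.\{0,26\}\) /\10/' -e 'ta' "$@"
}

# Made-up record: blanks in columns 1-27 become zeros,
# the blanks after column 27 stay as they are.
printf '%s\n' '860610103  5 1234 67 9 1234 88 99' | zero_fill_head
```

Because this rewrites the data, it is probably better suited to building a temporary sort key than to changing the files themselves.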
|
|
|
12-16-2005, 03:36 PM
|
#2
|
Member
Registered: Sep 2004
Location: south texas
Distribution: fedora core 3,4; gentoo
Posts: 192
Rep:
|
What exactly do the strings you want to sort look like? You can pass sort the -t option to tell it what to use as a field delimiter, then use -k to specify which field to sort by first, second, and so on, and -n to use numeric sort.
If your strings are as simple as xx-xx, like:
2-20
1-10
3-23
then sort -t- -nk1,2 file.txt
would do what I think you want.
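One detail to keep in mind: -k1,2 defines a single key that runs from field 1 through field 2, and -n only reads the leading number of that key. A hedged sketch that gives each field its own numeric key instead (the function name and the extra sample value 1-9 are made up):

```shell
# sort_dash_pairs (hypothetical name): order lines of the form N-M
# by N first, then by M, both compared as numbers.
sort_dash_pairs() {
    sort -t- -k1,1n -k2,2n "$@"
}

# 1-9 and 1-10 tie on the first field, so the second key decides.
printf '%s\n' 2-20 1-10 3-23 1-9 | sort_dash_pairs
```

With a single -nk1,2 key the tie between 1-9 and 1-10 would fall through to a plain whole-line comparison, which puts 1-10 first.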
|
|
|
12-16-2005, 09:19 PM
|
#3
|
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809
|
Have you checked the man page for sort?
Also, I suggest the Bash Guide for Beginners by Machtelt Garrels--it's on TLDP
|
|
|
12-19-2005, 09:50 AM
|
#4
|
LQ Newbie
Registered: Dec 2005
Posts: 7
Original Poster
Rep:
|
Quote:
Originally Posted by pixellany
Have you checked the man page for sort?
Also, I suggest the Bash Guide for Beginners by Machtelt Garrels--it's on TLDP
|
I have checked out the man pages and I also have Linux in a Nutshell,
plus I have bugged anyone I can find who knows anything about Linux. What does TLDP mean?
sirclif-
Here is an example of the line in the files I am trying to sort.
(This is all one line, wrapped)
860610103 1677821583010578208201 475289216872 10644867574 3155320 80172816 74 128587 0 1573 107243 001
These are monthly Satellite Laser Ranging (SLR) files. The first seven
columns are the satellite number. Columns 8-27 are the date. That is the only area where I need to replace any blanks that occur with zeros. Then I can sort it using a general numeric sort and it will sort the right way.
|
|
|
12-19-2005, 12:46 PM
|
#5
|
Member
Registered: Sep 2004
Location: south texas
Distribution: fedora core 3,4; gentoo
Posts: 192
Rep:
|
OK, I'm sorry, but I'm having a hard time figuring out what you mean by 'column'. Is each digit a column, or each group of digits separated by a space?
Does this number represent the date --> 03 16778215830105782?
If so, how? I'm sure we can find a regular expression to do what you need, but I'm having trouble understanding the data format.
|
|
|
12-19-2005, 07:52 PM
|
#6
|
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809
|
Quote:
Originally Posted by labelle7152
What does TLDP mean?
|
The Linux Documentation Project---sorry
|
|
|
12-19-2005, 08:18 PM
|
#7
|
Moderator
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
|
I'm with Sirclif: I have no idea what criteria you're trying
to sort by, nor can I see anything that I'd pick out as the
representation of a date in your sample.
If the problem is caused by the space (or the alignment of numerical
values) you can use -g to tell sort to compare those by value rather than
by ASCII codes.
E.g.
Code:
cat sortem.txt
1
25
3
4
19
7
8
23
9
10
11
28
12
2
14
17
5
18
20
21
22
6
13
24
16
26
15
27
29
Code:
sort sortem.txt
1
10
11
12
13
14
15
16
17
18
19
2
20
21
22
23
24
25
26
27
28
29
3
4
5
6
7
8
9
But what you want is
Code:
sort -g sortem.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
Cheers,
Tink
|
|
|
12-20-2005, 09:22 AM
|
#8
|
LQ Newbie
Registered: Dec 2005
Posts: 7
Original Poster
Rep:
|
sirclif
That is the date: it's the year 03, the Julian day 167, and the time in
hundreds of seconds, ending at 10. The 5782 is not part of the time; it's the beginning of the ground station number where the data is collected. So to clarify, the date is (03 1677821583010). This is the only part of the line in which I need the blanks replaced by zeros. It is the part I sort by.
Tinkster
I tried using -g; it runs into the blanks and sorts
1
11
12
13
14
15
16
17
18
19
2
20
etc.
If I use -u to get rid of any duplicate lines, it runs into the blanks,
thinks all the other lines are the same, and gives me only one line.
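A small sketch of why -g together with -u can collapse a file to one line. Without a -k option the whole line is the key, and -g converts only the leading number, so every record that starts with the same number compares equal. The three records below are made up and much shorter than real SLR lines.

```shell
# records (hypothetical helper): three made-up lines sharing the same
# leading number, with a blank-padded day in the third field.
records() {
    printf '%s\n' '8606101 03  2 aaa' '8606101 03 10 bbb' '8606101 03 11 ccc'
}

# Whole-line -g: each line converts to 8606101, so -u keeps one line.
records | sort -g -u | wc -l

# Key restricted to field 3: the values 2, 10 and 11 differ, so all
# three lines survive and come out in numeric order.
records | sort -u -k3,3g
```

Restricting the comparison to the part of the line that actually differs is what the -k options in the other replies are for.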
|
|
|
12-20-2005, 11:18 AM
|
#9
|
Member
Registered: Sep 2004
Location: south texas
Distribution: fedora core 3,4; gentoo
Posts: 192
Rep:
|
OK, I think I understand your problem now. I think this will work.
Code:
$ cat file.SLR | sed "s/^.\{7\}\(.\{16\}\).*/\1 X\0/" | sort -gk1,2 | sed "s/^.*X//"
This will...
display the contents of the file and send it to sed.
sed then skips the first 7 characters (the date starts at character 8), saves the next 16 characters (03 1677821583010 is 16 characters), then matches the rest of the line.
sed replaces the line with a line that has the 16 important characters at the front, then a space, then an 'X', then the original line, so...
860610103 1677821583010578208201 475289216872 10644867574 3155320 80172816 74 128587 0 1573 107243 001
becomes...
03 1677821583010 X860610103 1677821583010578208201 475289216872 10644867574 3155320 80172816 74 128587 0 1573 107243 001
This is passed to sort, which does a general numeric sort (-g) on the key given by -k1,2, which runs from the start of column 1 ("03") to the end of column 2 ("1677821583010"). This may be overkill; simply using sort worked for me, but this will make sure for you.
The sorted list is passed back to sed, which removes the string we put at the beginning of the line by matching all characters up to an 'X' and replacing them with nothing. Note that .* is greedy, so sed matches up to the last 'X' on the line; this only works as long as the original line contains no 'X' of its own.
Hope this is what you're looking for.
Also, Tinkster may have some suggestions on making the sed command more readable; it is cluttered with '\' to escape the metacharacters.
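A hedged variant of the same decorate, sort, undecorate idea, assuming every line is at least 23 characters long and the key really is the 16 characters after column 7. The function name is made up. It zero-fills blanks only inside the temporary key, compares keys byte by byte, and strips the key by its fixed width rather than by searching for 'X', so the data lines come back unchanged.

```shell
# sort_by_date_block (hypothetical name): sort on characters 8-23 of
# each line without altering the lines themselves.
sort_by_date_block() {
    # 1. Decorate: copy the 16-character key to the front, then " X",
    #    then the untouched original line (& is the whole match).
    # 2. Replace blanks with zeros in the first 16 characters only.
    sed -e 's/^.\{7\}\(.\{16\}\).*/\1 X&/' \
        -e ':a' -e 's/^\(.\{0,15\}\) /\10/' -e 'ta' "$@" |
    # 3. Byte-wise comparison of the zero-filled key (field 1).
    LC_ALL=C sort -k1,1 |
    # 4. Undecorate: drop the 16 key characters plus " X".
    sed -e 's/^.\{18\}//'
}

# Made-up records for days 10, 2 and 11; each also contains an "X".
printf '%s\n' \
    '860610103 1012345678901 rest Xa' \
    '860610103  212345678901 rest Xb' \
    '860610103 1112345678901 rest Xc' | sort_by_date_block
```

The day 2 record should come out first, followed by days 10 and 11, with the blanks in the data still in place.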
|
|
|
12-20-2005, 11:37 AM
|
#10
|
Moderator
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
|
Ruth,
I'm still curious: which Unix were you using, and what was
the original invocation of the sort on that platform?
And can you please post an example of more than one row, and
what the sort should give you as well as what you get?
I've tried (based on the description of what you and Sirclif
were talking about) with a little dummy file (copied your one
line repeatedly, and edited a few of the Julian days), and a
plain sort without any further parameters seems to do what I'd
have expected.
Cheers,
Tink
P.S.: Sirclif: unless you stick your sed command in a file,
there's no way that I'm aware of to escape the escapes ;}
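One option not mentioned in the thread: extended regular expressions need far fewer backslashes. A sketch assuming a sed that accepts -E (older GNU sed releases spell it -r); the two function names are made up, and & is used for the whole match in both versions.

```shell
# decorate_bre: the decorate step written with basic regular expressions.
decorate_bre() {
    sed 's/^.\{7\}\(.\{16\}\).*/\1 X&/' "$@"
}

# decorate_ere: the same substitution with extended regular expressions,
# where braces and parentheses need no backslashes.
decorate_ere() {
    sed -E 's/^.{7}(.{16}).*/\1 X&/' "$@"
}

line='860610103 1677821583010578208201 475289216872'
printf '%s\n' "$line" | decorate_bre
printf '%s\n' "$line" | decorate_ere
```

Both calls should print the same decorated line; single quotes also keep the shell from touching the backslashes that remain.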
|
|
|
12-21-2005, 09:39 AM
|
#11
|
LQ Newbie
Registered: Dec 2005
Posts: 7
Original Poster
Rep:
|
Tinkster
Here is my original Unix script to merge and sort the daily files
for the glonass89 satellite into a monthly file.
Quote:
#!/bin/csh -f
set kt = 0501
set lt = glonass89
set lt2 = glonass89
set kt_a = $kt"_a"
set kt_b = $kt"_b"
set lt_a = $lt"_a"
cd ~labelle/slr/slrfr
cp /slr/data/fr/$lt2/daily/*/7*_$kt* ~labelle/slr/slrfr
uncompress 7*_$kt*.$lt.Z
set error = $status
if ($error == 1) then
goto next1
endif
sort -u -m -o $lt_a.$kt *$kt*.$lt
compress $lt_a.$kt
"slrmerg6" 30L, 455C 3,1 Top
|
If I just wanted to re-sort a file after correcting bad data I would use
sort -u -o glonass89_b1.0305 glonass89_b.0305
One of my smaller files is 75000 lines long.
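One possible cause worth ruling out (an assumption; nothing in this thread confirms it): GNU sort follows the collation rules of the current locale, and in many non-C locales blanks may carry little or no weight in a comparison. The GNU sort man page suggests setting LC_ALL=C to get the traditional order based on byte values, in which a blank sorts before any digit. A small sketch with two made-up, shortened records:

```shell
# Two shortened records: day "25" and day " 2" (blank-padded).
# In the C locale a blank (0x20) sorts before "2" (0x32), so the
# blank-padded day 2 record comes out first.
printf '%s\n' '020600105 25681157525912' '020600105  2605132525964' |
    LC_ALL=C sort

# The re-sort command from the post with the locale pinned for one call.
# The VAR=value prefix is sh syntax; from csh, "env" does the same job:
#   env LC_ALL=C sort -u -o glonass89_b1.0305 glonass89_b.0305
```

Running `locale` on the Linux machine shows which collation setting sort currently sees.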
Here is an example of the results of the program above on Linux:
Quote:
020600105 25681157525912709005131681338541305134001854807 5320 98022895 8919549 3723 73 162593 0 470 2311110021A
020600105 25681160025908709005131681305541318134001404517 5320 98022895 8919549 3723 81 162593 0 470 2311110021A
020600105 25681165025912709005131681241541344134000503457 5320 98022895 8919548 3723 81 162593 0 470 2311110021A
020600105 25681172525904709005131681144541383133999152677 5320 98022895 8919547 3723 58 162593 0 470 2311110021A
020600105 25681177525914709005131681079541409133998252137 5320 98022895 8919547 3723 62 162593 0 470 2311110021A
020600105 2605132525964709005132454563356259141928052211 5320 98152880 8727194 3723 79 162609 -5 410 2311110021A
020600105 2605202525958709005132453802356423141919784111 5320 98152880 8727183 3723 56 162609 -5 410 2311110021A
020600105 2605237525964709005132453422356505141915655091 5320 98152880 8727178 3723 73 162609 -5 410 2311110021A
020600105 2605242525960709005132453368356517141915065441 5320 98152880 8727177 3723 52 162609 -5 410 2311110021A
020600105 2605375025946709005132451927356826141899465361 5320 98152880 8727156 3723 96 162609 -5 410 2311110021A
020600105 2605407525968709005132451574356902141895646061 5320 98152880 8727151 3723 51 162609 -5 410 2311110021A
@
|
When correctly sorted with Unix, this is the top of the file:
Quote:
020600105 2605132525964709005132454563356259141928052211 5320 98152880 8727194 3723 79 162609 -5 410 2311110021A
020600105 2605202525958709005132453802356423141919784111 5320 98152880 8727183 3723 56 162609 -5 410 2311110021A
020600105 2605237525964709005132453422356505141915655091 5320 98152880 8727178 3723 73 162609 -5 410 2311110021A
020600105 2605242525960709005132453368356517141915065441 5320 98152880 8727177 3723 52 162609 -5 410 2311110021A
020600105 2605375025946709005132451927356826141899465361 5320 98152880 8727156 3723 96 162609 -5 410 2311110021A
020600105 2605407525968709005132451574356902141895646061 5320 98152880 8727151 3723 51 162609 -5 410 2311110021A
020600105 2605417525958709005132451465356925141894471391 5320 98152880 8727150 3723 33 162609 -5 410 2311110021A
020600105 2605422525962709005132451411356937141893884071 5320 98152880 8727149 3723 41 162609 -5 410 2311110021A
020600105 2605427525956709005132451356356949141893296901 5320 98152880 8727148 3723 70 162609 -5 410 2311110021A
020600105 2605435026008709005132451275356966141892416301 5320 98152880 8727147 3723 51 162609 -5 410 2311110021A
020600105 2605467525956709005132450921357042141888602391 5320 98152880 8727142 3723 56 162609 -5 410 2311110021A
|
This is the middle of the file with the 25th day correctly sorted
Quote:
020600105 24747090025926709005131888220653672130696084270 5320 98162925 6617456 3723 127 162600 4 460 2311110021A
020600105 24747147525946709005131887976654215130683028920 5320 98162925 6617448 3723 50 162600 4 460 2311110021A
020600105 24747152525954709005131887955654262130681894630 5320 98162925 6617448 3723 109 162600 4 460 2311110021A
020600105 24747155025922709005131887944654286130681327710 5320 98162925 6617447 3723 72 162600 4 460 2311110021A
020600105 24747160026002709005131887923654333130680193960 5320 98162925 6617447 3723 52 162600 4 460 2311110021A
020600105 25644875025894709005132048411354472142086354647 5320 98012903 8527276 3723 50 162583 9 450 2311110021A
020600105 25644882525900709005132048347354508142084520737 5320 98012903 8527274 3723 30 162583 9 450 2311110021A
020600105 25644950025902709005132047767354829142068016287 5320 98012903 8527253 3723 38 162583 9 450 2311110021A
020600105 25644960025892709005132047681354877142065571947 5320 98012903 8527249 3723 40 162583 9 450 2311110021A
020600105 25644970025926709005132047595354924142063126417 5320 98012903 8527246 3723 46 162583 9 450 2311110021A
020600105 25644975025894709005132047552354948142061904177 5320 98012903 8527245 3723 91 162583 9 450 2311110021A
@
|
sirclif
The problem with your suggestion is that moving the date and satellite number corrupts the format of the data. Also, depending on the month and the seconds,
such as 12:01 AM, there can be more than one blank in the time.
|
|
|
12-21-2005, 10:55 AM
|
#12
|
Member
Registered: Sep 2004
Location: south texas
Distribution: fedora core 3,4; gentoo
Posts: 192
Rep:
|
the format is only "corrupted" when it is sent to sort, then "uncorrupted" by passing through sed again. so what is printed to the screen looks just like what you have shown. you can also modify the regular expression to be more general to account for such cases.
|
|
|
12-21-2005, 02:10 PM
|
#13
|
Moderator
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
|
I hope that I finally understood
Code:
$ sort -b -g -k1.8,1.9 -k2.1,2.3 sortem.txt
860610103 4 83907765675740579011994593814105 10116725553 2044235 99782901 66 0 0 0 107639 1 710 12421111123A
860610103 11 74316042112740579012396120345812 15081516190 2224235100002914 59 0 0 0 107604 18 770 12421111123A
860610103 12 39881575180740579012734740235109 18284224508 2274235 99932901 63 0 0 0 107584 11 670 12421111123A
860610103 24 91863414314740579012167798335675 15324178587 2414235100092935 45 0 0 0 107857 4 850 12421111123A
860610103 10185275575500740579011247287318719 15680040163 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185279575530740579011246675318425 15687625889 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185282575490740579011246218318204 15693323275 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185283575479740579011246065318130 15695224140 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185288575461740579011245304317761 15704739005 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185289575478740579011245151317688 15706644242 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185290575511740579011244999317614 15708550818 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
Cheers,
Tink
|
|
|
12-22-2005, 08:41 AM
|
#14
|
LQ Newbie
Registered: Dec 2005
Posts: 7
Original Poster
Rep:
|
sirclif
I tried this command
cat ajisai_e.0301 | sed "s/^.\{7\}\(.\{16\}\).*/\1 X\0/" | sort -gk1,2 | sed "s/^.*X//" > ajisai_e2.0301
and got these results
Quote:
860610103 10185275575500740579011247287318719 15680040163 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185279575530740579011246675318425 15687625889 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185282575490740579011246218318204 15693323275 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185283575479740579011246065318130 15695224140 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185288575461740579011245304317761 15704739005 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185289575478740579011245151317688 15706644242 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185290575511740579011244999317614 15708550818 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185291575486740579011244847317540 15710456947 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185292575493740579011244695317466 15712364660 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185293575499740579011244543317393 15714273367 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
860610103 10185275575500740579011247287318719 15680040163 2404235 99972878 70 0 0 0 107599 0 550 12421111123A
|
It starts the file at day 10 and not 3; it seems to still be sorting 10,11,12,13,14,15,16,17,18,19,2,20 etc.
Last edited by labelle7152; 12-22-2005 at 08:43 AM.
|
|
|
12-22-2005, 10:05 AM
|
#15
|
Member
Registered: Sep 2004
Location: south texas
Distribution: fedora core 3,4; gentoo
Posts: 192
Rep:
|
OK, I'm seeing your problem now. So how do you know how many digits the day takes up? In the above example, how can you tell that it's day 10 and not day 101? Does the length of the string in the second column change? The year is the last two digits of the first column's string, right? But the day could be either the first, first two, or first three digits of the second column's string?
Sorry if I'm being a little slow here, but I don't fully understand the format you are requiring.
|
|
|
All times are GMT -5.