How to scan a file with 2 different field separators?
I have a Linux assignment - I'm not asking for any code, just an idea - and the task is to scan a text file and sort it first by one field separator, then sort every resulting field by another separator. Awk looks like a good choice, but I don't know how to use two field separators (FS) in one awk script, or how to call an awk script from another awk script.
If the scanning process is sequential (that is, the output of the first scan is, in toto, the input for the second scan), you can pipe two awk calls, each with its own -F option.
The -F option tells awk which separator to use ("," in the first call, ":" in the second one). For a more sophisticated interaction between input and output, please post an example of what has to be done. It will be my pleasure to help!
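A minimal sketch of such a pipeline, assuming a made-up input line in which comma-separated fields each hold colon-separated subfields. The sample data, the function name two_pass, and the choice of which field to print are illustrative assumptions, not part of the original task.

```shell
# Hypothetical sample line: two comma-separated fields, each holding
# colon-separated subfields.
sample='alpha:beta,gamma:delta'

# The first awk splits on "," and prints the second field.
# The second awk splits that output on ":" and prints its first subfield.
two_pass() {
    awk -F, '{ print $2 }' | awk -F: '{ print $1 }'
}

printf '%s\n' "$sample" | two_pass
```

With the sample line above, the first call passes gamma:delta to the second call, which prints gamma.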
For the hell of it, here's another suggestion. We'll go with the same assumption -- that one delimiter is a comma and the other is a colon.
First, run it through tr to change all delimiters to the same type:
Code:
tr ',' ':' < some-data-file
After that, all commas will be translated to colons. So you can use colucix's awk invocation above, except only for colons. (In other words, once everything has been changed to a single delimiter, life becomes easier.)
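Put together, the idea might look like the sketch below. The function name unify_and_count and the sample line are assumptions for illustration; the awk action just prints the field count and the last field to show that everything is now split on ":".

```shell
# Translate every "," to ":" and then split on ":" in a single awk call.
# Prints the number of fields followed by the last field (illustrative only).
unify_and_count() {
    tr ',' ':' | awk -F: '{ print NF, $NF }'
}

printf '%s\n' 'a,b:c,d' | unify_and_count
```

Keep in mind that after the translation awk can no longer tell which fields were originally separated by a comma and which by a colon, so this only fits when that distinction does not matter.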
[ caveat: It's entirely possible I am misunderstanding what you are trying to do. I'm not too sure what you mean when you say "sort by separator". ]
Here's an extract from the info file for sort which may suggest a way to accomplish your task.
Code:
* Sort a set of log files, primarily by IPv4 address and secondarily
by time stamp. If two lines' primary and secondary keys are
identical, output the lines in the same order that they were
input. The log files contain lines that look like this:

     4.150.156.3 - - [01/Apr/2004:06:31:51 +0000] message 1
     211.24.3.231 - - [24/Apr/2004:20:17:39 +0000] message 2

Fields are separated by exactly one space. Sort IPv4 addresses
lexicographically, e.g., 212.61.52.2 sorts before 212.129.233.201
because 61 is less than 129.

     sort -s -t ' ' -k 4.9n -k 4.5M -k 4.2n -k 4.14,4.21 file*.log |
     sort -s -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n

This example cannot be done with a single `sort' invocation, since
IPv4 address components are separated by `.' while dates come just
after a space. So it is broken down into two invocations of
`sort': the first sorts by time stamp and the second by IPv4
address. The time stamp is sorted by year, then month, then day,
and finally by hour-minute-second field, using `-k' to isolate each
field. Except for hour-minute-second there's no need to specify
the end of each key field, since the `n' and `M' modifiers sort
based on leading prefixes that cannot cross field boundaries. The
IPv4 addresses are sorted lexicographically. The second sort uses
`-s' so that ties in the primary key are broken by the secondary
key; the first sort uses `-s' so that the combination of the two
sorts is stable.
Excellent idea, colucix. Is there a way to implement this in one awk script?
anomie, I cannot translate every separator because I need different functions to be run on different fields.
PTrenholme, also a good idea, but I don't want to sort the fields.
Thank you all.
is there a way to implement this into one awk script?
Do you mean to implement a single script that performs a single call to awk? The answer is always YES, since awk is a very powerful scripting language. However, in this case I suggest specifying the first field separator in the BEGIN section of the script, e.g.
Code:
BEGIN { FS = "," }
Then we can process each field by splitting it into subfields with the split function, e.g.
Code:
split($2,names,":")
This is just an example that splits the 2nd field using ":" as the separator and assigns the resulting pieces to the array "names". You can then do further processing on each element of the array.
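The two fragments above can be combined into one script roughly as follows. This is only a sketch: the function name split_second_field, the sample line, and the print statement are assumptions chosen to make the pieces visible. It relies on split() returning the number of pieces and indexing them from 1.

```shell
# One awk call: "," is the outer separator (FS), ":" is handled by split().
split_second_field() {
    awk '
    BEGIN { FS = "," }                  # outer separator
    {
        n = split($2, names, ":")       # split() returns the number of pieces
        for (i = 1; i <= n; i++)        # walk the pieces in their original order
            print NR, i, names[i]       # record number, position, subfield
    }'
}

printf '%s\n' 'id1,ann:bob:cy,x' | split_second_field
```

For the sample line this prints one line per subfield of the second field: "1 1 ann", "1 2 bob", "1 3 cy".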
A precise answer to your question requires knowing the task you have to accomplish. By the way, this is a general issue for the great awk language!
Thanks guys, but I cannot use your ideas:
colucix: I need the fields inside the big field in order, and using arrays I cannot accomplish this.
druuna: I need to distinguish between the fields separated by ":" and the ones separated by ","
druuna, I don't want to sort the input. Is there a way to use sort with its options and not sort the input?
anomie, here is an example: for the input january,february,june:sunday,saturday,monday the output would be 1,2,6:1,7,2, or something like that.
colucix, I take it back, your idea works: running through the array with for (i in array) does not return the elements in order, but for (i=1; i<=array_size; i++) does. Thanks.
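For anyone landing here later, a sketch of how the month/day example could be done in a single awk call with an ordered loop. The function names names_to_numbers and convert and the full lookup tables are my assumptions; the numbering (sunday = 1, saturday = 7) is taken from the example output in the post above. Names missing from the tables come out as empty strings.

```shell
# ":" separates the two groups; "," separates names inside each group.
names_to_numbers() {
    awk '
    # Convert a comma-separated list of names to numbers, keeping input order.
    # parts, n, i, sep and out are local variables (awk idiom).
    function convert(list, table,    parts, n, i, sep, out) {
        n = split(list, parts, ",")
        for (i = 1; i <= n; i++) {      # index loop preserves the order
            sep = (i > 1) ? "," : ""
            out = out sep table[parts[i]]
        }
        return out
    }
    BEGIN {
        FS = ":"
        # Lookup tables built once; split() indexes the pieces from 1.
        nm = split("january february march april may june july august september october november december", m, " ")
        for (i = 1; i <= nm; i++) month[m[i]] = i
        nd = split("sunday monday tuesday wednesday thursday friday saturday", d, " ")
        for (i = 1; i <= nd; i++) day[d[i]] = i
    }
    { print convert($1, month) ":" convert($2, day) }'
}

printf '%s\n' 'january,february,june:sunday,saturday,monday' | names_to_numbers
```

The index loop matters because awk leaves the traversal order of for (i in array) unspecified, while split() always numbers the pieces 1..n in the order they appear.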