i have input files where NF may vary, and, fields $7 $9 and $10 may be blank (hence it looks like one big space between $8 and $11). how to handle this in awk?
the reason why NF may vary is because one field in the file may look like this "name=xyz" or "name=joe doe"
Can you give an example of the input you have and output you want?
Are all the fields in double quotes?
I think setting the FS variable in awk intelligently may do most of the work for you.
Most likely (and hopefully) the actual field separator in the input file is TAB. Please post it (or part of it) as requested, using CODE tags to preserve spacing. Thank you.
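One quick way to settle the TAB question before posting is to count the TAB characters on each line. This is only a minimal sketch, assuming a POSIX awk; `count_tabs` is a made-up helper name and the two sample lines are invented, not taken from the real file.

```shell
# count_tabs: print how many TAB characters each input line contains.
# (hypothetical helper name; reads stdin)
count_tabs() {
  awk -F'\t' '{ print (NF ? NF - 1 : 0) }'
}

# Made-up sample lines: the first uses TABs, the second only spaces.
printf 'PEEL\tAPPLE\tPRD\n' | count_tabs    # prints 2
printf 'PEEL APPLE PRD\n'   | count_tabs    # prints 0
```

If every line of the real file reports the same non-zero count, TAB is very likely the separator; all zeros would point to space padding instead.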
ok, here's sample from windows txt file. i didnt look at it in hex yet.
some knowns about the data:
what seems consistent (they always exist, referencing my output below) is $1 $2 $3 $4 $5 $6 $8 $11 $12
...and $11 $12 always start with a *
Code:
11/28/11 06:52:15 PEEL BANNANA PRD 2 F APHSIP *1C*-99 INI NAME=CN 53510 SYS
11/28/11 06:52:15 PEEL ORANGE PRD 2 F APHSIP *1C*-99 INI NAME=CN 53510 IC
11/28/11 06:52:15 PEEL APPLE PRD 2 F APHSIP *1C*-99 INI NAME=CN 53510 NET
11/28/11 08:03:46 PEEL FRUIT PRD 2 F 01 APHSIP *08*-09 INI NAME=joe doe 53510 M058
11/28/11 09:31:17 PEEL GRAPES KRD 2 F 01 APHSIP EXECUTE NONE *08*-88 > DTPI 53510 M071
firstfire did put you on the right path, but your data is not uniform. If we could assume that a space (or anything else, for that matter) were the delimiter, then $5, which you said is always there and which your example shows as either PRD or KRD, would be easy to locate. However, there is no consistent delimiter, and in my opinion this makes it a lot more difficult. I would suggest that you are now left with saying that each field has a specific length (i.e. field 1 is 8 characters long) and trying to split the data based on this principle.
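The fixed-length idea can be sketched with awk's substr(). The column offsets below are hypothetical (the sample in this thread lost its original spacing), so they would have to be measured on the real file; `split_fixed` is a made-up name.

```shell
# split_fixed: cut each line at fixed character positions and print the
# pieces joined with "|". The offsets are HYPOTHETICAL, not measured on
# the real data.
split_fixed() {
  awk '{
    c1 = substr($0, 1, 8)     # e.g. the date, 8 characters
    c2 = substr($0, 10, 8)    # e.g. the time, 8 characters
    c3 = substr($0, 19, 4)    # an optional column; may be all blanks
    c4 = substr($0, 24)       # everything after that
    gsub(/ +$/, "", c3)       # strip padding so a blank column becomes ""
    printf "%s|%s|%s|%s\n", c1, c2, c3, c4
  }'
}

# Made-up fixed-width lines, one with the optional column blank.
printf '%-8s %-8s %-4s %s\n' '11/28/11' '06:52:15' ''   'APHSIP' | split_fixed
printf '%-8s %-8s %-4s %s\n' '11/28/11' '08:03:46' '01' 'APHSIP' | split_fixed
```

Because the columns are taken by position, a blank column simply comes out as an empty string and the number of output fields stays the same on every line.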
yep, this is a pita, certainly a good newb problem to solve, but it's just ascii(hex) and we can always manipulate that, etc. this is the data to work with; the only fields that have a consistent, constant length (if they exist) are $1 $2 $6 $11 and $12 (referencing my output), all others can vary in length, etc.
i'll ask if the txt files can be generated using a better delimiter like a "|" char.
i could sed the data 1st, replacing every \s+ with a single space, but then how to determine which fields are actually missing? i'm just trying to help the crew here turn another human-heavy process into an automated one, really has nothing to do with security. if i can't solve it today i'll likely leave it for someone else, which likely means it will remain a human-heavy process.
Last edited by Linux_Kidd; 04-19-2012 at 01:40 PM.
well, if i sed -i 's/\s/|/g' my data file i get something that may be usable as the new output has constant NF, and data is in predictable field locations.
i should be able to get it to work, need to sed 1st, then awk it. will let you know.
as example, i can do an if statement like "if $10=="" then print $9,$13 else print $10,$11"
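The sed-then-awk idea above might look roughly like this. It is a sketch on a 4-column toy record, so the field numbers are hypothetical and are not the $9/$10/$13 of the real file; `pick_fields` is a made-up name. It uses a literal space in the sed pattern because \s is a GNU extension.

```shell
# pick_fields: turn every single space into "|", then test for an empty field.
# Field numbers are HYPOTHETICAL (toy record with 4 columns).
pick_fields() {
  sed 's/ /|/g' |
  awk -F'|' '{
    if ($3 == "")        # == compares; a single = would assign
      print $2, $4
    else
      print $3, $4
  }'
}

# Toy lines: a full record, and one whose third field is empty
# (two separators in a row).
printf 'A B C D\n' | pick_fields    # prints C D
printf 'A B  D\n'  | pick_fields    # prints B D
```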
The sed is not required, as the result would be the same as the solution provided by firstfire, i.e. using FS = " " is equivalent to what you created with sed.
FS of space is \s+ (is this correct, one or 40 spaces is considered a single FS?)
and this would not produce the same NF as sed did.
sed at least gave me constant NF
some fields seem to be predictable in max size, 8char max. i am off for a few days, will look at it next week. thnx.
Last edited by Linux_Kidd; 04-19-2012 at 03:37 PM.
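On the FS question raised above: in awk the default FS (a single space) is a special case that treats any run of blanks as one separator, while FS set to the regex [ ] treats every single space as a separator. A small demonstration on a made-up line (not the real data):

```shell
# Made-up line with two spaces between B and D, i.e. one empty field.
line='A B  D'

# Default FS (a single space): runs of blanks count as ONE separator.
printf '%s\n' "$line" | awk '{ print NF }'                         # prints 3

# FS as the regex [ ]: every single space is a separator.
printf '%s\n' "$line" | awk -F'[ ]' '{ print NF }'                 # prints 4

# Same count as replacing each space with | first.
printf '%s\n' "$line" | sed 's/ /|/g' | awk -F'|' '{ print NF }'   # prints 4
```

So the default FS does not give the same NF as the sed approach, but -F'[ ]' does, without the extra sed step.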
This is a very clumsy and ugly solution, but you may be able to create a regex that matches the whole line. For example, an expression like this one could work for your input as in post #5:
where I assumed that:
1) $5 is either PRD or KRD
2) $7 is a number or blank
3) $8 is always APHSIP
4) $9 and $10 are EXECUTE and NONE or blank
5) $14 doesn't contain digits
6) $15 is a number or at least starts with a digit
You may need to make the expression more general based on what you know about the input. Unless you can make the input file more regular and predictable, it will be very difficult to find an efficient and reliable solution.
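The expression itself is not shown in the post, so the following is only a sketch written from the six assumptions listed above, not the poster's original. It assumes a sed that supports -E (GNU and BSD sed do) and reads "2 F" as a single field $6, which is a guess from the sample. Since sed has only nine backreferences, $1 through $6 are kept as one group; `to_pipes` and the variable names are made up.

```shell
# to_pipes: rewrite one log line into |-separated fields, leaving the
# optional fields empty when they are blank. HYPOTHETICAL sketch.
to_pipes() {
  lead='(.* [PK]RD +[0-9]+ +[A-Z])'   # $1..$6, ending in PRD/KRD, number, letter
  num='([0-9]*)'                      # $7: a number or blank
  exe='(EXECUTE +NONE)?'              # $9 $10: EXECUTE NONE or blank
  star='(\*[^ ]+)'                    # $11$12: starts with *
  word='([^ ]+)'                      # any single word
  name='([^0-9]*[^0-9 ])'             # $14: no digits, may contain spaces
  id='([0-9][^ ]*)'                   # $15: starts with a digit
  sed -E "s/^$lead +$num *APHSIP +$exe *$star +$word +$name +$id +$word *\$/\1|\2|APHSIP|\3|\4|\5|\6|\7|\8/"
}

# Three lines from the sample in this thread; print NF and the name field.
to_pipes <<'EOF' | awk -F'|' '{ print NF ": " $7 }'
11/28/11 06:52:15 PEEL APPLE PRD 2 F APHSIP *1C*-99 INI NAME=CN 53510 NET
11/28/11 08:03:46 PEEL FRUIT PRD 2 F 01 APHSIP *08*-09 INI NAME=joe doe 53510 M058
11/28/11 09:31:17 PEEL GRAPES KRD 2 F 01 APHSIP EXECUTE NONE *08*-88 > DTPI 53510 M071
EOF
```

Each matching line comes out with nine |-separated fields, so "NAME=joe doe" stays in one field and the blank optional fields show up as empty strings. Lines that do not match the pattern pass through unchanged, which makes them easy to spot by their different NF.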