Multiline and complex text input: bash/awk/perl help
Hi. The horizontal space here is not enough, so I attached the details and a sample input file instead. Basically, it is a file from which I need to pick data for import into Excel. Perl is welcome, but I prefer bash/awk, where I have basic knowledge; I know no Perl at all. Thanks for lending a hand.
1.
Code:
file folio2.txt
You'll want the reverse (dos2unix) function when you put the file back on MS.
2. 'Account' almost works, but sometimes the same number appears twice, e.g. 277557771, and in between you get lines like '0.00 will be billed to: Account 1273961'.
3. That's a fairly free-flowing format; I'm guessing not all records have an identical(!) layout. Personally I would use Perl, but that's my weapon of choice for this sort of thing, and it is something Perl is very good at. If you do go with Perl:
http://perldoc.perl.org/
http://www.tizag.com/perlT/index.php
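For item 1, here is a minimal sketch of checking for and converting DOS line endings with plain awk, in case `dos2unix` is not installed. The helper names `has_crlf`, `strip_cr`, and `add_cr` are made up for illustration and do not come from this thread; each reads standard input.

```shell
# Hypothetical helpers; names are illustrative, not from the thread.

# Succeeds (exit status 0) if any input line ends in a carriage return.
has_crlf() {
    awk '/\r$/ { found = 1; exit } END { exit !found }'
}

# Unix-style copy: drop the trailing CR from every line.
strip_cr() {
    awk '{ sub(/\r$/, ""); print }'
}

# DOS-style copy: end every line with CR LF before sending it back to Windows.
add_cr() {
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }'
}

# Example: strip_cr < folio2.txt > folio2.unix.txt
```

Working on a converted copy keeps the original file untouched, so the DOS version can still be handed back unchanged if nothing needs editing.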
Quote:
2.) It is not "Account" but "Account:" that I tested as the record separator, and it seemed to work.
3.) You are right, it is very free-flowing; I don't even know where to start. I would accept a Perl solution if you could write one, but I might not be able to do even the simplest maintenance on it if needed. Thanks again.
With Perl or awk, I am not sure I see patterns for how to get some of your data.
Perhaps you could show us, from a single account, what would make a line unique enough that the required information could be extracted? Also, I would leave RS at its default and treat each "Account:" line as a reset point for the variables storing the data; that way you do not have to resort to a loop to cycle over the fields. Just a thought.
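The reset-point idea might look roughly like the sketch below. The input layout is an assumption, since the attached sample is not reproduced in the thread: it assumes a record starts on a line beginning with "Account:" followed by the number, and that a "Balance Due" line carries the amount as its last field. The function name `per_account` and the two-column output are also hypothetical.

```shell
# Sketch only: the field layout is assumed, not taken from folio2.txt.
per_account() {
    awk '
        # Print the record collected so far, if there is one.
        function emit() {
            if (acct != "") print acct "," balance
        }
        # A line starting with "Account:" begins a new record: emit and reset.
        /^Account:/ {
            emit()
            acct = $2
            balance = ""
            next
        }
        # Assumed layout: "Balance Due" with the amount as the last field.
        /Balance Due/ { balance = $NF }
        END { emit() }
    '
}

# Example: strip the CRs first, then run: per_account < folio2.unix.txt
```

Because the pattern is anchored with `^` and includes the colon, a line such as '0.00 will be billed to: Account 1273961' does not start a new record.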
btacuso,
Using awk, it would seem possible to take more or less a "state machine" approach to your task. With your sample input data and this program:
Code:
BEGIN {
Code:
lname,fname booking invoice arrival departure rate tax balance
From my awk program, you can see that the carriage-return characters just before the end of each line did need to be handled, at least under Linux, and this is a Linux forum.
The program is not complete. I had so many questions that it seemed easier to present code that might raise them effectively than to try to describe them all.
I didn't format dates. I would guess that the output column for the names needs to be wider, or the names truncated, since names can be rather longer than the space allowed by the heading in your sample report output.
I seemingly found various values missing from the input data, if I'm interpreting the file correctly. Having what appear to be labels for values following the values, two lines after each value, seems a rather unusual file format.
I took the final dollar value after the phrase "Balance Due" within the "Folio Summary" to be the value for "Balance Due". It seemed to be missing in at least one case, though. Having the label for "Balance Due" before the value also seems an odd departure from having the label after the value, as elsewhere in the file.
It seemed that values for some of the taxes might be missing. If that's possible, you might want to consider outputting some error indication for that situation too. Also, if anything else can be missing, the kludge/assumption I made, that the dollar values for taxes come in a certain order, can easily be wrong. Perhaps other expectations the program illustrates could also be wrong. It may be necessary to grab a dollar value, save it, then explicitly look for a following label, where the label is expected to follow the value.
If the output is sufficiently close to how you expected the file to be processed, maybe we can help you adjust the program to be exactly what you need. If so, maybe you could give us some additional details on how the file is to be processed.
Hope this helps.
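The "grab a dollar value, save it, then look for a following label" idea could be sketched as below. This is not the program from the post above, which is not reproduced here. The labels "Room Tax" and "City Tax", the function name `value_then_label`, and the assumption that an amount sits alone on its line are placeholders for illustration only.

```shell
# Sketch only: labels "Room Tax" and "City Tax" are hypothetical placeholders.
value_then_label() {
    awk '
        { sub(/\r$/, "") }                      # tolerate DOS line endings
        # A line holding only a dollar amount: remember it for a later label.
        /^[ \t]*[0-9]+\.[0-9][0-9][ \t]*$/ {
            pending = $1
            next
        }
        # A known label claims the pending amount, or reports it missing.
        /^[ \t]*(Room Tax|City Tax)[ \t]*$/ {
            label = $0
            gsub(/^[ \t]+|[ \t]+$/, "", label)
            print label "=" (pending != "" ? pending : "MISSING")
            pending = ""
        }
    '
}

# Example: value_then_label < folio2.txt
```

Pairing each amount with the label that follows it avoids depending on the taxes arriving in a fixed order, and a label with no preceding amount is reported as MISSING instead of silently taking the wrong value.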
Quote:
Thanks for the immediate response. I hope my problem gets fixed with your help and everybody else's.