I'm completely stumped on how to go about solving this. I produce two data sets by running the commands below. Each data set will have between 100,000 and 9 million entries. I need to match a value from one data set against the other and return a time difference along with all fields of the matching line from the first data set, excluding the 2nd field.
I've read about using getline, but that seems to be meant for when you have two files, and my data is not in two files. Any help is much appreciated.
The first data set is obtained by running this:
Code:
unzip -c 'logentries.zip' | awk 'BEGIN { FS = "\t" } ; { print $1 , $8 , $7 }' | awk '{ print $1 , $2 , $6 , $15 , $16 }' | sed 's/,//g' | sed 's/ID=//g'
2011-12-30 12:26:52716 474E6FEE-C539-C78A-C00E-C2C982FD95ED 1234 S2C
2011-12-30 12:26:52730 5741F23C-65B9-6048-4A3C-55CCEDC0A77E 1234 S2C
2011-12-30 12:26:52737 28003DAF-E6AB-8C29-D4F3-56106ABD1A3B 1234 S2C
2011-12-30 12:26:52738 B6CB977A-2E0C-FE11-021D-9D581A155D54 1234 S2C
2011-12-30 12:26:52739 9AD1B1CD-1A1E-B9C3-AC63-CE205FB1CCFB 1234 S2C
2011-12-30 12:26:52741 706CEA6D-0B95-C8DD-ED24-3B962F958BD7 1234 S2C
2011-12-30 12:26:52747 58D7740A-120F-5EB4-E668-4B22AFDC4346 1234 S2C
2011-12-30 12:26:52773 11647C57-57E7-AD4F-AB00-91EF291C7B1C 1234 S2C
2011-12-30 12:26:52785 C643A71C-3BAE-FE64-222D-5C79C0FF4811 1234 S2C
2011-12-30 12:26:52792 03373FFE-C665-7360-8650-BAD960579CC2 1234 S2C
The second data set is obtained by running this:
Code:
awk 'BEGIN { FS = "\t" } ; { print $3 , $9 }' logmessages.csv
2012-11-01 13:55:19.0 F784A6CD-27E0-C627-A1CF-D58829F6405E
2012-11-01 14:00:47.0 7AAD2091-B674-0C2A-E5AB-F5A914B7664A
2012-11-01 11:46:38.0 96E1F242-F74D-6843-9BC4-C596C467347A
2012-11-01 14:00:48.0 09345EBA-ADEB-B6EA-E68C-3F73E58D8D2D
2012-11-01 11:46:35.0 E4DDBB06-4EA1-24AD-EE9D-16AB4F5E18BD
2012-11-01 11:59:41.0 93665C67-C218-4B67-5CDD-CE3781FBC0F8
2012-11-01 14:00:47.0 723A3D9B-D7F3-1A65-5B54-E1A13A0D42AD
2012-11-01 11:59:44.0 6765868F-6794-8F0B-5A12-A9B72B92527E
2012-11-01 11:59:38.0 A656E7C3-C516-9ADB-8DFF-F85E652B7B30
2012-11-01 14:00:41.0 6A477119-E966-6B9E-9F44-1C03CF60DFD3
2012-11-01 11:46:38.0 A212C967-45FE-8247-B17E-F0DA1D3136C5
2011-12-30 12:25:52.0 03373FFE-C665-7360-8650-BAD960579CC2
Desired output:
2011-12-30,00:01:00.0,03373FFE-C665-7360-8650-BAD960579CC2,1234 S2C
I matched on 03373FFE-C665-7360-8650-BAD960579CC2 and returned:
date,time difference between the two data sets on match,ID,[0-9]{4} [0-9A-Z]{3}
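One possible way to do the match described above, offered only as a rough sketch: load the second data set into an awk array keyed by ID, then stream the first data set and print a line for every ID found in the array. The function name `match_ids` and the helper `to_ms` are made up for illustration. The sketch assumes that matching rows fall on the same date (only the time of day is compared), that the time field in the first data set is HH:MM:SS followed directly by three millisecond digits, and that the lookup table fits in memory. It prints the full millisecond difference, so the sample row would show 00:01:00.792 rather than the 00:01:00.0 given above.

```shell
# Sketch only: join two whitespace-separated data sets on the ID field (POSIX awk).
# Assumptions, not from the original post: matching rows share the same date,
# and seconds are always two digits, optionally followed by ".f" or fused millis.
match_ids() {
    # $1 = second data set (date time ID)
    # $2 = first data set  (date time ID code type)
    awk '
    # Convert HH:MM:SS[.f] or HH:MM:SSmmm to integer milliseconds since midnight.
    function to_ms(t,    p) {
        split(t, p, ":")
        sub(/\./, "", p[3])   # "52.0" becomes "520", "52716" is left unchanged
        return ((p[1] * 60 + p[2]) * 60 + substr(p[3], 1, 2)) * 1000 + substr(p[3] "000", 3, 3)
    }
    # First input: remember the time of every ID in the lookup data set.
    NR == FNR { seen[$3] = to_ms($2); next }
    # Second input: print date, absolute time difference, ID and trailing fields.
    $3 in seen {
        d = to_ms($2) - seen[$3]
        if (d < 0) d = -d
        printf "%s,%02d:%02d:%02d.%03d,%s,%s %s\n", $1, int(d / 3600000), int(d / 60000) % 60, int(d / 1000) % 60, d % 1000, $3, $4, $5
    }' "$1" "$2"
}
```

Since the data is not in two files, in bash the two pipelines can be passed with process substitution, for example `match_ids <(second pipeline) <(first pipeline)`; awk then sees two separate inputs, so the `NR == FNR` test still works. If the matching rows can fall on different dates, the date would have to be folded into the difference as well, for instance with gawk's `mktime`, which is not available in every awk.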