awk - rearrange column data in csv file to match columns
ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
awk - rearrange column data in csv file to match columns
Hi,
I have a csv file that I created from two different device lists. One lists the devices in our trouble ticketing system, the other lists devices in our network monitoring application. In some cases, the device is in the ticketing system but not the network monitoring software, in some cases the device is in our network monitoring software, but not our ticketing system, and sometimes it is in both. I put the data in one csv file and I am trying to line up all of the records by IP address. The desired format when viewed with Open Office Calc (or Excel) is as follows (I changed the actual data to make it generic, but I can provide a snippet of the actual file if needed):
Code:
TS= Ticketing System
MS= Monitoring System
In backup | NodeID-TS | Name-TS | In TS | In TS but not MS | In Both | In MS but not TS | In MS | Name-TS | NodeID-MS
Yes 11111 node1 10.10.10.1 10.10.10.1 10.10.10.1 node_1 11111
No 22222 node2 10.10.10.2 10.10.10.2 10.10.10.2 node_2 22222
Yes 33333 node3 10.10.10.3 10.10.10.3
10.10.10.4 10.10.10.4 node_4 44444
Yes 55555 node5 10.10.10.5 10.10.10.5 10.10.10.5 node_5 55555
10.10.10.6 10.10.10.6 node_6 66666
No 77777 node7 10.10.10.7 10.10.10.7
Yes 88888 node8 10.10.10.8 10.10.10.8
10.10.10.9 10.10.10.9 node_9 99999
However, currently the "In MS" column is not lined up with the rest of the document and it looks like this:
Code:
In backup | NodeID-TS | Name-TS | In TS | In TS but not MS | In Both | In MS but not TS | In MS | Name-TS | NodeID-MS
Yes 11111 node1 10.10.10.1 10.10.10.1 10.10.10.1 node_1 11111
No 22222 node2 10.10.10.2 10.10.10.2 10.10.10.2 node_2 22222
Yes 33333 node3 10.10.10.3 10.10.10.3 10.10.10.4 10.10.10.4 node_4 44444
10.10.10.5 node_5 55555
Yes 55555 node5 10.10.10.5 10.10.10.5 10.10.10.6 10.10.10.6 node_6 66666
10.10.10.9 10.10.10.9 node_9 99999
No 77777 node7 10.10.10.7 10.10.10.7
Yes 88888 node8 10.10.10.8 10.10.10.8
Here's what it looks like as a .csv file in a text editor, for sample purposes:
Code:
In backup,NodeID-TS,Name-TS,In TS,In TS but not MS,In Both,In MS but not TS,In MS,Name-TS,NodeID-MS
"Yes",11111,"node1","10.10.10.1",,"10.10.10.1",,"10.10.10.1","node_1",11111
"No",22222,"node2","10.10.10.2",,"10.10.10.2",,"10.10.10.2","node_2",22222
"Yes",33333,"node3","10.10.10.3","10.10.10.3",,"10.10.10.4","10.10.10.4","node_4",44444
,,,,,,,"10.10.10.5","node_5",55555
"Yes",55555,"node5","10.10.10.5",,"10.10.10.5","10.10.10.6","10.10.10.6","node_6",66666
,,,,,,"10.10.10.9","10.10.10.9","node_9",99999
"No",77777,"node7","10.10.10.7","10.10.10.7",,,,,
"Yes",88888,"node8","10.10.10.8","10.10.10.8",,,,,
Basically I need it so that if a node is in both systems, the line pretty much stays the same. If it is in only one system, that IP and associated record information needs to have its own line. The IP's need to be in order so the records are easier to work with. So far, I've tried this:
#!/bin/bash
cat mes-SHEET_copy.csv | while read line
do
clmnF=$(echo "$line" | awk -F"," '{print $6}')
clmnG=$(echo "$line" | awk -F"," '{print $7}')
clmnH=$(echo "$line" | awk -F"," '{print $8}')
if [ $clmnF=$clmnH ]; then
echo $line
else echo $line | awk -F"," '{print $1,$2,$3,$4,$5,$6}'
fi
done
...but it has occurred to me that none of these will work, because it will not "save" the information that was already in columns 7, 8, and 9 and put it on the next line.
Anyone have any ideas? I really don't want to go through 7,000 lines manually. Thanks in advance.
Last edited by wolverene13; 12-20-2011 at 01:08 PM.
Reason: Added header fields to csv example
Please include the header information (ie what does each field represent) for the csv as this will help work out how to arrive at your required output.
BEGIN{
FS=","
OFS=","
in_ts = 4
in_ms = 8
ts_only = 5
both = 6
ms_only = 7
max = 10
}
$in_ts { ts[$in_ts]=$0 }
$in_ms { ms[$in_ms]=$0 }
END{
for(l in ts){
$0 = ts[l]
$both = ""
$ts_only = ""
$ms_only = ""
if(l in ms){
$both = l
split(ms[l], tmp)
for(i=in_ms; i<=max; i++)
$i = tmp[i]
}
else {
$ts_only = l
for(i=ts_only+1; i<=max; i++)
$i = ""
}
result[l] = $0
delete ms[l]
}
# Now ms contain "ms_only" records
for(l in ms) {
$0 = ms[l]
$ms_only = l
for(i=1; i<ms_only; i++)
$i = ""
result[l] = $0
}
for(l in result)
print result[l]
}
Interesting parts here (relevant quotes from `man awk'):
1)
Quote:
Fields need not be referenced by constants:
n = 5
print $n
prints the fifth field in the input record.
2)
Quote:
References to non-existent fields (i.e. fields after $NF) produce the null-string. However, assigning to a
non-existent field (e.g., $(NF+2) = 5) increases the value of NF, creates any intervening fields with the null
string as their value, and causes the value of $0 to be recomputed, with the fields being separated by the
value of OFS. References to negative numbered fields cause a fatal error. Decrementing NF causes the values
of fields past the new value to be lost, and the value of $0 to be recomputed, with the fields being separated
by the value of OFS.
Assigning a value to an existing field causes the whole record to be rebuilt when $0 is referenced. Similarly,
assigning a value to $0 causes the record to be resplit, creating new values for the fields.
You can use `asorti()' to sort `result' by IP address if you need to.
Let me know if you need further explanation.
Hope this helps.
Last edited by firstfire; 12-20-2011 at 01:53 AM.
Reason: Typo fixed.
Please include the header information (ie what does each field represent) for the csv as this will help work out how to arrive at your required output.
grail - I have the headers in the "excel-like" examples, but forgot to put them in the csv-style example. I added them just now. Thanks.
I'm getting a syntax error near line 13 for some reason:
Code:
awk -f awk_test_script.sh < mes-SHEET.csv > mes-out.csv
awk: syntax error near line 13
awk: bailing out near line 13
It looks like your awk is old (according to this link). Try using nawk or gawk instead. On my machine script works fine
Code:
$ awk -f rearrange.awk test.csv
In backup,NodeID-TS,Name-TS,In TS,In TS but not MS,In Both,In MS but not TS,In MS,Name-TS,NodeID-MS
"Yes",11111,"node1","10.10.10.1",,"10.10.10.1",,"10.10.10.1","node_1",11111
"No",22222,"node2","10.10.10.2",,"10.10.10.2",,"10.10.10.2","node_2",22222
"Yes",33333,"node3","10.10.10.3","10.10.10.3",,,,,
,,,,,,"10.10.10.4","10.10.10.4","node_4",44444
"Yes",55555,"node5","10.10.10.5",,"10.10.10.5",,"10.10.10.5","node_5",55555
,,,,,,"10.10.10.6","10.10.10.6","node_6",66666
"No",77777,"node7","10.10.10.7","10.10.10.7",,,,,
"Yes",88888,"node8","10.10.10.8","10.10.10.8",,,,,
,,,,,,"10.10.10.9","10.10.10.9","node_9",99999
It looks like your awk is old (according to this link). Try using nawk or gawk instead. On my machine script works fine
Code:
$ awk -f rearrange.awk test.csv
In backup,NodeID-TS,Name-TS,In TS,In TS but not MS,In Both,In MS but not TS,In MS,Name-TS,NodeID-MS
"Yes",11111,"node1","10.10.10.1",,"10.10.10.1",,"10.10.10.1","node_1",11111
"No",22222,"node2","10.10.10.2",,"10.10.10.2",,"10.10.10.2","node_2",22222
"Yes",33333,"node3","10.10.10.3","10.10.10.3",,,,,
,,,,,,"10.10.10.4","10.10.10.4","node_4",44444
"Yes",55555,"node5","10.10.10.5",,"10.10.10.5",,"10.10.10.5","node_5",55555
,,,,,,"10.10.10.6","10.10.10.6","node_6",66666
"No",77777,"node7","10.10.10.7","10.10.10.7",,,,,
"Yes",88888,"node8","10.10.10.8","10.10.10.8",,,,,
,,,,,,"10.10.10.9","10.10.10.9","node_9",99999
My awk vesion is GNU Awk 3.1.8.
Yes, that's the problem -- at work we are using Solaris, and it doesn't work there, but it looks like it works on my Debian machine at home without a problem. I'll run it for real and let you know if it worked.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.