LinuxQuestions.org
Share your knowledge at the LQ Wiki.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Reply
 
Search this Thread
Old 12-19-2011, 09:23 PM   #1
wolverene13
Member
 
Registered: May 2010
Location: Matiland, FL
Distribution: Debian Squeeze
Posts: 49

Rep: Reputation: 0
awk - rearrange column data in csv file to match columns


Hi,

I have a csv file that I created from two different device lists. One lists the devices in our trouble ticketing system, the other lists devices in our network monitoring application. In some cases, the device is in the ticketing system but not the network monitoring software, in some cases the device is in our network monitoring software, but not our ticketing system, and sometimes it is in both. I put the data in one csv file and I am trying to line up all of the records by IP address. The desired format when viewed with Open Office Calc (or Excel) is as follows (I changed the actual data to make it generic, but I can provide a snippet of the actual file if needed):

Code:
TS= Ticketing System
MS= Monitoring System
In backup | NodeID-TS | Name-TS | In TS     |  In TS but not MS | In Both  |  In MS but not TS | In MS      | Name-TS  | NodeID-MS

Yes         11111       node1    10.10.10.1                       10.10.10.1                     10.10.10.1   node_1     11111
No          22222       node2    10.10.10.2                       10.10.10.2                     10.10.10.2   node_2     22222
Yes         33333       node3    10.10.10.3    10.10.10.3                                                     
                                                                               10.10.10.4        10.10.10.4   node_4     44444
Yes         55555       node5    10.10.10.5                       10.10.10.5                     10.10.10.5   node_5     55555
                                                                               10.10.10.6        10.10.10.6   node_6     66666
No          77777       node7    10.10.10.7    10.10.10.7         
Yes         88888       node8    10.10.10.8    10.10.10.8         
                                                                               10.10.10.9        10.10.10.9   node_9     99999
However, currently the "In MS" column is not lined up with the rest of the document and it looks like this:

Code:
In backup | NodeID-TS | Name-TS | In TS     |  In TS but not MS | In Both  |  In MS but not TS | In MS      | Name-TS  | NodeID-MS

Yes         11111       node1    10.10.10.1                       10.10.10.1                     10.10.10.1   node_1     11111
No          22222       node2    10.10.10.2                       10.10.10.2                     10.10.10.2   node_2     22222
Yes         33333       node3    10.10.10.3    10.10.10.3                      10.10.10.4        10.10.10.4   node_4     44444
                                                                                                 10.10.10.5   node_5     55555
Yes         55555       node5    10.10.10.5                       10.10.10.5   10.10.10.6        10.10.10.6   node_6     66666                  
                                                                               10.10.10.9        10.10.10.9   node_9     99999
No          77777       node7    10.10.10.7    10.10.10.7         
Yes         88888       node8    10.10.10.8    10.10.10.8
Here's what it looks like as a .csv file in a text editor, for sample purposes:

Code:
In backup,NodeID-TS,Name-TS,In TS,In TS but not MS,In Both,In MS but not TS,In MS,Name-TS,NodeID-MS
"Yes",11111,"node1","10.10.10.1",,"10.10.10.1",,"10.10.10.1","node_1",11111
"No",22222,"node2","10.10.10.2",,"10.10.10.2",,"10.10.10.2","node_2",22222
"Yes",33333,"node3","10.10.10.3","10.10.10.3",,"10.10.10.4","10.10.10.4","node_4",44444
,,,,,,,"10.10.10.5","node_5",55555
"Yes",55555,"node5","10.10.10.5",,"10.10.10.5","10.10.10.6","10.10.10.6","node_6",66666
,,,,,,"10.10.10.9","10.10.10.9","node_9",99999
"No",77777,"node7","10.10.10.7","10.10.10.7",,,,,
"Yes",88888,"node8","10.10.10.8","10.10.10.8",,,,,
Basically I need it so that if a node is in both systems, the line pretty much stays the same. If it is in only one system, that IP and associated record information needs to have its own line. The IP's need to be in order so the records are easier to work with. So far, I've tried this:

Code:
#!/bin/bash

cat mes-SHEET_copy.csv | while read line
 do

clmnA=$(echo $line | cut -d"," -f1)
clmnB=$(echo "$line" | cut -d"," -f2)
clmnC=$(echo "$line" | cut -d"," -f3)
clmnD=$(echo "$line" | cut -d"," -f4)
clmnE=$(echo "$line" | cut -d"," -f5)
clmnF=$(echo "$line" | cut -d"," -f6)
clmnG=$(echo "$line" | cut -d"," -f7)
clmnH=$(echo "$line" | cut -d"," -f8)
clmnI=$(echo "$line" | cut -d"," -f9)
clmnJ=$(echo "$line" | cut -d"," -f10)

 if [ $clmnE != $clmnF ]; then
 
 echo "$clmnA,$clmnB,$clmnC,$clmnD,$clmnE,$clmnF,,,,,"
 
 elif [ $clmnF != $clmnG ]; then
 
 echo ",,,,,,$clmnF,$clmnG,$clmnH,$clmnI,$clmnJ,"

 elif [ -n $clmnF ]; then
 
 echo "$clmnA,$clmnB,$clmnC,$clmnD,$clmnE,$clmnF,$clmnG,$clmnH,$clmnI,$clmnJ"

 fi

 done
and this:

Code:
#!/bin/bash

cat mes-SHEET_copy.csv | while read line
 do

clmnF=$(echo "$line" | awk -F"," '{print $6}')
clmnG=$(echo "$line" | awk -F"," '{print $7}')
clmnH=$(echo "$line" | awk -F"," '{print $8}')

 if [ $clmnF=$clmnH ]; then

 echo $line

 else echo $line | awk -F"," '{print $1,$2,$3,$4,$5,$6}'

fi

done
...but it has occurred to me that none of these will work, because it will not "save" the information that was already in columns 7, 8, and 9 and put it on the next line.

Anyone have any ideas? I really don't want to go through 7,000 lines manually. Thanks in advance.

Last edited by wolverene13; 12-20-2011 at 01:08 PM. Reason: Added header fields to csv example
 
Old 12-20-2011, 01:44 AM   #2
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,561

Rep: Reputation: 1939Reputation: 1939Reputation: 1939Reputation: 1939Reputation: 1939Reputation: 1939Reputation: 1939Reputation: 1939Reputation: 1939Reputation: 1939Reputation: 1939
Please include the header information (ie what does each field represent) for the csv as this will help work out how to arrive at your required output.
 
Old 12-20-2011, 01:53 AM   #3
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 638

Rep: Reputation: 373Reputation: 373Reputation: 373Reputation: 373
Hi.

Try the following awk script
Code:
BEGIN{
	FS=","
	OFS=","
	in_ts	= 4
	in_ms	= 8

	ts_only	= 5
	both    = 6
	ms_only	= 7

	max	= 10
}
$in_ts { ts[$in_ts]=$0 }
$in_ms { ms[$in_ms]=$0 }

END{
	for(l in ts){
		$0 = ts[l]
		$both = ""
		$ts_only = ""
		$ms_only = ""
		if(l in ms){
			$both = l

			split(ms[l], tmp)
			for(i=in_ms; i<=max; i++)
				$i = tmp[i]
		}
		else {
		       	$ts_only = l
			for(i=ts_only+1; i<=max; i++)
				$i = ""
		}

		result[l] = $0
		delete ms[l]
	}

	# Now ms contain "ms_only" records
	for(l in ms) {
		$0 = ms[l]
		$ms_only = l
		for(i=1; i<ms_only; i++)
			$i = ""
		result[l] = $0
	}

	for(l in result)
		print result[l]
}
Interesting parts here (relevant quotes from `man awk'):
1)
Quote:
Fields need not be referenced by constants:

n = 5
print $n

prints the fifth field in the input record.
2)
Quote:
References to non-existent fields (i.e. fields after $NF) produce the null-string. However, assigning to a
non-existent field (e.g., $(NF+2) = 5) increases the value of NF, creates any intervening fields with the null
string as their value, and causes the value of $0 to be recomputed, with the fields being separated by the
value of OFS. References to negative numbered fields cause a fatal error. Decrementing NF causes the values
of fields past the new value to be lost, and the value of $0 to be recomputed, with the fields being separated
by the value of OFS.

Assigning a value to an existing field causes the whole record to be rebuilt when $0 is referenced. Similarly,
assigning a value to $0 causes the record to be resplit, creating new values for the fields.
You can use `asorti()' to sort `result' by IP address if you need to.

Let me know if you need further explanation.
Hope this helps.

Last edited by firstfire; 12-20-2011 at 01:53 AM. Reason: Typo fixed.
 
Old 12-20-2011, 01:16 PM   #4
wolverene13
Member
 
Registered: May 2010
Location: Matiland, FL
Distribution: Debian Squeeze
Posts: 49

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by firstfire View Post
Hi.

Try the following awk script
Code:
BEGIN{
	FS=","
	OFS=","
	in_ts	= 4
	in_ms	= 8

	ts_only	= 5
	both    = 6
	ms_only	= 7

	max	= 10
}
$in_ts { ts[$in_ts]=$0 }
$in_ms { ms[$in_ms]=$0 }

END{
	for(l in ts){
		$0 = ts[l]
		$both = ""
		$ts_only = ""
		$ms_only = ""
		if(l in ms){
			$both = l

			split(ms[l], tmp)
			for(i=in_ms; i<=max; i++)
				$i = tmp[i]
		}
		else {
		       	$ts_only = l
			for(i=ts_only+1; i<=max; i++)
				$i = ""
		}

		result[l] = $0
		delete ms[l]
	}

	# Now ms contain "ms_only" records
	for(l in ms) {
		$0 = ms[l]
		$ms_only = l
		for(i=1; i<ms_only; i++)
			$i = ""
		result[l] = $0
	}

	for(l in result)
		print result[l]
}
Interesting parts here (relevant quotes from `man awk'):
1)


2)


You can use `asorti()' to sort `result' by IP address if you need to.

Let me know if you need further explanation.
Hope this helps.
One thing -- how do I execute this? is it ./script < file.csv ?
 
Old 12-20-2011, 01:17 PM   #5
wolverene13
Member
 
Registered: May 2010
Location: Matiland, FL
Distribution: Debian Squeeze
Posts: 49

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by grail View Post
Please include the header information (ie what does each field represent) for the csv as this will help work out how to arrive at your required output.
grail - I have the headers in the "excel-like" examples, but forgot to put them in the csv-style example. I added them just now. Thanks.
 
Old 12-20-2011, 01:24 PM   #6
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 638

Rep: Reputation: 373Reputation: 373Reputation: 373Reputation: 373
Quote:
Originally Posted by wolverene13 View Post
One thing -- how do I execute this? is it ./script < file.csv ?
Good question!
Code:
awk -f script.awk < file.csv > out.csv
"<" is optional.

P.S. to skip the header add the line (in bold)
Code:
...
        max     = 10
}
FNR == 1 {print; next}
$in_ts { ts[$in_ts]=$0 }
...

Last edited by firstfire; 12-20-2011 at 01:28 PM.
 
Old 12-20-2011, 02:15 PM   #7
wolverene13
Member
 
Registered: May 2010
Location: Matiland, FL
Distribution: Debian Squeeze
Posts: 49

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by firstfire View Post
Good question!
Code:
awk -f script.awk < file.csv > out.csv
"<" is optional.

P.S. to skip the header add the line (in bold)
Code:
...
        max     = 10
}
FNR == 1 {print; next}
$in_ts { ts[$in_ts]=$0 }
...
I'm getting a syntax error near line 13 for some reason:

Code:
awk -f awk_test_script.sh < mes-SHEET.csv > mes-out.csv
awk: syntax error near line 13
awk: bailing out near line 13
 
Old 12-20-2011, 11:04 PM   #8
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 638

Rep: Reputation: 373Reputation: 373Reputation: 373Reputation: 373
Quote:
Originally Posted by wolverene13 View Post
I'm getting a syntax error near line 13 for some reason:

Code:
awk -f awk_test_script.sh < mes-SHEET.csv > mes-out.csv
awk: syntax error near line 13
awk: bailing out near line 13
It looks like your awk is old (according to this link). Try using nawk or gawk instead. On my machine script works fine
Code:
$ awk -f rearrange.awk test.csv
In backup,NodeID-TS,Name-TS,In TS,In TS but not MS,In Both,In MS but not TS,In MS,Name-TS,NodeID-MS
"Yes",11111,"node1","10.10.10.1",,"10.10.10.1",,"10.10.10.1","node_1",11111
"No",22222,"node2","10.10.10.2",,"10.10.10.2",,"10.10.10.2","node_2",22222
"Yes",33333,"node3","10.10.10.3","10.10.10.3",,,,,
,,,,,,"10.10.10.4","10.10.10.4","node_4",44444
"Yes",55555,"node5","10.10.10.5",,"10.10.10.5",,"10.10.10.5","node_5",55555
,,,,,,"10.10.10.6","10.10.10.6","node_6",66666
"No",77777,"node7","10.10.10.7","10.10.10.7",,,,,
"Yes",88888,"node8","10.10.10.8","10.10.10.8",,,,,
,,,,,,"10.10.10.9","10.10.10.9","node_9",99999
My awk vesion is GNU Awk 3.1.8.

Last edited by firstfire; 12-20-2011 at 11:18 PM.
 
Old 12-21-2011, 12:44 AM   #9
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,561

Rep: Reputation: 1939Reputation: 1939Reputation: 1939Reputation: 1939Reputation: 1939Reputation: 1939Reputation: 1939Reputation: 1939Reputation: 1939Reputation: 1939Reputation: 1939
Here is another direction for the same solution:
Code:
#!/usr/bin/awk -f

BEGIN{
        FS = ","
        front = "%-10s%-10s%-10s%-15s%-17s"
        back = "%-17s%-15s%-10s%-10s\n"
        format = front"%-15s"back
}

{   gsub(/"/,"")    }

NR == 1 || $2 == $10{ printf format,$1,$2,$3,$4,$5,$6,$7,$8,$9,$10;next }

{
    printf front,$1,$2,$3,$4,$5

    if( end[x] && ( ! $2 || $2 == x ) ){
        mid = " "
        if( $2 == x )
                mid = $4

        printf "%-15s%s",mid,end[x]
        delete end
    }
    else
        printf "\n"

    if( ! end[x] ){
        end[$10] = sprintf(back,$7,$8,$9,$10)
        x = $10
    }
}

END{
    if( end[x] )
        printf "%-77s%s"," ",end[x]
}
Simply run as:
Code:
./script.awk file.csv
 
Old 12-21-2011, 04:55 AM   #10
wolverene13
Member
 
Registered: May 2010
Location: Matiland, FL
Distribution: Debian Squeeze
Posts: 49

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by firstfire View Post
It looks like your awk is old (according to this link). Try using nawk or gawk instead. On my machine script works fine
Code:
$ awk -f rearrange.awk test.csv
In backup,NodeID-TS,Name-TS,In TS,In TS but not MS,In Both,In MS but not TS,In MS,Name-TS,NodeID-MS
"Yes",11111,"node1","10.10.10.1",,"10.10.10.1",,"10.10.10.1","node_1",11111
"No",22222,"node2","10.10.10.2",,"10.10.10.2",,"10.10.10.2","node_2",22222
"Yes",33333,"node3","10.10.10.3","10.10.10.3",,,,,
,,,,,,"10.10.10.4","10.10.10.4","node_4",44444
"Yes",55555,"node5","10.10.10.5",,"10.10.10.5",,"10.10.10.5","node_5",55555
,,,,,,"10.10.10.6","10.10.10.6","node_6",66666
"No",77777,"node7","10.10.10.7","10.10.10.7",,,,,
"Yes",88888,"node8","10.10.10.8","10.10.10.8",,,,,
,,,,,,"10.10.10.9","10.10.10.9","node_9",99999
My awk vesion is GNU Awk 3.1.8.
Yes, that's the problem -- at work we are using Solaris, and it doesn't work there, but it looks like it works on my Debian machine at home without a problem. I'll run it for real and let you know if it worked.
 
  


Reply

Tags
awk, columns, sorting


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Map 1 CSV's columns to matching columns in another CSV 2legit2quit Programming 7 10-27-2011 08:53 AM
[SOLVED] Match codes in first column in 2 files and return data in other columns cgcamal Programming 12 09-30-2011 01:03 AM
[SOLVED] Insert column with awk or sed between two columns captainentropy Linux - Newbie 8 01-19-2011 11:03 PM
Rearrange the column in CSV file.. govi1234 Linux - Newbie 9 08-27-2010 07:15 PM
how to rearrange a text file with awk keenboy Programming 9 11-04-2009 07:07 AM


All times are GMT -5. The time now is 07:49 AM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration