LinuxQuestions.org
Latest LQ Deal: Latest LQ Deals
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 10-31-2012, 12:52 PM   #1
wolverene13
Member
 
Registered: May 2010
Location: Matiland, FL
Distribution: Debian Squeeze
Posts: 57

Rep: Reputation: 0
awk split single column into multiple columns based on RS


Hello,

I have a .csv file full of device configurations, listed in a single column. Each configuration starts with the word "version". I would like to split each configuration off and put them all in another file, one config in each column. I found a solution that puts each configuration in its own file here, using
Code:
awk '{print $0 "version"> "file" NR}' RS='version'  input-file
, but that does not get me to where I need to be, where each config is in a different column in the same file. Can anyone help?

Thanks in advance.
 
Old 10-31-2012, 01:28 PM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983
Please, could you post a sample of the input file? Have the different configurations the same number of items (lines)?
 
Old 10-31-2012, 03:15 PM   #3
wolverene13
Member
 
Registered: May 2010
Location: Matiland, FL
Distribution: Debian Squeeze
Posts: 57

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by colucix View Post
Please, could you post a sample of the input file? Have the different configurations the same number of items (lines)?
I have attached a sample file - there are two configs in there - keep in mind the configs may be various lengths depending on what is needed for that particular device. I have the file in .csv format, but LQ doesn't allow uploading that file type. Just change the extension and it should be the same as mine.
Attached Files
File Type: txt sample-csv.txt (63.6 KB, 153 views)
 
Old 10-31-2012, 04:45 PM   #4
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983
Looking at your sample file, I'm not sure about the desired output. Here is an awk code that takes the various configurations and put them side by side (independently from their length) using a fixed column width (previously computed):
Code:
BEGIN {
  
  RS = "\"    version  \""
  FS = "\n"
  
}

NR > 1 {

  for ( i = 1; i <= NF; i++ )
    i == 1 ? _[i,NR-1] = RS $i : _[i,NR-1] = $i
    
  NF > max_nf ? max_nf = NF : max_nf
    
}

END {

  for ( i = 1; i <= max_nf; i++ ) {
    for ( j = 1; j <= NR-1; j++)
      printf "%-85s", _[i,j]
    printf "\n"
  }
  
}
Moreover I'm not sure about the first two (recurring) lines of your file. Should they appear in the output file?

Finally note that I had to do some conversions to the sample file in order to avoid unpredictable results: I converted it from UTF-16 to UTF-8 encoding and changed the Windows-style line terminators (CR+LF) to Unix-style (LF):
Code:
$ iconv -fUTF-16 -tUTF-8 sample-csv.txt > sample.csv
$ dos2unix sample.csv
Feel free to ask for clarifications about the suggested code.
 
Old 11-01-2012, 07:37 AM   #5
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918Reputation: 918Reputation: 918Reputation: 918Reputation: 918Reputation: 918Reputation: 918Reputation: 918
Quote:
Originally Posted by wolverene13 View Post
I have attached a sample file - there are two configs in there - keep in mind the configs may be various lengths depending on what is needed for that particular device. I have the file in .csv format, but LQ doesn't allow uploading that file type. Just change the extension and it should be the same as mine.
the attached doesnt seem to be in comma-separated-values format ?

i would split the file by each line that contains version then paste each file together. (you mite need to work on space padding each line to the length that you want)
 
Old 11-01-2012, 11:30 AM   #6
wolverene13
Member
 
Registered: May 2010
Location: Matiland, FL
Distribution: Debian Squeeze
Posts: 57

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by schneidz View Post
the attached doesnt seem to be in comma-separated-values format ?

i would split the file by each line that contains version then paste each file together. (you mite need to work on space padding each line to the length that you want)
I played with this some more and talked to the guy who gave me the file in the first place, and he said that he got the information by reading each config file and appending it to a csv spreadsheet, which basically creates a long column of configs. Then I realized that it would be much easier to just put each file in it's own column to begin with. So I guess my question really is:

How do you append file contents to a new column in a an existing csv file (when the number of files is unknown and their lengths are different)?

Example:

|Column A | Column B | Column C
|contents of file 1 | contents of file 2 | contents of file 3
 
Old 11-01-2012, 11:41 AM   #7
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918Reputation: 918Reputation: 918Reputation: 918Reputation: 918Reputation: 918Reputation: 918Reputation: 918
^ what i mentioned above is that your input is just plain unformatted ascii text (not sure why you think it is in csv format: http://en.wikipedia.org/wiki/Comma-separated_values ).

edit: i found this via google for splitting the input:
Code:
awk '{print "version"$0 > "config-"++i".txt"}' RS="version"  largefile.csv
then use the paste command will probably yield you best results.

Last edited by schneidz; 11-01-2012 at 11:58 AM. Reason: added awk bit to do the splitting.
 
Old 11-01-2012, 12:27 PM   #8
wolverene13
Member
 
Registered: May 2010
Location: Matiland, FL
Distribution: Debian Squeeze
Posts: 57

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by schneidz View Post
^ what i mentioned above is that your input is just plain unformatted ascii text (not sure why you think it is in csv format: http://en.wikipedia.org/wiki/Comma-separated_values ).

edit: i found this via google for splitting the input:
Code:
awk '{print "version"$0 > "config-"++i".txt"}' RS="version"  largefile.csv
then use the paste command will probably yield you best results.
I am familiar with CSV files. In my previous post, I indicated that linux Questions will not allow uploading files with .csv as the extension - therefore, I had to change it to a .txt extension, and changing that to .csv on your end should put it back to the way it is on my end. Either way, I'll check the command out and see if it works.
 
Old 11-01-2012, 12:33 PM   #9
Snark1994
Senior Member
 
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
Blog Entries: 3

Rep: Reputation: 346Reputation: 346Reputation: 346Reputation: 346
I agree with schneidz re: not being a .csv file - are you sure you uploaded the right one?

Code:
[joshua:/tmp o]$ iconv -fUTF-16 -tUTF-8 sample-csv.txt > sample.csv
[joshua:/tmp o]$ head sample.csv
# configuration file
exit all
"    version  ""3.01A07"" sw  ""1.801"""
    configure
        system
"            name  ""mes-leap-400-m-ave"""
"            contact  ""123-456-7890"""
"            location  ""leap,400-m-ave, BLCY 633695"""
#           Inventory
            inventory  1
[joshua:/tmp o]$ hexdump -C sample.csv | head                      
00000000  23 20 63 6f 6e 66 69 67  75 72 61 74 69 6f 6e 20  |# configuration |
00000010  66 69 6c 65 0d 0a 65 78  69 74 20 61 6c 6c 0d 0a  |file..exit all..|
00000020  22 20 20 20 20 76 65 72  73 69 6f 6e 20 20 22 22  |"    version  ""|
00000030  33 2e 30 31 41 30 37 22  22 20 73 77 20 20 22 22  |3.01A07"" sw  ""|
00000040  31 2e 38 30 31 22 22 22  0d 0a 20 20 20 20 63 6f  |1.801"""..    co|
00000050  6e 66 69 67 75 72 65 0d  0a 20 20 20 20 20 20 20  |nfigure..       |
00000060  20 73 79 73 74 65 6d 0d  0a 22 20 20 20 20 20 20  | system.."      |
00000070  20 20 20 20 20 20 6e 61  6d 65 20 20 22 22 6d 65  |      name  ""me|
00000080  73 2d 6c 65 61 70 2d 34  30 30 2d 6d 2d 61 76 65  |s-leap-400-m-ave|
00000090  22 22 22 0d 0a 22 20 20  20 20 20 20 20 20 20 20  |""".."          |
[joshua:/tmp o]$
I see no commas...
 
Old 11-01-2012, 12:35 PM   #10
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918Reputation: 918Reputation: 918Reputation: 918Reputation: 918Reputation: 918Reputation: 918Reputation: 918
^ you still misunderstand although its not that important to your issue.

regardless of the filename, the content inside your file is not in csv format (i.e.: the open-office csv import funtion doesnt separate it out into columns correctly because the file isnt separated by comma according to the value of each feild).
 
Old 11-01-2012, 03:17 PM   #11
wolverene13
Member
 
Registered: May 2010
Location: Matiland, FL
Distribution: Debian Squeeze
Posts: 57

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by schneidz View Post
^ you still misunderstand although its not that important to your issue.

regardless of the filename, the content inside your file is not in csv format (i.e.: the open-office csv import funtion doesnt separate it out into columns correctly because the file isnt separated by comma according to the value of each feild).
Ah, correct - I misunderstood what you were saying - Yes, each word or phrase in the file is not separated by a comma as it normally is in a csv file (i.e. hello,everyone,I,am,on,the,Internet), despite having a "csv" file extension. That was done on my coworker's end merely to allow for putting the configs into Excel later on, and to facilitate the use of columns. I didn't bother asking him why this is a requirement, merely because I'm not involved in the project beyond this point.
 
Old 11-01-2012, 05:07 PM   #12
Reuti
Senior Member
 
Registered: Dec 2004
Location: Marburg, Germany
Distribution: openSUSE 15.2
Posts: 1,339

Rep: Reputation: 260Reputation: 260Reputation: 260
csplit can achieve it in a similar way:
Code:
$ csplit sample.csv '%^" *version%' '/^" *version/' '{*}'
With the awk version the quotation marks are no longer balanced.
 
  


Reply

Tags
awk, columns



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
Split single line into multiple lines with 3 column each udiubu Programming 7 11-26-2017 09:41 AM
[SOLVED] Insert column with awk or sed between two columns captainentropy Linux - Newbie 9 11-27-2014 11:49 AM
BASH or AWK: extract columns in multiple files and combine to a single file cristalp Programming 2 03-15-2012 11:55 AM
[SOLVED] How to split single column string into two columns Colorinb Linux - Newbie 2 10-07-2011 09:06 PM
awk multiple column into single column ilukacevic Programming 49 07-19-2010 07:23 PM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 07:34 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration