Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game. |
| Notices |
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
Are you new to LinuxQuestions.org? Visit the following links:
Site Howto |
Site FAQ |
Sitemap |
Register Now
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
 |
GNU/Linux Basic Guide
This 255-page guide will provide you with the keys to understand the philosophy of free software, teach you how to use and handle it, and give you the tools required to move easily in the world of GNU/Linux. Many users and administrators will be taking their first steps with this GNU/Linux Basic guide and it will show you how to approach and solve the problems you encounter.
Click Here to receive this Complete Guide absolutely free. |
|
 |
10-31-2012, 12:52 PM
|
#1
|
|
Member
Registered: May 2010
Location: Matiland, FL
Distribution: Debian Squeeze
Posts: 47
Rep:
|
awk split single column into multiple columns based on RS
Hello,
I have a .csv file full of device configurations, listed in a single column. Each configuration starts with the word "version". I would like to split each configuration off and put them all in another file, one config in each column. I found a solution that puts each configuration in its own file here, using
Code:
awk '{print $0 "version"> "file" NR}' RS='version' input-file
, but that does not get me to where I need to be, where each config is in a different column in the same file. Can anyone help?
Thanks in advance.
|
|
|
|
10-31-2012, 01:28 PM
|
#2
|
|
Moderator
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.4 OpenSuSE 12.2
Posts: 9,896
|
Please, could you post a sample of the input file? Have the different configurations the same number of items (lines)?
|
|
|
|
10-31-2012, 03:15 PM
|
#3
|
|
Member
Registered: May 2010
Location: Matiland, FL
Distribution: Debian Squeeze
Posts: 47
Original Poster
Rep:
|
Quote:
Originally Posted by colucix
Please, could you post a sample of the input file? Have the different configurations the same number of items (lines)?
|
I have attached a sample file - there are two configs in there - keep in mind the configs may be various lengths depending on what is needed for that particular device. I have the file in .csv format, but LQ doesn't allow uploading that file type. Just change the extension and it should be the same as mine.
|
|
|
|
10-31-2012, 04:45 PM
|
#4
|
|
Moderator
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.4 OpenSuSE 12.2
Posts: 9,896
|
Looking at your sample file, I'm not sure about the desired output. Here is an awk code that takes the various configurations and put them side by side (independently from their length) using a fixed column width (previously computed):
Code:
BEGIN {
RS = "\" version \""
FS = "\n"
}
NR > 1 {
for ( i = 1; i <= NF; i++ )
i == 1 ? _[i,NR-1] = RS $i : _[i,NR-1] = $i
NF > max_nf ? max_nf = NF : max_nf
}
END {
for ( i = 1; i <= max_nf; i++ ) {
for ( j = 1; j <= NR-1; j++)
printf "%-85s", _[i,j]
printf "\n"
}
}
Moreover I'm not sure about the first two (recurring) lines of your file. Should they appear in the output file?
Finally note that I had to do some conversions to the sample file in order to avoid unpredictable results: I converted it from UTF-16 to UTF-8 encoding and changed the Windows-style line terminators (CR+LF) to Unix-style (LF):
Code:
$ iconv -fUTF-16 -tUTF-8 sample-csv.txt > sample.csv
$ dos2unix sample.csv
Feel free to ask for clarifications about the suggested code.
|
|
|
|
11-01-2012, 07:37 AM
|
#5
|
|
Senior Member
Registered: May 2005
Location: boston, usa
Distribution: fc-12/ fc-11-live-usb/ aix
Posts: 2,668
|
Quote:
Originally Posted by wolverene13
I have attached a sample file - there are two configs in there - keep in mind the configs may be various lengths depending on what is needed for that particular device. I have the file in .csv format, but LQ doesn't allow uploading that file type. Just change the extension and it should be the same as mine.
|
the attached doesnt seem to be in comma-separated-values format ?
i would split the file by each line that contains version then paste each file together. (you mite need to work on space padding each line to the length that you want)
|
|
|
|
11-01-2012, 11:30 AM
|
#6
|
|
Member
Registered: May 2010
Location: Matiland, FL
Distribution: Debian Squeeze
Posts: 47
Original Poster
Rep:
|
Quote:
Originally Posted by schneidz
the attached doesnt seem to be in comma-separated-values format ?
i would split the file by each line that contains version then paste each file together. (you mite need to work on space padding each line to the length that you want)
|
I played with this some more and talked to the guy who gave me the file in the first place, and he said that he got the information by reading each config file and appending it to a csv spreadsheet, which basically creates a long column of configs. Then I realized that it would be much easier to just put each file in it's own column to begin with. So I guess my question really is:
How do you append file contents to a new column in a an existing csv file (when the number of files is unknown and their lengths are different)?
Example:
|Column A | Column B | Column C
|contents of file 1 | contents of file 2 | contents of file 3
|
|
|
|
11-01-2012, 11:41 AM
|
#7
|
|
Senior Member
Registered: May 2005
Location: boston, usa
Distribution: fc-12/ fc-11-live-usb/ aix
Posts: 2,668
|
^ what i mentioned above is that your input is just plain unformatted ascii text (not sure why you think it is in csv format: http://en.wikipedia.org/wiki/Comma-separated_values ).
edit: i found this via google for splitting the input:
Code:
awk '{print "version"$0 > "config-"++i".txt"}' RS="version" largefile.csv
then use the paste command will probably yield you best results.
Last edited by schneidz; 11-01-2012 at 11:58 AM.
Reason: added awk bit to do the splitting.
|
|
|
|
11-01-2012, 12:27 PM
|
#8
|
|
Member
Registered: May 2010
Location: Matiland, FL
Distribution: Debian Squeeze
Posts: 47
Original Poster
Rep:
|
Quote:
Originally Posted by schneidz
^ what i mentioned above is that your input is just plain unformatted ascii text (not sure why you think it is in csv format: http://en.wikipedia.org/wiki/Comma-separated_values ).
edit: i found this via google for splitting the input:
Code:
awk '{print "version"$0 > "config-"++i".txt"}' RS="version" largefile.csv
then use the paste command will probably yield you best results.
|
I am familiar with CSV files. In my previous post, I indicated that linux Questions will not allow uploading files with .csv as the extension - therefore, I had to change it to a .txt extension, and changing that to .csv on your end should put it back to the way it is on my end. Either way, I'll check the command out and see if it works.
|
|
|
|
11-01-2012, 12:33 PM
|
#9
|
|
Senior Member
Registered: Sep 2010
Location: Wales, UK
Distribution: Arch
Posts: 1,624
|
I agree with schneidz re: not being a .csv file - are you sure you uploaded the right one?
Code:
[joshua:/tmp o]$ iconv -fUTF-16 -tUTF-8 sample-csv.txt > sample.csv
[joshua:/tmp o]$ head sample.csv
# configuration file
exit all
" version ""3.01A07"" sw ""1.801"""
configure
system
" name ""mes-leap-400-m-ave"""
" contact ""123-456-7890"""
" location ""leap,400-m-ave, BLCY 633695"""
# Inventory
inventory 1
[joshua:/tmp o]$ hexdump -C sample.csv | head
00000000 23 20 63 6f 6e 66 69 67 75 72 61 74 69 6f 6e 20 |# configuration |
00000010 66 69 6c 65 0d 0a 65 78 69 74 20 61 6c 6c 0d 0a |file..exit all..|
00000020 22 20 20 20 20 76 65 72 73 69 6f 6e 20 20 22 22 |" version ""|
00000030 33 2e 30 31 41 30 37 22 22 20 73 77 20 20 22 22 |3.01A07"" sw ""|
00000040 31 2e 38 30 31 22 22 22 0d 0a 20 20 20 20 63 6f |1.801""".. co|
00000050 6e 66 69 67 75 72 65 0d 0a 20 20 20 20 20 20 20 |nfigure.. |
00000060 20 73 79 73 74 65 6d 0d 0a 22 20 20 20 20 20 20 | system.." |
00000070 20 20 20 20 20 20 6e 61 6d 65 20 20 22 22 6d 65 | name ""me|
00000080 73 2d 6c 65 61 70 2d 34 30 30 2d 6d 2d 61 76 65 |s-leap-400-m-ave|
00000090 22 22 22 0d 0a 22 20 20 20 20 20 20 20 20 20 20 |""".." |
[joshua:/tmp o]$
I see no commas...
|
|
|
|
11-01-2012, 12:35 PM
|
#10
|
|
Senior Member
Registered: May 2005
Location: boston, usa
Distribution: fc-12/ fc-11-live-usb/ aix
Posts: 2,668
|
^ you still misunderstand although its not that important to your issue.
regardless of the filename, the content inside your file is not in csv format (i.e.: the open-office csv import funtion doesnt separate it out into columns correctly because the file isnt separated by comma according to the value of each feild).
|
|
|
|
11-01-2012, 03:17 PM
|
#11
|
|
Member
Registered: May 2010
Location: Matiland, FL
Distribution: Debian Squeeze
Posts: 47
Original Poster
Rep:
|
Quote:
Originally Posted by schneidz
^ you still misunderstand although its not that important to your issue.
regardless of the filename, the content inside your file is not in csv format (i.e.: the open-office csv import funtion doesnt separate it out into columns correctly because the file isnt separated by comma according to the value of each feild).
|
Ah, correct - I misunderstood what you were saying - Yes, each word or phrase in the file is not separated by a comma as it normally is in a csv file (i.e. hello,everyone,I,am,on,the,Internet), despite having a "csv" file extension. That was done on my coworker's end merely to allow for putting the configs into Excel later on, and to facilitate the use of columns. I didn't bother asking him why this is a requirement, merely because I'm not involved in the project beyond this point.
|
|
|
|
11-01-2012, 05:07 PM
|
#12
|
|
Senior Member
Registered: Dec 2004
Location: Marburg, Germany
Distribution: openSUSE 11.4
Posts: 1,314
|
csplit can achieve it in a similar way:
Code:
$ csplit sample.csv '%^" *version%' '/^" *version/' '{*}'
With the awk version the quotation marks are no longer balanced.
|
|
|
|
| Thread Tools |
Search this Thread |
|
|
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
All times are GMT -5. The time now is 09:21 PM.
|
|
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.
|
Latest Threads
LQ News
|
|