Old 10-04-2012, 03:46 PM   #1
curosity
LQ Newbie
 
Registered: Oct 2012
Posts: 2

split comma and square brackets in csv file


Hi,

I have csv file with the fields in the format

hello,world,hds,[84,198,90],hdi,[89,92,200]

The number of elements within the square brackets can vary from 0 to N (0..100). How do I split the file using awk?

I have tried different variations, but I cannot get it working.

awk -F '[][,]' '{print "{ a:\""$1"\", b:\""$2"\", v:\""$3"\",f:"$4", j:"$5", k:"$6"}"}'

Before:
hello,world,hds,[84,198,90],hdi,[89,92,200]

After (a JSON object):
{
a: "hello"
b: "world"
v: "hds"
f: [84,198,90]
j: "hdi"
k: [89,92,200]
}



Any help is appreciated,
Thanks,

Last edited by curosity; 10-04-2012 at 03:59 PM. Reason: Added Before and After
 
Old 10-04-2012, 03:49 PM   #2
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

before and after please.
 
Old 10-04-2012, 05:02 PM   #3
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Have:
Code:
hello,world,hds,[84,198,90],hdi,[89,92,200]
Want:
Code:
{
   a: "hello"
   b: "world"
   v: "hds"
   f: [84,198,90]
   j: "hdi"
   k: [89,92,200]
}
Try this:
Code:
echo "hello,world,hds,[84,198,90],hdi,[89,92,200]"  \
|awk -F ","                \
'{print "{"                \
      "\na: \""$1"\""      \
      "\nb: \""$2"\""      \
      "\nv: \""$3"\""      \
      "\nf: "$4","$5","$6  \
      "\nj: \""$7"\""      \
      "\nk: "$8","$9","$10 \
      "\n}"                \
 }'
Daniel B. Martin
 
Old 10-04-2012, 05:36 PM   #4
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

^ thanks for the translation dannyb, but the brackets can have between 0 and 100 items...
 
Old 10-04-2012, 05:54 PM   #5
curosity
LQ Newbie
 
Registered: Oct 2012
Posts: 2

Original Poster
Thanks dannyb, but elements within square brackets could vary. I don't care much for formatting, just that it has to be valid JSON format.
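
For the variable-length requirement, here is a minimal sketch that does not depend on how many items each bracketed list holds. It walks each line character by character and splits only on commas that sit outside square brackets. The function name csv_to_json is made up for this example, the labels a, b, v, f, j, k are taken from the first post, and it assumes field values contain no double quotes or backslashes (no JSON escaping is attempted).

```shell
#!/bin/sh
# csv_to_json is a hypothetical helper name used only in this sketch.
# Assumption: values contain no double quotes or backslashes.
csv_to_json() {
    awk '
    BEGIN { nlab = split("a b v f j k", label, " ") }
    {
        nf = 0; depth = 0; cur = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "[") depth++
            else if (c == "]") depth--
            # A comma ends a field only when it is outside brackets.
            if (c == "," && depth == 0) { fld[++nf] = cur; cur = "" }
            else cur = cur c
        }
        fld[++nf] = cur

        out = "{"
        for (k = 1; k <= nf && k <= nlab; k++) {
            # Bracketed lists are copied unchanged; other values get quotes.
            val = (fld[k] ~ /^\[.*\]$/) ? fld[k] : "\"" fld[k] "\""
            sep = (k > 1) ? ", " : ""
            out = out sep "\"" label[k] "\": " val
        }
        print out "}"
    }'
}

echo 'hello,world,hds,[84,198,90],hdi,[89,92,200]' | csv_to_json
```

For the sample line this prints {"a": "hello", "b": "world", "v": "hds", "f": [84,198,90], "j": "hdi", "k": [89,92,200]}, with the keys quoted and the pairs comma-separated as JSON requires.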
 
Old 10-04-2012, 06:45 PM   #6
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,780

Try this:
Code:
[rkn] ~ $ cat try.awk
BEGIN { expr = "^([^,]*),([^,]*),([^,]*),(\\[[^]]*\\]),([^,]*),(\\[[^]]*\\])$" }
{
    if(match($0, expr, aa)) {
	for(n=1; n <= 6; ++n)  print "field" n ": \"" aa[n] "\""
    }
    else {
	print "No match on line " FNR ":", $0 >"/dev/stderr"
    }
}
[rkn] ~ $ awk -f try.awk
hello,world,hds,[84,198,90],hdi,[89,92,200]
field1: "hello"
field2: "world"
field3: "hds"
field4: "[84,198,90]"
field5: "hdi"
field6: "[89,92,200]"
[rkn] ~ $
Explanation:
The expression consists of 6 subexpressions contained in parentheses, with a comma between each pair. Subexpressions 1, 2, 3, and 5 match any number of characters that are not a comma. Subexpressions 4 and 6 match a literal '[' followed by any number of characters that are not a ']' and then a literal ']'. The match() function stores the characters that match each parenthesized subexpression in elements of the array aa (the three-argument form of match() is a gawk extension).
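
Where gawk is not available, the same idea can be sketched with the standard two-argument match(), which sets only RSTART and RLENGTH. The loop below peels one field at a time off the front of the line: a bracketed list if one starts there, otherwise a run of non-comma characters. It is a rough equivalent, not the script above, and split_fields is a hypothetical name.

```shell
#!/bin/sh
# split_fields is a hypothetical helper name used only in this sketch.
split_fields() {
    awk '
    {
        rest = $0; n = 0
        while (1) {
            len = 0
            # Try a bracketed list first, then a run of non-comma characters.
            if (match(rest, /^\[[^]]*\]/)) len = RLENGTH
            else if (match(rest, /^[^,]+/)) len = RLENGTH
            # len stays 0 for an empty field.
            n++
            print "field" n ": \"" substr(rest, 1, len) "\""
            rest = substr(rest, len + 1)
            if (rest == "") break
            if (substr(rest, 1, 1) != ",") {
                print "No match on line " FNR ": " $0 > "/dev/stderr"
                break
            }
            # Drop the comma that separates this field from the next one.
            rest = substr(rest, 2)
        }
    }'
}

echo 'hello,world,hds,[84,198,90],hdi,[89,92,200]' | split_fields
```

Unlike the fixed six-group expression, this loop accepts any number of fields per line, so the field count would need a separate check if exactly six are required.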
 
Old 10-04-2012, 07:37 PM   #7
ip_address
Member
 
Registered: Apr 2012
Distribution: RedHat
Posts: 42

Can you try this?

# Input file

Code:
$ cat curosity.txt 

hello,world,hds,[84,198,90],hdi,[89,92,200]
# Bash script

Code:
$ cat curosity.sh

#!/bin/bash

# name of the input file goes here

filename='curosity.txt'

# define labels

LABELS=(a: b: v: f: j: k:)

# substitution for delimiters

sed -e 's/\,\[/|[/g' -e 's/],/]|/g' "$filename" > temp_"$filename"

# outer loop

outer_loop=`awk -F"|" < "temp_$filename" '{print NF}'`

# counter for labels

counter=0

# open { for JSON object

echo "{"

for ((counter1=1;counter1<="$outer_loop";counter1++));do
	
	# assumption format [x,y,z ... N] is on even positions

	if ((counter1%2!=0));then

		xyz=`awk -F"|" -v var="$counter1" < "temp_$filename" '{print $var}' | awk -F"," '{print NF}'`

		for ((counter2=1;counter2<="$xyz";counter2++));do
			
			printf "${LABELS[counter]}"
			awk -F"|" -v var="$counter1" < "temp_$filename" '{print $var}' | awk -F"," -v counter2="$counter2" '{print $counter2}'
			counter=$((counter+1))
			
		done

	else
		
		printf "${LABELS[counter]}"
		awk -F"|" -v var="$counter1" < "temp_$filename" '{print $var}'
		counter=$((counter+1))

	fi

done

# close } for JSON object

echo "}"

# remove temporary file

rm temp_"$filename"
# STDOUT

Code:
$ ./curosity.sh 

{
a:hello
b:world
v:hds
f:[84,198,90]
j:hdi
k:[89,92,200]
}
 
Old 10-04-2012, 07:41 PM   #8
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Input file:
Code:
hello,world,hds,[84,198,90],hdi,[89,92,200]
Quoth,the,raven,[2,4,6,8],nevermore,[12,14,16]
Try this:
Code:
sed -e 's/\([^0-9],\)/&~/g' "$InFile"  \
|awk -F ",~"          \
'{print "{"           \
 "\na: \""$1"\""      \
 "\nb: \""$2"\""      \
 "\nv: \""$3"\""      \
 "\nf: "$4            \
 "\nj: \""$5"\""      \
 "\nk: "$6            \
 "\n}"                \
 }'
Daniel B. Martin

Last edited by danielbmartin; 10-04-2012 at 07:50 PM. Reason: Tighten the code, slightly
 
Old 10-05-2012, 09:54 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

As the formatting is trivial once you have the correct fields, if you are using gawk 4+, you can use:
Code:
awk 'BEGIN{FPAT = "[^,]+|\\[[^]]+\\]"}{...}' file
 
Old 10-07-2012, 03:29 PM   #10
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Here's a version that doesn't require any special extensions:

Code:
awk -F'[,][[]|[]][,]?' -v OFS='\n' \
'{ split( $1 , a , "," ); \
print "{" , "a: "a[1] , "b: "a[2] , "v: "a[3] , "f: ["$2"]" , "j: "$3 , "k: ["$4"]" , "}" }'
It sets the field delimiter to ",[" or "],", with the final comma being optional (so that it matches the final bracket). Then it splits the first field into three separate array entries for printing.
 
Old 10-07-2012, 04:40 PM   #11
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Quote:
Originally Posted by David the H.
Code:
awk -F'[,][[]|[]][,]?' -v OFS='\n' \
'{ split( $1 , a , "," ); \
print "{" , "a: "a[1] , "b: "a[2] , "v: "a[3] , "f: ["$2"]" , "j: "$3 , "k: ["$4"]" , "}" }'
Minor nitpick... OP asked for double-quote marks which your code does not deliver.

Daniel B. Martin
 
Old 10-07-2012, 07:28 PM   #12
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Ah, so there are. But yeah, that's just a problem of formatting the print command. Easy enough to add them in.

It might be a good idea to switch to using printf for this, too.
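
A rough sketch of what the printf variant could look like, with the double quotes added around the plain-text values. It keeps the same field separator as the earlier command; the wrapper function name to_object is hypothetical, and the six labels are hard-coded on the assumption that every line has exactly the layout shown in the first post.

```shell
#!/bin/sh
# to_object is a hypothetical wrapper name used only in this sketch.
# The field separator is ",[" or "]" followed by an optional ",".
to_object() {
    awk -F'[,][[]|[]][,]?' '
    {
        # $1 holds the three leading text fields, still comma-separated.
        split($1, a, ",")
        printf "{\n"
        printf "a: \"%s\"\n", a[1]
        printf "b: \"%s\"\n", a[2]
        printf "v: \"%s\"\n", a[3]
        printf "f: [%s]\n",   $2
        printf "j: \"%s\"\n", $3
        printf "k: [%s]\n",   $4
        printf "}\n"
    }'
}

echo 'hello,world,hds,[84,198,90],hdi,[89,92,200]' | to_object
```

Since the separator consumes the brackets, the format strings put them back around $2 and $4, and an empty list comes out as [].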
 
Old 10-08-2012, 06:34 AM   #13
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Just for completeness, a slight alternative:
Code:
awk 'BEGIN{FPAT = "[^,]+|\\[[^]]+\\]";split("abvfjk",a,"")}{l="{\n";for(i=1;i<=NF;i++)l=l a[i]":"$i RS;print l "}"}' file
 
Old 10-08-2012, 07:01 AM   #14
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Quote:
Originally Posted by grail
Just for completeness, a slight alternative:
Code:
awk 'BEGIN{FPAT = "[^,]+|\\[[^]]+\\]";split("abvfjk",a,"")}{l="{\n";for(i=1;i<=NF;i++)l=l a[i]":"$i RS;print l "}"}' file
Please check this. It doesn't work here. I get an "a:" but the characters b:, v:, f:, j:, and k: never appear in the output.

Daniel B. Martin
 
Old 10-08-2012, 09:02 AM   #15
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Works just fine for me using your example
Code:
grail@pilgrim:~$ awk 'BEGIN{FPAT = "[^,]+|\\[[^]]+\\]";split("abvfjk",a,"")}{l="{\n";for(i=1;i<=NF;i++)l=l a[i]":"$i RS;print l "}"}' f2
{
a:hello
b:world
v:hds
f:[84,198,90]
j:hdi
k:[89,92,200]
}
{
a:Quoth
b:the
v:raven
f:[2,4,6,8]
j:nevermore
k:[12,14,16]
}
grail@pilgrim:~$ awk --version
GNU Awk 4.0.1
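
One possible explanation for the differing results (an assumption, since the thread does not say which awk produced the failing output): FPAT was added in gawk 4.0, and other awks treat it as an ordinary variable, so the whole line stays in $1 and only the "a:" label appears. The sketch below checks whether a given awk honours FPAT; has_fpat is a hypothetical helper name.

```shell
#!/bin/sh
# has_fpat is a hypothetical helper name used only in this sketch.
# It succeeds (exit status 0) when the given awk splits fields by FPAT.
has_fpat() {
    # With FPAT honoured, "a,b" yields 2 fields; otherwise the default
    # whitespace FS leaves the whole line as a single field.
    n=$(echo 'a,b' | "${1:-awk}" 'BEGIN { FPAT = "[^,]+" } { print NF }')
    [ "$n" = "2" ]
}

if has_fpat awk; then
    echo "awk supports FPAT"
else
    echo "awk ignores FPAT; fields fall back to FS splitting"
fi
```

Running awk --version (or awk -W version on some implementations) shows which awk is installed.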
 
  

