LinuxQuestions.org - split comma and square brackets in csv file

Page 1 of 2


curosity

10-04-2012 03:46 PM

split comma and square brackets in csv file

Hi,

I have a CSV file with fields in the format

hello,world,hds,[84,198,90],hdi,[89,92,200]

The number of elements within the square brackets can vary from 0 to N (0..100). How do I split the file using awk?

I have tried different variations, but I cannot get it working.

awk -F '[][,]' '{print "{ a:\""$1"\", b:\""$2"\", v:\""$3"\",f:"$4", j:"$5", k:"$6"}"}'

Before:
hello,world,hds,[84,198,90],hdi,[89,92,200]

After (a JSON object):
{
a: "hello"
b: "world"
v: "hds"
f: [84,198,90]
j: "hdi"
k: [89,92,200]
}

Any help is appreciated,
Thanks,

schneidz

10-04-2012 03:49 PM

before and after please.

danielbmartin

10-04-2012 05:02 PM

Have:

Code:

hello,world,hds,[84,198,90],hdi,[89,92,200]

Want:

Code:

{
  a: "hello"
  b: "world"
  v: "hds"
  f: [84,198,90]
  j: "hdi"
  k: [89,92,200]
}

Try this:

Code:

echo "hello,world,hds,[84,198,90],hdi,[89,92,200]"  \
|awk -F ","                \
'{print "{"                \
      "\na: \""$1"\""      \
      "\nb: \""$2"\""      \
      "\nv: \""$3"\""      \
      "\nf: "$4","$5","$6  \
      "\nj: \""$7"\""      \
      "\nk: "$8","$9","$10 \
      "\n}"                \
 }'

Daniel B. Martin

schneidz

10-04-2012 05:36 PM

^ thanks for the translation dannyb, the brackets can have between 0 and 100 items...

curosity

10-04-2012 05:54 PM

Thanks dannyb, but elements within square brackets could vary. I don't care much for formatting, just that it has to be valid JSON format.

rknichols

10-04-2012 06:45 PM

Try this:

Code:

[rkn] ~ $ cat try.awk
BEGIN { expr = "^([^,]*),([^,]*),([^,]*),(\\[[^]]*\\]),([^,]*),(\\[[^]]*\\])$" }
{
    if(match($0, expr, aa)) {
        for(n=1; n <= 6; ++n)  print "field" n ": \"" aa[n] "\""
    }
    else {
        print "No match on line " FNR ":", $0 >"/dev/stderr"
    }
}
[rkn] ~ $ awk -f try.awk
hello,world,hds,[84,198,90],hdi,[89,92,200]
field1: "hello"
field2: "world"
field3: "hds"
field4: "[84,198,90]"
field5: "hdi"
field6: "[89,92,200]"
[rkn] ~ $

Explanation:
The expression consists of six parenthesized subexpressions separated by commas. Subexpressions 1, 2, 3, and 5 each match any number of characters that are not a comma. Subexpressions 4 and 6 match a literal '[' followed by any number of characters that are not a ']' and then a literal ']'. The match() function stores the text matched by each parenthesized subexpression in the elements of array aa (the three-argument form of match() is a gawk extension).
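
For an awk without the array form of match(), a rough equivalent can walk the line with the standard RSTART/RLENGTH interface instead. This is only a sketch of a different technique, not the method posted above; the function name split_fields is made up for illustration, and it assumes brackets are never nested. Unlike the fixed six-group expression, it does not assume a particular number of fields.

```shell
#!/bin/sh
# Hypothetical helper: print each top-level field of a line on its own row,
# treating a bracketed group such as [84,198,90] as a single field.
split_fields() {
    awk '{
        line = $0
        n = 0
        while (line != "") {
            # A field is either a whole [...] group or a run of non-comma characters.
            if (substr(line, 1, 1) == "[" && match(line, /^\[[^]]*\]/))
                len = RLENGTH
            else if (match(line, /^[^,]*/))
                len = RLENGTH
            printf "field%d: \"%s\"\n", ++n, substr(line, 1, len)
            # Drop the field and the comma that follows it, if any.
            line = substr(line, len + 2)
        }
    }'
}

echo 'hello,world,hds,[84,198,90],hdi,[89,92,200]' | split_fields
```

For the sample line this prints the same six field rows as the session above. An empty group such as [] comes out as one field, but a trailing empty field after a final comma is not reported.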

ip_address

10-04-2012 07:37 PM

Can you try this?

# Input file

Code:

$ cat curosity.txt

hello,world,hds,[84,198,90],hdi,[89,92,200]

# Bash script

Code:

$ cat curosity.sh

#!/bin/bash

# name of the input file goes here
filename='curosity.txt'

# define labels
LABELS=(a: b: v: f: j: k:)

# substitution for delimiters
sed -e 's/\,\[/|[/g' -e 's/],/]|/g' "$filename" > temp_"$filename"

# outer loop
outer_loop=`awk -F"|" < "temp_$filename" '{print NF}'`

# counter for labels
counter=0

# open { for JSON object
echo "{"

for ((counter1=1;counter1<="$outer_loop";counter1++));do

        # assumption: the [x,y,z ... N] groups are on even positions
        if ((counter1%2!=0));then

                xyz=`awk -F"|" -v var="$counter1" < "temp_$filename" '{print $var}' | awk -F"," '{print NF}'`

                for ((counter2=1;counter2<="$xyz";counter2++));do
                        printf "${LABELS[counter]}"
                        awk -F"|" -v var="$counter1" < "temp_$filename" '{print $var}' | awk -F"," -v counter2="$counter2" '{print $counter2}'
                        counter=$((counter+1))
                done

        else
                printf "${LABELS[counter]}"
                awk -F"|" -v var="$counter1" < "temp_$filename" '{print $var}'
                counter=$((counter+1))
        fi

done

# close } for JSON object
echo "}"

# remove temporary file
rm temp_"$filename"

# STDOUT

Code:

$ ./curosity.sh

{
a:hello
b:world
v:hds
f:[84,198,90]
j:hdi
k:[89,92,200]
}

danielbmartin

10-04-2012 07:41 PM

Input file:

Code:

hello,world,hds,[84,198,90],hdi,[89,92,200]
Quoth,the,raven,[2,4,6,8],nevermore,[12,14,16]

Try this:

Code:

sed -e 's/\([^0-9],\)/&~/g' $InFile  \
|awk -F ",~"          \
'{print "{"          \
 "\na: \""$1"\""      \
 "\nb: \""$2"\""      \
 "\nv: \""$3"\""      \
 "\nf: \""$4"\""      \
 "\nj: \""$5"\""      \
 "\nk: \""$6"\""      \
 "\n}"                \
 }'

Daniel B. Martin

grail

10-05-2012 09:54 AM

As the formatting is trivial once you have the correct fields, if you are using gawk 4+, you can use:

Code:

awk 'BEGIN{FPAT = "[^,]+|\\[[^]]+\\]"}{...}' file

David the H.

10-07-2012 03:29 PM

Here's a version that doesn't require any special extensions:

Code:

awk -F'[,][[]|[]][,]?' -v OFS='\n' \
'{ split( $1 , a , "," ); \
print "{" , "a: "a[1] , "b: "a[2] , "v: "a[3] , "f: ["$2"]" , "j: "$3 , "k: ["$4"]" , "}" }'

It sets the field delimiter to ",[" or "],", with the final comma being optional (so that it matches the final bracket). Then it splits the first field into three separate array entries for printing.
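
To see what that separator actually produces, the fields can be dumped one per row. This is just an illustrative sketch; show_fields is a hypothetical name and the second sample line (with an empty bracket group) is made up to cover the zero-element case.

```shell
#!/bin/sh
# Show how the separator regex carves up a line: one output row per awk field.
# show_fields is a hypothetical name used only for this illustration.
show_fields() {
    awk -F'[,][[]|[]][,]?' '{
        for (i = 1; i <= NF; i++)
            printf "$%d = <%s>\n", i, $i
    }'
}

echo 'hello,world,hds,[84,198,90],hdi,[89,92,200]' | show_fields
echo 'hello,world,hds,[],hdi,[1]' | show_fields
```

Each line yields five fields: the three leading values still joined by commas in $1, the two bracket contents without their brackets in $2 and $4, and an empty $5 left over after the final "]". An empty group simply gives an empty $2, so the 0-element case needs no special handling.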

danielbmartin

10-07-2012 04:40 PM

Quote:

Originally Posted by David the H. (Post 4799652)

Code:

awk -F'[,][[]|[]][,]?' -v OFS='\n' \
'{ split( $1 , a , "," ); \
print "{" , "a: "a[1] , "b: "a[2] , "v: "a[3] , "f: ["$2"]" , "j: "$3 , "k: ["$4"]" , "}" }'

Minor nitpick... OP asked for double-quote marks which your code does not deliver.

Daniel B. Martin

David the H.

10-07-2012 07:28 PM

Ah, so there are. But yeah, that's just a problem of formatting the print command. Easy enough to add them in.

It might be a good idea to switch to using printf for this, too.
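
A printf-based variant with the quotes added might look roughly like the following. It is a sketch only: to_json is a hypothetical name, the keys are quoted and the members comma-separated so that the result parses as JSON (which goes a little beyond the layout shown in the first post), and it assumes the plain fields contain nothing that needs JSON escaping.

```shell
#!/bin/sh
# Hypothetical wrapper: turn each input line into one JSON object.
# Assumes three plain fields, a bracketed list, one plain field, and a final
# bracketed list, and that the plain fields need no JSON escaping.
to_json() {
    awk -F'[,][[]|[]][,]?' '{
        split($1, a, ",")
        printf "{\"a\":\"%s\",\"b\":\"%s\",\"v\":\"%s\",\"f\":[%s],\"j\":\"%s\",\"k\":[%s]}\n",
            a[1], a[2], a[3], $2, $3, $4
    }'
}

echo 'hello,world,hds,[84,198,90],hdi,[89,92,200]' | to_json
```

For the sample line this prints {"a":"hello","b":"world","v":"hds","f":[84,198,90],"j":"hdi","k":[89,92,200]}. Keeping the whole layout in one format string makes it easier to see where each quote and comma goes than a chain of concatenated print arguments.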

grail

10-08-2012 06:34 AM

Just for completeness and slight alternative:

Code:

awk 'BEGIN{FPAT = "[^,]+|\\[[^]]+\\]";split("abvfjk",a,"")}{l="{\n";for(i=1;i<=NF;i++)l=l a[i]":"$i RS;print l "}"}' file

danielbmartin

10-08-2012 07:01 AM

Quote:

Originally Posted by grail (Post 4800142)

Just for completeness and slight alternative:

Code:

awk 'BEGIN{FPAT = "[^,]+|\\[[^]]+\\]";split("abvfjk",a,"")}{l="{\n";for(i=1;i<=NF;i++)l=l a[i]":"$i RS;print l "}"}' file

Please check this. It doesn't work here. I get an "a:" but the characters b:, v:, f:, j:, and k: never appear in the output.

Daniel B. Martin

grail

10-08-2012 09:02 AM

Works just fine for me using your example :)

Code:

grail@pilgrim:~$ awk 'BEGIN{FPAT = "[^,]+|\\[[^]]+\\]";split("abvfjk",a,"")}{l="{\n";for(i=1;i<=NF;i++)l=l a[i]":"$i RS;print l "}"}' f2
{
a:hello
b:world
v:hds
f:[84,198,90]
j:hdi
k:[89,92,200]
}
{
a:Quoth
b:the
v:raven
f:[2,4,6,8]
j:nevermore
k:[12,14,16]
}
grail@pilgrim:~$ awk --version
GNU Awk 4.0.1
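
The FPAT approach was offered for gawk 4+, and the run above is on GNU Awk 4.0.1. One possible reason for seeing only "a:" (not confirmed anywhere in this thread) is an awk that treats FPAT as an ordinary variable, so the whole line lands in $1. A quick check along these lines can tell the two cases apart; the variable name fpat_fields is made up for this sketch.

```shell
#!/bin/sh
# Report whether the awk on PATH honours FPAT (gawk 4.0 and later do).
# Where FPAT is just an ordinary variable, "a,b" stays one whitespace-delimited field.
fpat_fields=$(echo 'a,b' | awk 'BEGIN { FPAT = "[^,]+" } { print NF }')

if [ "$fpat_fields" -eq 2 ]; then
    echo "FPAT is supported"
else
    echo "FPAT is ignored: the whole line ends up in \$1"
fi
```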

All times are GMT -5.