LinuxQuestions.org


curosity 10-04-2012 03:46 PM

split comma and square brackets in csv file
 
Hi,

I have a CSV file with fields in this format:

hello,world,hds,[84,198,90],hdi,[89,92,200]

The number of elements within the square brackets can vary from 0 to N (0..100). How do I split the file using awk?

I have tried different variations, but I cannot get it working.

awk -F '[][,]' '{print "{ a:\""$1"\", b:\""$2"\", v:\""$3"\",f:"$4", j:"$5", k:"$6"}"}'

Before :
hello,world,hds,[84,198,90],hdi,[89,92,200]

After (a JSON object):
{
a: "hello"
b: "world"
v: "hds"
f: [84,198,90]
j: "hdi"
k: [89,92,200]
}



Any help is appreciated,
Thanks,

schneidz 10-04-2012 03:49 PM

before and after please.

danielbmartin 10-04-2012 05:02 PM

Have:
Code:

hello,world,hds,[84,198,90],hdi,[89,92,200]
Want:
Code:

{
  a: "hello"
  b: "world"
  v: "hds"
  f: [84,198,90]
  j: "hdi"
  k: [89,92,200]
}

Try this:
Code:

echo "hello,world,hds,[84,198,90],hdi,[89,92,200]"  \
|awk -F ","                \
'{print "{"                \
      "\na: \""$1"\""      \
      "\nb: \""$2"\""      \
      "\nv: \""$3"\""      \
      "\nf: "$4","$5","$6  \
      "\nj: \""$7"\""      \
      "\nk: "$8","$9","$10 \
      "\n}"                \
 }'

Daniel B. Martin

schneidz 10-04-2012 05:36 PM

^ thanks for the translation dannyb, but the brackets can have between 0 and 100 items...

curosity 10-04-2012 05:54 PM

Thanks dannyb, but the number of elements within the square brackets can vary. I don't care much about formatting, just that the output has to be valid JSON.
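
Because each bracketed list can hold anywhere from 0 to 100 items, hard-coding $4,$5,$6 stops working as soon as a list changes length. What follows is only a minimal sketch of one way around that in plain POSIX awk: split on every comma, then glue the pieces of each [...] list back together. The function name csv_to_json is made up for illustration, and the labels a, b, v, f, j, k are taken from the first post. It assumes the text fields contain no commas, brackets, or double quotes, and it adds the quoted keys and separating commas that strict JSON needs.

```shell
#!/bin/sh
# csv_to_json: hypothetical helper name, not from the thread.
# Reads lines such as  hello,world,hds,[84,198,90],hdi,[89,92,200]
# and prints one JSON object per input line.
csv_to_json() {
  awk '
  BEGIN { nlab = split("a b v f j k", label, " ") }
  {
    n = split($0, part, ",")
    nf = 0
    inlist = 0
    for (i = 1; i <= n; i++) {
      if (inlist) {
        # Still inside a [...] list: append this piece to the current field.
        field[nf] = field[nf] "," part[i]
      } else {
        field[++nf] = part[i]
        if (substr(part[i], 1, 1) == "[") inlist = 1
      }
      # A piece ending in "]" closes the list.
      if (inlist && substr(part[i], length(part[i])) == "]") inlist = 0
    }
    out = "{"
    for (i = 1; i <= nf && i <= nlab; i++) {
      val = field[i]
      # Lists are emitted as-is; everything else becomes a JSON string.
      if (substr(val, 1, 1) != "[") val = "\"" val "\""
      sep = (i > 1) ? ", " : ""
      out = out sep "\"" label[i] "\": " val
    }
    print out "}"
  }'
}

printf '%s\n' 'hello,world,hds,[84,198,90],hdi,[89,92,200]' | csv_to_json
# prints: {"a": "hello", "b": "world", "v": "hds", "f": [84,198,90], "j": "hdi", "k": [89,92,200]}
```

Empty lists such as [] and single-item lists such as [12] go through the same path, since a piece that both starts with "[" and ends with "]" opens and closes the list in one step.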

rknichols 10-04-2012 06:45 PM

Try this:
Code:

[rkn] ~ $ cat try.awk
BEGIN { expr = "^([^,]*),([^,]*),([^,]*),(\\[[^]]*\\]),([^,]*),(\\[[^]]*\\])$" }
{
    if(match($0, expr, aa)) {
        for(n=1; n <= 6; ++n)  print "field" n ": \"" aa[n] "\""
    }
    else {
        print "No match on line " FNR ":", $0 >"/dev/stderr"
    }
}
[rkn] ~ $ awk -f try.awk
hello,world,hds,[84,198,90],hdi,[89,92,200]
field1: "hello"
field2: "world"
field3: "hds"
field4: "[84,198,90]"
field5: "hdi"
field6: "[89,92,200]"
[rkn] ~ $

Explanation:
The expression consists of 6 subexpressions contained in parentheses, with a comma between each pair. Subexpressions 1, 2, 3, and 5 each match any number of characters that are not a comma. Subexpressions 4 and 6 match a literal '[' followed by any number of characters that are not a ']' and then a literal ']'. The match() function stores in the elements of array aa the characters that match each parenthesized subexpression.
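
The three-argument form of match() used above is a gawk extension, so the script will not run under other awks. As a rough sketch of the same idea for a plain POSIX awk, the fields can be peeled off the front of the line one at a time: a field that starts with '[' runs through the next ']', and any other field runs up to the next comma. The function name show_fields is invented for this example. Unlike the regular expression above, it does not insist on exactly six fields, and a trailing empty field after a final comma is dropped.

```shell
#!/bin/sh
# show_fields: hypothetical helper name, not from the thread.
# Prints each field of a line on its own line, keeping [...] groups whole.
show_fields() {
  awk '
  {
    rest = $0
    n = 0
    while (rest != "") {
      if (substr(rest, 1, 1) == "[" && index(rest, "]") > 0) {
        # Bracketed field: take everything through the closing bracket.
        len = index(rest, "]")
      } else {
        # Plain field: take everything before the next comma, or the rest.
        comma = index(rest, ",")
        len = (comma > 0) ? comma - 1 : length(rest)
      }
      n++
      print "field" n ": \"" substr(rest, 1, len) "\""
      # Skip past the field and the comma that follows it.
      rest = substr(rest, len + 2)
    }
  }'
}

printf '%s\n' 'hello,world,hds,[84,198,90],hdi,[89,92,200]' | show_fields
```

For the sample line this should give the same six field lines as the gawk session above.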

ip_address 10-04-2012 07:37 PM

Can you try this?

# Input file

Code:

$ cat curosity.txt

hello,world,hds,[84,198,90],hdi,[89,92,200]

# Bash script

Code:

$ cat curosity.sh

#!/bin/bash

# name of the input file goes here

filename='curosity.txt'

# define labels

LABELS=(a: b: v: f: j: k:)

# substitution for delimiters

sed -e 's/\,\[/|[/g' -e 's/],/]|/g' "$filename" > temp_"$filename"

# outer loop

outer_loop=`awk -F"|" < "temp_$filename" '{print NF}'`

# counter for labels

counter=0

# open { for JSON object

echo "{"

for ((counter1=1;counter1<="$outer_loop";counter1++));do
       
        # assumption format [x,y,z ... N] is on even positions

        if ((counter1%2!=0));then

                xyz=`awk -F"|" -v var="$counter1" < "temp_$filename" '{print $var}' | awk -F"," '{print NF}'`

                for ((counter2=1;counter2<="$xyz";counter2++));do
                       
                        printf "${LABELS[counter]}"
                        awk -F"|" -v var="$counter1" < "temp_$filename" '{print $var}' | awk -F"," -v counter2="$counter2" '{print $counter2}'
                        counter=$((counter+1))
                       
                done

        else
               
                printf "${LABELS[counter]}"
                awk -F"|" -v var="$counter1" < "temp_$filename" '{print $var}'
                counter=$((counter+1))

        fi

done

# close } for JSON object

echo "}"

# remove temporary file

rm temp_"$filename"

# STDOUT

Code:

$ ./curosity.sh

{
a:hello
b:world
v:hds
f:[84,198,90]
j:hdi
k:[89,92,200]
}


danielbmartin 10-04-2012 07:41 PM

Input file:
Code:

hello,world,hds,[84,198,90],hdi,[89,92,200]
Quoth,the,raven,[2,4,6,8],nevermore,[12,14,16]

Try this:
Code:

sed -e 's/\([^0-9],\)/&~/g' "$InFile"  \
|awk -F ",~"          \
'{print "{"          \
 "\na: \""$1"\""      \
 "\nb: \""$2"\""      \
 "\nv: \""$3"\""      \
 "\nf: "$4            \
 "\nj: \""$5"\""      \
 "\nk: "$6            \
 "\n}"                \
 }'

Daniel B. Martin

grail 10-05-2012 09:54 AM

As the formatting is trivial once you have the correct fields, if you are using gawk 4+, you can use:
Code:

awk 'BEGIN{FPAT = "[^,]+|\\[[^]]+\\]"}{...}' file

David the H. 10-07-2012 03:29 PM

Here's a version that doesn't require any special extensions:

Code:

awk -F'[,][[]|[]][,]?' -v OFS='\n' \
'{ split( $1 , a , "," ); \
print "{" , "a: "a[1] , "b: "a[2] , "v: "a[3] , "f: ["$2"]" , "j: "$3 , "k: ["$4"]" , "}" }'

It sets the field delimiter to ",[" or "],", with the final comma being optional (so that it matches the final bracket). Then it splits the first field into three separate array entries for printing.

danielbmartin 10-07-2012 04:40 PM

Quote:

Originally Posted by David the H. (Post 4799652)
Code:

awk -F'[,][[]|[]][,]?' -v OFS='\n' \
'{ split( $1 , a , "," ); \
print "{" , "a: "a[1] , "b: "a[2] , "v: "a[3] , "f: ["$2"]" , "j: "$3 , "k: ["$4"]" , "}" }'


Minor nitpick... OP asked for double-quote marks which your code does not deliver.

Daniel B. Martin

David the H. 10-07-2012 07:28 PM

Ah, so there are. But yeah, that's just a problem of formatting the print command. Easy enough to add them in.

It might be a good idea to switch to using printf for this, too.
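
A possible printf version, offered only as a sketch: it keeps the same field separator as the command above, adds the double quotes around the text values, and also quotes the keys and puts commas between members so the result is strict JSON. The function name to_json_printf and the two-space indent are my own choices for the example. It assumes exactly the six-field layout from the first post and text fields without double quotes.

```shell
#!/bin/sh
# to_json_printf: hypothetical helper name, not from the thread.
# Same separator as the one-liner above: ",[" or "]" with an optional comma.
to_json_printf() {
  awk -F'[,][[]|[]][,]?' '
  {
    # $1 holds the three leading text fields; $2 and $4 hold the list bodies.
    split($1, a, ",")
    printf "{\n"
    printf "  \"a\": \"%s\",\n", a[1]
    printf "  \"b\": \"%s\",\n", a[2]
    printf "  \"v\": \"%s\",\n", a[3]
    printf "  \"f\": [%s],\n", $2
    printf "  \"j\": \"%s\",\n", $3
    printf "  \"k\": [%s]\n", $4
    printf "}\n"
  }'
}

printf '%s\n' 'hello,world,hds,[84,198,90],hdi,[89,92,200]' | to_json_printf
```

Since the separator swallows the brackets, the format strings put them back, which also means an empty list comes out as [].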

grail 10-08-2012 06:34 AM

Just for completeness and slight alternative:
Code:

awk 'BEGIN{FPAT = "[^,]+|\\[[^]]+\\]";split("abvfjk",a,"")}{l="{\n";for(i=1;i<=NF;i++)l=l a[i]":"$i RS;print l "}"}' file

danielbmartin 10-08-2012 07:01 AM

Quote:

Originally Posted by grail (Post 4800142)
Just for completeness and slight alternative:
Code:

awk 'BEGIN{FPAT = "[^,]+|\\[[^]]+\\]";split("abvfjk",a,"")}{l="{\n";for(i=1;i<=NF;i++)l=l a[i]":"$i RS;print l "}"}' file

Please check this. It doesn't work here. I get an "a:" but the characters b:, v:, f:, j:, and k: never appear in the output.

Daniel B. Martin

grail 10-08-2012 09:02 AM

Works just fine for me using your example :)
Code:

grail@pilgrim:~$ awk 'BEGIN{FPAT = "[^,]+|\\[[^]]+\\]";split("abvfjk",a,"")}{l="{\n";for(i=1;i<=NF;i++)l=l a[i]":"$i RS;print l "}"}' f2
{
a:hello
b:world
v:hds
f:[84,198,90]
j:hdi
k:[89,92,200]
}
{
a:Quoth
b:the
v:raven
f:[2,4,6,8]
j:nevermore
k:[12,14,16]
}
grail@pilgrim:~$ awk --version
GNU Awk 4.0.1



All times are GMT -5.