trying to write shell snip to import CSV data into BASH array
Thanks for the reply, but how do I get the first, second, third, etc. lines of the CSV file into $var1?
So if I issue:
host# cat ./mydatafile.csv
record1,item1,item2,item3,item4
record2,item1,item2,item3,item4
record3,item1,item2,item3,item4
record4,item1,item2,item3,item4
host#
Not sure how to stream this into the var1, var2, var3, var4, etc.
Following is what I have so far, and it seems to produce the desired output. But this is only a one-line CSV file (i.e. record1), and I am still having trouble getting the first element (record1) set as the array name. I tried to rework the snip with the "eval" suggestion and it didn't seem to work properly.
Sorry for all the newbie ?s people ... I really do appreciate all the help!!!!
#!/bin/bash
testfile="./test-dev/test-global-data-import.csv"
var1=$(cat "$testfile")
IFS=',' record1=( ${var1} )
echo ""
echo "Here is the data without the comma separators."
echo ""
echo ${record1[*]:0}
echo ""
echo "Here is the data native (with comma separators)."
echo ""
echo "${record1[*]:0}"
echo ""
First of all, what you are doing sounds a bit strange. You might get much better results if you told us what you ultimately want to accomplish; we might know much more efficient approaches. For example, I suspect your next question is 'How do I find out which fields I read in the while loop?'
Please, introduce yourself to the Bash Reference Manual first. At least the introductory chapters. It really is worth the effort.
Also note that GNU Awk is much more suitable for processing tabulated data. It has an atypical approach to processing its input, but after you understand it, it is rather easy and powerful.
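For instance, here is a minimal awk sketch (not from the thread; the sample file and field positions are invented) showing how awk splits each record on commas for you:

```shell
#!/bin/bash
# Create a throwaway sample CSV matching the layout shown earlier:
# name,item1,item2,item3,item4
printf '%s\n' \
    'record1,item1,item2,item3,item4' \
    'record2,item1,item2,item3,item4' > /tmp/demo.csv

# -F',' sets the field separator; $1 is the record name, $4 the third item.
awk -F',' '{ print $1 " -> " $4 }' /tmp/demo.csv
# record1 -> item3
# record2 -> item3
```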
However, to encourage you, here is a working example:
Code:
#!/bin/bash
inputfile="./test-dev/test-global-data-import.csv"
OLDIFS="$IFS"
IFS=","
while read -r NAME VALUES ; do
eval "$NAME=($VALUES)"
done < "$inputfile"
IFS="$OLDIFS"
And here are some notes to help you on your way:
Save and restore the field separator, unless you work with exclusively CSV data.
The while loop reads each line (record), until there is nothing else to read.
Read saves the first field to NAME and all other fields to VALUES. This is how 'read' always works; the last variable will get the rest of the record, even if it contains multiple fields.
eval evaluates the quoted string. What happens is that all variable references are expanded first, then the result is executed just as if it were normal code.
The redirection is done at the end of the while loop, so that the input is available for the entire loop construct, without a subshell. (If you use cat "$inputfile" | while ..., the variables will not be accessible after the loop, because they are assigned in the subshell, not the actual shell.)
If you don't know what a subshell is, please read this and this.
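A quick way to see the subshell effect for yourself (a minimal demo, assuming a throwaway file in /tmp):

```shell
#!/bin/bash
# Demonstrates why 'cat file | while read' loses variable assignments:
# the loop body on the right side of a pipe runs in a subshell.
printf '%s\n' one two three > /tmp/lines.txt

count=0
cat /tmp/lines.txt | while read -r line ; do
    count=$((count+1))          # incremented in a subshell ...
done
echo "after pipe: $count"       # ... so this still prints 0

count=0
while read -r line ; do
    count=$((count+1))          # incremented in the current shell
done < /tmp/lines.txt
echo "after redirect: $count"   # prints 3
```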
${#record[@]} tells you the number of fields in the array stored in variable record.
${#record} tells you the number of characters in the first field only; it is shorthand for ${#record[0]}, not the total size of the array.
${record[0]} is the first field in the array, ${record[1]} second and so on. You can use even ${record[N]} where N is a variable.
"${record[@]}" expands each field as a separate quoted string, while "${record[*]}" expands all fields into a single string.
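Put together, those expansions behave like this (using a made-up record for illustration):

```shell
#!/bin/bash
# Demonstrates the array expansions described above.
record=(alpha beta gamma delta)

echo "${#record[@]}"    # number of fields: 4
echo "${record[0]}"     # first field: alpha
N=2
echo "${record[N]}"     # field N+1, here the third field: gamma

# "@" keeps the fields separate; "*" joins them into one word.
printf '<%s>\n' "${record[@]}"   # four lines: <alpha> <beta> <gamma> <delta>
printf '<%s>\n' "${record[*]}"   # one line: <alpha beta gamma delta>
```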
Hope this helps,
Nominal Animal
Thanks for this ... I was in fact doing this with awk before, but found it too cumbersome to change the files for each individual device platform.
Okay, so here is the 10,000-foot flyby ....
I have one group of persons inputting data about sites into a spreadsheet and saving it as someinputdata.csv. Then, I have several template files with VAR_1, VAR_2, VAR_3, and so on declared. I am trying to assign the CSV data to array elements so I can pluck out the data I need with a sed statement like this ...
I really thought it was a simple problem to solve. One group of people is collecting data and one group will be mass producing network devices with the cfg files.
+SNIP+
# set the template variable
TEMPLATE="some text file with VAR_1 in it"
# code here to insert the CSV data into the array ....
..... MISSING ......
# set the variable to the array element
VAR_1=${record1[12]}
#
sed -e "s/VAR_1/$VAR_1/g" "$TEMPLATE" > "./output/$VAR_1.cfg"
Oh, it's basically just a record-driven template engine.
Think from the other end instead: when you generate a single new file, which pieces of information do you need?
Obviously you need the template for that file. What else, what about the data?
Is it just one record (one line) from the CSV file, or do you need to collate the data from multiple records?
After that is sorted out, it really is simple to extend to generate a batch of files.
Assuming it's just one record, you could do something like this:
Code:
#!/bin/bash
DATAFILE="path-to-the-CSV-file"
TEMPLATE="path-to-the-template"
NAME="The name of the record in the CSV file, i.e. the text in the first field"
# Get only the relevant record from the CSV data
OLDIFS="$IFS"
IFS=","
DATA=( $(sed -ne "s|^$NAME,||p" "$DATAFILE") )
IFS="$OLDIFS"
# Construct a list of corresponding field names.
# These are replaced in the template with corresponding CSV data fields.
# You could even read this from a config file -- or even from the CSV file, if it has a header row.
FIELDS=('FIRST' 'SECOND' 'THIRD' 'FOURTH')
# Construct a sed pattern to replace field names with data fields
PATTERN=""
I=0
for FIELD in "${FIELDS[@]}" ; do
# Data string from the CSV record:
STRING="${DATA[I]}"
I=$((I+1))
# TODO: Escape any characters that might bork up sed.
STRING="${STRING//\\/\\\\}"
STRING="${STRING//&/\\&}"
FIELD="${FIELD//\\/\\\\}"
FIELD="${FIELD//$/\\$}"
# Using | as the separator. Escape those too.
FIELD="${FIELD//|/\\|}"
STRING="${STRING//|/\\|}"
PATTERN="$PATTERN;s|$FIELD|$STRING|g"
done
PATTERN="${PATTERN#;}"
# Output the processed template in one fell swoop.
sed -e "$PATTERN" "$TEMPLATE"
I've only quick-tested the above, but it seems to work okay. You'll probably have to escape more characters than above, otherwise sed will bork. (At least {, }, [, ], and ^.)
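One way to handle that is to factor the escaping into small helper functions. The sketch below is not from the original script and the function names are invented; it escapes the pattern side and the replacement side of a sed s|..|..| command separately, assuming | as the delimiter and basic regular expressions:

```shell
#!/bin/bash
# Hypothetical helpers: escape a string for use in sed "s|PATTERN|REPLACEMENT|g".
escape_pattern() {
    # Backslash must be escaped first, then the BRE metacharacters
    # and the | delimiter.
    local s="$1"
    s="${s//\\/\\\\}"
    s="${s//./\\.}"
    s="${s//\*/\\*}"
    s="${s//[/\\[}"
    s="${s//^/\\^}"
    s="${s//\$/\\$}"
    s="${s//|/\\|}"
    printf '%s' "$s"
}
escape_replacement() {
    # Only backslash, & and the delimiter are special on the replacement side.
    local s="$1"
    s="${s//\\/\\\\}"
    s="${s//&/\\&}"
    s="${s//|/\\|}"
    printf '%s' "$s"
}

# Example: substitute a value containing sed-special characters safely.
value='price & tax: $5.00'
echo 'VAR_1' | sed -e "s|$(escape_pattern VAR_1)|$(escape_replacement "$value")|g"
```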
Nominal Animal
Theoretically, we will never need data from multiple records. Each record holds all the data for each device being configured. Each site will have multiple devices at it. So ideally we would like to pick different data out of the record based on what model of device we are configuring - having only one CSV source.
Example:
DEVICES:
device model A
device model B
device model C
So let's say, for instance, that all three devices (A, B, C) need the "sitename" in their configuration.
device-model-A only will use var1,2,3,7,9 in its configuration
device-model-B will only use var1,3,5,7,9 in its configuration
device-model-C will only use var10 in its configuration
and to top it all off the output configuration files should carry the sitename.cfg file name.
We had been doing it with a separate CSV file and a separate template for each device model (I realize that the template file will always be different based on the device model). I could only pull off the most basic sed commands within a script, and I could never seem to put ALL the information in one CSV source and run a script against it to extract the required data.
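The batch direction described above can be sketched roughly like this. Everything here is invented for illustration (the sample data, field positions, template names and the temporary directory); it loops over every record in a single CSV source and, for each device model that has a template, runs the template through sed, naming the output after the site:

```shell
#!/bin/bash
# Hypothetical batch sketch: one CSV source, one template per device model,
# output files named after the site. All names and fields are made up.
workdir=$(mktemp -d)
DATAFILE="$workdir/sites.csv"
mkdir -p "$workdir/templates" "$workdir/output"

# Sample data: field 1 is the site name; the rest fill VAR_1, VAR_2, ...
printf '%s\n' 'site1,10.0.0.1,eth0' 'site2,10.0.0.2,eth1' > "$DATAFILE"
printf '%s\n' 'hostname VAR_1' 'interface VAR_2' > "$workdir/templates/model-A.tpl"
printf '%s\n' 'mgmt-ip VAR_1' > "$workdir/templates/model-B.tpl"

OLDIFS="$IFS"; IFS=","
while read -r -a REC ; do
    SITENAME="${REC[0]}"
    for TEMPLATE in "$workdir"/templates/*.tpl ; do
        MODEL="$(basename "$TEMPLATE" .tpl)"
        # Build one sed program per record; replace higher-numbered
        # placeholders first so VAR_1 does not clobber VAR_10.
        PATTERN=""
        for (( N = ${#REC[@]} - 1; N >= 1; N-- )) ; do
            PATTERN="$PATTERN;s|VAR_$N|${REC[N]}|g"
        done
        sed -e "${PATTERN#;}" "$TEMPLATE" > "$workdir/output/$SITENAME-$MODEL.cfg"
    done
done < "$DATAFILE"
IFS="$OLDIFS"

cat "$workdir/output/site1-model-A.cfg"
```

Note that this still relies on the data fields not containing sed-special characters; for real data you would combine it with the escaping discussed earlier in the thread.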
Thank you for all of your help on this ... I will keep plugging and learning as I go. You've given me much to look at - THANKS AGAIN!!!!