[SOLVED] Bash scripting problem: CSV list of filenames and actual files compare; Issue with plus symbol
Dear Bash geniuses,
I have tried to write a bash script that compares a list of filenames in a CSV with the files in a directory. Some of the filenames are like so:
With the first filename, it seems to hit string comparison issues: the script says the file doesn't exist (when I know it does). The comparison fails for that file in both directions, CSV against directory and directory against CSV.
I have tried to escape most of the special characters using sed (as seen below), but to no avail. I do need them to exist, however, as a concrete way of detecting that the file exists.
Any help would be most appreciated.
Code:
#!/bin/bash
#check to see if they have equal package numbers (in the csv and in the directory)
real_file_count=`ls ./packages | wc -l`
csv_file_count=`awk -F, 'NR>1 {print $1}' packages.csv | wc -l`
echo "Number of actual files existing is $real_file_count"
echo "Number of files in the csv is $csv_file_count"
if [ $real_file_count -eq $csv_file_count ]
then
    echo "Counts equal"
else
    echo "Warning: not the same number of packs"
fi
#check that all the files mentioned in csv file are in the directory
#check that all the directory files are in the csv
#make arrays of packs
i=0
for line in `awk -F, 'NR>1 {print $1}' packages.csv | sed 's/[\.]/\\./g' | sed 's/[\-]/\\-/g' | sed 's/[\_]/\\_/g' | sed 's/[\+]/\\+/g'`; do
    arrCSV[$i]=$line
    i=`expr $i + 1`
done
#reset i
i=0
for line in `ls ./packages | sed "s/.zip//g" | sed 's/[\.]/\\./g' | sed 's/[\-]/\\-/g' | sed 's/[\_]/\\_/g' | sed 's/[\+]/\\+/g'`; do
    arrPacks[$i]=$line
    i=`expr $i + 1`
done
echo "Array of CSV packs; compare with directory packs:"
#reset i
i=1
for pack in ${arrCSV[@]:0}; do
    echo "-$i:$pack-"
    i=`expr $i + 1`
    compare_result="No"
    for packCompare in ${arrPacks[@]:0}; do
        #echo "Does $packCompare equal $pack"
        if [ "$packCompare" == "$pack" ]; then
            compare_result="Yes"
        fi
    done
    echo $compare_result
    if [ "$compare_result" == "Yes" ]; then
        echo "Exists in directory"
    else
        echo "Not in directory!"
    fi
done
echo "Array of packs in the directory; compare with CSV packs:"
#echo ${arrPacks[@]:0}
i=1
for pack in ${arrPacks[@]:0}; do
    echo "-$i:$pack-"
    i=`expr $i + 1`
    compare_result="No"
    for packCompare in ${arrCSV[@]:0}; do
        if [ "$packCompare" == "$pack" ]; then
            compare_result="Yes"
        fi
    done
    echo $compare_result
    if [ "$compare_result" == "Yes" ]; then
        echo "Exists in CSV"
    else
        echo "Not in CSV!"
    fi
done
Actually, it works for me; I don't get any issue with the string comparison. I tested both your script and a simplified version, without all the escaping and sed stuff:
Code:
#!/bin/bash
#
arrCSV=( $(awk -F, 'NR > 1 {print $1}' packages.csv) )
arrPacks=( $(echo packages/*) )
for pack in ${arrCSV[@]:0}
do
    compare_result=No
    for packCompare in ${arrPacks[@]:0}
    do
        packCompare=${packCompare#packages/}
        packCompare=${packCompare%.zip}
        if [[ "$packCompare" == "$pack" ]]
        then
            compare_result=Yes
        fi
    done
    echo $pack: $compare_result
done
using your csv file and having a packages directory with the following files in it:
Code:
$ ls -1 packages
A-File-for-checking-8.3+.zip
Another_File_for_checking.zip
Yet_Another_File_for_checking.zip
Number of actual files existing is 3
Number of files in the csv is 3
Counts equal
Array of CSV packs; compare with directory packs:
-1:A-File-for-checking-8.3+-
Yes
Exists in directory
-2:Another_File_for_checking-
Yes
Exists in directory
-3:Yet_Another_File_for_checking-
Yes
Exists in directory
Array of packs in the directory; compare with CSV packs:
-1:A-File-for-checking-8.3+-
Yes
Exists in CSV
-2:Another_File_for_checking-
Yes
Exists in CSV
-3:Yet_Another_File_for_checking-
Yes
Exists in CSV
I don't see any issue here. Can you show the output where you did see issues with the first comparison? Basically, the script performs a plain string comparison, and special characters should not be an issue since the variables are enclosed in double quotes. It would be different if you used the regular expression matching operator =~, but that is not the case here.
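To illustrate the distinction drawn above, here is a small sketch in Python 3 (Python being the language this thread ends up using) rather than bash. The helper names `literal_equal` and `regex_equal` are made up for the illustration. Note that it uses a whole-string regex match, so it is not a one-to-one model of bash's `=~`; it only shows that `+` and `.` are ordinary characters in a plain comparison but not in a pattern.

```python
import re

def literal_equal(a, b):
    """Plain string comparison: '+' and '.' are ordinary characters."""
    return a == b

def regex_equal(pattern, text):
    """Treat the first argument as a regular expression for the whole text."""
    return re.fullmatch(pattern, text) is not None

# Example name taken from the directory listing in this thread.
name = "A-File-for-checking-8.3+"

print(literal_equal(name, name))           # True
print(regex_equal(name, name))             # False: '3+' means "one or more 3s"
print(regex_equal(re.escape(name), name))  # True once the name is escaped
```

If a name ever does have to be used as a pattern, escaping it first (as `re.escape` does here) restores the literal meaning.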
Thank you for your response.
It seems it was my fault. I find bash scripts hard to debug and read (not used to them!), and I thought it was a string comparison issue. I saw the real problem when I rewrote the script in Python: for some reason the file names and the CSV names were different. I had assumed they were the same, but one had a '_' instead of a '.' in the '8.3+' part, e.g. A-File-for-checking-8.3+.zip was named A-File-for-checking-8_3+.
Quote:
Assumptions lead to major fsck ups
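A mismatch like this one is easy to miss by eye. As a rough sketch (not part of the original scripts), the standard `difflib` module can flag names that almost match. The function name `suggest_near_misses`, the 0.9 cutoff and the sample lists are assumptions made for the illustration, including which side carried the underscore.

```python
import difflib

def suggest_near_misses(csv_names, dir_names, cutoff=0.9):
    """Map each CSV name with no exact match to its closest directory name."""
    suggestions = {}
    for name in csv_names:
        if name in dir_names:
            continue  # exact match, nothing to report
        close = difflib.get_close_matches(name, dir_names, n=1, cutoff=cutoff)
        if close:
            suggestions[name] = close
    return suggestions

# Hypothetical sample data modelled on the names in this thread.
csv_names = ["A-File-for-checking-8_3+", "Another_File_for_checking"]
dir_names = ["A-File-for-checking-8.3+", "Another_File_for_checking",
             "Yet_Another_File_for_checking"]

for csv_name, close in suggest_near_misses(csv_names, dir_names).items():
    print("%s not found; did you mean %s?" % (csv_name, close[0]))
```

A high cutoff keeps unrelated names out of the suggestions; it may need tuning for other naming schemes.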
Still, I created a new Python script (good to flex my Python mental muscles again). It requires an additional mapping file (to convert CSV names to the correct filenames).
Probably because of my experience and background, it took me two days of fiddling to get the bash script working, and only two hours to knock this up:
Code:
#! /usr/bin/python
# use /usr/local/bin/python if actual path
'''
Use chmod to set the file permissions on your script to make it executable. If the script is for you alone, type chmod 0700 scriptfilename.py; if you want to share it with others in your group but not let them edit it, use 0750 as the chmod value; if you want to give access to everyone else, use the value 0755. For help with the chmod command, type man chmod.
'''
#import for csv access
import csv
#import os so you get the os access methods (directory listing)
import os
#need additional csv for file conversions i.e. ONLY need two columns INDIR,INCSV
#used for injection into csv array with correct filename
files2namescsv="FilesMatchToName.csv"
injectCorrectFilenameDict = {}
try:
    test = open(files2namescsv, 'r').read() #find the file
except:
    # if the file cant be found if there is an error
    print("Could not open the csv file (1)")
else:
    with open(files2namescsv) as f:
        rows = csv.reader(f, delimiter=',')
        next(rows, None) #skip header
        for row in rows:
            injectCorrectFilenameDict.update({row[1]:row[0]}) #CSV NAME AS KEY, THEN POINT TO FILENAME AS VALUE (TWISTS DATA FOR INJECT)
filenameArr = []
csvfile = "packages.csv"
try:
    test = open(csvfile, 'r').read() #find the file
except:
    # if the file cant be found if there is an error
    print("Could not open the csv file (2)")
else:
    with open(csvfile) as f:
        rows = csv.reader(f, delimiter=',')
        next(rows, None) #skip header
        for row in rows:
            filenameArr.append(row[0])
#get directory listing
# set a directory you're interested in
workingDir = r'packages'
directoryArr = []
# get a list of all the files in the directory
try:
    names = os.listdir(workingDir)
except:
    print("Unable to list files in directory")
else:
    # look at each file and store its name without the extension
    for name in names:
        # splitext()[0] drops only the final extension, whereas
        # name.replace(fileExt, '') would also strip earlier occurrences
        directoryArr.append(os.path.splitext(name)[0])
#inject correct filename for CSV names
# print(...) with a single argument runs under both Python 2 and Python 3
print("INJECTION FOR CSV")
for index, item in enumerate(filenameArr):
    if item in injectCorrectFilenameDict:
        print("Convert %s to %s" % (item, injectCorrectFilenameDict[item]))
        filenameArr[index] = injectCorrectFilenameDict[item]
#check
for item in filenameArr:
    print("First for %s the item is %s" % (csvfile, item))
    #check if in directory array
    if item in directoryArr:
        print("IN DIR")
    else:
        print("NOT IN DIR")
#check directory file names
for item in directoryArr:
    print("Second method for dir %s the item is %s" % (workingDir, item))
    #check if in csv file
    if item in filenameArr:
        print("IN CSV")
    else:
        print("NOT IN CSV")
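The two membership checks at the end of the script could also be expressed with set differences. This is only a sketch of an alternative, not the poster's method; the function name `compare_names` and the sample lists are hypothetical.

```python
def compare_names(csv_names, dir_names):
    """Return (names only in the CSV, names only in the directory)."""
    csv_set = set(csv_names)
    dir_set = set(dir_names)
    missing_from_dir = sorted(csv_set - dir_set)  # listed in CSV, no file
    missing_from_csv = sorted(dir_set - csv_set)  # file exists, not listed
    return missing_from_dir, missing_from_csv

# Hypothetical sample data; in the script above these would be
# filenameArr and directoryArr.
csv_names = ["A-File-for-checking-8_3+", "Another_File_for_checking"]
dir_names = ["A-File-for-checking-8.3+", "Another_File_for_checking"]

not_in_dir, not_in_csv = compare_names(csv_names, dir_names)
print("NOT IN DIR: %s" % not_in_dir)
print("NOT IN CSV: %s" % not_in_csv)
```

Sets drop duplicates and ordering, so this fits when only presence or absence matters, as it does in this thread.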