LinuxQuestions.org
Old 08-05-2013, 08:09 AM   #1
gn000we
Member
 
Registered: Jan 2003
Location: UK
Distribution: Ubuntu
Posts: 33

Rep: Reputation: 16
Bash scripting problem: CSV list of filenames and actual files compare; Issue with plus symbol


Dear Bash geniuses,

I have tried to write a bash script that compares a list of filenames in a CSV file with the files in a directory. Some of the filenames in the CSV look like this:

Hello-this-is-an-example-filename.maybe.dots._8.3+
Another_file
A_test_file

And the files in the directory are:

Hello-this-is-an-example-filename.maybe.dots._8.3+.zip
Another_file.zip
A_test_file.zip


With the first filename, the script seems to have a string comparison problem: it reports that the file doesn't exist (when I know it does). The comparison fails for that file in both directions, CSV against directory and directory against CSV.

I have tried escaping most of the special characters using sed (as seen below), but to no avail. I do need the special characters to stay in the names, however, as a concrete way of detecting that the file exists.

Any help would be most appreciated.

Code:
#!/bin/bash


#check to see if they have equal package numbers (in the csv and in the directory)

real_file_count=`ls ./packages | wc -l`
csv_file_count=`awk -F, 'NR>1 {print $1}' packages.csv | wc -l`

echo "Number of actual files existing is $real_file_count"
echo "Number of files in the csv is $csv_file_count"

if [ $real_file_count -eq $csv_file_count ]
then
	echo "Counts equal"
else
	echo "Warning: not the same number of packs"
fi
#check that all the files mentioned in csv file are in the directory
#check that all the directory files are in the csv

#make arrays of packs
i=0

for line in `awk -F, 'NR>1 {print $1}' packages.csv | sed 's/[\.]/\\./g' | sed 's/[\-]/\\-/g' | sed 's/[\_]/\\_/g' | sed 's/[\+]/\\+/g'`; do
	arrCSV[$i]=$line
	i=`expr $i + 1`
done

#reset i
i=0

for line in `ls ./packages | sed "s/.zip//g" | sed 's/[\.]/\\./g' | sed 's/[\-]/\\-/g' | sed 's/[\_]/\\_/g' | sed 's/[\+]/\\+/g'`; do
	arrPacks[$i]=$line
	i=`expr $i + 1`
done


echo "Array of CSV packs; compare with directory packs:"

#reset i
i=1

for pack in ${arrCSV[@]:0}; do
	echo "-$i:$pack-"
	i=`expr $i + 1`
	compare_result="No"
	for packCompare in ${arrPacks[@]:0}; do
		#echo "Does $packCompare equal $pack"
		if [ "$packCompare" == "$pack" ]; then
			compare_result="Yes"
		fi
	done
	echo $compare_result
	if [ "$compare_result" == "Yes" ]; then
		echo "Exists in directory"
	else
		echo "Not in directory!"
	fi
done

echo "Array of packs in the directory; compare with CSV packs:"
#echo ${arrPacks[@]:0}

i=1

for pack in ${arrPacks[@]:0}; do
	echo "-$i:$pack-"
	i=`expr $i + 1`
	compare_result="No"
	for packCompare in ${arrCSV[@]:0}; do
		if [ "$packCompare" == "$pack" ]; then
			compare_result="Yes"
		fi
	done
	echo $compare_result
	if [ "$compare_result" == "Yes" ]; then
		echo "Exists in CSV"
	else
		echo "Not in CSV!"
	fi
done
Csv file:

Quote:
Package,Submission date,count,enabled,active
A-File-for-checking-8.3+,2011-09-29 22:56:21.70,557,557,True
Another_File_for_checking,2011-09-29 22:57:44.67,6,6,True
Yet_Another_File_for_checking,2011-09-29 22:57:45.93,6,6,True

Last edited by gn000we; 08-05-2013 at 06:58 PM. Reason: needed clearer title and clarify names of files
 
Old 08-05-2013, 10:10 AM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Actually it works for me; I don't get any issue with the string comparison. I tested both your script and a simplified version, without all the escaping and sed stuff:
Code:
#!/bin/bash
#
arrCSV=( $(awk -F, 'NR > 1 {print $1}' packages.csv) )
arrPacks=( $(echo packages/*) )

for pack in ${arrCSV[@]:0}
do
  compare_result=No
  for packCompare in ${arrPacks[@]:0}
  do
    packCompare=${packCompare#packages/}
    packCompare=${packCompare%.zip}
    if [[ "$packCompare" == "$pack" ]]
    then
      compare_result=Yes
    fi
  done
  echo $pack: $compare_result
done
using your csv file and having a packages directory with the following files in it:
Code:
$ ls -1 packages
A-File-for-checking-8.3+.zip
Another_File_for_checking.zip
Yet_Another_File_for_checking.zip
Here is the output from my script
Code:
A-File-for-checking-8.3+: Yes
Another_File_for_checking: Yes
Yet_Another_File_for_checking: Yes
and this is the output from yours
Code:
Number of actual files existing is 3
Number of files in the csv is 3
Counts equal
Array of CSV packs; compare with directory packs:
-1:A-File-for-checking-8.3+-
Yes
Exists in directory
-2:Another_File_for_checking-
Yes
Exists in directory
-3:Yet_Another_File_for_checking-
Yes
Exists in directory
Array of packs in the directory; compare with CSV packs:
-1:A-File-for-checking-8.3+-
Yes
Exists in CSV
-2:Another_File_for_checking-
Yes
Exists in CSV
-3:Yet_Another_File_for_checking-
Yes
Exists in CSV
I don't see any issue here. Can you show the output where you saw problems with the first comparison? Basically the script performs a plain string comparison, and special characters should not be an issue as long as the variables are enclosed in double quotes. It would be different if you used the regular expression matching operator =~, but that is not the case here.
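To illustrate the distinction between plain comparison and regular expression matching, here is a minimal sketch. It is written in Python rather than Bash purely for illustration, and the three helper function names are hypothetical. It shows that a '+' in a name is harmless under plain equality, but changes meaning once the name is used as a pattern.

```python
import re

def plain_equal(name, candidate):
    # Plain string comparison: every character, including '.' and '+', is literal.
    return name == candidate

def regex_match_unescaped(name, candidate):
    # The name is used directly as a pattern: '.' matches any character and
    # '+' means "one or more of the previous item".
    return re.fullmatch(name, candidate) is not None

def regex_match_escaped(name, candidate):
    # re.escape() turns every special character back into a literal.
    return re.fullmatch(re.escape(name), candidate) is not None

name = "A-File-for-checking-8.3+"
print(plain_equal(name, name))            # True
print(regex_match_unescaped(name, name))  # False: '3+' does not match a literal '+'
print(regex_match_escaped(name, name))    # True
```

Under these assumptions, escaping is only needed when a name is fed to something that interprets it as a pattern; a quoted equality test needs none.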
 
Old 08-05-2013, 06:57 PM   #3
gn000we
Member
 
Registered: Jan 2003
Location: UK
Distribution: Ubuntu
Posts: 33

Original Poster
Rep: Reputation: 16
Colucix,

Thank you for your response.
It seems it was my fault. I find bash scripts hard to debug and read (I'm not used to them!), so I thought it was a string comparison issue. I only saw the real problem when I rewrote the script in Python: the names of the files and the names in the CSV were different. I had assumed they were the same, but the filenames had a '_' rather than a '.' in the '8.3+' part (e.g. the file I expected to be A-File-for-checking-8.3+.zip was actually named A-File-for-checking-8_3+).
Quote:
Assumptions lead to major fsck ups
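A near-miss like the one described above ('_' on disk where the CSV has '.') can be surfaced automatically instead of by eye. The following is only a sketch under assumptions: the function name suggest_near_misses, the cutoff value, and the sample lists are all hypothetical, and it uses the standard library's difflib.get_close_matches to propose similar directory names for every CSV name that has no exact match.

```python
import difflib

def suggest_near_misses(csv_names, dir_names, cutoff=0.8):
    """Map each CSV name with no exact match to similar directory names."""
    dir_set = set(dir_names)
    suggestions = {}
    for name in csv_names:
        if name not in dir_set:
            # Returns up to n names whose similarity ratio is at least cutoff.
            suggestions[name] = difflib.get_close_matches(
                name, dir_names, n=3, cutoff=cutoff)
    return suggestions

# Hypothetical sample data modelled on the names in this thread.
csv_names = ["A-File-for-checking-8.3+", "Another_File_for_checking"]
dir_names = ["A-File-for-checking-8_3+", "Another_File_for_checking"]

for missing, candidates in suggest_near_misses(csv_names, dir_names).items():
    print("%s not found; similar names: %s" % (missing, candidates))
```

The cutoff is a judgment call: a lower value reports more candidates, at the cost of more unrelated suggestions.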
Still, I created a new Python script (good to flex my Python mental muscles again). It requires an additional mapping file (to convert CSV names to the correct filenames).

Probably because of my experience and background, it took me two days of fiddling to get the bash script working, whereas it only took me two hours to knock this up:

Code:
#! /usr/bin/python
# use /usr/local/bin/python if actual path
'''
Use chmod to set the file permissions on your script to make it executable. If the script is for you alone, type chmod 0700 scriptfilename.py; if you want to share it with others in your group but not let them edit it, use 0750 as the chmod value; if you want to give access to everyone else, use the value 0755. For help with the chmod command, type man chmod.
'''
#import for csv access
import csv
#import os so you get the os access methods (directory listing)
import os

#need additional csv for file conversions i.e. ONLY need two columns INDIR,INCSV
#used for injection into csv array with correct filename
files2namescsv="FilesMatchToName.csv"
injectCorrectFilenameDict = {}

try:
	# open the mapping file; IOError covers a missing or unreadable file
	with open(files2namescsv) as f:
		rows = csv.reader(f, delimiter=',')
		next(rows, None) #skip header
		for row in rows:
			injectCorrectFilenameDict[row[1]] = row[0] #CSV name as key, directory filename as value
except IOError:
	# the file can't be found or can't be read
	print("Could not open the csv file (1)")

filenameArr = []
csvfile = "packages.csv"

try:
	# open the package list; IOError covers a missing or unreadable file
	with open(csvfile) as f:
		rows = csv.reader(f, delimiter=',')
		next(rows, None) #skip header
		for row in rows:
			filenameArr.append(row[0]) #first column holds the package name
except IOError:
	# the file can't be found or can't be read
	print("Could not open the csv file (2)")

#get directory listing

# set a directory you're interested in
workingDir = r'packages'
directoryArr = []
# get a list of all the files in the directory
try:
	names = os.listdir(workingDir)
except OSError:
	print("Unable to list files in directory")
else:
	# store each file name without its final extension
	for name in names:
		# splitext()[0] drops only the last extension, e.g. "8.3+.zip" -> "8.3+"
		directoryArr.append(os.path.splitext(name)[0])

#inject correct filename for CSV names
print("INJECTION FOR CSV")
for index, item in enumerate(filenameArr):
	if item in injectCorrectFilenameDict:
		print("Convert %s to %s" % (item, injectCorrectFilenameDict[item]))
		filenameArr[index] = injectCorrectFilenameDict[item]

#check that every CSV name is in the directory
for item in filenameArr:
	print("First for %s the item is %s" % (csvfile, item))
	#check if in directory array
	if item in directoryArr:
		print("IN DIR")
	else:
		print("NOT IN DIR")

#check that every directory file name is in the CSV
for item in directoryArr:
	print("Second method for dir %s the item is %s" % (workingDir, item))
	#check if in csv file
	if item in filenameArr:
		print("IN CSV")
	else:
		print("NOT IN CSV")
Thanks for your help and time put into it.
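For readers who want a more compact form of the two-way check in the script above, here is a hedged sketch using set differences. It is not the poster's method: the function name compare_names and the sample data are hypothetical, and it assumes every file in the directory carries an extension such as .zip (a name like "8.3+" with no extension would lose its ".3+" part).

```python
import os

def compare_names(csv_names, dir_files, csv_to_file=None):
    """Return (missing_from_dir, missing_from_csv) as sorted lists."""
    csv_to_file = csv_to_file or {}
    # Translate CSV names to on-disk names where a mapping entry exists.
    expected = {csv_to_file.get(name, name) for name in csv_names}
    # Strip only the final extension from each directory entry.
    present = {os.path.splitext(f)[0] for f in dir_files}
    return sorted(expected - present), sorted(present - expected)

# Hypothetical sample data modelled on the names in this thread.
csv_names = ["A-File-for-checking-8.3+",
             "Another_File_for_checking",
             "Yet_Another_File_for_checking"]
dir_files = ["A-File-for-checking-8_3+.zip",
             "Another_File_for_checking.zip",
             "Extra_file.zip"]
mapping = {"A-File-for-checking-8.3+": "A-File-for-checking-8_3+"}

missing_from_dir, missing_from_csv = compare_names(csv_names, dir_files, mapping)
print("Not in directory: %s" % missing_from_dir)
print("Not in CSV: %s" % missing_from_csv)
```

Sets make each membership test constant time on average and report only the differences, which is easier to scan than one line per file when the lists are long.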
 
  



Tags
bash scripting, comparison, string, symbol




All times are GMT -5. The time now is 04:02 AM.
