LinuxQuestions.org


bruno buys 04-14-2006 04:46 PM

Shell script to compare blocks of strings?
 
I am trying to figure out how I can compare a block of strings and discard repeated ones, in a bash script.
Here's an example:

I am storing names of files inside a var:

FILES='file1.jpg file1.jpg file2.jpg file3.jpg file3.jpg'

I need a block of code that will look at this list and build a new one, discarding the repetitions of file1.jpg and file3.jpg, resulting in:

FILES2='file1.jpg file2.jpg file3.jpg'


Thanks for any input!

Dark_Helmet 04-14-2006 06:03 PM

I'm not at my Linux box. So I haven't tested this code (or verified its syntax). User beware...

Code:

#!/bin/bash

FILES='file1.jpg file1.jpg file2.jpg file3.jpg file3.jpg'
FILES_NEW=""  # Contains the no-repeat list
duplicate=0    # Set to 1 when a duplicate filename found - initialize to 0

# Loop through the original list of files to add each filename
# one at a time as we go through the list
for filename in ${FILES}
do
  # If our no-repeat list is empty, include the filename in the list,
  # otherwise, check for a repeat
  if [ -z "${FILES_NEW}" ] ; then
    FILES_NEW=${filename}
  else
    # Check for a repeat by using a nested loop. This one proceeds
    # through the no-repeat list and compares filename against each
    # item in the no-repeat list. If a match, set the duplicate flag to 1
    for existing_filename in ${FILES_NEW}
    do
      if [ "${existing_filename}" = "${filename}" ] ; then
        duplicate=1
      fi
    done

    # If the duplicate flag is not equal to 1, no filename repeat
    # was found. Add the filename to our list.
    if [ ${duplicate} -ne 1 ] ; then
      FILES_NEW="${FILES_NEW} ${filename}"
    fi

    # reset the duplicate flag to 0.
    duplicate=0
  fi
done

# Assign the no-repeat list to the original variable.
FILES="${FILES_NEW}"


bruno buys 04-15-2006 07:10 AM

Hey Dark_Helmet!
The code simply works, many thanks, friend!
This is going to simplify my life a great deal...

Gins 04-15-2006 08:28 AM

FILES='file1.jpg file1.jpg file2.jpg file3.jpg file3.jpg'

I need a block of code that will look at this list and build a new one, discarding the repetitions of file1.jpg and file3.jpg, resulting in:

FILES2='file1.jpg file2.jpg file3.jpg'
----------------------------------------------------------------------------

You want to store some files in a variable. The name of the variable is 'FILES'.

Then you want to create a new variable. The name of that variable is 'FILES2'.

In the new variable, you will eliminate some files.

However, Dark Helmet has named it 'FILES_NEW'. The name won't make a big difference.

Please tell me if my understanding is incorrect.
----------------------------------------------------------------------------------------------

if [ -z "${FILES_NEW}" ] ; then
FILES_NEW=${filename}
What is '-z' doing here?

Dark_Helmet 04-15-2006 10:41 AM

The naming of the variable doesn't make a lot of difference. It can be "Files_New" or "Files2" -- just make sure the entire script is changed to be consistent.

The "-z" is a test for an empty string (man test for details). Basically, the test evaluates to true if there is nothing in the string. If the string has anything in it (even spaces), then the -z test will evaluate to false.

The idea is, if the string has nothing in it, there's no point to look through the list for duplicates; there's nothing there -- just add the filename and continue.
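
A minimal sketch of that -z check, assuming bash. The variable has to be expanded and quoted (as in "${list}") for the test to look at its contents; a bare word is just a non-empty literal string, so -z on it is always false. The is_empty helper and the variable names below are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: prints "empty" or "not empty" for its first argument.
is_empty() {
  if [ -z "$1" ] ; then
    echo "empty"
  else
    echo "not empty"
  fi
}

list=""
only_spaces="   "

r_empty=$(is_empty "${list}")          # nothing in the string
r_spaces=$(is_empty "${only_spaces}")  # spaces count as content
r_literal=$(is_empty list)             # the literal word "list", not the variable

echo "empty variable: ${r_empty}"
echo "only spaces:    ${r_spaces}"
echo "bare word:      ${r_literal}"
```

The third call is the case to watch for: without the $ and the quotes, the test never sees the variable at all.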

Gins 04-15-2006 11:05 AM

Dark Helmet

It seems you should not use the word 'test' when writing a programme. It is a standard word in scripting.

Am I wrong?

I looked it up in the man pages, as you suggested.
------------------------------------------------------------------------------------------
FILES='file1.jpg file1.jpg file2.jpg file3.jpg file3.jpg'
Recently, or rather almost 24 hours ago, I had a big problem understanding the word 'backticks'.

In the above, are they simple single quotes or backticks?

Dark_Helmet 04-15-2006 11:23 AM

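"Test" is accurate. Examining the /bin directory (or /usr/bin, depending on the system) causes two things to pop out:
1. An executable file named "test"
2. A file named "[" which, depending on the system, is either a symbolic link to the "test" executable or a separate executable that does the same job

In other words, in shell scripts, when you see "if [ ..." you are actually telling the script to run "test" under the name "[". (Bash also has "test" and "[" as builtins and uses those in preference to the files on disk; for the checks used in this thread the syntax is the same.) So saying the "test fails" or the like is accurate. Though, there are some who prefer different ways to reference things.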
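
A quick way to check how "[" and "test" resolve on a given machine, sketched here assuming bash (the "type" builtin with -t and -a is bash-specific). The variable names are only for illustration, and the paths printed by "type -a" will differ from system to system.

```shell
#!/bin/bash
# "type -t" prints how bash resolves a name: builtin, file, alias, keyword or function.
kind_bracket=$(type -t [)
kind_test=$(type -t test)

echo "[    is a ${kind_bracket}"
echo "test is a ${kind_test}"

# "type -a" also lists any external copies found in PATH; the paths vary by system.
type -a [ test
```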

Back ticks and single quotes are very different. In the script above, they are single quotes.

Single quotes ('): a literal string - the string is typed in as-is, and the shell does no interpretation on its contents

Backticks (`): The string contains a command - the shell executes the command and substitutes the output of the command in place of the command itself in the script. In other words
VAR1=`date`
Would cause the shell to execute the date command, and substitute the output for the original command. That turns the above line into something like:
VAR1=Saturday April 15 ...
Then the shell performs the variable assignment.
NOTE: The modern way of handling command substitution is to abandon backticks altogether. I use the newer form: $( ... )

Double quotes: a semi-literal string - variable substitution is performed on the string. If the shell encounters any variable referenced with a $, it substitutes the variable's value. It also allows command substitution (in other words you can use backticks within a double-quote string).
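
The three kinds of quoting side by side, as a small sketch assuming bash. The variable names are arbitrary.

```shell
#!/bin/bash
name="world"

single='Hello $name'          # single quotes: kept exactly as typed
double="Hello $name"          # double quotes: $name is replaced by its value
old_style=`echo hi`           # backticks: command output is substituted
new_style=$(echo hi)          # $( ... ): same result, newer form
inside="Say: $(echo hi)"      # command substitution also works inside double quotes

echo "${single}"
echo "${double}"
echo "${old_style} / ${new_style}"
echo "${inside}"
```

One practical reason to prefer $( ... ) is that it nests without extra escaping, which backticks do not.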

Sorry if that doesn't clear it up or if you were familiar with it already.

Gins 04-15-2006 12:49 PM

On my system, I have a group of files named temp1, temp2, temp3, temp4, temp5, temp6 ... temp31.

I just want to test this program on those files.

--------------------------------------------------------------------



#!/bin/bash

FILES='temp1 temp2 temp3 temp4 temp5 temp6 temp7'
FILES_NEW="" # Contains the no-repeat list
duplicate=0 # Set to 1 when a duplicate filename found - initialize to 0

# Loop through the original list of files to add each filename
# one at a time as we go through the list
for filename in ${FILES}
do
  # If our no-repeat list is empty, include the filename in the list,
  # otherwise, check for a repeat
  if [ -z "${FILES_NEW}" ] ; then
    FILES_NEW=${filename}
  else
    # Check for a repeat by using a nested loop. This one proceeds
    # through the no-repeat list and compares filename against each
    # item in the no-repeat list. If a match, set the duplicate flag to 1
    for existing_filename in ${FILES_NEW}
    do
      if [ "${existing_filename}" = "${filename}" ] ; then
        duplicate=1
      fi
    done

    # If the duplicate flag is not equal to 1, no filename repeat
    # was found. Add the filename to our list.
    if [ ${duplicate} -ne 1 ] ; then
      FILES_NEW="${FILES_NEW} ${filename}"
    fi

    # reset the duplicate flag to 0.
    duplicate=0
  fi
done

# Assign the no-repeat list to the original variable.
FILES="${FILES_NEW}"

------------------------------------------------
The name of the file is 'unwanted7'.


[nissanka@c83-250-104-214 ~]$ chmod 755 unwanted7
[nissanka@c83-250-104-214 ~]$ ./unwanted7

Have I made a mistake?

Dark_Helmet 04-15-2006 01:06 PM

Well, I don't know what you're expecting to happen.

The script I wrote doesn't *DO* anything except to manipulate variables inside the shell script. You need to add more code to have the script actually do something with that information. As it is, the script doesn't change anything in the filesystem nor does it even echo any output to the terminal.

Just to make sure we're on the same page, the original poster had a problem: he wanted to remove multiple copies of text within a shell script variable. His FILES variable had a repeat of "file1.jpg" for instance. The code above just removes the repeat from the variable inside the script. Looking at your code above, your FILES variable has no repeats in it. So the script won't change anything. FILES will have the same value after the code as it did in the beginning.
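
To actually see something on the terminal, the same nested-loop idea can be wrapped in a function and echoed before and after. This is only a sketch, assuming bash and filenames without spaces or wildcard characters; the function name unique_words is made up for the example.

```shell
#!/bin/bash
# Hypothetical helper: prints its arguments with repeats removed, order kept.
unique_words() {
  local result="" word existing duplicate
  for word in "$@" ; do
    duplicate=0
    # Compare the candidate against every word already accepted.
    for existing in ${result} ; do
      if [ "${existing}" = "${word}" ] ; then
        duplicate=1
      fi
    done
    if [ "${duplicate}" -ne 1 ] ; then
      # Add a separating space only when the list already has content.
      result="${result:+${result} }${word}"
    fi
  done
  echo "${result}"
}

FILES='file1.jpg file1.jpg file2.jpg file3.jpg file3.jpg'
echo "before: ${FILES}"

# ${FILES} is left unquoted on purpose so the shell splits it into words.
FILES=$(unique_words ${FILES})
echo "after:  ${FILES}"
```

With a list that has no repeats, such as temp1 through temp7, the before and after lines come out identical.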

Gins 04-15-2006 02:07 PM

Thank you Dark Helmet

I am learning with help from you all.

I will post more problems in order to learn Scripting.

bruno buys 04-15-2006 02:16 PM

Hope this clears things up: Gins, there's no point in running this piece of code against a group of files or dirs under the same dir, because the filesystem won't let you have multiple identical filenames. So, you'll never have the problem the code was written to solve.
The way I am using it is cat'ing a file with references to files, like this:

FILES=`grep '\.jpg' index.htm`

which yields

file1.jpg
file1.jpg
file2.jpg
file3.jpg
file3.jpg

because the HTML has several references to the same file.


Then, if you guys are curious, I wget those files.
And, as I said before, the code works perfectly.
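
For this particular use, the extraction and the de-duplication can also be done in one pipeline. This is a different technique from the nested loop above, not a replacement for it: a rough sketch assuming a grep that supports -o (GNU and BSD grep do), filenames made only of letters, digits, dots, underscores and hyphens, and a made-up sample page standing in for index.htm.

```shell
#!/bin/bash
# Sample page written to a temporary file; the HTML is made up for this example.
page=$(mktemp)
cat > "${page}" <<'EOF'
<a href="file1.jpg"><img src="file1.jpg"></a>
<img src="file2.jpg">
<a href="file3.jpg"><img src="file3.jpg"></a>
EOF

# grep -o prints each match on its own line instead of the whole line;
# sort -u sorts the names and drops repeats (original order is not kept).
FILES=$(grep -o '[A-Za-z0-9_.-]*\.jpg' "${page}" | sort -u | tr '\n' ' ')
FILES=${FILES% }   # drop the trailing space left by tr

echo "${FILES}"
rm -f "${page}"
```

Because sort -u reorders the names, the loop version is still the one to use when the original order matters.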


All times are GMT -5.