
psynce_friction 11-18-2009 07:36 AM

shell script question
Hello all,

I have a tiny script which I am trying to get working. It takes a list of file names, e.g. XXXXXXXXXXXXXX_right, XXXXXXXXXXXXXX_left.

The script reads through the list, strips the _left and _right, identifies the unique names, and prints a list of those unique names with the _left or _right reattached.
To do this I have:

for i in $(cat input.file|sed 's/_[a-z]*//' |sort |uniq -c |grep -v '2 ');
do grep $i input.file ;done

This should produce a list of approx. 14600 file names; however, what I actually get is a list of 223512606 file names.

Can anyone see where I have gone wrong?

Thanks in advance

indiajoe 11-18-2009 08:43 AM

Hint: In effect you ran the loop over all 14600 files 14600 times. The number of lines you got is roughly the square of the number of files.
The problem is in how the piped output is handed to the loop.
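
A minimal sketch of what the hint seems to be getting at, assuming the cause is the shell splitting the uniq -c output into separate words (the two sample lines are made up in the format uniq -c prints):

```shell
# Count how many words the for loop sees when it is fed two
# lines of `uniq -c`-style output (count, space, name).
words=0
for i in $(printf '      1 FE4758201C23O2\n      1 FE4758201CV20O\n'); do
    words=$((words + 1))
    echo "loop word: $i"
done
echo "$words words from 2 lines"
```

The loop runs four times for two lines, because each count becomes its own word. A grep for the bare word 1 would then match any line of input.file that contains the character 1, which would explain the output growing to roughly lines times names.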

Disillusionist 11-18-2009 08:49 AM

Have you thought about using a while loop?


sed 's/_[a-z]*//' input.file |sort |uniq -c |grep -v '2 ' |while read count name
do grep "$name" input.file
done

indiajoe 11-18-2009 09:00 AM

Just check whether the following script will work.

for i in $(cat input.file|sed 's/_[a-z]*//' |sort |uniq -c |grep -v '2 '| cut -b 9-); do grep $i input.file ;done

Adjust the 9 in 9- to the position of the first character of the file name, so that the leading whitespace and the count are cut off.
Somebody please point out a neater way of doing the above step without all the piping. There should be a better way.
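
One possible neater way, offered as a sketch rather than a definitive answer: let awk pick the names whose count is 1, then fetch the matching lines with a single grep instead of a loop. The sample file contents and the variable names (input, names, unpaired) are made up for illustration; the thread's real file is input.file.

```shell
# Hypothetical sample data standing in for input.file.
input=$(mktemp)
names=$(mktemp)
printf '%s\n' AAA_left AAA_right BBB_left CCC_right > "$input"

# Strip the suffix, count each base name, and keep only the
# names that occur exactly once (field 1 is the count).
sed 's/_[a-z]*$//' "$input" | sort | uniq -c |
    awk '$1 == 1 { print $2 }' > "$names"

# One grep pass: -F treats each name as a fixed string,
# -f reads the list of names from the file.
unpaired=$(grep -F -f "$names" "$input")
echo "$unpaired"
rm -f "$input" "$names"
```

Because awk splits on whitespace itself, no column position has to be guessed for cut. Note that grep -F still matches a name anywhere in a line, as the original loop does.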

psynce_friction 11-18-2009 09:18 AM

Hi indiajoe

I ran your script and it says:
zsh: argument list too long: grep

Doing some error checking, I ran:

head step2b.out|sed 's/_[a-z]*//' |sort |uniq -c |sed 's/\t1//' |grep -v '2 '

which produces this output:

1 FE4758201C23O2
1 FE4758201CV20O
1 FE4758201D3YER
1 FE4758201DCUH9
1 FE4758201DN2UI
1 FE4758201EFEIT
1 FE4758201EKY4O
1 FE4758201ENBE3
1 FE4758201EQESX
1 FE4758201EUG4B

Is the space and the 1 before the file name causing a problem? I have tried to add another grep command to remove it but can't get it to work.
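
A small sketch of one way to remove that prefix, assuming each line is optional padding, a count, one space, then the name (the padding width in the sample line is a guess):

```shell
# One line in the format shown above; leading spaces are assumed.
line='      1 FE4758201C23O2'

# Anchor at the start of the line so only the padding and count go,
# never anything inside the name itself.
name=$(printf '%s\n' "$line" | sed 's/^ *[0-9][0-9]* //')
echo "$name"
```

Anchoring with ^ keeps the substitution from touching a digit followed by a space further along the line.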

psynce_friction 11-18-2009 09:28 AM

Yep, that was it. Thanks all, much appreciated.

indiajoe 11-18-2009 09:48 AM

Sorry for the mistake.
I have corrected it. Please post back if you get a better method.

ghostdog74 11-18-2009 10:12 AM

What does your input file look like? What should your output be?

Disillusionist 11-18-2009 12:32 PM

Use sed to strip the count:


sed 's/[1-9] //'

Or use the -u option on sort. It prints no count, so the cut is no longer needed:

for i in $(cat input.file|sed 's/_[a-z]*//' |sort -u); do grep $i input.file ;done

psynce_friction 11-20-2009 09:27 AM

As I mentioned, there was some space and a 1 followed by a space before the file name, i.e. -----1-filename, and the script was being run for each word in the line. I included sed 's/ 1 //' to get rid of this and it worked fine.

for i in $(cat|sed 's/_[a-z]*//' |sort |uniq -c |sed 's/ 1 //' |grep -v '2 '); do grep $i ;done



Disillusionist 11-21-2009 05:26 AM

The spaces followed by the number and then a space are there because you are using uniq -c.

If you removed the uniq -c and used sort -u, you would not need to clean up the output afterwards.
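
One thing worth checking before making that swap, sketched here with made-up base names: sort -u keeps one copy of every name, paired or not, whereas the uniq -c plus grep -v '2 ' filter only kept names with a count of 1. If only the unpaired names are wanted (an assumption about the intent), sort | uniq -u is the closer match.

```shell
# Made-up base names: AAA occurs twice (a pair), BBB only once.
names=$(printf '%s\n' AAA AAA BBB)

# sort -u: every distinct name, including the paired one.
all_distinct=$(printf '%s\n' "$names" | sort -u)

# sort | uniq -u: only names that occur exactly once.
only_once=$(printf '%s\n' "$names" | sort | uniq -u)

echo "sort -u:"
echo "$all_distinct"
echo "sort | uniq -u:"
echo "$only_once"
```

Neither form prints a count column, so in both cases there is nothing to strip before the names reach grep.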

All times are GMT -5.