LinuxQuestions.org
10-17-2007, 10:03 PM   #1
BrianK
Senior Member
 
Registered: Mar 2002
Location: Los Angeles, CA
Distribution: Debian, Ubuntu
Posts: 1,334

Rep: Reputation: 51
tough one: how do you find patterns/sequences in file names?


say you have a dir with files:

file.1.foo
file.2.foo
file.3.foo
bar.100.gah
bar.101.gah
bar.102.gah
someFile1
otherThing2


... how would you go about finding that there are 2 sequences there and two files that are not part of any sequence? i.e.:

seq: file.#.foo 1-3
seq: bar.#.gah 100-102
sngl: someFile1
sngl: otherThing2

I have a couple really complex examples, but it doesn't seem like it should take over 300 lines of code to do this. Does anyone know of a good way to find this info? Some super cool regex or something?

Language doesn't matter much as long as it's nice & tidy. If I had my druthers, the answer would be done in Python, but I can translate if need be.
 
10-17-2007, 10:20 PM   #2
ray_80
Member
 
Registered: Oct 2007
Posts: 75

Rep: Reputation: 15
Have you read the man page for grep?

man grep


If I understand your question correctly, then I believe that grep is what you are looking for.

Regards
 
10-17-2007, 10:36 PM   #3
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 63
What about the case where you have something like this:
Code:
file.1.middle.3.end
file.1.middle.4.end
file.1.middle.5.end
file.2.middle.6.end
file.3.middle.7.end
What desired output do you have for this list?

Also, how would you handle something like this (where a number in a range is missing)?
Code:
file.1.end
file.2.end
file.5.end
file.6.end
 
10-17-2007, 10:47 PM   #4
angrybanana
Member
 
Registered: Oct 2003
Distribution: Archlinux
Posts: 147

Rep: Reputation: 21
I really hate Perl, and I really suck with Perl.
That being said, here's my solution.

Code:
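use strict;
use warnings;

# %files maps a pattern (digit runs replaced by '#') to [count, first, last].
my (%files, @single);
while (<>) {
        chomp;
        # Capture the last run of digits before the substitution replaces it.
        my ($num) = /([0-9]+)[^0-9]*$/;
        s/[0-9]+/#/g;
        my $info = $files{$_} ||= [0];
        $info->[0]++;
        if (defined $info->[1]) {
                $info->[2] = $num;      # last number seen so far (input is assumed sorted)
        } else {
                $info->[1] = $num;      # first number seen
        }
}
# Sort the keys so the output order is stable between runs.
for (sort keys %files) {
        my ($count, $first, $last) = @{ $files{$_} };
        if ($count > 1) {
                print "seq: $_ $first-$last\n";
        } else {
                # Put the number back into a name that matched only once.
                (my $name = $_) =~ s/#/$first/;
                push @single, "sngl: $name\n";
        }
}
print for @single;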
I think this is what you wanted, though this is very basic and will have issues with some of the things matthewg42 mentioned (more than one number, missing numbers in a set). Oh, and it will garble the name if '#' is part of the name :/. I know this is a very sloppy job, but it gets the job done to some extent. If you need a more complicated solution, provide more complicated examples.

Code:
$ cat list
file.1.foo
file.2.foo
file.3.foo
bar.100.gah
bar.101.gah
bar.102.gah
someFile1
otherThing2

$ perl findpattern.pl list
seq: bar.#.gah 100-102
seq: file.#.foo 1-3
sngl: otherThing2
sngl: someFile1

Last edited by angrybanana; 10-17-2007 at 10:50 PM.
 
10-18-2007, 01:43 AM   #5
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
Just one way out of many, with GNU awk:
Code:
ls -1 | awk 'BEGIN{FS="."}
{
     x[$1]++                 # number of files sharing this prefix
     y[$1] = y[$1]","$2      # comma-separated list of the second fields
}
END{
     for(i in x) {
          if ( x[i] > 1 ){
               sub(/^,/,"",y[i])
               split(y[i],calc,",")
               b=asort(calc,dest)     # asort() is a gawk extension
               print "seq: " i,dest[1]"-"dest[b]
          }
          else{
               print i
          }
     }
}
'
 
10-18-2007, 12:33 PM   #6
BrianK
Senior Member
 
Registered: Mar 2002
Location: Los Angeles, CA
Distribution: Debian, Ubuntu
Posts: 1,334

Original Poster
Rep: Reputation: 51
Wow, thanks guys! I wasn't expecting to get answers on this one.

In answer to matthewg42's questions:
All the file.1's are one group; file.2 & file.3 are singles.
If a number is missing, you can consider it two sequences. If the report is on the smart side, it would report something like:

foo.#.end 1-3,5-6

but

foo.#.end 1-3
foo.#.end 5-6


is also acceptable
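
A minimal Python sketch of the "smart" report format above, assuming the numbers for one pattern have already been pulled out as integers. collapse_ranges is a hypothetical helper name, not something posted in this thread:

```python
def collapse_ranges(numbers):
    """Collapse integers into a string such as '1-3,5-6'."""
    parts = []
    start = prev = None
    for n in sorted(set(numbers)):
        if prev is not None and n == prev + 1:
            prev = n            # still inside the current run
            continue
        if start is not None:
            parts.append((start, prev))
        start = prev = n        # a gap (or the first number) starts a new run
    if start is not None:
        parts.append((start, prev))
    # A run of length one is printed as a bare number.
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in parts)


print("foo.#.end", collapse_ranges([1, 2, 3, 5, 6]))
```

Printing one line per tuple in parts instead of joining them would give the second, two-line form.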

'#' is arbitrary - it could be anything.. '#' makes sense. '%04d' makes a lot of sense.
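
One way the '%04d' idea might look in Python, as a rough sketch. It assumes only the last run of digits in a name is the frame number, and that a zero-padded sequence uses the same width throughout. placeholder and to_pattern are made-up names for illustration:

```python
import re


def placeholder(digit_strings):
    """Return '%0Nd' if the numbers are zero-padded to one width N, else '%d'."""
    widths = {len(s) for s in digit_strings}
    padded = any(len(s) > 1 and s.startswith("0") for s in digit_strings)
    if padded and len(widths) == 1:
        return f"%0{widths.pop()}d"
    return "%d"


def to_pattern(name, digit_strings):
    """Replace the last run of digits in name with a printf-style placeholder."""
    # A function is used as the replacement so the result is inserted literally.
    return re.sub(r"\d+(?=\D*$)", lambda m: placeholder(digit_strings), name, count=1)


pattern = to_pattern("shot.0001.exr", ["0001", "0002", "0003"])
print(pattern)          # the pattern for the whole sequence
print(pattern % 2)      # rebuilds one file name from the pattern
```

The nice part of a printf-style placeholder is that the pattern can rebuild any file name in the sequence with the % operator, though a name that already contains a literal '%' would need escaping first.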

but yeah... these give me great starting points. Thanks!
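
For reference, here is a hedged Python sketch of the grouping rule described in this post: a sequence is a set of names that differ in exactly one run of digits, so the file.1.middle.N.end files form one group while file.2 and file.3 stay singles. find_sequences is a hypothetical name, the sketch does not split a group at gaps, and it shares the '#' limitation mentioned earlier in the thread:

```python
import re
from collections import defaultdict


def find_sequences(names):
    """Return (sequences, singles).

    sequences maps a pattern such as 'file.#.foo' to a sorted list of numbers;
    singles lists the names that did not end up in any sequence.
    """
    groups = defaultdict(list)          # pattern -> [(number, name), ...]
    for name in names:
        for m in re.finditer(r"\d+", name):
            # Replace only this run of digits; any other numbers stay fixed.
            pattern = name[:m.start()] + "#" + name[m.end():]
            groups[pattern].append((int(m.group()), name))

    sequences, used = {}, set()
    # Largest groups first, so a file joins the longest sequence it fits in.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for pattern, members in ordered:
        members = [(n, name) for n, name in members if name not in used]
        if len(members) > 1:
            sequences[pattern] = sorted(n for n, _ in members)
            used.update(name for _, name in members)
    singles = [name for name in names if name not in used]
    return sequences, singles


listing = ["file.1.foo", "file.2.foo", "file.3.foo",
           "bar.100.gah", "bar.101.gah", "bar.102.gah",
           "someFile1", "otherThing2"]
seqs, singles = find_sequences(listing)
for pattern, numbers in seqs.items():
    print(f"seq: {pattern} {numbers[0]}-{numbers[-1]}")
for name in singles:
    print(f"sngl: {name}")
```

Trying every digit run as the varying field costs a few extra dictionary entries per name, but it avoids having to guess which number is the counter.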

Last edited by BrianK; 10-18-2007 at 12:34 PM.
 
  


All times are GMT -5.
