[SOLVED] sequential : how to find the missing numbers within a sequence of files that have sequential numbers attached to them?
OK, I see the logic behind that now; I couldn't understand the code at first. Hehe.
999 being the highest, i.e. a set of 3 digits.
So, just off the top of my head, this is interpreted as
Code:
{num:03d}
3 digits, padded with leading zeros?
Yes. 1 is represented as 001, and 456 is represented as 456.
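A quick way to see what the `{num:03d}` format spec does: `0` means pad with zeros, `3` is the minimum width, and `d` means a decimal integer. The numbers below are just examples. Because the width is only a minimum, a 4-digit value is not truncated, which is part of why a hard-coded width is inflexible.

```python
# '{num:03d}': zero-pad ('0') to a minimum width of 3 ('3'), as a decimal integer ('d')
for num in (1, 45, 456, 1000):
    print('{num:03d}'.format(num=num))
# 001
# 045
# 456
# 1000  <- the width is a minimum, so 4-digit numbers are not cut down to 3
```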
However, it occurs to me this wasn't a very flexible way of doing this! I changed a line:
Code:
from
full_range = set('{num:03d}'.format(num=num) for num in range(1, max_range + 1))
to
full_range = set(str(num).rjust(len(str(max_range)), '0') for num in range(1, max_range + 1))
To explain this line:
It creates a set of strings that are right-justified (padded with zeros) to the length of max_range, which is detected by finding the maximum value matched by the regex, for example in file-9999.ext or file-9999.adsghsdfg.ext.
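The following is a minimal sketch of how max_range might be detected and the padded set built. It assumes a regex that captures the first run of digits in each name. The filenames and the `found` and `width` variables are hypothetical and are not taken from the original script.

```python
import re

# Hypothetical directory listing; a real script would use os.listdir(path)
filenames = ['file-001.ext', 'file-002.ext', 'file-005.adsghsdfg.ext', 'file-100.ext']

# Assumption: capture the first run of digits in each name
found = set()
for name in filenames:
    match = re.search(r'\d+', name)
    if match:
        found.add(match.group())

# The largest value decides both the range and the padding width
max_range = max(int(num) for num in found)   # 100
width = len(str(max_range))                   # 3
full_range = set(str(num).rjust(width, '0') for num in range(1, max_range + 1))

print(max_range, width, len(full_range))  # 100 3 100
```

One caveat is worth checking. len(str(max_range)) only matches the files' padding when the largest number fills the full width. With file-0001.ext through file-0010.ext, max_range is 10 and the width is 2, so the padded strings ('01' to '10') would never match the 4-digit names. Taking the width from the longest matched digit string, for example max(len(num) for num in found), avoids that.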
So if the max found is 99999,
instead of printing:
1
it prints
00001
It does this for each value using a Python feature called comprehensions. You can be incredibly expressive with them! It is similar to writing in English:
for num in range 1 to 99999:
create string num. If string num is less than 5 characters long, pad it with zeros to make it 5 long.
In this case, create 00001.
Or, written out as longer Python:
Code:
# full_range = set(str(num).rjust(len(str(max_range)), '0') for num in range(1, max_range + 1))
full_range = set()
for num in range(1, max_range + 1):
    string_width = len(str(max_range))                      # e.g. 5 when max_range is 99999
    justified_string = str(num).rjust(string_width, '0')    # pad with the character '0', not the integer 0
    full_range.add(justified_string)
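Once full_range exists, the missing numbers are the set difference between full_range and the numbers actually found. The sketch below shows that step on its own. The function name `missing_numbers` and its input (a set of digit strings already pulled from the filenames) are hypothetical and are not the original script's.

```python
def missing_numbers(found):
    """Given the digit strings pulled from filenames (e.g. {'01', '02', '04'}),
    return the zero-padded numbers that are absent, in order."""
    if not found:
        return []
    max_range = max(int(num) for num in found)
    width = len(str(max_range))
    # Same result as str(num).rjust(width, '0'); zfill does the zero-padding directly
    full_range = set(str(num).zfill(width) for num in range(1, max_range + 1))
    # Expected minus found; equal-width strings sort in numeric order
    return sorted(full_range - set(found))

print(missing_numbers({'01', '02', '04', '07', '10'}))
# ['03', '05', '06', '08', '09']
```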
Passing function call return values as params is always cooler.
I decided to make a small project out of that code (referenced here) and implemented it using classes.
I simplified the regex a bit, to just (\d+), so that it matches the first sequence of digits. I suspect this is the most common case: 001.ext is fine, and so is file001.ext.
Be careful having files like 001.ext and file001.ext in the same directory, however, since they now count as the same number.
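To see why 001.ext and file001.ext collide, here is a quick check using a first-run-of-digits search. It is assumed to behave like the script's (\d+), and the names are made up.

```python
import re

names = ['001.ext', 'file001.ext', 'Name-001.ext', 'file002.ext']

# re.search returns the first run of digits anywhere in the name
for name in names:
    print(name, '->', re.search(r'(\d+)', name).group(1))

distinct = {re.search(r'(\d+)', name).group(1) for name in names}
print(sorted(distinct))  # ['001', '002']: three different files count as one number
```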
It's now quite usable as a CLI tool.
You can do tricky stuff like:
Code:
$ echo './' | ./SequencyConsistency.py
Missing from /foo/bar/numbered_testing:
0006
7829
Code:
find . -type d | ./SequencyConsistency.py
Missing from /foo/bar/numbered_testing/4527_a:
945
Missing from /foo/bar/numbered_testing/4526_a:
159
Code:
./SequencyConsistency.py */
Missing from /foo/bar/numbered_testing/4527_a:
945
Missing from /foo/bar/numbered_testing/4526_a:
159
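The three invocations above (a path on stdin, a find pipeline, and a shell glob passed as arguments) suggest that the script accepts directories either as arguments or one per line on stdin. The script's source isn't shown here, so this is only a hypothetical sketch of that input handling. `collect_paths` is an invented name.

```python
import io

def collect_paths(argv, stdin):
    """Hypothetical helper: directories from the command line, else one per line from stdin."""
    if argv:
        return list(argv)
    # Drop the newline but keep other characters; skip blank lines
    return [line.rstrip('\n') for line in stdin if line.strip()]

# Demo with a fake stdin; a real script would call collect_paths(sys.argv[1:], sys.stdin)
print(collect_paths([], io.StringIO('./\n./4527_a\n\n')))      # ['./', './4527_a']
print(collect_paths(['4526_a/', '4527_a/'], io.StringIO('')))  # ['4526_a/', '4527_a/']
```

One consequence of this design is that, when run from a terminal with no arguments and nothing piped in, the tool would wait for input until Ctrl-D.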
Rather interesting. I suppose you, or another qualified individual well versed in Python, could even expand on that same code:
leave that as the default and add CLI arguments for name patterns, to eliminate the glitch where 001.ext, Name001.ext and Name-001.ext are seen as the same thing, so one does not have that problem.
Let me go copy-paste that and give it a go, said the bear that climbed the hill.
Using the ole' search path piped to the local ./dir script method, I see, yes?
Code:
echo './' | ./SequencyConsistency.py
No python3 here. Now I've got to install that just to check this?
Don't feel too rushed :P Python 3 has been out since 2008, and Python 2 reaches end of life in 3 years (2020).
But the hiccup seems to be with a print statement:
Remove the file=sys.stderr part (change the first line below to the second) and it seems to run OK:
Code:
print(err, file=sys.stderr)
print(err)
However, missing directories or listed files will then probably log an error to stdout rather than stderr.
Quote:
Rather interesting. I suppose you, or another qualified individual well versed in Python, could even expand on that same code:
leave that as the default and add CLI arguments for name patterns
Interesting idea; that wouldn't be hard to put in.
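Here is a rough, hypothetical argparse sketch of the name-pattern idea. A `--prefix` option (an invented flag, not one the script is known to have) restricts matches to names that start with a literal prefix followed immediately by digits. With that option, 001.ext, Name001.ext and Name-001.ext are no longer treated as the same file. Leaving the option out keeps the plain (\d+) behaviour as the default.

```python
import argparse
import re

def build_pattern(prefix=None):
    """Hypothetical: compile the filename regex, optionally tied to a literal prefix."""
    if prefix is None:
        # Default: first run of digits anywhere in the name, like the simplified (\d+)
        return re.compile(r'(\d+)')
    # re.escape keeps characters such as '-' or '.' in the prefix literal
    return re.compile('^' + re.escape(prefix) + r'(\d+)\.')

parser = argparse.ArgumentParser(description='Find gaps in numbered files.')
parser.add_argument('paths', nargs='*', help='directories to check')
parser.add_argument('--prefix', default=None,
                    help='literal text before the number, e.g. "Name-"')

# Demo arguments; a real script would call parser.parse_args() with no list
args = parser.parse_args(['--prefix', 'Name-', 'some_dir'])
pattern = build_pattern(args.prefix)

for name in ['001.ext', 'Name001.ext', 'Name-001.ext']:
    match = pattern.search(name)
    print(name, '->', match.group(1) if match else 'ignored')
# 001.ext -> ignored
# Name001.ext -> ignored
# Name-001.ext -> 001
```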
Then awk, sed and find were invented... or things like egrep and pgrep, which are honed for just specific needs.
So I wonder why Slack doesn't ship with it. Not that I use it, or Perl for that matter. I decided Bash was enough for what I do.
awk, sed and find are all great (I use find a lot). You almost certainly use Python, however. Try running grep -IE '#!.*python' /usr/bin/* -m1 to see many of the Python scripts you're already using.
Bash is great, but its specialty is interacting with the operating system: converting files, downloading files, permissions. It's really good for doing simple jobs efficiently. I have over a hundred one-liner functions.
IMO, don't bother learning another language until (or unless) you notice Bash feeling "weak" for a job. I used only Bash for 4 years and managed fine.
However, learning Python (I'm still very new to it) has shown me that it shares a nice trait with Unix: most "modules" do one job only and do it well.
Parsing arguments is pretty easy in Python. Feel free to try it; run it with -h to see the options. Also, it should now work fine in Python 2.7 thanks to the __future__ import (from __future__ import print_function).
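For reference, this minimal sketch shows the pattern that lets print(err, file=sys.stderr) work on both Python 2.7 and Python 3. The __future__ import turns print into a function on 2.7, so errors still go to stderr instead of stdout. `report_error` is a hypothetical helper, not a function from the script.

```python
from __future__ import print_function  # must come before other code; a no-op on Python 3

import sys

def report_error(err):
    # Errors go to stderr, so redirecting or piping the normal output stays clean
    print(err, file=sys.stderr)

report_error('No such directory: ./missing')
```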
If you want to use it easily (without installing python3), save it in a file called SequencyConsistency.py, change python3 to python in the #!/usr/bin/env python3 line, run chmod +x SequencyConsistency.py and move it to /usr/local/bin/SequencyConsistency.
Then you should be able to just call it: $ SequencyConsistency.
Yeah, regex is something I should work on. But then I see it differs depending on what one is using? Perl has its own regex, and now you're telling me Python has its own regex too, which indicates a difference between the two, even without examining them.
That suggests there's no standardization, and there should be. It would just make life easier, is all.
And yes, so far Bash has been just fine. Someone offered to show me how to get this done using Perl, and if it works, why be prejudiced against it?
Then came curiosity, and I tried doing little easy things in Perl, like creating a dir and a subdir all at once: nope, not as easy. Searching a dir and its subdirs: nope, not as easy either.
It's not something as simple as find /, which searches everything, whereas Perl will not look into the subdirectories at all.
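For comparison only, since the script in this thread is written in Python: both of those jobs take a single standard-library call there. This is a minimal sketch that works in a throwaway temporary directory. The directory and file names are made up.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()  # throwaway directory for the demo

# Create a directory and its subdirectory in one call, like `mkdir -p dir/subdir`
nested = os.path.join(base, 'dir', 'subdir')
os.makedirs(nested)
open(os.path.join(nested, 'file-001.ext'), 'w').close()  # something to find

# Walk the whole tree recursively, like `find . -type f`
found = []
for root, dirs, files in os.walk(base):
    for name in files:
        found.append(os.path.relpath(os.path.join(root, name), base))

print(found)  # ['dir/subdir/file-001.ext'] on Linux
shutil.rmtree(base)  # clean up the demo directory
```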
Taking a Bash script that already works just like I told it to and redoing it in Perl just to see what I could see, when I already have a fully working script, is too much. I'd have to add two or three more lines of code to do what I did with one line in Bash.
To be honest, the learning curve is too big of a headache for me. It's not worth it to me to figure out how to do it in Perl when I am already doing it just fine in Bash.
It's not like I am going to get a job doing this, so that's no incentive to learn a different programming language just to know it either.
The basics of regex are largely shared (POSIX standardizes basic and extended regular expressions), but each "flavor" has its own notations that do a little extra.
After all, why doesn't everyone write in sh instead of bash? Because bash can do more, even if it's less portable, and that gives it a "flavor".
Regex permeates almost everything and can make learning tools like vim easier. I would suggest learning the basics. There are many options; ignore the advanced ones (avoid overloading on details of specific flavors) and regex will make a lot more sense.
Luckily, writing regex is a lot easier than reading it :P You're thinking: OK, I want to match a line starting (^) with digits ([0-9]) that occur at least once (+), then a set of alphanumeric characters ([a-zA-Z0-9]) that occur at least once (+), then I want to match a period, but because a . also means "any character" I'll place it in a set ([.]), and finally an extension of alphanumeric characters ([a-zA-Z0-9]) between 1 and 4 characters long ({1,4}). Then make sure this is the end of the line ($).
So you end up with '^[0-9]+[a-zA-Z0-9]+[.][a-zA-Z0-9]{1,4}$'
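The same pattern can be written with re.VERBOSE, which lets each piece carry the matching step from the reasoning above. This is a small sketch with made-up filenames.

```python
import re

pattern = re.compile(r'''
    ^                   # start of the name
    [0-9]+              # digits, at least once
    [a-zA-Z0-9]+        # alphanumeric characters, at least once
    [.]                 # a literal period (inside a set, . is not "any character")
    [a-zA-Z0-9]{1,4}    # an extension of 1 to 4 alphanumeric characters
    $                   # end of the name
''', re.VERBOSE)

for name in ['001.ext', '042photo.jpeg', 'file001.ext', '001.backup', '7.ext']:
    print(name, bool(pattern.match(name)))
# 001.ext True
# 042photo.jpeg True
# file001.ext False
# 001.backup False
# 7.ext False
```

Note that [0-9]+ and [a-zA-Z0-9]+ each need at least one character. As a result, a single-digit name such as 7.ext does not match. 001.ext only matches because the engine backtracks and lets the final 1 satisfy [a-zA-Z0-9]+.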
Or, in Python regex: '^\d+\w+[.]\w{1,4}$' (close, but \w also matches the underscore).
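The following quick check, using made-up names, shows how the explicit and shorthand patterns differ in practice. In Python 3, for str patterns, \w matches the underscore and non-ASCII word characters, and \d matches any Unicode decimal digit.

```python
import re

explicit = re.compile(r'^[0-9]+[a-zA-Z0-9]+[.][a-zA-Z0-9]{1,4}$')
shorthand = re.compile(r'^\d+\w+[.]\w{1,4}$')

# The underscore is the interesting character here
for name in ['001file.txt', '001_file.txt', '001file.t_t']:
    print(name, bool(explicit.match(name)), bool(shorthand.match(name)))
# 001file.txt True True
# 001_file.txt False True
# 001file.t_t False True
```

Compiling the shorthand version with re.ASCII limits \w to [a-zA-Z0-9_] and \d to [0-9]. Only the explicit character sets exclude the underscore.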