[bash] find - filter files matching a list of files
Scenario
A folder contains about 290,000 HTML files, all residing in monthly subfolders and following the same naming convention. IDs are numeric and of varying length.
Code:
data/<yyyy>/<mm>/event_<id>.html
idnew.log contains one id per line:
Code:
12345
First attempt (time: 0.44s):
Code:
find data/ -type f | grep -F -f <(sed -r 's/(.*)/_\1./' idnew.log)
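To spell out what the sed decoration does (my illustration, not from the original post): each id is wrapped so that the fixed string can only match a complete id sitting between the underscore and the dot of a filename.
Code:
$ echo 12345 | sed -r 's/(.*)/_\1./'
_12345.
As a fixed string (-F), _12345. matches event_12345.html but not event_112345.html or event_123456.html, which is the partial-match protection discussed later in the thread.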
I think this is pretty fast. I am just wondering whether there is a more elegant way of achieving the result, particularly considering that I need to return all files in data/ if idnew.log does not exist. The only solution I can think of:
Code:
find data/ -type f >fileall.log
If you don't need to generate fileall.log (if you do, just call tee):
Code:
find data/ -type f | {
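A complete, runnable version of that grouped pipeline might look as follows; this is a sketch, with the conditional body assumed from the if [[ -e idnew.log ]] test and the grep/sed filter shown elsewhere in the thread, and tee covering the case where fileall.log is still wanted.
Code:
# Sketch: the group conditionally filters its stdin; tee is optional and
# writes the unfiltered list to fileall.log on the side.
find data/ -type f | tee fileall.log | {
    if [[ -e idnew.log ]] ; then
        grep -F -f <(sed -r 's/(.*)/_\1./' idnew.log)
    else
        cat    # no id list: pass every filename through
    fi
} > filesome.log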
Quote:
Funny, I had tried something similar, but it did not work with the process substitution:
Code:
if [[ -e idnew.log ]] ; then
Also, are there alternatives to my grep <(sed ...)?
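For what it's worth, the usual explanation for that failure: a command kept in a plain variable is only split into words when expanded, so a <(...) inside it is passed along as a literal argument rather than being performed, and eval brings quoting headaches of its own. A shell function avoids both problems, since the process substitution is evaluated when the function runs. A sketch (the function name is hypothetical):
Code:
# Hypothetical helper, not from the thread: conditional filter without eval.
filter_new() {
    if [[ -e idnew.log ]] ; then
        grep -F -f <(sed -r 's/(.*)/_\1./' idnew.log)
    else
        cat    # no id list: let everything through
    fi
}
find data/ -type f | filter_new > filesome.log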
Quote:
Code:
find ... | grep -F -f <(awk '{print "_" $0 "."}' idnew.log)
Is there a way grep can be made to recognize any non-numeral as a word boundary (a bit like the shell's $IFS)? Then the list could be used as is.
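Not with \b or -w: as far as I know, grep's word characters are fixed (letters, digits and underscore), so the underscore in event_12345.html counts as part of the word. What can be emulated, as a sketch, is boundaries built from bracket expressions, at the cost of switching from fixed strings to extended regexps:
Code:
# Assumption, not from the thread: treat any non-digit as a boundary.
# Note -E pattern lists match more slowly than -F fixed strings.
find data/ -type f | grep -E -f <(awk '{print "[^0-9]" $0 "[^0-9]"}' idnew.log)
Beware that, unlike the _id. wrapping, this also matches the <yyyy>/<mm> path components, so an id such as 2013 would pull in every file from that year; the underscore/dot anchors remain the safer choice here.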
Quote:
Code:
find data/ -type f | eval $FILTER > filesome.log
I am using sed/awk to isolate the id in the filename, to exclude partial matches on the ids.
If grep's word boundaries could be redefined (in this case to non-digit), this would be superfluous. Is there something like $IFS for grep?
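Not that I know of; grep's word-constituent set is not configurable. If the goal is to use the id list as-is, a hedged alternative is to drop pattern matching altogether and do an exact lookup on the id extracted from each path, e.g. in awk. A sketch, assuming ids themselves contain no "_" or "." and that idnew.log exists (the existence check would still be needed):
Code:
# Assumed approach, not from the thread: split each path on "_" and ".";
# for data/<yyyy>/<mm>/event_<id>.html the id is the next-to-last field.
find data/ -type f | awk -F'[_.]' '
    NR == FNR { want[$0]; next }   # first input: load the id list
    $(NF-1) in want                # stdin: print paths whose id is listed
' idnew.log -
This does whole-string comparison on the id field, so partial matches cannot occur by construction.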