Scenario
A folder contains about 290,000 HTML files, all residing in monthly subfolders and following the same naming convention. IDs are numeric and of varying length.
Code:
data/<yyyy>/<mm>/event_<id>.html
A file, idnew.log, contains 20,000 IDs, one per line:
Code:
12345
3456
39999
399999
I need to build a list of the files whose names match the IDs in this list.
First attempt time: 0.44s
Code:
find data/ -type f |grep -F -f <(sed -r 's/(.*)/_\1./' idnew.log)
I use sed to prefix and suffix each ID as _<id>. so that only filenames containing the whole ID are found.
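To illustrate why the wrapping matters, here is a minimal sketch with made-up sample data. The three file names, the single ID, and the temporary directory are assumptions for the demo, not taken from the real data set. It uses `sed 's/.*/_&./'`, which should be equivalent to the `-r` form above.

```bash
#!/usr/bin/env bash
# Hypothetical sample data, only to show whole-ID matching.
tmp=$(mktemp -d)
mkdir -p "$tmp/data/2020/01"
touch "$tmp/data/2020/01/event_3456.html" \
      "$tmp/data/2020/01/event_23456.html" \
      "$tmp/data/2020/01/event_34567.html"
printf '%s\n' 3456 > "$tmp/idnew.log"

# Bare ID: 3456 is a substring of all three names, so all three lines match.
bare=$(find "$tmp/data" -type f | grep -c -F -f "$tmp/idnew.log")

# Wrapped ID (_3456.): only event_3456.html contains that exact string.
wrapped=$(find "$tmp/data" -type f | grep -F -f <(sed 's/.*/_&./' "$tmp/idnew.log"))

echo "bare matches: $bare"             # bare matches: 3
echo "wrapped match: ${wrapped##*/}"   # wrapped match: event_3456.html

rm -rf "$tmp"
```

Because grep runs with -F, the trailing dot is a literal character rather than a regex wildcard.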
I think this is pretty fast. I am just wondering whether there is a more elegant way of achieving the result, particularly since I need to return all files in data/ if idnew.log does not exist.
The only solution I can think of:
Code:
find data/ -type f >fileall.log
cp fileall.log filesome.log
if [[ -e idnew.log ]] ; then
grep -F -f <(sed -r 's/(.*)/_\1./' idnew.log) fileall.log >filesome.log
fi
Well, this is pretty ugly!
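One possibly tidier shape, offered only as a sketch: put the decision inside a small function so that the file list is produced once and never copied. The function name `list_event_files` and its two parameters are hypothetical. The behaviour is assumed to match the snippet above: all files when the ID file is missing, otherwise only the matching ones.

```bash
#!/usr/bin/env bash
# list_event_files is a hypothetical helper name, not part of the original script.
# Usage: list_event_files <data-dir> <id-file>
list_event_files() {
  local dir=$1 ids=$2
  if [[ -e $ids ]]; then
    # ID list present: keep only paths containing _<id>. for a listed ID.
    find "$dir" -type f | grep -F -f <(sed 's/.*/_&./' "$ids")
  else
    # No ID list: return every file.
    find "$dir" -type f
  fi
}
```

It would be called as `list_event_files data/ idnew.log >filesome.log`. This avoids the intermediate fileall.log and the cp, at the cost of writing the find command twice.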