LinuxQuestions.org

Corsari 09-01-2013 04:32 AM

create file list: SED inline vs SED standalone, enormous speed difference
 
Hello to all the community

I've written this one-line script to create file lists, and it works:

Code:

ls -R1 /rootpathname/ | while read l; do case $l in *:) d=${l%:};; "") d=;; *) echo "$d/$l";; esac; done > /tmp/filelistname.txt
Since I use it on disks mounted under /media (Ubuntu), the file lists grow in size, because every line begins with the
Code:

/media/volumename
string

So I added a sed call to strip it, but the command became terribly slow, hundreds of times slower.

Here is the code. It does work:

Code:

ls -R1 /media/MAC01/ | while read l; do case $l in *:) d=${l%:};; "") d=;; *) echo "$d/$l" | sed 's/\/media\/MAC01//';; esac; done > /tmp/MAC01-file-list.txt
But this command takes ages compared with doing it in two steps:

Code:

ls -R1 /media/MAC01/ | while read l; do case $l in *:) d=${l%:};; "") d=;; *) echo "$d/$l";; esac; done > /tmp/MAC01-file-list.txt
followed by

Code:

sed 's/\/media\/MAC01//' /tmp/MAC01-file-list.txt > /tmp/MAC01-file-list-cleaned.txt
That is literally a few seconds versus many minutes (this drive contains more than 400,000 files, so its file list is more than 400,000 lines long).

Does anyone have a technical explanation for this enormous speed difference?

Have I placed the sed command in the wrong position?

Thanks for any hints

Cor

Corsari 09-01-2013 04:56 AM

Maybe command substitution could help speed it up, since I've read that it captures a command's stdout, which can then be assigned to a variable with the = operator.

But I can't find any reference I understand well enough to apply it to the "slow" script above.

GazL 09-01-2013 10:17 AM

Seems like an awfully complicated way of producing a file list.
Can't you just use 'find'?

Code:

find "/media/MAC01/" -type f -printf "%P\n" | sort > filelist.txt
(the %P format code will strip off the /media/MAC01/ prefix, so no need for any sed'ing)
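
As a rough illustration of what %P does, here is a sketch on a throwaway tree. The directory and file names are made up, and -printf assumes GNU find (which is what Ubuntu ships):

```shell
# Build a small throwaway tree (hypothetical names, for illustration only).
root=$(mktemp -d)
mkdir -p "$root/music/album"
touch "$root/readme.txt" "$root/music/album/track01.mp3"

# %p prints each path as find sees it, including the starting point...
with_prefix=$(find "$root" -type f -printf '%p\n' | sort)
# ...while %P prints it with the starting point (and its slash) removed.
without_prefix=$(find "$root" -type f -printf '%P\n' | sort)

printf '%s\n' "$without_prefix"
rm -rf "$root"
```

The second list holds paths relative to the starting directory, which is what the sed step was trying to achieve.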


Your code is so slow because you placed the call to sed inside the body of the loop, so you're asking the shell to start 400,000 separate sed processes, one after the other. In the two-step version, a single sed process handles the whole file.
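
If you'd rather keep the ls loop, here is a sketch of two ways around it (not tested on your drive, and assuming GNU ls). Either move sed after 'done' so one sed process filters the whole stream, or drop sed and strip the prefix with the shell's own ${var#prefix} expansion. The tree below is a made-up stand-in for /media/MAC01:

```shell
# Throwaway stand-in for /media/MAC01 (hypothetical names).
root=$(mktemp -d)
mkdir -p "$root/docs"
touch "$root/a.txt" "$root/docs/b.txt"

# Variant 1: same loop, but ONE sed after 'done' filters the whole stream.
# ('|' is used as the sed delimiter so the slashes need no escaping.)
list_one_sed() {
    ls -R1 "$root" | while IFS= read -r l; do
        case $l in
            *:) d=${l%:} ;;
            "") d= ;;
            *)  echo "$d/$l" ;;
        esac
    done | sed "s|^$root||"
}

# Variant 2: no sed at all; ${d#"$root"} strips the prefix inside the shell.
list_no_sed() {
    ls -R1 "$root" | while IFS= read -r l; do
        case $l in
            *:) d=${l%:}; d=${d#"$root"} ;;
            "") d= ;;
            *)  echo "$d/$l" ;;
        esac
    done
}

one_sed=$(list_one_sed)
no_sed=$(list_no_sed)
printf '%s\n' "$no_sed"
rm -rf "$root"
```

Both variants should produce the same list. Neither starts a new process per line, which is where the time was going.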

Corsari 09-01-2013 04:59 PM

thank you GazL

it works and it is fast

While searching, I also found this command:

Code:

find . > filelist.txt
It must be run from the root of the tree you want to list.

To just count the entries in a tree, the output can be piped to wc, like this: find . | wc -l
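
One caveat, sketched here on a made-up tree: find . lists directories too (including . itself), so the plain count comes out higher than the number of files. Adding -type f restricts it to regular files:

```shell
# Throwaway tree (hypothetical names): one subdirectory, two files.
root=$(mktemp -d)
mkdir -p "$root/sub"
touch "$root/one.txt" "$root/sub/two.txt"

# Every entry: ".", "./sub" and the two files.
all_entries=$(cd "$root" && find . | wc -l)
# Regular files only.
files_only=$(cd "$root" && find . -type f | wc -l)

echo "entries: $all_entries, files: $files_only"
rm -rf "$root"
```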

Cor

grail 09-02-2013 03:01 AM

Please remember to mark as SOLVED once you have a solution


All times are GMT -5.