LinuxQuestions.org
Old 02-24-2014, 11:16 AM   #16
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918

thanks for providing. i just tried on a directory tree with about 2,000 images and i am getting those errors too. for you it might be because of the #; not sure what my issue is (also not sure how to correct it).

i think the problem is my command chokes if any filename (image or not) has an apostrophe ' in it... i think it gets interpreted as a close quote.
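
A minimal sketch of the apostrophe problem described above, assuming a POSIX sh and a find that substitutes {} when it is given as its own argument. The scratch directory and the file name are made up for illustration. When {} sits inside the quoted script text, find pastes the file name into the script itself, so an apostrophe in the name closes the quote early. Passing the name as a positional argument keeps the script text fixed:

```shell
# Hypothetical demo: a scratch directory holding one awkward file name.
demo=$(mktemp -d)
touch "$demo/it's here.jpg"

# Fragile pattern (left commented out): the name becomes part of the script,
# so the apostrophe ends the single-quoted string early.
#   find "$demo" -type f -exec sh -c "ls '{}'" \;

# Safer pattern: the script text never changes; the name arrives as "$1".
# The extra "sh" fills $0, and {} becomes $1.
found=$(find "$demo" -type f -exec sh -c 'printf "%s\n" "${1##*/}"' sh {} \;)
printf '%s\n' "$found"

rm -rf "$demo"
```

The same idea should carry over to the ln step: refer to "$1" inside the single-quoted script instead of splicing {} into it.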

Last edited by schneidz; 02-24-2014 at 02:30 PM.
 
Old 02-24-2014, 09:03 PM   #17
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Original Poster
Rep: Reputation: 24
Hmmm. Interesting possibility but I've scanned the file names of my images and not found any apostrophes. I really understand very poorly how tr works, but is it possible an apostrophe, under certain conditions (say, double-space), might be introduced by it? Thanks.
 
Old 10-30-2014, 02:55 PM   #18
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Original Poster
Rep: Reputation: 24
I found a better way, using $RANDOM (in place of $(date +%N)), to alter each symlink name so that files sharing the same name will not overwrite each other (credit to http://www.cyberciti.biz/faq/bash-sh...andom-numbers/). The latest incarnation of this one-liner is
Code:
find `pwd` -exec sh -c "file -i '{}' | grep image.*charset=binary$" \; -exec sh -c 'filename="${0##*/}"; ln -sf "$0" sym/`echo $((RANDOM%900+99))-$filename | tr " " "-"`' {} \;
I can't see why a 3-digit random sequence would not be sufficient for my project, despite the fact that I will be processing thousands of files; maybe a 2-digit sequence would even suffice. In any case, it's easy to increase or decrease the number of digits to one's liking by changing 900+99.
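
A quick check of what that expression can produce, as a sketch. The helper names prefix_orig and prefix_3digit are hypothetical, and the random draw is passed in as an argument so the arithmetic can be checked with fixed values. Since the remainder of a division by 900 runs from 0 to 899, adding 99 gives 99 to 998, so a 2-digit prefix (99) can occasionally appear; adding 100 instead keeps every prefix at three digits:

```shell
# The expression from the one-liner, with the random draw passed in as $1.
prefix_orig() { echo $(( $1 % 900 + 99 )); }

# Variant whose result always has exactly three digits.
prefix_3digit() { echo $(( $1 % 900 + 100 )); }

prefix_orig 0      # smallest possible value: 99
prefix_orig 899    # largest possible value: 998
prefix_3digit 0    # 100
prefix_3digit 899  # 999
```

In bash this would be called as prefix_3digit "$RANDOM". Note that $RANDOM is a bash feature rather than part of POSIX sh, so inside sh -c it only behaves as intended where sh happens to provide it.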

Last edited by jamtat; 10-31-2014 at 10:25 AM.
 
Old 10-30-2014, 08:40 PM   #19
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,784

Rep: Reputation: 2083
Why not just use the --backup=numbered option to ln?

As to strange characters in file names, I would suggest trying to avoid the multiple layers of quoting involved when -execing sh.

Code:
find . | file --mime -0f - |
  grep --text 'image/[^;]*; charset=binary$' | 
  cut -d '' -f1 | 
  while read -r file ; do
     ln --backup=numbered -s "$file" "sym/$(basename "$file")" 
  done
 
Old 10-31-2014, 10:22 AM   #20
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Original Poster
Rep: Reputation: 24
Thanks for the suggestion, ntubski. I have to say I like the $RANDOM solution better, since it prepends a set and limited number of digits/characters to the front of each file name, making it a simple matter of stripping those off to find the original file name. I'll test out your iteration to see whether it addresses the error message I was occasionally seeing when using the original script. I also discovered when testing out the original script again yesterday, that the image/ part of that iteration was producing some false positives, for example if a file or directory name had the word "image" in it, or if the file was an iso. Again, your input is much appreciated.
 
Old 10-31-2014, 11:14 AM   #21
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,784

Rep: Reputation: 2083
Quote:
Originally Posted by jamtat
I can't see why a 3-digit random sequence would not be sufficient for my project, despite the fact that I will be processing thousands of files; maybe a 2-digit sequence would even suffice.
It only depends on how many files with the same name you have. Just a warning, if you have several files all with the same name the Birthday "paradox" applies, so you may need more digits than you think.
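
To put rough numbers on that warning, here is a sketch that assumes every prefix is equally likely and that awk is available. The helper name collision_prob is hypothetical. K is the number of files that share one name (not the total number of files) and N is the number of possible prefixes, 900 for the 3-digit scheme. The chance that at least two of the K files draw the same prefix is 1 minus the product of (N - i)/N for i from 0 to K-1:

```shell
# collision_prob K N: probability that at least two of K same-named files
# receive the same prefix when each prefix is drawn uniformly from N values.
collision_prob() {
  awk -v k="$1" -v n="$2" 'BEGIN {
    p = 1                                      # chance of no collision so far
    for (i = 0; i < k; i++) p *= (n - i) / n   # file i must avoid i used prefixes
    printf "%.3f\n", 1 - p
  }'
}

collision_prob 2 900    # two files with the same name
collision_prob 10 900   # ten files with the same name
collision_prob 40 900   # forty files with the same name
```

For small K the result is close to K(K-1)/(2N), so the risk grows with the square of the number of same-named files rather than linearly.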
 
Old 10-31-2014, 11:43 AM   #22
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Original Poster
Rep: Reputation: 24
Thanks for pointing out the birthday paradox, ntubski. A quick check reveals that I might initially be dealing with something like 15k-20k files (a number later to be drastically reduced once target files within that group have been identified). I'm not so proficient with mathematics, so I'll have to do some further investigation as to whether my 3-digit scheme would suffice, given the total number of files I'll initially be dealing with.
 
Old 10-31-2014, 11:53 AM   #23
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
i usually do something like this if i want something to be fairly unique:
Code:
fn=`date +%Y%j%H%M%S%N`
for shiggles: birthday heat map:
http://io9.com/how-common-is-your-birthday-512052896

how is it that july 2nd and 3rd are popular but the 4th of july is almost empty.
conversely feb 14 is popular but feb 13 and 15 not so much ?

Last edited by schneidz; 11-01-2014 at 09:39 PM.
 
Old 10-31-2014, 12:22 PM   #24
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Original Poster
Rep: Reputation: 24
This looks like a good alternative as well (replace 3 with some other numeral to decrease/increase the pool):
Code:
echo $(</dev/urandom tr -dc A-Za-z0-9 | head -c3)
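
The same idea wrapped in a small function, as a sketch. The name rand_prefix is hypothetical, and setting LC_ALL=C is an assumption added here so that tr treats the random bytes as single-byte characters regardless of the locale. With 62 possible characters per position, 3 characters give 62^3 = 238,328 combinations, compared with 900 for the 3-digit $RANDOM scheme:

```shell
# rand_prefix N: print N random characters drawn from A-Z, a-z and 0-9.
rand_prefix() {
  # tr drops every byte outside the allowed set; head stops after N bytes.
  LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c "$1"
}

p=$(rand_prefix 3)
printf '%s\n' "$p"
```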
 
Old 10-31-2014, 03:26 PM   #25
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Original Poster
Rep: Reputation: 24
Quote:
Originally Posted by ntubski
Why not just use the --backup=numbered option to ln?

As to strange characters in file names, I would suggest trying to avoid the multiple layers of quoting involved when -execing sh.

Code:
find . | file --mime -0f - |
  grep --text 'image/[^;]*; charset=binary$' | 
  cut -d '' -f1 | 
  while read -r file ; do
     ln --backup=numbered -s "$file" "sym/$(basename "$file")" 
  done
I've fiddled around with your script, ntubski, trying to get an idea of how it works. My first observation is that it does not produce valid symlinks, probably because it is not recording the full path. If I replace the period with `pwd`, however, I do get valid symlinks. I also managed to splice in my echo $(</dev/urandom tr -dc A-Za-z0-9 | head -c3) to see whether I could manage that and got it working. One drawback to your script, unlike a more recent iteration of the one schneidz contributed, is that it leaves extraneous spaces in file names: his latest variant replaces those spaces with dashes. I also need to get rid of hash symbols that appear in some file names, but so far I've managed to do that by running
Code:
# Quote both the expansion and the substitution so names with spaces survive
for file in *; do mv "$file" "$(echo "$file" | sed 's/#/Num/g')" ; done
in the directory where the symlinks have been placed. Anyway, thanks for helping me get a better grasp on how to execute this project.
 
Old 10-31-2014, 09:30 PM   #26
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,784

Rep: Reputation: 2083
Quote:
Originally Posted by jamtat
I've fiddled around with your script, ntubski, trying to get an idea of how it works. My first observation is that it does not produce valid symlinks, probably because it is not recording the full path. If I replace the period with `pwd`, however, I do get valid symlinks.
Oh, I guess I didn't fully understand the layout of your files. When I tested here, the symlinks were valid.
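
A small demonstration of why the path form matters, as a sketch with a made-up layout (pics/ and sym/ under a scratch directory). A symlink stores its target as text, and a relative target is resolved from the directory that contains the link, not from the directory where ln was run:

```shell
# Hypothetical layout: images in pics/, links collected in sym/.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir pics sym
touch pics/a.jpg

# Relative target, the form that `find .` prints. Seen from inside sym/,
# ./pics/a.jpg does not exist, so this link dangles.
ln -s ./pics/a.jpg sym/relative.jpg

# Absolute target, the form that `find "$PWD"` prints. Valid from anywhere.
ln -s "$PWD/pics/a.jpg" sym/absolute.jpg

# [ -e ] follows the link, so it is false for a dangling symlink.
state() { if [ -e "$1" ]; then echo ok; else echo dangling; fi; }
rel_state=$(state sym/relative.jpg)
abs_state=$(state sym/absolute.jpg)
echo "relative: $rel_state, absolute: $abs_state"

cd / && rm -rf "$demo"
```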

Quote:
One drawback to your script, unlike a more recent iteration of the one schneidz contributed, is that it leaves extraneous spaces in file names: his latest variant replaces those spaces with dashes. I also need to get rid of hash symbols that appear in some file names,
I didn't realize that was part of the requirements; it's easily added.

Code:
find "$PWD" | file --mime -0f - |
  grep --text 'image/[^;]*; charset=binary$' |
  cut -d '' -f1 |
  while read -r file ; do
    ln --backup=numbered -s "$file" \
      "sym/$(</dev/urandom tr -dc A-Za-z0-9 | head -c3)-$(basename "$file" | tr ' ' - | sed 's/#/Num/g')"
  done
I left the --backup=numbered because it has no effect as long as there are no collisions.
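
One further detail about the loop, shown as a sketch with a made-up file name: with the default IFS, read -r still strips leading and trailing whitespace from each line, so a file whose name starts or ends with a space would be looked up under the wrong name. Clearing IFS for the read keeps the line intact:

```shell
# A name with a leading and a trailing space (hypothetical example).
name=' a b.jpg '

# Default IFS: read trims the surrounding whitespace.
trimmed=$(printf '%s\n' "$name" | while read -r f; do printf '[%s]' "$f"; done)

# Empty IFS for the read only: the line is kept exactly as it arrived.
kept=$(printf '%s\n' "$name" | while IFS= read -r f; do printf '[%s]' "$f"; done)

printf '%s\n%s\n' "$trimmed" "$kept"
```

Names that contain a newline would still be split by a line-based loop; that case is outside what this sketch covers.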
 
  

