Distribution: debian on servers and embedded, kubuntu elsewhere
Stupid file naming - should I surrender?
I'd like to hear from you sysadmins out there: am I just a too-cranky old man, or am I not alone?
For many years I was able to force users (through terrorism, threatening, and crontab automatic renaming) to use sane dir/file names in the server (linux samba server, windows clients), but as the company grew larger I feel I'm losing the battle here (note: automatic file renaming was dropped years ago when we started having filenames referenced by the ERP, CAD linked parts, etc.)
Where I once managed to keep them to shorter names using basically [a-zA-Z0-9_.], I'm now looking at a path/file on the server with this name (lengthy first part of path removed for sanity):
.../PLANTA DOS RASGOS COM OS RACKS POSICIONADOS 130611/Posicionamento dos Racks (2)REVISADO MONICA 90611 - vistas de fixação no piso e teto2.DWG
Which in a rough English translation (we're in Brazil) would be:
.../OPENINGS BLUEPRINT WITH THE RACKS POSITIONED 130611/Positioning of the racks (2)MONICA REVISED 90611 - views of the floor and ceiling attachments2.DWG
(hey, who needs document management if you can write the whole file story in its name?)
Some other fine examples of user file names:
(looks like a copy paste of an email attachment description)
.../CARRETEL ACO MTD Ø400 X Ø306 X 150MM.odt
(yeah, those are "diameter" symbols there, not zeroes)
.../Boleto 3° par ref a NF 2060-1.pdf
(exclamation point!! prolly saved from the ERP)
.../HDTM10C Sell & Spec Sheet.pdf
(surely someone saving from an escaped URL and not taking the trouble to READ/fix the saved filename)
Note my lovely users have no problem using all possible available characters on their keyboards (and if not available, they'll use ALT+XXX, or copy&paste from a word processor, to add the degree, diameter, etc, symbols) for filenames.
I'm very thankful for Windows' limit of about 260 chars for path/file names, because users were already complaining that I (or "that linux thing") was preventing them from using all their file naming creativity, so I could blame Windows for that. If it were not for that limit, I'm sure the morons would be writing whole documents in the path/filename instead of INSIDE the file (and probably asking me "why can't we use pictures and tables and formatted text in the filenames? Why did you block pasting videos in the filenames?").
Of course over the years I had to become very aware of codepage converting, escaping, and code injection, tweaking all my programs and scripts which deal with filenames (including some document management integrated with the database) to support the most unthinkable namings. In our environment (Brazil), users can come up with external files with names using CP437, CP850, ISO-8859-1, UTF-8, and more.
So, am I just too cranky and this is a lost/futile battle?
Do you have, or foresee problems with "anything goes" file naming? Give me a reason, and I'll keep fighting, or I'll just give up...
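On the codepage point above, here is a minimal Python sketch of one way to guess the encoding of a filename that arrives as raw bytes. The function name decode_name and the fallback order are assumptions made for illustration, not something from the scripts described in this post. UTF-8 is tried first because it rejects most byte sequences that are not really UTF-8; the single-byte codepages accept nearly anything, so the first fallback that succeeds may still be a wrong guess.

```python
def decode_name(raw, fallbacks=("cp850", "iso-8859-1")):
    """Return (text, encoding) for raw filename bytes.

    Hypothetical helper: strict UTF-8 first, then the single-byte
    codepages in the order given. The order is a heuristic only.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    for enc in fallbacks:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: keep the name usable and mark undecodable bytes.
    return raw.decode("ascii", errors="replace"), "ascii"
```

A name like "fixação" encoded in CP850 is not valid UTF-8, so it falls through to the first fallback. CP850 and ISO-8859-1 cannot be told apart this way, which is why the order has to be tuned to where the files usually come from.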
For many years I was able to force users (through terrorism, threatening, and crontab automatic renaming) to use sane dir/file names in the server
What about augmenting that "terrorism and threatening" ? LOL
I really wish I could help. So I'm just posting something to refresh this thread, hoping some guru will take a look and help you. This issue is very interesting and I am interested in the right answer too.
I do not believe in augmenting it. I would suggest you make a few of those files get "lost" and promise a long restore time because of some unstable hardware/software. They will definitely ask you how they can keep such things from happening.
You could try this:
1 Write a script to automatically convert any filename that doesn't meet your approval to a random 32-bit hex number.
2 Make a file with new name/old name. Post one copy at long intervals on a notice board behind a locked glass screen, so it can't be photocopied, preferably too high for comfortable reading.
3 Get some life assurance. (Perhaps this should be step 1)
I'm starting to think we're all cranky oldtimers :-) Maybe catkin is right, and the battle is lost.
There were real cases where problems happened due to the odd filenames, for example files that could not be 'rolled back' to their many-days-ago versions because my 'snapshot/compress/hardlink' backup script (http://www.linuxquestions.org/questi...upport-938090/) still didn't cover all user creativity.
Even then, the user's conclusion is not:
"Gee, I was warned that these things could happen and it was true, I should learn and change."
but rather:
"Nice, I can blame my delayed job on the IT guy, because he can't deal with my filenames."
And unfortunately they are not 100% wrong (at the extreme, we could end up forcing them to use ascii-only emails, ftp instead of MIME attachments, and who knows, send screenshots as .jpg instead of bitmaps pasted into a .doc file). Unfortunately (?) all the user-friendliness allows users to do things that are less and less efficient and more prone to vulnerabilities/trouble. On one hand we have an urge to educate them, but on the other hand, we end up hurting "user friendliness".
There was a time I'd recommend people (in Brazil) not to use accented characters in their emails (resulting in 'wrong' Portuguese), because most of the time the text would arrive garbled on the other side (email systems couldn't deal with codepage conversions as well as they do today). At one point, this became (almost 100%) resolved, so I stopped recommending that.
So this is why I'd like to learn if the cause (against "anything goes" file names) is still a good one to keep fighting for.
Some of the arguments I use so far (some apply to funny characters, some to long path/file names, some to both):
- If you upload the file to the web server and then send the link for the customer to download, the customer might have trouble accessing the link due to crazy characters/wrong codepage/anti-injection protection/blank spaces, long URLs that get split across multiple lines, etc.
- If you IM or email a link for a file on the server (Windows UNC) to a coworker, clicking the link might not work without manually copy-pasting it into Explorer, for the same reasons.
- Special characters may cause problems accessing the file when transferred or accessed from a PC with different language/codepage settings (all our PCs use English Windows XP/Vista/7, but we're in Brazil, so customers use Portuguese Windows PCs)
- If you try to put a lot of information in the filename instead of using a logical, organized way of storing and keeping track of files, you end up in a crazy mess (for example, use a version control system instead of adding "Jane Doe reviewed this file last Friday and wants some changes" to the filename)
- Windows has a ~260 character limit for path/filename (my users have hit this limit already), so they should be more frugal/efficient with naming (I hope this limit does not go away; I had to show them a longer path/filename on the server to "prove" to them that the limit was not in "my linux server")
- Problems with backup/document management/automation scripts/programs (these ones they see, and I agree, as something I should fix and deal with - All has been working for many years now, but someone may come up with a "new" unexpected way of breaking them, and new scripts/programs always need special extra work to 'sanitize' filenames - but then, it's our job to user-proof everything...)
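The link-related arguments in the list above can be shown to users concretely. A minimal Python sketch, assuming a hypothetical helper called link_report and using the rough 260-character figure from this thread (the length check on the bare string is only an approximation of what Windows actually counts):

```python
from urllib.parse import quote

WINDOWS_MAX_PATH = 260  # approximate limit mentioned in the thread


def link_report(path):
    """Summarize why a path may misbehave when sent as a link."""
    # quote() percent-encodes as UTF-8 and leaves "/" untouched.
    encoded = quote(path)
    return {
        "encoded": encoded,
        "needs_escaping": encoded != path,
        "has_spaces": " " in path,
        "too_long_for_windows": len(path) >= WINDOWS_MAX_PATH,
    }
```

For "HDTM10C Sell & Spec Sheet.pdf" the encoded form is "HDTM10C%20Sell%20%26%20Spec%20Sheet.pdf", which is what the customer's browser or mail client has to get exactly right for the link to work.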
I can sympathize with your problem. I will download a zip file with videos, expand it, and the filenames have backslashes in them. They are promos for Charter. You would think they would know better. I have to rename all of them. Videos with `#' characters in the filenames can't be previewed in the transcoder because it uses QuickTime, which won't load them.
Maybe if you called them up and had them read you the directory and file name, letter by letter, it would provide the necessary negative reinforcement. "Sorry, was that an octothorpe? I can't CD to that directory."
I'm sorry to have no solution to point out, but could you try some "script" based on inotify (incrond) which would check the filename length and the characters used?
Then simply reject that file with some kind of message to the user?
I don't know, it's a basic idea, but usually simple things require a lot of effort.
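As a rough sketch of the checking half of that idea, here is a small Python function that reports what is wrong with a single name. The function name name_problems and the 64-character limit are made up for illustration; the allowed set is the [a-zA-Z0-9_.] class from the first post. How it gets called on each new file (an incrond hook or anything else) is left out on purpose.

```python
import re

MAX_NAME_LEN = 64  # hypothetical limit, pick whatever suits the site
DISALLOWED = re.compile(r"[^A-Za-z0-9_.]")


def name_problems(name, max_len=MAX_NAME_LEN):
    """Return a list of human-readable complaints about one file name."""
    problems = []
    if len(name) > max_len:
        problems.append(f"name is {len(name)} characters, limit is {max_len}")
    # Collect each offending character once, in a stable order.
    bad = sorted(set(DISALLOWED.findall(name)))
    if bad:
        problems.append("disallowed characters: " + " ".join(repr(c) for c in bad))
    return problems
```

An empty list means the name passes; anything else could go straight into the message shown to the user.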
My own opinion is that the battle is lost. Users are now so used to their own (weird!) file naming and filing conventions that, if it works for them and your file system supports it, you may as well let them get on with it. If you use a script to rename files then you're going to have to communicate the renaming to your users. Do you really need the grief? Do you really need someone to e-mail out "Request For Pricing - Phone System.doc", have someone reply with an update, and then have them work out that you've called the document "/some/other/path/RFP/PhoneSystem.doc" when they want to open it for a read?
Just my 2 cents worth, your mileage may vary, etc. etc.
I'm of two minds about this. Part of me agrees with TenTenths that it's a hopeless cause, and if the system supports it then the users will inevitably do it. But I also believe that it's your duty as the sysop to maintain some sense of rationality and control.
I think that, if I were in your position, I would institute a few "reasonable" and easy-to-remember rules and enforce them to the best of my ability. Something like "no spaces, no characters not found on the keyboard (or at least not basic to your language), and a hard 200-character length limit" would probably be easy enough for the users to accept, as long as they were otherwise free to work out their own naming conventions. Feel free to supply some recommended naming practices, though.
Just be sure to clearly communicate your rationale for implementing it (like you did here, but less technically), and enforce it firmly and fairly. Demonstrate that you are the authority, and that you're doing this for good technical reasons, not just to be the BOFH, and in the end most users should accept it. Don't just dictate to them, educate them.
Likely a simple renaming script and nightly cron-job would be enough for any holdouts to eventually get the hint.
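A minimal Python sketch of what such a renaming pass could look like, written as a dry run that only proposes new names and renames nothing. The names suggest_name and propose_renames are hypothetical, the allowed set [A-Za-z0-9_.] is borrowed from the first post, and the 200-character limit is the one suggested above.

```python
import os
import re
import unicodedata


def suggest_name(name, max_len=200):
    """Propose a 'sane' version of one file name (no I/O)."""
    # Split accented letters into base letter + combining mark, then
    # drop everything that is not ASCII (the marks, and symbols like Ø).
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Runs of whitespace become a single underscore.
    no_spaces = re.sub(r"\s+", "_", ascii_only.strip())
    # Anything else outside the allowed set becomes an underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_.]", "_", no_spaces)
    # Shorten the stem, never the extension.
    stem, ext = os.path.splitext(cleaned)
    return stem[: max(1, max_len - len(ext))] + ext


def propose_renames(root):
    """Yield (old_path, new_path) pairs under root; nothing is renamed."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            new = suggest_name(name)
            if new != name:
                yield os.path.join(dirpath, name), os.path.join(dirpath, new)
```

Two caveats for anyone adapting this: symbols with no ASCII decomposition are silently dropped, and two different originals can collapse to the same suggestion, so a real job would need a collision check before touching anything.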
Thank you all for your comments so far.
I did have a cron job in the past which would sanitize/rename files; I gave up on it when the number of CAD (Autodesk Inventor) users and their usage grew. Some of our projects have 10000+ parts, and Inventor may use links to part files.
So although the designers were well aware of my file naming recommendations and reasons, eventually one would come up with some 'funny' directory and/or filename, and use those in an Inventor part or assembly. This would quickly cascade to a lot of links, and renaming them would mean trouble (when opening an assembly, the user is asked to provide the location for missing part files, one by one). Unfortunately the general understanding is that "IT caused the problem".
Blocking the creation of bad file names, as suggested, could be a good option, but I'm afraid checking every file creation/renaming would carry a performance hit, especially for the CAD, which manipulates thousands of files on the server (loading/saving projects is already a critical bottleneck). Additionally, as far as I understand how Inventor works, you "checkin/checkout" projects to the local hard disk, so there could be problems if the user created weird names/links on the local disk, then tried to check them in to the server at the end of the day.
So I think we really lost the battle. As TenTenths pointed out, if the filesystem and apps allow the users to be "creative", then let it be. The burden is on us to fix any problem/compatibility that this may cause - and explain when it can't be fixed ("the link you emailed Mr. PHB does not work for him because it has blank spaces and was broken into 2 lines. Either rename the path/file, or explain to him that he has to copy/paste it into Explorer. No, it's not because we use linux on the server; no, I won't give you Outlook nor an Exchange Server.")
It's just one more inevitable advancement of "user friendliness" reducing efficiency and reducing the minimal requirement for user awareness of how things work.
Thank you all again for your input! I surrender...