LinuxQuestions.org
03-08-2009, 04:38 AM   #1
jefffq
LQ Newbie
 
Registered: Apr 2005
Location: Canada
Posts: 11

Rep: Reputation: 0
converting linux (reiserfs) data to NTFS (file names contain special characters)


What would be the easiest and safest way to convert a large volume of data from ReiserFS to NTFS? The file names often contain special characters not permitted under Windows. I want the file names to be adjusted as needed so that it all works in Windows. The new copy of the data will only be used in Windows.

I thought there would be a Linux utility for this somewhere, but I can't find one. The data has filled up an external hard drive that is ReiserFS formatted. The only computer I have available right now is running Windows with only NTFS formatted hard drives. So I guess I need to copy the data from the external ReiserFS drive over to some free space on one of the NTFS drives. Or is there a way to convert the external drive in place that is DEFINITELY SAFE?

I can boot to Linux with a live CD. I do have an old computer, currently sans hard drive, that I could also use to run Linux while leaving the other computer running Windows.
 
03-08-2009, 05:04 AM   #2
openSauce
Member
 
Registered: Oct 2007
Distribution: Fedora, openSUSE
Posts: 252

Rep: Reputation: 39
I don't think there's any way to convert ReiserFS to NTFS in place. (ext2 -> ext3 can be done in place, since it only adds a journal with tune2fs -j, but that's a special case.)

You'll need to boot your Windows box with a live CD and copy everything across. Because of the special characters, you probably want a small script - you can use the tr command or bash's string substitution syntax, e.g. to replace '?' with '-':
Code:
new_filename=$(printf '%s\n' "$old_filename" | tr '?' '-')   # no spaces around = in an assignment

# or this (probably quicker if you've got a lot of these):
# double slash after old_filename replaces all occurrences, not just the first;
# \ before ? because ? is a special (wildcard) character
new_filename=${old_filename//\?/-}

# quote the variables so names with spaces or wildcards are copied intact
cp -- "$old_path/$old_filename" "$new_path/$new_filename"
If you've got a lot of files, it's probably quicker to rename them all first and then copy them all at once. And of course you might get filename collisions this way. It's up to you how smart you want to make your script to try and avoid them; it all depends on what the files are called at the moment. Don't forget also that Windows treats NTFS file names as case-insensitive, while ReiserFS is case-sensitive, so there's potential for collisions there as well.
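The renaming and collision check described above can be sketched roughly as follows. This is only a sketch under assumptions: the function names sanitize_name and find_collisions are made up for illustration, the character set is the usual list Windows rejects (\ : * ? " < > |), and it handles neither control characters, trailing dots or spaces, reserved names such as CON, nor hidden files or subdirectories.

```shell
#!/bin/sh
# Replace characters that Windows rejects in file names with '-'.
# (Hypothetical helper; '/' cannot occur in a Linux file name, so it is skipped.)
sanitize_name() {
    printf '%s\n' "$1" | sed 's/[\\:*?"<>|]/-/g'
}

# Print the sanitized, lower-cased names in directory $1 that would collide
# on a case-insensitive file system; print nothing when there are none.
find_collisions() {
    for entry in "$1"/*; do
        [ -e "$entry" ] || continue      # skip the literal pattern if the directory is empty
        sanitize_name "${entry##*/}"     # strip the directory part, keep the file name
    done | tr '[:upper:]' '[:lower:]' | sort | uniq -d
}
```

Run find_collisions on each directory of the ReiserFS drive before copying; any name it prints needs a manual rename first.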
 
03-08-2009, 03:07 PM   #3
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
There are mount options that deal with filename encoding. There are also some characters that need to be avoided in filenames, which might be legal in Linux but not in Windows & vice versa. One example is `:', which NTFS uses for alternate data streams. Streams were intended for Mac metafile support on an NT server and were pretty much forgotten about until someone scanning a hard disk found that they were being used to hide malware from virus scanners.
Quote:
Originally Posted by ntfs-3g manpage
Alternate Data Streams (ADS)
NTFS stores all data in streams. Every file has exactly one unnamed data stream and can have many named data streams. The size of a file is the size of its unnamed data stream. By default, ntfs-3g will only read the unnamed data stream.

By using the options "streams_interface=windows", you will be able to read any named data streams, simply by specifying the stream's name after a colon. For example:

cat some.mp3:artist

Named data streams act like normal files, so you can read from them, write to them and even delete them (using rm). You can list all the named data streams a file has by getting the "ntfs.streams.list" extended attribute.
Quote:
Windows Filename Compatibility
NTFS supports several filename namespaces: DOS, Win32 and POSIX. While the ntfs-3g driver handles all of them, it always creates new files in the POSIX namespace for maximum portability and interoperability reasons. This means that filenames are case sensitive and all characters are allowed except '/' and '\0'. This is perfectly legal on Windows, though some application may get confused. If you find so then please report it to the developer of the relevant Windows software.
Quote:
locale=value
This option can be useful if your language specific locale environment variables are not set correctly or at all in your operating system. In such cases, the national characters can be made visible by using this option. Please see more information about this topic at http://ntfs-3g.org/support.html#locale
Quote:
Originally Posted by mount manpage
Mount options for ntfs
iocharset=name
Character set to use when returning file names. Unlike VFAT, NTFS suppresses names that contain unconvertible characters. Deprecated.

nls=name
New name for the option earlier called iocharset.

utf8 Use UTF-8 for converting file names.

uni_xlate=[0|1|2]
For 0 (or `no' or `false'), do not use escape sequences for unknown Unicode characters. For 1 (or `yes' or `true') or 2, use vfat-style 4-byte escape sequences starting with ":". Here 2 gives a little-endian encoding and 1 a byteswapped bigendian encoding.

posix=[0|1]
If enabled (posix=1), the file system distinguishes between upper and lower case. The 8.3 alias names are presented as hard links instead of being suppressed.

uid=value, gid=value and umask=value
Set the file permission on the filesystem. The umask value is given in octal. By default, the files are owned by root and not readable by somebody else.
Using the utf8 & nls= mount options may help.

Also look at the iconv program. That may be more useful than `tr'. You may also need to change the encoding used in konsole or konqueror.
One thing you can do is precede a command with an appropriate locale assignment such as LC_ALL=<value>. One poster was having a problem entering the password for an encrypted pdf file. Assigning the locale variables before the program command (on the same line) fixed his problem.
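As a rough illustration of the iconv suggestion, here is a minimal sketch that re-encodes a single file name to UTF-8. The helper name to_utf8 is made up, and ISO-8859-1 as the source encoding is purely an assumption; substitute whatever encoding the names were actually written in.

```shell
#!/bin/sh
# Convert a file name from an assumed legacy encoding to UTF-8.
# $1 = file name, $2 = source encoding (defaults to ISO-8859-1 as an example only).
to_utf8() {
    printf '%s\n' "$1" | iconv -f "${2:-ISO-8859-1}" -t UTF-8
}
```

A renaming loop could then call something like mv -- "$f" "$(to_utf8 "$f")" for each name that differs after conversion. If a tool still shows the names wrongly, it may be worth running it with a per-command locale on the same line, e.g. LC_ALL=en_US.UTF-8 ls (that locale name is an example and must exist on your system).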

If utf16 is used, the problems can become monstrous. Then you have to deal with the endianness of the system that created the file & filename. The BOM (byte order mark) at the start makes text into an opaque binary. PHP initially didn't support the utf standard because the libraries weren't modularized at that time, and would have caused a five-fold increase in code size.
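To see whether a text file starts with a UTF-16 BOM, and which byte order it indicates, a quick check along these lines may help. It is only a sketch: detect_utf16_bom is a made-up name, and it looks at nothing beyond the first two bytes (FF FE for little-endian, FE FF for big-endian).

```shell
#!/bin/sh
# Report the byte order indicated by a UTF-16 BOM at the start of file $1.
detect_utf16_bom() {
    # Read the first two bytes as hex digits, e.g. "fffe" or "feff".
    sig=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        fffe) echo "UTF-16LE" ;;
        feff) echo "UTF-16BE" ;;
        *)    echo "no UTF-16 BOM" ;;
    esac
}
```

The result can be passed to iconv as the -f encoding when converting such a file to UTF-8.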

Last edited by jschiwal; 03-08-2009 at 03:11 PM.
 
  


All times are GMT -5.