Transferring legacy data from NTFS to ext4: is there inherent metadata loss by changing filesystems?
I need to switch 20+ years' worth of highly sentimental data from its native NTFS to ext4, for a variety of performance and practical reasons. I'm worried about potentially losing data stored in the filesystem header and/or any other places where NTFS may store metadata.
This data has been moved across NTFS volumes multiple times, and thus far the only noticeable loss I've experienced has been the timestamps on directories. I don't want to move the data and think everything is gucci but then find out a year later that some crucial piece of historic information got lost during the transfer.
So the question is: Is there anything NTFS holds that ext4 can't hold, or anything that cannot be translated between the two?
(Bonus question: Does anybody know how to copy a file tree and keep the directory timestamps? I really like my accurate timestamps. Not even rsync seems to pull this off for me.)
#1 There is data in the filesystem that does not translate and cannot be preserved by filesystems with seriously different data structures. That said, in my experience the only NTFS metadata whose loss during transfer to a Linux filesystem has ever been a problem is timestamp data. If you have a way to handle that, you should be fine.
#2 Speaking of timestamps: the DTG (Date Time Group) structures are VERY different and do not translate directly. Some programs do better than others at conversions, but do not expect fine granularity. When I started in the business, FAT timestamps and Unix timestamps (this was PRE-NTFS and PRE-LINUX) were not converted properly by anything I had. If the data IN THE FILE is properly retained, that is a near best-case situation.
#3 As far as transferring entire tree structures goes, rsync does pretty well. I have had luck with archive software (tar) combined with shell features doing a better job of transferring AND CONVERTING a tree than other software made specifically for the purpose. It has been a decade since I did an NTFS->EXT4 transfer that way, so the tools have changed and you would want to run tests before counting on the technique.
Example command-line for #3 recovered from ancient and possibly corrupt meat memory:
Code:
cd /top/level/below_this && tar -cf - source | ( cd /target/folder && tar -xpf - )
This clones the directory tree rooted at /top/level/below_this/source into a new folder named source in /target/folder.
Again, this is from memory and needs testing.
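Since the command is from memory and the thread keeps coming back to directory timestamps, one way to run that test is to compare the modification time of every file and directory in the source against the copy. This is a bash sketch assuming GNU find, stat, and sort as found on typical Linux systems; the function names `list_mtimes` and `verify_mtimes` are made up for illustration. It compares whole seconds only, because tar's default archive format may not carry sub-second times.

```bash
#!/usr/bin/env bash
# Sketch only: compare modification times (whole seconds) across two trees.
# Assumes GNU findutils/coreutils; filenames containing newlines are not handled.

# list_mtimes DIR: print "relative/path mtime" for DIR itself and everything below it,
# sorted so that two listings can be compared line by line.
list_mtimes() {
    ( cd "$1" && find . -exec stat -c '%n %Y' {} + ) | LC_ALL=C sort
}

# verify_mtimes SRC DST: return 0 if both trees contain the same paths with the same
# mtimes; otherwise print the differing lines (diff format) and return non-zero.
verify_mtimes() {
    if [ ! -d "$1" ] || [ ! -d "$2" ]; then
        echo "verify_mtimes: both arguments must be directories" >&2
        return 2
    fi
    diff <(list_mtimes "$1") <(list_mtimes "$2")
}
```

After the tar pipe above, `verify_mtimes /top/level/below_this/source /target/folder/source && echo "all mtimes match"` would check the copy. Swapping `%Y` for `%W` compares birth times instead (stat prints 0 where it cannot report one).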
As far as timestamp accuracy goes, NTFS precision is 100 ns and ext4's is 1 ns, so the two are stored differently; because ext4's resolution is the finer one, a copied NTFS time can in principle be kept exactly, but check rather than assume they are identical. With newer utilities you can preserve creation times with rsync's --crtimes option, but I don't know if it is compatible with NTFS. stat does not report birth times for NTFS on my Debian 12.
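The conversion between the two formats can be sketched in bash. NTFS stores a timestamp as a count of 100 ns intervals since 1601-01-01 UTC (a Windows FILETIME), while Linux counts nanoseconds since 1970-01-01 UTC; the two epochs are 11,644,473,600 seconds apart. The helper names are hypothetical, and the sketch assumes bash's 64-bit integer arithmetic and times after 1970.

```bash
#!/usr/bin/env bash
# Sketch only: convert between an NTFS timestamp (FILETIME: 100 ns ticks since
# 1601-01-01 UTC) and a Linux timestamp (nanoseconds since 1970-01-01 UTC).
# Relies on bash's 64-bit integers; intended for times after 1970.

# Number of 100 ns ticks between 1601-01-01 and 1970-01-01 (11644473600 s * 10^7).
readonly EPOCH_DIFF_TICKS=116444736000000000

# filetime_to_unix_ns TICKS: exact, because every 100 ns tick is a whole number of ns.
filetime_to_unix_ns() {
    echo $(( ($1 - EPOCH_DIFF_TICKS) * 100 ))
}

# unix_ns_to_filetime NS: integer division drops any remainder below 100 ns.
unix_ns_to_filetime() {
    echo $(( $1 / 100 + EPOCH_DIFF_TICKS ))
}
```

For example, `filetime_to_unix_ns 116444736010000000` prints 1000000000 (one second after the Unix epoch), while `unix_ns_to_filetime 1000000099` prints 116444736010000000: going NTFS to Linux only multiplies, and going back drops the last 99 ns.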
Basically, filesystem metadata are the timestamps, the file length, and where the file is located on disk, all of which is kept separately from the file's contents. Metadata for specific file types like PDFs or JPEGs is stored within the file itself.
Data are data. If the "birth date" (i.e. the very first time a file was created) is significant, you will have to manage that yourself, unless that has been dealt with since I last looked.
For things like (digital) photos, EXIF data generally has all that, but scanned photos/slides don't. I put the date in the filename, but if the date a file was created is relevant, you'll need to be careful.
Used to be that one would zip the data to preserve extended filesystem information, but that only works NTFS to NTFS.
Used to be we would connect an RS232 cable between the serial ports, dump ASCII, and capture it. But that was before NTFS existed (or Linux, for that matter). Here is the thing: you should be filled with joy that you can transfer the DATA with great confidence of success, never mind ANY of the metadata. It was not long ago that getting data transferred (even simple text) was not a sure thing. (DON'T get me started on mixing big-endian and little-endian systems! What a hoot!)
We have the tools today to check the stats on all of your source files, make a database of the names and metadata you need, and make that one of the files you transfer. Although you might not get that metadata into the other filesystem, you can look it up in your data file if you need it.
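The database can be as simple as a tab-separated text file written while the data is still on NTFS and copied along with it. This bash sketch assumes GNU find, sort, stat, and sha256sum; the function name `make_inventory` and the column layout are made up for illustration. Birth time is recorded with stat's `%W`, which prints 0 when the filesystem or driver does not report it, so on such mounts that column carries no information. The checksum column lets you confirm later that file contents survived the move.

```bash
#!/usr/bin/env bash
# Sketch only: record filesystem metadata for every entry under a directory in a
# tab-separated file that can be copied along with the data.
# Assumes GNU findutils/coreutils; paths containing tabs or newlines will make the
# output ambiguous. Write OUT outside SRC so the inventory does not list itself.

# make_inventory SRC OUT
# Columns: path, type, size (bytes), mtime, birth (seconds since the epoch; birth
# is 0 when stat cannot report it), sha256 ("-" for anything but regular files).
make_inventory() {
    local src=$1 out=$2
    if [ ! -d "$src" ]; then
        echo "make_inventory: $src is not a directory" >&2
        return 2
    fi
    printf 'path\ttype\tsize\tmtime\tbirth\tsha256\n' > "$out"
    ( cd "$src" && find . -print0 ) | LC_ALL=C sort -z |
    while IFS= read -r -d '' p; do
        full=$src/$p
        if [ -f "$full" ] && [ ! -L "$full" ]; then
            # Read via stdin so sha256sum never escapes unusual filenames.
            hash=$(sha256sum < "$full" | cut -d' ' -f1)
        else
            hash=-
        fi
        printf '%s\t%s\t%s\n' "$p" "$(stat --printf '%F\t%s\t%Y\t%W' -- "$full")" "$hash"
    done >> "$out"
}
```

For instance, `make_inventory /mnt/ntfs/photos /tmp/photos-inventory.tsv` (hypothetical paths) writes the list, which can then be copied into the destination with the photos.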
I just tested it by moving a tree of files from an NTFS partition to an ext4 directory. All the names and modification timestamps were copied correctly and the directory structure was preserved. The creation timestamp, however, is NOT preserved. Date modified might be the same as date created, but not necessarily.
I am going to have to do some tests with zip as jefro suggests.
This has been a learning experience!
I suggest you just try it. Remember, you are using cp and not mv!
[Edit]: The next bit is NOT CORRECT, I just checked
rsync copies timestamps if requested. I generally use rsync -avh /source/ /destination. Be careful with that trailing slash at /source/; best to check the man page.
But I have left it for posterity.
[/EDIT]
Didn't some sorts of Windows files have metadata stored in NTFS alternate data streams? Those would have to be checked for and dealt with. The Streams utility from Microsoft's Sysinternals Suite can survey a directory tree and report any of those streams it finds.
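On the Linux side, ntfs-3g's default `streams_interface=xattr` mount option exposes named data streams as extended attributes in the `user.` namespace, so one Linux-only way to look for them before copying is to list those attributes. A bash sketch, assuming the volume is mounted with ntfs-3g using that default and that `getfattr` (from the attr package) is installed; the function name is hypothetical, and the kernel's ntfs3 driver may behave differently.

```bash
#!/usr/bin/env bash
# Sketch only: list "user." extended attributes under a directory tree. On an
# ntfs-3g mount with streams_interface=xattr (the default), these correspond to
# NTFS alternate data streams, e.g. user.Zone.Identifier.
# Requires getfattr from the attr package. getfattr escapes unusual characters
# in file names, so the paths printed here are for reading, not for scripting.

# list_named_streams DIR: print "path<TAB>attribute" for every user.* attribute found.
list_named_streams() {
    getfattr -R --absolute-names -m '^user\.' -- "$1" |
    awk '/^# file: / { file = substr($0, 9); next }   # header line naming the file
         NF          { print file "\t" $0 }'          # one attribute name per line
}
```

Running it on a hypothetical `/mnt/windows/Users/me/Documents` would show, for example, the `Zone.Identifier` streams Windows attaches to downloaded files, so you can decide which ones matter before the copy.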