LinuxQuestions.org - Convert a hardlink to a file with its own inode

A file has an inode.
The inode is a data structure that contains the information the filesystem needs to find the file's raw data on the storage medium.
The inode keeps a count of all the filenames that link to it.
When you use 'ln' to create a hard link to a file, you increase the ref count on the inode and get a new filename that refers to the same inode as the original file.

eg:

filename inode
file1 1

inode 1 now has a refcount of 1

ln file1 file2

filename inode
file1 1
file2 1

inode 1 now has refcount of 2

file1 and file2 are just two names that refer to the same inode, which is to say the same data on the storage medium.
Edits to file1 will show up in file2.
If I now copy:

cp file1 file3

filename inode
file1 1
file2 1
file3 2

Edits to file1 will show up in file2 but not in file3, because file3 links to a different inode and hence to a different set of data on the storage medium.
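The tables above can be reproduced in a scratch directory. This is only a sketch and assumes GNU coreutils, where `stat -c` takes a format string; the real inode numbers will be whatever the filesystem hands out, not 1 and 2.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

echo "hello" > file1
ln file1 file2        # second name for the same inode
cp file1 file3        # independent copy with its own inode

# GNU stat: %i = inode number, %h = hard link count, %n = name
stat -c '%i %h %n' file1 file2 file3

echo "edited" >> file1   # append in place, the inode stays the same
cat file2                # shows the edit, same inode as file1
cat file3                # still only "hello"
```

Note that this holds for in-place writes like the `>>` above. A tool that saves by writing a new file and renaming it over the old name would give that name a new inode.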

I am looking for a command that allows me to change a filename's inode to a new inode that refers to a copy of the data referred to by its original inode.

With reference to the above example

filename inode
file1 1
file2 1
file3 2

command file2

would result in

filename inode
file1 1
file2 3
file3 2

Where inode 3 refers to a copy of the data currently referred to by inode 1

This would be the same result as

cp file1 tmpfile
rm file2
mv tmpfile file2

This solution could work quite well, and a script could be devised to do it. Such a script could even handle the case where the temp file name clashes with something already there, and the case where making the new copy of the data would overfill the disk. Still, I was wondering whether there is a switch on cp, mv, ln or any other command that performs this function.
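A script along those lines might look like the sketch below. `break_link` is a made-up name, not an existing command, and the sketch assumes GNU coreutils (`stat -c`, `mktemp` with a template). `mktemp` takes care of the name clash, and a failed copy (a full disk, for example) leaves the original file untouched.

```shell
#!/bin/sh
# break_link FILE: hypothetical helper, not a standard command.
# Replaces FILE with a private copy of its data on a new inode.
break_link() {
    f=$1
    # Nothing to do unless another name shares the inode (GNU stat).
    [ "$(stat -c %h -- "$f")" -gt 1 ] || return 0
    # mktemp picks an unused name in the same directory, so the final
    # mv is a plain rename within one filesystem.
    tmp=$(mktemp "$(dirname -- "$f")/.break_link.XXXXXX") || return 1
    # If the copy fails, drop the temp file and leave FILE as it was.
    if cp -p -- "$f" "$tmp"; then
        mv -f -- "$tmp" "$f"
    else
        rm -f -- "$tmp"
        return 1
    fi
}

# Demo in a scratch directory.
dir=$(mktemp -d) && cd "$dir" || exit 1
echo data > file1
ln file1 file2
break_link file2
stat -c '%i %h %n' file1 file2   # two different inodes, one link each
```

Only the one filename is needed as an argument, so no second name for the inode has to be known.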

Boy, that was a long post. I *think* I understand what you are asking, but I'm not sure!

If you have two copies of the data (no matter how many links point to these copies), and you now want THREE copies of the data, you're just going to have to cough up more disk space to store that third copy. No ifs, ands, or buts about it. As far as I'm aware, there is no magic way to store data on a disk without using disk space.

Oh wait, I reread your post and you're not asking about running out of disk space.

You can try:

Code:

$ cp --remove-destination file1 file2
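A quick way to check the effect, as a sketch assuming GNU cp and GNU stat, run in a scratch directory:

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1
echo data > file1
ln file1 file2

stat -c '%i %h %n' file1 file2   # same inode, link count 2

# Remove the name file2 first, then copy file1 to a fresh file2.
cp --remove-destination file1 file2

stat -c '%i %h %n' file1 file2   # different inodes, link count 1 each
```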

Quote:

Originally Posted by haertig

You can try:

Code:

$ cp --remove-destination file1 file2

That works :)
Still it would be nice to have something that only needs a reference to the subject file.

I would like a switch to mv that tells it to perform as if it is moving across filesystems. Normally, if you mv within a filesystem, only the filename referring to the inode changes. If you mv across filesystem boundaries, a copy is created on the destination filesystem with its own inode, and a destination filename is created that links to the new inode. The source filename is then unlinked from the source inode, and if the source inode's refcount drops to 0, the inode is released and can be reused by the filesystem.

Quote:

Originally Posted by claytonjohnroby

I would like a switch to mv that tells it to perform as if it is moving across filesystems. Normally if you mv within a filesystem the filename referring to the inode is all that changes. If you mv across filesystem boundaries a copy is created on the destination ...

So you prefer the inefficient over the efficient? For the life of me, I can't figure out why. But if you really want to wipe out program shortcuts and speedups, just write your own mv command.

Here it is in pseudo-code. Implement it in whatever language you like.

Code:

open(A, read-only)
open(B, write, create)
while (chunk = read(A)) { write(B, chunk) }
close(A)
close(B)
remove(A)
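In shell, the same idea fits in one line. This is a sketch only; `copy_move` is a made-up name, and it assumes the destination does not exist yet (if it does, cp writes into the existing destination instead of creating a new inode).

```shell
#!/bin/sh
# copy_move SRC DST: hypothetical helper that always moves by copying,
# the way mv has to when SRC and DST are on different filesystems.
copy_move() {
    # cp -p gives DST its own inode and keeps mode and timestamps.
    # SRC is removed only if the copy succeeded.
    cp -p -- "$1" "$2" && rm -- "$1"
}

# Demo in a scratch directory.
dir=$(mktemp -d) && cd "$dir" || exit 1
echo data > file1
ln file1 file2
copy_move file2 moved
stat -c '%i %h %n' file1 moved   # different inodes, one link each
```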

The reason that I want this is that I am currently developing a website on my Linux box and uploading it to the server when I have completed updates.

In each directory I have some files that are the same for every directory of the project. I would like to edit these files as if they were all one file, so hardlinking them seems the way to go. This gives me a single file to edit in my development environment and separate copies of it in each directory of my production environment, because copying several filenames linked to the same inode from one system to another results in each destination file having its own inode.

Once I set this up, I asked myself how I would then customise these constant files in individual directories when I need to. As I pointed out in my original post, I can always copy to a temp file, remove the hardlinked file and then move the temp file to the original filename. Alternatively, I could copy with the remove-destination option from another file that shares the inode, as was demonstrated earlier. The first option is a bit inelegant, and the second requires finding another file that shares the inode of the file I want to convert.
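For the second option, the other names for an inode can be looked up with find's `-samefile` test. The sketch below assumes GNU find and GNU cp; the directory and file names (`a`, `b`, `header.html`) are invented for the demo, and the search only covers the tree it is pointed at.

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1
mkdir a b
echo shared > a/header.html
ln a/header.html b/header.html

# List every name under the current tree that shares the inode.
find . -samefile b/header.html

# Pick one other name and use it as the source of the copy.
other=$(find . -samefile b/header.html ! -path ./b/header.html | head -n 1)
cp --remove-destination "$other" b/header.html

stat -c '%i %h %n' a/header.html b/header.html   # now separate inodes
```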

A more complete description of what I am looking for

Code:

$ command file1

Assigns a new inode and copies to it the data from the original inode, like cp would do.
Unlinks file1 from its current inode and decrements that inode's refcount by 1, like rm would do.
Links file1 to the new inode and increments the new inode's refcount by 1, like ln would do.

This would eliminate the need for a temp filename and eliminate the need to find another file with the same inode.

I think that if such a facility does not already exist, it would be a good feature to add to bash and to the standalone versions of ln, mv or cp.

Try putting your common files in one place and using symlinks, not hardlinks. A symlink will copy from system to system. Do "man ln" and look at the -s option.
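A small sketch of that layout, with invented names (`common`, `page1`, `page2`, `footer.html`). Whether the links arrive on the server as links or as plain copies depends on the upload tool, so treat that part as an assumption to check; rsync, for one, keeps symlinks when run with -a.

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1
mkdir common page1 page2
echo "shared footer" > common/footer.html

# Relative symlinks keep working if the whole tree is moved.
ln -s ../common/footer.html page1/footer.html
ln -s ../common/footer.html page2/footer.html

# To customise one directory, swap its symlink for a real copy.
rm page2/footer.html
cp common/footer.html page2/footer.html
echo "page2 only" >> page2/footer.html

ls -l page1/footer.html page2/footer.html
```

Breaking a symlink this way needs only the one filename plus the well-known common location, which avoids hunting for another name that shares the inode.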

I have found something relating to what I am looking for. It is called "copy-on-write link breaking" and it is used by VServer. It allows a collection of files that are common to several guest operating systems to be hardlinked to a single file in a central repository. If one of the guest operating systems then writes to the file through its link to the shared file, the link is broken and a copy containing that guest's modifications is created that belongs only to that guest.

Further research shows that this is a concern all throughout "kernel level isolation" virtualisation where there is a desire to share files across multiple guests.

Now I just need to find out how to do this for myself.