What is the fastest way to create a file?

sovereign_aua · 01-13-2010, 10:32 AM

I'm wondering if open() is the fastest way to create a file.
Not one or two, but a million files.
Is an assembler file-creation routine more efficient than open() in gcc?
Would you suggest using scripts, or is that not the best way?

Thanks in advance

carbonfiber · 01-13-2010, 10:53 AM

Do you know what the "assembler file creation algorithm" is? What would you need scripts for?

sovereign_aua · 01-13-2010, 10:58 AM

I know only a little assembler.
Here is a code fragment in Intel syntax:
.............
file1 db "c:\test1\file1.txt", 0
.............
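mov ah, 3ch          ; DOS function 3Ch: create file
mov cx, 0            ; file attributes: normal
mov dx, offset file1 ; DS:DX -> NUL-terminated file name
int 21h              ; call DOS
jc err               ; carry flag is set on error
mov handle, ax       ; AX = handle of the new file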

carbonfiber · 01-13-2010, 11:51 AM

I see. I'm unsure whether you are asking because you plan to write a program to "create 1000000 files" and would like to know the faster way to do this, or whether you're asking to satisfy a curiosity (in the style of: which is faster, a great white shark or a tiger shark). If it is the former, I'd say the best way to find an answer would be to write said programs. "PC Assembly Language" (a free tutorial, available from http://www.drpaulcarter.com/pcasm/) might help with the assembly part. You might also find it stimulating to read about Linux system calls. I assume you are already familiar with C programming. Check out gcc's "-S" flag and the various optimisation flags.
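
As a starting point for "write said programs", here is a minimal C sketch of the plain open() baseline, assuming Linux/POSIX. The function name time_file_creation and the dir/file<i> naming scheme are hypothetical. It creates a fresh directory, creates n empty files in it, and returns the wall-clock time measured with clock_gettime(CLOCK_MONOTONIC).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical benchmark helper: create the directory dir (it must not
   exist yet), then create n empty files dir/file0 ... dir/file<n-1> with
   open() and return the elapsed wall-clock time in seconds.
   Returns -1.0 if mkdir(), open() or close() fails. */
double time_file_creation(const char *dir, long n)
{
    struct timespec start, end;
    char path[4096];

    if (mkdir(dir, 0755) != 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < n; i++) {
        snprintf(path, sizeof path, "%s/file%ld", dir, i);
        /* O_EXCL: fail instead of silently reusing an existing file */
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd < 0 || close(fd) != 0)
            return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling it once with n = 1000000 gives a number to compare any assembly variant against, measured the same way.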

johnsfine · 01-13-2010, 12:03 PM

"Non-*NIX" in the name of this forum does not necessarily mean Windows. IIUC, this is the main "programming" subforum of LQ, so despite "Non-*NIX", programming questions here tend to be Linux. There are also other alternatives. So we didn't know you meant Windows until you said "int 21h". It would have helped to say "Windows" up front. (If you didn't mean Windows, you're confused about the use of int 21h).

Anyway, there will be no noticeable performance difference across any of the reasonable ways you might have your user mode code send the request to the kernel to create the files. All the real work is in the OS, and the OS will do the same work to create the files regardless of which functions you use in your own program.
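
To illustrate the point on Linux, here is a minimal sketch, assuming glibc. Both function names are hypothetical. It creates a file once through libc's open() and once through glibc's generic syscall() wrapper, which bypasses open() and asks the kernel directly via the openat system call. As far as I know, recent glibc versions implement open() with openat anyway, so both requests end in the same kernel code and only a handful of user-mode instructions differ.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create (or truncate) path through the usual libc wrapper. */
int create_with_libc(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}

/* Same request, issued as a raw openat system call via syscall().
   AT_FDCWD makes a relative path resolve against the current directory,
   exactly as open() does. Returns -1 and sets errno on failure. */
int create_with_raw_syscall(const char *path)
{
    return (int)syscall(SYS_openat, AT_FDCWD, path,
                        O_WRONLY | O_CREAT | O_TRUNC, 0644);
}
```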

Windows can create 1000 files in each of 1000 directories faster than it can create 1000000 files in one directory. There are probably many other aspects of the sequence of file names that would have a measurable impact on the total time, but I don't know Windows internals well enough to try to predict that.
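
If you want to test the "1000 files in each of 1000 directories" layout, the only change is how the file names are generated. A minimal sketch in C, assuming POSIX mkdir(); the helper names and the dNNNN/fNNNN naming scheme are hypothetical. File number i goes into subdirectory i / per_dir, under the name i % per_dir.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Build the path of file number i when files are spread over
   subdirectories of root, per_dir files in each, e.g.
   i = 1234, per_dir = 1000  ->  root/d0001/f0234
   Returns 0 on success, -1 if buf is too small. */
int spread_path(char *buf, size_t len, const char *root, long i, long per_dir)
{
    int n = snprintf(buf, len, "%s/d%04ld/f%04ld",
                     root, i / per_dir, i % per_dir);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/* Create the subdirectory that file number i belongs in, if it does not
   exist yet. root itself must already exist. Returns 0 on success. */
int ensure_parent_dir(const char *root, long i, long per_dir)
{
    char path[4096];
    if (spread_path(path, sizeof path, root, i, per_dir) != 0)
        return -1;
    char *slash = strrchr(path, '/');   /* cut off the "/fNNNN" part */
    *slash = '\0';
    if (mkdir(path, 0755) != 0 && errno != EEXIST)
        return -1;
    return 0;
}
```

A creation loop would call ensure_parent_dir() and then open() the path from spread_path() for each i; comparing that against a single-directory run answers the question for your own filesystem.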

bigearsbilly · 01-13-2010, 12:53 PM

Well, if you create a file you'll still need to call a system library,
so I shouldn't think it makes much difference.
C would probably be quick enough.

Why don't you try it?

sovereign_aua · 01-14-2010, 10:11 AM

Thanks all for the replies.
I've written this inline asm part of my C program:

char *dir1 = "/home/.../my_file";
__asm__ __volatile ("":: "dx"(dir1));
__asm__ __volatile ("movb $0x39, %ah");
__asm__ __volatile ("int $0x80");

I use gcc's built-in assembler.
This program compiles without any warnings but doesn't create the file.
What could be the reason?

carbonfiber · 01-14-2010, 10:41 AM

I guess it is Linux we're discussing. Try making sense of the following program:

Code:

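segment .data
filename db 'hello', 0	; creat() expects a NUL-terminated path


segment .text
	global _start

_start:
	mov eax, 8		; i386 syscall 8 = creat(pathname, mode)
	mov ebx, filename	; 1st argument: pointer to the path
	mov ecx, 420		; 2nd argument: mode 0644 (rw-r--r--), written in decimal
	int 0x80		; enter the kernel

	mov eax, 1		; i386 syscall 1 = exit(status)
	mov ebx, 0		; exit status 0
	int 0x80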

sovereign_aua · 01-14-2010, 11:11 AM

Yes, I'm using Linux.
-----------
kernel 2.6.25-1.1, SUSE 11.0
x86_64
-----------
I rewrote your asm code:

char *dir1 = "my_asm";
char *dir2 = "$ - filename";

__asm__ __volatile ("movl $8, %eax");
__asm__ __volatile ("":: "ebx"(dir1));
__asm__ __volatile ("":: "ecx"(dir2));
__asm__ __volatile ("int $0x80");

__asm__ __volatile ("mov $1, %eax");
__asm__ __volatile ("mov $0, %ebx");
__asm__ __volatile ("int $0x80");

My program waits for input, but doesn't create the file.

carbonfiber · 01-14-2010, 11:35 AM

"Making sense" does not mean "try to integrate it in your program without understanding how it works".

johnsfine · 01-14-2010, 12:56 PM

Quote:

Originally Posted by sovereign_aua

__asm__ __volatile ("movl $8, %eax");
__asm__ __volatile ("":: "ebx"(dir1));
__asm__ __volatile ("":: "ecx"(dir2));
__asm__ __volatile ("int $0x80");

I'm not certain when you can get away with writing asm the way you have it there, but generally you can't.

You should not write sequential asm instructions as separate embedded asm instructions in C or C++. There is no guarantee the compiler won't disturb your registers between your instructions.

GCC has a very complicated syntax for specifying inputs, outputs and register use for a block of embedded asm. If you want to code in embedded asm, learn that syntax, use it correctly and combine all related asm instructions into a single embedding block.
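
A minimal sketch of what that looks like for the creat call discussed in this thread, assuming GCC on Linux; raw_creat is a hypothetical name. Everything sits in one asm statement: the constraints put each value in the register the kernel expects ("a", "b", "c" on i386; "a", "D", "S" on x86-64), "=a" tells GCC where the result comes back, and the clobber list names what the instruction destroys. SYS_creat from <sys/syscall.h> avoids hard-coding the number (8 on i386, 85 on x86-64). The x86-64 branch uses the syscall instruction instead of int $0x80 because, as far as I know, int $0x80 in a 64-bit process goes through the 32-bit compatibility layer, where only the low 32 bits of each pointer survive.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>

/* Create (or truncate) path with the creat system call, issued from ONE
   inline-asm statement whose operands tell GCC exactly which registers
   are read, written and clobbered. Returns a file descriptor, or -errno. */
long raw_creat(const char *path, mode_t mode)
{
    long ret;
#if defined(__x86_64__)
    /* x86-64: number in rax, arguments in rdi and rsi; the syscall
       instruction overwrites rcx and r11, so they are listed as clobbers. */
    __asm__ __volatile__("syscall"
                         : "=a"(ret)
                         : "0"((long)SYS_creat), "D"(path), "S"((long)mode)
                         : "rcx", "r11", "memory");
#elif defined(__i386__)
    /* i386: number in eax, arguments in ebx and ecx, enter via int $0x80. */
    __asm__ __volatile__("int $0x80"
                         : "=a"(ret)
                         : "0"((long)SYS_creat), "b"(path), "c"((long)mode)
                         : "memory");
#else
    /* Other architectures: this sketch falls back to the libc wrapper. */
    int fd = creat(path, mode);
    ret = (fd < 0) ? -errno : fd;
#endif
    return ret;
}
```

The "memory" clobber matters here: it makes GCC store the path string to memory before the kernel reads it.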

Usually it is less confusing to write an entire function in actual asm, rather than writing part of a function in embedded asm. It is certainly easier to learn the rules for entire functions in asm as compared to the rules for correct embedded asm.

I don't know where the register use and values for int $0x80 are documented. I wouldn't try to use int $0x80 by fudging some example. I wouldn't try to use it without first finding and understanding the documentation.

Repeating the main fact from earlier in this thread: You are wasting your efforts. The few milliseconds you might save in user code by directly calling int $0x80 a million times instead of calling open() a million times, is trivial compared to the time the OS needs to spend (either way) to service those million requests.
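
This claim is easy to check without writing any assembly. A minimal sketch, assuming Linux and getrusage(); the function names and the <prefix><i> file naming are hypothetical. It creates n files with plain open() and reports how much CPU time the process spent in user mode (ru_utime) versus inside the kernel (ru_stime). If the argument above holds, the system time should dominate, and that is the part assembly cannot shrink.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Create n files named <prefix><i> and store the user-mode and kernel-mode
   CPU seconds spent doing it. Returns 0 on success, -1 on any failure. */
int measure_user_vs_kernel(const char *prefix, long n,
                           double *user_s, double *sys_s)
{
    struct rusage before, after;
    char path[4096];

    if (getrusage(RUSAGE_SELF, &before) != 0)
        return -1;
    for (long i = 0; i < n; i++) {
        snprintf(path, sizeof path, "%s%ld", prefix, i);
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0 || close(fd) != 0)
            return -1;
    }
    if (getrusage(RUSAGE_SELF, &after) != 0)
        return -1;

    *user_s = tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime);
    *sys_s  = tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime);
    return 0;
}
```

Note that this counts only CPU time; time spent waiting for the disk shows up in neither number, which makes the user-mode share look even smaller in wall-clock terms.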

10110111 · 01-14-2010, 02:51 PM

When accessing disks, it usually makes no difference whether you use asm and call int 0x80 or use fopen(), because disk operations are usually MUCH slower than executing an extra API layer.

Sergei Steshenko · 01-14-2010, 03:11 PM

A million files will most likely exhaust OS file/directory buffers, i.e. actual writes to disk will be necessary. In such a case writing in assembly makes no sense - disk speed will be the bottleneck.

smeezekitty · 01-14-2010, 04:27 PM

Just use pure C for portability, because the disk or the kernel itself will be the bottleneck, not your program.