LinuxQuestions.org
Old 10-13-2005, 10:00 AM   #1
Thinking
Member
 
Registered: Oct 2003
Posts: 249

Rep: Reputation: 30
how can i speed up memcpy?


hiho@ll

i have a simple test environment

i have a server
i have a client

/* both are C programs */

the client does write(); in a loop
and the server does read(); in a loop

the client does some time measurement so i know how fast i can be

what i want:
the maximum throughput that is possible and still fits my application's needs

i'm not sure my measurements are correct
but what i do know is:
if i do ONLY write(); AND read(); in a loop on both sides
i get a value of 2002

on the server side
if i do a memcpy(); during the loop
i get a value of 600

this means that copying a buffer (the incoming data to a temp buffer) slows things down!

i know i'll never get a value of 2000 once my whole app sits between write(); and read();

BUT i want at minimum 1000!

i thought about writing
memcpy();
in assembler

WHY?
because gcc does not always pick the best assembler implementation of memcpy!!!
on some CPUs
a loop of mov instructions is faster than
using the rep prefix!!!

so my questions:

1. because i have only very, very basic knowledge of assembler, i have no idea how the gcc assembler syntax works!
the best would be if somebody could write those few lines of assembler for me
OR
maybe i get some little to understand info on assembler programming in gcc
so i can try the two possible memcpy assembler implementations (rep or mov)

2. maybe some weird GCC command line optimization options would help!!
but i have no idea which options do what
so i don't know which to use
and i don't know which options are available

thx@ll

btw. if i use memcpy and my whole algorithm is between write(); and read(); i have a throughput of 400
this is fast
really fast

so if anybody posts
DON'T try it!
i can only say
but i want to try it
that's the only way i learn things about linux, C and maybe assembler
so please help and don't say
let it be
 
Old 10-13-2005, 12:21 PM   #2
aluser
Member
 
Registered: Mar 2004
Location: Massachusetts
Distribution: Debian
Posts: 557

Rep: Reputation: 42
Quote:
maybe some weird GCC command line optimization options would help!!
I think -O3 will turn on everything you'd want except maybe -funroll-loops . There's a possibility that loop unrolling actually slows down your code, so it's imperative to test it : )

Quote:
maybe i get some little to understand info on assembler programming in gcc
A google found this: http://www.ibiblio.org/gferg/ldp/GCC...bly-HOWTO.html . There's some (more cryptic) information in the gcc docs: http://gcc.gnu.org/onlinedocs/gcc-3....ended-Asm.html and http://gcc.gnu.org/onlinedocs/gcc-3....nstraints.html

I can help more with that if you want, probably
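
A minimal sketch of the two variants asked about (rep versus a mov loop), assuming GCC's extended asm syntax and an x86 or x86-64 target. The names copy_rep, copy_mov and COPY_HAVE_X86_ASM are made up for this example; on any other target both functions simply fall back to memcpy. It also assumes the direction flag is clear on entry, which the x86 calling conventions require.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The asm below is x86-only; anywhere else both functions fall back to memcpy. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define COPY_HAVE_X86_ASM 1
#else
#define COPY_HAVE_X86_ASM 0
#endif

/* Variant 1: "rep movsb" copies (e)cx bytes from [(e)si] to [(e)di].
 * "+D", "+S" and "+c" pin the operands to those registers and tell gcc
 * that the instruction both reads and changes them. */
static void *copy_rep(void *dst, const void *src, size_t n)
{
    void *ret = dst;

    assert(n == 0 || (dst != NULL && src != NULL));
#if COPY_HAVE_X86_ASM
    __asm__ volatile ("rep movsb"
                      : "+D" (dst), "+S" (src), "+c" (n)
                      : /* no pure inputs */
                      : "memory");
#else
    memcpy(dst, src, n);
#endif
    return ret;
}

/* Variant 2: an explicit loop, one byte per iteration through a scratch register. */
static void *copy_mov(void *dst, const void *src, size_t n)
{
    void *ret = dst;

    assert(n == 0 || (dst != NULL && src != NULL));
    if (n == 0)
        return ret;          /* the asm loop always runs at least once */
#if COPY_HAVE_X86_ASM
    {
        unsigned char tmp;   /* "q" asks for a register that has a byte form */
        __asm__ volatile ("1:\n\t"
                          "movb (%[s]), %[t]\n\t"   /* tmp = *src */
                          "movb %[t], (%[d])\n\t"   /* *dst = tmp */
                          "inc %[s]\n\t"
                          "inc %[d]\n\t"
                          "dec %[n]\n\t"
                          "jnz 1b"                  /* loop until n reaches 0 */
                          : [d] "+r" (dst), [s] "+r" (src), [n] "+r" (n),
                            [t] "=&q" (tmp)
                          : /* no pure inputs */
                          : "memory", "cc");
        (void)tmp;
    }
#else
    memcpy(dst, src, n);
#endif
    return ret;
}
```

Both are called like memcpy, e.g. copy_rep(dst, src, len). Which one wins depends on the CPU and the buffer size, so time both against plain memcpy before keeping either.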


Are you positive that you need as many memcpy()s as you're using? Perhaps the problem could be solved by a more complicated buffering scheme. What is your server doing?
 
Old 10-13-2005, 12:27 PM   #3
aluser
Member
 
Registered: Mar 2004
Location: Massachusetts
Distribution: Debian
Posts: 557

Rep: Reputation: 42
Also, it seems intuitive that if you always memcpy in multiples of the word size, you could beat gcc's implementation. Is that the case? You might even try unrolling the loop around your mov instructions.

However, if gcc is using a builtin memcpy and you call it with constant lengths, maybe it already does these things (?)
 
Old 10-13-2005, 12:32 PM   #4
itsme86
Senior Member
 
Registered: Jan 2004
Location: Oregon, USA
Distribution: Slackware
Posts: 1,246

Rep: Reputation: 58
I believe glibc's implementation of memcpy() already copies in word-sized chunks by means of casting.

EDIT: Yeah, it actually does whole pages at a time if it can, then words, then bytes. This is in the file glibc-2.3.5/sysdeps/generic/memcpy.c:
Code:
void *
memcpy (dstpp, srcpp, len)
     void *dstpp;
     const void *srcpp;
     size_t len;
{
  unsigned long int dstp = (long int) dstpp;
  unsigned long int srcp = (long int) srcpp;

  /* Copy from the beginning to the end.  */

  /* If there not too few bytes to copy, use word copy.  */
  if (len >= OP_T_THRES)
    {
      /* Copy just a few bytes to make DSTP aligned.  */
      len -= (-dstp) % OPSIZ;
      BYTE_COPY_FWD (dstp, srcp, (-dstp) % OPSIZ);

      /* Copy whole pages from SRCP to DSTP by virtual address manipulation,
         as much as possible.  */

      PAGE_COPY_FWD_MAYBE (dstp, srcp, len, len);

      /* Copy from SRCP to DSTP taking advantage of the known alignment of
         DSTP.  Number of bytes remaining is put in the third argument,
         i.e. in LEN.  This number may vary from machine to machine.  */

      WORD_COPY_FWD (dstp, srcp, len, len);

      /* Fall out and copy the tail.  */
    }

  /* There are just a few bytes to copy.  Use byte memory operations.  */
  BYTE_COPY_FWD (dstp, srcp, len);

  return dstpp;
}

Last edited by itsme86; 10-13-2005 at 12:44 PM.
 
Old 10-13-2005, 12:38 PM   #5
aluser
Member
 
Registered: Mar 2004
Location: Massachusetts
Distribution: Debian
Posts: 557

Rep: Reputation: 42
Quote:
I believe glibc's implementation of memcpy() already copies in word-sized chunks by means of casting.
Sure, but somehow this has to work:
Code:
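#include <assert.h>
#include <string.h>
int main(void) {
    char a[4];
    a[3] = '\0';
    memcpy(a, "abcQ", 3);  /* 3 bytes: not a multiple of the word size */
    assert(strcmp(a, "abc") == 0);
}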
So memcpy() is doing something special for the case where the size isn't a multiple of the word size. If the size argument isn't constant, then this has to be done at run time, somehow.
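
A rough sketch of how that run-time handling can look, following the same three phases as the glibc source quoted above (bytes up to alignment, then words, then a byte tail). This is not glibc's code: phased_copy and word_t are names invented here, unsigned long is assumed to be the machine word, and the word moves are written as fixed-size memcpy calls because that is the portable way to express a word load or store in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef unsigned long word_t;   /* assumption: same size as a machine word */

static void *phased_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(len == 0 || (dst != NULL && src != NULL));

    if (len >= 2 * sizeof(word_t)) {
        /* Phase 1: single bytes until d sits on a word boundary. */
        size_t head = (size_t)(-(uintptr_t)d % sizeof(word_t));
        len -= head;
        while (head--)
            *d++ = *s++;

        /* Phase 2: whole words. s may still be unaligned, so the load
         * goes through a fixed-size memcpy, which gcc usually expands inline. */
        while (len >= sizeof(word_t)) {
            word_t w;
            memcpy(&w, s, sizeof w);
            memcpy(d, &w, sizeof w);
            s += sizeof w;
            d += sizeof w;
            len -= sizeof w;
        }
    }

    /* Phase 3: whatever is left (always fewer than 2 words) goes byte by byte. */
    while (len--)
        *d++ = *s++;

    return dst;
}
```

With a length of 3, as in the snippet above, the first branch is skipped and only the byte loop runs, so the 'Q' is never touched.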
 
Old 10-13-2005, 12:47 PM   #6
itsme86
Senior Member
 
Registered: Jan 2004
Location: Oregon, USA
Distribution: Slackware
Posts: 1,246

Rep: Reputation: 58
Sorry, I pasted glibc's implementation after your post. I guess if you already knew how many bytes you were copying then you could avoid memcpy()'s branch logic, but even if you previously knew how many bytes you were copying it adds a maintenance headache. What if that number of bytes changes in the future? You have to remember to consider your copying algorithm too, all for the sake of saving the tiniest bit of time.
 
Old 10-13-2005, 12:59 PM   #7
aluser
Member
 
Registered: Mar 2004
Location: Massachusetts
Distribution: Debian
Posts: 557

Rep: Reputation: 42
The whole post sounds like a somewhat silly micro-optimization to me too, but it could be academically interesting : )

To save on the maintenance headache, you could call assert() inside your copy function (this can be compiled away with NDEBUG) or make your version take a number of words as the size argument instead of a number of bytes. Call it wordcpy or something.
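
Both ideas could look roughly like this. It is only a sketch: word_t and wordcpy_bytes are hypothetical names, unsigned long is assumed to be the machine word, and the buffers are assumed to really be arrays of word_t (or malloc'd memory), not char arrays cast to word pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned long word_t;   /* assumption: same size as a machine word */

/* Size is given in words, so there is no tail to handle at run time. */
static void wordcpy(word_t *dst, const word_t *src, size_t nwords)
{
    /* These checks vanish when the file is compiled with -DNDEBUG. */
    assert((uintptr_t)dst % sizeof(word_t) == 0);
    assert((uintptr_t)src % sizeof(word_t) == 0);

    while (nwords--)
        *dst++ = *src++;
}

/* For callers that still think in bytes: refuse any length that is
 * not a whole number of words instead of silently dropping the tail. */
static void wordcpy_bytes(void *dst, const void *src, size_t nbytes)
{
    assert(nbytes % sizeof(word_t) == 0);
    wordcpy(dst, src, nbytes / sizeof(word_t));
}
```

If the buffer size changes later to something that is not a multiple of the word size, a debug build stops at the assert instead of copying too little.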
 
Old 10-13-2005, 01:01 PM   #8
jim mcnamara
Member
 
Registered: May 2002
Posts: 964

Rep: Reputation: 34
Did you try:
gcc -g -pg myprog.c -o myprog

1. run your code (this writes gmon.out)
2. run gprof myprog

and look at the results of profiling? Just because elapsed time is longer does not mean that your memcpy() call is necessarily the problem.
 
Old 10-14-2005, 02:54 AM   #9
Thinking
Member
 
Registered: Oct 2003
Posts: 249

Original Poster
Rep: Reputation: 30
hiho@ll

anybody knows the bcopy function??
what the hell is the difference between bcopy and memcpy????

reading the gcc man page while searching for compiler options i noticed that there is a function bcopy
i didn't know this function exists!

the man page of bcopy says it's deprecated and i should use memcpy

i just gave it a try and voila

the speed increases to about 900!! THIS IS EXTREMELY GOOD

well i also used some gcc flags
but i used those flags with memcpy and bcopy
and it didn't help with memcpy
but bcopy is really fast!

so, what's the difference?

btw: i tried the whole thing with and without gcc flags
the flags i used are for the i686 architecture (i just wanted to try)
WITH the flags
memcpy: around 450
bcopy: around 900
WITHOUT the flags
memcpy: around 450
bcopy: around 1100!!!


how did i measure:
i used the gettimeofday function to know how fast i can send data between 2 progs
using only a simple benchmark tool
i got a value around 1300
there is nothing between the 2 progs
just a server and a client
the server writes and the client reads

then i tried the same measurement using my own architecture
and i got 1100 (well there is not my whole architecture between the 2 endpoints, so the whole architecture will reduce the stuff a bit)

so i think i'm at the end of testing ;-)

but i want to know why bcopy is so much faster?

thx@ll
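
For anyone who wants to repeat the comparison without the client and server in the way, here is a small sketch that times only the copy calls with gettimeofday. It is not the test program described above: time_copies, via_memcpy and via_bcopy are invented names, and the empty asm statement is a GCC-specific barrier that keeps the optimizer from dropping the repeated copies.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* so glibc declares bcopy and gettimeofday */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>    /* bcopy */
#include <sys/time.h>   /* gettimeofday */

typedef void (*copy_fn)(void *dst, const void *src, size_t n);

static void via_memcpy(void *dst, const void *src, size_t n) { memcpy(dst, src, n); }
static void via_bcopy(void *dst, const void *src, size_t n)  { bcopy(src, dst, n); }

/* Run fn(dst, src, n) `rounds` times and return the elapsed wall time in seconds. */
static double time_copies(copy_fn fn, void *dst, const void *src,
                          size_t n, unsigned long rounds)
{
    struct timeval t0, t1;
    unsigned long i;

    assert(fn != NULL);
    gettimeofday(&t0, NULL);
    for (i = 0; i < rounds; i++) {
        fn(dst, src, n);
#ifdef __GNUC__
        /* Compiler barrier: pretend dst is read after every copy. */
        __asm__ volatile ("" : : "r" (dst) : "memory");
#endif
    }
    gettimeofday(&t1, NULL);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_usec - t0.tv_usec) / 1e6;
}
```

Calling time_copies(via_memcpy, dst, src, len, rounds) and then the same with via_bcopy gives two numbers measured under identical conditions. Use the buffer size from the real application and enough rounds to run for at least a second, since gettimeofday is too coarse for short runs.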
 
Old 10-14-2005, 07:53 AM   #10
jlliagre
Moderator
 
Registered: Feb 2004
Location: Outside Paris
Distribution: Solaris10, Solaris 11, Mint, OL
Posts: 9,506

Rep: Reputation: 360
Quote:
anybody knows the bcopy function??
yes
Quote:
what the hell is the difference between bcopy and memcpy????
Their name, the order of their arguments, and possibly their implementation ...
bcopy comes from BSD code, while memcpy comes from System V and is the standardized one (it is part of ISO C).
Quote:
but i want to know why bcopy is so much faster?
I would say it's more optimized, at least on your system.
There may be platforms where they behave similarly, or where memcpy runs faster ...
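
Besides the name and the argument order there is one documented difference in behaviour: bcopy is specified to give the correct result when source and destination overlap, like memmove, while memcpy makes no such promise. A small sketch, with helper names invented for the example; it does not say anything about which of them is faster:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* so glibc declares bcopy */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* bcopy */

/* Move the first n bytes of buf one position to the right, in place.
 * Source and destination overlap, so plain memcpy would be undefined here. */

static void shift_right_bcopy(char *buf, size_t n)
{
    assert(buf != NULL);
    bcopy(buf, buf + 1, n);      /* bcopy(src, dst, n): source comes first */
}

static void shift_right_memmove(char *buf, size_t n)
{
    assert(buf != NULL);
    memmove(buf + 1, buf, n);    /* memmove(dst, src, n): destination comes first */
}
```

Starting from the string "abcdef" in an 8-byte buffer, shifting 5 bytes with either helper leaves "aabcde".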
 
  

