LinuxQuestions.org
libLQ and/or Temp (recycling) formatted strings and blocks of memory

Posted 02-26-2012 at 11:02 AM by rainbowsally
Updated 05-06-2012 at 05:48 AM by rainbowsally

[Note: this refers to an obsolete implementation of the temp strings and libLQ sources but the concept of being able to pass a buffer containing a formatted string directly in the parameter list makes this both interesting and important regardless.

Also the peek into the asm code may be helpful for folks wishing to get into the really low-level stuff. See below how we can redirect a call to sprintf() instead of calling vsnprintf(). This may not be portable, but then neither can Qt be ported to GTK or vice versa, so you decide. -rs]


Features:
  • Automatically recycling temp strings that accept formatting strings (like printf).
  • Beefing up libLQ
  • An interesting Computer Mad Science experiment in assembler (gcc dialect).
  • More monkeying with Makefiles generated by makefile-creator (See previous blog for sources).

How many times have you wished you could format a temporary string with sprintf() (or something like it) and use the result in the same expression, such as when making a system() call?

For example, if you wanted to run a command such as "ls <funcname>", where 'funcname' is supplied from somewhere, isn't it a pain having to sprintf() first and then make the call like this?
Code:
  sprintf(tmpbuf, "ls %s", funcname);
  // and THEN use the string in a call
  err = system(tmpbuf);
Let's address that issue here, and let's get our hands a little dirty with some assembler, just for fun. We won't use the assembler version in the real code, but it's interesting to see how much smaller and faster the code can be if you KNOW the CPU that's going to be running your code.

First, let's define the interface so that we can change the number of temp strings available before they start recycling. The same interface should also let us create other kinds of temporary objects and 'adopt' them. Adopting is as simple as copying the object pointer and nulling out its entry in the array of temp buffers so that it can't be recycled.

file: src/tmpstr.h
Code:
// tmpstr.h

#ifndef tmpstr_h
#define tmpstr_h

#ifdef __cplusplus
extern "C" {
#endif

// The number of concurrent blocks of mem or strings that can be used 
// before they start recycling.
#define TMPMEM_MAX_BUFFERS 32

/// Returns a memory temp object from a recycling buffer of size bytes
void* tmpmem(int size);

/// Returns a tempstring from a recycling buffer up to size bytes long
char* tmpstr(int size_max, const char* fmt, ...);

/** Takes over ownership of an object or string.  After adoption the user
 is responsible for freeing the object when no longer needed.  Returns 
0 on success, else unable to adopt.*/
int tmpmem_adopt(void* obj);

/// an alias for tmpmem_adopt() for use with strings.
int tmpstr_adopt(char* obj);

#ifdef __cplusplus
} // extern "C"
#endif

#endif // tmpstr_h
That messy stuff up there about __cplusplus is often hidden in macros that are named something like BEGIN_DECLS and END_DECLS, but we'll leave it looking messy for clarity. (? Hmm... well, this time, that may actually make sense.) We may want this code to run from C or from C++ in our libLQ library.
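For comparison, here is a minimal sketch of what hiding the __cplusplus test behind macros might look like. The names LQ_BEGIN_DECLS and LQ_END_DECLS are made up for this illustration and are not part of libLQ; the two prototypes are copied from tmpstr.h above.

```c
/* decls_sketch.h -- illustration only; the LQ_* macro names are hypothetical */
#ifdef __cplusplus
#  define LQ_BEGIN_DECLS extern "C" {
#  define LQ_END_DECLS   }
#else
#  define LQ_BEGIN_DECLS /* expands to nothing when compiled as C */
#  define LQ_END_DECLS
#endif

LQ_BEGIN_DECLS

/* Same prototypes as in tmpstr.h, now wrapped by the macros. */
void* tmpmem(int size);
char* tmpstr(int size_max, const char* fmt, ...);

LQ_END_DECLS
```

The header reads more cleanly this way, but the reader has to look up the macros to find out what they do, which is why the longhand version is kept in tmpstr.h.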

[Note, if you use kdevelop3, as I do, you might want to just delete the filelist and reuse the folders from the libLQ code in the previous blog entry and edit the main.c code for this since this will soon be added to the libLQ library.]

file: src/tmpstr.c
Code:
// tmpstr.c

#include "tmpstr.h"

#ifdef __cplusplus
extern "C" {
#endif

#include <stdio.h>  // vsnprintf(), fprintf()
#include <stdarg.h> // va_list, va_start(), va_end()
#include <stdlib.h> // malloc(), realloc(), exit()

static void* tmps[TMPMEM_MAX_BUFFERS];
static int tmps_index = 0;

// get a temp block of memory of size bytes.
void* tmpmem(int size)
{
  void* buf = tmps[tmps_index];
  if(buf)
    buf = realloc(buf, size);
  else
    buf = malloc(size);
  tmps[tmps_index++] = buf;
  if(!buf) {
    fprintf(stderr, "Fatal error: Can't allocate temp buffer (tmpmem/tmpstr)\n");
    exit(1);
  }
  if(tmps_index >= TMPMEM_MAX_BUFFERS) tmps_index = 0;
  return buf;
}

  
/// Returns a tempstring from a recycling buffer up to size bytes long
char* tmpstr(int size_max, const char* fmt, ...)
{
  char* mem = (char*)tmpmem(size_max);
  va_list args;
  va_start(args, fmt);
  vsnprintf(mem, size_max, fmt, args);
  va_end(args);
  return mem;  
}

int tmpmem_adopt(void* obj)
{
  for(int i = 0; i < TMPMEM_MAX_BUFFERS; i++) {
    if(tmps[i] == obj) {
      tmps[i] = 0;
      return 0;
    }
  } // if we get here, it's technically an error.
  return 1;
}

int tmpstr_adopt(char* obj) { return tmpmem_adopt((void*) obj);}

#ifdef __cplusplus
} // extern "C"
#endif
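
To see the recycling and adoption rules in isolation, here is a small self-contained sketch of the same idea. Everything prefixed demo_ or DEMO_ is hypothetical and exists only for this illustration; the ring is shrunk to 4 slots so that it wraps quickly. It is not the libLQ code itself, just a model of it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>

#define DEMO_MAX_BUFFERS 4   /* hypothetical small ring, for illustration */

static void* demo_slots[DEMO_MAX_BUFFERS];
static int demo_index = 0;

/* Same idea as tmpmem(): reuse the slot's old block, then advance the ring. */
static void* demo_tmpmem(int size)
{
  /* realloc(NULL, n) behaves like malloc(n), so one call covers both cases */
  void* buf = realloc(demo_slots[demo_index], (size_t)size);
  if(!buf) {
    fprintf(stderr, "demo_tmpmem: out of memory\n");
    exit(1);
  }
  demo_slots[demo_index] = buf;
  demo_index = (demo_index + 1) % DEMO_MAX_BUFFERS;
  return buf;
}

/* Same idea as tmpstr(): format into a recycled block of size_max bytes. */
static char* demo_tmpstr(int size_max, const char* fmt, ...)
{
  char* mem = (char*)demo_tmpmem(size_max);
  va_list args;
  va_start(args, fmt);
  vsnprintf(mem, (size_t)size_max, fmt, args);
  va_end(args);
  return mem;
}

/* Detach a block from the ring so it is never recycled; 0 on success. */
static int demo_adopt(void* obj)
{
  for(int i = 0; i < DEMO_MAX_BUFFERS; i++) {
    if(demo_slots[i] == obj) {
      demo_slots[i] = NULL;
      return 0;
    }
  }
  return 1;
}

/* Returns 1 if a string adopted early is still intact after the ring wraps. */
static int demo_adopted_string_survives(void)
{
  char* keep = demo_tmpstr(16, "keep %d", 42);
  if(demo_adopt(keep) != 0)
    return 0;
  for(int i = 0; i < DEMO_MAX_BUFFERS * 2; i++)
    demo_tmpstr(16, "temp %d", i);   /* wraps the ring twice */
  int ok = (strcmp(keep, "keep 42") == 0);
  free(keep);                        /* ours to free after adoption */
  return ok;
}
```

Calling demo_adopted_string_survives() from a main() of your own exercises the whole cycle. Without the demo_adopt() call, the block behind 'keep' would be handed to realloc() again once the ring wrapped, so a pointer held that long must not be trusted.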
And here's a tester.

file: src/main.c
Code:
// main.c -- test

#include <stdio.h>  // printf() etc.
#include <stdlib.h> // system(), free()

#define COMPUTER_MAD_SCIENCE 0

#include "tmpstr.h"
void dbg(){}

char* _tmpstr(int size_max, const char* fmt, ...);

int main(int argc, char** argv)
{
  dbg();
  if(argc != 2) {
    printf(
        "\n"
        "  Input the name of a file or directory under the current\n"
        "  working directory (PWD).\n\n"
          );
    return 1;
  }
  // ** This is the main thing we're after **
  system(tmpstr(256, "ls %s", argv[1]));
  // and let's test adopt while we're at it.
  for(int i = 0; i < TMPMEM_MAX_BUFFERS * 2; i++)
  {
    // this needs to be traced in a debugger after the 32nd adoption
    // to verify that the string is being nulled out and libc will 
    // tell us if we double-free anything.
    char* n = tmpstr(8, "%d", i);
    if(i > TMPMEM_MAX_BUFFERS)
      dbg();
    
    printf("Adopting string # %s \t", n);
    tmpstr_adopt(n);
    printf("Freeing string # %s\n", n);
    free(n);
  }
  return 0;
}


#if COMPUTER_MAD_SCIENCE

static void place_for_inline_code()
{
  // You can use the 32 bit asm macros in previous blog entries 
  // for this if you like, but we'll spell it out longhand here.
  asm(
      ".data;"
      ".align 4;"
      "retad:"
      ".long 0;"
      
      ".text;\n"
      ".align 4;\n"
      ".globl _tmpstr;\n"             // declare extern
      ".type _tmpstr, @function;\n"   // executable
      "_tmpstr:"                      // entry point named '_tmpstr'
  
  // get the tmpmem pointer in eax and replace
  // size_max with this buffer, then call normal
  // sprintf (not vsnprintf) and let the caller
  // clean the final stack.
      "push 4(%esp)\n;"       // param1 = size_max
      "call tmpmem;\n"        // get new temp string
      "addl $4, %esp;\n"      // caller cleans up
      "mov %eax, 4(%esp);\n;" // replace size_max with buffer ptr
      "popl retad;\n"         // save caller's return address
      "call sprintf\n;"       // call sprintf as though directly
      "pushl retad;\n"        // restore return address           
      "movl 4(%esp), %eax;\n" // return the buffer
      "ret;\n"                // return to caller
     );
}

#endif // COMPUTER_MAD_SCIENCE
Run the test code as shown above with something to 'ls' (but quote the filename if it contains splats, i.e. wildcards), then let's do a bit of COMPUTER MAD SCIENCE.

Note that the following will NOT work on a 64 bit machine. It will only work for ix86 32 bit CPUs for reasons explained in the Asm 64 blog entries. If you have a 64 bit machine and want to see this work, add the -m32 flag to both the CFLAGS and LDFLAGS in your makefile to cross-compile as a 32 bit app which will still run on your system.

First, though, let's take a look at the disassembly of the tmpstr() function with the -O2 optimization.
Code:
80487ba <tmpstr>:
 80487ba: 55                    push   %ebp
 80487bb: 89 e5                 mov    %esp,%ebp
 80487bd: 83 ec 28              sub    $0x28,%esp
 80487c0: 8b 45 08              mov    0x8(%ebp),%eax
 80487c3: 89 04 24              mov    %eax,(%esp)
 80487c6: e8 45 ff ff ff        call   8048710 <tmpmem>
 80487cb: 89 45 f4              mov    %eax,-0xc(%ebp)
 80487ce: 8d 45 10              lea    0x10(%ebp),%eax
 80487d1: 89 45 f0              mov    %eax,-0x10(%ebp)
 80487d4: 8b 55 f0              mov    -0x10(%ebp),%edx
 80487d7: 8b 45 08              mov    0x8(%ebp),%eax
 80487da: 89 54 24 0c           mov    %edx,0xc(%esp)
 80487de: 8b 55 0c              mov    0xc(%ebp),%edx
 80487e1: 89 54 24 08           mov    %edx,0x8(%esp)
 80487e5: 89 44 24 04           mov    %eax,0x4(%esp)
 80487e9: 8b 45 f4              mov    -0xc(%ebp),%eax
 80487ec: 89 04 24              mov    %eax,(%esp)
 80487ef: e8 dc fc ff ff        call   80484d0 <vsnprintf@plt>
 80487f4: 8b 45 f4              mov    -0xc(%ebp),%eax
 80487f7: c9                    leave  
 80487f8: c3                    ret
21 lines of assembler.

And here's _tmpstr, the COMPUTER_MAD_SCIENCE version.
Code:
 80486fc: ff 74 24 04           pushl  0x4(%esp)
 8048700: e8 1f 00 00 00        call   8048724 <tmpmem>
 8048705: 83 c4 04              add    $0x4,%esp
 8048708: 8f 05 38 a0 04 08     popl   0x804a038
 804870e: 89 04 24              mov    %eax,(%esp)
 8048711: e8 8a fd ff ff        call   80484a0 <sprintf@plt>
 8048716: ff 35 38 a0 04 08     pushl  0x804a038
 804871c: 8b 44 24 04           mov    0x4(%esp),%eax
 8048720: c3                    ret
9 lines of assembler, and it calls sprintf() instead of vsnprintf().
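
One consequence worth spelling out: in the asm version size_max is only used to allocate the buffer, and since sprintf() takes no size argument, nothing limits how much gets written into it. The C version's vsnprintf() does enforce the limit. The sketch below shows that bounded behaviour using plain snprintf(); the function name demo_format_bounded is made up for the illustration.

```c
#include <stdio.h>

/* Bounded formatting, as tmpstr() gets from vsnprintf(): at most cap bytes
   (including the terminating NUL) are written to out.  The return value is
   the length the full text would have had, so a result >= cap means the
   output was truncated. */
static int demo_format_bounded(char* out, size_t cap, int n)
{
  return snprintf(out, cap, "%d", n);
}
```

With an 8-byte buffer and the value 123456789, the buffer ends up holding "1234567" and the function returns 9. Handing the same arguments to sprintf() would write all ten bytes (nine digits plus the NUL) and run off the end of the buffer, so with _tmpstr() it is up to the caller to pick a size_max that is really big enough.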

[Note: The attempt to avoid some pipeline hits caused by pushes and pops may strike you as being counter-intuitive but it's deliberate. Also, the internal instruction cache may get clobbered if we do a 'jmp *retad' instead of a normal return.]

To run the same test code using the asm version, '#define COMPUTER_MAD_SCIENCE 1' and change the tmpstr() refs to _tmpstr().

Probably best to step this code with something like kdbg >= version 5.0, if you want to see both the source and the disassembly (and registers).

Building the libLQ library with us?

When you're done playing with this move the tmpstr.c and tmpstr.h files to the folder with your libLQ files.

Edit the makefile to add ./tmpstr.c, .h, and .o to the SRC, HDRS, and OBJS lists (with the '\').

Then add this to the rules section.
Code:
$(OBJDIR)/tmpstr.o: $(SRCDIR)/tmpstr.c $(HDR)
  @echo
  @echo "Compiling tmpstr.o"
  $(COMPILE) $(OBJDIR)/tmpstr.o $(SRCDIR)/tmpstr.c $(CFLAGS) $(INCLUDE)
And to recompile the lib type:
Code:
make clean
make
Need help with the Makefile?

Code:
base64 -d << _eof | gunzip -c > Makefile-libLQ
H4sICJwbSk8AA01ha2VmaWxlAJVU32+bMBB+rv+KW6nUUA0ibU+NFGkJSRtWErr8UDRpLwSc4BUw
wkbT9rC/fWdiSFjSNfODz3f2fffx+YxhwDR4oVuWUAgLGkgawQ8mY7hNddiqwry4JcQwYBlT4KXM
S7mFKikLUpy4BJaFSRmxbAd5IGPir5azwXQMfUjYxvtiB016xAoaIiCjAra8AMHLIqTiPXQkTXMT
+OY77qMfZBFsWBYUP3XJjjDJ0J2N3DnC2mQxd+qlP/xcL1WZdRxIcPzps+uNQcS8TCKIuE3qUB92
YQhW+vEDWGg5cR68weMC45aPISGj/i4r7+/BAJ5LlrJflLgzx1uNVK7ldktRdPcfTNG1uzedPRnz
UN5zZ0/HtSu/D0GBOgPxRnVFQ6kXo254Ytjya8G2ZZJUooLkIJsLqPQn04E7w6ybzl4YE6lo6RUX
mPoj9+ErDMeev4blxF0oXmNYu8sJPM7HgyU4g/n4HZ60LhtEyY4FvxEAuysSJqQdnvVkmgtZaNc4
HoRMRvMGA++7yotbKHEbJT6Hgrd+gsJbKLyNws+hGG8OmJcJNusC25LxjJAgSXqouJLeRIsfowyy
UQblQeX1bu8Q16eq7atPNIy5NnDtsexFPZzD1V2Tq5uO6hjzqI4G0q1TLd2hqU4++4tlVXT/DsyD
Gj1oWrMJhprN3zQcnuYsUUSa9IqHfjWaQhv/FXinoaifTYudOOUmLmQm/s1LnLASF3KqO+SYVNPB
b7Gqk1+j1bTfWexzxC7pypWgBYzwJ53RiJAwoUHWQ3JFCta2bpsjX7VP4979hrvufmoMLggRNGU1
1H9l/gE01prlSgYAAA==
_eof
We'll troubleshoot the lib in a while. Do you know why tmpstr will work with C++ but slists will not?

There are two ways to deal with this. One would be to make a class for SList that automatically runs the constructors/destructors and doesn't need an explicit object pointer, and the other would be... what?

We'll do the 'what' version first. If you already know what it needs, go ahead and fix it and we'll catch up with you in a bit.

Note: slists and tmpstr strings will work fine together. When we append a string to an slist it makes its own copy, so you can recycle away with the tmpstr stuff and the strings in the slist will not get deleted.

My libLQ.a file is now almost 9K. WOO HOO! :-)