LinuxQuestions.org
Old 11-14-2013, 09:47 AM   #1
jyunker
Member
 
Registered: Aug 2009
Posts: 167

Rep: Reputation: 0
Smashing Performance with Oprofile example code is not compiling


I am trying to use Oprofile to find instances of false sharing. I found some code and a website that I think is helpful.

The website is:

http://www.ibm.com/developerworks/library/l-oprof/

"Smashing Performance with Oprofile" is the name of the IBM report. The code is



Code:
/*
 * Shared data being modified by two threads running on different CPUs.
 */

/* shared structure between two threads which will be optimized later*/
struct shared_data_align {
    unsigned int num_proc1;
    unsigned int num_proc2;
};
/* 
 * Shared structure between two threads remains unchanged (non optimized)
 * This is required in order to collect some samples for the L2_LINES_IN event.
 */
struct shared_data_nonalign {
    unsigned int num_proc1;
    unsigned int num_proc2;
};

/*
 * In the example program below, the parent process creates a clone
 * thread sharing its memory space. The parent thread running on one CPU 
 * increments the num_proc1 element of the common and common_aln. The cloned
 * thread running on another CPU increments the value of num_proc2 element of
 * the common and common_aln structure.
 */
/* Declare global data */
struct shared_data_nonalign common_aln;

/*Declare local shared data */
struct shared_data_align common;

    /* Now clone a thread sharing memory space with the parent process */
    if ((pid = clone(func1, buff+8188, CLONE_VM, &common)) < 0) {
        perror("clone");
        exit(1);
    }
    
    /* Increment the value of num_proc1 in loop */
    for (j = 0; j < 200; j++)
        for(i = 0; i < 100000; i++) {
            common.num_proc1++;
        }

    /* Increment the value of num_proc1 in loop */
    for (j = 0; j < 200; j++)
        for(i = 0; i < 100000; i++) {
            common_aln.num_proc1++;
        }
/*
 * The routine below is called by the cloned thread, to increment the num_proc2 
 * element of common and common_aln structure in loop.
 */
int func1(struct shared_data_align *com)
{
    int i, j;
    /* Increment the value of num_proc2 in loop */
    for (j = 0; j < 200; j++)
        for (i = 0; i < 100000; i++) {
            com->num_proc2++;
        }

    /* Increment the value of num_proc2 in loop */
    for (j = 0; j < 200; j++)
        for (i = 0; i < 100000; i++) {
            common_aln.num_proc2++;
        }
But it does not compile. I know that it is missing a main function, but I think that is just a mistake the author made when putting the code in the article. What code is missing from the source to allow it to compile?

It has been noted in other postings that this code does not compile. I just need to know what is missing.


Any help appreciated. Thanks in advance.

Respectfully,

jyunker
 
Old 11-14-2013, 11:21 PM   #2
evo2
LQ Guru
 
Registered: Jan 2009
Location: Japan
Distribution: Mostly Debian and Scientific Linux
Posts: 5,753

Rep: Reputation: 1288
Hi,

well, for one thing it appears to be missing a closing brace at the end (there may be others missing too). For another, when reporting problems, you should always include the actual error, or in this case the output from the compiler along with what command you issued.

Evo2.
 
Old 11-15-2013, 01:02 AM   #3
padeen
Member
 
Registered: Sep 2009
Location: Perth, W.A.
Distribution: Slackware 14, Debian Sid, FreeBSD 10, OpenBSD
Posts: 191

Rep: Reputation: 36
I read the article and briefly played around with the code, but it looks like a code snippet rather than compilable source. It is not just that the author has forgotten to add "main( ..){...}"; there are many missing declarations as well, too many for just an oversight. I think the author is demonstrating the concept rather than writing compilable code.

Having said that, it would only be a few minutes' work to add the missing declarations and definitions.
 
Old 11-15-2013, 09:47 AM   #4
jyunker
Member
 
Registered: Aug 2009
Posts: 167

Original Poster
Rep: Reputation: 0
I believe what you say to be correct. I am not a computer scientist. My area is operations research. It may take someone who knows a lot about C just a few minutes to repair the code, but I am not that person. I just want it to run. I have a strong interest in seeing this code in the before and after scenario as far as false sharing goes. That is really what I am all about. The code, as was said, is clearly missing a main function and some variable definitions.

My compile line and output are shown:

gcc prog.c -o prog.c -lm -lrt -g
prog.c: In function ‘main’:
prog.c:7: warning: empty declaration
prog.c:15: warning: empty declaration
prog.c:34: error: expected declaration specifiers before ‘if’
prog.c:40: error: expected declaration specifiers before ‘for’
prog.c:40: error: expected declaration specifiers before ‘j’
prog.c:40: error: expected declaration specifiers before ‘j’
prog.c:41: error: expected declaration specifiers before ‘i’
prog.c:41: error: expected declaration specifiers before ‘i’
prog.c:46: error: expected declaration specifiers before ‘for’
prog.c:46: error: expected declaration specifiers before ‘j’
prog.c:46: error: expected declaration specifiers before ‘j’
prog.c:47: error: expected declaration specifiers before ‘i’
prog.c:47: error: expected declaration specifiers before ‘i’
prog.c:55: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
prog.c:1: error: old-style parameter declarations in prototyped function definition
prog.c:68: error: expected ‘{’ at end of input

Any help appreciated in getting this to compile.

Thanks in advance.

Respectfully,


jyunker

On lines 7 and 15 the compiler is reporting an empty declaration. What is that?
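
On the before-and-after comparison for false sharing mentioned in this post: the two structs in the article's snippet are identical as posted, and the optimized layout is not shown in this thread. The following is only a sketch of one common way to separate the two counters, assuming a 64-byte cache line and GCC's aligned attribute. CACHE_LINE and share_cache_line are names made up for illustration and are not taken from the article.

```c
#include <stddef.h>

#define CACHE_LINE 64   /* assumption: typical x86 cache line size in bytes */

/* "Before": the counters are adjacent, so they normally share one cache line. */
struct shared_data_nonalign {
    unsigned int num_proc1;
    unsigned int num_proc2;
};

/* "After": each counter starts on its own cache-line boundary (GCC extension). */
struct shared_data_align {
    unsigned int num_proc1 __attribute__((aligned(CACHE_LINE)));
    unsigned int num_proc2 __attribute__((aligned(CACHE_LINE)));
};

/*
 * Hypothetical helper: returns 1 if two member offsets fall in the same
 * cache line, assuming the struct itself starts on a cache-line boundary.
 */
int share_cache_line(size_t off_a, size_t off_b)
{
    return off_a / CACHE_LINE == off_b / CACHE_LINE;
}
```

With these definitions, share_cache_line() applied to the offsetof() values of num_proc1 and num_proc2 gives 1 for struct shared_data_nonalign and 0 for struct shared_data_align, at the cost of the aligned struct growing to two cache lines.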
 
Old 11-15-2013, 10:20 AM   #5
benb2013
LQ Newbie
 
Registered: Oct 2011
Posts: 4

Rep: Reputation: Disabled
Since you said ANY help is appreciated: the source below compiles and runs, at least when using gcc-4.8.2 and glibc-2.17-1.

Save the source below as file.c and compile with 'gcc -DRESULT -static -o yourprogname file.c';
leave out the -DRESULT to skip the outcome summary.

I cannot tell from the snippet what it was supposed to do originally, but it has the look and feel of an untested ad-hoc example. The way I interpreted the original incomplete snippet, the impressive counters do not do much more than count beans one by one and then assign the outcome to a variable in a struct. But perhaps I misinterpret what the author originally meant to do. That the parent and the child actually run on different cores after cloning a single thread is in no way guaranteed.

Added code is marked as NEW in the source, with explanation included.

Code:
/*                                                                              
 * Shared data being modified by two threads [?running on different CPUs?].     
 */                                                                             
                                                                                
                                                                                
#ifndef RESULT                                                                  
#undef  RESULT        /* Shall we print? change #undef into #define here or compile */  
#endif                /* with -DRESULT for printing the assigned values             */                               
                                                                                
#define _GNU_SOURCE   /* required: for clone flags / GNU clone wrapper */       
#include <stdio.h>    /* required: for the added print statements */            
#include <sched.h>    /* required: clone function prototype */
#include <stdlib.h>   /* required: for return added to function and main */     
#include <sys/mman.h> /* for mmap allocating stack space for the child */       
                                                                                
int main(void) { /* NEW wrapper main() */                                       
                                                                                
/*                                                                              
 * Shared structure between two threads remains unchanged (non optimized)       
 * This is required in order to collect some samples for the L2_LINES_IN event. 
 */                                                                             
struct shared_data_nonalign {                                                   
    unsigned int num_proc1;                                                     
    unsigned int num_proc2;                                                     
};                                                                              
                                                                                
/* shared structure between two threads which will be optimized later*/         
struct shared_data_align {                                                      
    unsigned int num_proc1;                                                     
    unsigned int num_proc2;                                                     
};                                                                              
                                                                                
/*                                                                              
 * In the example program below, the parent process creates a clone             
 * thread sharing its memory space. The parent thread running on one CPU        
 * increments the num_proc1 element of the common and common_aln. The cloned    
 * thread running on another CPU increments the value of num_proc2 element of   
 * the common and common_aln structure.                                         
 */                                                                             
/* Declare global data */                                                        
struct shared_data_nonalign common_aln;
                                                                                
/* NEW: It seemed a good idea to initialize data to avoid random outcome */     
common_aln.num_proc1 = 0;                                                       
common_aln.num_proc2 = 0;                                                       
                                                                                
/*Declare local shared data */                                                  
struct shared_data_align common;                                                
                                                                                
/* NEW: See previous */                                                         
common.num_proc1 = 0;                                                           
common.num_proc2 = 0;                                                           
                                                                                
/*                                                                              
 * The routine below is called by the cloned thread, to increment the num_proc2 
 * element of common and common_aln structure in loop.                          
 */                                                                             
int func1(void *com)                                                            
{                                                                               
    /* NEW: clone expects a void pointer for the function argument, so the      
     *      argument was defined as void * and added was a local pointer     
     *      inside the function body, that addresses the void *arg as a         
     *      struct shared_data_align pointer, which effectively functions       
     *      as a cast of the void parameter.                                         
     */                                                                         
    int i, j;                                                                   
    struct shared_data_align *ncom = com;         /* point to void parameter */ 
                                                                                
    /* Increment the value of num_proc2 in loop */                              
    for (j = 0; j < 200; j++)                 /* there are so many ways to   */ 
        for (i = 0; i < 100000; i++) {        /* assign a value to a var...  */ 
            ncom->num_proc2++;                                                  
        }                                                                       
                                                                                
    /* Increment the value of num_proc2 in loop */                              
    for (j = 0; j < 200; j++)                                                   
        for (i = 0; i < 100000; i++) {                                          
            common_aln.num_proc2++;                                             
        }                                           

                                                                                
    return 0;                    /* function of type int must return int */     
}                                                                               
                                                                                
/* NEW: Added mmap to allocate stack space for the child to pass to clone       
 *      The stack parameter for the child is called 'buff' in the example.      
 *                                                                              
 *      From the clone man page:                                                
 *                                                                              
 *      " Since the child and calling process may share memory, it is not       
 *        possible for the child process to execute in the same stack as        
 *        the calling process. The calling process must therefore set up        
 *        memory space for the child stack and pass a pointer to this space     
 *        to clone()"                                                           
 *                                                                              
 *      Also see the reference to mmap question on stack overflow.                                                               
 */                                                                             
                                                                                
/* Now clone a thread sharing memory space with the parent process */           
                                                                                
int pid, i=0,j=0; /* NEW */                                                     
void *buff;       /* stack */                                                   
                                                                                
/* See http://stackoverflow.com/questions/1083172/\                             
 * how-to-mmap-the-stack-for-the-clone-system-call-on-linux */                  
void *start =(void *) 0x0000010000000000;                                       
size_t len =          0x0000000000200000; /* grows with */                      
                                                                                
/* The flags for mmap are tailored to stack allocation */                       
if((buff = mmap(start, len, PROT_WRITE|PROT_READ,                               
                            MAP_PRIVATE|MAP_GROWSDOWN|MAP_FIXED|MAP_ANONYMOUS,  
                             -1,0)) == MAP_FAILED)                              
{                                                                               
    perror("mmap");                                                             
    exit(-1);                                                                   
}

if ((pid = clone(func1, (char *)buff + len, CLONE_VM, &common)) < 0) /* cast: arithmetic on void * is a GNU extension */
{                                                                               
    perror("clone");                                                            
    exit(1);                                                                    
} else {                                                                        
    printf("Cloned a thread with pid %d\n", pid);                               
}                                                                               
                                                                                
/* Increment the value of num_proc1 in loop */                                  
for (j = 0; j < 200; j++)                                                       
    for(i = 0; i < 100000; i++) {                                               
        common.num_proc1++; /* 200*100000, would be... one moment please...*/   
    }                                                                           
                                                                                
/* Increment the value of num_proc1 in loop */                                  
for (j = 0; j < 200; j++)                                                       
    for(i = 0; i < 100000; i++) {                                               
        common_aln.num_proc1++;                                                 
    }                                                                           
                                                                                
#ifdef RESULT                                                                   
printf("common.num_proc1 is: %u\n", common.num_proc1);
printf("common.num_proc2 is: %u\n", common.num_proc2);

printf("common_aln.num_proc1 is: %u\n", common_aln.num_proc1);
printf("common_aln.num_proc2 is: %u\n", common_aln.num_proc2);
#endif                                                                          
                                                                                
printf(">>> 42 <<<\n");     
                                                                                
return 0;                                                                       
}
Hope this helps, but I doubt this single code snippet will help much to demonstrate the toolset discussed.

As I mentioned: the example seems to be the result of cut and paste, probably taken from a larger context with a proper testbed. Perhaps someone more familiar with the tools can improve this answer, so it becomes more meaningful as a demonstration of how the Oprofile toolset can detect that code is less optimized than assumed.

Enjoy.
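
One thing to watch in the program above: the parent does not wait for the cloned child, so the num_proc2 values may be printed before the child has finished counting. It also relies on GCC's nested-function extension, since func1 is defined inside main. Below is a variant sketch, not the article's code, that moves func1 and the data to file scope, allocates the child stack with malloc instead of a fixed mmap address, and adds SIGCHLD so that waitpid() can collect the child. run_counters, STACK_SIZE and the single struct shared_data are names and choices made for this sketch only.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)   /* assumption: 1 MiB is plenty for func1 */

struct shared_data {
    /* volatile keeps the compiler from folding each loop into one addition */
    volatile unsigned int num_proc1;
    volatile unsigned int num_proc2;
};

static struct shared_data common;       /* file scope, zero-initialized */
static struct shared_data common_aln;

/* Runs in the cloned child: increments only the num_proc2 members. */
static int func1(void *arg)
{
    struct shared_data *com = arg;
    int i, j;

    for (j = 0; j < 200; j++)
        for (i = 0; i < 100000; i++)
            com->num_proc2++;

    for (j = 0; j < 200; j++)
        for (i = 0; i < 100000; i++)
            common_aln.num_proc2++;

    return 0;
}

/* Starts the child, counts in the parent, waits. Returns 0 on success, -1 on failure. */
int run_counters(void)
{
    char *stack = malloc(STACK_SIZE);
    pid_t pid;
    int i, j;

    if (stack == NULL) {
        perror("malloc");
        return -1;
    }

    /* Pass the top of the stack; SIGCHLD lets waitpid() see the child. */
    pid = clone(func1, stack + STACK_SIZE, CLONE_VM | SIGCHLD, &common);
    if (pid < 0) {
        perror("clone");
        free(stack);
        return -1;
    }

    /* Parent increments only the num_proc1 members. */
    for (j = 0; j < 200; j++)
        for (i = 0; i < 100000; i++)
            common.num_proc1++;

    for (j = 0; j < 200; j++)
        for (i = 0; i < 100000; i++)
            common_aln.num_proc1++;

    /* Wait for the child before reading num_proc2 or freeing its stack. */
    if (waitpid(pid, NULL, 0) < 0) {
        perror("waitpid");
        return -1;   /* stack deliberately not freed: the child may still use it */
    }
    free(stack);
    return 0;
}
```

Calling run_counters() from main and then printing the four counters should show 20000000 for each of them, since every counter has exactly one writer and the parent reads them only after waitpid() has returned.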
 
Old 11-15-2013, 10:51 AM   #6
jyunker
Member
 
Registered: Aug 2009
Posts: 167

Original Poster
Rep: Reputation: 0
gotten the code this far

I will try the example in the post right above this one. I have been working on the code, and here is the output when I compile it:

gcc prog.c -o prog -lm -lrt -g
prog.c: In function ‘main’:
prog.c:35: error: ‘pid’ undeclared (first use in this function)
prog.c:35: error: (Each undeclared identifier is reported only once
prog.c:35: error: for each function it appears in.)
prog.c:35: error: ‘func1’ undeclared (first use in this function)
prog.c:35: error: ‘buff’ undeclared (first use in this function)
prog.c:35: error: ‘CLONE_VM’ undeclared (first use in this function)
prog.c:37: warning: incompatible implicit declaration of built-in function ‘exit’
prog.c:69: error: expected declaration or statement at end of input

Code:
int main( int argc, char **argv ) {
int i, j;
/*
 * Shared data being modified by two threads running on different CPUs.
 */

/* shared structure between two threads which will be optimized later*/
struct shared_data_align {
    unsigned int num_proc1;
    unsigned int num_proc2;
};
/* 
 * Shared structure between two threads remains unchanged (non optimized)
 * This is required in order to collect some samples for the L2_LINES_IN event.
 */
struct shared_data_nonalign {
    unsigned int num_proc1;
    unsigned int num_proc2;
};

/*
 * In the example program below, the parent process creates a clone
 * thread sharing its memory space. The parent thread running on one CPU 
 * increments the num_proc1 element of the common and common_aln. The cloned
 * thread running on another CPU increments the value of num_proc2 element of
 * the common and common_aln structure.
 */
/* Declare global data */
struct shared_data_nonalign common_aln;

/*Declare local shared data */
struct shared_data_align common;

    /* Now clone a thread sharing memory space with the parent process */
    if ((pid = clone(func1, buff+8188, CLONE_VM, &common)) < 0) {
        perror("clone");
        exit(1);
    }
    
    /* Increment the value of num_proc1 in loop */
    for (j = 0; j < 200; j++)
        for(i = 0; i < 100000; i++) {
            common.num_proc1++;
        }

    /* Increment the value of num_proc1 in loop */
    for (j = 0; j < 200; j++)
        for(i = 0; i < 100000; i++) {
            common_aln.num_proc1++;
        }
/*
 * The routine below is called by the cloned thread, to increment the num_proc2 
 * element of common and common_aln structure in loop.
 */
int func1(struct shared_data_align *com)
{
    int i, j;
    /* Increment the value of num_proc2 in loop */
    for (j = 0; j < 200; j++)
        for (i = 0; i < 100000; i++) {
            com->num_proc2++;
        }

    /* Increment the value of num_proc2 in loop */
    for (j = 0; j < 200; j++)
        for (i = 0; i < 100000; i++) {
            common_aln.num_proc2++;
        }
}
I am not sure what it is telling me. I am especially concerned about pid. I know little about that.

Any help appreciated in getting this code to compile and run.

Thanks in advance.

Respectfully,


jyunker
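
Reading the errors above one at a time, each undeclared identifier corresponds to a declaration the article left out, and the final "expected declaration or statement at end of input" is consistent with func1 being defined inside main, which then never gets its closing brace. A minimal sketch of the missing pieces follows. The 8192-byte size of buff is only a guess based on the buff+8188 in the snippet, and start_child is a made-up wrapper name; neither comes from the article.

```c
#define _GNU_SOURCE      /* must come first: exposes clone() and CLONE_VM */
#include <sched.h>       /* clone(), CLONE_VM */
#include <stdio.h>       /* perror() */
#include <stdlib.h>      /* exit(), used by the article's error path */
#include <sys/types.h>   /* pid_t */

struct shared_data_align {
    unsigned int num_proc1;
    unsigned int num_proc2;
};

/* Child stack for clone(); the size is an assumption (see the note above). */
static char buff[8192];

/*
 * Defined at file scope, before its first use, with the
 * int (*)(void *) signature that clone() expects.
 */
int func1(void *arg)
{
    struct shared_data_align *com = arg;
    int i, j;

    for (j = 0; j < 200; j++)
        for (i = 0; i < 100000; i++)
            com->num_proc2++;

    return 0;
}

/* Hypothetical wrapper showing where pid is declared and how buff is passed. */
pid_t start_child(struct shared_data_align *common)
{
    /* The stack grows downward on x86, so pass the top of the buffer. */
    pid_t pid = clone(func1, buff + sizeof buff, CLONE_VM, common);

    if (pid < 0)
        perror("clone");
    return pid;
}
```

The point of the sketch is only the ordering: includes and _GNU_SOURCE first, then the struct, the stack buffer and func1 at file scope, and only after that the code that calls clone().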
 
Old 11-15-2013, 11:07 AM   #7
jyunker
Member
 
Registered: Aug 2009
Posts: 167

Original Poster
Rep: Reputation: 0
I did save and compile the code. My question is: what are the reasons for the

-DRESULT -static

in the command line. I had to

sudo yum install glibc-static

to make the error go away. So the question is: why the -static option?

I would also like to link the library: libsheriff_detect64.so

into the executable. So how do I do that?

Any help appreciated. Thanks in advance.


Respectfully,

jyunker
 
Old 11-15-2013, 12:12 PM   #8
jyunker
Member
 
Registered: Aug 2009
Posts: 167

Original Poster
Rep: Reputation: 0
I modified a Makefile on my PC so I could use it to
compile and link this program. The output is:

make
gcc -libsheriff_detect64.so progg.o -o progg
/usr/bin/ld: cannot find -libsheriff_detect64.so
collect2: ld returned 1 exit status
make: *** [progg] Error 1

Now I do not know what is wrong. I took the commands from the command line. The Makefile is shown below.


CC=gcc
CFLAGS=-DRESULT -static -o
LDFLAGS= -libsheriff_detect64.so
SOURCES= progg.c

OBJECTS=$(SOURCES:.c=.o)
EXECUTABLE=progg

all: $(SOURCES) $(EXECUTABLE)

$(EXECUTABLE): $(OBJECTS)
	$(CC) $(LDFLAGS) $(OBJECTS) -o $@

.cpp.o:
	$(CC) $(CFLAGS) $< -o $@


I do not see anything wrong with it. It was taken from the message further above (not the one immediately above; I do not know what that is).
I got the error message shown.

Any help appreciated. Thanks in advance.

Respectfully,

jyunker
 
Old 11-15-2013, 12:50 PM   #9
jyunker
Member
 
Registered: Aug 2009
Posts: 167

Original Poster
Rep: Reputation: 0
Okay I made a minor change to my Makefile code.

The output is

make
gcc -m64 -libsheriff_detect64.so progg.o -o progg
/usr/bin/ld: cannot find -libsheriff_detect64.so
collect2: ld returned 1 exit status
make: *** [progg] Error 1

The Makefile is:

Code:
CC=gcc 
CFLAGS=-DRESULT -static -o -m64
LDFLAGS= -m64  -libsheriff_detect64.so
SOURCES= progg.c

OBJECTS=$(SOURCES:.c=.o)
EXECUTABLE=progg

all: $(SOURCES) $(EXECUTABLE)

$(EXECUTABLE):	$(OBJECTS)
	$(CC)	$(LDFLAGS)	$(OBJECTS)	-o $@

.cpp.o:
	$(CC)	$(CFLAGS)	$< -o $@
The change was simply putting -m64 in both CFLAGS and LDFLAGS.

However the compiler/linker still cannot find

libsheriff_detect64.so

even though its location is in LD_LIBRARY_PATH.

What is going on here? Why can it not find

libsheriff_detect64.so

?

Any help appreciated. Thanks in advance.

Respectfully,

jyunker
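
On the linker question: as far as I know, GNU ld's -l option takes the library name without the lib prefix and the .so suffix, so -libsheriff_detect64.so is read as -l followed by the name ibsheriff_detect64.so, which does not exist. The directory is given separately with -L, and LD_LIBRARY_PATH is consulted by the dynamic loader when the program runs, not by ld when it resolves -l. The small C helper below only illustrates the naming rule; link_flag_for is a made-up name, and the actual directory of the library on this system is not known from the thread.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn a shared-library file name such as "libfoo.so"
 * into the "-lfoo" form the linker expects. Returns 0 on success, -1 if the
 * name does not look like lib<name>.so or the output buffer is too small.
 */
int link_flag_for(const char *filename, char *out, size_t outlen)
{
    const char *prefix = "lib";
    const char *suffix = ".so";
    size_t len = strlen(filename);
    size_t plen = strlen(prefix);
    size_t slen = strlen(suffix);
    int n;

    if (len <= plen + slen)
        return -1;
    if (strncmp(filename, prefix, plen) != 0)
        return -1;
    if (strcmp(filename + len - slen, suffix) != 0)
        return -1;

    /* Copy only the middle part, between "lib" and ".so". */
    n = snprintf(out, outlen, "-l%.*s",
                 (int)(len - plen - slen), filename + plen);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For libsheriff_detect64.so this yields -lsheriff_detect64, which would normally go on the link line after progg.o, together with a -L option pointing at the directory that holds the library.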
 
  

