LinuxQuestions.org
Forum: Linux - Software
11-18-2022, 03:56 AM #1
chiendarret (Member; registered Mar 2007; 307 posts)
CUDA error cudaStreamSynchronize(stream) and CUDA error in ComputeBondedCUDA


Hello
I am back with a problem that I first reported months ago. I have now tried to isolate each GPU, as shown below. The problem persists, and the user forum of the software I use for these molecular dynamics (MD) simulations (NAMD) gave me no clues.

My computer: a GA-X79-UD3 motherboard with two 680 GPUs, running Debian 10 Linux:
$ uname -r
5.10.0-19-amd64


ADDENDUM
After that, I cleaned the little dust from inside the computer, removed both GPUs, discarded the one that had been in the upper slot, and installed the one that had originally been in the lower slot in its place. With only this GPU, I get the same error. In any case, it was already clear that this is a software error.
Thanks for considering this issue
fp
NVIDIA driver version: 470.141.03; CUDA version: 11.4

Software for MD: NAMD_Git-2022-07-21_Linux-x86_64-multicore-CUDA

I can no longer run NAMD-CUDA with the same commands that worked one month ago. Installing newer Linux kernels and CUDA versions in the meantime did not solve the issue.

How the MD runs are launched:

Each run is preceded by nvidia-smi -pm 1, which enables persistence mode on the GPUs.
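The launch procedure above, together with the three device selections tested below (both GPUs, GPU 0 only, GPU 1 only), could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used for these runs: the helper names run_case and run_all, the NAMD2 and CONF variables, and the per-case log file names are hypothetical; only the namd2 flags and the nvidia-smi call come from this post.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helpers): rerun the cases from this post
# and keep each run's output in its own log file for later comparison.

# Assumed defaults; override them in the environment if needed.
NAMD2=${NAMD2:-namd2}
CONF=${CONF:-min.conf}

# Run namd2 with the flags used in this post on the given device list
# ("0,1", "0" or "1") and print the name of the log file it wrote.
run_case() {
  log="min_dev$(printf '%s' "$1" | tr ',' '_').log"
  "$NAMD2" +idlepoll +p12 +devices "$1" "$CONF" > "$log" 2>&1
  printf '%s\n' "$log"
}

# Enable persistence mode (needs root), then try both GPUs and each GPU alone.
run_all() {
  nvidia-smi -pm 1
  for devices in 0,1 0 1; do
    run_case "$devices"
  done
}
```

Source the file and call run_all; each case leaves its own log (min_dev0_1.log, min_dev0.log, min_dev1.log), so the error messages of the three runs can be compared afterwards.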

Error using both GPUs:

Command to run the MD: namd2 +idlepoll +p12 +devices 0,1 min.conf
Reported error:

Quote:
TCL: Minimizing for 3000 steps
FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function buildTileLists, line 1136
on Pe 4 (gig64 device 0 pci 0:2:0): an illegal memory access was encountered
FATAL ERROR: CUDA error in ComputeBondedCUDA::forceDoneCheck after polling 48 times over 0.005047 s on Pe 8 (gig64 device 1 pci 0:3:0): an illegal memory access was encountered
FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function buildTileLists, line 1136
on Pe 4 (gig64 device 0 pci 0:2:0): an illegal memory access was encountered
FATAL ERROR: CUDA error in ComputeBondedCUDA::forceDoneCheck after polling 48 times over 0.005047 s on Pe 8 (gig64 device 1 pci 0:3:0): an illegal memory access was encountered
[Partition 0][Node 0] End of program

Error using GPU 0:

namd2 +idlepoll +p12 +devices 0 min.conf

Quote:
TCL: Minimizing for 3000 steps
FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists, line 1577
on Pe 8 (gig64 device 0 pci 0:2:0): an illegal memory access was encountered
FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists, line 1577
on Pe 8 (gig64 device 0 pci 0:2:0): an illegal memory access was encountered
[Partition 0][Node 0] End of program
FATAL ERROR: CUDA error in ComputeBondedCUDA::forceDoneCheck after polling 673 times over 0.077770 s on Pe 8 (gig64 device 0 pci 0:2:0): an illegal memory access was encountered
FATAL ERROR: CUDA error in ComputeBondedCUDA::forceDoneCheck after polling 673 times over 0.077770 s on Pe 8 (gig64 device 0 pci 0:2:0): an illegal memory access was encountered

Error using GPU 1:

namd2 +idlepoll +p12 +devices 1 min.conf

Quote:
TCL: Minimizing for 3000 steps
FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists, line 1577
on Pe 8 (gig64 device 1 pci 0:3:0): an illegal memory access was encountered
FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists, line 1577
on Pe 8 (gig64 device 1 pci 0:3:0): an illegal memory access was encountered
[Partition 0][Node 0] End of program
FATAL ERROR: CUDA error in ComputeBondedCUDA::forceDoneCheck after polling 671 times over 0.077836 s on Pe 8 (gig64 device 1 pci 0:3:0): an illegal memory access was encountered
FATAL ERROR: CUDA error in ComputeBondedCUDA::forceDoneCheck after polling 671 times over 0.077836 s on Pe 8 (gig64 device 1 pci 0:3:0): an illegal memory access was encountered
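Each failure above is printed twice, and the Pe number, device number, PCI address and polling statistics differ from run to run, so the logs are easier to compare once those details are masked. A minimal sketch, assuming each run's output was saved to its own log file; summarize_errors and same_errors are hypothetical helper names, and the masking patterns were written only against the messages quoted in this post:

```shell
#!/bin/sh
# Minimal sketch (hypothetical helpers): compare the FATAL ERROR lines of
# two NAMD logs while ignoring run-specific details.

# Print each distinct FATAL ERROR message in a log with its count, after
# masking the polling statistics and the "on Pe N (host device N pci ...)" part.
summarize_errors() {
  grep '^FATAL ERROR' "$1" \
    | sed -E -e 's/after polling [0-9]+ times over [0-9.]+ s/after polling N times/' \
             -e 's/on Pe [0-9]+ \([^)]*\)/on Pe P (device D)/' \
    | sort | uniq -c | sed -E 's/^ +//'
}

# Succeed (exit status 0) when two logs contain the same masked error messages.
same_errors() {
  [ "$(summarize_errors "$1")" = "$(summarize_errors "$2")" ]
}
```

Applied to the GPU 0 and GPU 1 logs quoted above, same_errors succeeds: both runs stop in sortTileLists (line 1577) and in ComputeBondedCUDA::forceDoneCheck. The run on both GPUs differs, since it fails in buildTileLists (line 1136) instead.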
A GPU hardware (memory) failure seems unlikely to me, because both GPUs report the same error. However, I have been unable to trace the origin of the error.
Thanks for any advice
francesco pietra

Last edited by chiendarret; 11-20-2022 at 10:31 AM. Reason: Addendum
 
  


All times are GMT -5.
