Sigar

SIGAR native error during call to ProcExe.gather() causes JVM crash

Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 1.4.0
  • Fix Version/s: 1.5.0
  • Component/s: None
  • Case Links:
    none

Description

I have seen this cause the JON 2.0 agent JVM to crash 4 or 5 times now in the past couple of days. The hs_err file is attached.

Activity

Doug MacEachern added a comment -

Can you try reproducing this on the command line like so:

java -jar sigar.jar pfile State.Name.re=.*

Or replace State.Name.re=.* with the exact PID if you have it.
'pfile' calls Sigar.getProcExe().
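To make the reproduction concrete, here is a rough Java sketch of what a pfile-style run does: select processes by a name regex (treated here as an analogue of the PTQL query State.Name.re=.*) and call a getProcExe-like lookup on each one. ExeReader, pfile and the in-memory process table are hypothetical stand-ins, not SIGAR classes; the real tool calls Sigar.getProcExe(). The try/catch only handles Java exceptions. A fault inside native code cannot be caught this way and takes the whole JVM down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

public class PfileSketch {
    /** Hypothetical stand-in for Sigar.getProcExe(pid); returns {name, cwd}. */
    @FunctionalInterface
    interface ExeReader {
        String[] read(long pid) throws Exception;
    }

    /**
     * Selects processes whose name matches nameRegex (assumed analogue of the
     * PTQL query State.Name.re=...) and returns pfile-style lines for each.
     * Java exceptions are reported per pid and the loop continues; a fault
     * inside native code cannot be caught here and would end the JVM.
     */
    static List<String> pfile(Map<Long, String> procNames, String nameRegex, ExeReader reader) {
        Pattern pattern = Pattern.compile(nameRegex);
        List<String> out = new ArrayList<>();
        // TreeMap gives a stable, pid-ordered iteration.
        for (Map.Entry<Long, String> e : new TreeMap<>(procNames).entrySet()) {
            // Assumption: unanchored regex match on the process name.
            if (!pattern.matcher(e.getValue()).find()) {
                continue;
            }
            long pid = e.getKey();
            out.add("pid=" + pid);
            try {
                String[] exe = reader.read(pid);
                out.add("name=" + exe[0]);
                out.add("cwd=" + exe[1]);
            } catch (Exception ex) {
                out.add("error=" + ex.getMessage());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up process table; pid 4 simulates a lookup that fails in Java.
        Map<Long, String> procs = Map.of(4L, "System", 1368L, "csrss", 2000L, "java");
        ExeReader reader = pid -> {
            if (pid == 4L) {
                throw new Exception("access denied");
            }
            return new String[] {"C:\\demo\\" + pid + ".exe", "C:\\demo"};
        };
        pfile(procs, ".*", reader).forEach(System.out::println);
    }
}
```

Running main lists pid 4 with an error line, then name and cwd lines for pids 1368 and 2000. Passing an exact-name regex such as "^csrss$" narrows the run to one process, much like giving pfile a single PID.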

getProc{Exe,Args,Env} are implemented using the PEB (Process Environment Block), so it is possible that SIGAR-66 fixes the problem. That change hasn't been backported to 1.4, but you can try the same command line with the current 1.5 binaries:
svn co http://svn.hyperic.org/projects/sigar/dist/SIGAR_1_5

You can also try the 1.5 sigar.jar and sigar-x86-winnt.dll with the JON agent; 1.5 is compatible with the 1.4 APIs.
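For background on why a bad read in this area can corrupt memory: for another process, the image path, command line and current directory are reached through its PEB (via the process parameters block), and each is described by a UNICODE_STRING header holding Length and MaximumLength in bytes plus a Buffer pointer. A reader that trusts those lengths can copy more bytes than its own buffer holds. The sketch below is a hypothetical Java model of the checks a defensive copy would make; it is not SIGAR's native C implementation, and readUnicodeString and its parameters are invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PebStringSketch {
    /**
     * Hypothetical model of copying a UNICODE_STRING that was read out of
     * another process into a fixed-size local buffer. Length fields taken from
     * the target process are untrusted, so they are checked before copying.
     * Returns null instead of copying when the header is implausible.
     *
     * @param length        UNICODE_STRING.Length in bytes (excludes any terminator)
     * @param maximumLength UNICODE_STRING.MaximumLength in bytes
     * @param remote        bytes actually read from the remote Buffer address
     * @param localCapacity size of the caller's local buffer in bytes
     */
    static String readUnicodeString(int length, int maximumLength, byte[] remote, int localCapacity) {
        if (length < 0 || length % 2 != 0) {
            return null; // UTF-16 text must have an even byte count
        }
        if (length > maximumLength) {
            return null; // inconsistent header, likely garbage memory
        }
        if (length > localCapacity) {
            return null; // copying would overrun the local buffer
        }
        if (length > remote.length) {
            return null; // the remote read returned fewer bytes than claimed
        }
        byte[] local = Arrays.copyOf(remote, length);
        return new String(local, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] remote = "C:\\WINDOWS\\system32\\csrss.exe".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(readUnicodeString(remote.length, remote.length + 2, remote, 520));
        // A bogus Length larger than the local buffer is rejected rather than copied.
        System.out.println(readUnicodeString(4000, 4002, remote, 520));
    }
}
```

Rejecting an implausible header is one design choice; truncating to the local capacity would also avoid the overrun, at the cost of returning a partial string.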

Ian Springer added a comment -

Hi Doug,

My agent just crashed again, but this time I had added some debug logging to tell me which process it was calling getProcExe() on right before it crashed.

It ended up being a program called csrss.exe:

C:\Projects\jon-trunk\modules\enterprise\gui\portal-war>pslist | grep csrss

pslist v1.28 - Sysinternals PsList
Copyright © 2000-2004 Mark Russinovich
Sysinternals

csrss 1368 13 18 1504 18636 0:06:49.593 584:37:21.796

More info from Sysinternals Process Explorer:

Description: Client Server Runtime Process
Vendor: Microsoft Corporation
Command Line: C:\WINDOWS\system32\csrss.exe ObjectDirectory=\Windows SharedSection=1024,3072,512 Windows=On SubSystemType=Windows ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=winsrv:ConServerDllInitialization,2 ProfileControl=Off MaxRequestThreads=16
(even more details attached as csrss.png)

And, finally, from Sigar:

C:\Projects\jon-trunk\modules\enterprise\agent\target\dist\lib>java -jar sigar.jar pfile 1368
pid=1368
open file descriptors=1481
name=♥??
cwd=????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
???????????????????????

------------------------

The funky heart character and all the question marks suggest to me that some sort of buffer overflow is happening in the Sigar native code, though I don't know what's special about this particular process that causes it...

Note: the JON agent usually runs for several hours before crashing.

Let me know if there's any other info about this process that you need (e.g. its environment).

Regards,
Ian
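The debug logging Ian describes can be done with a wrapper that writes and flushes the PID before each native call (his actual code isn't shown in this thread). A native crash kills the JVM without running Java cleanup, so only output already flushed out of the JVM survives; the last "calling" line without a matching "returned" names the process being queried. CrashBreadcrumb and logged are hypothetical names, the lambda stands in for Sigar.getProcExe(pid), and the StringWriter is used only to keep the sketch self-contained (in an agent the PrintWriter would point at a log file).

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.function.LongFunction;

public class CrashBreadcrumb {
    /**
     * Writes and flushes a "calling" line before invoking the lookup, and a
     * "returned" line afterwards. If the JVM dies inside the call, the last
     * "calling" line without a matching "returned" names the pid involved.
     */
    static <T> T logged(PrintWriter log, String what, long pid, LongFunction<T> call) {
        log.println("calling " + what + " pid=" + pid);
        // Flush before the native call: text still buffered in the JVM is lost on a hard crash.
        log.flush();
        T result = call.apply(pid);
        log.println("returned " + what + " pid=" + pid);
        log.flush();
        return result;
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        PrintWriter log = new PrintWriter(sink);
        // The lambda is a hypothetical stand-in for sigar.getProcExe(pid).
        for (long pid : new long[] {4, 1368}) {
            logged(log, "getProcExe", pid, p -> "exe-" + p);
        }
        System.out.print(sink);
    }
}
```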

Doug MacEachern added a comment -

Did you have a chance to try with the 1.5 sigar binaries?

Ian Springer added a comment -

No, but I will try that tomorrow and let the agent run over the weekend. If it runs all weekend without crashing, then chances are 1.5 has fixed the bug.

Is 1.5 a final release? I noticed it's tagged in your SVN, but the latest distribution on SourceForge is 1.4.

Did the information from my last comment give you any idea what the cause of the crash is?

Doug MacEachern added a comment -

Hi Ian,
1.5 isn't quite final yet; we plan to make it final in January. Yes, the pfile output in your last comment confirms the problem area is around the PEB access. Comparing it with the pfile output from the 1.5 binaries would also be interesting.

Ian Springer added a comment -

Thanks for the info on 1.5. I'm glad you have some idea where the problem lies. I'll let you know on Monday whether my agent lasts through the weekend with 1.5.

Doug MacEachern added a comment -

Ian, any luck with this using sigar 1.5?

Ian Springer added a comment -

My agent hasn't crashed since we upgraded to Sigar 1.5.0.1 over two weeks ago, so I suspect the problem's been fixed.

Doug MacEachern added a comment -

The changes made for SIGAR-66 were targeted at Vista support, but the general improvements there applied to all Windows flavors; I'm certain that's what fixed this issue.

Dates

  • Created:
    Updated:
    Resolved:
    Last comment:
    6 years, 8 weeks, 4 days ago