History | Log In     View a printable version of the current page.  
Hyperic HQ 3.2.5-EE Maintenance Release is Now Available
Issue Details (XML | Word | Printable)

Key: HHQ-1958
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Critical Critical
Assignee: Tom Keeney
Reporter: Kashyap Parikh
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Hyperic HQ

Many runnable threads at ValueChangeTrigger.processEvent

Created: 10/Apr/08 04:18 PM   Updated: 16/Apr/08 03:28 PM
Component/s: Server
Affects Version/s: 3.2.2
Fix Version/s: 4.0.0, 3.2.3

File Attachments: 1. Text File HHQ_1958_patch.txt (11 kb)

Environment: 3.2.2 #650 with external mysql backend

Verify By: Kashyap Parikh
Last comment: 25 weeks, 2 days ago
Resolution Date: 15/Apr/08 03:38 PM


 Description  « Hide
On performance environment where I have lots of value change alert definitions (active) I am seeing lots of threads consistently in runnable state at following location.
Tom suggested that we should atleast log this measurement IDs in the debug log message. Need to investigate why are so many threads stuck at this location

"JMS SessionPool Worker-14" daemon prio=10 tid=0x00002aabc3b4ac00 nid=0x6937 runnable [0x0000000057685000..0x0000000057686d00]
   java.lang.Thread.State: RUNNABLE
        at org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent(ValueChangeTrigger.java:153)
        at org.hyperic.hq.bizapp.server.mdb.RegisteredDispatcherEJBImpl.dispatchEvent(RegisteredDispatcherEJBImpl.java:103)
        at org.hyperic.hq.bizapp.server.mdb.RegisteredDispatcherEJBImpl.onMessage(RegisteredDispatcherEJBImpl.java:175)
        at sun.reflect.GeneratedMethodAccessor532.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.jboss.invocation.Invocation.performCall(Invocation.java:345)
        at org.jboss.ejb.MessageDrivenContainer$ContainerInterceptor.invoke(MessageDrivenContainer.java:475)
        at org.jboss.resource.connectionmanager.CachedConnectionInterceptor.invoke(CachedConnectionInterceptor.java:149)
        at org.jboss.ejb.plugins.MessageDrivenInstanceInterceptor.invoke(MessageDrivenInstanceInterceptor.java:101)
        at org.jboss.ejb.plugins.CallValidationInterceptor.invoke(CallValidationInterceptor.java:48)
        at org.hyperic.hq.application.HQApp$Snatcher.invokeNextBoth(HQApp.java:416)
        at org.hyperic.hq.application.HQApp$Snatcher.invokeNext(HQApp.java:468)
        at org.hyperic.txsnatch.TxSnatch.invoke(TxSnatch.java:71)
        at org.jboss.ejb.plugins.AbstractTxInterceptor.invokeNext(AbstractTxInterceptor.java:106)
        at org.jboss.ejb.plugins.TxInterceptorCMT.runWithTransactions(TxInterceptorCMT.java:335)
        at org.jboss.ejb.plugins.TxInterceptorCMT.invoke(TxInterceptorCMT.java:166)
        at org.hyperic.hq.application.HQApp$Snatcher.invokeNextBoth(HQApp.java:416)
        at org.hyperic.hq.application.HQApp$Snatcher.invokeNext(HQApp.java:468)
        at org.hyperic.txsnatch.TxSnatch.invoke(TxSnatch.java:71)
        at org.jboss.ejb.plugins.RunAsSecurityInterceptor.invoke(RunAsSecurityInterceptor.java:94)
        at org.jboss.ejb.plugins.LogInterceptor.invoke(LogInterceptor.java:192)
        at org.jboss.ejb.plugins.ProxyFactoryFinderInterceptor.invoke(ProxyFactoryFinderInterceptor.java:122)
        at org.jboss.ejb.MessageDrivenContainer.internalInvoke(MessageDrivenContainer.java:389)
        at org.jboss.ejb.Container.invoke(Container.java:873)
        at org.jboss.ejb.plugins.jms.JMSContainerInvoker.invoke(JMSContainerInvoker.java:1077)
        at org.jboss.ejb.plugins.jms.JMSContainerInvoker$MessageListenerImpl.onMessage(JMSContainerInvoker.java:1379)
        at org.jboss.jms.asf.StdServerSession.onMessage(StdServerSession.java:256)
        at org.jboss.mq.SpyMessageConsumer.sessionConsumerProcessMessage(SpyMessageConsumer.java:904)
        at org.jboss.mq.SpyMessageConsumer.addMessage(SpyMessageConsumer.java:160)
        at org.jboss.mq.SpySession.run(SpySession.java:333)
        - locked <0x00002aab0cf664c0> (a java.util.LinkedList)
        at org.jboss.jms.asf.StdServerSession.run(StdServerSession.java:180)
        at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(PooledExecutor.java:748)
        at java.lang.Thread.run(Thread.java:619)


 All   Comments   Change History      Sort Order: Ascending order - Click to sort in descending order
Kashyap Parikh - 11/Apr/08 08:49 AM
Below are the results from hprof anaylsis which was run for few hours, data points were dumped at random intervals (ranging from minutes to hours) by sending -QUIT to server process. HPROF CPU sampling was done using this java option -agentlib:hprof=cpu=samples,interval=20,depth=3

Here is one of the CPU sample result. See the thread at rank 2. Its consistently there in all dumps.

CPU SAMPLES BEGIN (total = 43438387) Thu Apr 10 22:27:01 2008
rank self accum count trace method
   1 77.80% 77.80% 33796621 300850 java.net.SocketInputStream.socketRead0
   2 11.07% 88.87% 4808744 301864 org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent
   3 7.05% 95.92% 3061288 300494 java.net.PlainSocketImpl.socketAccept
   4 1.33% 97.25% 577229 301179 java.lang.Throwable.fillInStackTrace
   5 0.52% 97.77% 224533 301036 java.net.SocketInputStream.socketRead0
   6 0.33% 98.09% 141721 301776 java.lang.Object.clone
   7 0.29% 98.39% 127687 301314 java.lang.Class.getInterfaces
   8 0.21% 98.59% 89698 301869 java.lang.Object.getClass
   9 0.20% 98.80% 87938 301886 java.lang.Object.getClass
  10 0.20% 98.99% 85984 302322 org.hyperic.hq.bizapp.server.mdb.RegisteredDispatcherEJBImpl.dispatchEvent
  11 0.16% 99.15% 67974 300993 java.net.SocketInputStream.socketRead0
  12 0.12% 99.27% 51508 301887 java.lang.Class.isInterface
  13 0.04% 99.31% 18576 301944 java.lang.Thread.setPriority0
  14 0.04% 99.35% 17226 302857 java.lang.reflect.Method.invoke
  15 0.04% 99.39% 16927 301315 java.lang.Class.getInterfaces
  16 0.04% 99.43% 15654 301749 java.lang.Object.hashCode
  17 0.03% 99.46% 14755 301285 java.lang.System.identityHashCode
  18 0.03% 99.49% 14319 301799 java.lang.Thread.start0
  19 0.03% 99.52% 12446 302642 org.hibernate.impl.SessionImpl.checkTransactionSynchStatus
  20 0.03% 99.55% 12192 302038 java.lang.Class.isPrimitive
  21 0.02% 99.57% 7043 301863 java.security.AccessController.doPrivileged
  22 0.01% 99.58% 6054 301904 java.lang.Object.hashCode
  23 0.01% 99.59% 5688 302647 org.hibernate.type.AbstractType.getHashCode
  24 0.01% 99.61% 5676 301802 java.lang.Object.clone
  25 0.01% 99.62% 4904 301857 java.security.AccessController.doPrivileged
  26 0.01% 99.63% 4434 301297 java.util.HashMap.transfer

See all of them below

[workerb@pserv01 bin]$ grep ValueChangeTrigger.processEvent java.hprof.txt
        org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent(ValueChangeTrigger.java:153)
   2 10.86% 88.16% 505515 301864 org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent
   2 12.22% 87.88% 44123 301864 org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent
   2 11.91% 87.79% 310913 301864 org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent
   2 11.84% 86.27% 18337 301864 org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent
   2 11.42% 87.59% 364384 301864 org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent
   2 11.07% 88.87% 4808744 301864 org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent
   2 11.14% 86.04% 153132 301864 org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent
[workerb@pserv01 bin]$


Kashyap Parikh - 11/Apr/08 02:01 PM
Tom and I researched this issue and noticed following:

ValueChangeTrigger is getting events with measurement id 0 (see following debug message).

2008-04-11 15:15:27,107 DEBUG [JMS SessionPool Worker-5] [org.hyperic.hq.bizapp.server.mdb.RegisteredDispatcherEJBImpl] dispatchEvent dispatched to trigger (class org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger that's not configured to handle this type of event: class org.hyperic.hq.measurement.ext.MeasurementEventMESSAGE: Invalid instance ID passed (409455) expected 0

After putting debug message to print measurement id in ValueChangeTrigger.init we see that all values are non-zero, so somewhere the config response is getting corrupted, setting the value to 0.

Ryan Morgan - 11/Apr/08 02:17 PM
Assign to Tom, mark for investigation in 3.2.3.

Tom Keeney - 15/Apr/08 03:24 PM
Patch file addressing the issue in question.

Tom Keeney - 15/Apr/08 03:38 PM
When a resource type alert definition is created and there is no measurement associated with a measurement template, the measurement id for the value change trigger is set to 0 (See EventsBoss.cloneParentConditions()). When this happens, the value change trigger registers interest in ALL events, but on processing throws an exception. This continual exception throwing causes the performance bottleneck.

The fix is to not create the alert definition for the resource if there is no measurement associated with the measurement template. When this happens, we are logging an error message in the server log stating that the alert definition was not created for that resource.

This problem can occur with customers upgrading from old versions of HQ where the measurements assigned to a resource are not consistent with the current templates. This problem has been resolved on upgrading to 4.0, but any resource type alert definitions exhibiting the performance bottleneck will need to be recreated from scratch.

QA should test this fix by removing a measurement from an individual resource. This would involve deleting the measurement from the eam_measurement table given the entity instance_id and measurement template id. Then create a resource type alert definition that would include the removed measurement. A log message should appear in the server log stating that the alert definition was not created. Also, the alert definition should not appear under the resource. Alert definitions for all other resources should be created. The behavior when creating a resource alert definition on that resource should also be checked. It shouldn't be possible to create an alert definition on the deleted measurement(s).

Kashyap Parikh - 16/Apr/08 03:28 PM
Fix looks good for both functionality and performance.

CPU samples after creating 30,000 value change alerts looks good

CPU SAMPLES BEGIN (total = 113608341) Wed Apr 16 17:13:31 2008
rank self accum count trace method
   1 54.93% 54.93% 62407323 300914 java.net.SocketInputStream.socketRead0
   2 40.26% 95.19% 45733039 300856 java.net.SocketInputStream.socketRead0
   3 3.79% 98.98% 4309713 300501 java.net.PlainSocketImpl.socketAccept
   4 0.29% 99.27% 327133 301303 java.net.SocketInputStream.socketRead0
   5 0.22% 99.49% 255309 301413 java.lang.Class.getInterfaces
   6 0.21% 99.71% 242052 301362 java.net.SocketInputStream.socketRead0
   7 0.04% 99.75% 44513 301944 java.lang.Object.clone
   8 0.02% 99.77% 25964 301366 java.lang.System.identityHashCode
   9 0.01% 99.78% 15170 301976 java.lang.Object.hashCode
  10 0.01% 99.79% 12226 302006 java.security.AccessController.doPrivileged

I did see few hundred alert not created log messages because of missing measurement. Verified that alert was not created for all of these resources and it got created after reimporting the resource again.

2008-04-16 16:37:32,537 ERROR [http-0.0.0.0-7080-310] [org.hyperic.hq.bizapp.server.session.EventsBossEJBImpl] No measurement found for entity 3:25913 associated with template id=11937. Alert definition name [File Server File Size Changed]