|
|
|
Tom and I researched this issue and noticed following:
ValueChangeTrigger is getting events with measurement id 0 (see following debug message). 2008-04-11 15:15:27,107 DEBUG [JMS SessionPool Worker-5] [org.hyperic.hq.bizapp.server.mdb.RegisteredDispatcherEJBImpl] dispatchEvent dispatched to trigger (class org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger that's not configured to handle this type of event: class org.hyperic.hq.measurement.ext.MeasurementEventMESSAGE: Invalid instance ID passed (409455) expected 0 After putting debug message to print measurement id in ValueChangeTrigger.init we see that all values are non-zero, so somewhere the config response is getting corrupted, setting the value to 0. Assign to Tom, mark for investigation in 3.2.3.
When a resource type alert definition is created and there is no measurement associated with a measurement template, the measurement id for the value change trigger is set to 0 (See EventsBoss.cloneParentConditions()). When this happens, the value change trigger registers interest in ALL events, but on processing throws an exception. This continual exception throwing causes the performance bottleneck.
The fix is to not create the alert definition for the resource if there is no measurement associated with the measurement template. When this happens, we are logging an error message in the server log stating that the alert definition was not created for that resource. This problem can occur with customers upgrading from old versions of HQ where the measurements assigned to a resource are not consistent with the current templates. This problem has been resolved on upgrading to 4.0, but any resource type alert definitions exhibiting the performance bottleneck will need to be recreated from scratch. QA should test this fix by removing a measurement from an individual resource. This would involve deleting the measurement from the eam_measurement table given the entity instance_id and measurement template id. Then create a resource type alert definition that would include the removed measurement. A log message should appear in the server log stating that the alert definition was not created. Also, the alert definition should not appear under the resource. Alert definitions for all other resources should be created. The behavior when creating a resource alert definition on that resource should also be checked. It shouldn't be possible to create an alert definition on the deleted measurement(s). Fix looks good for both functionality and performance.
CPU samples after creating 30,000 value change alerts looks good CPU SAMPLES BEGIN (total = 113608341) Wed Apr 16 17:13:31 2008 rank self accum count trace method 1 54.93% 54.93% 62407323 300914 java.net.SocketInputStream.socketRead0 2 40.26% 95.19% 45733039 300856 java.net.SocketInputStream.socketRead0 3 3.79% 98.98% 4309713 300501 java.net.PlainSocketImpl.socketAccept 4 0.29% 99.27% 327133 301303 java.net.SocketInputStream.socketRead0 5 0.22% 99.49% 255309 301413 java.lang.Class.getInterfaces 6 0.21% 99.71% 242052 301362 java.net.SocketInputStream.socketRead0 7 0.04% 99.75% 44513 301944 java.lang.Object.clone 8 0.02% 99.77% 25964 301366 java.lang.System.identityHashCode 9 0.01% 99.78% 15170 301976 java.lang.Object.hashCode 10 0.01% 99.79% 12226 302006 java.security.AccessController.doPrivileged I did see few hundred alert not created log messages because of missing measurement. Verified that alert was not created for all of these resources and it got created after reimporting the resource again. 2008-04-16 16:37:32,537 ERROR [http-0.0.0.0-7080-310] [org.hyperic.hq.bizapp.server.session.EventsBossEJBImpl] No measurement found for entity 3:25913 associated with template id=11937. Alert definition name [File Server File Size Changed] |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Here is one of the CPU sample result. See the thread at rank 2. Its consistently there in all dumps.
CPU SAMPLES BEGIN (total = 43438387) Thu Apr 10 22:27:01 2008
rank self accum count trace method
1 77.80% 77.80% 33796621 300850 java.net.SocketInputStream.socketRead0
2 11.07% 88.87% 4808744 301864 org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent
3 7.05% 95.92% 3061288 300494 java.net.PlainSocketImpl.socketAccept
4 1.33% 97.25% 577229 301179 java.lang.Throwable.fillInStackTrace
5 0.52% 97.77% 224533 301036 java.net.SocketInputStream.socketRead0
6 0.33% 98.09% 141721 301776 java.lang.Object.clone
7 0.29% 98.39% 127687 301314 java.lang.Class.getInterfaces
8 0.21% 98.59% 89698 301869 java.lang.Object.getClass
9 0.20% 98.80% 87938 301886 java.lang.Object.getClass
10 0.20% 98.99% 85984 302322 org.hyperic.hq.bizapp.server.mdb.RegisteredDispatcherEJBImpl.dispatchEvent
11 0.16% 99.15% 67974 300993 java.net.SocketInputStream.socketRead0
12 0.12% 99.27% 51508 301887 java.lang.Class.isInterface
13 0.04% 99.31% 18576 301944 java.lang.Thread.setPriority0
14 0.04% 99.35% 17226 302857 java.lang.reflect.Method.invoke
15 0.04% 99.39% 16927 301315 java.lang.Class.getInterfaces
16 0.04% 99.43% 15654 301749 java.lang.Object.hashCode
17 0.03% 99.46% 14755 301285 java.lang.System.identityHashCode
18 0.03% 99.49% 14319 301799 java.lang.Thread.start0
19 0.03% 99.52% 12446 302642 org.hibernate.impl.SessionImpl.checkTransactionSynchStatus
20 0.03% 99.55% 12192 302038 java.lang.Class.isPrimitive
21 0.02% 99.57% 7043 301863 java.security.AccessController.doPrivileged
22 0.01% 99.58% 6054 301904 java.lang.Object.hashCode
23 0.01% 99.59% 5688 302647 org.hibernate.type.AbstractType.getHashCode
24 0.01% 99.61% 5676 301802 java.lang.Object.clone
25 0.01% 99.62% 4904 301857 java.security.AccessController.doPrivileged
26 0.01% 99.63% 4434 301297 java.util.HashMap.transfer
See all of them below
[workerb@pserv01 bin]$ grep ValueChangeTrigger.processEvent java.hprof.txt
org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent(ValueChangeTrigger.java:153)
2 10.86% 88.16% 505515 301864 org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent
2 12.22% 87.88% 44123 301864 org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent
2 11.91% 87.79% 310913 301864 org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent
2 11.84% 86.27% 18337 301864 org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent
2 11.42% 87.59% 364384 301864 org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent
2 11.07% 88.87% 4808744 301864 org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent
2 11.14% 86.04% 153132 301864 org.hyperic.hq.bizapp.server.trigger.conditional.ValueChangeTrigger.processEvent
[workerb@pserv01 bin]$