Hyperic HQ

Approving 25 agents in AIQ leaves some agents with no schedule and unavailable in HQ

Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Cannot Reproduce
  • Affects Version/s: 4.2.0
  • Fix Version/s: None
  • Component/s: Deprecated: Server
  • Environment:
    4.2.0 #1191 agent and server

Description

To reproduce:

Bring up 25 platforms.
Approve them as follows:
hqapi.sh autodiscovery approve
This approves all agents and puts them in the queue.
After a number of servers/services have been imported, the following error is logged in server.log. The same error is repeated 3 more times. See the attached log for the full stack.
After the background approval process appears to be complete, 4 platforms are not reporting any measurements and appear down in HQ.

2009-08-25 00:52:15,255 ERROR [MeasurementEnabler115] [org.hibernate.util.JDBCExceptionReporter@78] Lock wait timeout exceeded; try restarting transaction
2009-08-25 00:52:15,256 ERROR [MeasurementEnabler115] [org.hibernate.event.def.AbstractFlushingEventListener@301] Could not synchronize database state with session
org.hibernate.exception.GenericJDBCException: Could not execute JDBC batch update
at org.hibernate.exception.SQLStateConverter.handledNonSpecificException(SQLStateConverter.java:103)
at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:91)
at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:253)
at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:92)
at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:87)
at org.hibernate.jdbc.AbstractBatcher.prepareBatchStatement(AbstractBatcher.java:222)
at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2229)
at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2665)
at org.hibernate.action.EntityInsertAction.execute(EntityInsertAction.java:60)
at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:279)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:263)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:167)
at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
at org.hyperic.hq.measurement.server.session.MeasurementManagerEJBImpl.createMeasurements(MeasurementManagerEJBImpl.java:244)
at sun.reflect.GeneratedMethodAccessor5832.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.jboss.invocation.Invocation.performCall(Invocation.java:359)
at org.jboss.ejb.StatelessSessionContainer$ContainerInterceptor.invoke(StatelessSessionContainer.java:237)
at org.jboss.resource.connectionmanager.CachedConnectionInterceptor.invoke(CachedConnectionInterceptor.java:158)
at org.jboss.ejb.plugins.StatelessSessionInstanceInterceptor.invoke(StatelessSessionInstanceInterceptor.java:169)
at org.jboss.ejb.plugins.CallValidationInterceptor.invoke(CallValidationInterceptor.java:63)
at org.hyperic.hq.application.HQApp$Snatcher.invokeNextBoth(HQApp.java:526)
at org.hyperic.hq.application.HQApp$Snatcher.invokeNext(HQApp.java:598)
at org.hyperic.txsnatch.TxSnatch.invoke(TxSnatch.java:71)
at org.jboss.ejb.plugins.AbstractTxInterceptor.invokeNext(AbstractTxInterceptor.java:121)
at org.jboss.ejb.plugins.TxInterceptorCMT.runWithTransactions(TxInterceptorCMT.java:404)
at org.jboss.ejb.plugins.TxInterceptorCMT.invoke(TxInterceptorCMT.java:181)
at org.hyperic.hq.application.HQApp$Snatcher.invokeNextBoth(HQApp.java:526)
at org.hyperic.hq.application.HQApp$Snatcher.invokeNext(HQApp.java:598)
at org.hyperic.txsnatch.TxSnatch.invoke(TxSnatch.java:71)
at org.jboss.ejb.plugins.SecurityInterceptor.invoke(SecurityInterceptor.java:168)
at org.jboss.ejb.plugins.LogInterceptor.invoke(LogInterceptor.java:205)
at org.jboss.ejb.plugins.ProxyFactoryFinderInterceptor.invoke(ProxyFactoryFinderInterceptor.java:138)
at org.jboss.ejb.SessionContainer.internalInvoke(SessionContainer.java:648)
at org.jboss.ejb.Container.invoke(Container.java:960)
at org.jboss.ejb.plugins.local.BaseLocalProxyFactory.invoke(BaseLocalProxyFactory.java:430)

  1. server.log.snapshot
    09/Sep/09 10:05 AM
    438 kB
    Todd Rader
  2. server.log.temp.gz
    25/Aug/09 1:13 AM
    88 kB
    Kashyap Parikh

Activity

Kashyap Parikh added a comment -

Resubmitting configuration.properties from the HQ Platform's inventory page doesn't fix the unavailable platforms. They remain unavailable 15 minutes after the config properties were resubmitted.

Todd Rader added a comment -

I'd bet the trouble comes from MeasurementManagerEJBImpl.createMeasurements(AppdefEntityID, Integer[], long[], ConfigResponse) being marked as "RequiresNew" AND flushing the session explicitly. We already know that "RequiresNew" transactions do NOT get a new Session from our SessionManager.
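
The hypothesis above is not confirmed anywhere in this thread. As general background, a "RequiresNew" call suspends the caller's transaction until the inner one finishes, so if the inner transaction needs a row lock that the suspended caller still holds, it can only wait until the lock timeout expires. The following is a toy Java model of that general situation, not HQ code: the class name, the method names and the use of a Semaphore as a stand-in for a database row lock are all assumptions made for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class RequiresNewLockSketch {
    /** Stand-in for a database row lock. A Semaphore is used because it is not reentrant. */
    static final Semaphore rowLock = new Semaphore(1);

    /** Models an inner "RequiresNew" transaction flushing a write that needs the row lock. */
    static boolean innerFlush(long timeoutMillis) {
        try {
            boolean acquired = rowLock.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                rowLock.release(); // inner transaction commits and releases the lock
            }
            return acquired; // false models "Lock wait timeout exceeded"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Models an outer transaction that is suspended while the inner one runs. */
    static boolean outerThenInner(boolean outerHoldsLock, long timeoutMillis) {
        if (outerHoldsLock) {
            rowLock.acquireUninterruptibly(); // outer wrote the row but has not committed
        }
        try {
            return innerFlush(timeoutMillis); // outer cannot commit until this returns
        } finally {
            if (outerHoldsLock) {
                rowLock.release(); // outer commits only after the inner call is over
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(outerThenInner(true, 50));  // inner times out
        System.out.println(outerThenInner(false, 50)); // inner succeeds
    }
}
```

In this model the inner call can never succeed while the outer transaction holds the lock, because the outer transaction releases it only after the inner call returns. Whether this is what actually happened in createMeasurements() is not established by the logs attached here.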

Kashyap Parikh added a comment -

It's strange: I am now able to reproduce this issue when approving a second agent in the HQ server. Approval of the first agent with an identical config went fine. We can reproduce this bug in 3 different environments (upgrade and fresh install), the approved agent does not collect metrics/availability for almost all servers/services, and this happens while approving only 1 agent from the AD portlet, so I am changing Fix Version to 4.2.0.

Todd Rader added a comment -

server.log.snapshot is the log from our QA environment gsx1. There is a trace that matches what is seen in the original log, plus another where the transaction gets into a bad state from the processing of what looks like a recovery alert. At this time it's unclear what the relationship is between the two traces.

Todd Rader added a comment -

Resolved the use case of importing one agent, configuring a resource type alert, and then importing another. This was failing reliably. The fix applied to MeasurementManagerEJBImpl.createMeasurements() removes the "RequiresNew" transaction demarcation and the explicit session flush, and stops going through the EJB container on intra-EJB calls (i.e. gets rid of the getOne() pattern). It looks applicable to the original use case. Dev tested, also tested by Kashyap.
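
To illustrate the last part of that fix, here is a minimal Java sketch of the difference between calling a sibling method through a container-style proxy and calling it directly. It is a toy model, not the HQ source: a java.lang.reflect.Proxy stands in for the EJB container, and the MeasurementOps interface, the two bean classes and the `self` field (playing the role of the reference a getOne()-style lookup would return) are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class IntraBeanCallSketch {
    /** Hypothetical business interface; not a Hyperic type. */
    public interface MeasurementOps {
        void createAll(int count);
        void createOne(int id);
    }

    /** Names of the calls that passed through the simulated container. */
    static final List<String> intercepted = new ArrayList<>();

    /** Bean that calls its sibling method through the proxy, re-entering the container. */
    static class ViaContainer implements MeasurementOps {
        MeasurementOps self; // proxy reference, set by the caller
        public void createAll(int count) {
            for (int i = 0; i < count; i++) self.createOne(i);
        }
        public void createOne(int id) { }
    }

    /** Bean that calls its sibling method directly, staying in the caller's context. */
    static class Direct implements MeasurementOps {
        public void createAll(int count) {
            for (int i = 0; i < count; i++) createOne(i);
        }
        public void createOne(int id) { }
    }

    /** Wraps a bean in a proxy that records each call, as container interceptors would. */
    static MeasurementOps wrap(MeasurementOps target) {
        InvocationHandler handler = (proxy, method, args) -> {
            intercepted.add(method.getName());
            return method.invoke(target, args);
        };
        return (MeasurementOps) Proxy.newProxyInstance(
                MeasurementOps.class.getClassLoader(),
                new Class<?>[] { MeasurementOps.class }, handler);
    }

    /** Runs createAll(3) and returns how many calls crossed the simulated container. */
    static int interceptedCalls(boolean viaContainer) {
        intercepted.clear();
        MeasurementOps proxy;
        if (viaContainer) {
            ViaContainer bean = new ViaContainer();
            proxy = wrap(bean);
            bean.self = proxy;
        } else {
            proxy = wrap(new Direct());
        }
        proxy.createAll(3);
        return intercepted.size();
    }

    public static void main(String[] args) {
        System.out.println(interceptedCalls(true));  // prints 4
        System.out.println(interceptedCalls(false)); // prints 1
    }
}
```

Each call that crosses the proxy is a point where a real container would apply the method's transaction attribute, so the direct variant keeps all the work inside the transaction of the single outer call.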

Kashyap Parikh added a comment -

Reopening this bug for the original issue in the description. We have not yet addressed approval of 25 or more agents in parallel. What we fixed was the regression issue described in this comment: http://jira.hyperic.com/browse/HHQ-3364?focusedCommentId=83579#action_83579

David Wiener added a comment -

Closed due to being outdated


Dates

  • Created:
    Updated:
    Resolved:
    Last comment:
    1 year, 36 weeks, 4 days ago