Support Migration Notice: To update migrated JIRA cases click here to open a new case use www.vmware.com/go/sr | vFabric Hyperic 5.7.0 is Now Available

Hyperic HQ

purging alerts fails at scale

Details

  • Type: Bug Bug
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Deferred
  • Affects Version/s: 4.5
  • Fix Version/s: None
  • Component/s: Alerts
  • Case Links:
    none
  • Regression:
    Yes
  • Story Points:
    3
  • Tags:

Description

2010-08-20 17:10:27,820 INFO [Scheduler-7] [org.hyperic.hq.measurement.server.session.DataCompressImpl@297] Purging alerts older than 8/19/10 5:00:00 PM
2010-08-20 17:26:17,150 ERROR [Scheduler-7] [org.hyperic.hq.measurement.DataPurgeJob@101] transaction timeout expired
org.hibernate.TransactionException: transaction timeout expired
at org.hibernate.jdbc.AbstractBatcher.setTimeout(AbstractBatcher.java:523)
at org.hibernate.jdbc.AbstractBatcher.getPreparedStatement(AbstractBatcher.java:509)
at org.hibernate.jdbc.AbstractBatcher.getPreparedStatement(AbstractBatcher.java:423)
at org.hibernate.jdbc.AbstractBatcher.prepareQueryStatement(AbstractBatcher.java:139)
at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1547)
at org.hibernate.loader.Loader.doQuery(Loader.java:673)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
at org.hibernate.loader.Loader.loadCollectionBatch(Loader.java:2032)
at org.hibernate.loader.collection.BatchingCollectionInitializer.initialize(BatchingCollectionInitializer.java:47)
at org.hibernate.persister.collection.AbstractCollectionPersister.initialize(AbstractCollectionPersister.java:565)
at org.hibernate.event.def.DefaultInitializeCollectionEventListener.onInitializeCollection(DefaultInitializeCollectionEventListener.java:63)
at org.hibernate.impl.SessionImpl.initializeCollection(SessionImpl.java:1716)
at org.hibernate.collection.AbstractPersistentCollection.initialize(AbstractPersistentCollection.java:344)
at org.hibernate.collection.AbstractPersistentCollection.read(AbstractPersistentCollection.java:86)
at org.hibernate.collection.PersistentBag.iterator(PersistentBag.java:249)
at org.hibernate.type.CollectionType.getElementsIterator(CollectionType.java:213)
at org.hibernate.type.CollectionType.getElementsIterator(CollectionType.java:202)
at org.hibernate.engine.CascadingAction.getAllElementsIterator(CascadingAction.java:415)
at org.hibernate.engine.CascadingAction.access$100(CascadingAction.java:27)
at org.hibernate.engine.CascadingAction$1.getCascadableChildrenIterator(CascadingAction.java:125)
at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:294)
at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
at org.hibernate.event.def.DefaultDeleteEventListener.cascadeBeforeDelete(DefaultDeleteEventListener.java:307)
at org.hibernate.event.def.DefaultDeleteEventListener.deleteEntity(DefaultDeleteEventListener.java:246)
at org.hibernate.event.def.DefaultDeleteEventListener.onDelete(DefaultDeleteEventListener.java:141)
at org.hibernate.event.def.DefaultDeleteEventListener.onDelete(DefaultDeleteEventListener.java:52)
at org.hibernate.impl.SessionImpl.fireDelete(SessionImpl.java:766)
at org.hibernate.impl.SessionImpl.delete(SessionImpl.java:744)
at org.hyperic.hq.dao.HibernateDAO.remove(HibernateDAO.java:146)
at org.hyperic.hq.events.server.session.AlertDAO.deleteByCreateTime(AlertDAO.java:93)
at org.hyperic.hq.events.server.session.AlertManagerImpl.deleteAlerts(AlertManagerImpl.java:222)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:80)
at org.hyperic.hq.monitor.aop.aspects.PerformanceMonitor.monitorServiceMethod(PerformanceMonitor.java:74)
at sun.reflect.GeneratedMethodAccessor287.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:622)
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:611)
at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:65)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:108)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:89)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
at $Proxy73.deleteAlerts(Unknown Source)
at org.hyperic.hq.measurement.server.session.DataCompressImpl.compressData(DataCompressImpl.java:298)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:80)
at org.hyperic.hq.monitor.aop.aspects.PerformanceMonitor.monitorServiceMethod(PerformanceMonitor.java:74)
at sun.reflect.GeneratedMethodAccessor287.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:622)
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:611)
at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:65)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:108)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:89)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
at $Proxy81.compressData(Unknown Source)
at org.hyperic.hq.measurement.DataPurgeJob.compressData(DataPurgeJob.java:130)
at org.hyperic.hq.measurement.DataPurgeJob.run(DataPurgeJob.java:86)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Activity

Hide
Dharma Srinivasan added a comment -

Eventually purging did complete. Here is the sequence noticed from logs:

Purging was initiated/failed when new server started on 2010-08-20 ~5 pm. Every hour it attempted and did not delete any alert
Server log rolled over at midnight and the new log (next day) 2010-08-21 2:24 am, alerts started getting deleted successfully.

2010-08-21 02:24:22,345 INFO [Scheduler-3] [org.hyperic.hq.measurement.server.session.DataCompressImpl@299] Done (Deleted 603227 alerts)

After this, the next time alerts successfully got delted was after a few hours (the ones between came back with 0 deleted)

2010-08-21 11:11:03,331 INFO [Scheduler-3] [org.hyperic.hq.measurement.server.session.DataCompressImpl@299] Done (Deleted 17075 alerts)

After this each delete attempt deleted varying number. Finally all 600000+ alerts are out of the system on 2010-08-22

Noticed also (not sure if it's relevant), many alert related caches were full during this entire time with a large number of misses :
org.hyperic.hq.events.server.session.Alert 100000 5433601 10592399
org.hyperic.hq.events.server.session.Alert.actionLogBag 1428 441642 182885744
org.hyperic.hq.events.ser...sion.Alert.conditionLogBag 100000 2824970 182860557
org.hyperic.hq.events.server.session.AlertCondition 100000 128753 103744
org.hyperic.hq.events.ser...lertCondition.logEntriesBag 0 0 0
org.hyperic.hq.events.server.session.AlertConditionLog 500000 1072042 9638340
org.hyperic.hq.events.server.session.AlertDefinition 80788 121233 80796
org.hyperic.hq.events.ser....AlertDefinition.actionsBag 13133 52545 26266
org.hyperic.hq.events.ser...rtDefinition.conditionsBag 80764 103331 80764
org.hyperic.hq.events.ser...lertDefinition.triggersBag 2 1 4
org.hyperic.hq.events.ser...ession.AlertDefinitionState 80788 288450 82684

Show
Dharma Srinivasan added a comment - Eventually purging did complete. Here is the sequence noticed from logs: Purging was initiated/failed when new server started on 2010-08-20 ~5 pm. Every hour it attempted and did not delete any alert Server log rolled over at midnight and the new log (next day) 2010-08-21 2:24 am, alerts started getting deleted successfully. 2010-08-21 02:24:22,345 INFO [Scheduler-3] [org.hyperic.hq.measurement.server.session.DataCompressImpl@299] Done (Deleted 603227 alerts) After this, the next time alerts successfully got delted was after a few hours (the ones between came back with 0 deleted) 2010-08-21 11:11:03,331 INFO [Scheduler-3] [org.hyperic.hq.measurement.server.session.DataCompressImpl@299] Done (Deleted 17075 alerts) After this each delete attempt deleted varying number. Finally all 600000+ alerts are out of the system on 2010-08-22 Noticed also (not sure if it's relevant), many alert related caches were full during this entire time with a large number of misses : org.hyperic.hq.events.server.session.Alert 100000 5433601 10592399 org.hyperic.hq.events.server.session.Alert.actionLogBag 1428 441642 182885744 org.hyperic.hq.events.ser...sion.Alert.conditionLogBag 100000 2824970 182860557 org.hyperic.hq.events.server.session.AlertCondition 100000 128753 103744 org.hyperic.hq.events.ser...lertCondition.logEntriesBag 0 0 0 org.hyperic.hq.events.server.session.AlertConditionLog 500000 1072042 9638340 org.hyperic.hq.events.server.session.AlertDefinition 80788 121233 80796 org.hyperic.hq.events.ser....AlertDefinition.actionsBag 13133 52545 26266 org.hyperic.hq.events.ser...rtDefinition.conditionsBag 80764 103331 80764 org.hyperic.hq.events.ser...lertDefinition.triggersBag 2 1 4 org.hyperic.hq.events.ser...ession.AlertDefinitionState 80788 288450 82684
Hide
Jennifer Hickey added a comment -

Dharma or Scott, was the alert purge interval changed? Since default is 31 days, not sure why you'd be deleting alerts older than 8/19 on 8/20. Just want to make sure there aren't multiple issues at work here...

Show
Jennifer Hickey added a comment - Dharma or Scott, was the alert purge interval changed? Since default is 31 days, not sure why you'd be deleting alerts older than 8/19 on 8/20. Just want to make sure there aren't multiple issues at work here...
Hide
Jennifer Hickey added a comment -

Also, are we 100% sure this is a regression? At first I thought this issue was due to changing tx semantics in DataCompress causing the deletion of alerts to share same tx as rest of compression code. But in looking at the log, you pretty much used up the whole 15 mins just on deleting alerts, and nothing else in the overall compress tx code is alert-related. Plus, I'm not seeing any changes to the actual deleteAlerts method from 4.4...

Show
Jennifer Hickey added a comment - Also, are we 100% sure this is a regression? At first I thought this issue was due to changing tx semantics in DataCompress causing the deletion of alerts to share same tx as rest of compression code. But in looking at the log, you pretty much used up the whole 15 mins just on deleting alerts, and nothing else in the overall compress tx code is alert-related. Plus, I'm not seeing any changes to the actual deleteAlerts method from 4.4...
Hide
Dharma Srinivasan added a comment -

Jennifer,

Yes, I had changed purge interval from 31 days to 1 day to clean out the env off the 600000+ alerts.

Scott had filed this and marked as regression and I would defer to him on that because he seemed to think it worked earlier and I had not executed this scale purge earlier myself.

Show
Dharma Srinivasan added a comment - Jennifer, Yes, I had changed purge interval from 31 days to 1 day to clean out the env off the 600000+ alerts. Scott had filed this and marked as regression and I would defer to him on that because he seemed to think it worked earlier and I had not executed this scale purge earlier myself.
Hide
Dharma Srinivasan added a comment -

Downgrading this bug to Major since the purge was subsequently successful although it took roughly 2 days.

Also, purging alerts in bulk has a workaround of using bulk alert delete from API, where alerts can be bulk deleted in batches of about 800 each

Show
Dharma Srinivasan added a comment - Downgrading this bug to Major since the purge was subsequently successful although it took roughly 2 days. Also, purging alerts in bulk has a workaround of using bulk alert delete from API, where alerts can be bulk deleted in batches of about 800 each
Hide
Jennifer Hickey added a comment -

Per standup today, moving out of the sprint.

Show
Jennifer Hickey added a comment - Per standup today, moving out of the sprint.
Hide
Idan Hod added a comment -

As part of our continuous effort to improve product quality, The Hyperic product team has decided to implement a "zero bug policy" methodology.

Following this methodology, only defects that are planned to be handled in the near future will remain open. Any other defect will be deferred, with the option to be reevaluated if the need arises, or if changes to the Hyperic roadmap make such defect a candidate for a fix.

We believe this new process will help create clarity and focus in the Hyperic roadmap and ultimately benefit our customer base.

This bug has been deferred as part of the new policy.

We appreciate your cooperation and continues contribution to the improvement of Hyperic

Show
Idan Hod added a comment - As part of our continuous effort to improve product quality, The Hyperic product team has decided to implement a "zero bug policy" methodology. Following this methodology, only defects that are planned to be handled in the near future will remain open. Any other defect will be deferred, with the option to be reevaluated if the need arises, or if changes to the Hyperic roadmap make such defect a candidate for a fix. We believe this new process will help create clarity and focus in the Hyperic roadmap and ultimately benefit our customer base. This bug has been deferred as part of the new policy. We appreciate your cooperation and continues contribution to the improvement of Hyperic

People

Vote (0)
Watch (0)

Dates

  • Created:
    Updated:
    Resolved:
    Last comment:
    37 weeks, 4 days ago