
Hyperic HQ

TransportDisposedIOException and IllegalStateException in the log right after restarting the server

Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Deferred
  • Affects Version/s: 4.5, 4.6
  • Fix Version/s: None
  • Component/s: None
  • Environment:
    MySQL, CentOS, build #105, 1342 platforms
  • Case Links:
    none
  • Regression:
    Yes

Description

Steps:

1. Stopped hq-server
2. Started hq-server again a few seconds later

Found this exception right after server start:

server start (after stop)

2010-09-16 16:38:43,322 ERROR [pool-8-thread-1] [org.apache.activemq.ActiveMQMessageConsumer@679] ID:pserv01-42208-1284594449657-2:0:4:1 failed to delivered acknowledgements
javax.jms.JMSException: Peer (vm://localhost#1) disposed.
at org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:62)
at org.apache.activemq.ActiveMQConnection.doAsyncSendPacket(ActiveMQConnection.java:1227)
at org.apache.activemq.ActiveMQConnection.asyncSendPacket(ActiveMQConnection.java:1219)
at org.apache.activemq.ActiveMQSession.asyncSendPacket(ActiveMQSession.java:1799)
at org.apache.activemq.ActiveMQSession.sendAck(ActiveMQSession.java:1964)
at org.apache.activemq.ActiveMQMessageConsumer$4.run(ActiveMQMessageConsumer.java:677)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.activemq.transport.TransportDisposedIOException: Peer (vm://localhost#1) disposed.
at org.apache.activemq.transport.vm.VMTransport.oneway(VMTransport.java:89)
at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:40)
at org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
at org.apache.activemq.ActiveMQConnection.doAsyncSendPacket(ActiveMQConnection.java:1225)
... 10 more
2010-09-16 16:38:43,366 INFO [Thread-18] [com.hyperic.hq.measurement.server.session.BatchAggregateDataInserter@219] No unprocessed metrics to save to disk
2010-09-16 16:38:43,411 INFO [Thread-18] [com.hyperic.hq.measurement.server.session.BatchAggregateDataInserter@219] No unprocessed metrics to save to disk
2010-09-16 16:38:43,429 INFO [Thread-18] [org.quartz.core.QuartzScheduler@581] Scheduler org.springframework.scheduling.quartz.SchedulerFactoryBean#0_$_NON_CLUSTERED shutting down.
2010-09-16 16:38:43,430 INFO [Thread-18] [org.quartz.core.QuartzScheduler@507] Scheduler org.springframework.scheduling.quartz.SchedulerFactoryBean#0_$_NON_CLUSTERED paused.
2010-09-16 16:38:43,430 INFO [Thread-18] [org.quartz.core.QuartzScheduler@632] Scheduler org.springframework.scheduling.quartz.SchedulerFactoryBean#0_$_NON_CLUSTERED shutdown complete.
2010-09-16 16:38:43,456 INFO [Thread-18] [org.hyperic.hq.events.FileAlertConditionEvaluatorStateRepository@148] Successfully saved alert condition evaluator states to /sw/hyperic/HQ_mysql/4.5/105/server-4.5.0.BUILD-SNAPSHOT-EE/hq-engine/hq-server/ExecutionStrategyStates.dat
2010-09-16 16:38:43,493 INFO [Thread-18] [org.hyperic.hibernate.JmxExposingLocalSessionFactoryBean@246] Closing Hibernate SessionFactory
2010-09-16 16:38:43,493 INFO [Thread-18] [org.hibernate.impl.SessionFactoryImpl@769] closing
2010-09-16 16:38:45,258 WARN [pool-4-thread-1] [org.hyperic.hq.stats.ConcurrentStatsWriter@267] The CacheManager is not alive.
java.lang.IllegalStateException: The CacheManager is not alive.
at net.sf.ehcache.CacheManager.checkStatus(CacheManager.java:831)
at net.sf.ehcache.CacheManager.getCacheNames(CacheManager.java:823)
at org.hyperic.hq.stats.ConcurrentStatsCollector$1.getVal(ConcurrentStatsCollector.java:179)
at org.hyperic.hq.stats.ConcurrentStatsWriter$StatsWriter.getCSVBuf(ConcurrentStatsWriter.java:302)
at org.hyperic.hq.stats.ConcurrentStatsWriter$StatsWriter.run(ConcurrentStatsWriter.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Marking as a regression since I've not seen this exception before (in 4.3 or 4.5).
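
For context, the first stack is what the embedded ActiveMQ broker produces when a consumer tries to acknowledge a message over a vm:// transport whose broker side has already been stopped. Below is a minimal sketch that provokes the same "Peer (vm://localhost#N) disposed." failure, assuming ActiveMQ 5.x on the classpath; the class name and queue name are illustrative, not taken from the HQ codebase:

import javax.jms.Connection;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.activemq.broker.BrokerService;

public class VmTransportDisposeRepro {
    public static void main(String[] args) throws Exception {
        // Embedded, non-persistent broker, like the one hq-server runs in-process
        BrokerService broker = new BrokerService();
        broker.setPersistent(false);
        broker.start();

        // vm:// transport: client and broker share the same JVM
        Connection conn = new ActiveMQConnectionFactory("vm://localhost?create=false")
                .createConnection();
        conn.start();
        Session session = conn.createSession(false, Session.CLIENT_ACKNOWLEDGE);
        Queue queue = session.createQueue("DEMO.QUEUE"); // illustrative name

        MessageProducer producer = session.createProducer(queue);
        producer.send(session.createTextMessage("hello"));

        MessageConsumer consumer = session.createConsumer(queue);
        Message msg = consumer.receive(2000);

        // Stop the broker while the ack is still outstanding; the vm:// peer
        // is disposed underneath the still-open consumer.
        broker.stop();
        broker.waitUntilStopped();

        try {
            msg.acknowledge(); // the ack travels over the disposed transport
        } catch (JMSException e) {
            // Expected: "Peer (vm://localhost#N) disposed." caused by
            // TransportDisposedIOException, matching the stack above
            e.printStackTrace();
        }
    }
}

The ack in the reported stack is delivered asynchronously from a pooled thread (pool-8-thread-1), which is exactly the kind of task that can fire after the broker's vm:// peer has been disposed during shutdown.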

Attachments

  1. hqstats-09-16.csv
    17/Sep/10 12:18 PM
    17 kB
    Dharma Srinivasan
  2. server.log.2010-09-16full.log.gz
    17/Sep/10 12:18 PM
    2.83 MB
    Dharma Srinivasan
  3. server.log.bug
    30/Sep/10 2:00 PM
    1.66 MB
    Dharma Srinivasan

Activity

Kashyap Parikh added a comment -

The first part of this bug is the same as HHQ-4262. I will go ahead and mark that one as a duplicate.

Kashyap Parikh added a comment -

Dharma, can you check whether alerts fire correctly after the server is up (first stack) and whether the stats collector is working (second stack)?

Dharma Srinivasan added a comment -

Kashyap,

There were no alerts firing and none expected to fire at that time. The server was shut down for less than a minute and brought back up. There was no user action other than logging in to the UI after the server started.

The concurrent stats collector seems to be working (screenshot will be attached).

The only other correlation from the stats log at this time is that jdbc_hq_ds_in_use is high at server startup (~200) and falls back down to 10 within a couple of minutes. Every server startup shows the pool count in that range for a minute or two.

Both times this ActiveMQ exception was noticed (here and in the bug marked as a duplicate), the common pattern was a server startup after a brief stop.

Jennifer Hickey added a comment -

I think these messages are just from shutdown (it's "normal" to see them then). If these errors really occurred on startup, I would expect the app not to come alive at all; for example, the last stack trace comes after "Closing Hibernate SessionFactory", and if your SessionFactory is closed, the server cannot be running. Also, in your attached log I'm seeing server startup messages immediately following these, indicating a successful server start:

2010-09-16 16:39:14,894 INFO [main] [org.springframework.web.context.ContextLoader@186] Root WebApplicationContext: initialization started
2010-09-16 16:39:14,954 INFO [main] [org.springframework.web.context.support.XmlWebApplicationContext@454] Refreshing Root WebApplicationContext: startup date [Thu Sep 16 16:39:14 PDT 2010]; root of context hierarchy

What was leading you to believe these messages were logged during server start?
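
This reading matches the second stack: net.sf.ehcache.CacheManager guards its methods with a status check and throws IllegalStateException("The CacheManager is not alive.") once shutdown() has run, so a scheduled stats task that ticks after shutdown logs exactly that WARN. A minimal sketch of the race, assuming ehcache 2.x; the scheduled task here is an illustrative stand-in for Hyperic's ConcurrentStatsWriter, not its actual code:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import net.sf.ehcache.CacheManager;

public class CacheManagerNotAliveRepro {
    public static void main(String[] args) throws Exception {
        final CacheManager cacheManager = CacheManager.create();

        // Stand-in for the stats writer: periodically polls cache names
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    // Throws IllegalStateException("The CacheManager is not alive.")
                    // once shutdown() has completed
                    System.out.println(cacheManager.getCacheNames().length + " caches");
                } catch (IllegalStateException e) {
                    System.err.println(e.getMessage());
                }
            }
        }, 0, 1, TimeUnit.SECONDS);

        Thread.sleep(1500);
        cacheManager.shutdown(); // shutdown races the still-scheduled task

        Thread.sleep(1500);
        scheduler.shutdownNow();
    }
}

If the writer task is not cancelled before the CacheManager shuts down, one last tick lands in that window and produces the WARN seen in the log.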

Dharma Srinivasan added a comment -

This issue surfaced again today during a restart. The "TransportDisposedIOException" and "cache not alive" errors are in the server log. This happened during a server restart (the ./hq-server.sh restart command) and the server did not start afterwards.

Issuing the restart command a second time started the server successfully.

Env: Oracle, large environment
Scenario:
  • The profiler was running in this environment
  • Edited the conf file to remove the profiler setting from server startup
  • Issued the server restart command
  • The server did not start, and the log had exceptions similar to the ones in this bug's description
  • Re-issuing restart resolved the issue

Attaching log.

Frederic Calindas added a comment -

Also reproducible with a fresh install of a 4.6 SNAPSHOT using the embedded Postgres DB.

Zvika Messing added a comment - edited

Reproduces in Hyperic 4.6.5 RC2.
Scenario: the server does not come up after the restart command. The same errors appear in the server log file (as attached to the original ticket).

One note: this reproduces when I accidentally run the restart command as the "root" user rather than as the Hyperic-related user. Could this be related?

Idan Hod added a comment -

As part of our continuous effort to improve product quality, the Hyperic product team has decided to implement a "zero bug policy" methodology.

Following this methodology, only defects that are planned to be handled in the near future will remain open. Any other defect will be deferred, with the option to be reevaluated if the need arises, or if changes to the Hyperic road-map make such a defect a candidate for a fix.

We believe this new process will help create clarity and focus in the Hyperic road-map and ultimately benefit our customer base.

This bug has been deferred as part of the new policy.

We appreciate your cooperation and continued contribution to the improvement of Hyperic.


Dates

  • Last comment: 41 weeks, 6 days ago