
Key: HHQ-1728
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Critical
Assignee: Charles Lee
Reporter: Jon Travis
Votes: 0
Watchers: 0
Hyperic HQ

Currently Down needs lots of work

Created: 04/Mar/08 08:05 AM   Updated: 24/Mar/08 03:39 PM
Component/s: None
Affects Version/s: 4.0.0, 3.2.1, 3.2.2
Fix Version/s: 4.0.0, 3.2.2

Issue Links:
Depends
 

Verify By: Kashyap Parikh
Last comment: 28 weeks, 4 days ago
Resolution Date: 19/Mar/08 12:29 PM


 Description
The CurrentlyDown screen is unusable. Reloading the page several times shows massive differences in which resources are reported as available. In addition, services which should be unavailable are not showing up (when their respective servers are unavailable).

Also, watching the logs shows a huge number of cache invalidations, and the number of events processed on the zevent bus (for the down metrics calculator) is much higher than the number of resources which have ever been down.


 Comments
Charles Lee - 18/Mar/08 03:45 PM
Scott's looking into it

Ryan Morgan - 19/Mar/08 09:38 AM

It appears the JSON output is correct based on the contents of the DownMetricsCache.

In places where we see this, the logs are full of:

2008-03-19 00:20:46,456 WARN [DownMetricsCalculator1] [org.hyperic.hq.measurement.server.session.DataManagerEJBImpl] No availability metric for for id 405264 in down metrics cache.
2008-03-19 00:20:46,456 WARN [DownMetricsCalculator1] [org.hyperic.hq.measurement.server.session.DataManagerEJBImpl] No availability metric for for id 404532 in down metrics cache.
2008-03-19 00:20:46,456 WARN [DownMetricsCalculator1] [org.hyperic.hq.measurement.server.session.DataManagerEJBImpl] No availability metric for for id 404388 in down metrics cache.
2008-03-19 00:20:46,456 WARN [DownMetricsCalculator1] [org.hyperic.hq.measurement.server.session.DataManagerEJBImpl] No availability metric for for id 405318 in down metrics cache.
2008-03-19 00:20:46,456 WARN [DownMetricsCalculator1] [org.hyperic.hq.measurement.server.session.DataManagerEJBImpl] No availability metric for for id 403686 in down metrics cache.
2008-03-19 00:20:46,456 WARN [DownMetricsCalculator1] [org.hyperic.hq.measurement.server.session.DataManagerEJBImpl] No availability metric for for id 403272 in down metrics cache.
2008-03-19 00:20:46,456 WARN [DownMetricsCalculator1] [org.hyperic.hq.measurement.server.session.DataManagerEJBImpl] No availability metric for for id 403866 in down metrics cache.
2008-03-19 00:20:46,456 WARN [DownMetricsCalculator1] [org.hyperic.hq.measurement.server.session.DataManagerEJBImpl] No availability metric for for id 405084 in down metrics cache.
2008-03-19 00:20:46,456 WARN [DownMetricsCalculator1] [org.hyperic.hq.measurement.server.session.DataManagerEJBImpl] No availability metric for for id 405300 in down metrics cache.
2008-03-19 00:20:46,456 WARN [DownMetricsCalculator1] [org.hyperic.hq.measurement.server.session.DataManagerEJBImpl] No availability metric for for id 403704 in down metrics cache.
2008-03-19 00:20:46,456 WARN [DownMetricsCalculator1] [org.hyperic.hq.measurement.server.session.DataManagerEJBImpl] No availability metric for for id 404442 in down metrics cache.
2008-03-19 00:20:46,456 WARN [DownMetricsCalculator1] [org.hyperic.hq.measurement.server.session.DataManagerEJBImpl] No availability metric for for id 404082 in down metrics cache.
2008-03-19 00:20:46,456 WARN [DownMetricsCalculator1] [org.hyperic.hq.measurement.server.session.DataManagerEJBImpl] No availability metric for for id 401754 in down metrics cache.


Charles Lee - 19/Mar/08 12:29 PM
Ultimately the problem is the down metrics cache size. Because of the code path, we were pushing every single availability metric into the cache, meaning that the cache size needed to be greater than the number of resources. The code has been changed to reduce unnecessary insertions, so the cache can theoretically be only slightly larger than the maximum number of down resources at any time. However, in a system that has a lot of churn it's still possible for things to be pushed out of the cache, so it's still recommended that we scale the cache size with the environment.
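
To illustrate the sizing argument above, here is a minimal sketch in Java. It assumes a simple size-bounded cache that evicts its oldest entry; the class `BoundedDownCacheSketch`, its `BoundedCache` helper, and the method `downEntriesRetained` are hypothetical names made up for this example and are not HQ's actual DownMetricsCache code. It compares inserting every availability metric with inserting only the down ones.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not the actual HQ down metrics cache. */
public class BoundedDownCacheSketch {

    /** Size-bounded map that evicts its oldest entry once maxSize is exceeded. */
    static final class BoundedCache extends LinkedHashMap<Integer, Boolean> {
        private final int maxSize;

        BoundedCache(int maxSize) {
            super(16, 0.75f, false); // false = keep insertion order
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
            return size() > maxSize;
        }
    }

    /**
     * Simulates one pass over all resources (ids 0..resources-1, of which the
     * first `down` are unavailable) and returns how many down resources are
     * still in the cache afterwards.
     */
    static int downEntriesRetained(int resources, int down, int cacheSize, boolean insertAll) {
        BoundedCache cache = new BoundedCache(cacheSize);
        for (int id = 0; id < resources; id++) {
            boolean isDown = id < down;
            // Old behaviour (assumed): every metric is inserted.
            // New behaviour (assumed): only down metrics are inserted.
            if (insertAll || isDown) {
                cache.put(id, isDown);
            }
        }
        int retained = 0;
        for (int id = 0; id < down; id++) {
            if (cache.containsKey(id)) {
                retained++;
            }
        }
        return retained;
    }

    public static void main(String[] args) {
        // 1000 resources, 50 down, cache of 100 entries.
        System.out.println(downEntriesRetained(1000, 50, 100, true));  // prints 0
        System.out.println(downEntriesRetained(1000, 50, 100, false)); // prints 50
    }
}
```

In this sketch, inserting everything into a cache smaller than the resource count evicts all 50 down entries, while inserting only down metrics keeps them all. If more resources go down than the cache can hold, entries are still lost, which is why scaling the cache size with the environment remains the recommendation.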

Kashyap Parikh - 24/Mar/08 03:39 PM
Looks good. Performed platform/server/service up/down tests, and resources are showing up fine on the Currently Down page after a page refresh. The zevent queue also looks fine after the server has been running for 72 hours.

ZEvent Listener Diagnostics:
    EventClass: class org.hyperic.hq.measurement.server.session.DownMetricZevent
        DownMetricsCalculator max=0.00 avg=0.00 num=108

Zevent Registered Buffers:
    DownMetricsCalculator size=0
                                       max=1820.00 avg=31.56 num=108