Hyperic HQ

SRN implementation needs to be reviewed and rewritten

Details

  • Type: Improvement
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: None
  • Fix Version/s: 4.6.5
  • Component/s: None
  • Case Links:
    none
  • Regression:
    No

Description

There is a lot of technical debt surrounding SRNs. The implementation needs to be reviewed and rewritten.

One bug this causes is in the SRN logic that checks whether a metric is being sent at the proper rate and reschedules it if not. If an agent is down for too long, this mechanism reschedules all resource metrics, which causes major performance degradation in a large HQ instance.
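
A minimal sketch of why a rate check like the one described above fans out during an agent outage. This is not the HQ implementation: the `Metric` type, the function name, and the `tolerance` slack factor are assumptions made for illustration. Each metric is flagged for rescheduling when its last report is older than its collection interval allows, so once an agent has been silent long enough, every metric it owns is flagged at the same moment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    resource_id: int    # hypothetical resource identifier
    interval_s: int     # configured collection interval, in seconds
    last_report_s: int  # timestamp of the last data point received


def metrics_to_reschedule(metrics, now_s, tolerance=3):
    """Return metrics whose last report is older than tolerance * interval.

    `tolerance` is an assumed slack factor, not a documented HQ setting.
    """
    return [m for m in metrics
            if now_s - m.last_report_s > tolerance * m.interval_s]


# One agent reporting 1000 resources, each with a 60 s metric, last seen at t=0.
metrics = [Metric(resource_id=i, interval_s=60, last_report_s=0)
           for i in range(1000)]

# Shortly after the last report, nothing is overdue.
healthy = metrics_to_reschedule(metrics, now_s=90)

# After a ten-minute outage, every metric is overdue at once.
after_outage = metrics_to_reschedule(metrics, now_s=600)
```

In this toy model the work triggered by a single outage grows with the number of metrics behind the agent, which is consistent with the degradation reported for large instances.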

The SRN code should also be modernized to use Hibernate and Ehcache rather than maintaining its own internal state.

Activity

Scott Feldstein added a comment -

resolving.

Please test the following scenarios:

1) create a new platform, ensure that everything is scheduled
2) change metric collection intervals and ensure that they are collecting at the proper rate
3) change metric intervals while an agent is down. When it comes back up it should get the new schedule
4) update metric schedule templates
5) remove a resource and make sure that its metrics are unscheduled (you'd see warnings in the logs if there was an issue here)

At each step check the EAM_SRN table, for the resource that is being rescheduled, to make sure that the SRN is only incremented by 1 each time you reschedule.
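
The per-step check above could be scripted by comparing two snapshots of the table. This is a sketch only: it assumes the snapshots have already been read into dictionaries that map a resource key to its SRN value. The helper name and the key format are hypothetical, and no EAM_SRN column names are assumed.

```python
def bad_increments(before, after, rescheduled):
    """Return {resource: (old, new)} for rescheduled resources whose SRN
    did not go up by exactly 1.

    A resource missing from `before` is treated as starting at 0; that is
    an assumption of this sketch, not documented HQ behavior.
    """
    problems = {}
    for resource in rescheduled:
        old = before.get(resource, 0)
        new = after.get(resource)  # None if the row no longer exists
        if new != old + 1:
            problems[resource] = (old, new)
    return problems


# Hypothetical snapshots taken before and after one reschedule.
before = {"platform:10001": 4, "server:10002": 7}
after = {"platform:10001": 5, "server:10002": 9}

problems = bad_increments(before, after,
                          rescheduled={"platform:10001", "server:10002"})
```

Here `problems` contains only `server:10002`, whose SRN jumped by 2 instead of 1.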

Dates

  • Created:
    Updated:
    Resolved:
    Last comment:
    2 years, 14 weeks, 6 days ago