
Hyperic HQ

Ops center page: alerts do not load when db contains large number of alerts

Details

  • Case Links:
    none
  • Regression:
    Yes
  • Story Points:
    8
  • Tags:

Description

When the hq-server database contains a large number of alerts (here, 400000+ alerts have been fired and the db is set to store these alerts for a month), navigating to the Ops center page loads the main page frame but does not load the frame with the list of alerts. Waited several minutes for the page to load and it kept spinning.

Further tests will be conducted to determine the consistency of this bug and when it times out. Meanwhile, filing it to start tracking the issue.

Two attempts so far:
Waited 10 minutes; the page did not load and did not time out.
Waited 45 minutes (while the server was under load from alerts firing); the alert list on the Ops center did not load and did not time out.

Activity

Kashyap Parikh added a comment -

Dharma, can you also check whether it's a regression? I believe we saw similar behavior in 4.3 and had to purge alerts via HQApi (and/or reduce the number of days alerts are saved).

Dharma Srinivasan added a comment -

Another observation of slow Ops center load time on 4.5:
It took 3 minutes to load 5113 unfixed alerts (103 pages) on the Ops center when all agents were down.

For the 4.3 large env I currently only have numbers for ~500 alerts; it takes about 4 seconds to load 10 alert pages.
Will load that env with ~5000 alerts and add another comment to this bug.

Dharma Srinivasan added a comment -

Slightly more comparable numbers between 4.3 and 4.5 (in terms of alert counts):

4.3 (452 unfixed alerts) - about 3 to 4 seconds
4.5 (588 unfixed) - 15.6 seconds

4.5 (1591 unfixed alerts) - 44 seconds
4.5 (96 filtered alerts not in escalation) - 2.8 seconds

Dharma Srinivasan added a comment -

4.5 (494 alerts) - 8.7 seconds
4.3 (452 alerts) - between 784 milliseconds and 2 seconds

Patrick Nguyen added a comment -

FIX: Batch the queries for permissions, alerts, and platforms to reduce the number of db queries needed and thereby significantly improve performance.

On vmc-pserv01, which has close to 70,000 unfixed alerts, the initial load of the ops center improved from 89 minutes to 3 minutes for the hqadmin user.

Performance for HQ users with fewer permissions than hqadmin should be significantly better because there is less data to process. Previously, the code fetched all unfixed alerts and ran a permission check on each alert definition.
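
A minimal sketch of the batching idea, assuming hypothetical names (AlertDao, PermissionManager, OpsCenterBatchLoader are illustrative stand-ins, not the actual Hyperic HQ classes): alert IDs are processed in fixed-size chunks so each chunk costs one DB query and one bulk permission check instead of one of each per alert.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the data-access and permission layers;
// the names are illustrative, not the actual HQ classes.
interface AlertDao {
    List<Alert> findByIds(List<Integer> alertIds);                    // single IN-clause query per call
}

interface PermissionManager {
    List<Alert> filterViewable(int subjectId, List<Alert> alerts);    // bulk permission check
}

class Alert {
    final int id;
    Alert(int id) { this.id = id; }
}

public class OpsCenterBatchLoader {
    private static final int BATCH_SIZE = 500;    // assumed chunk size

    private final AlertDao alertDao;
    private final PermissionManager permissionManager;

    public OpsCenterBatchLoader(AlertDao dao, PermissionManager pm) {
        this.alertDao = dao;
        this.permissionManager = pm;
    }

    /**
     * Loads the alerts a user may view, hitting the DB once per batch
     * instead of once per alert / alert definition.
     */
    public List<Alert> loadViewableAlerts(int subjectId, List<Integer> alertIds) {
        List<Alert> viewable = new ArrayList<Alert>();
        for (int i = 0; i < alertIds.size(); i += BATCH_SIZE) {
            List<Integer> batch =
                alertIds.subList(i, Math.min(i + BATCH_SIZE, alertIds.size()));
            List<Alert> alerts = alertDao.findByIds(batch);                        // one query per batch
            viewable.addAll(permissionManager.filterViewable(subjectId, alerts));  // one check per batch
        }
        return viewable;
    }
}
{code}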

Patrick Nguyen added a comment -

FIX 2: In case a refresh takes longer than the refresh interval, implement a custom auto-refresh that starts the refresh timer when the Ajax call that fetches the data completes, instead of refreshing at each fixed interval.
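
The actual change lives in the page's JavaScript refresh loop; as a rough analogy of the same scheduling idea in Java (assumed names, not the HQ code), scheduleWithFixedDelay starts counting the next interval only after the current fetch completes, whereas scheduleAtFixedRate fires every interval regardless of how long the previous fetch took.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class OpsCenterAutoRefresh {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    /** Hypothetical stand-in for the Ajax call that fetches and renders the alert list. */
    private void fetchOpsCenterData() {
        // ... load alerts; this may take longer than the refresh interval
    }

    public void start(long refreshIntervalSeconds) {
        // scheduleWithFixedDelay starts the next countdown only after the
        // previous fetch has completed -- the behaviour FIX 2 describes.
        // scheduleAtFixedRate, by contrast, would trigger a new refresh every
        // interval even while the previous one is still running.
        scheduler.scheduleWithFixedDelay(
            this::fetchOpsCenterData, 0, refreshIntervalSeconds, TimeUnit.SECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}
{code}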

Dharma Srinivasan added a comment -

Started testing this fix with 4.6 (build #214).

First impressions from verification on a medium-scale (~1350 platforms) MySQL env (will test on a large-scale env before closing):

Unfixed Alerts: 29572
Alerts in Escalation: 455

Took 26060 ms (~26 seconds) to load (on a VPN connection; will update with non-VPN results soon).

Dharma Srinivasan added a comment -

Non-VPN (wired) results:

23419 unfixed alerts loaded on the Ops center in 26754 ms.

I currently do not have an env as large as the one this bug was filed against (or with as high an unfixed alert count). Keeping this bug open in case I soon hit a much higher alert count on my existing envs so I can retest.

A very good improvement was noted for the Ops center page load with an alert count of ~30K.

Dharma Srinivasan added a comment -

Tested with 4.6 (June 14th build), MySQL + CentOS, ~1350 total platforms.

Re-opening to consider this scenario:

For 20609 total unfixed alerts and 455 down platforms, the Ops center page and alert loading time is ~18 seconds.

But when the status type is changed from 'All Alerts' to 'Down Resources', it takes 386335 ms to 417277 ms (~7 mins) for the down platforms list to load.

Pasting the alert stats that the header of the page displays:

Current Filter Totals
Resources
  Down Platforms: 455
  Down Resources: 20609
Alerts
                          Low    Medium    High    Total
  Unfixed Alerts          N/A    N/A       N/A     N/A
  Alerts in Escalation    N/A    N/A       N/A     N/A

7 mins for just 455 resources down is quite a long delay.

Dharma Srinivasan added a comment -

Agreed; filing the slowness of the 'Down Platforms' option as a separate bug and closing this.

Dharma Srinivasan added a comment -

Verified on 4.6.

Currently setting up an alert storm on a 4.5.2 functional env to verify the Ops center with a large number of unfixed alerts. Will test and close this once a reasonably large number of alerts has accumulated on that env over the next couple of days.

Dharma Srinivasan added a comment -

Closing per the test on the 4.6 large env. No large 4.5.2 env is currently available for this test, so the 4.5.2 Ops center has been tested for regression with a small number of alerts.



Dates

  • Created:
  • Updated:
  • Resolved:
  • Last comment:
    2 years, 41 weeks ago