We recently experienced two unplanned outages on the Ning Platform: the first on Thanksgiving evening, November 22, and the second yesterday, November 27. We know that near-perfect uptime is what you’ve come to expect from Ning, and we wanted to share more details about these outages, and about how we generally respond when outages occur, with those who are interested.
Platform uptime and performance are a critical part of the hosted community solution we offer you, and we know that any outage could have a significant effect on the health and activity in your communities. We delivered at least 99.9% uptime consistently from 2011 to October 2012. Obviously, in the past week we’ve failed to meet this standard. I want you to know that we take these outages (and any outage) very seriously. A team of engineers and advocates is on call 24/7 to quickly resolve significant incidents. If necessary, we call in additional engineers with relevant expertise. Any time we have an outage, we perform a detailed post-mortem to investigate root causes and specify changes to prevent the issue from recurring.
So, what happened?
The first outage resulted from back-end optimization work our engineering team has been doing over the past three months. The team has been making changes to our architecture to take advantage of newer equipment and powerful new cloud services that weren’t available when the back-end was originally designed. Our expectation is that the optimization process will result in stronger performance and improved disaster recovery in the long-term. However, one of the changes that was made caused a key set of databases to fail, leading to the outage on November 22nd. The engineering team has corrected the issue so it will not recur.
The outage yesterday was unfortunately the result of user error by an engineer on our operations team during normal back-end maintenance. We’re revising our systems to prevent this type of error from occurring again.
Again, I realize how important platform uptime and performance are for you and your members, and I apologize for the impact of the outages this last week.
UPDATE ON CSS/JS LOADING ISSUES:
Thanks for transparency.
I have a site, The Front Porch, which used to be called For Us Girls Only. We are having trouble deleting discussions, and our chat does not work. I sure wish the maintenance would fix these issues.
Hey, Mitzi. It sounds like this might be unrelated to the outages we've had. Have you submitted a ticket from your dashboard? That is generally how we help support our customers. If you have a ticket number, can you share that with me? I can check up on the progress of your ticket. Thank you!
Could this have affected NING's ability to process payments? My payment has been unsuccessful today and I've had to submit a query to resolve it. Things are fine at my end.
Hey Dave, Thanks for including your reference code. These two issues are unrelated. I'll respond with more detail in that ticket.
Hi Allison. Thanks for the speedy fix.
Thanks, John, for the explanation. I hate human error, but it's great that you are calling it the way it happened, and that makes it less likely that another human error like that will happen again. Thanks for the report.
I agree, thanks for the transparency. Humans make mistakes; thanks for fixing it.
Thanks John, much appreciated feedback.
This seems to be happening more and more. Is this related to some benefit, such as upgrading our sites to better functionality? That would make a few outages worth it.
Yes, this is related to upgrading:
The team has been making changes to our architecture to take advantage of newer equipment and powerful new cloud services that weren’t available when the back-end was originally designed. Our expectation is that the optimization process will result in stronger performance and improved disaster recovery in the long-term.
Oh, so backend stuff. Thanks. This is important stuff. Too bad it broke so much in the process of implementing.
I can't wait for more practical functionality on the front end, too, after all of these years. It would be great to get such things as, say, the forum and chat functionality to move beyond a 1990s format. That would be worth a day's broken site, for sure!
It really has been a long day with a broken site so far. Any word on when this is anticipated as being corrected?