We recently experienced two unplanned outages on the Ning Platform: the first on Thanksgiving evening, November 22, and the second yesterday, November 27. We know that near-perfect uptime is what you’ve come to expect from Ning, and we want to share more details about these outages and how we generally respond when outages occur.
Platform uptime and performance are a critical part of the hosted community solution we offer you, and we know that any outage could have a significant effect on the health and activity of your communities. We delivered at least 99.9% uptime consistently from 2011 to October 2012. Obviously, in the past week we’ve failed to meet this standard. I want you to know that we take these outages (and any outage) very seriously. A team of engineers and advocates is on call 24/7 to quickly resolve significant incidents. If necessary, we call in additional engineers with relevant expertise. Any time we have an outage, we perform a detailed post-mortem to investigate root causes and specify changes to prevent the issue from recurring.
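For a sense of scale, a 99.9% uptime target leaves room for roughly 43 minutes of downtime in a 30-day month. The sketch below shows only that arithmetic; the function name and the 30-day and 365-day windows are illustrative assumptions, not a description of how Ning measures uptime.

```python
def downtime_budget_minutes(uptime_percent: float, days: int) -> float:
    """Return the downtime, in minutes, allowed by an uptime target.

    Hypothetical helper for illustration: `uptime_percent` is the target
    (for example 99.9) and `days` is the length of the measurement window.
    """
    total_minutes = days * 24 * 60
    allowed_fraction = (100.0 - uptime_percent) / 100.0
    return total_minutes * allowed_fraction


if __name__ == "__main__":
    # Assumed windows: a 30-day month and a 365-day year.
    for label, days in (("30-day month", 30), ("365-day year", 365)):
        budget = downtime_budget_minutes(99.9, days)
        print(f"99.9% uptime over a {label}: about {budget:.1f} minutes of downtime")
```

Over a full year the same target works out to a little under nine hours in total.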
So, what happened?
The first outage resulted from back-end optimization work our engineering team has been doing over the past three months. The team has been making changes to our architecture to take advantage of newer equipment and powerful new cloud services that weren’t available when the back-end was originally designed. Our expectation is that the optimization process will result in stronger performance and improved disaster recovery in the long term. However, one of these changes caused a key set of databases to fail, leading to the outage on November 22. The engineering team has corrected the issue so it will not recur.
The outage yesterday was unfortunately the result of user error by an engineer on our operations team during normal back-end maintenance. We’re revising our systems to prevent this type of error from occurring again.
We closely follow the feedback you send to our support team and here on Creators, and I want you to know that I’m aware some of you or your members have been experiencing intermittent issues with pages on your network not fully loading CSS or JavaScript correctly. We believe these issues are also related to the back-end optimization work we’ve been doing. We have two engineers working full-time to resolve this issue.
Again, I realize how important platform uptime and performance are for you and your members, and I apologize for the impact of the outages this last week.
UPDATE ON CSS/JS LOADING ISSUES:
The two engineers working on this have found and fixed what they believe was the cause of pages intermittently failing to load CSS or JavaScript: a problematic configuration on one of our front-end resolvers. Note that some NCs and members who have old, broken JS or CSS cached in their browsers may need to do a forced reload to get back into a good state.
Tags: maintenance, outage, unplanned
Permalink Reply by Thiago Santos de Moraes on November 28, 2012 at 12:42pm Thanks for the transparency.
Permalink Reply by Mitzi Hoover on November 28, 2012 at 1:40pm I have a site, The Front Porch, which used to be called For Us Girls Only... we are having trouble deleting discussions, and our chat does not work. I sure wish the maintenance would fix these issues.
Permalink Reply by Eric Suesz on November 28, 2012 at 1:50pm Hey, Mitzi. It sounds like this might be unrelated to the outages we've had. Have you submitted a ticket from your dashboard? That is generally how we help support our customers. If you have a ticket number, can you share that with me? I can check up on the progress of your ticket. Thank you!
Permalink Reply by Dave Walkerden on November 28, 2012 at 1:54pm Could this have affected NING's ability to process payments? My payment has been unsuccessful today and I've had to submit a query to resolve it. Things are fine at my end.
ref:_00D80cCLt._50040Nm1RP
Permalink Reply by Allison Leahy on November 28, 2012 at 2:17pm Hey Dave, Thanks for including your reference code. These two issues are unrelated. I'll respond with more detail in that ticket.
Permalink Reply by Dave Walkerden on November 28, 2012 at 5:19pm Hi Allison. Thanks for the speedy fix.
Permalink Reply by howard mccoy on November 28, 2012 at 2:20pm Thanks John for the explanation - I hate human error, but it's great that you are calling it the way it happened, and that makes it more likely that another human error like that won't happen again. Thanks for the report.
Permalink Reply by Janice D Carter on November 28, 2012 at 4:52pm I agree, thanks for the transparency. Humans make mistakes; thanks for fixing it.
Permalink Reply by James Higginson on November 29, 2012 at 2:18am Thanks John, much appreciated feedback.
Kind regards
James
Permalink Reply by CocteauBoy on November 29, 2012 at 7:33am This seems to be happening more and more. Is this related to some benefit, such as upgrading to better functionality for our sites? That would make a few outages worth it.
Permalink Reply by Eric Suesz on November 29, 2012 at 8:48am Yes, this is related to upgrading:
The team has been making changes to our architecture to take advantage of newer equipment and powerful new cloud services that weren’t available when the back-end was originally designed. Our expectation is that the optimization process will result in stronger performance and improved disaster recovery in the long term.
Permalink Reply by CocteauBoy on November 29, 2012 at 9:50am Oh, so back-end stuff. Thanks. This is important stuff. Too bad it broke so much in the process of implementing it.
I can't wait for more practical functionality on the front end, too, after all of these years. It would be great to get such things as, say, the forum and chat functionality to move beyond a 1990s format. That would be worth a day's broken site, for sure!
It really has been a long day with a broken site so far. Any word on when this is expected to be corrected?

