We recently experienced two unplanned outages on the Ning Platform: the first on Thanksgiving evening, November 22, and the second yesterday, November 27. We know that near-perfect uptime is what you’ve come to expect from Ning, and we want to share more details about these outages and how we generally respond when outages occur.
Platform uptime and performance are a critical part of the hosted community solution we offer you, and we know that any outage could have a significant effect on the health and activity in your communities. We delivered at least 99.9% uptime consistently from 2011 to October 2012. Obviously, in the past week we’ve failed to meet this standard. I want you to know that we take these outages (and any outage) very seriously. A team of engineers and advocates is on call 24/7 to quickly resolve significant incidents. If necessary, we call in additional engineers with relevant expertise. Any time we have an outage, we perform a detailed post-mortem to investigate root causes and specify changes to prevent the issue from recurring.
So, what happened?
The first outage resulted from back-end optimization work our engineering team has been doing over the past three months. The team has been making changes to our architecture to take advantage of newer equipment and powerful new cloud services that weren’t available when the back-end was originally designed. Our expectation is that the optimization process will result in stronger performance and improved disaster recovery in the long term. However, one of these changes caused a key set of databases to fail, leading to the outage on November 22. The engineering team has corrected the issue so it will not recur.
The outage yesterday was unfortunately the result of user error by an engineer on our operations team during normal back-end maintenance. We’re revising our systems to prevent this type of error from occurring again.
We closely follow the feedback you send to our support team and here on Creators, and I want you to know that I’m aware some of you or your members have been experiencing intermittent issues with pages on your network not fully loading CSS or JavaScript correctly. We believe these issues are also related to the back-end optimization work we’ve been doing. We have two engineers working full-time to resolve this issue.
Again, I realize how important platform uptime and performance are for you and your members, and I apologize for the impact of the outages this last week.
UPDATE ON CSS/JS LOADING ISSUES:
The two engineers working on this issue have discovered and fixed what they believe was the cause of pages intermittently failing to load CSS or JavaScript fully: a problematic configuration on one of our front-end resolvers. Note that some NCs and members who have old, broken JS or CSS cached in their browsers may need to do a forced reload to get them back into a good state.
Tags: maintenance, outage, unplanned

Reply by soaringeagle on December 9, 2012 at 10:50pm
Eric, I noticed something. I did try to file a ticket on this but I much rather went through because of the ticket maintenance window.
Ever since the first maintenance issue, I've had missing images in at least one discussion. I filed a ticket about the missing images, but I don't have the ticket number handy right now. At the time, I was told that the images must be missing because the members left. I knew that wasn't the case, but that was the explanation.
Recently, while doing a sitemap crawl, I got a new error that I've never gotten before: 504 Gateway Timeout.
When investigating this, I opened the URL and got an XML parse error that said an element was missing, or something like that. This leads me to think that something in the maintenance made it so that these images are still there but are now inaccessible. These images are in one of our most popular and important discussions, so it would be really awesome if you could fix that.
I understand that there is a hope to upgrade to cloud server technology with no ETA, but could you give us a rough guess whether it will be weeks or months?
Reply by Eric Suesz on December 10, 2012 at 10:58am
I don't think I have any good answers for you on these questions. I think our more technical team members will need to look into it. I fear any opinion I have would be an uneducated one when it comes to this level of detail. We're always updating and sometimes upgrading our systems, but I can't really speak to the particulars. Sorry, but that is a detail that we probably wouldn't be able to share anyway.
Reply by Paula Carter on December 14, 2012 at 5:36am
I can't load pics into albums since yesterday ... have reinstalled Chrome, cleared cache etc., but still no luck. Not sure what 'forced reload' is ... (pardon my ignorance, lol!) happy to try it if you can explain what I need to do. Thanks.
PS: Yes I have submitted a ticket :)
PPS: Googled the 'forced reload' thing, have now tried, but it didn't resolve problem.
© 2013 Created by Ning.
