We recently experienced two unplanned outages on the Ning Platform: the first on Thanksgiving evening, November 22, and the second yesterday, November 27. We know that near-perfect uptime is what you’ve come to expect from Ning, and we want to share more details about these outages, and about how we generally respond when outages occur, with those who are interested.


Platform uptime and performance are a critical part of the hosted community solution we offer you, and we know that any outage can have a significant effect on the health and activity of your communities. We delivered at least 99.9% uptime consistently from 2011 to October 2012. Obviously, in the past week we’ve failed to meet this standard. I want you to know that we take these outages (and any outage) very seriously. A team of engineers and advocates is on call 24/7 to quickly resolve significant incidents. If necessary, we call in additional engineers with relevant expertise. Any time we have an outage, we perform a detailed post-mortem to investigate root causes and specify changes to prevent the issue from recurring.


So, what happened?


The first outage resulted from back-end optimization work our engineering team has been doing over the past three months. The team has been making changes to our architecture to take advantage of newer equipment and powerful new cloud services that weren’t available when the back-end was originally designed. Our expectation is that the optimization process will result in stronger performance and improved disaster recovery in the long term. However, one of these changes caused a key set of databases to fail, leading to the outage on November 22. The engineering team has corrected the issue so it will not recur.


The outage yesterday was unfortunately the result of user error by an engineer on our operations team during normal back-end maintenance. We’re revising our systems to prevent this type of error from occurring again.


We closely follow the feedback you send to our support team and post here on Creators, and I want you to know that I’m aware some of you or your members have been experiencing intermittent issues with pages on your network not fully loading CSS or JavaScript. We believe these issues are also related to the back-end optimization work we’ve been doing. We have two engineers working full-time to resolve this issue.


Again, I realize how important platform uptime and performance are for you and your members, and I apologize for the impact of the outages this last week.

UPDATE ON CSS/JS LOADING ISSUES:
The two engineers working on this issue have found and fixed what they believe was the cause of pages intermittently not fully loading CSS or JavaScript: a problematic configuration on one of our front-end resolvers. Note that some NCs and members who have old, broken JS or CSS cached in their browsers may need to do a forced reload to get back into a good state.

Tags: maintenance, outage, unplanned

Replies to This Discussion

Eric, I noticed something. I did try to file a ticket on this, but I would much rather go through here because of the ticket maintenance window.

Ever since the first maintenance issue, I've had missing images in at least one discussion. I filed a ticket about the missing images, but I don't have the ticket number handy right now. At the time, I was told that the images must be missing because the members had left. I knew that wasn't the case, but that was the explanation.

Recently, while doing a sitemap crawl, I got a new error that I've never gotten before: 504 Gateway Timeout.
When investigating this, I opened the URL and got an XML parse error, which said an element was missing or something. This leads me to think that something in the maintenance made it so that these images are still there but are now inaccessible. These images are in one of our most popular and important discussions, so it would really be awesome if you could fix that.

I also understand that there is a hope to upgrade to cloud server technology, with no ETA, but could you give us a rough guess as to whether it will be weeks or months?

I don't think I have any good answers for you on these questions. I think our more technical team members will need to look into it. I fear any opinion I have would be an uneducated one when it comes to this level of detail. We're always updating and sometimes upgrading our systems, but I can't really speak to the particulars. Sorry, but that is a detail that we probably wouldn't be able to share anyway.

I can't load pics into albums since yesterday ... have reinstalled Chrome, cleared cache etc., but still no luck. Not sure what 'forced re-load' is ... (pardon my ignorance, lol!) happy to try it if you can explain what I need to do. Thanks.

PS: Yes I have submitted a ticket :)

PPS: Googled the 'forced reload' thing and have now tried it, but it didn't resolve the problem.
