Current status of the Ning Platform is always available on the Ning Status Blog.

Update On Unplanned Maintenance

We recently experienced two unplanned outages on the Ning Platform: the first on Thanksgiving evening, November 22, and the second yesterday, November 27. We know that near-perfect uptime is what you’ve come to expect from Ning, and we want to share more details about this unplanned maintenance and how we generally respond when outages occur.


Platform uptime and performance are a critical part of the hosted community solution we offer you, and we know that any outage could have a significant effect on the health and activity in your communities. We delivered at least 99.9% uptime consistently from 2011 to October 2012. Obviously, in the past week we’ve failed to meet this standard. I want you to know that we take these outages (and any outage) very seriously. A team of engineers and advocates is on call 24/7 to quickly resolve significant incidents. If necessary, we call in additional engineers with relevant expertise. Any time we have an outage, we perform a detailed post-mortem to investigate root causes and specify changes to prevent the issue from recurring.
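
To put the 99.9% figure in perspective, here is a minimal sketch of the arithmetic behind an uptime target. The function name and the 30-day and 365-day windows are illustrative assumptions, not a description of how Ning actually measures uptime.

```python
def downtime_budget_minutes(uptime_percent: float, period_days: float) -> float:
    """Return the maximum downtime, in minutes, that still meets the
    given uptime percentage over a window of period_days days."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# Hypothetical measurement windows, chosen only for illustration.
monthly = downtime_budget_minutes(99.9, 30)
yearly = downtime_budget_minutes(99.9, 365)

print(f"30-day budget: {monthly:.1f} minutes")
print(f"365-day budget: {yearly / 60:.2f} hours")
```

Under these assumptions, a 99.9% target allows roughly 43 minutes of downtime in a 30-day window, or just under 9 hours in a year.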


So, what happened?


The first outage resulted from back-end optimization work our engineering team has been doing over the past three months. The team has been making changes to our architecture to take advantage of newer equipment and powerful new cloud services that weren’t available when the back-end was originally designed. Our expectation is that the optimization process will result in stronger performance and improved disaster recovery in the long term. However, one of the changes caused a key set of databases to fail, leading to the outage on November 22. The engineering team has corrected the issue so it will not recur.


The outage yesterday was unfortunately the result of user error by an engineer on our operations team during normal back-end maintenance. We’re revising our systems to prevent this type of error from occurring again.


We closely follow the feedback you send to our support team and here on Creators, and I want you to know that I’m aware some of you or your members have been experiencing intermittent issues with pages on your network not fully loading CSS or JavaScript correctly. We believe these issues are also related to the back-end optimization work we’ve been doing. We have two engineers working full-time to resolve this issue.


Again, I realize how important platform uptime and performance are for you and your members, and I apologize for the impact of the outages this last week.

UPDATE ON CSS/JS LOADING ISSUES:
The two engineers working on this issue have found and fixed what they believe was the cause of pages intermittently failing to load CSS or JavaScript correctly: a problematic configuration on one of our front-end resolvers. Note that some NCs and members who have old, broken JS or CSS cached in their browsers may need to do a forced reload to get back into a good state.

Replies

  • I can't load pics into albums since yesterday ... have reinstalled Chrome, cleared cache etc., but still no luck. Not sure what 'forced re-load' is ... (pardon my ignorance, lol!) happy to try it if you can explain what I need to do. Thanks.

    PS: Yes I have submitted a ticket :)

    PPS: Googled the 'forced reload' thing, have now tried it, but it didn't resolve the problem.

  • Excellent post. Thank you for the information, and clarity, and transparency.

    I've been doing a lot of reading, and to be honest, good job guys.

    Thanks again Ning for AWESOMENESS!

    Good job on your responses, tell your hard working teams and technicians I said cheers, much appreciated!

    • This reply was deleted.
      • Eric, I noticed something. I did try to file a ticket on this, but I don't think it went through because of the ticket maintenance window.

        Ever since the first maintenance issue I've had missing images in at least one discussion. I filed a ticket about the missing images, but I don't have the ticket number handy right now. At the time I was told that the images must be missing because the members left. I knew that wasn't the case, but that was the explanation.

        Recently, while doing a sitemap crawl, I got a new error that I've never gotten before: 504 Gateway Timeout.
        When investigating this, I opened the URL and got an XML parse error, which said an element was missing or something. This leads me to think that something in the maintenance made it so that these images are still there but are now inaccessible. These images are in one of our most popular and important discussions, so it would be really awesome if you could fix that.

        And I understand that there is a hope to upgrade to cloud server technology with no ETA but could you give us a rough guess whether it will be weeks or months?

      • Kudos. The way the Creators Network treats me, and the transparency of posts like this, is how I've modeled my own customer service on Addictapic. I don't hide anything, and my members appreciate it. I don't have a lot of members, but someday I'll figure out the secret that many here have figured out... 10,000 members? Wow, I'm about to celebrate 100 members and it's been like 8 months. lol. But regardless, the members I do have appreciate my approach, and a lot of my approach is inspired by Ning itself, because I've never seen a forum/company such as this. It's quite amazing. Realistically, Ning's recovery after a CEO stepping down seems to be coming along nicely, so I just wanted to say,

        Good job guys, please keep it up. :)

        ***Edit***

        I've done extensive reading now on Ning history, evolution, peaks and mergers.

        Very interesting, glad to see it continue. I of course wish the best for Ning, because I personally do not feel Addictapic would be as it is anywhere else, and no other group of "creators" would have been so giving and helpful.

        1000% support from me from now on.

        No more complaining. Just gonna skate around on my silver platter. :)

        Thanks again.

  • So does this mean that Ning has recently upgraded to a cloud server, or will be upgrading to one, so there'd be multiple redundancy (if one server has an outage, traffic is rerouted to any other still-functioning server) as well as serving from the closest access point (reduced latency and wait times due to proximity)?

    If that cloud upgrade is coming soon, I'll take the outages and be grateful for them if the end result is more stability and better performance (both the site servers and, more so, api.ning, which to me seems to have the greater performance issues).

    I've been having a whole lot of timeouts while running sitemap crawls, no matter how slowly I crawl. They're intermittent: about every 5 minutes, 4 pages will time out (on average).

    In the past I'd seen a rare API timeout (1 out of every 1000 or so), but now it's the main server, and it's more like 4 out of every few hundred.

    Just to be sure, I had my router, my modem, and every cord, cable, and connector replaced. Everything.

    I also tested on multiple servers (on my subdomain VPS server I tested at 5 times the crawl rate with not a single timeout).

    So even though the server's up and stable, and relatively fast, there are still intermittent issues.

    Testing with http://webwait.com, you will see a huge variation in load times: 3 seconds on one load, 12 the next, 60 the next, 8 the next, and so on.

    Often when using Pingdom Tools I've noticed up to 6-second wait times just to get the first byte of data from the HTML page,

    and a lot of slowness in the API server.

    One huge issue is the scorecard whatever-it-is (comScore) that's part of the Glam integration; that's what's slowing things the most.

  • Our apologies

    We're sorry, this social network is under maintenance. It should be available shortly, so check back soon.

    again?? -_-

  • [edit update as of 18:00 Eastern - woo hoo! looks like my site is back to normal! yay! Thanks guys!]

    Ooh goodness. This is still affecting so many sites. They've been broken all day. I sure hope this is fixed soon. My site is hideous, lol. This is a pretty big mistake, and I feel for those scrambling to fix it, but I'm crossing my fingers this doesn't go on for too many more hours.

    • This reply was deleted.
      • That's strange because several others reported the same effects in this very thread. I'm sure it was related, then. Everything looks good now, though.

  • Good to know how well the problems were responded to.

  • This seems to be happening more and more. Is this related to some benefit, such as upgrading to better functionality for our sites? That would make a few outages worth it.
