So there have been a few discussions about sitemaps, and a proper sitemap contains way more URLs than the Ning sitemap.

 

First, let's examine the purpose of a sitemap.

A sitemap should list every page on your site that you want crawled (excluding the ones you have blocked in robots.txt).

A sitemap can optionally include a priority and change frequency for each URL, to help guide the bots to crawl high-priority pages, and pages that change more frequently, more often.
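
To make the idea concrete, here is a minimal sketch of a sitemap with priority and change frequency, built with Python's standard library. The example.com URLs and the `build_sitemap` helper are made up for illustration; only the element names (`urlset`, `url`, `loc`, `changefreq`, `priority`) come from the sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Return sitemap XML (a str) for an iterable of (url, priority, changefreq)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority, changefreq in entries:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        ET.SubElement(node, "changefreq").text = changefreq
        ET.SubElement(node, "priority").text = f"{priority:.1f}"
    header = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return header + ET.tostring(urlset, encoding="unicode")

# Hypothetical pages, for illustration only.
example_entries = [
    ("http://www.example.com/forum/topics/some-topic", 1.0, "always"),
    ("http://www.example.com/profiles/blogs/old-post", 0.3, "yearly"),
]
sitemap_xml = build_sitemap(example_entries)
```

A real crawler would fill `example_entries` from the pages it finds; the point here is only what one entry carries.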

 

 

What Ning's sitemap does

It lists only the features, typically 10 out of the 900,000 URLs it should list.

Crawlers can get lost in 10-year-old unimportant content, visiting useless "share" pages etc., and only find the important content by pure luck.

Some pages may never get found.

 

 

Practical example

OliverJonCross recently added a proper sitemap and in just a couple of days noticed a 20-25% search traffic increase (I expect that in 2 weeks it will rise much, much higher, especially if the sitemap is updated often).

Every time you update a sitemap and ping the search engines to announce that the new sitemaps are ready, they will download the new sitemaps and add the URLs to their lists to be crawled.
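
A ping is just an HTTP GET to the search engine with the sitemap URL as a query parameter. The sketch below only builds that URL. The `http://www.google.com/ping` endpoint is the form Google documented around the time of this post, and is an assumption to check against each search engine's current documentation before relying on it; `build_ping_url` and the example.com address are hypothetical.

```python
from urllib.parse import urlencode

def build_ping_url(ping_endpoint, sitemap_url):
    """Return the URL to request in order to announce an updated sitemap."""
    return ping_endpoint + "?" + urlencode({"sitemap": sitemap_url})

# Endpoint is an assumption (see the note above); the sitemap URL is made up.
ping_url = build_ping_url(
    "http://www.google.com/ping",
    "http://www.example.com/sitemap.xml",
)
```

Sending the ping would then be a plain `urllib.request.urlopen(ping_url)` after each upload.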

 

How to do a proper sitemap until Ning offers a way to do one

You need a hosting account. It can be any one you already have that allows FTP or any other way to upload files (Ning's file manager will not work).

A hosting account here will allow multiple sites for under $3 a month, so you can also use it for adding on to your site with subdomain extensions (I made other posts about this elsewhere). Or you can use any existing hosting account, or if you have a PC with a static IP and IIS or a Linux server, you can even host it yourself.

 

A few tips

To host your sitemap on any other hosting account, you must add a line to your robots.txt

like this:

User-agent: * 
sitemap: http://www.dreadlockssite.com/sitemap.gz
Disallow: /xn/ningbar.php/
Disallow: /xn/atom/
Disallow: /xn/rest/
Disallow: /xn/css/
Disallow: /xn/loader/
Disallow: /main/search/search/
Disallow: /main/authorization/
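
If you want to double-check that a robots.txt like the one above really advertises the sitemap and blocks what you expect, Python's standard `urllib.robotparser` can read it (the `site_maps()` call needs Python 3.8 or newer). This is only a sketch; the two test URLs are made up.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
sitemap: http://www.dreadlockssite.com/sitemap.gz
Disallow: /xn/ningbar.php/
Disallow: /xn/atom/
Disallow: /xn/rest/
Disallow: /xn/css/
Disallow: /xn/loader/
Disallow: /main/search/search/
Disallow: /main/authorization/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Sitemap URLs declared in the file.
sitemaps = parser.site_maps()

# Hypothetical URLs: one under a Disallow rule, one ordinary forum page.
blocked = not parser.can_fetch("*", "http://www.dreadlockssite.com/xn/atom/feed")
allowed = parser.can_fetch("*", "http://www.dreadlockssite.com/forum/topics/some-topic")
```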


Now, in this example I do host my sitemap on my Ning site, in the root, because I still have WebDAV access, which Ning discontinued.
But to host it on any other site or server, simply change the URL to point to your sitemap.

Sitemaps can grow very large. A single sitemap may hold at most 50,000 URLs; beyond that, Google will reject it for too many URLs.
However, Google will usually reject it sooner for size limits, so after 40,000 URLs you want to create a sitemap index.

A sitemap index consists of multiple sitemaps and looks like this (example for a 120k-page site):
sitemap.xml (or .gz): this is your index, 3 URLs (the 3 URLs point to the 3 sitemaps, not to pages)
sitemap1.xml (or .gz): 40,000 URLs
sitemap2.xml (or .gz): 40,000 URLs
sitemap3.xml (or .gz): 40,000 URLs
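
The split above can be sketched in a few lines of Python. The 40,000 figure is the chunk size suggested in this post, the `sitemapindex`/`sitemap`/`loc` elements are from the sitemap protocol, and the helper names and example.com URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 40000  # chunk size suggested in this post

def split_urls(urls, chunk_size=MAX_URLS_PER_FILE):
    """Split a list of page URLs into chunks, one chunk per sitemap file."""
    return [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]

def build_index(base_url, chunk_count):
    """Return index XML pointing at sitemap1.xml .. sitemapN.xml under base_url."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(1, chunk_count + 1):
        node = ET.SubElement(index, "sitemap")
        ET.SubElement(node, "loc").text = f"{base_url}/sitemap{n}.xml"
    return ET.tostring(index, encoding="unicode")

# A made-up 120k-page site.
urls = [f"http://www.example.com/page/{i}" for i in range(120000)]
chunks = split_urls(urls)
index_xml = build_index("http://www.example.com", len(chunks))
```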

So now you may have noticed 2 file extensions, .xml or .gz. XML sitemaps average 6000 KB; gz compression makes them about 600 KB.
gz reduces server load, gives you much faster uploads, lets the search engines download them faster, and saves space if that's a concern.
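
Compressing a sitemap is a one-liner with Python's `gzip` module. A rough sketch with made-up sample data; the exact ratio you get will depend on your URLs.

```python
import gzip

def compress_sitemap(xml_text):
    """Return the gzip-compressed bytes of a sitemap given as text."""
    return gzip.compress(xml_text.encode("utf-8"))

# Made-up sample: 1000 very similar URLs, which is why it compresses well.
sample_xml = (
    "<urlset>"
    + "".join(
        f"<url><loc>http://www.example.com/page/{i}</loc></url>"
        for i in range(1000)
    )
    + "</urlset>"
)
compressed = compress_sitemap(sample_xml)
```

Writing `compressed` to a file opened in binary mode gives you the file to upload.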

Creating sitemaps
I hope someday soon Ning provides a PHP-based server-side option.
Ideally it would have set-it-and-forget-it controls like this:
exclusions (with an "import robots.txt" option and then customized filters from there)
priority and change frequency filters
ideally directory-based filters, as in
/forums/ priority 1, change freq always, so every page in forums or deeper (/forums/topics/any-page-here/) will have that priority and change freq
with the option to micromanage by adding priority to specific topics

autocrawl time: a time of day to autocrawl your site, update the sitemap and ping the servers, maybe with a "crawl at low traffic threshold" option
option to choose xml/gz

Optional
option to include a mobile sitemap (to crawl the mobile site you need a link to /m/main/ somewhere as a starting point)
image sitemaps: useless because of the api.ning image handling, so no need for it (although api.ning should have its own sitemap)
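
The directory-based filter idea from the wish list above could work roughly like this: the longest matching path prefix wins, so a rule for one specific topic overrides the rule for /forums/ as a whole. All rule values, the `classify` helper and the example.com URLs are hypothetical, not anything Ning or a particular crawler provides.

```python
from urllib.parse import urlparse

# (path prefix, priority, change frequency); made-up values for illustration.
RULES = [
    ("/forums/", 1.0, "always"),
    ("/forums/topics/old-thread/", 0.2, "yearly"),  # micromanaged single topic
]
DEFAULT = (0.3, "monthly")

def classify(url, rules=RULES, default=DEFAULT):
    """Return (priority, changefreq) from the longest prefix matching the URL path."""
    path = urlparse(url).path
    best = None
    for prefix, priority, changefreq in rules:
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, priority, changefreq)
    return best[1:] if best else default

forum_page = classify("http://www.example.com/forums/topics/any-page-here/")
old_thread = classify("http://www.example.com/forums/topics/old-thread/")
other_page = classify("http://www.example.com/about")
```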


Until Ning offers a server-side sitemap crawler

you will need a client-side crawler. I tried several... dozens. Inspyder's the one I use, because it works the best and has a feature to autorun in the background as a Windows task, so once you set it up and run it once, you schedule it and forget it (as long as you have FTP; I can't do that with WebDAV, but all my other sites autoupdate fine).
Install the crawler, set your options, and do a few crawls to "fine tune" it, setting priorities, crawl speed etc. When you have a fast crawl speed with no timeouts, schedule the crawls with plenty of time to finish between crawls.
This option is slow and uses network resources and server bandwidth.
A crawl can take 24-48 hours for large networks. A server-side option would crawl the server's filesystem, using no bandwidth, in seconds to minutes,
so an hourly update would certainly be possible but might cause too much server disk activity.
But a daily update at off-peak load times is a very good option.


If you try a proper sitemap and see a search engine traffic increase, please post your results here and track the increase over a 30-day period. I will be interested to see just how much more traffic you get by the end of the month.

 

edit to add:

-------------------------------------------

 

After playing with Inspyder and doing a sitemap on a subdomain, I discovered that when you have over 40,000 URLs and need a sitemap index, you have to manually edit the index every time to point to the proper place to find the sitemap files.

Here's what I mean.

Typically a sitemap index and sitemaps are at the www.yoursite.com/ root:

mysite.com/sitemap.xml (index file, contains the URLs of the other sitemaps)

mysite.com/sitemap1.xml (40,000 URLs)

mysite.com/sitemap2.xml (40,000 URLs)

Now, if you store your sitemaps on sitemaps.mysite.com,

when you create the sitemaps the index will contain www.mysite.com/sitemap1.xml www.mysite.com/sitemap2.xml

So what you need to do is open the index in Dreamweaver or Notepad,

do a find and replace,

replace www.mysite.com with sitemaps.mysite.com, save and upload.
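
The same find and replace can be scripted so it doesn't have to be done by hand after every crawl. A minimal sketch, assuming the index is read into a string first; `repoint_index` and the sample index are made up for illustration.

```python
def repoint_index(index_xml, old_host, new_host):
    """Replace the host in every sitemap URL inside a sitemap index."""
    return index_xml.replace(f"//{old_host}/", f"//{new_host}/")

# Made-up index as the crawler would generate it.
sample_index = (
    "<sitemapindex>"
    "<sitemap><loc>http://www.mysite.com/sitemap1.xml</loc></sitemap>"
    "<sitemap><loc>http://www.mysite.com/sitemap2.xml</loc></sitemap>"
    "</sitemapindex>"
)
fixed_index = repoint_index(sample_index, "www.mysite.com", "sitemaps.mysite.com")
```

Only the index needs this; the sitemap files themselves should keep listing the www.mysite.com page URLs.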

 


Replies to This Discussion

Good morning Soaringeagle,

Well, I am happy to report it's working! At least I think... I added my new subdomain site at Amazon to Google Webmaster Tools, then verified it by uploading an HTML file to the site, THEN I was able to go to that site in Google Webmaster Tools and upload my sitemap. Yeah!

Anyways, here is a screenshot of the new subdomain site in Google Webmaster Tools an hour after submitting the sitemaps.

Hey guys,


Is the proper sitemap a sure solution to solve the problem of the UGLY URLs?
http://creators.ning.com/forum/topics/google-indexes-mainly-ugly-ur... 

You will still get this. It's not a solution, but it will ensure that both versions of the URL are crawled and found; then Google chooses which URL version to index. It might index the ugly version right away if it finds that first, but the clean version in most cases has more staying power. That's the activity feed version, though; some pages may get ugly URLs if no good URL is provided, or it seems something else might interfere too.

It affects SEO, yeah, but not so drastically that you won't get a reasonable listing.

Proper sitemaps do make it so Google's always aware of every page. Using the RSS, you're only submitting 20 URLs at a time at a set update interval. A proper sitemap submits the entire site every time it's updated, so the Google database has every page, with deleted pages removed. It's always aware of what was changed, how frequently and how long ago, so it crawls the fresher stuff far more often than pages that haven't changed in a year.

But then if a page does change that hasn't changed in years, Google knows it's time to recrawl it.

The sitemap catches both versions, the activity feed and the static URL, and determines that one or the other is a duplicate. The status feed can be called a temporary URL, because it won't be crawled on the next crawl; only the static page URL is, so it sticks more in Google.

And finally, you can set custom parameter removal or exclusions to exclude many activity feed URLs from the sitemap. Crawlers will still find and crawl them, but there's a good chance they will follow what's in the sitemap more.
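
For anyone wondering what "parameter removal" amounts to, here is a rough Python sketch that strips chosen query parameters from a URL before it goes into the sitemap. The parameter name `xg_source` and the example.com URLs are assumptions for illustration only; check which parameters your own ugly URLs actually carry.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_params(url, unwanted):
    """Return url with every query parameter named in `unwanted` removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in unwanted
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# "xg_source" is a hypothetical parameter name used only for this example.
clean_url = strip_params(
    "http://www.example.com/forum/topics/some-topic?xg_source=activity&page=2",
    {"xg_source"},
)
bare_url = strip_params(
    "http://www.example.com/forum/topics/some-topic?xg_source=activity",
    {"xg_source"},
)
```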

very creative solution!

i hadnt even thought of that..good job

massive improvement

update your sitemap often and watch it double that!

Personally, after one update finishes I start it again, so it's running constantly, uploading the update as soon as it's ready and starting the next run.

Humm... something is going bad in my case and I don't know what :S

Some weeks ago, I generated the sitemap with Inspyders (contains more than 24 000 URLs) (+ I took out of the "sitemaps" in Google Webmaster Tools the "last activity feed" and I excluded "xn" in the URL parameters in GWT to solve the problem I got http://creators.ning.com/forum/topics/google-indexes-mainly-ugly-ur...).

The number of my indexed pages drastically dropped down :'( (from 25000 indexed pages to 5000)

(according to GWT, I have 105 submitted URL and 71 URLs are in the web index)

I included the sitemap in the robots.txt file as explained, but I didn't host the sitemap on a subdomain of my domain. I'm hosting the sitemap on another domain... (I hosted it on "example.com.pt" while my site URL is like "example.com"): does it make a big difference?

I still have to set automatically the update of the sitemap via FTP with Inspyders + follow the method of Espen Steenberg... but I'm now wondering if I did something wrong until now :S 

PS: the sitemap hosted on "example.com.pt" for my site "example.com" is not appearing on the "sitemaps" section of GWT: is it normal? is it a problem?

Try removing the xn exclusion. The "ugly URLs" are not really anything to be concerned about... and depending on how you excluded them, whether you used parameter removal or excluded them, you may have prevented them from being listed at all.

I would just accept the URLs as they are and just try to get as many URLs listed as you can.

I made the jump to try this and have uploaded my sitemap as many have done here. The one thing I do notice and I would like someone to verify is this.

When I use the sitemap provided by Ning, whenever I post something it is immediately seen by Google. When I type in my website address and check for the last hour after posting, my link shows up pretty quickly. When I use this method, none of my activity is showing up quickly on Google. It sometimes shows up a day later, and even then I notice that some links do not show up.

Is there a way to incorporate the ning default sitemap while still pointing to my sitemap posted outside of ning? I would assume that this would give me the best of both worlds but whenever I try and mix the two I get errors. Mind you I have no CSS or HTML background.

Maybe someone can test to confirm my findings and maybe offer a middle ground on how to achieve this.

Could I just keep the Ning sitemap and just point the robots.txt to the sitemap on my exterior site with my Inspyder sitemap.xml?


© 2013   Created by Ning.
