So there have been a few discussions about sitemaps, and a proper sitemap contains way more URLs than the Ning sitemap.
First, let's examine the purpose of a sitemap.
A sitemap should list every page on your site that you want crawled (excluding the ones you have blocked in robots.txt).
A sitemap can optionally include priority and change frequency, to help guide the bots to crawl high-priority pages, and pages that change more frequently, more often.
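For reference, a single entry in a standard sitemap file looks roughly like this (the URL, date, and values here are just placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.mysite.com/forum/topics/some-topic</loc>
    <lastmod>2013-01-15</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>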
What Ning's sitemap does:
It lists the features only, typically about 10 URLs out of the 900,000 it should list.
Crawlers can get lost in 10-year-old unimportant content, visiting useless "share" pages and so on, and only find the important content by pure luck.
Some pages may never get found at all.
OliverJonCross recently added a proper sitemap and in just a couple of days noticed a 20-25% increase in search traffic (I expect that in 2 weeks it will rise much higher, especially if the sitemap is updated often).
Every time you update a sitemap and ping the server to announce that the new sitemaps are ready, the search engines will download the new sitemaps and add the URLs to their lists to be crawled.
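The ping itself is just a URL you request, with your sitemap address on the end; for Google and Bing it looks something like this (replace the sitemap URL with your own):

http://www.google.com/ping?sitemap=http://www.mysite.com/sitemap.xml
http://www.bing.com/ping?sitemap=http://www.mysite.com/sitemap.xml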
How to do a proper sitemap until Ning offers a way to do one
You need a hosting account. It can be any account you already have that allows FTP or any other way to upload files (Ning's file manager will not work).
A hosting account here will allow multiple sites for under $3 a month, so you can use it for adding on to your site with subdomain extensions (I made other posts about this elsewhere). You can also use any existing hosting account, or, if you have a PC with a static IP and IIS or a Linux server, you can even host it yourself.
To host your sitemap on any other hosting account, you must add a line to your robots.txt.
In this example I host my sitemap on my Ning site in the root, because I still have WebDAV access (which Ning has discontinued).
To host it on any other site or server, simply change the URL to point to your sitemap.
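The line is just a Sitemap: directive pointing at wherever the file actually lives; for example, if you host it on a separate subdomain (hostnames here are placeholders):

Sitemap: http://sitemaps.mysite.com/sitemap.xml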
Sitemaps can grow very large; over 50,000 URLs and Google will reject the file for having too many URLs.
However, Google will usually reject it sooner for the file size limit, so after about 40,000 URLs you want to create a sitemap index.
Sitemap indexes consist of multiple sitemaps and look like this:
sitemap.xml (or .gz): this is your index, 3 URLs (for this 120k-page site example; the 3 URLs point to the 3 sitemaps, not to pages)
sitemap1.xml (or .gz): 40,000 URLs
sitemap2.xml (or .gz): 40,000 URLs
sitemap3.xml (or .gz): 40,000 URLs
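In the index file itself, each of those sitemaps is listed as a <sitemap> entry, roughly like this (the hostname is just a placeholder):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.mysite.com/sitemap1.xml.gz</loc></sitemap>
  <sitemap><loc>http://www.mysite.com/sitemap2.xml.gz</loc></sitemap>
  <sitemap><loc>http://www.mysite.com/sitemap3.xml.gz</loc></sitemap>
</sitemapindex>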
Now, you may have noticed two file extensions, .xml and .gz. XML sitemaps average around 6,000 KB; gz compression makes them about 600 KB.
gz reduces server load, gives you much faster uploads, and the search engines download them faster, as well as saving space if that's a concern.
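If your crawler doesn't compress the files for you, here's a minimal sketch of doing it yourself in Python (the file names are just examples):

import gzip
import shutil

# compress each sitemap file into a .gz copy before uploading
for name in ["sitemap1.xml", "sitemap2.xml", "sitemap3.xml"]:
    with open(name, "rb") as src, gzip.open(name + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)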
I hope someday soon Ning provides a PHP-based server-side option.
Ideally it would have set-it-and-forget-it controls like these:
exclusions (with an import-robots.txt option and then customized filters from there)
priority and change frequency filters
ideally directory-based filters, as in:
/forums/ priority 1, change freq always, so every page in forums or deeper (/forums/topics/any-page-here/) will have that priority and change freq
with the option to micromanage, adding priority to specific topics
autocrawl time: a time of day to automatically crawl your site, update the sitemap, and ping the servers, maybe with a "crawl at low traffic threshold" option
an option to choose .xml or .gz
an option to include a mobile sitemap (to crawl the mobile site you need a link to /m/main/ somewhere as a starting point)
image sitemaps are useless because of the api.ning image handling, so no need for that (although api.ning should have its own sitemap)
Until Ning offers a server-side sitemap crawler,
you will need a client-side crawler. I tried several, dozens actually; Inspyder is the one I use because it works the best and can autorun in the background as a Windows task, so once you set it up and run it once, you schedule it and forget it (as long as you have FTP; I can't do that with WebDAV, but all my other sites auto-update fine).
Install the crawler, set your options, and do a few crawls to "fine tune" it, setting priorities, crawl speed, etc. When you have a fast crawl speed with no timeouts, schedule the crawls with plenty of time to finish between them.
This option is slow and uses network resources and server bandwidth.
A crawl can take 24-48 hours for large networks; a server-side option would crawl the server's filesystem, using no bandwidth, in seconds to minutes.
So an hourly update would certainly be possible, but it might cause too much server disk activity.
A daily update at off-peak load times, though, is a very good option.
If you try a proper sitemap and see a search engine traffic increase, please post your results here and track the increase over a 30-day period. I will be interested to see just how much more traffic you get by the end of the month.
Edit to add:
After playing with Inspyder and doing a sitemap on a subdomain, I discovered that when you have over 40,000 URLs and need a sitemap index, you have to manually edit the index every time to point to the proper place to find the sitemap files.
Here's what I mean.
Typically a sitemap index and sitemaps are at the www.yoursite.com/ root:
mysite.com/sitemap.xml (the index file; contains the URLs of the other sitemaps)
mysite.com/sitemap1.xml (40,000 URLs)
mysite.com/sitemap2.xml (40,000 URLs)
Now, if you store your sitemaps on sitemaps.mysite.com instead, the index still points to the old locations.
So what you need to do is open the index in Dreamweaver or Notepad,
do a find and replace,
replace www.mysite.com with sitemaps.mysite.com, then save and upload.
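If you'd rather not open an editor every time, that same find-and-replace can be scripted; a minimal sketch in Python, assuming the index file is named sitemap.xml and these hostnames are just placeholders:

# rewrite the sitemap locations in the index to point at the host that actually serves them
with open("sitemap.xml", "r", encoding="utf-8") as f:
    index = f.read()

index = index.replace("www.mysite.com", "sitemaps.mysite.com")

with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write(index)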
Good morning Soaringeagle,
Well, I am happy to report it's working! At least I think so... I added my new subdomain site at Amazon to Google Webmaster Tools, then verified it by uploading an HTML file to the site, and THEN I was able to go to that site in Google Webmaster Tools and upload my sitemap. Yeah!
Anyway, here is a screenshot of the new subdomain site in Google Webmaster Tools an hour after submitting the sitemaps.
Is a proper sitemap a sure solution to the problem of the UGLY URLs?
You will still get them; it's not a solution, but it will ensure that both versions of the URL are crawled and found, and then Google chooses which URL version to index. It might index the ugly version right away if it finds that one first, but the clean version in most cases has more staying power. The ugly one is the activity feed version, though some pages may get ugly URLs if no good URL is provided, and it seems something else might interfere too.
It affects SEO, yes, but not so drastically that you won't get a reasonable listing.
Proper sitemaps do make it so Google is always aware of every page. Using the RSS you're only submitting 20 URLs at a time at a set update interval; a proper sitemap submits the entire site every time it's updated, so the Google database has every page, with deleted pages removed. It's always aware of what was changed, how frequently, and how long ago, so it crawls the fresher stuff far more often than pages that haven't changed in a year.
But then if one page that hasn't changed in years does change, Google knows it's time to recrawl it.
The sitemap catches both versions, the activity feed URL and the static URL, and determines that one or the other is a duplicate. The activity feed one can be called a temporary URL, because it won't be crawled on the next crawl; only the static page URL will be, so it sticks more in Google.
And finally, you can set custom parameter removal or exclusions to exclude many activity feed URLs from the sitemap. Crawlers will still find and crawl them, but there's a good chance they will follow what's in the sitemap more.
Very creative solution!
I hadn't even thought of that... good job.
Update your sitemap often and watch it double that!
Personally, after one update finishes I start it again, so it's running constantly, uploading each update as soon as it's ready and starting the next run.
Hmm... something is going wrong in my case and I don't know what :S
Some weeks ago, I generated the sitemap with Inspyder (it contains more than 24,000 URLs). I also removed the "last activity feed" from the "Sitemaps" section in Google Webmaster Tools, and I excluded "xn" in the URL parameters in GWT, to solve the problem I described here: http://creators.ning.com/forum/topics/google-indexes-mainly-ugly-ur...
The number of my indexed pages dropped drastically :'( (from 25,000 indexed pages to 5,000).
(According to GWT, I have 105 submitted URLs and 71 URLs in the web index.)
I included the sitemap in the robots.txt file as explained, but I didn't host the sitemap on a subdomain of my domain. I'm hosting the sitemap on another domain... (I hosted it on "example.com.pt" while my site URL is like "example.com"): does that make a big difference?
I still have to set up the automatic update of the sitemap via FTP with Inspyder, and follow Espen Steenberg's method... but I'm now wondering if I did something wrong up until now :S
PS: the sitemap hosted on "example.com.pt" for my site "example.com" is not appearing in the "Sitemaps" section of GWT: is that normal? Is it a problem?
Try removing the xn exclusion; the "ugly URLs" are not really anything to be concerned about... and depending on how you excluded them, whether you used parameter removal or excluded them outright, you may have prevented them from being listed at all.
I would just accept the URLs as they are and try to get as many URLs listed as you can.
I made the jump to try this and have uploaded my sitemap as many have done here. There is one thing I do notice, and I would like someone to verify it.
When I use the sitemap provided by Ning, whenever I post something it is immediately seen by Google; when I type in my website address and check the last hour after posting, my link shows up pretty quickly. When I use this method, none of my activity shows up quickly on Google. It sometimes shows up a day later, and even then I notice that some links do not show up.
Is there a way to incorporate the Ning default sitemap while still pointing to my sitemap hosted outside of Ning? I would assume that this would give me the best of both worlds, but whenever I try to mix the two I get errors. Mind you, I have no CSS or HTML background.
Maybe someone can test to confirm my findings and maybe offer a middle ground on how to achieve this.
Could I just keep the Ning sitemap and just point the robots.txt to the sitemap on my exterior site, with my Inspyder sitemap.xml?