a common problem with large websites driven by content management systems (CMS) is over-indexation.
what the heck is that?
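over-indexation is when the search engines index several URLs for what is really one page. a website can end up hurting itself this way purely because of how its URL strings are generated. i’ll explain…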
let’s say you visit the following site:
then you click on a category page called Category 1 and the resulting URL is:
now let’s say you visit the home page, go to a different category page (say, Category 2), and from that page click through to the first category page (Category 1). the exact same category page loads, but this time the URL string reads:
so because of the different URL string, you really have two versions of the exact same category page.
this is common not only with CMS-driven websites but also with various website user tracking solutions (methods of tracking a user’s path through a site).
this problem also exists with websites that use session IDs to track a user through a site by appending a unique session ID to each URL string. every new visit by a user, or by a search engine spider, generates a brand new iteration of every page they follow, because the URL string for the same static page looks different every time. such as:
both of these URLs point to the exact same page, but because of the session ID appended to the end, the engines see them as multiple pages, not one.
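to make the idea concrete, here is a minimal sketch of collapsing session-stamped URLs back to one address. the parameter names in TRACKING_PARAMS and the example.com URLs are assumptions for illustration only; check what your own CMS or tracking tool actually appends.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# hypothetical parameter names; replace with whatever your site really appends
TRACKING_PARAMS = {"sessionid", "sid", "phpsessid"}

def canonical_url(url):
    """return the URL with session/tracking parameters (and any fragment) removed."""
    parts = urlsplit(url)
    # keep every query parameter except the ones used only for tracking
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# two visits to the same page, each stamped with a different session ID
first = canonical_url("http://www.example.com/category1/?sessionid=12345")
second = canonical_url("http://www.example.com/category1/?sessionid=67890")
print(first == second)  # True
```

a function like this can help when auditing a crawl or when deciding which URL to redirect to. it does not by itself stop the site from generating the extra URLs.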
so what’s the problem?
the problem is that search engines see multiple versions of a single page and:
1. may think you are trying to submit more than one copy of a page in order to spam the index (not nice)
2. must now try to determine which version of the page is the most important or most relevant (diluting the effectiveness of the page as a whole)
you should never leave it to the engines to decipher on their own which version of a page is the right one.
each page of content on your site should have a single URL assigned to it, and that URL should not be shared with any other page on your site.
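one rough way to check whether a site meets that rule is to group crawled URLs by the content they return. this is only a sketch: it assumes you already have a dict of URL to page body from a crawl, the function name and the example.com URLs are made up, and matching on an exact hash will miss pages that differ by even one character (a timestamp, a session ID in a link).

```python
import hashlib
from collections import defaultdict

def find_duplicate_urls(pages):
    """pages maps URL -> page body; return groups of URLs serving identical content."""
    by_hash = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        by_hash[digest].append(url)
    # only groups with more than one URL are a problem
    return [sorted(urls) for urls in by_hash.values() if len(urls) > 1]

# hypothetical crawl results
crawl = {
    "http://www.example.com/category1/": "<h1>Category 1</h1>",
    "http://www.example.com/category2/category1/": "<h1>Category 1</h1>",
    "http://www.example.com/category2/": "<h1>Category 2</h1>",
}
duplicates = find_duplicate_urls(crawl)
```

each list in duplicates is a set of URLs that should collapse to one.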
once you have fixed how your URLs are generated, don’t forget to 301 redirect the legacy iterations of each page to the proper version.
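how you set up a 301 depends on your server and CMS, so treat the following as a sketch only. it is a tiny WSGI app, and the LEGACY_TO_CANONICAL mapping and its paths are hypothetical; on a real site this usually lives in the web server or CMS configuration instead.

```python
# hypothetical mapping of legacy paths to the one proper path for each page
LEGACY_TO_CANONICAL = {
    "/category2/category1/": "/category1/",
}

def app(environ, start_response):
    """answer legacy paths with a permanent redirect, everything else normally."""
    path = environ.get("PATH_INFO", "/")
    target = LEGACY_TO_CANONICAL.get(path)
    if target is not None:
        # 301 tells the engines the move is permanent
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"page content"]
```

to try it locally you could serve it with wsgiref.simple_server.make_server("", 8000, app) from the standard library and request one of the legacy paths.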