Avoiding Duplicate Content on Inner Pages
We are NOT going to talk about how to avoid duplicate content on already existing pages. Instead, we will look at how to avoid it when, for example, we have bought an old (expired) domain with many indexed pages that would now return a 404 error on our site.
Some webmasters place the following code in their .htaccess file:
ErrorDocument 403 /index.php
ErrorDocument 404 /index.php
This creates a duplicate content problem, because these directives internally redirect every visitor who requests a forbidden or missing page to the home page while keeping the requested URL unchanged. Another common method is to send users to specially designed 403 and 404 pages with this code:
ErrorDocument 403 /error403.html
ErrorDocument 404 /error404.html
This is not an elegant solution either: instead of the home page, users are internally redirected to another page, and once again the URL stays unchanged.
If you prefer this method, you had better change the redirect from internal to external, like this:
ErrorDocument 403 http://yourdomain.com/error403.html
ErrorDocument 404 http://yourdomain.com/error404.html
Hosting companies sometimes offer custom 403 and 404 pages. I use this tricky solution:
ErrorDocument 403 http://www.sajta-mi.com/
ErrorDocument 404 http://www.sajta-mi.com/
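With an external redirect we “tell” the search engines that, unlike in the first two examples, these pages no longer exist at their old URLs. Note, however, that Apache answers an ErrorDocument that points to a full URL with a 302 (temporary) redirect, not a 301: R=301 is a flag of mod_rewrite's RewriteRule and cannot be set on ErrorDocument.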
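If you want search engines to receive a genuine 301 (permanent) redirect for missing pages, mod_rewrite can issue one directly. The following is a minimal sketch, not a drop-in rule: www.yourhost.com is a placeholder for your own domain, it only covers requests for files and directories that do not exist (403 responses are still left to ErrorDocument), and it will also redirect the "virtual" URLs of any CMS or application that routes requests through a front controller, so test it on your own site first.

```apache
# Hypothetical sketch: www.yourhost.com is a placeholder domain.
RewriteEngine On
# Act only when the request does not map to an existing file...
RewriteCond %{REQUEST_FILENAME} !-f
# ...and does not map to an existing directory either
RewriteCond %{REQUEST_FILENAME} !-d
# Permanently (301) redirect such requests to the home page;
# the trailing ? drops any query string from the old URL
RewriteRule ^ http://www.yourhost.com/? [R=301,L]
```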
What should we do with URL parameters such as q=, page=, id= and so on, which are very persistent? Here is the solution.
First, copy the code from the previous article on avoiding duplicate content on the home/index page:
Options +FollowSymlinks -Indexes
RewriteEngine On
# Redirect yourhost.com to www.yourhost.com
RewriteCond %{HTTP_HOST} ^yourhost\.com$ [NC]
RewriteRule ^(.*)$ http://www.yourhost.com/$1 [R=301,L]
# Redirect explicit requests for /index.php to /
RewriteCond %{THE_REQUEST} /index\.php\ HTTP/
RewriteRule ^index\.php$ / [R=301,L]
Then continue directly below it with:
RewriteCond %{QUERY_STRING} ^page=.*$ [OR]
RewriteCond %{QUERY_STRING} ^q=.*$ [OR]
RewriteCond %{QUERY_STRING} ^id=.*$
RewriteRule .* %{REQUEST_URI}? [R=301,L]
and the parameter issue is solved. You can add or remove lines for other parameters; just make sure that every condition except the last one ends with the [OR] flag.
Keep in mind that you have to check the parameter names very carefully. If any application installed on your site uses one of these parameters, it won't work properly!
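The three conditions above only match when the parameter comes first in the query string, so a URL such as /?lang=en&page=2 is not redirected. A possible variant, assuming you want to catch these parameters anywhere in the query string, folds them into a single condition that could replace the four lines above; page, q and id are the same names used above, and the list in parentheses is where you would add or remove others.

```apache
# Variant sketch: match page=, q= or id= at the start of the query string
# or right after an "&", case-insensitively
RewriteCond %{QUERY_STRING} (^|&)(page|q|id)= [NC]
# Redirect to the same path; the trailing ? discards the query string
RewriteRule .* %{REQUEST_URI}? [R=301,L]
```

Because this variant matches more URLs than the original rules, the warning about installed applications applies even more strongly.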
2 Comments
Avoiding Duplicate Content on Index/Home Page
As you might or might not know, http://www.site.com, http://www.site.com/index.php, site.com and site.com/index.php are 4 different pages for search engines, although you see the same page.
To avoid the problem with duplicate content you have to have: Apache server r…
Trackback :: March 2, 2008 @ 110:01 am
[...] We apply the advice from Avoiding Duplicate Content on Inner Pages [...]
Pingback :: March 10, 2008 @ 22:26 am