MediaWiki: Why are my image description pages not indexed by Google?
I host and operate several wikis based on the MediaWiki software. Using the Google Webmaster Tools (WMT), I recently saw that none of the images hosted in those wikis are indexed by Google. To be more precise: the description pages of the images are not indexed. The whole configuration seemed to be Google-friendly: a robots.txt existed and was configured correctly, an XML sitemap (including the image description pages) had been submitted to Google's WMT, there was no "noindex" or "nofollow" code in the pages, and so on.
So, why is Google not indexing my images?
While researching this, I found several other people describing similar problems in blog posts and forum threads. It still took me quite a while to get a first real clue. Note that the URLs of the description pages always follow the same structure: www.domain.com/wiki/Name.jpg. OK, that's only true if you use URL rewriting to get such "nice" URLs; if you use the basic URLs with index.php and a bunch of parameters, you might not have the problem in the first place. The important thing to note about the URL: even though this is supposed to be an HTML page (describing the image), it ends in ".jpg", as if it were actually an image file. Apparently, Google thinks that it is an image file and does not load and parse (and then index) it.
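To get a feeling for which of your URLs might be affected, here is a small Python sketch that flags URLs whose path ends in a media file extension. The function name and the extension list are my own assumptions, and it only mirrors my guess about what Google does; it is not a description of Google's actual crawler logic.

```python
from posixpath import splitext
from urllib.parse import urlsplit

# Assumed list of extensions; extend it to match the file types your wiki allows.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".svg", ".pdf"}


def looks_like_media_file(url: str) -> bool:
    """Return True if the URL path ends in a typical media file extension."""
    path = urlsplit(url).path
    # splitext only looks at the last path segment, so a trailing slash
    # leaves the extension empty.
    _, extension = splitext(path)
    return extension.lower() in MEDIA_EXTENSIONS
```

With this check, http://www.domain.com/wiki/Name.jpg is flagged, while the index.php form with the title passed as a parameter is not, which fits the observation that the plain URLs might not have the problem.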
At first, I didn't find any simple solution for this that didn't involve hacking the MediaWiki source code.
There is one quite simple solution, though. You can adapt the URLs by simply appending a slash character ("/") to ALL URLs of your wiki. Then www.domain.com/wiki/Main_Page becomes www.domain.com/wiki/Main_Page/ and media file description pages end up like www.domain.com/wiki/Name.jpg/, which is a nice thing, as Google apparently takes those URLs to be directories and therefore loads and indexes them. After I changed my wiki's URLs that way, the description pages were indexed within a few days.
Configuring your wiki that way is quite simple. I describe it here for the standard way of rewriting MediaWiki URLs with mod_rewrite; if you use another technique, you will have to adapt it to that tool.
In the file "LocalSettings.php" in your wiki directory there is a variable $wgArticlePath that you have to adapt. I added the slash there, so set it to:
$wgArticlePath = "/wiki/$1/";
That makes MediaWiki generate all URLs with the additional slash. Obviously, you also have to modify the mod_rewrite code that resides in the .htaccess file:
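# Enable the rewrite engine
RewriteEngine On
# Skip existing files and directories (conditions apply only to the next rule)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Map /wiki/Title/ to the real MediaWiki entry point
RewriteRule ^wiki/(.*)/ w/index.php?title=$1 [PT,L,QSA]
# Send /wiki/ and the bare site root to the main page
RewriteRule ^wiki/*$ wiki/Main_Page [L,QSA]
RewriteRule ^/*$ wiki/Main_Page [L,QSA]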
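To see what the article rule does with a given request, here is a rough model in Python. It is only a sketch: the function name is made up, Python's re module is close to but not identical to the regex engine Apache uses, and the RewriteCond checks and the flags are not modeled at all.

```python
import re
from typing import Optional

# Pattern copied from the article RewriteRule. In .htaccess context the
# leading slash of the request path is stripped before matching.
ARTICLE_RULE = re.compile(r"^wiki/(.*)/")


def internal_target(request_path: str) -> Optional[str]:
    """Return the rewritten target for a request path, or None if the rule does not match."""
    match = ARTICLE_RULE.search(request_path.lstrip("/"))
    if match is None:
        return None
    return "w/index.php?title=" + match.group(1)
```

In this model /wiki/Name.jpg/ maps to w/index.php?title=Name.jpg, and /wiki/Main_Page (without the slash) is not matched at all. Because the pattern is not anchored at the end, a slash-less path such as /wiki/Foo/Bar is matched with the title Foo, so titles that contain a slash deserve a manual test.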
Additionally, I added a few lines that rewrite the old URLs (without the slash), so that older links to my site don't lead nowhere but are forwarded to the new location. Actually, I use an HTTP 301 redirect to the new URL, so I don't lose any link power. The code is not perfect, but it works in most cases (it does not work when the article name itself contains a slash):
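# Only redirect requests that are not existing files or directories
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Anything under /wiki/ not handled by the rules above gets a permanent redirect to the slash form
RewriteRule ^wiki/(.*)$ http://www.domain.com/wiki/$1/ [R=301,L]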
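If you want to verify the redirect after the change, a small Python script along these lines could help. It is a sketch under assumptions: both function names are mine, it only handles plain http URLs without query strings, and it expects the Location header to be exactly the old URL plus a slash, which may not hold for titles with special characters.

```python
import http.client
from urllib.parse import urlsplit


def expected_location(old_url: str) -> str:
    """Return the slash-terminated URL the 301 redirect should point to."""
    return old_url if old_url.endswith("/") else old_url + "/"


def check_redirect(old_url: str) -> bool:
    """Send a HEAD request and report whether the server answers with the expected 301."""
    parts = urlsplit(old_url)
    # http.client does not follow redirects, so the 301 itself is visible.
    connection = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        connection.request("HEAD", parts.path)
        response = connection.getresponse()
        return (response.status == 301
                and response.getheader("Location") == expected_location(old_url))
    finally:
        connection.close()
```

Calling check_redirect("http://www.domain.com/wiki/Main_Page") against your own host name should return True once the rules are active.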
With these changes in place, your image description pages should be indexed by Google as well. Have fun!