Robots

This page is under construction. For PmWiki v2.1 (or later), it is intended to describe "robots" (web crawlers) in general, and the settings available to control them.

Nofollow links

The default PmWiki skin adds rel="nofollow" to all the available action links (the edit, diff, upload, and print links at the top and bottom of the page). This convention was introduced by Google; it tells the robot crawler not to give any weight to content reached through "nofollow" links. [1] Note that many robots, including Yahoo! Slurp and msnbot, completely ignore the nofollow attribute on links.

The anti-spam convention that first introduced "nofollow" doesn't say anything about robots not following the links, only that links with rel="nofollow" shouldn't be given any weight in search results.
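As a rough illustration of how a crawler that honours the convention might treat such links, here is a minimal Python sketch. The sample HTML, the class name LinkCollector, and the split into "weighted" and "unweighted" links are assumptions made for this example; they are not taken from PmWiki or from any particular search engine.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect <a href> targets, separating links marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.weighted = []    # links a crawler may count in its ranking
        self.unweighted = []  # links marked nofollow: given no weight

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow noopener"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.unweighted.append(href)
        else:
            self.weighted.append(href)


# Hypothetical page fragment resembling a skin's action links.
SAMPLE_HTML = """
<a href="/pmwiki.php/Main/HomePage">Home</a>
<a rel="nofollow" href="/pmwiki.php/Main/HomePage?action=edit">Edit</a>
<a rel="nofollow" href="/pmwiki.php/Main/HomePage?action=diff">History</a>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print("weighted:", collector.weighted)
print("unweighted:", collector.unweighted)
```

Note that the sketch still records the nofollow links; in line with the convention described above, it only declines to give them weight, and whether a real robot also skips fetching them is up to that robot.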

You can also add rel=nofollow with a wiki style, viz. %rel=nofollow%.

The file robots.txt

Here is one example of the contents of a robots.txt file, which primarily tells robots to ignore links containing 'action':

User-agent: *
Disallow: /pmwiki.php/Main/AllRecentChanges
Disallow: /pmwiki.php/
Disallow: */search
Disallow: SearchWiki
Disallow: *RecentChanges
Disallow: RecentChanges
Disallow: *action=
Disallow: action=

User-Agent: W3C-checklink
Disallow:
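
One way to check how a parser reads such rules is Python's standard urllib.robotparser module. The sketch below feeds it a subset of the rules above; the page path and the "ExampleBot" user-agent are made up for illustration. This parser matches rules by plain path prefix and does not implement "*" wildcards inside paths, so the wildcard lines are left out here.

```python
from urllib.robotparser import RobotFileParser

# Subset of the example rules: only the plain path-prefix lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pmwiki.php/Main/AllRecentChanges
Disallow: /pmwiki.php/

User-Agent: W3C-checklink
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical page path, used only to exercise the rules.
page = "/pmwiki.php/Main/HomePage"

# A generic crawler falls under the "*" group.
generic_allowed = parser.can_fetch("ExampleBot", page)
# W3C-checklink has its own group with an empty Disallow (nothing is blocked).
checklink_allowed = parser.can_fetch("W3C-checklink", page)

print("ExampleBot allowed:", generic_allowed)
print("W3C-checklink allowed:", checklink_allowed)
```

With these rules the generic crawler should be refused for every URL under /pmwiki.php/, while the link checker remains free to fetch them. Real crawlers differ in how they treat wildcard and non-rooted paths, so it is worth testing against the robots you actually care about.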

Robot variables

PmWiki provides the following variables to control robot (webcrawler) interactions with a site, in order to reduce server load and bandwidth.

$RobotPattern
Used to detect robots based on the user-agent string.
$RobotActions
Robots requesting any action not listed in this array receive a 403 Forbidden response.
$EnableRobotCloakActions
Setting this flag removes any forbidden ?action= values from page links returned to robots, which reduces bandwidth load from robots even further (PITS:00563).
$MetaRobots
See layout variables.
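
The interplay of the first three variables can be sketched in Python as follows. This is only an illustration of the behaviour described above, not PmWiki's implementation: the pattern, the set of allowed actions, and the function names is_robot, status_for, and cloak_link are all assumptions made for this example.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical stand-ins for $RobotPattern and $RobotActions;
# these are illustrative values, not PmWiki's defaults.
ROBOT_PATTERN = re.compile(r"\w+[-_ ]?(bot|spider|crawler)|Slurp", re.IGNORECASE)
ROBOT_ACTIONS = {"browse", "rss"}


def is_robot(user_agent):
    """Detect a robot from its user-agent string."""
    return bool(ROBOT_PATTERN.search(user_agent))


def status_for(user_agent, action="browse"):
    """Return 403 when a robot requests an action that is not listed."""
    if is_robot(user_agent) and action not in ROBOT_ACTIONS:
        return 403
    return 200


def cloak_link(url):
    """Drop a forbidden action= parameter from a link shown to a robot."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "action" or value in ROBOT_ACTIONS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


crawler = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser = "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0"

print(status_for(crawler, "edit"), status_for(browser, "edit"))
print(cloak_link("/pmwiki.php/Main/HomePage?action=edit"))
```

Cloaking the links means a robot never sees the forbidden action URLs in the first place, so it has no reason to request them and collect 403 responses.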

This page may have a more recent version on pmwiki.org: PmWiki:Robots, and a talk page: PmWiki:Robots-Talk.