As I wanted to get some experience with Episerver plugins, I started this project. This package allows site admins to add/edit/delete the “robots.txt” file for Episerver site(s). The requirements are as follows:

  1. Provide “/robots.txt” handler
  2. Support multi-site
  3. Support multilingual
  4. Support multi-channel (e.g. serve a different robots.txt for the mobile channel)
  5. Easy admin area
  6. Support “default” robots.txt
  7. For the UAT site, disallow robots from crawling the site (using web.config to override all existing configs)
  8. Give some basic analytics data
  9. The plugin is based on MVC (for my own learning purposes only!)

Based on the above, I broke the work down into three phases:

Phase 1:

  1. Provide “/robots.txt” handler
  2. Support multi-site
  3. The plugin is based on MVC (for my own learning purposes only!)

Phase 2:

  1. Easy admin area
  2. Support “default” robots.txt
  3. For the UAT site, disallow robots from crawling the site (using web.config to override all existing configs)

Phase 3:

  1. Give some basic analytics data
  2. Support multilingual
  3. Support multi-channel (e.g. serve a different robots.txt for the mobile channel)
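To make the Phase 1 items a bit more concrete, below is a minimal sketch (not the actual package code) of an MVC controller and route serving “/robots.txt” per site. The IRobotsTxtRepository interface and the class names are illustrative assumptions, and the sketch assumes an Episerver version that exposes EPiServer.Web.SiteDefinition plus a DI container that can inject the repository.

```csharp
// Illustrative sketch only: names and storage are assumptions, not the package code.
using System.Web.Mvc;
using System.Web.Routing;
using EPiServer.Web;

// Placeholder for whatever storage backs the content (e.g. the Dynamic Data Store).
public interface IRobotsTxtRepository
{
    string Load(string siteName); // returns the stored robots.txt content, or null
}

public class RobotsTxtController : Controller
{
    private readonly IRobotsTxtRepository _repository;

    public RobotsTxtController(IRobotsTxtRepository repository)
    {
        _repository = repository;
    }

    // GET /robots.txt
    public ContentResult Index()
    {
        // SiteDefinition.Current resolves the site for the current request,
        // which is what makes the handler multi-site aware.
        var content = _repository.Load(SiteDefinition.Current.Name)
                      ?? "User-agent: *\nDisallow:";
        return Content(content, "text/plain");
    }
}

public static class RobotsTxtRouteConfig
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        // Map the literal URL. On IIS the static file handler normally claims
        // *.txt requests, so a handler mapping (or runAllManagedModulesForAllRequests)
        // may be needed for the request to reach MVC at all.
        routes.MapRoute(
            name: "RobotsTxt",
            url: "robots.txt",
            defaults: new { controller = "RobotsTxt", action = "Index" });
    }
}
```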

I have already released the first RC for Phase 1, and in this tutorial I will try to explain the challenges and what I learned. You can access the repo via:

https://github.com/zanganeh/RobotsTxtHandler

I'm more than happy to get feedback on GitHub. In the next post I will describe the architecture and base of the plugin, the NuGet package, and the MyGet integration!
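As a small teaser for Phase 2, requirement 7 (disallowing crawlers on the UAT site through web.config) could be handled along the lines below. This is purely an illustration: the appSetting key “RobotsTxt:Override” is a hypothetical name, not something the package defines.

```csharp
// Illustrative sketch only: the appSetting key is a made-up example.
using System.Configuration;

public static class RobotsTxtOverride
{
    // Returns the web.config override if one is set, otherwise the stored content.
    // e.g. on UAT: <add key="RobotsTxt:Override" value="User-agent: *&#10;Disallow: /" />
    public static string Apply(string storedContent)
    {
        var overrideValue = ConfigurationManager.AppSettings["RobotsTxt:Override"];
        return string.IsNullOrEmpty(overrideValue) ? storedContent : overrideValue;
    }
}
```

Because the override lives in web.config, it can be applied per environment with a config transform, so a UAT deployment can block crawlers without touching the content stored for the production site.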

2 thoughts on “RobotsTxtHandler project (part 1)”

  1. Hi Aria! I definitely see the value in letting (power) users have access to the robots file.
    But with more and more environments being deployed automatically (e.g. Octopus Deploy), you run a risk of having your robots.txt overwritten, or deployed in another folder – meaning the user’s changes won’t take effect. (This is the reason many developers include the robots.txt as a transformed file for each deployment environment).
    Any thoughts on how your module would handle this?

  2. Hi Arild,

    Great point. Just to clarify: the package stores robots.txt in the Dynamic Data Store (DDS), and requests coming in to IIS are handled by an HTTP handler. The package does not use a physical file on disk; it serves robots.txt from the data stored in DDS. That said, you may want to turn this off for UAT and dev machines (to make sure crawlers don't crawl internal URLs), and for that the DDS value can be overridden via web.config (I'm working on it). Hope that helps clarify.

    Aria
