
    What Is Robots.txt and How Is It Used in SEO?



    Robots.txt is a text file that webmasters create to instruct web robots (typically search engine robots) how to crawl pages on their website.

    The robots.txt file is part of the Robots Exclusion Protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content to users.

    The REP also includes directives such as meta robots tags, as well as page-, subdirectory-, or site-wide instructions for how search engines should treat links (such as “follow” or “nofollow”).

    In practice, robots.txt files indicate whether certain user agents (web-crawling software) can or can’t crawl parts of a website. These crawl instructions are specified by “disallowing” or “allowing” the behavior of certain (or all) user agents.
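    To see how a crawler applies per-agent rules, here is a minimal sketch that uses Python's standard-library urllib.robotparser module as a stand-in for a search engine's own parser. The robots.txt content and the agent names "ExampleBot" and "OtherBot" are hypothetical, made up for illustration. The file is parsed from a string, so nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is kept out of /private/,
# but the made-up agent "ExampleBot" may crawl everything.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow:
"""

parser = RobotFileParser()
# parse() takes the file's lines directly; no network request is made.
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("ExampleBot", "OtherBot"):
    for path in ("/private/report.html", "/blog/"):
        allowed = parser.can_fetch(agent, "https://www.example.com" + path)
        print(agent, path, "allowed" if allowed else "blocked")
```

    With these rules, ExampleBot is allowed on both paths, while OtherBot is blocked from /private/report.html but allowed on /blog/. An empty Disallow value means nothing is disallowed for that agent, and an agent with its own group ignores the "*" group.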



    Basic Format: 

    User-agent: [user-agent name]
    Disallow: [URL String not to be crawled]
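    The Disallow value in this format is matched as a prefix of the URL path rather than as one exact page. A minimal sketch, again assuming Python's urllib.robotparser as the checker and a hypothetical "/tmp" rule with a made-up agent name "AnyBot". Note that this module does plain prefix matching only; Googlebot additionally understands the * and $ wildcards, which robotparser does not evaluate.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical group in the basic format: one user agent, one Disallow value.
RULES = """\
User-agent: *
Disallow: /tmp
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "/tmp" is a path prefix, so it also covers "/tmp/...", "/tmp.html"
# and "/tmpfiles/", but not a path that merely contains "tmp" later on.
for path in ("/tmp", "/tmp/cache.txt", "/tmp.html", "/tmpfiles/", "/about/tmp"):
    print(path, parser.can_fetch("AnyBot", "https://www.example.com" + path))
```

    Only "/about/tmp" is allowed here. To block just a folder, end the value with a slash (for example "/tmp/"), as the example file that follows does.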

    Here is a simple robots.txt file with two rules, explained below:

    # Rule 1
    User-agent: Googlebot
    Disallow: /nogooglebot/

    # Rule 2
    User-agent: *
    Allow: /

    Sitemap: http://www.example.com/sitemap.xml

    Explanation:

    1. The user agent named "Googlebot" should not crawl the folder http://example.com/nogooglebot/ or any of its subdirectories.

    2. All other user agents can access the entire site. (This rule could have been omitted with the same result, because full access is the default assumption.)

    3. The site's Sitemap file is located at http://www.example.com/sitemap.xml
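    The three points of this explanation can be checked mechanically. This sketch (Python 3.8 or later, for site_maps()) parses the example file verbatim with urllib.robotparser; the page URLs and the use of "Bingbot" as the "other" crawler are assumptions for illustration. Keep in mind that robotparser applies the first matching rule in a group, while Google documents a most-specific-match rule, so results can differ for files that mix Allow and Disallow on overlapping paths; this file does not.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from this post, copied verbatim.
EXAMPLE = """\
# Rule 1
User-agent: Googlebot
Disallow: /nogooglebot/

# Rule 2
User-agent: *
Allow: /

Sitemap: http://www.example.com/sitemap.xml
"""


def load(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


full = load(EXAMPLE)
# Same file with Rule 2 deleted, to test the claim in point 2.
without_rule_2 = load(EXAMPLE.replace("User-agent: *\nAllow: /\n", ""))

url_blocked = "http://www.example.com/nogooglebot/page.html"
url_open = "http://www.example.com/index.html"

print(full.can_fetch("Googlebot", url_blocked))          # point 1
print(full.can_fetch("Googlebot", url_open))
print(full.can_fetch("Bingbot", url_blocked))            # point 2
print(without_rule_2.can_fetch("Bingbot", url_blocked))  # Rule 2 omitted
print(full.site_maps())                                   # point 3
```

    This prints False, True, True, True and then ['http://www.example.com/sitemap.xml']. Googlebot is kept out of /nogooglebot/ only, other crawlers may fetch everything whether or not Rule 2 is present, and the Sitemap line is read independently of any user-agent group.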

    Source: https://support.google.com/webmasters/answer/6062596?hl=en
