Information Gathering – robots.txt

Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.
It is often used to disallow parts of their site from appearing in search results. The paths it discloses can then be visited manually for more information.

The “/robots.txt” file is a plain text file with one or more records. It usually contains a single record looking like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/

User-agent: *
Disallow: /

The “User-agent: *” means this section applies to all robots.
The “Disallow: /” tells the robot that it should not visit any pages on the site.

User-agent: *
Allow: /

The “Allow: /” tells the robot that it may visit any page on the site.
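The rules above can also be checked programmatically with Python's standard-library `urllib.robotparser`; a minimal sketch, parsing the example record from above as a string rather than fetching it:

```python
from urllib.robotparser import RobotFileParser

# The example record from above, supplied as a string.
record = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""

rp = RobotFileParser()
rp.parse(record.splitlines())

# can_fetch() returns True if the given user agent may crawl the path.
print(rp.can_fetch("*", "/cgi-bin/test.cgi"))  # False: /cgi-bin/ is disallowed
print(rp.can_fetch("*", "/index.html"))        # True: not covered by any rule
```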

To check whether a site has a robots.txt file, simply append /robots.txt to the end of its URL.
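Building that request URL can be sketched with Python's standard-library `urllib.parse`; `example.com` below is only a placeholder:

```python
from urllib.parse import urljoin

def robots_url(base):
    """Return the robots.txt URL for a site, given any URL on that site."""
    # urljoin with an absolute path replaces everything after the host.
    return urljoin(base, "/robots.txt")

print(robots_url("http://example.com/some/page"))  # http://example.com/robots.txt
```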

A typical result looks like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/


Downloading the robots.txt file is not required, but if you wish to keep a copy, the options below will help.

Option 1

wget http://url/robots.txt

Option 2 – with protection

proxychains wget http://url/robots.txt

wget saves robots.txt to the current working directory (the /root folder when run from the root user's home directory).

Option 3 – Using Nmap to view robots.txt

nmap --script=http-robots.txt.nse (url or ip)

The tools used can be read about at the URLs below.

