Interesting Website .txt Files

From an OSINT perspective, websites can host several .txt files that serve different functions and often contain a lot of useful information.

robots.txt

This text file is found in the root directory of a website (e.g. https://example.com/robots.txt).

This file gives instructions to bots, crawlers, and spiders using the Robots Exclusion Protocol. [Source: Robotstxt.org]

It tells a bot, like Googlebot, which parts of a site it may or may not crawl, which in turn affects what shows up in search engines. Compliance is voluntary, so not all bots respect the robots.txt file.
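
As a rough illustration of how a well-behaved bot applies these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The sample rules, the "ExampleBot" user agent, and the is_allowed helper are assumptions made up for the example, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; a real file would be fetched from the site's root.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(rules_text, user_agent, url):
    """Return True if a compliant bot with this user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())  # parse from text, no network call
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_RULES, "ExampleBot", "https://example.com/private/report.html"))
print(is_allowed(SAMPLE_RULES, "ExampleBot", "https://example.com/index.html"))
```

Nothing here enforces anything: a bot that never runs a check like this can still request the disallowed URL.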

Listing a directory in robots.txt to disallow it, maybe one that is sensitive (e.g. /super-sensitive-secret-info/), lets the world know that the directory exists. From an offensive security perspective, this is juicy info.
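
To show how little effort it takes to pull those paths out, here is a minimal sketch that lists every Disallow entry in a robots.txt body. The disallowed_paths helper and the sample file are hypothetical, and the parsing is deliberately simple (it ignores which user agent each rule belongs to).

```python
def disallowed_paths(robots_text):
    """Return the unique Disallow paths in a robots.txt body, in file order."""
    paths = []
    for raw_line in robots_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        value = value.strip()
        # An empty Disallow value means "nothing is disallowed", so skip it.
        if field.strip().lower() == "disallow" and value and value not in paths:
            paths.append(value)
    return paths


# Hypothetical robots.txt content for illustration only.
sample = """\
User-agent: *
Disallow: /admin/
Disallow: /super-sensitive-secret-info/  # hypothetical
Allow: /public/

User-agent: Googlebot
Disallow: /admin/
"""

print(disallowed_paths(sample))
```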

ai.txt

This text file is newer and behaves like robots.txt. Where the two files differ is that ai.txt tells A.I. bots which content they are allowed, or not allowed, to scrape and add to their training data sets. The file lists the directories you want to allow or disallow bots from accessing, which means someone reading it can see which directories might contain juicy info.

security.txt

This text file tells security researchers who to contact when reporting discovered vulnerabilities, along with other things like the disclosure policy. The format is defined in RFC 9116, and the file is usually served from /.well-known/security.txt rather than the site root.

There’s interesting information in this file, like contact information, an expiration date, a public encryption key, and links to different policies.
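
A minimal sketch of collecting those fields from a security.txt body might look like the following. The field names come from RFC 9116. The parse_security_txt helper and the sample values are hypothetical, and the sketch skips everything else (comments, PGP signature wrappers, unknown fields).

```python
from collections import defaultdict

# Field names defined in RFC 9116 (matched case-insensitively).
KNOWN_FIELDS = {
    "acknowledgments", "canonical", "contact", "encryption",
    "expires", "hiring", "policy", "preferred-languages",
}


def parse_security_txt(text):
    """Map each known field name to the list of values found for it."""
    fields = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so URL values keep their own colons.
        name, value = line.split(":", 1)
        name = name.strip().lower()
        if name in KNOWN_FIELDS:
            fields[name].append(value.strip())
    return dict(fields)


# Hypothetical file content for illustration only.
sample_security = """\
# Hypothetical example
Contact: mailto:security@example.com
Contact: https://example.com/report
Expires: 2030-01-01T00:00:00.000Z
Encryption: https://example.com/pgp-key.txt
Policy: https://example.com/disclosure-policy
"""

for name, values in parse_security_txt(sample_security).items():
    print(name, values)
```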

ads.txt

“…a text file that companies can host on their web servers, listing the other companies authorized to sell their products or services. This is designed to allow online buyers to check the validity of the sellers from whom they buy, for the purposes of internet fraud prevention.” [Source: Wikipedia]

This is interesting because it potentially shows part of a company’s supply chain. You’ll also find domain names, subdomains, different IDs, and contact information in this file.
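
As a sketch of what that looks like in practice, the snippet below splits an ads.txt body into seller records (ad system domain, account ID, relationship, optional certification authority ID) and variable lines such as contact= or subdomain=. The parse_ads_txt helper and all sample values are hypothetical, and the parsing assumes the common comma-separated layout without handling every edge case.

```python
def parse_ads_txt(text):
    """Split an ads.txt body into seller records and name=value variables."""
    records, variables = [], []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = [part.strip() for part in line.split(",")]
        if len(parts) >= 3:
            records.append({
                "ad_system": parts[0].lower(),
                "account_id": parts[1],
                "relationship": parts[2].upper(),  # DIRECT or RESELLER
                "cert_authority_id": parts[3] if len(parts) > 3 else None,
            })
        elif "=" in line:
            name, value = line.split("=", 1)
            variables.append((name.strip().lower(), value.strip()))
    return records, variables


# Hypothetical entries for illustration only.
sample_ads = """\
# Hypothetical entries
adsystem.example, pub-1234, DIRECT, abc123
reseller.example, 5678, RESELLER
contact=adops@example.com
subdomain=news.example.com
"""

records, variables = parse_ads_txt(sample_ads)
for record in records:
    print(record)
print(variables)
```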

Most of these .txt files reside in the root directory of a website, although security.txt is usually found under /.well-known/. From an offensive security view, these files are sometimes rich with information bad actors can use, so it is best to ensure your digital properties have the appropriate security measures in place to help prevent unauthorized access to your systems.
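
To review what your own site exposes, a small helper can list the URLs worth checking for a given domain. This is a minimal sketch: the candidate_urls name is hypothetical, the root-level ai.txt path follows the description above, and both the /.well-known/ and root locations are included for security.txt since either may be in use.

```python
# Paths where the files discussed above are commonly served.
CANDIDATE_PATHS = [
    "/robots.txt",
    "/ai.txt",
    "/.well-known/security.txt",
    "/security.txt",
    "/ads.txt",
]


def candidate_urls(domain):
    """Build the list of URLs to check for a domain or base URL."""
    base = domain.strip().rstrip("/")
    if "://" not in base:
        base = "https://" + base  # assume HTTPS when no scheme is given
    return [base + path for path in CANDIDATE_PATHS]


for url in candidate_urls("example.com"):
    print(url)
```

The sketch only builds the URLs. Fetching them is left to whatever HTTP client you already use, and only against sites you are permitted to assess.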
