Many years ago, a project called the “Wayback Machine” (archive.org) was started. The concept was simple: act as a search engine by indexing internet content, but organize it along a timeline. This way, archive.org can show us what a website looked like 3, 5, or 10 years ago.
This relates to security because archive.org has indexed many web pages that were never meant to be seen by the public. Even worse, in some cases it has indexed pages that contain sensitive data. A threat actor could search through this public archive to find exposed company data and other information not intended for the public.
Why Use the Wayback Machine (archive.org)
There are times when a site is in an error state. In such cases, especially on a development or test environment, an error message or log output may be captured and indexed by archive.org. It is important to run routine validation checks on your company and its various domains (and subdomains) to ensure any data leaks are plugged (by asking archive.org to remove such pages from its index).
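A routine check like this can start by asking whether archive.org holds any capture at all for each host you care about. The sketch below uses the public Wayback Availability endpoint (https://archive.org/wayback/available); the function names and the example host are illustrative assumptions, not part of any existing tool.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(host: str) -> str:
    """Build the Availability API query for one host."""
    return f"{AVAILABILITY_API}?{urlencode({'url': host})}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the replay URL of the closest capture, or None if there is none."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_hosts(hosts: list[str]) -> dict[str, Optional[str]]:
    """Query the API once per host. This performs network requests."""
    results = {}
    for host in hosts:
        with urlopen(availability_url(host), timeout=30) as resp:
            results[host] = closest_snapshot(json.load(resp))
    return results
```

Calling check_hosts(["dev.example.com"]) (a placeholder host) returns a mapping from each host to a replay URL, or None where nothing has been archived. A host you expected to be private that comes back with a URL is worth a closer look.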
Why Subdomains Are Important
When using the Wayback Machine on a live website, there will be hundreds or thousands of entries. Most of these will be free of errors, and where errors do appear, most production sites sanitize the error output.
However, developer and test environments often do not adhere to the same standards. It is common for such a machine to be in a down state for several days. Loading the web application on it may produce log output or even sensitive data (database schema, file locations and more).
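To make the idea of "unsanitized error output" concrete, here is a minimal sketch that scans the text of an archived page for a few leak indicators. The patterns are illustrative assumptions chosen for this example (a Python or Java stack trace, common SQL error strings, a few typical server paths); a real review would need a list tuned to your own stack.

```python
import re

# Hypothetical indicator patterns; extend or replace them for your environment.
LEAK_PATTERNS = {
    "stack trace": re.compile(
        r"Traceback \(most recent call last\)|\bat [\w.$]+\([\w.]+:\d+\)"
    ),
    "sql error": re.compile(r"SQL syntax|ORA-\d{5}|SQLSTATE\[", re.IGNORECASE),
    "file path": re.compile(r"(?:/var/www|/home/\w+|[A-Za-z]:\\inetpub)\S*"),
}


def find_leaks(page_text: str) -> list[str]:
    """Return the labels of every indicator pattern found in the page text."""
    return [
        label
        for label, pattern in LEAK_PATTERNS.items()
        if pattern.search(page_text)
    ]
```

An empty list does not prove a page is clean; it only means none of these particular patterns matched.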
Subdomain Searching
While you may know the subdomains in your organization, there might be some you didn’t know existed. Before you joined, someone may have created a subdomain with a random string for a name that exposes something it shouldn’t. Various utilities will discover these hidden gems.
When acting as a red team, go in with a black-box mentality: as an “outsider,” you want to discover what a threat actor could discover on their own, without insider information.
There are many utilities that gather recon and intel on a domain in the hope of identifying its subdomains. Brute-force utilities (like dirb, which targets directories and files) run through a dictionary of words and record every request that returns a 200 OK response. The same dictionary approach, applied to host names, can map out a variety of subdomains.
Today, many websites block brute-force tactics like that, yet tools have found new ways to work around the blocks.
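The dictionary approach can be sketched in a few lines. This version checks whether each candidate host name resolves in DNS rather than waiting for an HTTP 200, which is a simplification of what dedicated tools do. The function names and the word list are assumptions made for illustration, and it should only be run against domains you are authorized to test.

```python
import socket
from typing import Callable, Iterable, Optional


def candidate_hosts(domain: str, words: Iterable[str]) -> list[str]:
    """Turn dictionary words into candidate subdomain names."""
    hosts = []
    for word in words:
        label = word.strip().lower()
        if label:  # skip blank dictionary entries
            hosts.append(f"{label}.{domain}")
    return hosts


def resolve(host: str) -> Optional[str]:
    """Return the host's IPv4 address, or None if it does not resolve."""
    try:
        return socket.gethostbyname(host)
    except (OSError, UnicodeError):
        return None


def discover(
    domain: str,
    words: Iterable[str],
    resolver: Callable[[str], Optional[str]] = resolve,
) -> dict[str, str]:
    """Map every candidate that resolves to its address."""
    found = {}
    for host in candidate_hosts(domain, words):
        address = resolver(host)
        if address is not None:
            found[host] = address
    return found
```

The resolver is passed in as a parameter so the lookup can be swapped out (for an HTTP check, say) or stubbed when no network is available.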
Maltego CE
One way to get a list of subdomains is to use Maltego’s Community Edition (which is free). Adding a Domain entity to a document lets you run a Maltego “machine” over it, which produces a network visualization that includes subdomains.
To do this, open a new Maltego document and drag the Domain entity from the Entity Palette in the left side panel onto the graph. Rename the Domain entity to the domain under test. Right-click the Domain entity and choose Machine > Footprint L1. Other Footprint levels will give more data but take more time; you can try them out as well.
Maltego will then produce an entire network graph of domains, subdomains, network interfaces and more. All of this is public information that Maltego iterates over to build visual relationships.
Wayback Machine (archive.org)
To speed up the process, we can add the Wayback Machine transform to Maltego CE. Within Maltego, clicking “Transforms” and then “Transform Hub” shows all the transforms currently available. Scroll down until you find Wayback Machine. Click through the prompts to install it.
Once installed, we can right-click any discovered subdomain to search through the Wayback Machine content. There’s just one problem: we can’t run the Wayback Machine transform on a DNS entity, as it only works on Domain entities.
Drag over a new Domain entity and rename it to one of the subdomains that was revealed as a DNS entity. So if your initial scan uncovered test.[your site].com as a DNS entity, make a new Domain entity and rename it to that subdomain (e.g. test.[your site].com).
Right-clicking it now will offer the Wayback Machine entry, which has the following options:
- To Snapshot
- To Snapshots Between Dates
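Outside Maltego, roughly the same question as “To Snapshots Between Dates” can be put to the archive’s public CDX endpoint. The sketch below is not how the Maltego transform is implemented; it only shows an equivalent query, and the function names and example host are assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_API = "https://web.archive.org/cdx/search/cdx"


def snapshots_between_url(host: str, start: str, end: str, limit: int = 50) -> str:
    """Build a CDX query for captures between two dates (yyyyMMdd strings)."""
    params = {
        "url": host,
        "from": start,
        "to": end,
        "output": "json",
        "limit": limit,
    }
    return f"{CDX_API}?{urlencode(params)}"


def rows_to_dicts(rows: list[list[str]]) -> list[dict[str, str]]:
    """Convert CDX JSON output (a header row, then data rows) into dicts."""
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def fetch_snapshots(host: str, start: str, end: str) -> list[dict[str, str]]:
    """Run the query. This performs a network request."""
    with urlopen(snapshots_between_url(host, start, end), timeout=30) as resp:
        body = resp.read()
    return rows_to_dicts(json.loads(body)) if body.strip() else []
```

For example, fetch_snapshots("test.example.com", "20150101", "20151231") would list captures of that placeholder host during 2015, each with fields such as timestamp, original and statuscode.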
Shifting from Maltego to archive.org
If a tester finds indexed captures of a dev/QA/test subdomain via Maltego, we can move to archive.org and investigate further. It’s easier to use the website to filter down to URLs, timelines and more.
Calendar View
Below is the default timeline on web.archive.org. The domain searched for has a lot of activity. If we want to see what this website looked like in 2001, we can click on that year and the results below will filter accordingly. Below the timeline are the dates of recorded captures. A blue dot is an indexed event, a green dot is a redirect, and an orange dot is a 4xx-type error.
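The color coding can be reproduced on capture records you have already collected, which is handy for spotting error-heavy years without clicking through the calendar. This sketch assumes each record is a dict with "timestamp" and "statuscode" keys, and that a blue dot corresponds to a 2xx response; both are assumptions for illustration.

```python
from collections import Counter


def dot_color(statuscode: str) -> str:
    """Map an HTTP status code to the calendar color described above."""
    if not statuscode.isdigit():
        return "unknown"  # some records carry "-" instead of a status
    code = int(statuscode)
    if 200 <= code < 300:
        return "blue"    # assumed: a normally indexed page
    if 300 <= code < 400:
        return "green"   # redirect
    if 400 <= code < 500:
        return "orange"  # 4xx-type error
    return "other"


def colors_for_year(captures: list[dict[str, str]], year: str) -> Counter:
    """Count the dot colors for captures whose timestamp starts with the year."""
    return Counter(
        dot_color(capture["statuscode"])
        for capture in captures
        if capture["timestamp"].startswith(year)
    )
```

A year with an unusually high share of orange is a reasonable place to start looking for captured error pages.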

Changes View
Below is the changes view. This view shows how much changed between captures (from a low amount of change to a high amount). As in the Calendar view, clicking on a colored square brings up that indexed snapshot.

Summary View
This is perhaps the most intriguing of the tabs. It will allow the viewer to filter results by MIME type (images, PDF, etc.).
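The same kind of grouping can be done on capture records collected elsewhere. The sketch below assumes each record is a dict with "original" and "mimetype" keys; the category names are hypothetical and do not mirror the exact buckets the website uses.

```python
from collections import defaultdict


def mime_category(mimetype: str) -> str:
    """Reduce a MIME type to a coarse category such as image or pdf."""
    if mimetype == "application/pdf":
        return "pdf"
    top_level = mimetype.split("/", 1)[0]
    if top_level in {"image", "text", "audio", "video"}:
        return top_level
    return "other"


def group_by_category(captures: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group captured URLs by their coarse MIME category."""
    groups = defaultdict(list)
    for capture in captures:
        groups[mime_category(capture["mimetype"])].append(capture["original"])
    return dict(groups)
```

Documents (PDFs in particular) are often worth reviewing first, since they are more likely than images to carry internal details.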

Site Map View
This view is a circular visualization that shows categorized URLs that were added to or removed from a domain.
URL View
This view shows the various URLs as a list. The list can be filtered by keyword or MIME type. For example, the keyword “error” can produce a filtered list of pages on your company’s site that should be removed.
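A keyword filter like this can also be scripted against the public CDX endpoint, which helps when checking many domains. The sketch below builds such a query and the replay link for a result. The function names are assumptions, and the filter is a regular expression evaluated by the server, so anything beyond a plain keyword may behave differently from Python’s own regex engine.

```python
import re
from urllib.parse import urlencode

CDX_API = "https://web.archive.org/cdx/search/cdx"


def keyword_search_url(domain: str, keyword: str) -> str:
    """Build a CDX query for archived URLs under a domain containing a keyword."""
    params = [
        ("url", domain),
        ("matchType", "domain"),  # include subdomains of the domain
        ("filter", f"original:.*{re.escape(keyword)}.*"),
        ("collapse", "urlkey"),   # one row per distinct URL
        ("fl", "timestamp,original,statuscode"),
        ("output", "json"),
    ]
    return f"{CDX_API}?{urlencode(params)}"


def replay_url(timestamp: str, original: str) -> str:
    """Build the web.archive.org link that replays one capture."""
    return f"https://web.archive.org/web/{timestamp}/{original}"
```

Fetching keyword_search_url("example.com", "error") (a placeholder domain) returns rows of timestamp, original URL and status code; passing the first two fields to replay_url gives the link to review before requesting removal.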