The Internet Archive discovers and captures web pages through many different web crawls.
At any given time several distinct crawls are running, some for months, and some every day or longer.
View the web archive through the Wayback Machine.
This crawl was run with a Heritrix setting of "maxHops=0" (URLs including their embeds)
Survey 7 is based on a seed list of 339,249,218 URLs which is all the URLs in the Wayback Machine that we saw a 200 response code from in 2017 based on a query we ran on Feb. 1st, 2018.
The WARC files associated with this crawl are not currently available to the general public.