What Do Seeders Do?

Seeders canvass the resources of a given government agency, identifying important URLs and whether those URLs can be crawled by the Internet Archive's web crawler. They use the EDGI Nomination Chrome extension to nominate URLs to the End of Term (EOT) Web Archive if they are crawlable or to the Archivers app if they require manual archiving.

For the New Haven event, we are focusing on the Office of Water, which has a primer available in this Google document. Seeders: it is very important to check out the pieces you work on using the workflow document for the Office of Water.

If you find that the government website goes deeper than what the primer identifies, please seed these deeper web pages and subunits. No primer will be perfect, and we want to make sure that as many pages as possible are reviewed.

Recommended Skills
Consider this path if you’re comfortable browsing the web and have great attention to detail. An understanding of how web pages are structured will help you with this task.

Choosing the Website

Seeders use the EDGI Archiving Primers, or a similar set of resources, to identify important and at-risk data. For this event, we are covering the EPA's Office of Water.

Canvassing the Website and Evaluating Content

Crawlable URLs

Wherever possible, add in the Agency Office Code from the sub-primer. Talk to the DataRescue organizers to learn more.

Uncrawlable URLs

  • If a URL is judged not crawlable, check the checkbox next to the applicable one of the four types of uncrawlables in the Chrome extension. This adds the URL to the Researching queue in the Archivers app.
  • The URL will be automatically associated with a universally unique identifier (UUID).
  • You can check whether the page or some of its files are already archived using the Internet Archive's Wayback Machine Chrome extension.
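The archive check in the last bullet can also be scripted. The sketch below is not part of the workflow described here, which uses the Chrome extension; it assumes the Internet Archive's public Wayback availability endpoint and its JSON response shape (`archived_snapshots` → `closest`). The function names and the example.com address are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url):
    """Return the availability request URL for the page being checked."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url})


def closest_snapshot(payload):
    """Return the closest snapshot URL from a parsed response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_archived(url, timeout=10):
    """Query the endpoint over the network; return a snapshot URL or None."""
    with urlopen(build_query(url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))


if __name__ == "__main__":
    # Placeholder address; substitute the URL you are evaluating.
    print(check_archived("https://www.example.com/report.pdf"))
```

A result of None only means no snapshot was reported for that exact URL. It does not by itself show that the page is uncrawlable, so the checkbox decision above still rests with the Seeder.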

Not Sure?

  • This sorting is only provisional; when in doubt, Seeders nominate the URL and mark it as possibly not crawlable.