
Tutorial for Archiving Web Pages as Dynamic PDF files

Part of a series on Information Warfare

25 Dec 2020


Runtime: 00:14:33

TWP Archive Date: 25 Dec 2020



When we as members of the public want to hold our institutions and leaders accountable for their actions and public statements, or simply want to preserve important information from the web before it disappears or is "disappeared", we need to store those web pages offline. The format should be easily transportable and accessible: a regular standalone file that can be saved to disk or a USB drive, e-mailed, and even printed to paper if needed.

Simply saving an internet shortcut to the web page or copying and pasting a link into an e-mail is not good enough, because of the increasingly ephemeral nature of the internet.

This video tutorial will show you how to save web pages as PDF documents to make effective offline archives of web pages.

These saved PDF files will have copyable, searchable text and will preserve embedded hyperlinks, which can be clicked in the PDF document to open the corresponding web page in your web browser.

We will go over best practices and learn how to scale the web page image so that it fits best on the PDF page, how to remove unwanted elements such as advertising and pop-up windows, and how to add comments, stamps, and our own hyperlinks to the page.

System requirements: This process is designed for, and tested on, Windows-based desktop computers.

Browser requirements: This process works best with Google Chrome browser variants such as "Dissenter" or "Brave" that have built-in ad blockers. However, Google Chrome version 87 or newer will work fine for most web pages. The use of Mozilla Firefox is not recommended because it will not preserve hyperlinks.

PDF file editor requirements: This process works best with the "Nitro Pro" PDF editor, but Adobe Acrobat X Pro or newer could also be used.

Step one:

Selecting the correct parameters and adjusting the image size

Locate the main browser menu and select Print, or press Ctrl P to bring up the print dialog window.

Now select the print destination from the drop-down menu.

Your selections may vary in the print destination menu depending on what printers and software you have installed on your computer.

Select Save as PDF. This selection seems to work better than the others when it comes to preserving hyperlinks and some embedded page metadata. It also automatically generates a file name for the PDF based on the web page title.

For pages, select All.

For Layout, Portrait is recommended because most web pages are designed to display best this way. This page orientation will also yield the best results when printing to paper.

Next click on the More Settings menu to open it.

For paper size, select Letter for the American standard 8½ inch by 11 inch paper, or select whatever paper size is most common where you operate.

For pages per sheet, leave this set to one, as it maintains the proper proportions of the web page. This can always be changed later when re-printing these documents.

For Margins, always leave this set at Default to preserve the standard margins around the edges of the printed page that all printers are known to be able to print. If you change this setting, some printers may not be able to print the page on one sheet of paper correctly.

For Scale, select Custom. This allows you to scale the image from 10% to 200% in order to optimize the display of a particular page, and also lets you control where page breaks occur in order to keep paragraphs and graphics together. Many web pages will display adequately at 100%, but adjusting this setting within the range of 65% to 125% is often required to get the best results. It is not advisable to scale pages below 60%, because doing so often makes the page difficult to read when printed to paper.
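
As a rough illustration of why lowering the scale lets a wider web page fit on one sheet, the following Python sketch estimates how many CSS pixels of page width fit across the printable area at a given scale. It is not part of the original tutorial. It relies on the CSS definition of 96 pixels per inch. The 0.4 inch margin is an assumed stand-in for the browser's Default margins, and `layout_width_px` is a hypothetical helper name.

```python
CSS_PX_PER_INCH = 96  # CSS defines one inch as 96 pixels


def layout_width_px(scale_percent, paper_width_in=8.5, margin_in=0.4):
    """Estimate the web page width, in CSS pixels, that fits across one sheet.

    margin_in is an assumption standing in for the Default margins setting.
    """
    # The tutorial gives 10% to 200% as the range of the Custom scale setting.
    if not 10 <= scale_percent <= 200:
        raise ValueError("scale must be between 10 and 200 percent")
    printable_in = paper_width_in - 2 * margin_in
    # Shrinking the content (scale below 100) lets more pixels fit per sheet.
    return printable_in * CSS_PX_PER_INCH / (scale_percent / 100)


for scale in (65, 100, 125):
    width = layout_width_px(scale)
    print(f"{scale}% scale: about {width:.0f} px of page width per sheet")
```

Under these assumptions, a Letter sheet holds roughly 739 pixels of page width at 100% and roughly 1137 pixels at 65%, which is why wide pages often need a scale below 100%.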

For Options:

Headers and footers check box. Always, always, keep the Headers and footers box checked. This option saves the date, page title, and page URL to the headers and footers of each printed page. This information is absolutely crucial for maintaining good historical archives and identifying sources of information.

For Background graphics check box: in most circumstances you would keep this box checked to preserve the background graphics of a page. However, some web pages are poorly designed and use graphics that are not formatted to the page correctly, causing them to obscure text or other page elements. Un-checking this box can eliminate this problem.

Now that you are familiar with the page settings and parameters of the print dialog window, you can make adjustments and observe the changes to the output in the print preview window, before committing to saving the PDF file.
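
The recommendations above can be summarized as a small checklist. The Python sketch below is only an illustration and does not control the browser. The function name `review_print_settings` and its parameter names are hypothetical, chosen to mirror the labels in the print dialog window. The Background graphics option is left out because either setting can be correct, depending on the page.

```python
def review_print_settings(destination="Save as PDF", layout="Portrait",
                          pages_per_sheet=1, margins="Default",
                          scale_percent=100, headers_and_footers=True):
    """Return a list of notes where the settings depart from the tutorial's advice."""
    notes = []
    if destination != "Save as PDF":
        notes.append("Destination: Save as PDF preserves hyperlinks best.")
    if layout != "Portrait":
        notes.append("Layout: Portrait is recommended for most web pages.")
    if pages_per_sheet != 1:
        notes.append("Pages per sheet: leave at one to keep page proportions.")
    if margins != "Default":
        notes.append("Margins: leave at Default so any printer can print the page.")
    if not 10 <= scale_percent <= 200:
        notes.append("Scale: Custom scale only accepts 10% to 200%.")
    elif scale_percent < 60:
        notes.append("Scale: below 60% is often hard to read on paper.")
    if not headers_and_footers:
        notes.append("Headers and footers: keep checked to record date, title, and URL.")
    return notes


# Example: a scale that is too small, with headers and footers switched off.
for note in review_print_settings(scale_percent=55, headers_and_footers=False):
    print(note)
```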

Best practices for PDF file names.

When naming PDF archive files, avoid all punctuation, spaces, and special characters, and use only lowercase letters and numbers. Keeping to this UNIX-style naming convention makes the document easier to search for on your own computer, and also easier to publish and find on the web.

The easiest and best way to choose a UNIX-style file name for a PDF archive file is to use the original server file name of the web page that you printed it from. Go to the address bar of your browser, locate the last slash in the URL, and copy everything after it up to the first dot. Usually this dot will be followed by one of the common page types, such as HTML or PHP, although some URLs have no dot at the end at all.
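
The naming rule just described can be sketched in a few lines of Python. This is a minimal illustration, not part of the original tutorial: `archive_file_name` is a hypothetical helper, the example.com URLs are placeholders, and using the last non-empty path segment when the URL ends in a slash is an assumption.

```python
import re
from urllib.parse import urlsplit


def archive_file_name(url):
    """Derive a lowercase, letters-and-numbers-only PDF file name from a URL."""
    path = urlsplit(url).path  # drops the query string and fragment
    # Assumption: if the URL ends in a slash, use the last non-empty segment.
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError("URL has no page name to build a file name from")
    # Copy everything after the last slash up to the first dot.
    stem = segments[-1].split(".", 1)[0]
    # Keep only lowercase letters and numbers, as the naming advice requires.
    cleaned = re.sub(r"[^a-z0-9]", "", stem.lower())
    if not cleaned:
        raise ValueError("page name contains no letters or numbers")
    return cleaned + ".pdf"


print(archive_file_name("https://example.com/news/2020/Big-Story_v2.html?ref=home"))
print(archive_file_name("https://example.com/articles/press-release/"))
```

A URL with no page name at all, such as a bare home page address, raises an error so that you can choose a name by hand.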

Best Practices for creating PDF files that are more likely to be authenticated as true captures of a web page.

When you are trying to archive information, you need to think like an archaeologist: a dig site is preserved and everything is cataloged. Saving just the target information itself, without context, makes it nearly impossible to authenticate. You need to preserve as much of the page and its elements as possible. The metadata in decorative images, advertisements, and other non-targeted page elements often gets encoded into the PDF file when it is saved. This metadata from unwanted page elements can be used to corroborate the authenticity of the PDF file as an accurate capture of the web page at a particular time. You should make every effort to preserve the entire page.

However, if the non-targeted elements of the web page greatly obscure or confuse the target information, it may be desirable to eliminate them.

There are a few tricks you can use to accomplish this.

We mentioned unchecking the Background graphics check box earlier.

Another effective trick is selected printing.

While pressing and holding the left mouse button, select and highlight the contiguous text and images that you want saved in your PDF file. When this is complete, move the mouse cursor over any part of the highlighted area, click the right mouse button, and select Print.

After the print dialog window opens you can preview what will be saved to PDF and make adjustments.

Step two:

Post production editing.

Earlier we showed you how to remove unwanted page elements during printing. Now we will show you post production techniques in Nitro Pro PDF Editor for removing unwanted page elements that obscure our target information. Note that Adobe Acrobat Pro has similar PDF editing capabilities.

Open your PDF file and identify the page elements that you want to remove.
Click on the Nitro Pro Home tab and click on Edit, or press Ctrl E, to enter editing mode.

Then simply select the page element by clicking on it with your left mouse button, then delete it by pressing the delete key on your keyboard, or right clicking and selecting delete.

If you make a mistake, you can press Ctrl Z to un-do the last change.

After removing all the unwanted page elements, remember to re-save your PDF document before closing.

Step three:

Adding a custom stamp with website URL, highlighting text and other post production PDF document edits.

Note: Use these editing techniques sparingly. The more changes that you make to a PDF document, the more embedded metadata you alter or destroy, which could render your document unable to be authenticated as a true copy of a web page for archival purposes.

Open your PDF file in Nitro Pro PDF Editor and click on the Review tab.
Identify the page elements that you want to add to the document.

We will open the stamp dialog window and select a custom logo stamp, which has already been configured in this computer's installation of Nitro Pro PDF Editor.

We will insert it onto the page, and size it to fit in the desired location in the upper right corner of the first page of the document.

Next we will click on the Page Layout tab and select Link. We then position the mouse cursor, then click and hold the left mouse button to drag out a target area for the link.

After releasing the left mouse button, the link target is indicated and the Create Link dialog box automatically opens.

For Link appearance, Link type: select Invisible Rectangle from the drop-down menu.

For Link action: select Open a webpage from the radio buttons.

Click Next to proceed to the Edit Web Link dialog box.

Type or paste a full, correct website URL into the box and click the OK button.

Now open the Home tab and select the hand tool or press Ctrl H to exit editing mode.

Use the mouse cursor to verify the link insertion by hovering over the link target area.

Highlighting text:

Click on the Review tab and select Highlight.

Position the mouse cursor near the text you want to highlight, click and hold the left mouse button as you select the text, then release the left mouse button.

When highlighting is complete, press Ctrl H to exit editing mode.

Remember to save your edited PDF document again to preserve the changes in the saved document.

Software Recommendations:

Note: we have not received any compensation from the makers of these software products. We recommend them solely in the interest of properly equipping other interested parties to make quality archives of web pages and the information they contain, in order to hold our leaders and institutions accountable for their actions.

We recommend Google Chrome browser variants like Dissenter and Brave that automatically block ads and have excellent web page to PDF conversion capabilities.

However, we also recommend viewing a web page with other browsers, such as MS Internet Explorer and Mozilla Firefox, to compare the results.

There have been documented cases of Google Chrome variant browsers censoring web pages in real time. While this may be an errant function of their ad blockers, it must be noted, and they cannot be completely trusted to display a web page accurately.

Additional Recommendations for archiving information from the web:

The ability to archive videos from the internet in standard file formats like MP4, so that they can be stored and distributed offline, is also very important to keeping the Powers That Be accountable.

We also recommend:

4K Video Downloader by Open Media: excellent for quickly and easily downloading videos from YouTube and many other video sharing platforms.

Camtasia by TechSmith: if you can view a video on your computer, Camtasia can record it from the screen and save it. However, it does take some practice to set up and use.

Thank you for watching this tutorial on how to effectively archive web pages as PDF documents.

Good luck and good hunting for the truth.