
Wget Advanced Examples and Use Cases

From handling cookies and session data to managing SSL certificates, connecting through proxies, and sending custom headers, these are some of Wget's most advanced use cases.

Introduction

Wget is not just for simple file downloads. It also supports options for cookies, custom headers, proxies, SSL certificates, and more. With them you can maintain authenticated sessions, download from behind a corporate proxy, work with private certificate authorities, and script batch, parallel, or recursive downloads, all from the command line. Below are some advanced use cases and examples to help you get the most out of Wget.

Note: if you're looking for a more general, basic Wget tutorial and cheat sheet, see The Ultimate Wget Guide and Cheat Sheet.


Handling Cookies and Sessions

Using a Cookie File

If you have exported browser cookies to a file in Netscape cookies.txt format, or saved them earlier with Wget, you can load them in subsequent requests:

wget --load-cookies=cookies.txt https://example.com/private/data.zip
  • --load-cookies: Loads cookies from the specified file (e.g., cookies.txt).
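
--load-cookies expects the Netscape cookies.txt format that browser export extensions commonly produce: an optional comment header, then one cookie per line with seven tab-separated fields. As a sketch, you can also write such a file by hand; the domain, the cookie name sessionid, and its value below are placeholders:

```shell
#!/usr/bin/env bash
# Write a Netscape-format cookies.txt by hand.
# Fields (tab-separated): domain, include-subdomains, path, secure-only,
# expiry (Unix epoch seconds), name, value.
# "example.com", "sessionid" and "abc123" are placeholders.
expires=$(( $(date +%s) + 86400 ))   # valid for one day from now
{
  printf '# Netscape HTTP Cookie File\n'
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    example.com FALSE / TRUE "$expires" sessionid abc123
} > cookies.txt

# Then load it as shown above:
#   wget --load-cookies=cookies.txt https://example.com/private/data.zip
```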

Saving Cookies

To save cookies returned by the server to a file for future use:

wget --save-cookies=auth_cookies.txt --keep-session-cookies https://example.com/login
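  • --save-cookies: Saves cookies to the specified file when Wget exits.
  • --keep-session-cookies: Also saves session cookies (cookies without an expiry date), which --save-cookies omits by default; login sessions usually depend on them.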
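
A plain GET request to a login page rarely logs you in; most sites expect the credentials in a POST. The sketch below follows the two-step pattern from the Wget manual (post the form while saving cookies, then load them), assuming a hypothetical form at https://example.com/login with fields named username and password. Check the real form's action URL and field names in the page source, and note that the values must be URL-encoded. The credentials go through a temporary file with --post-file, so they don't appear on Wget's command line:

```shell
#!/usr/bin/env bash
# Hypothetical helper: log in once, then reuse the session cookies.
# The login URL and the form field names are assumptions; adapt them.
login_and_download() {
  local user=$1 pass=$2 file_url=$3 form status
  form=$(mktemp) || return 1          # mktemp creates the file with mode 0600
  # Values must already be URL-encoded (e.g. "&" as %26).
  printf 'username=%s&password=%s' "$user" "$pass" > "$form"

  # Step 1: POST the form and store all cookies, session cookies included.
  # --delete-after discards the returned HTML; only the cookie file matters.
  wget --save-cookies=auth_cookies.txt --keep-session-cookies \
       --post-file="$form" --delete-after \
       https://example.com/login
  status=$?
  rm -f "$form"
  [ "$status" -eq 0 ] || return "$status"

  # Step 2: send the saved cookies with the protected download.
  wget --load-cookies=auth_cookies.txt "$file_url"
}
```

For example: login_and_download username 'secret' https://example.com/private/data.zip. Keep in mind that a rejected login usually still returns a normal web page, so a zero exit status from the first step doesn't prove you are logged in.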

Using Custom Headers

Sometimes you need to send custom headers (e.g., when accessing APIs):

wget --header="Authorization: Bearer your_token_here" https://api.example.com/data
  • --header: Adds an HTTP header to the request.

You can repeat --header to send several custom headers.
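
Repeating --header works well in a small wrapper. This sketch assumes the token is exported in an environment variable named API_TOKEN (a placeholder name), which keeps it out of your shell history, and refuses to run without it; the function name api_get and the default output file data.json are also illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical helper: GET a JSON API endpoint with several custom headers.
# API_TOKEN is an assumed environment variable holding the bearer token.
api_get() {
  local url=$1 out=${2:-data.json}
  if [ -z "${API_TOKEN:-}" ]; then
    echo "api_get: API_TOKEN is not set" >&2
    return 1
  fi
  # Each --header option adds one request header.
  wget --header="Accept: application/json" \
       --header="Authorization: Bearer ${API_TOKEN}" \
       -O "$out" "$url"
}
```

For example, after exporting API_TOKEN: api_get https://api.example.com/data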

Working with Proxies

HTTP Proxy

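Wget takes the proxy from the http_proxy or https_proxy environment variable, depending on the scheme of the URL being fetched. Set the matching variable, or pass the setting with the -e option:

export http_proxy="http://proxyserver:port"
export https_proxy="http://proxyserver:port"
wget https://example.com/

Or:

wget -e https_proxy=http://proxyserver:port https://example.com/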
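
To keep proxy settings for one project instead of exporting them for the whole shell session, you can put them in a separate startup file and select it with the WGETRC environment variable, which Wget reads in place of ~/.wgetrc. In this sketch the file name proxy.wgetrc, the proxy address proxy.internal:3128, and the no_proxy entries are placeholders:

```shell
#!/usr/bin/env bash
# Write a per-project Wget startup file with proxy settings.
# proxy.internal:3128 and the no_proxy list are placeholder values.
cat > proxy.wgetrc <<'EOF'
use_proxy = on
http_proxy = http://proxy.internal:3128/
https_proxy = http://proxy.internal:3128/
no_proxy = localhost,127.0.0.1,.corp.example.com
EOF

# Then point Wget at it for just this command:
#   WGETRC=./proxy.wgetrc wget https://example.com/
```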

SOCKS Proxy

Wget has no built-in SOCKS support, so wrap it with torsocks (which routes traffic through Tor) or proxychains (which works with any SOCKS proxy):

torsocks wget https://example.com/
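
torsocks sends traffic through Tor specifically; for an arbitrary SOCKS proxy, proxychains is the more general wrapper. A sketch for proxychains-ng (the proxychains4 command) with a private config file, assuming a SOCKS5 proxy listening on 127.0.0.1:1080, for example an SSH tunnel opened with ssh -D 1080 user@jumphost (host and port are placeholders):

```shell
#!/usr/bin/env bash
# Private proxychains-ng config that routes connections through one SOCKS5 proxy.
# 127.0.0.1:1080 is a placeholder, e.g. a tunnel from: ssh -D 1080 user@jumphost
cat > proxychains.conf <<'EOF'
strict_chain
proxy_dns
[ProxyList]
socks5 127.0.0.1 1080
EOF

# Then wrap Wget; -f selects this config, -q silences proxychains' own log lines:
#   proxychains4 -q -f ./proxychains.conf wget https://example.com/
```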

SSL Certificate Handling

Ignoring Certificate Errors (Insecure)

For testing or in trusted internal networks:

wget --no-check-certificate https://example.com/
  • --no-check-certificate: Skips validation of the server’s TLS certificate, which leaves the connection open to man-in-the-middle attacks; use it only when you trust both the network and the target.

Using Custom CA Certificates

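If the server uses a self-signed certificate or one issued by a private CA, give Wget the CA certificate in PEM format (for a self-signed certificate, the certificate itself) instead of disabling verification: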

wget --ca-certificate=/path/to/ca.crt https://internal.example.com/
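
If the server's certificate is self-signed and you don't have a copy yet, you can save it with openssl and pass the file to --ca-certificate. This is trust on first use, so compare the printed SHA-256 fingerprint with one obtained through a trusted channel before relying on the file. For certificates issued by a private CA, ask for the CA certificate instead of saving the server's own certificate. A sketch, with a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: save a server's self-signed certificate as PEM
# and print its SHA-256 fingerprint for out-of-band comparison.
save_server_cert() {
  local host=$1 out=${2:-ca.crt}
  # s_client prints the handshake details; openssl x509 keeps only the
  # first certificate in that output, which is the server's own.
  openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -outform PEM > "$out" || return
  openssl x509 -in "$out" -noout -fingerprint -sha256
}
```

For example, run save_server_cert internal.example.com ca.crt, check the fingerprint, then use the --ca-certificate command above.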

Batch Downloads & Input Files

Download Multiple URLs from a File

wget -i urls.txt
  • -i: Reads URLs from urls.txt and downloads each in sequence.
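
The input file is plain text with one URL per line. When the URLs follow a pattern, you can generate the list rather than type it; the report-N.pdf naming here is hypothetical. In the commands that follow, -nc skips files that already exist locally, so the download is safe to re-run, -P sets the target directory, and -i - reads the list from standard input:

```shell
#!/usr/bin/env bash
# Generate a URL list for a hypothetical series of numbered reports.
n=1
while [ "$n" -le 5 ]; do
  printf 'https://example.com/reports/report-%d.pdf\n' "$n"
  n=$((n + 1))
done > urls.txt

# Download them into ./reports, skipping files that already exist:
#   wget -nc -P reports -i urls.txt
# Or filter the list and feed it on standard input:
#   grep '\.pdf$' urls.txt | wget -nc -P reports -i -
```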

Parallel Downloads

Wget doesn’t download in parallel on its own, but you can run several Wget processes at once with xargs:

xargs -P 4 -n 1 wget < urls.txt
  • -P 4: Allows up to 4 simultaneous Wget processes.
  • -n 1: Passes one URL at a time to Wget.
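
A slightly more robust variant of the same pattern, sketched as a reusable function (its name and defaults are illustrative): -nv cuts each Wget process down to one summary line so parallel output stays readable, -nc skips files that already exist, and xargs -r avoids running Wget at all when the list is empty. Note that xargs -P and Wget's -P mean different things here:

```shell
#!/usr/bin/env bash
# Hypothetical helper: download every URL in a list file with N parallel Wget processes.
parallel_fetch() {
  local list=$1 jobs=${2:-4} dest=${3:-downloads}
  mkdir -p "$dest" || return
  # xargs: -r skip empty input, -P run up to $jobs processes, -n 1 one URL each.
  # wget:  -nv one summary line per file, -nc skip existing files, -P target directory.
  xargs -r -P "$jobs" -n 1 wget -nv -nc -P "$dest" < "$list"
}
```

For example, parallel_fetch urls.txt 8 downloads. GNU xargs exits with status 123 if any Wget invocation failed, so you can check $? afterwards.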

Advanced Authentication Scenarios

Basic HTTP Authentication

wget --user="username" --password="secret" https://example.com/protected/file.zip
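
Passing --password on the command line stores the password in your shell history, and other local users may be able to see it in the process list while Wget runs. --ask-password makes Wget prompt for it instead. A minimal sketch (the helper name is illustrative):

```shell
#!/usr/bin/env bash
# Hypothetical helper: HTTP basic auth without putting the password on the command line.
fetch_protected() {
  local user=$1 url=$2
  # --ask-password makes Wget prompt for the password interactively.
  wget --user="$user" --ask-password "$url"
}
```

For example: fetch_protected username https://example.com/protected/file.zip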

Using .wgetrc

You can place common credentials or default options in your ~/.wgetrc:

user=username
password=secret

Then simply run:

wget https://example.com/protected/file.zip
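
Values in ~/.wgetrc are Wget's defaults for every download, not just one site, and a file holding a password should be readable only by you. One way to address both, sketched below, is a separate startup file created with mode 600 and selected only when needed via the WGETRC environment variable; the file name site.wgetrc and the credentials are placeholders:

```shell
#!/usr/bin/env bash
# Create a private startup file holding credentials (placeholder values).
(
  umask 077   # files created in this subshell are readable by the owner only
  cat > site.wgetrc <<'EOF'
user = username
password = secret
EOF
)
chmod 600 site.wgetrc   # enforce the mode even if the file already existed

# Use it only for the downloads that need it:
#   WGETRC=./site.wgetrc wget https://example.com/protected/file.zip
```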

Combining Wget with Other Tools

  1. Piping Output to Other Commands:
    You can stream a download directly into another tool:
wget -qO- https://example.com/data.csv | head -n 10
  • -qO-: Quiet mode (-q) with output written to stdout (-O -) instead of a file.
  2. Chaining Commands:
    You can combine Wget with tar to fetch and extract an archive in one command, without saving it to disk:
wget -qO- https://example.com/archive.tar.gz | tar -xzf -
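
A pipeline's exit status is normally that of its last command, so a failed download is reported only if tar happens to notice the broken input. The sketch below (helper name illustrative) enables pipefail so that a failure in either command fails the whole step, and extracts into a chosen directory:

```shell
#!/usr/bin/env bash
# Hypothetical helper: stream a .tar.gz into a directory without saving the archive.
# The body runs in a subshell, so "set -o pipefail" does not leak into the caller.
fetch_and_extract() (
  set -o pipefail            # fail if either wget or tar fails
  url=$1 dest=${2:-.}
  mkdir -p "$dest"
  wget -qO- "$url" | tar -xzf - -C "$dest"
)
```

For example: fetch_and_extract https://example.com/archive.tar.gz ./archive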

Advanced Recursive Scraping

Limiting Depth & File Types

wget -r -l 2 -A "*.jpg" https://example.com/gallery/
  • -l 2: Limits recursion depth to 2 levels deep.
  • -A "*.jpg": Accepts only files ending in .jpg.
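
You will often also want -np (--no-parent), so recursion never climbs above /gallery/, and -nd (--no-directories), so the images land in one folder instead of a copy of the site's directory tree. Wget still downloads the HTML pages it needs for following links and deletes them afterwards because they don't match -A. A sketch with an illustrative helper name and default directory:

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect JPEG images up to two levels below a gallery URL.
fetch_gallery_images() {
  local url=$1 dest=${2:-gallery}
  # -r -l 2: recurse two levels;  -np: never ascend to the parent directory;
  # -nd: no local directory tree; -A: keep .jpg/.jpeg only; -P: target directory.
  wget -r -l 2 -np -nd -A "jpg,jpeg" -P "$dest" "$url"
}
```

For example: fetch_gallery_images https://example.com/gallery/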

Mirroring with Timestamps

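wget -m https://example.com/
  • -m (--mirror): Turns on recursion with unlimited depth plus timestamping (-N, --timestamping), so a separate -N is not needed.
  • Timestamping: On later runs, Wget downloads only files whose remote copy is newer than the local one (or differs in size).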
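
For a copy you can browse offline, -m is commonly combined with link conversion and page requisites. --wait pauses between requests to reduce the load on the server; the one-second value in this sketch is an arbitrary choice, and the helper name is illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical helper: mirror a site for offline browsing.
mirror_site() {
  local url=$1
  # --mirror: recursion + timestamping; --convert-links: point links at local copies;
  # --adjust-extension: save HTML/CSS with matching file extensions;
  # --page-requisites: also fetch the images, CSS and scripts each page needs;
  # --no-parent: stay below the start directory; --wait=1: pause 1 s between requests.
  wget --mirror --convert-links --adjust-extension --page-requisites \
       --no-parent --wait=1 "$url"
}
```

For example: mirror_site https://example.com/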
