Manuel Cesarini.it - Web Developer & Digital Transformation Architect

Hugo and External APIs: Managing Cache for Always-Updated Content

Building static sites with Hugo is a fantastic experience: speed, security, and deployment simplicity are just some of the benefits. However, when you start integrating external APIs to power your site's content, you quickly run into a common challenge: managing Hugo's cache for API calls.

To keep builds fast, Hugo caches remote resources aggressively: by default, entries in the getResource cache never expire. This behavior is usually a virtue, but when the data provided by the API changes frequently it becomes an obstacle, because your site keeps displaying the cached response instead of the latest updates.

Fortunately, there are effective solutions to overcome this problem and ensure your Hugo site is always synchronized with the latest information from your APIs. Let's look at them.

Solution 1: Configure Hugo's Cache Times

The most direct way to control Hugo's caching is through its configuration. You can specify how long Hugo keeps remote resources in its cache, or disable caching for them entirely.

In your Hugo configuration file (hugo.yaml or config.toml), you can add or modify the caches section for getResource:

YAML
# hugo.yaml
caches:
  getResource:
    dir: ':cacheDir/:project'
    maxAge: '1m' # Cache for only 1 minute
    # To disable caching completely, use '0s'
    # maxAge: '0s'

Explanation:

dir: ':cacheDir/:project': Specifies the directory where Hugo stores cache files. ':cacheDir' and ':project' are predefined placeholders that resolve to Hugo's cache directory and the base directory name of the current project, respectively.
maxAge: '1m': This parameter is crucial. It sets how long a resource stored in the getResource cache stays valid, written as a duration such as '30s', '1m', or '2h'. With maxAge: '1m', any build that runs more than one minute after the resource was cached fetches it again. Hugo does not refresh anything between builds.
maxAge: '0s': If you need the data to be always up-to-date with every build, you can set maxAge to 0s. This effectively disables caching for getResource, ensuring that Hugo makes a new API call with each execution.

This solution is simple to implement and ideal when you have a clear idea of how often your data changes.

Solution 2: Cache Busting with a Unique URL Parameter

A more targeted approach is cache busting. It is often preferred when you don't want to disable caching completely but still need a specific API call to return the freshest data.

This technique involves adding a unique parameter (like a timestamp or a random number) to the API request URL. This way, each request will generate a slightly different URL, forcing Hugo (and any intermediate caching layers) to consider the resource as new and retrieve it again.

Here's how you can implement it in your Hugo code:

Code snippet (random number)

{{/* Build the Authorization header from the API token stored in the site params. */}}
{{ $bearer := (printf "Bearer %s" site.Params.api_token) }}
{{ $opts := dict "headers" (dict "Authorization" $bearer) }}
{{/* Pick a random integer from 1 to 100 to use as the cache-busting value. */}}
{{ $random := (index (seq 100 | shuffle) 0 )}}
{{/* Append the random number to the API URL as the _t query parameter. */}}
{{ $articlesUrl := (printf "%s/items/my_pages?_t=%d" site.Params.api_url $random) }}
{{ with resources.GetRemote $articlesUrl $opts | transform.Unmarshal }}
    {{ range .data }}
        {{/* Assemble the front matter, adding the subtitle only when it exists. */}}
        {{ $frontMatter := printf "---\nslug: %q\ntitle: %q\n" .slug .title }}
        {{ if .subtitle }}
            {{ $frontMatter = printf "%ssubtitle: %q\n" $frontMatter .subtitle }}
        {{ end }}
        {{ $frontMatter = printf "%s---\n" $frontMatter }}
        {{ $content := .body | safeHTML }}
        {{ $output := printf "%s\n%s" $frontMatter $content }}
        {{ $filename := printf "content/%s.md" .slug }}
        {{/* Create a resource from the generated text; calling .RelPermalink publishes it. */}}
        {{ $resource := resources.FromString $filename $output }}
        {{ $file := $resource.RelPermalink }}
    {{ end }}
{{ end }}

Explanation:

{{ $random := (index (seq 100 | shuffle) 0 )}}: seq 100 produces the integers 1 through 100, shuffle puts them in random order, and index takes the first element, so the result is a random number between 1 and 100 (inclusive). You can also use a timestamp instead: {{ $timestamp := now.Unix }} returns the current Unix timestamp, which is the number of seconds that have passed since January 1, 1970 (the Epoch), so builds started at least one second apart always get different values.
{{ $articlesUrl := (printf "%s/items/my_pages?_t=%d" site.Params.api_url $random) }}: Here, $random is added to the API URL as the _t query parameter (you can use any name for the parameter, as long as the API ignores it). For example, if site.Params.api_url is https://api.example.com, the URL might become https://api.example.com/items/my_pages?_t=22.

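Hugo caches remote resources per URL, so a URL it has not seen before forces a new request to the API. With a timestamp, every rebuild produces a different URL and therefore a fresh request. A random number drawn from only 100 values will eventually repeat, and when it does Hugo may serve the response it already cached for that URL, so prefer the timestamp when every build must fetch fresh data.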
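
The timestamp variant mentioned above can be sketched as follows. This is a minimal example under the same assumptions as the random-number snippet: site.Params.api_url and site.Params.api_token are set in your site configuration, and the API returns JSON with a data array. The warnf line is only a placeholder for your own processing, and the error message is illustrative.

```go-html-template
{{/* Sketch: use the current Unix timestamp as the cache-busting value. */}}
{{ $bearer := printf "Bearer %s" site.Params.api_token }}
{{ $opts := dict "headers" (dict "Authorization" $bearer) }}
{{ $timestamp := now.Unix }}
{{ $articlesUrl := printf "%s/items/my_pages?_t=%d" site.Params.api_url $timestamp }}
{{ with resources.GetRemote $articlesUrl $opts }}
    {{ with . | transform.Unmarshal }}
        {{ range .data }}
            {{/* Placeholder: replace with your own handling of each item. */}}
            {{ warnf "Fetched item %s" .slug }}
        {{ end }}
    {{ end }}
{{ else }}
    {{/* Fail the build if nothing came back, rather than publishing stale or empty content. */}}
    {{ errorf "Unable to fetch %s" $articlesUrl }}
{{ end }}
```

Because the timestamp changes on every build, this call never reuses a cached response, so reserve it for endpoints where freshness matters more than build speed.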

This method is particularly useful because "cache busting" is also a common practice for managing web resources (CSS, JavaScript) and integrates well with data update logic.

Which Solution to Choose?

Both solutions are valid and solve Hugo's cache problem with external APIs.

The caches.getResource.maxAge configuration is more suitable if you know how often your data changes and want a single rule that applies to every remote resource Hugo fetches.
Cache busting via timestamp/random number is more granular and flexible, allowing you to apply this logic only to specific API calls where data freshness is critical, without affecting other resources that could benefit from caching. It's also a more robust solution because it acts at the URL level, potentially affecting intermediate caches outside of Hugo as well.
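
As an illustration of that granularity, the sketch below busts the cache for one call and leaves another untouched. The endpoints items/news and items/countries are hypothetical names chosen for this example; site.Params.api_url and site.Params.api_token are assumed to be set as in the earlier snippet.

```go-html-template
{{ $bearer := printf "Bearer %s" site.Params.api_token }}
{{ $opts := dict "headers" (dict "Authorization" $bearer) }}

{{/* Freshness-critical data: the timestamp makes the URL unique on every build. */}}
{{ $newsUrl := printf "%s/items/news?_t=%d" site.Params.api_url now.Unix }}
{{ $news := resources.GetRemote $newsUrl $opts }}

{{/* Rarely changing data: the URL stays stable, so Hugo can reuse its cached copy. */}}
{{ $countriesUrl := printf "%s/items/countries" site.Params.api_url }}
{{ $countries := resources.GetRemote $countriesUrl $opts }}
```

Only the first request reaches the API on every build. The second follows whatever maxAge you configured for the getResource cache.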

Choose the solution that best suits your needs and the structure of your project. With these techniques, you can fully leverage the power of Hugo even when your content comes from dynamic external sources.
