Sym Piracha

Docs as code: automating the generation process

I started my career working on documentation operations, or DocsOps as it’s often called in the industry.

In many SaaS companies, documentation is overlooked. Yet it’s critical to the developer experience (DX). Strong documentation helps users treat your product as a reliable abstraction layer and confidently build logic on top of it.

There are different types of documentation, each fulfilling a distinct purpose. At the foundation, however, is reference documentation. This layer covers the raw building blocks of your software: low-level API calls, method signatures, configuration parameters, and similar elements.

This post explores how reference documentation can be generated automatically and integrated into development workflows. The goal is to reduce manual intervention and ensure docs remain consistent and in sync with the codebase. I’ll focus on two common cases:

  1. API documentation — for example, REST APIs
  2. Library/SDK documentation — for example, JavaScript front-end libraries (though the same principles apply to any runtime or client library)

The broader objective is to treat docs as a first-class part of the product pipeline: automated, consistent, and always up to date.

Note: This approach is intended for larger teams and mature products where documentation is critical to the developer experience. For small projects or internal tools, a lightweight manual process is usually enough. But once your software becomes a platform for others to build on, treating documentation as code and automating its generation pays off quickly.

1. Annotate the code

The first step is to annotate your codebase with clear, structured comments on all public interfaces. This includes detailed descriptions of input parameters, return values, and possible error conditions. You can also use tags such as @deprecated, @since, or @experimental to mark elements and communicate lifecycle status and stability.
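
In a TypeScript codebase, for example, an annotated public function might look like this (the function itself is hypothetical, for illustration only):

```ts
/**
 * Creates a short-lived access token for the given user.
 *
 * @param userId - Unique identifier of the user to authenticate.
 * @param ttlSeconds - Token lifetime in seconds. Defaults to 3600.
 * @returns A signed token string for the Authorization header.
 * @throws {RangeError} If ttlSeconds is zero or negative.
 * @since 2.4.0
 * @deprecated Use createSession() instead; removal is planned for 3.0.
 */
export function createToken(userId: string, ttlSeconds = 3600): string {
  if (ttlSeconds <= 0) {
    throw new RangeError("ttlSeconds must be positive");
  }
  // Placeholder implementation for illustration only.
  return `${userId}.${Date.now() + ttlSeconds * 1000}`;
}
```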

In addition to annotations, it’s useful to implement lint rules and automated CI checks that prevent undocumented public APIs from being merged into the codebase. This ensures that documentation quality is enforced the same way as tests or style rules.
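
With ESLint, for instance, the open-source eslint-plugin-jsdoc package can enforce this. A minimal config sketch, assuming that plugin’s require-jsdoc rule and its publicOnly option:

```js
// .eslintrc.js: fail CI when exported declarations lack doc comments.
module.exports = {
  plugins: ["jsdoc"],
  rules: {
    // publicOnly scopes the rule to exported (public) declarations.
    "jsdoc/require-jsdoc": [
      "error",
      {
        publicOnly: true,
        require: {
          FunctionDeclaration: true,
          ClassDeclaration: true,
          MethodDefinition: true,
        },
      },
    ],
    // Require real descriptions, not just bare @param/@returns tags.
    "jsdoc/require-param-description": "error",
    "jsdoc/require-returns-description": "error",
  },
};
```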

Lowering the burden matters. Engineers will resist if annotation feels heavy. Lightweight conventions (like short JSDoc comments, decorators, or Swagger annotations) plus CI hints that show “what’s missing” keep compliance realistic. The goal is to make documentation part of the normal development rhythm, not an afterthought or a chore.

API documentation

For REST services, the recommended approach is to annotate the code following the OpenAPI specification. It provides a standard way to describe endpoints, parameters, request/response schemas, and error codes. OpenAPI is widely supported by tooling that can generate interactive documentation, client SDKs, and server stubs.

When documenting a REST API, annotate each endpoint with the following information (a sketch follows the list):

  • HTTP Method: GET, POST, PUT, DELETE, PATCH
  • Path: For example, /users/{id}
  • Description: What the endpoint does
  • Parameters: Query parameters, path parameters, and headers
  • Request Body: JSON or XML payloads, with schema definitions
  • Responses: Status codes and example payloads
  • Errors: Common failure codes and their meaning
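
Putting these together, here is a sketch of one fully annotated endpoint, assuming an Express handler and the swagger-jsdoc convention of embedding OpenAPI YAML in doc comments (the /users/{id} endpoint and its schema are hypothetical):

```ts
import express from "express";

const app = express();

/**
 * @openapi
 * /users/{id}:
 *   get:
 *     summary: Fetch a single user by ID.
 *     parameters:
 *       - name: id
 *         in: path
 *         required: true
 *         schema:
 *           type: string
 *     responses:
 *       "200":
 *         description: The requested user.
 *         content:
 *           application/json:
 *             schema:
 *               type: object
 *               properties:
 *                 id: { type: string }
 *                 email: { type: string }
 *       "404":
 *         description: No user exists with the given ID.
 */
app.get("/users/:id", (req, res) => {
  // Placeholder handler for illustration only.
  res.json({ id: req.params.id, email: "user@example.com" });
});
```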

Library and SDK documentation

For libraries and SDKs, the same principle applies: describe the public interface in a consistent, machine-readable way. The tooling, however, differs. Since these projects do not expose HTTP endpoints, OpenAPI isn’t applicable. Instead, use language-specific documentation generators such as TypeDoc (for JavaScript/TypeScript), Javadoc (for Java), or Sphinx (for Python). These tools parse inline comments and annotations to produce reliable reference docs that evolve in step with the code.

While the format differs, the objective is the same: provide a stable, authoritative description of the API surface that can feed into the documentation pipeline.

2. Produce an IR

When releasing a new version of an API or library, documentation should be generated automatically. Rather than jumping directly to a finished docs site, it’s best to first produce an intermediate representation (IR).

The IR is a stable artifact that captures the public interface at a given version. It decouples the source of truth (your code) from the presentation layer (HTML/Markdown), making documentation pipelines more flexible and repeatable.

  • APIs: Generate or update the OpenAPI specification during the build. Normalize and validate the spec, resolve all $refs (including multi-file specs), and freeze it as a versioned artifact. This becomes the input for downstream tooling such as API reference renderers, client SDK generators, or mock servers.

  • Libraries and SDKs: Use language-specific tools like TypeDoc (JavaScript/TypeScript), Javadoc (Java), or Sphinx (Python). These can output different formats. An HTML site is the simplest option but offers limited customizability. Alternatively, a structured IR format such as JSON (TypeDoc) or YAML (DocFX) provides more flexibility and can be transformed into consistent docs experiences across multiple languages.

Despite format differences, the principle remains the same: every release should produce a versioned snapshot of the public interface. This IR can then be consumed by the documentation pipeline without depending directly on the product’s build tool.
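
As a sketch of that freeze step for the OpenAPI case, using the open-source @apidevtools/swagger-parser package (the file paths and the RELEASE_VERSION variable are assumptions):

```ts
// freeze-openapi.ts: validate the spec, inline all $refs, and write a
// versioned IR artifact.
import SwaggerParser from "@apidevtools/swagger-parser";
import { mkdir, writeFile } from "node:fs/promises";

async function freezeSpec(specPath: string, version: string): Promise<void> {
  // validate() checks the document against the OpenAPI schema;
  // dereference() resolves every $ref, including cross-file references.
  await SwaggerParser.validate(specPath);
  const resolved = await SwaggerParser.dereference(specPath);

  const outDir = `docs-ir/${version}`;
  await mkdir(outDir, { recursive: true });
  await writeFile(`${outDir}/openapi.json`, JSON.stringify(resolved, null, 2));
  console.log(`Froze OpenAPI IR for ${version} at ${outDir}/openapi.json`);
}

freezeSpec("openapi.yaml", process.env.RELEASE_VERSION ?? "dev").catch((err) => {
  console.error(err);
  process.exit(1); // Fail the build if the spec is invalid.
});
```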

Why not skip the IR? It’s tempting to jump straight to generating Markdown or HTML, but the intermediate layer buys you flexibility. By decoupling “what the code exposes” from “how the docs look,” you can evolve your documentation site generator without forcing every product repo to change. This matters once you have multiple teams, stacks, or runtimes in play. In small projects, the overhead may not be justified — but at scale, IRs are a guardrail against fragmentation.

3. Trigger the documentation pipeline

Once the IR has been generated, the release process should also notify the documentation system so that presentable docs can be built and deployed.

Think of this as a producer–consumer relationship: the product codebase produces a versioned IR bundle, and the documentation infrastructure consumes that bundle to render pages.

Each run should output a stable, versioned IR package that contains the following (see the sketch after the list):

  • The IR itself (for example, openapi.yaml or typedoc.json).
  • Minimal metadata such as version, commit SHA, and build timestamp.
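
A small release-time script can assemble such a package. A sketch, assuming the docs-ir/<version> layout from the previous step and a manifest shape invented for illustration:

```ts
// package-ir.ts: bundle the IR with minimal build metadata.
import { execSync } from "node:child_process";
import { mkdirSync, writeFileSync } from "node:fs";

const version = process.env.RELEASE_VERSION ?? "0.0.0-dev";
const commitSha = execSync("git rev-parse HEAD").toString().trim();

const manifest = {
  version,
  commitSha,
  builtAt: new Date().toISOString(),
  // Tell consumers which IR files are included in this bundle.
  artifacts: ["openapi.json"],
};

mkdirSync(`docs-ir/${version}`, { recursive: true });
writeFileSync(
  `docs-ir/${version}/manifest.json`,
  JSON.stringify(manifest, null, 2),
);
console.log(`Wrote IR manifest for ${version} (${commitSha.slice(0, 7)})`);
```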

When to trigger

Documentation builds should run automatically whenever the public interface changes. Common triggers include:

  • Creating a new release tag.
  • Opening a pull request that modifies the API surface.
  • Issuing a manual trigger for backfilled versions.

How it flows

The product repository publishes the IR as an artifact (for example, in a storage bucket, release asset, or registry). The documentation pipeline then ingests that artifact, independent of the product’s build process, and uses it to generate the reference site.
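
On the producer side, publishing can be as simple as uploading the bundle to whatever store the docs pipeline watches. A sketch, where the artifact URL and token are hypothetical stand-ins for your bucket SDK, release-asset upload, or registry client:

```ts
// publish-ir.ts: upload the versioned IR bundle to an artifact store.
import { readFile } from "node:fs/promises";

async function publishIr(version: string): Promise<void> {
  for (const name of ["openapi.json", "manifest.json"]) {
    const body = await readFile(`docs-ir/${version}/${name}`);
    const res = await fetch(
      `https://artifacts.example.com/docs-ir/${version}/${name}`,
      {
        method: "PUT",
        headers: { Authorization: `Bearer ${process.env.ARTIFACT_TOKEN}` },
        body,
      },
    );
    if (!res.ok) {
      throw new Error(`Upload of ${name} failed: ${res.status}`);
    }
  }
}

publishIr(process.env.RELEASE_VERSION ?? "dev").catch((err) => {
  console.error(err);
  process.exit(1);
});
```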

This separation ensures that:

  • The product team only needs to guarantee the IR is accurate and versioned.
  • The documentation team can choose whatever rendering mechanism they prefer without being tied to the product repo’s tooling.

Clear ownership boundaries help here. Typically, product engineers own generating and publishing the IR as part of their release process, while the docs or platform team owns consuming that IR and rendering it into a site. This avoids finger-pointing if docs builds fail: each side has a well-defined responsibility.

By structuring the workflow this way, documentation scales across multiple teams and services while remaining consistent, automated, and in sync with the codebase.

4. Generate documentation pages

Once the IR is produced, the next step is to transform it into human-readable documentation pages. These can be Markdown, MDX, or static HTML, depending on your site generator and publishing stack.

  • Generate pages automatically for each release.
  • Build ephemeral previews for pull requests so reviewers can see changes in context.
  • Provide stable permalinks (for example, /v/3.29.3/...) so version-specific references remain valid over time.

This ensures that documentation is always synchronized with the corresponding release.

Generating pages

If the IR is an OpenAPI specification, you can use widely available tools that consume it to generate API reference pages. For SDKs, language-specific IR formats (for example, TypeDoc JSON or DocFX YAML) may first need to be converted into Markdown or another markup format before being integrated into your docs site.

The challenge is that each IR format is slightly different. A robust approach is to introduce a translation layer:

  • For each type of IR, write a converter that transforms it into a common, generic representation (such as Markdown).
  • Feed that representation into your documentation build system (like Hugo, Jekyll, or something completely custom for more complex use cases).

This is similar to how compilers work: front ends for different source languages lower programs into a shared intermediate representation, and a common back end takes it from there to any supported target. By treating documentation IRs the same way, you only need to add or swap converters when adopting a new tool, while the rest of the documentation pipeline stays unchanged.
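
As an illustration, here is a sketch of one such converter, from TypeDoc JSON to Markdown. The JSON shape it assumes (a children array and comment.summary text parts) matches recent TypeDoc versions, but treat it as an assumption and verify against the TypeDoc version you pin:

```ts
// typedoc-to-md.ts: convert a TypeDoc JSON IR into one Markdown page
// per exported declaration. Paths and JSON shape are assumptions.
import { mkdirSync, readFileSync, writeFileSync } from "node:fs";

interface Reflection {
  name: string;
  kind: number;
  comment?: { summary?: { text: string }[] };
  children?: Reflection[];
}

function summaryText(node: Reflection): string {
  return (node.comment?.summary ?? []).map((part) => part.text).join("");
}

function render(node: Reflection): string {
  const lines = [`# ${node.name}`, "", summaryText(node) || "_Undocumented._"];
  for (const child of node.children ?? []) {
    lines.push("", `## ${child.name}`, "", summaryText(child) || "_Undocumented._");
  }
  return lines.join("\n");
}

const project: Reflection = JSON.parse(readFileSync("typedoc.json", "utf8"));
mkdirSync("docs/reference", { recursive: true });
for (const mod of project.children ?? []) {
  writeFileSync(`docs/reference/${mod.name}.md`, render(mod));
}
```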

Writing custom translation layers is nontrivial. One of the main challenges is schema diversity: OpenAPI, TypeDoc, Javadoc, and Sphinx all represent concepts like methods, parameters, and return types in different ways. To reduce this complexity, align on standards within your organization wherever possible so that teams are not reinventing conversions for every project.

Keep maintenance in mind. Translation layers should be reused, not reinvented for each team. The safest path is to lean on established standards and existing open-source converters whenever possible, and only write custom logic for gaps. Otherwise, you risk turning the pipeline itself into another system that constantly breaks as formats evolve.

Another challenge is evolving specifications. Formats such as OpenAPI or language-specific doc generators evolve over time, and keeping translation logic up to date requires continuous maintenance. Without dedicated ownership, the translation layer can quickly become brittle.

Because of these difficulties, the sustainable path is reuse before building, with a clear owner for whatever custom logic remains. This keeps the pipeline flexible without adding unnecessary overhead.

5. Validate and enforce quality gates

Before publishing, run automated checks to catch errors and enforce consistency:

  • Verify that there are no broken links, unresolved schema references, or missing descriptions.
  • Run code examples through compilers or interpreters to confirm they work.
  • Enforce accessibility and SEO best practices.
  • Apply editorial style checks (for example, Microsoft Style Guide rules) to maintain a consistent tone and voice.

These gates reduce manual review effort and ensure a minimum level of quality across all docs.
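
As one example of such a gate, here is a sketch of an internal link checker for generated Markdown, where the directory layout and relative-link convention are assumptions:

```ts
// check-links.ts: fail the build when a generated Markdown page links
// to a page that does not exist.
import { existsSync, readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

const docsDir = "docs/reference";
const linkPattern = /\]\((\.\/[^)#]+\.md)/g; // Relative .md links only.
let broken = 0;

for (const file of readdirSync(docsDir)) {
  if (!file.endsWith(".md")) continue;
  const text = readFileSync(join(docsDir, file), "utf8");
  for (const match of text.matchAll(linkPattern)) {
    const target = join(docsDir, match[1]);
    if (!existsSync(target)) {
      console.error(`${file}: broken link to ${match[1]}`);
      broken++;
    }
  }
}

if (broken > 0) process.exit(1); // Gate: block publishing on broken links.
```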

6. Perform human review (when needed)

Automation handles the heavy lifting, but human review still plays an important role:

  • Provide reviewers with a preview URL so they can see changes in context.
  • Use a “docs diff” tool to highlight only the modified sections side by side.

This makes it easier to focus reviews on clarity, accuracy, and usability, rather than formatting or style compliance.


Conclusion

By automating documentation generation, teams reduce drift between code and docs, minimize manual effort, and deliver a more reliable developer experience. Treating documentation as code ensures it evolves alongside your product, scales across teams, and remains a trusted part of the platform.