Architecting for AI
Embedding system prompts, context cues, and usage patterns in documentation to guide AI behavior.
Applying the doctrine of AI-first documentation is not only about writing style, but also about content architecture. We need to design our documentation corpus in a way that integrates seamlessly with AI systems. This involves thinking about how docs are segmented, annotated, and even how they might “communicate” with the AI through prompts or metadata. Here are key strategies for architecting documentation that plays well with AI:
1. Use Frontmatter and Metadata as a Schema: In an AI-native doc system, every documentation page or module should begin with a metadata block (e.g., YAML frontmatter) that categorizes that content. Examples of useful metadata fields: version, product or subsystem, topic category, keywords, intended audience, last updated date, and any flags (like “experimental” or “deprecated”). This metadata is incredibly useful for an ingestion pipeline:
- It allows the search index to filter or boost results. If a query includes “PostgreSQL 14”, the system can prioritize chunks with
version: 14. - It lets the AI dynamically tailor responses. Knowing the intended audience (say “beginner” vs “expert”) could let an AI simplify an explanation if the user is a novice, or provide extra technical detail if the user is advanced.
- It helps maintain context. For instance, if a piece of documentation has
product: opsfoliovsproduct: surveilr, an AI chatbot that knows the user is asking about Surveilr can ignore or de-prioritize Opsfolio docs to avoid confusion. In a human scenario, a reader might accidentally read the wrong product’s docs and get confused; an AI can be smarter by using metadata to stay in the lane.
The key is to plan your frontmatter schema and use it consistently. This might require adding these metadata fields retroactively to legacy docs, but it’s a one-time effort that greatly enhances machine interpretability. Some modern documentation platforms and static site generators support YAML frontmatter natively, which we can leverage. Even if your current docs system doesn’t, you can maintain metadata in an external index or as comments in the docs that the ingestion script reads.
2. Embed AI-oriented Cues and Warnings: Think about scenarios where an AI might need to be cautious or add context. You can embed cues for these in the docs. A classic example is a guardrail prompt for sensitive or dangerous operations. Suppose you have a section of Linux documentation on formatting disks. For a human, you might write a warning: “Warning**:** This command will erase the disk.” An AI might or might not include that warning when giving an answer about disk formatting, depending on if it retrieves that part of the text. To be safe, make such warnings highly visible in text (using a clear “Warning:” label as we normally would) and consider an AI-specific note. For instance, in the documentation text itself one could add: “(AI note: Always remind the user to backup data before formatting.)”. If your ingestion pipeline captures such parenthetical notes (perhaps marked in a special way), the AI assistant could be programmed to incorporate them into its response as appropriate. Some documentation teams handle this by maintaining a system prompt file containing general guidelines (like “Always warn about destructive actions”) so the AI’s behavior is guided globally. But embedding the reminder next to the actual instruction in the docs is a belt-and-suspenders approach – it ensures that even if the global prompt is forgotten, the retrieved doc chunk carries the caution.
Another use of embedded cues is indicating related information. For example, a documentation page about an API endpoint could have a note: “See also: Authentication (required to use this endpoint)”. A human might see that and navigate accordingly. An AI, on retrieving that chunk, could use that cue to fetch the Authentication section as well for a more complete answer. In architecture terms, we’re linking nodes of knowledge in a way that an AI can follow. Simple explicit hyperlinks or references in text work because the vector search might catch the anchor text. Additionally, if you know certain questions span multiple modules, you might include a brief mention of one inside the other. For instance, in a module about “Troubleshooting login issues”, include a line like “If you need to reset your password, see the Password Reset section.” A user query about login problems that also involve password might cause the AI to retrieve both sections thanks to that cross-reference.
3. Incorporate Usage Patterns and Examples as First-class Documentation: AI systems trained on your docs can provide not just static answers, but dynamic examples and patterns. If you include rich examples in your documentation (which you absolutely should), consider structuring them in a way that is easy for the AI to extract. For instance, if documenting a function, have a sub-section “Example” with a step-by-step example usage. If documenting a concept, include a short Q&A in the doc itself (“Q: How do I do X? A: By ...”). These effectively pre-package a mini conversation that the AI can directly draw from. The AnythingLLM platform and others have shown that including FAQ-style content significantly boosts answer quality. It’s like pre-seeding the AI with a set of question-answer pairs.
Also, identify usage patterns – common ways users interact with your product – and ensure they are documented clearly. For example, a usage pattern in PostgreSQL might be “connecting to the database using SSL.” Instead of hoping the user pieces together an answer from various paragraphs, you might have a dedicated section “How to connect with SSL” that lists steps. This not only helps human readers but gives the AI a ready-made answer to a very likely question.
From an architecture perspective, this may lead you to create new sections or documentation pages that didn’t exist in the human-first docs. In human-first docs, you might not have an explicit “How do I…?” section if the information is sprinkled throughout a guide. But for AI consumption, it’s worth consolidating and making an explicit module for each significant user task or problem scenario. Essentially, think in terms of questions and answers when deciding what modules to include. A good practice is to gather actual user questions (from support tickets, forums, Stack Overflow, etc.) and ensure each one maps to a specific place in your documentation structure. If it doesn’t, create a doc for it (even if it overlaps with existing content). Redundancy is less harmful in AI docs than obscurity; if two questions have overlapping answers, you can have two entries and trust the AI to pick up the relevant pieces. It’s more important that the question’s phrasing exists somewhere in the docs so the AI can latch onto it.
4. Leverage Diagrams and Mermaid (with Descriptive Text): Visuals like architecture diagrams or flowcharts can be very helpful to human readers. LLMs can’t directly “see” an image, but if you use Mermaid diagrams (which are text-based) or include the textual description of an image, those become part of the ingestible content. For example, instead of just embedding an image of your system architecture, include a list of bullet points that describe the image (“Component A connects to B, then sends data to C…”). An AI might not present the image, but it can convey the same info using those bullets. Mermaid diagrams, being written in text, might even be parsed or at least partially understood by an LLM (they typically ignore code blocks, but a caption above it explaining the diagram could be useful). The key is: don’t let important info live only in images or non-textual formats. Always accompany visuals with text explanation. In AI-first docs, we sometimes write the descriptive text first (for the AI’s sake) and then include a diagram for the human. This ensures the knowledge in the diagram isn’t lost on the AI. Moreover, including diagrams as Mermaid in Markdown (like we do in this doctrine) means that even if an AI doesn’t process the diagram, a human viewing the docs on GitHub can see it rendered, so it’s a win-win for dual audiences.
5. System Prompt Integration: Consider maintaining a “System Guide” as part of your documentation set. This could be a document that outlines the tone, style, and safety rules for answering questions about your product. For example, it might say “Always answer with a polite, professional tone. If the question is about pricing or sales, refer them to contact sales. Do not reveal internal codenames or unannounced features.” Normally, this is something you’d configure in the AI platform rather than in docs. But by writing it down in a markdown file (even if you don’t publish it to end-users), you treat it as part of the documentation canon. Some integrated solutions ingest everything – you might choose to ingest the system guide with a tag like role: system so that your AI middleware knows to prepend it as a system message. The doctrine here is to treat the AI’s configuration as documentation too. It should be under version control, written in human language, and perhaps even reviewed by tech writers or subject matter experts. By formalizing it, you reduce the risk of a misconfigured AI (which is analogous to a doc with a big hole in it).
In practice, architecting for AI might involve re-organizing your documentation repository. You might create new folders for different content types (how-tos, reference, troubleshooting) corresponding to how you expect to index and retrieve them. You might introduce a naming convention for doc files that aligns with common queries (e.g., how-to-reset-password.md). The architecture should serve two masters: it should still make sense for a human browsing the repository or site, and it should be highly convenient for an AI to pinpoint relevant information.
Let’s apply these ideas briefly to an open-source example like SQLite. SQLite’s documentation in the old world is a mix of a long FAQ, a single long “lang_features” page, and some scattered files. In an AI-first re-architecture, we might split the FAQ into individual Q&A entries as separate markdown files (all under an “faq/” directory, each with frontmatter tags for topic). We’d break the language specification into modules for each concept (expressions, datatypes, functions, etc.). We’d add frontmatter indicating which SQLite version they apply to (since SQLite evolves over time). We might also add a top-level “system.md” describing SQLite’s typical usage contexts (since an AI might benefit from knowing that SQLite is embedded, file-based, etc., as a baseline). We ensure that every common user question (“How do I backup a SQLite database?”, “What is the maximum database size in SQLite?”) is answered explicitly in at least one module. The result: a user asks a chatbot “What’s the maximum size of a SQLite database?”, the bot finds the “maximum size” doc (with that phrase in the title or metadata), and returns the factual answer (e.g., ~281 TB for SQLite) along with any caveats mentioned. Without that explicit doc, the AI might try to remember or guess from training data, which could be outdated or wrong.
Architecting documentation for AI requires thinking of your documentation set as a structured data source and not just a collection of pages. It’s about organizing knowledge in a way that an AI can efficiently query it, much like designing a database schema for queries. By embedding rich context, guidance, and cross-links into the docs, we essentially “teach” the AI how to use our documentation effectively. In the following section, we zoom in on the concept of “fact modules” and patterns – essentially continuing this thread by discussing how to identify and package the facts in your product into AI-ready chunks.
How is this guide?
Last updated on