Document QnA

Document QnA (also known as RAG, Retrieval-Augmented Generation) is an element for querying and interacting with documents. Once you load your documents in, you can ask questions of the whole repository, and the model will respond with answers grounded in those documents. The model also provides citations showing where in your documents it found its information.
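
Conceptually, a RAG pipeline embeds your document chunks and your question into vectors, retrieves the chunks most similar to the question, and hands those chunks to the model as context for its answer. Here is a minimal sketch of that flow in Python using the sentence-transformers library; the model name and prompt format are illustrative assumptions, not the element's actual internals.

```python
# Minimal RAG sketch: embed chunks, retrieve the best match, build a prompt.
# Illustrative only -- the Document QnA element does all of this for you,
# and the model name and prompt format here are assumptions.
from sentence_transformers import SentenceTransformer, util

chunks = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday through Friday, 9am to 5pm.",
]

embedder = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
chunk_vectors = embedder.encode(chunks)

question = "How long do I have to return an item?"
question_vector = embedder.encode(question)

# Rank chunks by cosine similarity and keep the top match as context.
scores = util.cos_sim(question_vector, chunk_vectors)[0]
best = int(scores.argmax())

prompt = f"Answer using only this context:\n{chunks[best]}\n\nQuestion: {question}"
print(prompt)  # This prompt, plus the system prompt, goes to the base LLM.
```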

This article is a deep dive into the Document QnA element. To get started quickly, check out Templates: Document QnA.

  1. Start by creating a new Canvas and dragging the Document QnA element onto it.
  2. Open the Document QnA settings and make the following adjustments:
  • Trained Artifact (optional): If you have a model you’ve trained with the LLM Trainer, select it from the dropdown; this is where your trained models are stored. This step is optional for Document QnA.
  • Base Model Architecture: This is the base model you will be chatting with. For a full list of supported models and their strengths and weaknesses, see our Supported LLM Base Models.
  • Embedding/Retrieval Model: Embedding models, like LLMs, come in different sizes and capabilities. The default (mixedbread) offers a good balance of speed and quality; for the full list of available models, see https://www.sbert.net/docs/sentence_transformer/pretrained_models.html
  • Model System Prompt: The default prompt instructs your model to act as an assistant. You can leave this setting alone, or adjust it to change how the model responds.
    For example: ask the model to talk like a pirate or rock star, use a casual tone, or speak in plain language. If you have brand voice guidelines, this is the place to add them.
  • Temperature: A balance between predictability and creativity. Lower values prioritize learned patterns, giving more deterministic output; higher values encourage creativity and diversity (see the temperature sketch after this list).
  • Model Storage Path: Using the “Select Directory” button, choose the folder where you want to save the base model for the chat.
  • Max Tokens: A limit on the number of tokens in the LLM’s output. By default this is loaded from the Trained Artifact.
  • Model Adapter Folder Path: A fallback to the built-in model artifact registry. If you have a model adapter that is not in your registry, you can point to its folder here.
  • Document Chunk Size: The size, in words, of the chunks each document is broken into (the default is 500 words).
  • Document Overlap: The number of words adjacent chunks share; overlap preserves context that would otherwise be split at a chunk boundary. See the chunking sketch after this list.
  • Number of Docs per Query: The number of chunks retrieved from the document database for each query.
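
To make the Temperature setting concrete: temperature divides the model’s logits before the softmax, so low values sharpen the distribution toward the most likely token, while high values flatten it and allow more varied choices. A small sketch of the math (not the element’s implementation, and with made-up logits):

```python
# How temperature reshapes the next-token distribution (illustrative values).
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.asarray(logits) / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.2]
print(softmax_with_temperature(logits, 0.2))  # sharp: top token gets ~0.99
print(softmax_with_temperature(logits, 1.5))  # flat: probability spreads out
```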
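
And to see how Document Chunk Size and Document Overlap interact: each new chunk starts chunk-size-minus-overlap words after the previous one, so neighboring chunks share some text and ideas are less likely to be cut off at a boundary. Below is a minimal word-based sketch with an illustrative 50-word overlap (the element’s exact splitting logic may differ); at query time, Number of Docs per Query controls how many of these chunks are retrieved.

```python
# Word-based chunking with overlap, mirroring the Chunk Size and Overlap
# settings above. The 50-word overlap default here is an assumption.
def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    words = text.split()
    step = chunk_size - overlap  # each chunk starts this many words after the last
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

doc = "word " * 1200  # stand-in for a real document
print(len(chunk_words(doc)))  # 3 chunks: words 0-499, 450-949, 900-1199
```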