Orchard Core AI Data Sources - Prompt Templates

Configure AI Data Sources

You are an Orchard Core expert. Generate code, configuration, and recipes for configuring AI data sources in Orchard Core using CrestApps modules to enable knowledge base indexing and RAG (Retrieval-Augmented Generation) search.

Guidelines

AI Data Sources provide knowledge base indexing and RAG search capabilities for AI profiles in Orchard Core.
A data source maps a source index (e.g., Lucene, Elasticsearch, Azure AI Search content index) to an AI knowledge base index that stores chunked, embedded documents for vector search.
The indexing pipeline reads documents from the source index, generates embeddings via a configured embedding deployment, chunks content, and writes vector documents into the knowledge base index.
Supported vector search backends are Azure AI Search and Elasticsearch. Install the matching backend module for your environment.
The DataSourceAlignmentBackgroundTask runs daily at 2 AM to keep knowledge base indexes aligned with their mapped data sources.
Content item changes (create, update, publish, unpublish, remove) are automatically tracked and queued for re-indexing via DataSourceContentHandler.
Data source configuration (source index, knowledge base index, field mappings) is locked after initial creation and cannot be changed.
AI profiles reference data sources on the Knowledge tab, where you configure strictness, top-N documents, in-scope filtering, and OData filters.
Strictness controls how closely results must match the query. Top-N documents limits how many retrieved documents are included in the AI context.
Always secure API keys using user secrets or environment variables; never hardcode them.
Install CrestApps packages in the web/startup project.

Feature Overview

Feature	Feature ID	Description
AI Data Sources (Core)	`CrestApps.OrchardCore.AI.DataSources`	Core data source management, indexing pipeline, and RAG search
AI Data Sources - Azure AI Search	`CrestApps.OrchardCore.AI.DataSources.AzureAI`	Azure AI Search backend for embeddings, vector search, and indexing
AI Data Sources - Elasticsearch	`CrestApps.OrchardCore.AI.DataSources.Elasticsearch`	Elasticsearch backend for embeddings, vector search, and indexing

NuGet Packages

Package	Description
`CrestApps.OrchardCore.AI.DataSources`	Core data source management and RAG search
`CrestApps.OrchardCore.AI.DataSources.AzureAI`	Azure AI Search support for data source embeddings and vector search
`CrestApps.OrchardCore.AI.DataSources.Elasticsearch`	Elasticsearch support for data source embeddings and vector search

Install the core package plus at least one backend package in your web/startup project.

Enabling AI Data Sources with Azure AI Search

{
  "steps": [
    {
      "name": "Feature",
      "enable": [
        "CrestApps.OrchardCore.AI",
        "CrestApps.OrchardCore.AI.DataSources",
        "CrestApps.OrchardCore.AI.DataSources.AzureAI",
        "OrchardCore.AzureAI"
      ],
      "disable": []
    }
  ]
}

Enabling AI Data Sources with Elasticsearch

{
  "steps": [
    {
      "name": "Feature",
      "enable": [
        "CrestApps.OrchardCore.AI",
        "CrestApps.OrchardCore.AI.DataSources",
        "CrestApps.OrchardCore.AI.DataSources.Elasticsearch",
        "OrchardCore.Elasticsearch"
      ],
      "disable": []
    }
  ]
}

How Data Sources Work

Architecture Overview

Source Index — An existing Orchard Core index profile (Lucene, Elasticsearch, or Azure AI Search) that contains the original content documents.
AI Knowledge Base Index — A vector-enabled index profile (Azure AI Search or Elasticsearch) that stores chunked, embedded documents for semantic search.
Embedding Deployment — An AI deployment of type Embedding (e.g., text-embedding-3-large) used to generate vector embeddings for document chunks.
Data Source — The configuration object that links a source index to a knowledge base index, specifying key, title, and content field mappings.

Indexing Pipeline

When a data source is synchronized:

Documents are read from the source index via IDataSourceDocumentReader.
Content fields are extracted and chunked based on field mappings (key, title, content).
Embeddings are generated for each chunk using the configured embedding deployment.
Embedded chunks are written to the knowledge base index via IDataSourceContentManager.

RAG Search Flow

When an AI profile with a linked data source receives a query:

The query is embedded using the same embedding deployment.
A vector (KNN) search is executed against the knowledge base index.
Top-N matching documents are retrieved, filtered by strictness and optional OData filters.
Retrieved context is injected into the AI prompt for grounded, knowledge-based responses.

Automatic Alignment

Content Changes — DataSourceContentHandler tracks content item lifecycle events (created, updated, published, unpublished, removed) and queues affected documents for re-indexing or removal.
Source Index Sync — When a source index is synchronized, all data sources using that source index are automatically re-indexed.
Knowledge Base Index Sync — When a knowledge base index is synchronized, all data sources mapped to it are re-indexed.
Background Alignment — DataSourceAlignmentBackgroundTask runs daily at 2 AM (0 2 * * *) and calls SyncAllAsync to align all knowledge base indexes.

Key Services

Service	Description
`DataSourceIndexingService`	Core indexing service that orchestrates document reading, embedding, and writing
`IAIDataSourceIndexingQueue`	Request-scoped queue that collects work items and processes them after the HTTP response completes
`IAIDataSourceIndexingService`	Interface for sync, delete, and document management operations
`DataSourceAlignmentBackgroundTask`	Daily background task for full alignment of all data sources
`IDataSourceContentManager`	Provider-specific service for vector search and document management
`IDataSourceDocumentReader`	Provider-specific service for reading documents from the knowledge base index

Creating Data Sources

Creating a Data Source via Admin UI

Navigate to Artificial Intelligence → Data Sources.
Click Create.
Enter a Display Name for the data source.
Select the Source Index Profile (the existing content index to read from).
Select the AI Knowledge Base Index Profile (the vector-enabled index to write to).
Map the Key Field, Title Field, and Content Field from the source index.
Click Save.

After creation, the source index, knowledge base index, and field mappings are locked and cannot be changed.

Creating a Data Source via Recipe

{
  "steps": [
    {
      "name": "AIDataSource",
      "DataSources": [
        {
          "ItemId": "my-data-source-id",
          "DisplayText": "My Knowledge Base",
          "SourceIndexProfileName": "content-index",
          "AIKnowledgeBaseIndexProfileName": "kb-vector-index",
          "KeyFieldName": "ContentItemId",
          "TitleFieldName": "Content.ContentItem.DisplayText",
          "ContentFieldName": "Content.ContentItem.BodyPart.Body"
        }
      ]
    }
  ]
}

Triggering a Manual Re-Index

After creating or modifying source content, you can trigger a manual re-index:

Navigate to Artificial Intelligence → Data Sources.
Find the target data source in the list.
Click Sync Index to queue a full re-indexing of that data source.

Linking Data Sources to AI Profiles

Configuring a Profile via Admin UI

Navigate to Artificial Intelligence → Profiles and edit the target AI profile.
Go to the Knowledge tab.
Select a Data Source from the dropdown.
Configure retrieval parameters:
- Strictness — How closely results must match (1–5).
- Top N Documents — Maximum number of documents to retrieve (1–20).
- In Scope — When enabled, restricts AI responses to only the retrieved context.
- Filter — Optional OData filter expression for additional result filtering.
Click Save.

Configuring a Profile via Recipe

{
  "steps": [
    {
      "name": "AIProfile",
      "profiles": [
        {
          "Name": "knowledge-assistant",
          "DisplayText": "Knowledge Assistant",
          "Type": "Chat",
          "TitleType": "InitialPrompt",
          "ChatDeploymentName": "gpt-4o",
          "UtilityDeploymentName": "gpt-4o-mini",
          "Properties": {
            "AIProfileMetadata": {
              "SystemMessage": "You are a helpful assistant. Answer questions using the provided knowledge base context."
            },
            "DataSourceMetadata": {
              "DataSourceId": "my-data-source-id"
            },
            "AIDataSourceRagMetadata": {
              "Strictness": 3,
              "TopNDocuments": 5,
              "IsInScope": true,
              "Filter": null
            }
          }
        }
      ]
    }
  ]
}

Configuring the Knowledge Base Index

Embedding Deployment

Each knowledge base index profile requires an embedding deployment. When creating or editing a knowledge base index profile of type DataSourceConstants.IndexingTaskType, you select the embedding deployment from available deployments that support AIDeploymentType.Embedding.

Configuring Default RAG Settings

Default strictness and top-N documents can be configured in site settings:

Navigate to Configuration → Settings → AI.
Find the Data Sources section.
Set Default Strictness (1–5) and Default Top N Documents (1–20).
Click Save.

These defaults are applied when a profile does not specify its own values.

Azure AI Search Index Fields

When using Azure AI Search as the knowledge base backend, the index profile handler automatically configures these fields:

Field	Type	Properties
`ChunkId`	Text	Key, Filterable
`ReferenceId`	Text	—
`DataSourceId`	Text	Filterable
`ReferenceType`	Text	Filterable
`ChunkIndex`	Integer	—
`Title`	Text	Searchable
`Content`	Text	Searchable
`Timestamp`	DateTime	Filterable, Sortable
`Embedding`	Vector	Dimensions from embedding deployment

Elasticsearch Index Fields

When using Elasticsearch as the knowledge base backend, the index profile handler configures these mappings:

Field	Mapping Type	Notes
`ChunkId`	Keyword	Key field
`ReferenceId`	Keyword	—
`DataSourceId`	Keyword	—
`ReferenceType`	Keyword	—
`ChunkIndex`	Integer	—
`Title`	Text	—
`Content`	Text	Default search field
`Timestamp`	Date	—
`Embedding`	DenseVector	Cosine similarity
`Filters`	Object	Dynamic mapping

Deployment Support

Data sources can be included in Orchard Core deployment plans:

Navigate to Configuration → Import/Export → Deployment Plans.
Add an AI Data Source step.
Choose Include All or select specific data sources.
Execute the deployment plan.

Security Best Practices

Only users with the ManageAIDataSources permission (default role Administrator) can create, edit, or delete data sources.
Secure embedding API keys using user secrets or environment variables.
Use OData filters on AI profiles to restrict which knowledge base documents are accessible per profile.
Avoid using AuthenticationType: "None" for any connected AI services in production environments.
Review data source field mappings carefully before creation — they cannot be changed after the initial save.