OpenMAIC: A Three-Stage Pipeline Architecture for Course Generation

Course Content Generation Process

Course content generation proceeds in stages, each transforming user inputs into structured educational materials with the help of AI language models. This document outlines the steps in each stage.

Stage 1: Request Processing and Initial Response Handling

Overview

In this stage, initial requests are processed to generate an outline of the course material. This includes parsing input data, constructing prompts for language models, and handling responses from those models.

Input Data

The input data includes user requirements such as course goals, duration, and reference materials like PDF content summaries and available images. The system also handles image mappings and web search results to enrich the context provided to the AI model.

Example Request

{
  "requirements": {
    "duration_minutes": 20,
    "subject": "生物",
    "topic": "光合作用"
  },
  "pdf_images": [
    { "id": "img_1", "description": "示意图" }
  ],
  "web_search_results": ["2024最新研究表明..."],
  "image_mapping": { "img_1": "data:image/png;base64,..."}
}
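The request payload above can be modeled as a TypeScript interface. This is a sketch inferred from the example; the interface name and exact types are assumptions, not the project's actual definitions.

```typescript
// Sketch of the stage-1 request payload, inferred from the example above.
// The interface name and field types are assumptions.
interface OutlineRequest {
  requirements: {
    duration_minutes: number;
    subject: string;
    topic: string;
  };
  pdf_images: { id: string; description: string }[];
  web_search_results: string[];
  image_mapping: Record<string, string>; // image id -> base64 data URL
}

const exampleRequest: OutlineRequest = {
  requirements: { duration_minutes: 20, subject: "生物", topic: "光合作用" },
  pdf_images: [{ id: "img_1", description: "示意图" }],
  web_search_results: ["2024最新研究表明..."],
  image_mapping: { img_1: "data:image/png;base64,..." }
};
```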

Prompt Construction

Prompts are constructed based on the input data. These prompts include user requirements, language settings, reference materials, and web search results.

Example User Prompt

## 用户要求
帮我创建一个关于光合作用的初中生物课程,时长20分钟

## 语言设置
**必需的语言**: zh-CN

## 参考材料

### PDF 内容摘要
光合作用是指绿色植物通过叶绿素...

### 可用图片
[Image img_1] 光合作用示意图 (尺寸: 800×600, 宽高比: 1.33)

### 网络搜索结果
2024年最新研究表明...
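A minimal sketch of how a prompt like the one above might be assembled from the request fields. The function name is hypothetical, and the image lines omit the dimensions shown in the example for brevity.

```typescript
// Hypothetical prompt assembly: joins the request fields into the
// markdown sections shown in the example prompt above.
function buildUserPrompt(
  userRequest: string,
  language: string,
  pdfSummary: string,
  images: { id: string; description: string }[],
  webResults: string[]
): string {
  const imageLines = images
    .map((img) => `[Image ${img.id}] ${img.description}`)
    .join("\n");
  return [
    `## 用户要求\n${userRequest}`,
    `## 语言设置\n**必需的语言**: ${language}`,
    `## 参考材料`,
    `### PDF 内容摘要\n${pdfSummary}`,
    `### 可用图片\n${imageLines}`,
    `### 网络搜索结果\n${webResults.join("\n")}`,
  ].join("\n\n");
}
```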

Model Interaction

The system interacts with the language model to generate an initial outline of the course content in JSON format.

Example API Call Parameters

const callParams = {
  model: languageModel,
  system: prompts.system,
  prompt: prompts.user,
  maxOutputTokens: modelInfo?.outputWindow
};

Response Handling

Responses from the AI model are processed incrementally to extract and enrich course outlines.

Example Incremental JSON Parsing

for await (const chunk of result.textStream) {
  fullText += chunk;

  const newOutlines = extractNewOutlines(fullText, parsedOutlines.length);

  for (const outline of newOutlines) {
    const enrichedOutline = enrichOutline(outline);
    parsedOutlines.push(enrichedOutline);

    controller.enqueue(encoder.encode(`data: ${JSON.stringify({
      type: 'outline',
      data: enrichedOutline,
      index: parsedOutlines.length - 1
    })}\n\n`));
  }
}
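The extractNewOutlines helper used above is not shown in the source. One plausible implementation scans the partial stream for complete top-level JSON objects by tracking brace depth (skipping string contents and escapes), then returns only the objects beyond those already parsed:

```typescript
// Hypothetical sketch of extractNewOutlines: find complete top-level JSON
// objects in a partially streamed array and return only the new ones.
function extractNewOutlines(fullText: string, alreadyParsed: number): unknown[] {
  const objects: unknown[] = [];
  let depth = 0;
  let start = -1;
  let inString = false;
  for (let i = 0; i < fullText.length; i++) {
    const ch = fullText[i];
    if (inString) {
      if (ch === "\\") i++;           // skip the escaped character
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") {
      if (depth === 0) start = i;     // start of a top-level object
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0 && start >= 0) {
        try {
          objects.push(JSON.parse(fullText.slice(start, i + 1)));
        } catch {
          // incomplete or malformed fragment; ignore it
        }
        start = -1;
      }
    }
  }
  return objects.slice(alreadyParsed);
}
```

An object still being streamed never reaches depth zero, so it is simply picked up on a later chunk.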

Output Example

{
  "id": "scene_1",
  "type": "slide",
  "title": "光合作用概述",
  "description": "介绍光合作用的定义、场所和基本过程",
  "keyPoints": [
    "光合作用的定义",
    "叶绿体是光合作用的场所"
  ],
  "teachingObjective": "理解光合作用的基本概念",
  "estimatedDuration": 180,
  "order": 1,
  "suggestedImageIds": ["img_1"],
  "mediaGenerations": [
    {
      "type": "image",
      "prompt": "示意图",
      "elementId": "gen_img_1"
    }
  ]
}

Stage 2: Outline to Scene Content Generation

Overview

In this stage, the system takes each course outline and generates detailed content for the scene based on its type (slide, quiz, interactive, or PBL).

Input Data

The input data includes a single outline object along with all other outlines in the course. It also contains context such as PDF images, image mappings, and additional information like stage name and ID.

Example Request

const request = {
  outline: { /* a single outline object */ },
  allOutlines: [ /* all outlines in the course */ ],
  pdfImages: [{ id: "img_1", description: "示意图" }],
  imageMapping: { img_1: "data:image/png;base64,..." },
  stageInfo: {
    name: "光合作用课程",
    language: "zh-CN"
  },
  stageId: "stage_abc123"
};

Type Distribution

The system distributes the generation process based on the type of outline (slide, quiz, etc.) to different content generators.

Example Type Handling Logic

export async function generateSceneContent(
  outline: SceneOutline,
  aiCall: AICallFn,
  ...): Promise<GenerationResult | null> {

  switch (outline.type) {
    case 'slide':
      return generateSlideContent(outline, aiCall, ...);
    case 'quiz':
      return generateQuizContent(outline, aiCall, ...);
    default:
      return null;
  }
}

Slide Content Generation

For slide content generation, the system constructs prompts and interacts with AI models to produce detailed slide components.

Example Prompt Construction for Slides

const prompts = buildPrompt(PROMPT_IDS.SLIDE_CONTENT, {
  title: outline.title,
  description: outline.description,
  keyPoints: (outline.keyPoints || []).map((p, i) => `${i + 1}. ${p}`).join('\n'),
  elements: '(根据要点自动生成)',
  assignedImages: assignedImagesText
});

Final Response Handling

Responses are parsed and enriched to produce final content for each scene.

Example JSON Parsing and Enrichment

const response = await aiCall(prompts.system, prompts.user);
const generatedData = parseJsonResponse<GeneratedSlideData>(response);

// post-processing...
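A plausible sketch of parseJsonResponse, assuming its job is to strip any markdown code fence the model wraps around its JSON before parsing (the actual implementation may differ):

```typescript
// Hypothetical sketch of parseJsonResponse: remove a surrounding markdown
// code fence, if present, then parse the JSON payload. FENCE is built at
// runtime so this sketch embeds cleanly in documentation.
function parseJsonResponse<T>(response: string): T {
  const FENCE = "`".repeat(3);
  let cleaned = response.trim();
  if (cleaned.startsWith(FENCE)) {
    cleaned = cleaned
      .replace(new RegExp(`^${FENCE}(?:json)?\\s*`), "")
      .replace(new RegExp(`${FENCE}\\s*$`), "");
  }
  return JSON.parse(cleaned.trim()) as T;
}
```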

Summary

The process involves multiple stages of data parsing, prompt construction, model interaction, and response handling to generate structured educational content. Each stage is modular and flexible, allowing course outlines and detailed scene content to be generated efficiently from user inputs.

Element Types

TextElement

Text elements are used to display text content on the canvas. Here is an example of how a TextElement might be defined in JSON format:

{
  "id": "text_001",
  "type": "text",
  "left": 60,
  "top": 80,
  "width": 880,
  "height": 76,
  "content": "<p style=\"font-size: 24px;\">Title text</p>",
  "defaultFontName": "",
  "defaultColor": "#333333"
}

ImageElement

Image elements are used to display images within the canvas. The ImageElement JSON definition includes details such as position, size, and source image identifier:

{
  "id": "image_001",
  "type": "image",
  "left": 100,
  "top": 150,
  "width": 400,
  "height": 300,
  "src": "img_1",
  "fixedRatio": true
}

Text Size Adjustment

The TextElement height is often adjusted based on the text size. For example, a larger font size might require a taller element to prevent text overflow:

{
  "id": "text_large",
  "type": "text",
  "left": 60,
  "top": 80,
  "width": 880,
  "height": 120,   // Larger height for bigger font sizes
  "content": "<p style=\"font-size: 36px;\">Large Title</p>",
  "defaultFontName": "",
  "defaultColor": "#333333"
}

Text Size Guidelines

The height of a text element should be determined based on the font size and number of lines. For example, in the table below, each row indicates the recommended height for different font sizes:

| Font Size (px) | Recommended Height (px) |
|---|---|
| 16 | 40 |
| 24 | 76 |
| 36 | 120 |
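The table above can be encoded as a small lookup helper. The single-line values come from the table; the fallback for other sizes and the per-extra-line increment are assumptions, not documented rules.

```typescript
// Height recommendation helper. Single-line values follow the table above;
// the 2.5x fallback and the 1.4x-per-extra-line increment are assumptions.
function recommendedHeight(fontSizePx: number, lines: number = 1): number {
  const singleLine: Record<number, number> = { 16: 40, 24: 76, 36: 120 };
  const base = singleLine[fontSizePx] ?? Math.round(fontSizePx * 2.5);
  return base + Math.round((lines - 1) * fontSizePx * 1.4);
}
```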

Element Positioning

Elements should be positioned relative to the canvas bounds. Common positioning strategies include center alignment and left or right alignment, depending on content type. To center an element horizontally, set left = (canvasWidth − width) / 2; on the 1000px-wide canvas implied by the earlier examples, an 880px-wide element is centered at left 60:

{
  "id": "center_aligned",
  "type": "text",
  "left": 60,
  "top": 200,
  "width": 880,
  "height": 76,
  "content": "<p style=\"font-size: 24px;\">Center Aligned Text</p>",
  "defaultFontName": "",
  "defaultColor": "#333333"
}
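The centering rule can be expressed as a small helper, assuming a 1000px-wide canvas (inferred from the 60 + 880 + 60 layout used in the earlier examples; the constant is an assumption):

```typescript
// Compute the left offset that horizontally centers an element.
// CANVAS_WIDTH is an assumption inferred from the document's examples.
const CANVAS_WIDTH = 1000;

function centeredLeft(elementWidth: number, canvasWidth: number = CANVAS_WIDTH): number {
  return Math.round((canvasWidth - elementWidth) / 2);
}
```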

Background and Border

Elements may have a background or border to enhance their visibility:

{
  "id": "bordered",
  "type": "text",
  "left": 60,
  "top": 80,
  "width": 880,
  "height": 76,
  "content": "<p style=\"font-size: 24px;\">Bordered Text</p>",
  "defaultFontName": "",
  "defaultColor": "#333333",
  "borderWidth": 1,
  "borderColor": "#000000",
  "backgroundColor": "#FFFFFF"
}

Design Requirements

Speech Content

Generated speech content should be natural and engaging. It must also maintain continuity throughout the session:

{
  "type": "text",
  "content": "同学们好!今天我们来学习光合作用,这是绿色植物最重要的生理过程之一。"
}

Opening/Transition

The opening of each page should feel cohesive with the previous content. For example, a greeting is only necessary on the first page:

{
  "type": "text",
  "content": "让我们来看看光合作用的三个关键点:定义、场所和反应式。"
}

Closing Summary

The last page should summarize and conclude the session effectively:

{
  "type": "text",
  "content": "今天我们学习了光合作用的基本概念,希望大家能熟练掌握这些知识。"
}

Action Sequences

Action sequences define the timing and order of elements appearing on the screen. Each action corresponds to a specific element ID or content type:

{
  "type": "action",
  "name": "spotlight",
  "params": { "elementId": "title_001" }
}
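The action object above might be typed roughly as follows. This is a sketch inferred from the examples in this document (the id and timing fields appear only after post-processing), not the project's actual type definition.

```typescript
// Sketch of an action's shape, inferred from the examples in this document.
interface Action {
  type: "action";
  name: string;                      // e.g. "spotlight"
  params?: { elementId?: string };
  id?: string;                       // assigned during post-processing
  timing?: number;                   // milliseconds from scene start
}

const spotlight: Action = {
  type: "action",
  name: "spotlight",
  params: { elementId: "title_001" }
};
```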

Timing Adjustment

Timing adjustments ensure smooth transitions between elements and actions. Typically, each action occurs after a predefined delay (e.g., every 3 seconds):

import { nanoid } from 'nanoid';

function processActions(
  actions: Action[],
  elements: PPTElement[],  // reserved for element-aware timing
  agents?: AgentInfo[]     // reserved for agent-driven actions
): Action[] {
  return actions.map((action, index) => ({
    ...action,
    id: `action_${nanoid(8)}`,
    timing: index * 3000, // default delay of 3 seconds between actions
  }));
}

Error Handling and Recovery

In case an action fails to execute properly (e.g., invalid element ID), fallback mechanisms ensure the session continues smoothly:

{
  "type": "action",
  "name": "spotlight",
  "params": {
    "elementId": elements.find(el => el.id === action.params?.elementId)
      ? action.params.elementId
      : elements[0]?.id
  }
}

By following these guidelines and structures, the course content can be effectively generated and presented in an interactive and engaging manner.