Transformers.js

Overview and Core Value

Transformers.js is an open-source JavaScript library that runs Hugging Face Transformer models directly in the browser and in Node.js, with no server dependency. It brings state-of-the-art machine learning into the JavaScript ecosystem, supporting natural language processing, computer vision, audio processing, and multimodal tasks.

Core value at a glance:

Python training frameworks → ONNX export → Transformers.js → browser/Node.js deployment
        ↓                         ↓               ↓                      ↓
 PyTorch/TensorFlow          .onnx model      JavaScript runtime    web front ends /
                             files                                  edge devices

Core Architecture and Technical Features

Runtime Architecture

Transformers.js uses a layered architecture and executes models through the ONNX Runtime in the browser. The library supports several backends, including WebAssembly, WebGPU, and WebNN, and automatically selects the most suitable execution path for the current environment.

Architecture overview:

[Input data] → [Preprocessing] → [ONNX model] → [Inference engine] → [Postprocessing] → [Output]
      ↓              ↓                ↓                  ↓                    ↓              ↓
 text / images   tokenizer      serialized         ONNX Runtime,        format          predictions /
                                computation graph  hardware accel.      conversion      structured data

Technical Features

Transformers.js has several standout characteristics: immediate feedback, since user input is processed directly in the browser with no round trip to a server; privacy, since all data stays on the client and sensitive information never leaves the user's device; lightweight deployment, since models are downloaded and executed on the client, reducing server load and bandwidth; and multimodal support, covering text, images, audio, and more.

Installation and Configuration

Installation and Imports

Install Transformers.js via npm:

bash
npm install @xenova/transformers
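
Alternatively, if you are not using a bundler, the library can be loaded straight from a CDN as an ES module. A minimal sketch, assuming the jsdelivr build (pin a version in production):

javascript
// Inside a <script type="module"> tag, no build step required
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers';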

ESM import:

javascript
import { pipeline, Tensor, env } from '@xenova/transformers';

// Configure library environment settings
env.allowLocalModels = false;
env.backends.onnx.wasm.numThreads = 1;
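
Beyond these flags, the env object also controls where models are loaded from and how they are cached. A minimal self-hosting sketch (the /models/ path is only an illustrative assumption, not a required location):

javascript
import { env } from '@xenova/transformers';

// Serve converted ONNX models from your own server instead of the Hugging Face Hub
env.allowRemoteModels = false;    // never fall back to the Hub
env.allowLocalModels = true;      // enable loading from localModelPath
env.localModelPath = '/models/';  // hypothetical path where the converted models are hosted
env.useBrowserCache = true;       // cache downloaded files via the browser Cache API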

Runtime Environment Detection

javascript
import { env } from '@xenova/transformers';

class EnvironmentDetector {
    static checkCapabilities() {
        const capabilities = {
            hasWebGPU: typeof navigator !== 'undefined' && 
                       navigator.gpu !== undefined,
            hasWASM: typeof WebAssembly === 'object',
            backends: env.backends
        };
        
        console.log('Environment capabilities:', capabilities);
        return capabilities;
    }
    
    static async optimizeForHardware() {
        const caps = this.checkCapabilities();
        
        // Device selection is passed per pipeline via the `device` option
        // (WebGPU requires Transformers.js v3, published as @huggingface/transformers)
        if (caps.hasWebGPU) {
            console.log('WebGPU available - pass { device: "webgpu" } when creating pipelines');
            return 'webgpu';
        }
        
        console.log('Using the WASM backend');
        return 'wasm';
    }
}

// Run the environment check on startup
await EnvironmentDetector.optimizeForHardware();

Core APIs and Common Tasks

Pipeline API

Pipeline is the core API of Transformers.js. It wraps the complete inference flow: input preprocessing, model inference, and output postprocessing.

Pipeline workflow:

Raw input → [task lookup] → [model loading] → [preprocessing] → [inference] → [result parsing] → structured output
    ↓            ↓                ↓                  ↓                ↓               ↓                  ↓
 text data   e.g. text        auto-download     tokenization /    ONNX runtime    probability        labels / scores
             classification                     normalization                     computation
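
At its simplest, this whole flow collapses into two calls: create a pipeline once, then invoke it like a function (a minimal sketch using the default sentiment-analysis model):

javascript
import { pipeline } from '@xenova/transformers';

// Downloads and caches the default model on first use
const classifier = await pipeline('sentiment-analysis');

// Returns something like [{ label: 'POSITIVE', score: 0.99 }]
const output = await classifier('I love Transformers.js!');
console.log(output);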

A fuller example that wraps several tasks in a small manager class:

javascript
import { pipeline } from '@xenova/transformers';

class TransformersPipeline {
    constructor() {
        this.pipes = new Map();
    }
    
    // Initialize a pipeline for a given task
    async initPipeline(task, modelId = null) {
        try {
            // pipeline(task, model): a null model id falls back to the task's default model
            const pipe = await pipeline(task, modelId);
            this.pipes.set(task, pipe);
            console.log(`${task} pipeline ready`);
            return pipe;
        } catch (error) {
            console.error(`Pipeline initialization failed: ${error}`);
            throw error;
        }
    }
    
    // Sentiment analysis
    async analyzeSentiment(text) {
        let pipe = this.pipes.get('sentiment-analysis');
        if (!pipe) {
            pipe = await this.initPipeline('sentiment-analysis');
        }
        
        const results = await pipe(text);
        return results.map(result => ({
            label: result.label,
            score: Math.round(result.score * 100) / 100,
            // The default SST-2 model reports POSITIVE/NEGATIVE labels; the score is a confidence, not a polarity
            sentiment: result.label === 'POSITIVE' ? 'positive' : 'negative'
        }));
    }
    
    // Text generation
    async generateText(prompt, maxLength = 50) {
        let pipe = this.pipes.get('text-generation');
        if (!pipe) {
            pipe = await this.initPipeline('text-generation', 'Xenova/distilgpt2');
        }
        
        const results = await pipe(prompt, {
            max_new_tokens: maxLength,
            temperature: 0.7,
            do_sample: true
        });
        
        return results[0].generated_text;
    }
    
    // Question answering
    async answerQuestion(question, context) {
        let pipe = this.pipes.get('question-answering');
        if (!pipe) {
            pipe = await this.initPipeline('question-answering');
        }
        
        const result = await pipe({
            question: question,
            context: context
        });
        
        return {
            answer: result.answer,
            confidence: Math.round(result.score * 1000) / 1000,
            start: result.start,
            end: result.end
        };
    }
}

// Usage example
const demo = async () => {
    const pipelineManager = new TransformersPipeline();
    
    // Sentiment analysis example
    const sentiment = await pipelineManager.analyzeSentiment('I love this product!');
    console.log('Sentiment:', sentiment);
    
    // Text generation example
    const generated = await pipelineManager.generateText('Once upon a time');
    console.log('Generated text:', generated);
    
    // Question answering example
    const qa = await pipelineManager.answerQuestion(
        'What is the capital of France?',
        'Paris is the capital and most populous city of France.'
    );
    console.log('QA result:', qa);
};

Custom Models and Preprocessing

javascript
import { AutoTokenizer, AutoModelForSequenceClassification, Tensor } from '@xenova/transformers';

class CustomTextClassifier {
    constructor() {
        this.tokenizer = null;
        this.model = null;
        this.labels = ['negative', 'positive']; // index order matches the model's id2label mapping
    }
    
    async initialize(modelId = 'Xenova/distilbert-base-uncased-finetuned-sst-2-english') {
        console.log('Loading custom classifier...');
        
        // Load the tokenizer and model in parallel
        [this.tokenizer, this.model] = await Promise.all([
            AutoTokenizer.from_pretrained(modelId),
            AutoModelForSequenceClassification.from_pretrained(modelId)
        ]);
        
        console.log('Custom classifier loaded');
    }
    
    async classify(text) {
        if (!this.tokenizer || !this.model) {
            await this.initialize();
        }
        
        // Tokenize the input manually (returns input_ids and attention_mask tensors)
        const inputs = this.tokenizer(text, {
            padding: true,
            truncation: true
        });
        
        // Run inference
        const outputs = await this.model(inputs);
        
        // Convert logits to probabilities
        const logits = outputs.logits;
        const probabilities = this.softmax(Array.from(logits.data));
        
        const results = probabilities.map((prob, index) => ({
            label: this.labels[index] || `class ${index}`,
            score: Math.round(prob * 100) / 100
        }));
        
        // Sort by score, highest first
        results.sort((a, b) => b.score - a.score);
        return results;
    }
    
    softmax(arr) {
        const max = Math.max(...arr);
        const exp = arr.map(x => Math.exp(x - max));
        const sum = exp.reduce((a, b) => a + b);
        return exp.map(x => x / sum);
    }
    
    // Batch processing
    async classifyBatch(texts) {
        const results = [];
        
        for (const text of texts) {
            const result = await this.classify(text);
            results.push({
                text: text,
                prediction: result[0]
            });
        }
        
        return results;
    }
}

// Using the custom classifier
const classifierDemo = async () => {
    const classifier = new CustomTextClassifier();
    
    const singleResult = await classifier.classify('This movie is fantastic!');
    console.log('Single classification:', singleResult);
    
    const batchResults = await classifier.classifyBatch([
        'I love this product!',
        'This is terrible.',
        'It is okay, not great.'
    ]);
    console.log('Batch classification:', batchResults);
};

Computer Vision and Multimodal Applications

Image Classification

javascript
import { pipeline, Tensor } from '@xenova/transformers';

class ImageProcessor {
    constructor() {
        this.classifier = null;
        this.zeroShot = null;
    }
    
    async initialize() {
        // Initialize the image classification pipeline (default model)
        this.classifier = await pipeline('image-classification');
        
        // Initialize the zero-shot image classification pipeline
        this.zeroShot = await pipeline('zero-shot-image-classification');
        
        console.log('Image processor ready');
    }
    
    // Basic image classification (accepts an image URL or RawImage)
    async classifyImage(image) {
        if (!this.classifier) await this.initialize();
        
        const results = await this.classifier(image);
        return results.slice(0, 3); // return the top 3 results
    }
    
    // Zero-shot image classification against arbitrary candidate labels
    async zeroShotClassification(image, candidateLabels) {
        if (!this.zeroShot) await this.initialize();
        
        const results = await this.zeroShot(image, candidateLabels);
        return results.map(result => ({
            label: result.label,
            score: Math.round(result.score * 100) / 100
        }));
    }
    
    // Classify an image directly from a URL (image pipelines accept URLs and fetch them internally)
    async classifyImageFromUrl(imageUrl) {
        return this.classifyImage(imageUrl);
    }
}

// Image processing example
const imageDemo = async () => {
    const processor = new ImageProcessor();
    
    // Zero-shot classification on an image already in the page
    const imageUrl = document.getElementById('test-image').src;
    const zeroShotResults = await processor.zeroShotClassification(
        imageUrl,
        ['animal', 'human', 'landscape', 'building', 'food']
    );
    console.log('Zero-shot classification:', zeroShotResults);
};

Multimodal Pipelines

javascript
import { pipeline } from '@xenova/transformers';

class MultimodalProcessor {
    async initialize() {
        // Initialize the visual question answering pipeline
        this.vqa = await pipeline('visual-question-answering');
        
        // Initialize the document question answering pipeline
        this.documentQA = await pipeline('document-question-answering');
        
        console.log('Multimodal processor ready');
    }
    
    // Visual question answering (image URL or RawImage plus a question string)
    async visualQuestionAnswering(image, question) {
        if (!this.vqa) await this.initialize();
        
        const result = await this.vqa(image, question);
        return {
            answer: result.answer,
            score: Math.round(result.score * 100) / 100
        };
    }
    
    // Document question answering (e.g. answering questions about a scanned document)
    async documentQuestionAnswering(image, question) {
        if (!this.documentQA) await this.initialize();
        
        const result = await this.documentQA(image, question);
        return result;
    }
}

Performance Optimization and Advanced Features

WebGPU Acceleration

Transformers.js v3 adds WebGPU support, which can deliver a significant speedup over the WASM backend. Note that v3 is published on npm as @huggingface/transformers.

javascript
// WebGPU and the dtype option require Transformers.js v3 (@huggingface/transformers)
import { pipeline } from '@huggingface/transformers';

class OptimizedPipeline {
    static async createOptimizedPipeline(task, modelId) {
        const options = {
            dtype: 'q8',        // load a quantized model variant
            device: 'webgpu',   // request the WebGPU backend
            progress_callback: (data) => {
                if (data.status === 'progress') {
                    console.log(`Loading ${data.file}: ${Math.round(data.progress)}%`);
                }
            }
        };
        
        try {
            const pipe = await pipeline(task, modelId, options);
            console.log('Optimized pipeline created (device: webgpu)');
            return pipe;
        } catch (error) {
            console.warn('WebGPU unavailable, falling back to WASM:', error);
            return await pipeline(task, modelId, { ...options, device: 'wasm' });
        }
    }
    
    // Batched processing with periodic yields to the event loop
    static async processBatch(pipe, inputs, batchSize = 4) {
        const results = [];
        
        for (let i = 0; i < inputs.length; i += batchSize) {
            const batch = inputs.slice(i, i + batchSize);
            const batchResults = await Promise.all(
                batch.map(input => pipe(input))
            );
            results.push(...batchResults);
            
            // yield to the event loop so the UI stays responsive
            await new Promise(resolve => setTimeout(resolve, 0));
        }
        
        return results;
    }
}

// Performance monitoring
class PerformanceMonitor {
    constructor() {
        this.metrics = {
            inferenceTimes: [],
            memoryUsage: []
        };
    }
    
    startTimer() {
        this.startTime = performance.now();
    }
    
    endTimer() {
        const duration = performance.now() - this.startTime;
        this.metrics.inferenceTimes.push(duration);
        return duration;
    }
    
    getStats() {
        const times = this.metrics.inferenceTimes;
        if (times.length === 0) return null;
        
        const avg = times.reduce((a, b) => a + b) / times.length;
        const max = Math.max(...times);
        const min = Math.min(...times);
        
        return {
            averageTime: Math.round(avg * 100) / 100,
            maxTime: Math.round(max * 100) / 100,
            minTime: Math.round(min * 100) / 100,
            totalRuns: times.length
        };
    }
}

Model Quantization and Optimization

Transformers.js v3 introduces new quantization formats, selected via the dtype option, including 8-bit variants (q8, int8, uint8) and 4-bit variants (q4, bnb4, q4f16).

javascript
// dtype-based quantization selection requires Transformers.js v3 (@huggingface/transformers)
import { pipeline } from '@huggingface/transformers';

class QuantizedModelManager {
    static async loadQuantizedModel(task, modelId, quantization = 'q4') {
        const options = {
            dtype: quantization,       // quantization format: q8, int8, uint8, q4, bnb4, q4f16, ...
            revision: 'main',          // model revision on the Hub
            cache_dir: './models',     // on-disk cache location (Node.js)
            local_files_only: false    // allow downloads from the Hub
        };
        
        console.log(`Loading quantized model: ${modelId} (dtype: ${quantization})`);
        
        const pipe = await pipeline(task, modelId, options);
        return pipe;
    }
    
    // Cache management: in the browser, downloaded model files are stored via the
    // Cache Storage API (the library's default cache name is 'transformers-cache')
    static async clearModelCache() {
        if (typeof caches === 'undefined') {
            console.log('Cache Storage API is not available in this environment');
            return;
        }
        
        const deleted = await caches.delete('transformers-cache');
        console.log(deleted ? 'Model cache cleared' : 'No model cache found');
    }
}

// Quantized model usage example
const quantizedDemo = async () => {
    // Load a 4-bit quantized model
    const qaPipe = await QuantizedModelManager.loadQuantizedModel(
        'question-answering',
        'Xenova/distilbert-base-uncased-distilled-squad',
        'q4'
    );
    
    const result = await qaPipe({
        question: 'What is the capital of France?',
        context: 'Paris is the capital and most populous city of France.'
    });
    
    console.log('Quantized model result:', result);
};

Ecosystem Integration

Integration with the Hugging Face Hub

Transformers.js is tightly integrated with the Hugging Face Hub: compatible models hosted on the Hub can be loaded directly by their id.

javascript
import { pipeline } from '@xenova/transformers';

class HuggingFaceIntegration {
    constructor(token = null) {
        // Optional access token (for private models or higher rate limits)
        this.token = token;
    }
    
    // Search models via the Hub's public REST API
    async searchModels(searchTerm, limit = 5) {
        const params = new URLSearchParams({
            search: searchTerm,
            limit: String(limit),
            sort: 'downloads',
            direction: '-1'
        });
        const headers = this.token ? { Authorization: `Bearer ${this.token}` } : {};
        
        try {
            const response = await fetch(`https://huggingface.co/api/models?${params}`, { headers });
            const models = await response.json();
            
            return models.map(model => ({
                id: model.id,
                downloads: model.downloads,
                tags: model.tags,
                pipeline_tag: model.pipeline_tag
            }));
        } catch (error) {
            console.error('Model search failed:', error);
            return [];
        }
    }
    
    // Load a model directly from the Hub
    async loadFromHub(modelId, task = null) {
        console.log(`Loading model from the Hub: ${modelId}`);
        
        // Detect the task type if none was given
        const actualTask = task || await this.detectTask(modelId);
        
        const pipe = await pipeline(actualTask, modelId, {
            revision: 'main'
        });
        
        return pipe;
    }
    
    async detectTask(modelId) {
        // Simplified heuristic - a real application could read the task
        // from the model's pipeline_tag on the Hub instead
        if (modelId.includes('gpt') || modelId.includes('text-gen')) {
            return 'text-generation';
        } else if (modelId.includes('bert') || modelId.includes('class')) {
            return 'text-classification';
        } else if (modelId.includes('clip') || modelId.includes('vision')) {
            return 'zero-shot-image-classification';
        }
        
        return 'text-classification'; // default fallback
    }
}

// Hub integration example
const hubDemo = async () => {
    const hub = new HuggingFaceIntegration();
    
    // Search for text classification models
    const models = await hub.searchModels('text classification', 3);
    console.log('Models found:', models);
    
    if (models.length > 0) {
        const pipe = await hub.loadFromHub(models[0].id);
        // use the pipeline...
    }
};

Practical Application Examples

javascript
import { pipeline } from '@xenova/transformers';

// Real-time translation app
class TranslatorApp {
    async initialize() {
        // English -> French model converted to ONNX (the model id fixes the language pair)
        this.translator = await pipeline('translation', 'Xenova/opus-mt-en-fr');
    }
    
    async translateText(text) {
        if (!this.translator) await this.initialize();
        
        const result = await this.translator(text);
        return result[0].translation_text;
    }
}

// Simple customer-service chatbot
class ChatbotService {
    async initialize() {
        this.classifier = await pipeline('text-classification');
        this.generator = await pipeline('text-generation', 'Xenova/distilgpt2');
    }
    
    async processMessage(message) {
        if (!this.classifier || !this.generator) await this.initialize();
        
        // Analyze the sentiment of the message
        const sentiment = await this.classifier(message);
        
        // Generate a reply conditioned on the sentiment
        const prompt = this.createPrompt(message, sentiment[0].label);
        const response = await this.generator(prompt, {
            max_new_tokens: 50,
            temperature: 0.7
        });
        
        return {
            response: response[0].generated_text,
            sentiment: sentiment[0].label,
            confidence: sentiment[0].score
        };
    }
    
    createPrompt(message, sentiment) {
        return `User message: "${message}" (sentiment: ${sentiment})\nAssistant reply:`;
    }
}