google-gemini-embeddings — quality + safety report

Name: google-gemini-embeddings — quality + safety report
Item: google-gemini-embeddings
Rating: 90
Author: Skillproof

In the Skillier index (secondsky__google-gemini-embeddings) · scanned 2026-06-03 · engine: builtin+triage

Quality

90/100

Safety

6 heuristic flags to review

Heuristic flags from the builtin scanner, which is known to over-flag (it trips on legitimate env-reading integrations, security skills, and library .eval calls). This is NOT an authoritative malicious verdict — re-scan with SkillSpector for the authoritative result. Run the authoritative scan →

📇 This skill is in the Skillier index (curated · deduped · quality-filtered). Install Skillier to route & load it into your AI client.

Quality notes

No quality issues flagged. ✓

About this skill

Google Gemini embeddings API gemini-embedding-001 for RAG and semantic search. Use for vector search, Vectorize integration, or encountering dimension mismatches, rate limits, text truncation.

📄 Read the SKILL.md

---
name: google-gemini-embeddings
description: "Google Gemini embeddings API (gemini-embedding-001) for RAG and semantic search. Use for vector search, Vectorize integration, or encountering dimension mismatches, rate limits, text truncation."
license: MIT
metadata:
  version: 1.0.0
  last_updated: 2025-11-21
  tested_package_version: "@google/genai@1.27.0"
  target_audience: "Developers building RAG, semantic search, or vector-based applications"
  complexity: intermediate
  estimated_reading_time: "8 minutes"
  tokens_saved: "~60%"
  errors_prevented: 8
  production_tested: true
  keywords:
    - gemini embeddings
    - gemini-embedding-001
    - google embeddings
    - semantic search
    - RAG
    - vector search
    - document clustering
    - similarity search
    - retrieval augmented generation
    - vectorize integration
    - cloudflare vectorize embeddings
    - 768 dimensions
    - embed content gemini
    - batch embeddings
    - embeddings api
    - cosine similarity
    - vector normalization
    - retrieval query
    - retrieval document
    - task types
    - dimension mismatch
    - embeddings rate limit
    - text truncation
    - "@google/genai"
---
# Google Gemini Embeddings

**Complete production-ready guide for Google Gemini embeddings API**

This skill provides comprehensive coverage of the `gemini-embedding-001` model for generating text embeddings, including SDK usage, REST API patterns, batch processing, RAG integration with Cloudflare Vectorize, and advanced use cases like semantic search and document clustering.

---

## Table of Contents

1. [Quick Start](#1-quick-start)
2. [gemini-embedding-001 Model](#2-gemini-embedding-001-model)
3. [Basic Embeddings](#3-basic-embeddings)
4. [Batch Embeddings](#4-batch-embeddings)
5. [Task Types](#5-task-types)
6. [Top 5 Errors](#6-top-5-errors)
7. [Best Practices](#7-best-practices)
8. [When to Load References](#8-when-to-load-references)

---

## 1. Quick Start

### Installation

Install the Google Generative AI SDK:

```bash
bun add @google/genai@^1.27.0
```

For TypeScript projects:

```bash
bun add -d typescript@^5.0.0
```

### Environment Setup

Set your Gemini API key as an environment variable:

```bash
export GEMINI_API_KEY="your-api-key-here"
```

Get your API key from: https://aistudio.google.com/apikey

### First Embedding Example

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: 'What is the meaning of life?',
  config: {
    taskType: 'RETRIEVAL_QUERY',
    outputDimensionality: 768
  }
});

console.log(response.embedding.values); // [0.012, -0.034, ...]
console.log(response.embedding.values.length); // 768
```

**Result**: A 768-dimension embedding vector representing the semantic meaning of the text.

---

## 2. gemini-embedding-001 Model

### Model Specifications

**Current Model**: `gemini-embedding-001` (stable, production-ready)
- **Status**: Stable
- **Experimental**: `gemini-embedding-exp-03-07` (deprecated October 2025, do not use)

### Dimensions

The model supports flexible output dimensionality using **Matryoshka Representation Learning**:

| Dimension | Use Case | Storage | Performance |
|-----------|----------|---------|-------------|
| **768** | Recommended for most use cases | Low | Fast |
| **1536** | Balance between accuracy and efficiency | Medium | Medium |
| **3072** | Maximum accuracy (default) | High | Slower |

**Default**: 3072 dimensions
**Recommended**: 768 dimensions for most RAG applications

Load `references/dimension-guide.md` when you need detailed comparisons of storage costs, accuracy trade-offs, or migration strategies between dimensions.

Load `references/model-comparison.md` when comparing Gemini embeddings with OpenAI (text-embedding-3-small/large) or Cloudflare Workers AI (BGE).

### Rate Limits

| Tier | RPM | TPM | RPD |
|------|-----|-----|-----|
| **Free** | 100 | 30,000 | 1,000 |
| **Tier 1** | 3,000 | 1,000,000 | - |

**RPM** = Requests Per Minute, **TPM** = Tokens Per Minute, **RPD** = Requests Per Day

### Context Window

- **Input Limit**: 2,048 tokens per text
- **Input Type**: Text only (no images, audio, or video)

---

## 3. Basic Embeddings

### SDK Approach (Node.js)

**Single text embedding**:

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: 'The quick brown fox jumps over the lazy dog',
  config: {
    taskType: 'SEMANTIC_SIMILARITY',
    outputDimensionality: 768
  }
});

console.log(response.embedding.values);
// [0.00388, -0.00762, 0.01543, ...]
```

### Fetch Approach (Cloudflare Workers)

**For Workers/edge environments without SDK support**:

```typescript
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const apiKey = env.GEMINI_API_KEY;
    const text = "What is the meaning of life?";

    const response = await fetch(
      'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
      {
        method: 'POST',
        headers: {
          'x-goog-api-key': apiKey,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({
          content: {
            parts: [{ text }]
          },
          taskType: 'RETRIEVAL_QUERY',
          outputDimensionality: 768
        })
      }
    );

    const data = await response.json();

    // Response format:
    // {
    //   embedding: {
    //     values: [0.012, -0.034, ...]
    //   }
    // }

    return new Response(JSON.stringify(data), {
      headers: { 'Content-Type': 'application/json' }
    });
  }
};
```

### Response Parsing

```typescript
interface EmbeddingResponse {
  embedding: {
    values: number[];
  };
}

const response: EmbeddingResponse = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: 'Sample text',
  config: { taskType: 'SEMANTIC_SIMILARITY', outputDimensionality: 768 }
});

const embedding: number[] = response.embedding.values;
const dimensions: number = embedding.length; // 768
```

---

## 4. Batch Embeddings

### Multiple Texts in One Request (SDK)

Generate embeddings for multiple texts simultaneously:

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const texts = [
  "What is the meaning of life?",
  "How does photosynthesis work?",
  "Tell me about the history of the internet."
];

const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: texts, // Array of strings
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 768
  }
});

// Process each embedding
response.embeddings.forEach((embedding, index) => {
  console.log(`Text ${index}: ${texts[index]}`);
  console.log(`Embedding: ${embedding.values.slice(0, 5)}...`);
  console.log(`Dimensions: ${embedding.values.length}`);
});
```

### Chunking for Rate Limits

When processing large datasets, chunk requests to stay within rate limits:

```typescript
async function batchEmbedWithRateLimit(
  texts: string[],
  batchSize: number = 100, // Free tier: 100 RPM
  delayMs: number = 60000 // 1 minute delay between batches
): Promise<number[][]> {
  const allEmbeddings: number[][] = [];

  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);

    console.log(`Processing batch ${i / batchSize + 1} (${batch.length} texts)`);

    const response = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      contents: batch,
      config: {
        taskType: 'RETRIEVAL_DOCUMENT',
        outputDimensionality: 768
      }
    });

    allEmbeddings.push(...response.embeddings.map(e => e.values));

    // Wait before next batch (except last batch)
    if (i + batchSize < texts.length) {
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }

  return allEmbeddings;
}

// Usage
const embeddings = await batchEmbedWithRateLimit(documents, 100);
```

---

## 5. Task Types

The `taskType` parameter optimizes embeddings for specific use cases. **Always specify a task type for best results.**

### Available Task Types (8 total)

| Task Type | Use Case | Example |
|-----------|----------|---------|
| **RETRIEVAL_QUERY** | User search queries | "How do I fix a flat tire?" |
| **RETRIEVAL_DOCUMENT** | Documents to be indexed/searched | Product descriptions, articles |
| **SEMANTIC_SIMILARITY** | Comparing text similarity | Duplicate detection, clustering |
| **CLASSIFICATION** | Categorizing texts | Spam detection, sentiment analysis |
| **CLUSTERING** | Grouping similar texts | Topic modeling, content organization |
| **CODE_RETRIEVAL_QUERY** | Code search queries | "function to sort array" |
| **QUESTION_ANSWERING** | Questions seeking answers | FAQ matching |
| **FACT_VERIFICATION** | Verifying claims with evidence | Fact-checking systems |

### RAG Systems (Most Common)

```typescript
// When embedding user queries
const queryEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: userQuery,
  config: {
    taskType: 'RETRIEVAL_QUERY', // ← Use RETRIEVAL_QUERY
    outputDimensionality: 768
  }
});

// When embedding documents for indexing
const docEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: documentText,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT', // ← Use RETRIEVAL_DOCUMENT
    outputDimensionality: 768
  }
});
```

**Impact**: Using correct task type improves search relevance by 10-30%.

---

## 6. Top 5 Errors

### Error 1: Dimension Mismatch

**Error**: `Vector dimensions do not match. Expected 768, got 3072`

**Cause**: Not specifying `outputDimensionality` parameter (defaults to 3072).

**Fix**:
```typescript
// ❌ BAD: No outputDimensionality (defaults to 3072)
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: text
});

// ✅ GOOD: Match Vectorize index dimensions
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: text,
  config: { outputDimensionality: 768 } // ← Match your index
});
```

### Error 2: Rate Limiting (429 Too Many Requests)

**Error**: `429 Too Many Requests - Rate limit exceeded`

**Cause**: Exceeded 100 requests per minute (free tier).

**Fix**:
```typescript
// ✅ GOOD: Exponential backoff
async function embedWithRetry(text: string, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await ai.models.embedContent({
        model: 'gemini-embedding-001',
        content: text,
        config: { taskType: 'SEMANTIC_SIMILARITY', outputDimensionality: 768 }
      });
    } catch (error: any) {
      if (error.status === 429 && attempt < maxRetries - 1) {
        const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }
      throw error;
    }
  }
}
```

### Error 3: Text Truncation (Silent)

**Error**: No error! Text is **silently truncated** at 2,048 tokens.

**Cause**: Input text exceeds 2,048 token limit.

**Fix**: Chunk long texts before embedding:
```typescript
function chunkText(text: string, maxTokens = 2000): string[] {
  const words = text.split(/\s+/);
  const chunks: string[] = [];
  let currentChunk: string[] = [];

  for (const word of words) {
    currentChunk.push(word);

    // Rough estimate: 1 token ≈ 0.75 words
    if (currentChunk.length * 0.75 >= maxTokens) {
      chunks.push(currentChunk.join(' '));
      currentChunk = [];
    }
  }

  if (currentChunk.length > 0) {
    chunks.push(currentChunk.join(' '));
  }

  return chunks;
}
```

### Error 4: Incorrect Task Type

**Error**: No error, but search quality

… (truncated)

Scan or optimize your own skill →

Want a live grade + an embeddable README badge? Run your skill through the free scanner.

Graded independently by Skillproof — nothing to sell the author. Quality is mechanical + corpus-grounded; safety flags are heuristic (builtin+triage), not a malicious verdict.