Bosbase JS SDK: LLM Documents Tutorial

This tutorial will guide you through using the Bosbase JavaScript SDK to create, manage, and query LLM documents. LLM documents are backed by an embedded chromem-go vector store, enabling semantic search and retrieval-augmented generation (RAG) capabilities.

Introduction

LLM documents in Bosbase are designed for semantic search and RAG applications. Each document contains:

  • Content: The text that will be embedded for semantic search
  • Metadata: Optional key-value pairs for filtering and categorization
  • Embedding: Vector representation (auto-generated if not provided)

The SDK provides a simple interface to:

  • Create logical collections to organize documents
  • Insert documents with automatic embedding generation
  • Query documents using semantic similarity search
  • Update, list, and delete documents

Application Scenarios & Use Cases

What Problems Do LLM Documents Solve?

LLM documents address critical challenges in modern applications that need to work with unstructured text data and provide intelligent search capabilities:

1. Traditional Keyword Search Limitations

Problem: Traditional keyword-based search fails when users ask questions in natural language or use different terminology than what's in the documents.

Solution: Semantic search understands the meaning behind queries, not just exact word matches. Users can ask "How do I reset my password?" and find documents about "password recovery" or "account access restoration" even if those exact words aren't in the query.

Example Scenario: A customer support knowledge base where users search for "I can't log in" and find relevant articles about authentication, password issues, and account recovery.

2. Building Intelligent Chatbots and AI Assistants

Problem: LLMs like GPT-4 have knowledge cutoffs and may not have access to your company-specific information, policies, or recent updates.

Solution: RAG (Retrieval-Augmented Generation) combines semantic search with LLMs. When a user asks a question, the system first retrieves relevant documents from your knowledge base, then uses that context to generate accurate, up-to-date answers.

Example Scenario: An internal company chatbot that answers questions about HR policies, product documentation, or technical procedures by retrieving the most relevant internal documents before generating a response.

3. Knowledge Management at Scale

Problem: As organizations grow, finding the right information across thousands of documents, wikis, and knowledge bases becomes increasingly difficult.

Solution: Semantic search enables employees to find information using natural language queries, regardless of where the information is stored or how it's worded.

Example Scenario: A large organization with multiple departments can create a unified search experience across product documentation, engineering wikis, sales materials, and customer support articles.

4. Personalized Content Discovery

Problem: Users struggle to find relevant content in large content libraries, leading to poor engagement and user experience.

Solution: Semantic search can understand user intent and surface relevant content even when users don't know exactly what they're looking for.

Example Scenario: An e-learning platform where students can search for "ways to improve my writing" and discover courses on grammar, creative writing, business communication, and academic writing, even if those courses don't contain those exact keywords.

Real-World Application Scenarios

Customer Support & Help Desks

Use Case: Build intelligent support systems that understand customer questions in natural language.

How It Works:

  • Store all support articles, FAQs, and troubleshooting guides as LLM documents
  • When customers ask questions, use semantic search to find the most relevant articles
  • Provide instant, accurate answers without requiring customers to know exact keywords

Business Value:

  • Can reduce support ticket volume, potentially by 40-60%
  • Improves customer satisfaction with faster, more accurate responses
  • Enables 24/7 self-service support

Example:

// Customer asks: "My payment didn't go through"
const results = await pb.llmDocuments.query(
    {
        queryText: "My payment didn't go through",
        limit: 3,
        where: { category: 'billing' }
    },
    { collection: 'support-docs' }
);
// Returns articles about: payment failures, declined cards, billing issues

Internal Knowledge Bases

Use Case: Create searchable company wikis and documentation systems.

How It Works:

  • Store company policies, procedures, technical documentation, and best practices
  • Employees can search using natural language: "What's the vacation policy?" or "How do I set up a new project?"
  • Find information across departments and documentation types

Business Value:

  • Reduces time spent searching for information
  • Ensures employees find the most current and accurate information
  • Improves onboarding for new employees

Example:

// Employee searches: "What's the process for expense reimbursement?"
const results = await pb.llmDocuments.query(
    {
        queryText: "What's the process for expense reimbursement?",
        limit: 5,
        where: { department: 'finance', type: 'policy' }
    },
    { collection: 'company-wiki' }
);

E-commerce Product Search

Use Case: Enable customers to find products using natural language descriptions.

How It Works:

  • Store product descriptions, features, and specifications as LLM documents
  • Customers can search using intent: "comfortable running shoes for flat feet" instead of exact product names
  • Surface products based on semantic similarity to customer needs

Business Value:

  • Increases conversion rates by helping customers find what they need
  • Reduces bounce rates from failed searches
  • Improves customer experience with intuitive search

Example:

// Customer searches: "warm winter jacket for hiking"
const results = await pb.llmDocuments.query(
    {
        queryText: "warm winter jacket for hiking",
        limit: 10,
        where: { category: 'outerwear', activity: 'hiking' }
    },
    { collection: 'product-catalog' }
);

Legal & Compliance Research

Use Case: Help legal teams and compliance officers quickly find relevant documents, cases, or regulations.

How It Works:

  • Store legal documents, case law, regulations, and compliance guidelines
  • Search using natural language: "What are the data retention requirements for financial records?"
  • Filter by jurisdiction, document type, or date

Business Value:

  • Dramatically reduces research time
  • Ensures comprehensive coverage of relevant documents
  • Helps maintain compliance with changing regulations

Example:

// Legal team searches: "GDPR requirements for customer data deletion"
const results = await pb.llmDocuments.query(
    {
        queryText: "GDPR requirements for customer data deletion",
        limit: 5,
        where: { jurisdiction: 'EU', type: 'regulation' }
    },
    { collection: 'legal-docs' }
);

Content Recommendation Systems

Use Case: Recommend relevant articles, videos, or content based on user interests and behavior.

How It Works:

  • Store content metadata and descriptions as LLM documents
  • When users interact with content, use semantic search to find similar items
  • Recommend content that's semantically similar, not just categorically similar

Business Value:

  • Increases content engagement and time on site
  • Improves user retention through better discovery
  • Maximizes content value by surfacing relevant pieces

Example:

// User reads article about "JavaScript async patterns"
// System recommends similar content
const recommendations = await pb.llmDocuments.query(
    {
        queryText: "JavaScript async patterns promises callbacks",
        limit: 5,
        where: { type: 'article', status: 'published' }
    },
    { collection: 'blog-content' }
);
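
The query above uses hand-written keywords for the article the user just read. A minimal sketch of the same idea driven by the stored document itself follows; `recommendSimilar` is a hypothetical helper, not part of the SDK, and it assumes that `get` returns an object with a `content` field and that the source document tends to come back as its own best match.

```javascript
// Sketch: recommend documents similar to one that is already stored.
// `recommendSimilar` is a hypothetical helper name, not an SDK method.
async function recommendSimilar(pb, collection, docId, limit = 5) {
    // Assumption: get() returns an object exposing the stored `content`.
    const source = await pb.llmDocuments.get(docId, { collection });

    // Ask for one extra match because the source document usually matches itself.
    const response = await pb.llmDocuments.query(
        { queryText: source.content, limit: limit + 1 },
        { collection }
    );

    // Drop the source document and trim back to the requested size.
    return response.results
        .filter(match => match.id !== docId)
        .slice(0, limit);
}
```

Calling `recommendSimilar(pb, 'blog-content', articleId)` would then return up to five related articles without maintaining a keyword list per article.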

Research & Academic Applications

Use Case: Enable researchers to find relevant papers, studies, and resources using semantic search.

How It Works:

  • Store research papers, abstracts, and study descriptions
  • Researchers can search using research questions or concepts
  • Find papers that are semantically related, even if they use different terminology

Business Value:

  • Accelerates literature review processes
  • Discovers connections between research areas
  • Improves research quality through comprehensive discovery

Example:

// Researcher searches: "machine learning approaches to climate prediction"
const results = await pb.llmDocuments.query(
    {
        queryText: "machine learning approaches to climate prediction",
        limit: 20,
        where: { field: 'climate-science', method: 'machine-learning' }
    },
    { collection: 'research-papers' }
);

Developer Documentation Search

Use Case: Help developers find relevant documentation, code examples, and API references.

How It Works:

  • Store API documentation, code examples, tutorials, and guides
  • Developers can search using questions: "How do I authenticate API requests?" or "Example of handling errors"
  • Find documentation that explains concepts, not just exact API endpoints

Business Value:

  • Reduces developer onboarding time
  • Improves developer productivity
  • Decreases support requests for common questions

Example:

// Developer searches: "How to handle authentication errors in the API"
const results = await pb.llmDocuments.query(
    {
        queryText: "How to handle authentication errors in the API",
        limit: 5,
        where: { type: 'api-docs', category: 'authentication' }
    },
    { collection: 'developer-docs' }
);

Key Benefits Summary

  1. Natural Language Understanding: Users don't need to know exact keywords or terminology
  2. Context-Aware Search: Finds relevant content based on meaning, not just word matching
  3. Scalable: Works efficiently with thousands or millions of documents
  4. Easy Integration: Simple JavaScript SDK makes it easy to add to any application
  5. Cost-Effective: Embedded vector store means no additional infrastructure costs
  6. Flexible Filtering: Combine semantic search with metadata filters for precise results
  7. RAG-Ready: Perfect foundation for building retrieval-augmented generation applications

When to Use LLM Documents

Use LLM Documents when you need to:

  • ✅ Search through unstructured text content
  • ✅ Build chatbots or AI assistants with domain knowledge
  • ✅ Create intelligent search experiences
  • ✅ Implement RAG (Retrieval-Augmented Generation) systems
  • ✅ Find similar or related content
  • ✅ Build recommendation systems
  • ✅ Enable natural language queries

Consider alternatives when:

  • ❌ You only need exact keyword matching (traditional search may be sufficient)
  • ❌ Your content is highly structured (SQL queries might be better)
  • ❌ You need real-time updates with sub-second latency (consider specialized vector databases)
  • ❌ You're working with non-text data (images, audio, etc.)

Installation

Using npm

npm install bosbase --save

Using ES modules

import BosBase from 'bosbase';

Using CommonJS

const BosBase = require('bosbase/cjs');

Getting Started

First, initialize a BosBase client instance:

import BosBase from 'bosbase';

// Initialize the client with your BosBase instance URL
const pb = new BosBase('http://localhost:8090');

// If you need authentication (for protected operations)
// await pb.collection('users').authWithPassword('test@example.com', 'password');

Creating Collections

Collections are logical namespaces that organize your documents. You can create a collection with optional metadata:

// Create a collection for your knowledge base
await pb.llmDocuments.createCollection('knowledge-base', {
    domain: 'internal',
    description: 'Company knowledge base'
});

// Create another collection for customer support
await pb.llmDocuments.createCollection('support-docs', {
    domain: 'customer-facing',
    category: 'support'
});
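
This tutorial does not say what happens when `createCollection` is called for a name that already exists. If setup code may run more than once, one cautious option is to check `listCollections` first. The sketch below assumes each listed collection exposes a `name` field, as in the listing example in this tutorial; `ensureCollection` is a hypothetical helper, not an SDK method.

```javascript
// Sketch: create a collection only when no collection with that name exists yet.
// `ensureCollection` is a hypothetical helper, not part of the SDK.
async function ensureCollection(pb, name, metadata = {}) {
    const existing = await pb.llmDocuments.listCollections();
    if (existing.some(collection => collection.name === name)) {
        return false; // already present, nothing was created
    }
    await pb.llmDocuments.createCollection(name, metadata);
    return true; // a new collection was created
}
```

The check and the create are two separate requests, so two processes starting at the same moment could still both attempt the create.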

Listing Collections

To see all available collections:

const collections = await pb.llmDocuments.listCollections();

collections.forEach(collection => {
    console.log(`Collection: ${collection.name}`);
    console.log(`  Count: ${collection.count} documents`);
    console.log(`  Metadata:`, collection.metadata);
});

Deleting Collections

To remove a collection and all its documents:

await pb.llmDocuments.deleteCollection('knowledge-base');

Creating Documents

Basic Document Creation

The simplest way to create a document is to provide content:

const doc = await pb.llmDocuments.insert(
    {
        content: 'Leaves are green because chlorophyll absorbs red and blue light.',
    },
    { collection: 'knowledge-base' }
);

console.log(`Created document with ID: ${doc.id}`);

Documents with Metadata

Add metadata to help filter and categorize documents:

const doc = await pb.llmDocuments.insert(
    {
        content: 'The sky is blue because of Rayleigh scattering.',
        metadata: {
            topic: 'physics',
            difficulty: 'intermediate',
            source: 'textbook'
        }
    },
    { collection: 'knowledge-base' }
);

Documents with Custom IDs

You can specify a custom ID when creating a document:

await pb.llmDocuments.insert(
    {
        id: 'sky-blue-explanation',
        content: 'The sky is blue because of Rayleigh scattering.',
        metadata: { topic: 'physics' }
    },
    { collection: 'knowledge-base' }
);
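
Readable IDs such as `sky-blue-explanation` can be derived from a title instead of typed by hand. The following is a small sketch of one way to do that; `toDocumentId` is a hypothetical helper, and which characters the server accepts in an ID is not stated in this tutorial, so the lowercase-and-hyphens format is an assumption.

```javascript
// Sketch: derive a readable, deterministic ID from a title.
// `toDocumentId` is a hypothetical helper; the allowed ID format is an assumption.
function toDocumentId(title) {
    return title
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, '-') // collapse each run of other characters into one hyphen
        .replace(/^-+|-+$/g, '');    // trim hyphens left at either end
}
```

The result can be passed as the `id` field of `insert`, so that the same title always maps to the same document ID.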

Batch Document Creation

Create multiple documents by inserting them in sequence:

const documents = [
    {
        content: 'Photosynthesis is the process by which plants convert light energy into chemical energy.',
        metadata: { topic: 'biology', category: 'process' }
    },
    {
        content: 'Water boils at 100 degrees Celsius at sea level.',
        metadata: { topic: 'chemistry', category: 'fact' }
    },
    {
        content: 'The speed of light in a vacuum is exactly 299,792,458 meters per second.',
        metadata: { topic: 'physics', category: 'constant' }
    }
];

// Insert documents one by one
for (const doc of documents) {
    await pb.llmDocuments.insert(doc, { collection: 'knowledge-base' });
    console.log(`Inserted: ${doc.content.substring(0, 50)}...`);
}

Querying Documents

Query documents using natural language. The SDK will automatically convert your query text into an embedding and find similar documents:

const result = await pb.llmDocuments.query(
    {
        queryText: 'Why is the sky blue?',
        limit: 5
    },
    { collection: 'knowledge-base' }
);

result.results.forEach(match => {
    console.log(`ID: ${match.id}`);
    console.log(`Similarity: ${match.similarity}`);
    console.log(`Content: ${match.content}`);
    console.log(`Metadata:`, match.metadata);
    console.log('---');
});

Filtered Queries

Filter results by metadata:

const result = await pb.llmDocuments.query(
    {
        queryText: 'Why is the sky blue?',
        limit: 3,
        where: { topic: 'physics' }
    },
    { collection: 'knowledge-base' }
);

Multiple Metadata Filters

You can filter by multiple metadata fields:

const result = await pb.llmDocuments.query(
    {
        queryText: 'How do plants make energy?',
        limit: 5,
        where: {
            topic: 'biology',
            difficulty: 'beginner'
        }
    },
    { collection: 'knowledge-base' }
);
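
When filters come from optional form fields, some of them are often empty. A minimal sketch of a helper that builds the `where` object from whatever is set follows. `buildWhere` is a hypothetical name, and converting values to strings is an assumption based on the string-valued metadata used throughout this tutorial.

```javascript
// Sketch: build a `where` filter from optional inputs, skipping unset values.
// `buildWhere` is a hypothetical helper, not part of the SDK.
function buildWhere(filters) {
    const where = {};
    for (const [key, value] of Object.entries(filters)) {
        if (value === undefined || value === null || value === '') continue;
        // Assumption: metadata values are stored and compared as strings.
        where[key] = String(value);
    }
    return where;
}
```

The returned object can be passed directly as the `where` option of `query`.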

Advanced Query Options

For more control, you can use query embeddings directly or add negative examples:

const result = await pb.llmDocuments.query(
    {
        queryText: 'What causes colors in nature?',
        limit: 10,
        where: { topic: 'biology' },
        negative: {
            text: 'artificial colors',
            mode: 'filter',
            filterThreshold: 0.5
        }
    },
    { collection: 'knowledge-base' }
);

Managing Documents

Getting a Single Document

Retrieve a document by its ID:

const doc = await pb.llmDocuments.get('sky-blue-explanation', {
    collection: 'knowledge-base'
});

console.log('Document:', doc);

Listing Documents with Pagination

List all documents in a collection with pagination:

// Get first page (25 items per page by default)
const page = await pb.llmDocuments.list({
    collection: 'knowledge-base',
    page: 1,
    perPage: 25
});

console.log(`Total items: ${page.totalItems}`);
console.log(`Page ${page.page} of ${Math.ceil(page.totalItems / page.perPage)}`);

page.items.forEach(doc => {
    console.log(`- ${doc.id}: ${doc.content.substring(0, 50)}...`);
});

// Get next page
if (page.page * page.perPage < page.totalItems) {
    const nextPage = await pb.llmDocuments.list({
        collection: 'knowledge-base',
        page: page.page + 1,
        perPage: 25
    });
}
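
To walk through every page rather than just the next one, the same `list` call can be wrapped in an async generator. This is a sketch that relies only on the `items` and `totalItems` fields shown above; `iterateDocuments` is a hypothetical helper name.

```javascript
// Sketch: yield every document in a collection, fetching one page at a time.
// `iterateDocuments` is a hypothetical helper, not part of the SDK.
async function* iterateDocuments(pb, collection, perPage = 25) {
    let page = 1;
    while (true) {
        const result = await pb.llmDocuments.list({ collection, page, perPage });
        yield* result.items;

        // Stop on an empty page or once the last page has been read.
        if (result.items.length === 0 || page * perPage >= result.totalItems) break;
        page += 1;
    }
}
```

It can be consumed with `for await (const doc of iterateDocuments(pb, 'knowledge-base')) { ... }`, which keeps only one page in memory at a time.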

Updating Documents

Update document content or metadata:

// Update metadata only
await pb.llmDocuments.update(
    'sky-blue-explanation',
    {
        metadata: {
            topic: 'physics',
            reviewed: 'true',
            lastUpdated: new Date().toISOString()
        }
    },
    { collection: 'knowledge-base' }
);

// Update content and metadata
await pb.llmDocuments.update(
    'sky-blue-explanation',
    {
        content: 'The sky appears blue due to Rayleigh scattering, where shorter wavelengths of light are scattered more than longer wavelengths.',
        metadata: {
            topic: 'physics',
            accuracy: 'high',
            reviewed: 'true'
        }
    },
    { collection: 'knowledge-base' }
);
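
Both calls above send the full metadata object, and this tutorial does not state whether `update` merges new metadata into the existing keys or replaces them. A defensive sketch that works either way is to read the document, merge locally, and send the complete result. `patchMetadata` is a hypothetical helper, and it assumes `get` returns the stored `metadata`.

```javascript
// Sketch: change a few metadata keys without dropping the others.
// `patchMetadata` is a hypothetical helper, not part of the SDK.
async function patchMetadata(pb, collection, id, changes) {
    // Assumption: get() returns the document with its current `metadata`.
    const current = await pb.llmDocuments.get(id, { collection });
    const metadata = { ...(current.metadata || {}), ...changes };

    // Send the full merged object so the result is the same whether the
    // server merges or replaces metadata on update.
    await pb.llmDocuments.update(id, { metadata }, { collection });
    return metadata;
}
```

The read and the write are separate requests, so a concurrent update between them could be overwritten.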

Deleting Documents

Remove a document from a collection:

await pb.llmDocuments.delete('sky-blue-explanation', {
    collection: 'knowledge-base'
});

Complete Examples

Example 1: Building a Knowledge Base

import BosBase from 'bosbase';

const pb = new BosBase('http://localhost:8090');

async function buildKnowledgeBase() {
    // Create collection
    await pb.llmDocuments.createCollection('company-kb', {
        domain: 'internal',
        version: '1.0'
    });

    // Add company policies
    const policies = [
        {
            content: 'All employees must complete security training annually. Training covers phishing, password management, and data handling procedures.',
            metadata: { category: 'policy', type: 'security', department: 'all' }
        },
        {
            content: 'Remote work is allowed up to 3 days per week. Employees must coordinate with their managers and ensure adequate home office setup.',
            metadata: { category: 'policy', type: 'workplace', department: 'all' }
        },
        {
            content: 'Expense reports must be submitted within 30 days. All receipts must be itemized and include business purpose.',
            metadata: { category: 'policy', type: 'finance', department: 'all' }
        }
    ];

    for (const policy of policies) {
        await pb.llmDocuments.insert(policy, { collection: 'company-kb' });
    }

    console.log('Knowledge base created successfully!');
}

buildKnowledgeBase().catch(console.error);

Example 2: Semantic Search for Support

async function searchSupportDocs(userQuestion) {
    const result = await pb.llmDocuments.query(
        {
            queryText: userQuestion,
            limit: 5,
            where: { category: 'support' }
        },
        { collection: 'support-docs' }
    );

    if (result.results.length === 0) {
        return 'No relevant documentation found.';
    }

    // Format results for display
    let response = 'Here are the most relevant answers:\n\n';
    
    result.results.forEach((match, index) => {
        response += `${index + 1}. (Similarity: ${(match.similarity * 100).toFixed(1)}%)\n`;
        response += `${match.content}\n\n`;
    });

    return response;
}

// Usage
const answer = await searchSupportDocs('How do I reset my password?');
console.log(answer);

Example 3: RAG Application

async function retrieveContextForLLM(userQuery) {
    // Query relevant documents
    const searchResult = await pb.llmDocuments.query(
        {
            queryText: userQuery,
            limit: 3,
            where: { verified: 'true' }
        },
        { collection: 'knowledge-base' }
    );

    // Extract content from results
    const context = searchResult.results
        .map(result => result.content)
        .join('\n\n');

    // Use this context with your LLM
    const prompt = `Based on the following context, answer the user's question.

Context:
${context}

Question: ${userQuery}

Answer:`;

    // Send to your LLM API (OpenAI, Anthropic, etc.)
    // const llmResponse = await callLLM(prompt);
    
    return { context, results: searchResult.results };
}

// Usage
const { context, results } = await retrieveContextForLLM(
    'What is the company policy on remote work?'
);
console.log('Retrieved context:', context);

Example 4: Document Management System

class DocumentManager {
    constructor(pb, collectionName) {
        this.pb = pb;
        this.collection = collectionName;
    }

    async addDocument(content, metadata = {}) {
        return await this.pb.llmDocuments.insert(
            { content, metadata },
            { collection: this.collection }
        );
    }

    async search(query, limit = 5, filters = {}) {
        return await this.pb.llmDocuments.query(
            {
                queryText: query,
                limit,
                where: filters
            },
            { collection: this.collection }
        );
    }

    async getAllDocuments(page = 1, perPage = 50) {
        return await this.pb.llmDocuments.list({
            collection: this.collection,
            page,
            perPage
        });
    }

    async updateDocument(id, updates) {
        return await this.pb.llmDocuments.update(
            id,
            updates,
            { collection: this.collection }
        );
    }

    async deleteDocument(id) {
        return await this.pb.llmDocuments.delete(id, {
            collection: this.collection
        });
    }

    async getDocument(id) {
        return await this.pb.llmDocuments.get(id, {
            collection: this.collection
        });
    }
}

// Usage
const manager = new DocumentManager(pb, 'knowledge-base');

// Add a document
const doc = await manager.addDocument(
    'The mitochondria is the powerhouse of the cell.',
    { topic: 'biology', level: 'high-school' }
);

// Search
const results = await manager.search('What is the function of mitochondria?');

// Update
await manager.updateDocument(doc.id, {
    metadata: { topic: 'biology', level: 'high-school', reviewed: 'true' }
});

// List all
const allDocs = await manager.getAllDocuments();

Best Practices

1. Organize with Collections

Use collections to logically separate different types of documents:

// Good: Separate collections for different domains
await pb.llmDocuments.createCollection('product-docs', { domain: 'products' });
await pb.llmDocuments.createCollection('api-docs', { domain: 'api' });
await pb.llmDocuments.createCollection('troubleshooting', { domain: 'support' });

2. Use Meaningful Metadata

Add structured metadata to enable filtering:

await pb.llmDocuments.insert(
    {
        content: '...',
        metadata: {
            category: 'tutorial',
            difficulty: 'beginner',
            language: 'javascript',
            lastUpdated: '2024-01-15',
            author: 'team-docs'
        }
    },
    { collection: 'knowledge-base' }
);

3. Chunk Large Documents

For better search results, break large documents into smaller chunks:

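function chunkText(text, chunkSize = 500, overlap = 50) {
    // The window must move forward on every step, so overlap has to be smaller than chunkSize.
    if (overlap >= chunkSize) {
        throw new RangeError('overlap must be smaller than chunkSize');
    }
    const step = chunkSize - overlap;
    const chunks = [];
    for (let i = 0; i < text.length; i += step) {
        chunks.push(text.slice(i, i + chunkSize));
        // Stop once a chunk reaches the end, so no redundant tail chunk is emitted.
        if (i + chunkSize >= text.length) break;
    }
    return chunks;
}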

const longDocument = '...'; // Your long text
const chunks = chunkText(longDocument);

for (let i = 0; i < chunks.length; i++) {
    await pb.llmDocuments.insert(
        {
            content: chunks[i],
            metadata: {
                documentId: 'doc-123',
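                // Stored as strings, matching the string-valued metadata used elsewhere in this tutorial
                chunkIndex: String(i),
                totalChunks: String(chunks.length)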
            }
        },
        { collection: 'knowledge-base' }
    );
}
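
Once documents are chunked, a single query can return several chunks of the same source document. A minimal sketch of collapsing them follows: it keeps the highest-scoring chunk per `documentId`, using the metadata key from the example above. `bestChunkPerDocument` is a hypothetical helper, and results without a `documentId` are treated as standalone documents.

```javascript
// Sketch: keep only the best-scoring chunk for each source document.
// `bestChunkPerDocument` is a hypothetical helper, not part of the SDK.
function bestChunkPerDocument(results) {
    const best = new Map();
    for (const match of results) {
        // Fall back to the match's own ID when it was not stored as a chunk.
        const key = match.metadata?.documentId ?? match.id;
        const current = best.get(key);
        if (!current || match.similarity > current.similarity) {
            best.set(key, match);
        }
    }
    // Return the survivors ordered from most to least similar.
    return [...best.values()].sort((a, b) => b.similarity - a.similarity);
}
```

Requesting a somewhat larger `limit` before collapsing leaves room for the duplicates that get removed.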

4. Handle Errors Gracefully

Always wrap SDK calls in try-catch blocks:

async function safeInsert(content, metadata, collection) {
    try {
        return await pb.llmDocuments.insert(
            { content, metadata },
            { collection }
        );
    } catch (error) {
        console.error('Failed to insert document:', error);
        throw error;
    }
}
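
For transient failures such as a brief network interruption, catching the error can be combined with a retry. The sketch below retries any failing operation with exponential backoff. `withRetry` and its options are hypothetical, and because this tutorial does not describe the shape of SDK errors, it retries every error, including ones (such as invalid input) that a retry cannot fix.

```javascript
// Sketch: retry an async operation with exponential backoff.
// `withRetry` is a hypothetical helper; it does not inspect the error type.
async function withRetry(operation, { attempts = 3, baseDelayMs = 200 } = {}) {
    let lastError;
    for (let attempt = 1; attempt <= attempts; attempt++) {
        try {
            return await operation();
        } catch (error) {
            lastError = error;
            if (attempt === attempts) break; // out of attempts, give up
            // Wait baseDelayMs, then 2x, then 4x, ... before the next attempt.
            const delay = baseDelayMs * 2 ** (attempt - 1);
            await new Promise(resolve => setTimeout(resolve, delay));
        }
    }
    throw lastError;
}
```

Usage would look like `await withRetry(() => safeInsert(content, metadata, 'knowledge-base'));`.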

5. Optimize Query Limits

Adjust the limit based on your use case:

// For RAG: 3-5 documents are usually sufficient
const ragResults = await pb.llmDocuments.query(
    { queryText: '...', limit: 3 },
    { collection: 'knowledge-base' }
);

// For search results page: 10-20 documents
const searchResults = await pb.llmDocuments.query(
    { queryText: '...', limit: 15 },
    { collection: 'knowledge-base' }
);

6. Monitor Similarity Scores

Use similarity scores to filter low-quality matches:

const result = await pb.llmDocuments.query(
    { queryText: '...', limit: 10 },
    { collection: 'knowledge-base' }
);

// Filter results by similarity threshold
const relevantResults = result.results.filter(
    match => match.similarity > 0.7
);

if (relevantResults.length === 0) {
    console.log('No highly relevant results found');
}
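
A fixed threshold can leave the user with nothing at all. One option, sketched below, is to fall back to the top few matches and flag them as low confidence so the UI can word the answer accordingly. `selectRelevant` and its defaults are illustrative assumptions; a suitable threshold depends on the embedding model and should be tuned on real queries.

```javascript
// Sketch: apply a similarity threshold, falling back to the top matches when
// nothing clears it. `selectRelevant` is a hypothetical helper.
function selectRelevant(results, { threshold = 0.7, minResults = 1 } = {}) {
    // Sort a copy so the caller's array is left untouched.
    const sorted = [...results].sort((a, b) => b.similarity - a.similarity);
    const confident = sorted.filter(match => match.similarity > threshold);

    if (confident.length >= minResults) {
        return { matches: confident, lowConfidence: false };
    }
    // Not enough confident matches: return the best available and say so.
    return { matches: sorted.slice(0, minResults), lowConfidence: true };
}
```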

7. Batch Operations

When inserting many documents, consider batching to avoid overwhelming the server:

async function batchInsert(documents, collection, batchSize = 10) {
    for (let i = 0; i < documents.length; i += batchSize) {
        const batch = documents.slice(i, i + batchSize);
        
        await Promise.all(
            batch.map(doc => 
                pb.llmDocuments.insert(doc, { collection })
            )
        );
        
        console.log(`Inserted batch ${Math.floor(i / batchSize) + 1}`);
    }
}
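
With `Promise.all`, the first rejected insert rejects the whole batch and stops the loop, even though other inserts in that batch may already have succeeded. A variant using the standard `Promise.allSettled` keeps going and reports what failed. This is a sketch; `batchInsertSettled` is a hypothetical helper name.

```javascript
// Sketch: insert in batches and collect failures instead of stopping at the first one.
// `batchInsertSettled` is a hypothetical helper, not part of the SDK.
async function batchInsertSettled(pb, documents, collection, batchSize = 10) {
    const inserted = [];
    const failed = [];

    for (let i = 0; i < documents.length; i += batchSize) {
        const batch = documents.slice(i, i + batchSize);
        const outcomes = await Promise.allSettled(
            batch.map(doc => pb.llmDocuments.insert(doc, { collection }))
        );

        outcomes.forEach((outcome, index) => {
            if (outcome.status === 'fulfilled') {
                inserted.push(outcome.value);
            } else {
                // Keep the original document so the caller can retry or log it.
                failed.push({ document: batch[index], error: outcome.reason });
            }
        });
    }

    return { inserted, failed };
}
```

The caller can then retry only the entries in `failed` rather than re-running the whole import.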

Summary

The Bosbase JS SDK provides a powerful and easy-to-use interface for working with LLM documents:

  • Collections organize your documents into logical groups
  • Insert documents with automatic embedding generation
  • Query documents using semantic similarity search
  • Manage documents with update, list, and delete operations

This enables you to build RAG applications, semantic search systems, and knowledge bases with minimal code. The vector store handles all the complexity of embeddings and similarity calculations, so you can focus on your application logic.

For more information, refer to the Bosbase documentation and the LLM Documents API documentation.