Bosbase JS SDK: LLM Documents Tutorial
This tutorial will guide you through using the Bosbase JavaScript SDK to create, manage, and query LLM documents. LLM documents are backed by an embedded chromem-go vector store, enabling semantic search and retrieval-augmented generation (RAG) capabilities.
Table of Contents
- Introduction
- Application Scenarios & Use Cases
- Installation
- Getting Started
- Creating Collections
- Creating Documents
- Querying Documents
- Managing Documents
- Complete Examples
- Best Practices
- Summary
Introduction
LLM documents in Bosbase are designed for semantic search and RAG applications. Each document contains:
- Content: The text that will be embedded for semantic search
- Metadata: Optional key-value pairs for filtering and categorization
- Embedding: Vector representation (auto-generated if not provided)
The SDK provides a simple interface to:
- Create logical collections to organize documents
- Insert documents with automatic embedding generation
- Query documents using semantic similarity search
- Update, list, and delete documents
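The document shape described above can be sketched as a small payload builder. This is only an illustration: `buildDocumentPayload` is a hypothetical helper, not part of the SDK, and both the `embedding` field name and the idea that metadata values are stored as strings are assumptions to check against the API reference.

```javascript
// Hypothetical helper (not part of the SDK): builds a payload with the
// three parts of an LLM document: content, metadata, optional embedding.
function buildDocumentPayload(content, metadata = {}, embedding) {
  if (typeof content !== 'string' || content.trim() === '') {
    throw new TypeError('content must be a non-empty string');
  }

  // Assumption: metadata values are stored as strings, so coerce them.
  const stringMetadata = {};
  for (const [key, value] of Object.entries(metadata)) {
    stringMetadata[key] = String(value);
  }

  const payload = { content, metadata: stringMetadata };

  // Only attach an embedding when the caller supplies one; otherwise the
  // server is expected to generate it (field name is an assumption).
  if (Array.isArray(embedding)) {
    payload.embedding = embedding;
  }
  return payload;
}
```

A payload built this way could then be passed as the first argument to `pb.llmDocuments.insert`.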
Application Scenarios & Use Cases
What Problems Do LLM Documents Solve?
LLM documents address critical challenges in modern applications that need to work with unstructured text data and provide intelligent search capabilities:
1. Traditional Keyword Search Limitations
Problem: Traditional keyword-based search fails when users ask questions in natural language or use different terminology than what's in the documents.
Solution: Semantic search understands the meaning behind queries, not just exact word matches. Users can ask "How do I reset my password?" and find documents about "password recovery" or "account access restoration" even if those exact words aren't in the query.
Example Scenario: A customer support knowledge base where users search for "I can't log in" and find relevant articles about authentication, password issues, and account recovery.
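To make the difference concrete, the toy comparison below contrasts a naive keyword match with a vector similarity score. The three-dimensional vectors are hand-written stand-ins for real embeddings, which typically have hundreds of dimensions and are produced by an embedding model; `keywordMatch` and `cosineSimilarity` are illustrative helpers, not SDK functions.

```javascript
// Naive keyword search: true only if every query word appears in the text.
function keywordMatch(query, text) {
  const queryWords = query.toLowerCase().split(/\W+/).filter(Boolean);
  const textWords = new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
  return queryWords.every(word => textWords.has(word));
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hand-written toy vectors standing in for real embeddings.
const queryVector = [0.9, 0.1, 0.0];    // "I can't log in"
const recoveryVector = [0.8, 0.2, 0.1]; // "Password recovery steps"
const shippingVector = [0.0, 0.1, 0.9]; // "Shipping rates"

// The keyword match fails because the texts share no words...
console.log(keywordMatch("I can't log in", 'Password recovery steps')); // false

// ...while the vector comparison still ranks the recovery article first.
console.log(
  cosineSimilarity(queryVector, recoveryVector) >
    cosineSimilarity(queryVector, shippingVector)
); // true
```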
2. Building Intelligent Chatbots and AI Assistants
Problem: LLMs like GPT-4 have knowledge cutoffs and may not have access to your company-specific information, policies, or recent updates.
Solution: RAG (Retrieval-Augmented Generation) combines semantic search with LLMs. When a user asks a question, the system first retrieves relevant documents from your knowledge base, then uses that context to generate accurate, up-to-date answers.
Example Scenario: An internal company chatbot that answers questions about HR policies, product documentation, or technical procedures by retrieving the most relevant internal documents before generating a response.
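The retrieve-then-generate flow can be sketched as a prompt builder that takes the matches returned by a semantic query and assembles them into context for the LLM. `buildRagPrompt` and its `maxContextChars` budget are hypothetical, and the prompt wording is only one reasonable choice; the sketch assumes each match exposes a `content` string, as the query results in this tutorial do.

```javascript
// Hypothetical helper: turns retrieved matches into a grounded prompt.
function buildRagPrompt(question, matches, maxContextChars = 2000) {
  const parts = [];
  let used = 0;

  for (const [index, match] of matches.entries()) {
    const entry = `[${index + 1}] ${match.content}`;
    // Stop adding context once the character budget would be exceeded.
    if (used + entry.length > maxContextChars) break;
    parts.push(entry);
    used += entry.length;
  }

  return [
    'Answer the question using only the numbered context below.',
    'If the context is insufficient, say so.',
    '',
    'Context:',
    parts.join('\n'),
    '',
    `Question: ${question}`,
    'Answer:'
  ].join('\n');
}
```

Numbering the context entries makes it easier to ask the model to cite which passage supported its answer.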
3. Knowledge Management at Scale
Problem: As organizations grow, finding the right information across thousands of documents, wikis, and knowledge bases becomes increasingly difficult.
Solution: Semantic search enables employees to find information using natural language queries, regardless of where the information is stored or how it's worded.
Example Scenario: A large organization with multiple departments can create a unified search experience across product documentation, engineering wikis, sales materials, and customer support articles.
4. Personalized Content Discovery
Problem: Users struggle to find relevant content in large content libraries, leading to poor engagement and user experience.
Solution: Semantic search can understand user intent and surface relevant content even when users don't know exactly what they're looking for.
Example Scenario: An e-learning platform where students can search for "ways to improve my writing" and discover courses on grammar, creative writing, business communication, and academic writing, even if those courses don't contain those exact keywords.
Real-World Application Scenarios
Customer Support & Help Desks
Use Case: Build intelligent support systems that understand customer questions in natural language.
How It Works:
- Store all support articles, FAQs, and troubleshooting guides as LLM documents
- When customers ask questions, use semantic search to find the most relevant articles
- Provide instant, accurate answers without requiring customers to know exact keywords
Business Value:
- Can reduce support ticket volume by 40-60%
- Improves customer satisfaction with faster, more accurate responses
- Enables 24/7 self-service support
Example:
// Customer asks: "My payment didn't go through"
const results = await pb.llmDocuments.query(
{
queryText: "My payment didn't go through",
limit: 3,
where: { category: 'billing' }
},
{ collection: 'support-docs' }
);
// Returns articles about: payment failures, declined cards, billing issues
Internal Knowledge Bases
Use Case: Create searchable company wikis and documentation systems.
How It Works:
- Store company policies, procedures, technical documentation, and best practices
- Employees can search using natural language: "What's the vacation policy?" or "How do I set up a new project?"
- Find information across departments and documentation types
Business Value:
- Reduces time spent searching for information
- Ensures employees find the most current and accurate information
- Improves onboarding for new employees
Example:
// Employee searches: "What's the process for expense reimbursement?"
const results = await pb.llmDocuments.query(
{
queryText: "What's the process for expense reimbursement?",
limit: 5,
where: { department: 'finance', type: 'policy' }
},
{ collection: 'company-wiki' }
);
E-Commerce Product Search
Use Case: Enable customers to find products using natural language descriptions.
How It Works:
- Store product descriptions, features, and specifications as LLM documents
- Customers can search using intent: "comfortable running shoes for flat feet" instead of exact product names
- Surface products based on semantic similarity to customer needs
Business Value:
- Increases conversion rates by helping customers find what they need
- Reduces bounce rates from failed searches
- Improves customer experience with intuitive search
Example:
// Customer searches: "warm winter jacket for hiking"
const results = await pb.llmDocuments.query(
{
queryText: "warm winter jacket for hiking",
limit: 10,
where: { category: 'outerwear', activity: 'hiking' }
},
{ collection: 'product-catalog' }
);
Legal & Compliance Document Search
Use Case: Help legal teams and compliance officers quickly find relevant documents, cases, or regulations.
How It Works:
- Store legal documents, case law, regulations, and compliance guidelines
- Search using natural language: "What are the data retention requirements for financial records?"
- Filter by jurisdiction, document type, or date
Business Value:
- Dramatically reduces research time
- Ensures comprehensive coverage of relevant documents
- Helps maintain compliance with changing regulations
Example:
// Legal team searches: "GDPR requirements for customer data deletion"
const results = await pb.llmDocuments.query(
{
queryText: "GDPR requirements for customer data deletion",
limit: 5,
where: { jurisdiction: 'EU', type: 'regulation' }
},
{ collection: 'legal-docs' }
);
Content Recommendation Systems
Use Case: Recommend relevant articles, videos, or content based on user interests and behavior.
How It Works:
- Store content metadata and descriptions as LLM documents
- When users interact with content, use semantic search to find similar items
- Recommend content that's semantically similar, not just categorically similar
Business Value:
- Increases content engagement and time on site
- Improves user retention through better discovery
- Maximizes content value by surfacing relevant pieces
Example:
// User reads article about "JavaScript async patterns"
// System recommends similar content
const recommendations = await pb.llmDocuments.query(
{
queryText: "JavaScript async patterns promises callbacks",
limit: 5,
where: { type: 'article', status: 'published' }
},
{ collection: 'blog-content' }
);
Research & Academic Applications
Use Case: Enable researchers to find relevant papers, studies, and resources using semantic search.
How It Works:
- Store research papers, abstracts, and study descriptions
- Researchers can search using research questions or concepts
- Find papers that are semantically related, even if they use different terminology
Business Value:
- Accelerates literature review processes
- Discovers connections between research areas
- Improves research quality through comprehensive discovery
Example:
// Researcher searches: "machine learning approaches to climate prediction"
const results = await pb.llmDocuments.query(
{
queryText: "machine learning approaches to climate prediction",
limit: 20,
where: { field: 'climate-science', method: 'machine-learning' }
},
{ collection: 'research-papers' }
);
Developer Documentation & Code Search
Use Case: Help developers find relevant documentation, code examples, and API references.
How It Works:
- Store API documentation, code examples, tutorials, and guides
- Developers can search using questions: "How do I authenticate API requests?" or "Example of handling errors"
- Find documentation that explains concepts, not just exact API endpoints
Business Value:
- Reduces developer onboarding time
- Improves developer productivity
- Decreases support requests for common questions
Example:
// Developer searches: "How to handle authentication errors in the API"
const results = await pb.llmDocuments.query(
{
queryText: "How to handle authentication errors in the API",
limit: 5,
where: { type: 'api-docs', category: 'authentication' }
},
{ collection: 'developer-docs' }
);
Key Benefits Summary
- Natural Language Understanding: Users don't need to know exact keywords or terminology
- Context-Aware Search: Finds relevant content based on meaning, not just word matching
- Scalable: Works efficiently with thousands or millions of documents
- Easy Integration: Simple JavaScript SDK makes it easy to add to any application
- Cost-Effective: Embedded vector store means no additional infrastructure costs
- Flexible Filtering: Combine semantic search with metadata filters for precise results
- RAG-Ready: Perfect foundation for building retrieval-augmented generation applications
When to Use LLM Documents
Use LLM Documents when you need to:
- ✅ Search through unstructured text content
- ✅ Build chatbots or AI assistants with domain knowledge
- ✅ Create intelligent search experiences
- ✅ Implement RAG (Retrieval-Augmented Generation) systems
- ✅ Find similar or related content
- ✅ Build recommendation systems
- ✅ Enable natural language queries
Consider alternatives when:
- ❌ You only need exact keyword matching (traditional search may be sufficient)
- ❌ Your content is highly structured (SQL queries might be better)
- ❌ You need real-time updates with sub-second latency (consider specialized vector databases)
- ❌ You're working with non-text data (images, audio, etc.)
Installation
Using npm
npm install bosbase --save
Using ES modules
import BosBase from 'bosbase';
Using CommonJS
const BosBase = require('bosbase/cjs');
Getting Started
First, initialize a BosBase client instance:
import BosBase from 'bosbase';
// Initialize the client with your BosBase instance URL
const pb = new BosBase('http://localhost:8090');
// If you need authentication (for protected operations)
// await pb.collection('users').authWithPassword('[email protected]', 'password');
Creating Collections
Collections are logical namespaces that organize your documents. You can create a collection with optional metadata:
// Create a collection for your knowledge base
await pb.llmDocuments.createCollection('knowledge-base', {
domain: 'internal',
description: 'Company knowledge base'
});
// Create another collection for customer support
await pb.llmDocuments.createCollection('support-docs', {
domain: 'customer-facing',
category: 'support'
});
Listing Collections
To see all available collections:
const collections = await pb.llmDocuments.listCollections();
collections.forEach(collection => {
console.log(`Collection: ${collection.name}`);
console.log(` Count: ${collection.count} documents`);
console.log(` Metadata:`, collection.metadata);
});
Deleting Collections
To remove a collection and all its documents:
await pb.llmDocuments.deleteCollection('knowledge-base');
Creating Documents
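Documents are always inserted into a named collection, so it can help to make sure the target collection exists before inserting. The sketch below uses only `listCollections` and `createCollection` as shown earlier; `ensureCollection` itself is a hypothetical helper, and it assumes each listed collection exposes a `name` field.

```javascript
// Hypothetical helper: creates the collection only when it is missing.
// `client` is expected to be a BosBase instance (for example `pb`).
async function ensureCollection(client, name, metadata = {}) {
  const collections = await client.llmDocuments.listCollections();
  const exists = collections.some(collection => collection.name === name);

  if (!exists) {
    await client.llmDocuments.createCollection(name, metadata);
  }

  // true when a new collection was created, false when it already existed
  return !exists;
}
```

A call such as `await ensureCollection(pb, 'knowledge-base', { domain: 'internal' })` could then run before the insert examples in this section.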
Basic Document Creation
The simplest way to create a document is to provide content:
const doc = await pb.llmDocuments.insert(
{
content: 'Leaves are green because chlorophyll absorbs red and blue light.',
},
{ collection: 'knowledge-base' }
);
console.log(`Created document with ID: ${doc.id}`);
Documents with Metadata
Add metadata to help filter and categorize documents:
const doc = await pb.llmDocuments.insert(
{
content: 'The sky is blue because of Rayleigh scattering.',
metadata: {
topic: 'physics',
difficulty: 'intermediate',
source: 'textbook'
}
},
{ collection: 'knowledge-base' }
);
Documents with Custom IDs
You can specify a custom ID when creating a document:
await pb.llmDocuments.insert(
{
id: 'sky-blue-explanation',
content: 'The sky is blue because of Rayleigh scattering.',
metadata: { topic: 'physics' }
},
{ collection: 'knowledge-base' }
);
Batch Document Creation
Create multiple documents by inserting them in a loop:
const documents = [
{
content: 'Photosynthesis is the process by which plants convert light energy into chemical energy.',
metadata: { topic: 'biology', category: 'process' }
},
{
content: 'Water boils at 100 degrees Celsius at sea level.',
metadata: { topic: 'chemistry', category: 'fact' }
},
{
content: 'The speed of light in a vacuum is approximately 299,792,458 meters per second.',
metadata: { topic: 'physics', category: 'constant' }
}
];
// Insert documents one by one
for (const doc of documents) {
await pb.llmDocuments.insert(doc, { collection: 'knowledge-base' });
console.log(`Inserted: ${doc.content.substring(0, 50)}...`);
}
Querying Documents
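Conceptually, a filtered semantic query does three things: it keeps the documents whose metadata matches `where`, scores them against the query embedding, and returns the top `limit` matches. The sketch below simulates that locally to show how the query options interact; it is not the server's implementation. It assumes embeddings are already normalized to unit length (so a dot product equals cosine similarity) and that `where` is an exact-match filter. `dot`, `matchesWhere`, and `rankDocuments` are hypothetical names.

```javascript
// Dot product of two equal-length vectors. For unit-length vectors this
// is the same value as their cosine similarity.
function dot(a, b) {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// True when every key in `where` matches the document metadata exactly.
function matchesWhere(metadata = {}, where = {}) {
  return Object.entries(where).every(([key, value]) => metadata[key] === value);
}

// Local stand-in for a query: filter, score, sort by score, then limit.
function rankDocuments(queryEmbedding, documents, { where = {}, limit = 5 } = {}) {
  return documents
    .filter(doc => matchesWhere(doc.metadata, where))
    .map(doc => ({ ...doc, similarity: dot(queryEmbedding, doc.embedding) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit);
}
```

Because the filter runs before ranking in this sketch, a document that fails the `where` check is never returned, however similar its content is to the query.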
Basic Semantic Search
Query documents using natural language. The SDK will automatically convert your query text into an embedding and find similar documents:
const result = await pb.llmDocuments.query(
{
queryText: 'Why is the sky blue?',
limit: 5
},
{ collection: 'knowledge-base' }
);
result.results.forEach(match => {
console.log(`ID: ${match.id}`);
console.log(`Similarity: ${match.similarity}`);
console.log(`Content: ${match.content}`);
console.log(`Metadata:`, match.metadata);
console.log('---');
});
Filtered Queries
Filter results by metadata:
const result = await pb.llmDocuments.query(
{
queryText: 'Why is the sky blue?',
limit: 3,
where: { topic: 'physics' }
},
{ collection: 'knowledge-base' }
);
Multiple Metadata Filters
You can filter by multiple metadata fields:
const result = await pb.llmDocuments.query(
{
queryText: 'How do plants make energy?',
limit: 5,
where: {
topic: 'biology',
difficulty: 'beginner'
}
},
{ collection: 'knowledge-base' }
);
Advanced Query Options
For more control, you can use query embeddings directly or add negative examples:
const result = await pb.llmDocuments.query(
{
queryText: 'What causes colors in nature?',
limit: 10,
where: { topic: 'biology' },
negative: {
text: 'artificial colors',
mode: 'filter',
filterThreshold: 0.5
}
},
{ collection: 'knowledge-base' }
);
Managing Documents
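Management tasks often need to touch every document in a collection, for example before a bulk update or cleanup. The sketch below walks all pages of `list`; it assumes the response exposes `items` and `totalItems`, as the pagination example in this section does. `listAllDocuments` is a hypothetical helper, not an SDK method.

```javascript
// Hypothetical helper: collects every document in a collection by
// requesting pages until all items have been seen.
// `client` is expected to be a BosBase instance (for example `pb`).
async function listAllDocuments(client, collection, perPage = 25) {
  const all = [];
  let page = 1;

  while (true) {
    const result = await client.llmDocuments.list({ collection, page, perPage });
    all.push(...result.items);

    // Stop when everything has been collected or a page comes back empty.
    if (result.items.length === 0 || all.length >= result.totalItems) break;
    page += 1;
  }

  return all;
}
```

For very large collections it may be preferable to process each page as it arrives rather than holding every document in memory.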
Getting a Single Document
Retrieve a document by its ID:
const doc = await pb.llmDocuments.get('sky-blue-explanation', {
collection: 'knowledge-base'
});
console.log('Document:', doc);
Listing Documents with Pagination
List all documents in a collection with pagination:
// Get first page (25 items per page by default)
const page = await pb.llmDocuments.list({
collection: 'knowledge-base',
page: 1,
perPage: 25
});
console.log(`Total items: ${page.totalItems}`);
console.log(`Page ${page.page} of ${Math.ceil(page.totalItems / page.perPage)}`);
page.items.forEach(doc => {
console.log(`- ${doc.id}: ${doc.content.substring(0, 50)}...`);
});
// Get next page
if (page.page * page.perPage < page.totalItems) {
const nextPage = await pb.llmDocuments.list({
collection: 'knowledge-base',
page: page.page + 1,
perPage: 25
});
}
Updating Documents
Update document content or metadata:
// Update metadata only
await pb.llmDocuments.update(
'sky-blue-explanation',
{
metadata: {
topic: 'physics',
reviewed: 'true',
lastUpdated: new Date().toISOString()
}
},
{ collection: 'knowledge-base' }
);
// Update content and metadata
await pb.llmDocuments.update(
'sky-blue-explanation',
{
content: 'The sky appears blue due to Rayleigh scattering, where shorter wavelengths of light are scattered more than longer wavelengths.',
metadata: {
topic: 'physics',
accuracy: 'high',
reviewed: 'true'
}
},
{ collection: 'knowledge-base' }
);
Deleting Documents
Remove a document from a collection:
await pb.llmDocuments.delete('sky-blue-explanation', {
collection: 'knowledge-base'
});
Complete Examples
Example 1: Building a Knowledge Base
import BosBase from 'bosbase';
const pb = new BosBase('http://localhost:8090');
async function buildKnowledgeBase() {
// Create collection
await pb.llmDocuments.createCollection('company-kb', {
domain: 'internal',
version: '1.0'
});
// Add company policies
const policies = [
{
content: 'All employees must complete security training annually. Training covers phishing, password management, and data handling procedures.',
metadata: { category: 'policy', type: 'security', department: 'all' }
},
{
content: 'Remote work is allowed up to 3 days per week. Employees must coordinate with their managers and ensure adequate home office setup.',
metadata: { category: 'policy', type: 'workplace', department: 'all' }
},
{
content: 'Expense reports must be submitted within 30 days. All receipts must be itemized and include business purpose.',
metadata: { category: 'policy', type: 'finance', department: 'all' }
}
];
for (const policy of policies) {
await pb.llmDocuments.insert(policy, { collection: 'company-kb' });
}
console.log('Knowledge base created successfully!');
}
buildKnowledgeBase().catch(console.error);
Example 2: Semantic Search for Support
async function searchSupportDocs(userQuestion) {
const result = await pb.llmDocuments.query(
{
queryText: userQuestion,
limit: 5,
where: { category: 'support' }
},
{ collection: 'support-docs' }
);
if (result.results.length === 0) {
return 'No relevant documentation found.';
}
// Format results for display
let response = 'Here are the most relevant answers:\n\n';
result.results.forEach((match, index) => {
response += `${index + 1}. (Similarity: ${(match.similarity * 100).toFixed(1)}%)\n`;
response += `${match.content}\n\n`;
});
return response;
}
// Usage
const answer = await searchSupportDocs('How do I reset my password?');
console.log(answer);
Example 3: RAG Application
async function retrieveContextForLLM(userQuery) {
// Query relevant documents
const searchResult = await pb.llmDocuments.query(
{
queryText: userQuery,
limit: 3,
where: { verified: 'true' }
},
{ collection: 'knowledge-base' }
);
// Extract content from results
const context = searchResult.results
.map(result => result.content)
.join('\n\n');
// Use this context with your LLM
const prompt = `Based on the following context, answer the user's question.
Context:
${context}
Question: ${userQuery}
Answer:`;
// Send to your LLM API (OpenAI, Anthropic, etc.)
// const llmResponse = await callLLM(prompt);
return { context, results: searchResult.results };
}
// Usage
const { context, results } = await retrieveContextForLLM(
'What is the company policy on remote work?'
);
console.log('Retrieved context:', context);
Example 4: Document Management System
class DocumentManager {
constructor(pb, collectionName) {
this.pb = pb;
this.collection = collectionName;
}
async addDocument(content, metadata = {}) {
return await this.pb.llmDocuments.insert(
{ content, metadata },
{ collection: this.collection }
);
}
async search(query, limit = 5, filters = {}) {
return await this.pb.llmDocuments.query(
{
queryText: query,
limit,
where: filters
},
{ collection: this.collection }
);
}
async getAllDocuments(page = 1, perPage = 50) {
return await this.pb.llmDocuments.list({
collection: this.collection,
page,
perPage
});
}
async updateDocument(id, updates) {
return await this.pb.llmDocuments.update(
id,
updates,
{ collection: this.collection }
);
}
async deleteDocument(id) {
return await this.pb.llmDocuments.delete(id, {
collection: this.collection
});
}
async getDocument(id) {
return await this.pb.llmDocuments.get(id, {
collection: this.collection
});
}
}
// Usage
const manager = new DocumentManager(pb, 'knowledge-base');
// Add a document
const doc = await manager.addDocument(
'The mitochondria is the powerhouse of the cell.',
{ topic: 'biology', level: 'high-school' }
);
// Search
const results = await manager.search('What is the function of mitochondria?');
// Update
await manager.updateDocument(doc.id, {
metadata: { topic: 'biology', level: 'high-school', reviewed: 'true' }
});
// List all
const allDocs = await manager.getAllDocuments();
Best Practices
1. Organize with Collections
Use collections to logically separate different types of documents:
// Good: Separate collections for different domains
await pb.llmDocuments.createCollection('product-docs', { domain: 'products' });
await pb.llmDocuments.createCollection('api-docs', { domain: 'api' });
await pb.llmDocuments.createCollection('troubleshooting', { domain: 'support' });
2. Use Meaningful Metadata
Add structured metadata to enable filtering:
await pb.llmDocuments.insert(
{
content: '...',
metadata: {
category: 'tutorial',
difficulty: 'beginner',
language: 'javascript',
lastUpdated: '2024-01-15',
author: 'team-docs'
}
},
{ collection: 'knowledge-base' }
);
3. Chunk Large Documents
For better search results, break large documents into smaller chunks:
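function chunkText(text, chunkSize = 500, overlap = 50) {
// Guard against a non-positive step, which would make the loop run forever
if (overlap >= chunkSize) {
throw new RangeError('overlap must be smaller than chunkSize');
}
const chunks = [];
const step = chunkSize - overlap;
for (let i = 0; i < text.length; i += step) {
chunks.push(text.slice(i, i + chunkSize));
}
return chunks;
}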
const longDocument = '...'; // Your long text
const chunks = chunkText(longDocument);
for (let i = 0; i < chunks.length; i++) {
await pb.llmDocuments.insert(
{
content: chunks[i],
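metadata: {
// Values are stringified here to match the string metadata used in the other examples
documentId: 'doc-123',
chunkIndex: String(i),
totalChunks: String(chunks.length)
}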
},
{ collection: 'knowledge-base' }
);
}
4. Handle Errors Gracefully
Always wrap SDK calls in try-catch blocks:
async function safeInsert(content, metadata, collection) {
try {
return await pb.llmDocuments.insert(
{ content, metadata },
{ collection }
);
} catch (error) {
console.error('Failed to insert document:', error);
throw error;
}
}
5. Optimize Query Limits
Adjust the limit based on your use case:
// For RAG: 3-5 documents are usually sufficient
const ragResults = await pb.llmDocuments.query(
{ queryText: '...', limit: 3 },
{ collection: 'knowledge-base' }
);
// For search results page: 10-20 documents
const searchResults = await pb.llmDocuments.query(
{ queryText: '...', limit: 15 },
{ collection: 'knowledge-base' }
);
6. Monitor Similarity Scores
Use similarity scores to filter low-quality matches:
const result = await pb.llmDocuments.query(
{ queryText: '...', limit: 10 },
{ collection: 'knowledge-base' }
);
// Filter results by similarity threshold
const relevantResults = result.results.filter(
match => match.similarity > 0.7
);
if (relevantResults.length === 0) {
console.log('No highly relevant results found');
}
7. Batch Operations
When inserting many documents, consider batching to avoid overwhelming the server:
async function batchInsert(documents, collection, batchSize = 10) {
for (let i = 0; i < documents.length; i += batchSize) {
const batch = documents.slice(i, i + batchSize);
await Promise.all(
batch.map(doc =>
pb.llmDocuments.insert(doc, { collection })
)
);
console.log(`Inserted batch ${Math.floor(i / batchSize) + 1}`);
}
}
Summary
The Bosbase JS SDK provides a powerful and easy-to-use interface for working with LLM documents:
- Collections organize your documents into logical groups
- Insert documents with automatic embedding generation
- Query documents using semantic similarity search
- Manage documents with update, list, and delete operations
This enables you to build RAG applications, semantic search systems, and knowledge bases with minimal code. The vector store handles all the complexity of embeddings and similarity calculations, so you can focus on your application logic.
For more information, refer to the Bosbase documentation and the LLM Documents API documentation.