Milvus Vector Search¶
High-performance vector database optimized for billion-scale vector search with IBM watsonx integration.
Overview¶
Milvus is an open-source vector database built for AI applications, offering high-performance similarity search and analytics for embedding vectors. This building block provides a complete FastAPI service for ingesting documents from IBM Cloud Object Storage (COS) into Milvus with Docling-based parsing and IBM watsonx.ai embeddings.
IBM Products Used¶
This building block leverages the following IBM products and services:
- watsonx.data: Data lakehouse platform with integrated Milvus vector database support
- watsonx.ai: Foundation models and embedding services for document vectorization
- IBM Cloud Object Storage (COS): Scalable object storage for document repositories
- Milvus: Open-source vector database for semantic search (integrated with watsonx.data)
Features¶
Data Ingestion Service¶
- FastAPI-based REST API for document ingestion
- Docling-based document parsing and processing
- watsonx.ai embedding generation
- Automatic vector storage and indexing in Milvus
- Interactive Swagger UI for API testing
Document Processing¶
- Support for multiple document formats (PDF, HTML, JSON, Markdown)
- Intelligent chunking strategies:
- DOCLING_DOCS: Structure-aware chunking based on document layout
- MARKDOWN: Preserves markdown formatting during chunking
- RECURSIVE: Hierarchical text splitting
- Metadata extraction and preservation
Vector Operations¶
- Automatic collection creation and schema management
- Efficient vector upsert operations
- Configurable embedding dimensions
- Index optimization for fast similarity search
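To illustrate what "efficient upsert" involves, here is a minimal sketch of batching chunk records before they are written to Milvus. The batch size, record shape, and embedding dimension are assumptions for illustration, not the service's actual values:

```python
def batch_records(records, batch_size=100):
    """Yield fixed-size batches so each Milvus upsert call stays bounded."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# Hypothetical chunk records: (id, embedding, text)
records = [(i, [0.0] * 384, f"chunk-{i}") for i in range(250)]
batches = list(batch_records(records))
# 250 records at batch_size=100 -> batches of 100, 100, and 50
```

Each batch would then be passed to a single Milvus insert/upsert call, keeping request payloads small and memory use predictable.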
Architecture¶
IBM COS → FastAPI Service → Docling Parser → Watsonx Embeddings → Milvus DB
The service pulls documents from COS, parses them with Docling, generates embeddings with watsonx.ai, and stores the resulting vectors in Milvus for semantic search.
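The pipeline above can be sketched as a chain of stage functions. All function names and return shapes here are hypothetical stand-ins for the real COS, Docling, watsonx.ai, and Milvus clients:

```python
# Hypothetical stage functions; the real service wires these through FastAPI.
def fetch_from_cos(bucket):
    """Stand-in for listing and downloading documents from a COS bucket."""
    return [f"{bucket}/doc-{i}.pdf" for i in range(2)]

def parse_with_docling(path):
    """Stand-in for Docling parsing: one document -> text chunks."""
    return [f"chunk of {path}"]

def embed_with_watsonx(chunk):
    """Stand-in for a watsonx.ai embedding call."""
    return [0.0, 0.0, 0.0]

def store_in_milvus(collection, chunk, vector):
    """Stand-in for a Milvus upsert of one chunk and its vector."""
    return {"collection": collection, "text": chunk, "vector": vector}

stored = [
    store_in_milvus("docs", chunk, embed_with_watsonx(chunk))
    for path in fetch_from_cos("my-bucket")
    for chunk in parse_with_docling(path)
]
```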
Getting Started¶
Prerequisites¶
Requirements
- watsonx.data environment with a Milvus database configured (refer to the setup guide)
- Python 3.13 installed locally
- git installed locally
- Milvus credentials (host, port, username, password)
- IBM COS credentials (API key, endpoint, service instance ID)
Installation¶
1. Clone the repository:

   ```bash
   git clone https://github.com/ibm-self-serve-assets/building-blocks.git
   cd building-blocks/data-for-ai/vector-search/milvus/assets/data-ingestion-asset/
   ```

2. Create a Python virtual environment and install dependencies:

   ```bash
   python3 -m venv virtual-env
   source virtual-env/bin/activate
   pip3 install -r requirements.txt
   ```

3. Copy the example configuration:

   ```bash
   cp .env.example .env
   ```

4. Update `.env` with your credentials:
Milvus Credentials:
- WXD_MILVUS_HOST: Milvus host URL from watsonx.data UI
- WXD_MILVUS_PORT: Milvus port from watsonx.data UI
- WXD_MILVUS_USER: Set to 'ibmlhapikey'
- WXD_MILVUS_PASSWORD: IBM Cloud API Key for Milvus service account
IBM COS Credentials:
- IBM_CLOUD_API_KEY: IBM Cloud API Key for COS access
- COS_ENDPOINT: Service endpoint URL for your COS instance
- COS_SERVICE_INSTANCE_ID: CRN value of COS instance
API Security:
- REST_API_KEY: Set a unique value for API authentication
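Putting the variables above together, a filled-in `.env` would look roughly like this (all angle-bracket values are placeholders you replace with your own credentials):

```ini
# Milvus (watsonx.data)
WXD_MILVUS_HOST=<milvus-host-from-watsonx-data-ui>
WXD_MILVUS_PORT=<milvus-port-from-watsonx-data-ui>
WXD_MILVUS_USER=ibmlhapikey
WXD_MILVUS_PASSWORD=<ibm-cloud-api-key>

# IBM COS
IBM_CLOUD_API_KEY=<ibm-cloud-api-key>
COS_ENDPOINT=<cos-service-endpoint-url>
COS_SERVICE_INSTANCE_ID=<cos-instance-crn>

# API security
REST_API_KEY=<your-secret>
```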
Starting the Application¶
Start the application locally:

```bash
python3 main.py
```

Or using Uvicorn:

```bash
uvicorn app.main:app --host 127.0.0.1 --port 4050 --reload
```

Access the Swagger UI at http://127.0.0.1:4050/docs.
API Usage¶
Ingestion Endpoint¶
Endpoint: POST /ingest-files
Request Body:

```json
{
  "bucket_name": "<cos-bucket>",
  "collection_name": "<milvus-collection>",
  "chunk_type": "DOCLING_DOCS"
}
```
Parameters:
- `bucket_name`: Name of the S3/COS bucket containing documents
- `collection_name`: Target Milvus collection to create or upsert into
- `chunk_type`: Chunking strategy (`DOCLING_DOCS`, `MARKDOWN`, or `RECURSIVE`)
Headers:
```
REST_API_KEY: <your-secret>
Content-Type: application/json
```
Example using Python:
```python
import requests

url = "http://127.0.0.1:4050/ingest-files"
payload = {
    "bucket_name": "<cos-bucket>",
    "collection_name": "<milvus-collection>",
    "chunk_type": "DOCLING_DOCS",
}
headers = {"REST_API_KEY": "<your-secret>"}

# json= serializes the payload and sets Content-Type automatically
response = requests.post(url, headers=headers, json=payload)
print(response.text)
```
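The same request can be issued with curl (assuming the service is running locally on port 4050):

```bash
curl -X POST "http://127.0.0.1:4050/ingest-files" \
  -H "REST_API_KEY: <your-secret>" \
  -H "Content-Type: application/json" \
  -d '{"bucket_name": "<cos-bucket>", "collection_name": "<milvus-collection>", "chunk_type": "DOCLING_DOCS"}'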
Testing via Swagger UI¶
1. Navigate to http://127.0.0.1:4050/docs
2. Expand POST /ingest-files
3. Click "Try it out"
4. Fill in `bucket_name`, `collection_name`, and `chunk_type`
5. Click "Execute"
6. Verify the 200 response and review the ingestion statistics
Use Cases¶
- Semantic Search: Find documents based on meaning, not just keywords
- RAG Pipelines: Retrieval-augmented generation for LLMs
- Knowledge Bases: Build searchable knowledge repositories
- Document Discovery: Find similar documents across large collections
- Question Answering: Retrieve relevant context for Q&A systems
- Content Recommendation: Suggest similar content based on embeddings
Chunking Strategies¶
DOCLING_DOCS¶
- Structure-aware chunking based on document layout
- Preserves document hierarchy (headings, sections, paragraphs)
- Optimal for well-structured documents
- Best for maintaining context across document sections
MARKDOWN¶
- Preserves markdown formatting during chunking
- Respects markdown structure (headers, lists, code blocks)
- Ideal for markdown-formatted documentation
- Maintains formatting for better readability
RECURSIVE¶
- Hierarchical text splitting with configurable chunk size
- Splits on multiple separators (paragraphs, sentences, words)
- Flexible for various document types
- Good for general-purpose chunking
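As a rough illustration of how hierarchical splitting works, here is a simplified recursive splitter: it tries the coarsest separator first and recurses on oversized pieces with progressively finer separators. This is a sketch of the general technique, not the service's actual implementation, and it omits refinements real splitters add (such as merging small pieces back up to the chunk size or overlapping chunks):

```python
def recursive_split(text, max_len=80, separators=("\n\n", "\n", " ")):
    """Hierarchical splitter: split on the coarsest separator first, then
    recurse with finer separators on any piece still over max_len."""
    if len(text) <= max_len:
        return [text]
    if not separators:
        # Out of separators: fall back to fixed-width slices.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, finer = separators[0], separators[1:]
    chunks = []
    for piece in text.split(sep):
        chunks.extend(recursive_split(piece, max_len, finer))
    return chunks

# Two ~95-character paragraphs; max_len is kept small for the demo.
paragraph = ("lorem ipsum " * 8).strip()
doc = paragraph + "\n\n" + paragraph
chunks = recursive_split(doc)
```

Because each paragraph here exceeds `max_len`, the splitter falls through paragraph and line separators down to word boundaries, so every resulting chunk fits the limit.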
Performance Considerations¶
Optimization Guidelines
- Batch Processing: Process multiple documents in parallel for faster ingestion
- Chunk Size: Balance between context preservation and retrieval precision
- Embedding Dimensions: Higher dimensions can improve accuracy but slow down search
- Index Type: Choose appropriate index type (IVF_FLAT, HNSW) based on use case
- Collection Sharding: Distribute data across multiple shards for scalability
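To make the index-type trade-off concrete, here are two illustrative index configurations in the shape Milvus index parameters typically take, plus a toy selection heuristic. The parameter values and the one-million-vector threshold are assumptions to tune per workload, not recommendations from this service:

```python
# Illustrative index configurations (values are assumptions, tune per workload).
index_configs = {
    # HNSW: graph-based index, fast queries at a higher memory cost
    "HNSW": {
        "index_type": "HNSW",
        "metric_type": "COSINE",
        "params": {"M": 16, "efConstruction": 200},
    },
    # IVF_FLAT: cluster-based index, lower memory; tune nlist to data size
    "IVF_FLAT": {
        "index_type": "IVF_FLAT",
        "metric_type": "COSINE",
        "params": {"nlist": 1024},
    },
}

def pick_index(n_vectors):
    """Toy heuristic: HNSW for smaller, latency-sensitive collections;
    IVF_FLAT when memory use dominates at larger scale."""
    return index_configs["HNSW" if n_vectors < 1_000_000 else "IVF_FLAT"]
```

With a Milvus client, a configuration like one of these would be passed when creating the index on the embedding field.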
Coming Soon¶
Upcoming Features
- .png and .jpg VLM (Vision Language Model) support
- Additional Docling processing functions:
- Image annotation
- Table exports
- Enhanced error logging with structured logs
- Performance optimization for large-scale ingestion
Resources¶
Team¶
Created and Architected By: Anand Das, Anindya Neogi, Joseph Kim, Shivam Solanki
Support¶
For issues or questions, please refer to the GitHub repository or open an issue.