Data - Building Blocks
Welcome to the Data Building Blocks documentation. This collection provides ready-to-use accelerators that make it easier to operationalize data for AI/GenAI use cases.
Overview
These accelerators address the capabilities needed to manage, process, and secure data for AI-driven applications. They are designed to integrate with existing enterprise systems, which shortens time-to-value for AI projects.
GitHub Repository
The complete source code and examples are available in the GitHub repository: https://github.com/ibm-self-serve-assets/building-blocks
Available Building Blocks
Question & Answer (Q&A)
Natural language interfaces to interact with data through RAG (Retrieval-Augmented Generation) and Text-to-SQL powered by IBM watsonx.
Key Components:
- RAG Accelerator: Complete RAG pipeline with document processing, embedding, and semantic search
- Text-to-SQL: Converts natural language questions into executable SQL queries with metadata enrichment
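To make "metadata enrichment" more concrete, here is a minimal sketch of one way it can work: column descriptions are placed next to the user's question before the text is sent to a language model. The `Column` class, the `build_prompt` helper, and the `orders` table are hypothetical names chosen for illustration; they are not taken from the repository, and the model call itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    sql_type: str
    description: str  # business-friendly meaning of the column


def build_prompt(question: str, table: str, columns: list[Column]) -> str:
    """Combine the user's question with schema metadata into one prompt."""
    lines = [f"Table: {table}", "Columns:"]
    lines += [f"- {c.name} ({c.sql_type}): {c.description}" for c in columns]
    lines += [
        f"Question: {question}",
        "Write one SQL query that answers the question.",
    ]
    return "\n".join(lines)


# Hypothetical table metadata and question, for illustration only.
orders = [
    Column("order_id", "INTEGER", "unique order identifier"),
    Column("total_usd", "DECIMAL", "order value in US dollars"),
]
prompt = build_prompt("What is the average order value?", "orders", orders)
print(prompt)
```

The descriptions give the model context that raw column names often lack, for example that `total_usd` is a monetary amount in a specific currency.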
Zero-Copy Lakehouse
Enables seamless querying across databases, warehouses, and cloud object stores without data duplication.
Key Benefits:
- Reduces costs and latency by eliminating data movement
- Built on open table formats (Iceberg/Delta)
- Provides federated query capability
Data Ingestion
Comprehensive data ingestion solutions for IBM watsonx.data covering unstructured and structured data sources.
Supported Data Types:
- Unstructured Data Ingestion: Process documents, PDFs, images, and unstructured content
- Structured Data Ingestion: Database connectors with CDC support for RDBMS platforms
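As a rough illustration of what CDC (change data capture) support means, the sketch below replays a stream of change events against an in-memory copy of a table. The event shape (`op`, `key`, `row`) is an assumption made for this example and does not reflect the format used by any particular connector.

```python
def apply_change(table: dict[int, dict], event: dict) -> None:
    """Apply one change event to a table keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        # Merge changed columns into the existing row (or create the row).
        table[key] = {**table.get(key, {}), **event["row"]}
    elif op == "delete":
        table.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


# Hypothetical change events captured from a source database.
events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "city": "London"}},
    {"op": "insert", "key": 2, "row": {"name": "Grace", "city": "New York"}},
    {"op": "update", "key": 1, "row": {"city": "Paris"}},
    {"op": "delete", "key": 2},
]

target: dict[int, dict] = {}
for event in events:
    apply_change(target, event)

print(target)  # {1: {'name': 'Ada', 'city': 'Paris'}}
```

Only the changes travel to the target, so the copy stays current without reloading the full table.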
Vector Search
Provides a vector-based retrieval service for GenAI pipelines with semantic similarity search capabilities.
Supported Databases:
- Milvus: High-performance vector database optimized for billion-scale vector search (Available Now)
- OpenSearch: Enterprise search with hybrid vector and keyword search capabilities (Planned)
- DataStax Astra DB: Cloud-native vector database with global distribution (Planned)
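The idea behind semantic similarity search can be sketched in a few lines of plain Python: documents and the query are represented as vectors, and the closest vectors by cosine similarity are returned. This is a brute-force illustration with toy three-dimensional vectors, not the building block's implementation. A vector database such as Milvus relies on approximate indexes to do this at scale, and real embeddings come from an embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float], index: dict[str, list[float]], k: int = 2
) -> list[tuple[str, float]]:
    """Score every stored vector against the query and keep the best k."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings" with hypothetical document IDs.
index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [0.9, 0.1, 0.0],
}
results = top_k([1.0, 0.0, 0.0], index, k=2)
print([doc_id for doc_id, _ in results])  # ['doc-a', 'doc-c']
```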
Data Security and Encryption
Protects sensitive data through masking, encryption, and access controls.
Key Features:
- Data privacy and encryption with watsonx.data Intelligence
- Project & Catalog automation
- Data protection and masking workflows
- Guardium integration (coming soon)
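As a small illustration of what a masking step does, the sketch below masks an email address and replaces a name with a salted hash. The function names, the salt, and the sample record are hypothetical; in the building block itself, masking would be defined through the platform's governance rules and not hand-written like this.

```python
import hashlib


def mask_email(email: str) -> str:
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = email.partition("@")
    if not domain:
        # Not a well-formed address: hide everything.
        return "*" * len(email)
    return local[:1] + "*" * max(len(local) - 1, 0) + "@" + domain


def pseudonymize(value: str, salt: str) -> str:
    """Replace a value with a salted SHA-256 digest (same input, same output)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


# Hypothetical record, for illustration only.
record = {"name": "Ada Example", "email": "ada@example.com"}
masked = {
    "name": pseudonymize(record["name"], salt="demo-salt"),
    "email": mask_email(record["email"]),
}
print(masked["email"])  # a**@example.com
```

Because the digest is deterministic, the same name always maps to the same token, so masked tables can still be joined. A plain salted hash is only a teaching device here; low-entropy values can be guessed by brute force, so production systems typically use keyed hashing or managed masking policies.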
Getting Started
Quick Start Guide
Follow these steps to get started with any building block:
- Clone the repository:
    git clone https://github.com/ibm-self-serve-assets/building-blocks.git
    cd building-blocks/data-for-ai
- Navigate to the specific building block directory
- Follow the README instructions for setup and configuration
Key Benefits
Why Use Data Building Blocks?
- Cost Savings: Eliminate redundant storage and data movement
- Faster Insights: Reduce ETL delays and processing time
- Single Source of Truth: Maintain data consistency across systems
- Enhanced Security: Protect sensitive data with governance controls
- Scalability: Optimized for enterprise AI workloads
Contributing
We welcome contributions! Please fork the repository, create a feature branch, and open a pull request with your changes.
Contribution Guidelines
- Follow existing code style and documentation patterns
- Include tests for new features
- Update documentation as needed
- Ensure all tests pass before submitting
License
This project is licensed under the Apache License 2.0.