Introduction
DeepSpecs is a question-answering system designed specifically for navigating complex technical specifications in 5G and telecommunications standards. By combining advanced retrieval techniques with large language models, DeepSpecs enables users to query technical documents and receive contextually relevant answers. The system features a multi-database architecture with specialized chunking strategies and a multi-stage retrieval pipeline that enable it to mirror how domain experts reason about specification documents.
Read the arXiv paper (arXiv:2511.01305)
Try the Demo
You can try out the live demo of DeepSpecs using the link below. Please note this demo supports only single-turn Q&A (not conversational mode).
Launch Demo
System Architecture
Multi-Stage Retrieval Pipeline

Database Chunking and Population
Simple Q&A Interface with Viewable Context
- Interface to type questions and receive responses from the DeepSpecs system
- Toggleable RAG mode allowing users to disable DeepSpecs and compare responses from raw GPT
- Interactive windows to view retrieved context and understand the system's reasoning process
Multi-Database System with Specialized Chunking
- SpecDB: Contains technical specification information with novel chunking techniques that preserve hierarchical section information and spec characteristics
- ChangeDB: Stores diffs between adjacent versions of the same specification
- TDocDB: Houses change request information parsed through an LLM-powered system that distills why changes were made, which specs and clauses were affected, and what the changes entail
- Custom DBClient for easy interfacing with ChromaDB collections, featuring metadata searching, load-sensitive document loading, and database management tools
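As a rough illustration of chunking that preserves hierarchical section information, the sketch below splits a specification's text at clause headings and records each chunk's parent clauses as metadata. It is a minimal sketch under stated assumptions, not the DeepSpecs implementation: the `Chunk` class, `chunk_by_clause`, the heading pattern, and the metadata keys are all hypothetical names chosen for this example.

```python
import re
from dataclasses import dataclass

# Assumed heading shape: a dotted clause number followed by a title,
# e.g. "5.3.1 Establishment". Body lines that happen to start with a
# number would be misread as headings; a real parser needs stricter rules.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\s+(\S.*)$")


@dataclass
class Chunk:
    spec: str             # specification identifier, e.g. "TS 38.331"
    clause: str           # clause number of this chunk, e.g. "5.3.1"
    title: str            # clause title
    ancestors: list[str]  # parent clauses, outermost first
    text: str = ""        # body text collected under the heading

    def metadata(self) -> dict[str, str]:
        # Flattened to scalar strings, since vector-store metadata
        # values are commonly restricted to scalars.
        return {
            "spec": self.spec,
            "clause": self.clause,
            "title": self.title,
            "ancestors": "/".join(self.ancestors),
        }


def chunk_by_clause(spec_id: str, raw_text: str) -> list[Chunk]:
    """Split raw_text into one chunk per clause heading."""
    chunks: list[Chunk] = []
    current = None
    for line in raw_text.splitlines():
        stripped = line.strip()
        match = HEADING_RE.match(stripped)
        if match:
            clause, title = match.groups()
            parts = clause.split(".")
            # "5.3.1" has ancestors "5" and "5.3".
            ancestors = [".".join(parts[:i]) for i in range(1, len(parts))]
            current = Chunk(spec_id, clause, title, ancestors)
            chunks.append(current)
        elif current is not None and stripped:
            current.text = (current.text + " " + stripped).strip()
    return chunks
```

Keeping the ancestor path alongside the clause number is what would let a later metadata filter select a clause together with its enclosing sections.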
Enhanced Multi-Stage Retrieval Process
- HyDE-Inspired Question Rewriting: Questions are rewritten to create "hypothetical documents" that improve retrieval accuracy [1]
- Cross Reference Resolution (CRR): Performs an initial semantic retrieval to get the top k1 documents from SpecDB, then extracts their references to other clauses and specs to build section-aware and spec-aware metadata filters. These filters drive secondary searches that combine metadata filtering with semantic similarity to retrieve the top k2 documents, mirroring how a human reader resolves cross-references
- Specification Evolution Reasoning (SER): Detects version-related questions, searches ChangeDB for relevant diffs, and uses those diffs to generate metadata filters for TDocDB searches that retrieve k3 documents about feature evolution
- Configurable Depth and Rollover: Users can specify the number of successive rounds of secondary retrieval with automatic deficit rollover
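The cross-reference resolution and rollover steps above can be sketched on a toy in-memory corpus. This is a hedged illustration, not the DeepSpecs pipeline: token overlap stands in for embedding similarity, a clause-equality check stands in for the metadata filter, and the rollover rule (unused slots from one round are added to the next round's budget) is an assumption about what "deficit rollover" means. The names `score`, `top_k`, and `retrieve` are hypothetical.

```python
import re

# Assumed citation form: "clause 5.3.2". Real specifications use more variants.
CLAUSE_REF_RE = re.compile(r"\bclause\s+(\d+(?:\.\d+)*)", re.IGNORECASE)


def _tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query, text):
    """Toy stand-in for semantic similarity: number of shared tokens."""
    return len(_tokens(query) & _tokens(text))


def top_k(query, docs, k):
    """Return the k best-scoring docs; ties keep corpus order."""
    return sorted(docs, key=lambda d: score(query, d["text"]), reverse=True)[:k]


def retrieve(query, docs, k1, k2, depth=1):
    """Primary retrieval of k1 docs, then `depth` rounds of reference following.

    Each round may return up to its budget; any shortfall is rolled over
    into the next round's budget (assumed rollover semantics).
    """
    results = top_k(query, docs, k1)
    seen = {d["clause"] for d in results}
    frontier = results
    budget = k2
    for _ in range(depth):
        # Collect clauses cited by the documents found in the previous round.
        refs = {ref for d in frontier for ref in CLAUSE_REF_RE.findall(d["text"])}
        # "Metadata filter": keep only cited clauses not already retrieved.
        candidates = [d for d in docs if d["clause"] in refs and d["clause"] not in seen]
        frontier = top_k(query, candidates, budget)
        results.extend(frontier)
        seen.update(d["clause"] for d in frontier)
        # Roll the unused part of this round's budget into the next round.
        budget = k2 + (budget - len(frontier))
    return results
```

With `k2=2`, a first round that finds only one cited clause leaves one slot unused, so the second round may return up to three documents instead of two.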
Extensible and Versatile Tools
- ReferenceExtractor: Rule-based extraction of references to other clauses and specifications
- ChangeDiffer: Compares adjacent specification versions and extracts changes for ChromaDB collection entries
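To make the two tools above more concrete, here is a minimal sketch of rule-based reference extraction and of version diffing using only Python's standard library. The regular expressions, the function names `extract_references` and `diff_versions`, and the shape of the returned records are assumptions for illustration; the actual ReferenceExtractor ruleset and ChangeDiffer output format are not described on this page.

```python
import difflib
import re

# Assumed citation forms: "3GPP TS 38.331", "TR 38.901", "clause 5.3.3",
# "subclause 4.2". A production ruleset would cover many more variants.
SPEC_REF_RE = re.compile(r"\b(?:3GPP\s+)?(TS|TR)\s+(\d{2}\.\d{3})\b")
CLAUSE_REF_RE = re.compile(r"\b(?:sub)?clause\s+(\d+(?:\.\d+)*)", re.IGNORECASE)


def extract_references(text: str) -> dict[str, list[str]]:
    """Return the spec and clause references found in text, in order of appearance."""
    specs = [f"{kind} {number}" for kind, number in SPEC_REF_RE.findall(text)]
    clauses = CLAUSE_REF_RE.findall(text)
    return {"specs": specs, "clauses": clauses}


def diff_versions(old: str, new: str) -> list[dict]:
    """Return line-level changes between two versions of the same clause text."""
    old_lines = old.splitlines()
    new_lines = new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # op is "replace", "delete", or "insert"
            changes.append({"op": op, "old": old_lines[i1:i2], "new": new_lines[j1:j2]})
    return changes
```

Each change record could then be stored as one collection entry, with the spec identifier and the two version numbers attached as metadata.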
Key Features
- Multi-database architecture tailored for technical specification documents
- Cross-reference resolution that follows citation chains
- Version evolution tracking to answer "why" and "how" questions about spec changes
- Configurable retrieval depth for user control over breadth vs. depth
- Transparent context display for human review
Next Release (December 12, 2025)
The upcoming release will include several enhancements and optimizations:
Better User Control
- Toggleable HyDE document rewriting: users can choose between raw query or hypothetical document for retrieval
DB Population Improvements
- Optimizations to handle edge cases of chunks with very large embeddings that cannot be resolved by splitting
Better External Reference Resolution
- Expanded ruleset for ReferenceExtractor to capture more edge cases for external specification references
Conversational QA
- Replace single-turn QA with multi-turn conversational mode that preserves chat context and supports follow-up questions
Refactoring
- Migration from langchain's RecursiveCharacterTextSplitter to custom implementation
Usage Scenarios
- Technical specification query and analysis for 3GPP and telecom standards
- Understanding feature evolution across specification versions
- Resolving complex cross-references between related specification documents
- Research and education in 5G protocol development
Team Members
Code Release
Download (Beta)
BibTeX
@article{manvattira2025deepspecs,
title={DeepSpecs: Expert-Level Questions Answering in 5G},
author={Manvattira, Aman Ganapathy and Xu, Yifei and Dang, Ziyue and Lu, Songwu},
journal={arXiv preprint arXiv:2511.01305},
year={2025}
}
References
[1] Gao, Luyu, Xueguang Ma, Jimmy Lin, and Jamie Callan. "Precise zero-shot dense retrieval without relevance labels." In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1762-1777. 2023.
This site and software are provided as a beta for evaluation purposes.