# TIA Semantic Search Guide for Legal Operations
**Updated:** 2025-06-18  
**Purpose:** Comprehensive guide to semantic search and embedding management for legal materials

---

## 🧠 **OVERVIEW**

TIA's semantic search system provides AI-powered document discovery across all legal materials using OpenAI's text-embedding-3-small model (1536 dimensions). This system allows you to find relevant content by meaning, not just keywords.

**Current Index Status:** 1,043+ documents indexed with 1536-dimensional OpenAI embeddings

---

## 🎯 **QUICK START**

### **Basic Search**
```bash
# Search all legal materials
tia semantic search "constitutional rights violation"

# Get more results
tia semantic search "parental alienation evidence" --top-k 10

# Lower similarity threshold for broader results
tia semantic search "judge hawk strategy" --min-similarity 0.4
```

### **Index New Materials**
```bash
# Index entire Legal directory (incremental - only new files)
tia semantic index --root ~/Legal --collection legal-materials

# Index specific subdirectory
tia semantic index --root ~/Legal/CURRENT_CASE/new_folder --collection legal-materials

# Trial run on a small sample (--limit 5 restricts the run to a few files; --verbose shows details)
tia semantic index --root ~/Legal --collection legal-materials --limit 5 --verbose
```

---

## 📊 **SYSTEM STATUS & INFO**

### **Check Index Information**
```bash
# View index statistics and status
tia semantic info

# Check what's currently indexed
tia search sessions "legal" --limit 5
```

### **Current Configuration**
- **Provider:** OpenAI (default)
- **Model:** text-embedding-3-small  
- **Dimensions:** 1536
- **Database:** `~/.local/share/tia/semantic/vector_store.db`
- **Collections:** legal-materials, legal-arguments, tia-docs, etc.

---

## 🔍 **SEARCH STRATEGIES**

### **By Content Type**
```bash
# Find case strategies
tia semantic search "case strategy timeline evidence"

# Locate court filings
tia semantic search "motion contempt filing procedures"

# Find attorney communications
tia semantic search "attorney handoff materials"

# Research judge backgrounds
tia semantic search "judge hawk civil rights ACLU"
```

### **By Legal Concepts**
```bash
# Constitutional law
tia semantic search "due process violations parental rights"

# Evidence and procedures
tia semantic search "smoking gun evidence sanctions"

# Case precedents
tia semantic search "2020 case precedent Ponomarchuk findings"

# Professional violations
tia semantic search "therapist violations licensing board"
```

### **Advanced Search Options**
```bash
# Higher similarity threshold (more precise)
tia semantic search "emergency motion template" --min-similarity 0.7

# JSON output for processing
tia semantic search "evidence hierarchy" --output json

# Verbose output with debugging
tia semantic search "hearing strategy" --verbose

# Markdown formatted results
tia semantic search "legal timeline" --output markdown
```

---

## 📁 **INDEXING WORKFLOWS**

### **Adding New Materials**

**1. Individual Documents**
```bash
# Index new case files
tia semantic index --root ~/Legal/CURRENT_CASE/new_materials --collection legal-materials
```

**2. Bulk Directory Updates**
```bash
# Reindex the whole Legal directory (incremental: only new or changed files are processed)
tia semantic index --root ~/Legal --collection legal-materials

# Exclude specific directories
tia semantic index --root ~/Legal --collection legal-materials --exclude embeddings_openai_1536 99_ARCHIVE
```

**3. New Case Projects**
```bash
# Create dedicated collection for new case
tia semantic index --root ~/Legal/NEW_CASE --collection new-case-materials
```

### **Maintenance & Updates**

**Weekly Reindexing (Recommended)**
```bash
# Update with recent changes
tia semantic index --root ~/Legal --collection legal-materials --verbose
```

**After Major Case Updates**
```bash
# Full reindex with progress tracking
tia semantic index --root ~/Legal --collection legal-materials --limit 0 --verbose
```

**Quality Check**
```bash
# Test search quality after indexing
tia semantic index --root ~/Legal --collection legal-materials --test-query "constitutional violations"
```

---

## 🎯 **COLLECTION MANAGEMENT**

### **Current Collections**
- **`legal-materials`** - Main legal document collection
- **`legal-arguments`** - Attorney argument frameworks  
- **`tia-docs`** - TIA system documentation
- **`parenting_plans`** - Court orders and parenting plans

### **Best Practices**
```bash
# Use consistent collection names
tia semantic index --collection legal-materials      # For case materials
tia semantic index --collection legal-research       # For research documents  
tia semantic index --collection legal-templates      # For templates and forms
```

---

## ⚙️ **ADVANCED FEATURES**

### **Provider Options**
```bash
# Default: OpenAI (high quality, 1536 dimensions)
tia semantic search "query" --provider openai

# Alternative: FastEmbed (free, local, 384 dimensions)  
tia semantic search "query" --provider fastembed

# Auto-select best available
tia semantic search "query" --provider auto
```

### **Model Selection**
```bash
# Use specific OpenAI model
tia semantic index --model text-embedding-3-small --root ~/Legal

# Use different model for experimentation
tia semantic index --model text-embedding-3-large --root ~/Legal/test
```

### **Database Management**
```bash
# Use custom database location
tia semantic index --db ~/Legal/custom_embeddings.db --root ~/Legal

# Backup current database
cp ~/.local/share/tia/semantic/vector_store.db ~/Legal/backup_embeddings_$(date +%Y%m%d).db
```
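
The `cp` command above does not check that the source exists or that the copy succeeded. A slightly more defensive version might look like the sketch below. The function name `backup_vector_store` and the `~/Legal/backup` default are illustrative assumptions, not part of TIA, and the sketch assumes no indexing job is writing to the database while it is copied.

```bash
# Hypothetical helper (not a TIA command): dated, verified copy of the vector store.
# Usage: backup_vector_store [SOURCE_DB] [DEST_DIR]
backup_vector_store() {
  local src="${1:-$HOME/.local/share/tia/semantic/vector_store.db}"
  local dest_dir="${2:-$HOME/Legal/backup}"
  local dest
  dest="$dest_dir/backup_embeddings_$(date +%Y%m%d).db"

  # Refuse to continue if the database is not where we expect it.
  [ -f "$src" ] || { echo "missing database: $src" >&2; return 1; }

  mkdir -p "$dest_dir" || return 1
  cp -p "$src" "$dest" || return 1

  # Confirm the copy is byte-identical before trusting it.
  cmp -s "$src" "$dest" || { echo "backup differs from source" >&2; return 1; }

  # Print the backup path so callers can log it.
  echo "$dest"
}
```

Running `backup_vector_store` with no arguments uses the database path from this guide; a second backup on the same day overwrites the first.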

---

## 🚨 **TROUBLESHOOTING**

### **Common Issues**

**No Search Results**
```bash
# Check if documents are indexed
tia semantic info

# Try broader search
tia semantic search "your query" --min-similarity 0.3

# Reindex if needed
tia semantic index --root ~/Legal --collection legal-materials
```
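
The "try a broader search" step can be automated by stepping the threshold down until something matches. This is a minimal sketch using only the `--min-similarity` flag shown above. The function name `search_broadening` is hypothetical, and it assumes `tia semantic search` prints nothing to stdout when there are no matches, which you should verify against your installation.

```bash
# Hypothetical wrapper: retry the same query with progressively lower thresholds.
# ASSUMPTION: an empty stdout from `tia semantic search` means "no results".
search_broadening() {
  local query="$1" threshold results
  for threshold in 0.7 0.5 0.3; do
    results="$(tia semantic search "$query" --min-similarity "$threshold")"
    if [ -n "$results" ]; then
      echo "# matched at --min-similarity $threshold"
      echo "$results"
      return 0
    fi
  done
  echo "no results down to 0.3; check 'tia semantic info' and reindex if needed" >&2
  return 1
}
```

For example, `search_broadening "emergency motion template"` starts precise and only broadens if the stricter thresholds return nothing.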

**Slow Search Performance**
```bash
# Check database size
tia semantic info

# Use more specific queries
tia semantic search "specific terms" --top-k 5
```

**Indexing Errors**
```bash
# Check with verbose output
tia semantic index --root ~/Legal --verbose

# Test with limited files
tia semantic index --root ~/Legal --limit 5 --verbose
```

### **Performance Optimization**

**Cost Management**
- OpenAI embeddings cost ~$0.00002 per 1K tokens
- Average document: ~500 tokens = ~$0.00001 per document (roughly $0.01 per 1,000 documents)
- Reindexing only processes new/changed files
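
As a sanity check on these numbers, the arithmetic can be worked directly from the per-1K-token rate. The sketch below is an estimate only: the helper name `estimate_cost` is illustrative, and the 500-token average and 1,043-document count are taken from this guide rather than measured.

```bash
# Rate quoted in this guide: $0.00002 per 1K tokens.
PRICE_PER_1K_TOKENS="0.00002"

# Hypothetical helper: estimate_cost TOKENS_PER_DOC DOC_COUNT -> estimated USD
estimate_cost() {
  awk -v tokens="$1" -v docs="$2" -v price="$PRICE_PER_1K_TOKENS" \
    'BEGIN { printf "%.5f\n", tokens * docs / 1000 * price }'
}

estimate_cost 500 1      # one average document -> prints 0.00001
estimate_cost 500 1043   # the whole index      -> prints 0.01043
```

At this rate, embedding the entire current index once costs about one cent, so incremental reindexing matters more for speed than for cost.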

**Search Quality**
- Higher similarity thresholds (0.7+) = more precise results
- Lower similarity thresholds (0.3-0.5) = broader results
- A `--top-k` of 5-10 is usually sufficient for most queries
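
To make the interaction between the two settings concrete, the sketch below applies a threshold and a top-k cut to a list of scored results. It assumes scores where higher means more similar (for example cosine similarity). The helper `filter_results` and the sample scores and filenames are invented for illustration and do not reflect TIA's internals or output format.

```bash
# Hypothetical helper: read "score<TAB>path" lines on stdin, drop anything
# below the threshold, then keep the k highest-scoring rows.
# Usage: filter_results MIN_SIMILARITY TOP_K
filter_results() {
  local min_similarity="$1" top_k="$2"
  awk -F'\t' -v min="$min_similarity" '$1 + 0 >= min + 0' \
    | LC_ALL=C sort -t$'\t' -k1,1gr \
    | head -n "$top_k"
}

# Made-up sample data: four candidate documents with similarity scores.
printf '0.82\tstrategy.md\n0.35\tnotes.md\n0.61\ttimeline.md\n0.74\tmotion.md\n' \
  | filter_results 0.5 2
# prints the 0.82 (strategy.md) and 0.74 (motion.md) rows, in that order
```

The threshold removes weak matches first, and top-k then limits how many of the survivors are shown, so a high threshold can return fewer than k results.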

---

## 📚 **EXAMPLE WORKFLOWS**

### **Daily Legal Research**
```bash
# Morning case review
tia semantic search "latest evidence timeline updates"

# Find relevant precedents
tia semantic search "similar case sanctions contempt"

# Locate specific procedures
tia semantic search "king county filing requirements"
```

### **Attorney Preparation**
```bash
# Review argument frameworks
tia semantic search "opening closing argument strategy"

# Find supporting evidence
tia semantic search "therapist documentation child distress"

# Check judge background
tia semantic search "judge hawk preferences civil rights"
```

### **Case Documentation**
```bash
# Index new materials after session
tia semantic index --root ~/Legal/CURRENT_CASE --collection legal-materials

# Verify indexing quality
tia semantic search "newly added content" --verbose

# Test search functionality
tia semantic search "recent case developments"
```

---

## 🔗 **INTEGRATION WITH TIA ECOSYSTEM**

### **Related TIA Commands**
```bash
# Session-based search
tia search sessions "legal" --limit 10

# File-based search  
tia search files "Goodnight" --limit 5

# Content search
tia search content "constitutional" --limit 5
```

### **Boot Integration**
```bash
# Check semantic status during boot
tia-boot  # Includes semantic system status

# Quick search from anywhere
tia semantic search "quick query"
```

---

## 📋 **MAINTENANCE SCHEDULE**

### **Weekly Tasks**
- [ ] Reindex new materials: `tia semantic index --root ~/Legal --collection legal-materials`
- [ ] Quality check: `tia semantic search "recent work" --top-k 5`
- [ ] Status review: `tia semantic info`
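
The three weekly commands can be bundled into one shell function so they always run in the same order. The function name `weekly_maintenance` is an assumption for illustration. It only calls the `tia` commands listed above and stops if indexing or the search check fails.

```bash
# Hypothetical wrapper around the weekly checklist (not a TIA command).
# Usage: weekly_maintenance [ROOT] [COLLECTION]
weekly_maintenance() {
  local root="${1:-$HOME/Legal}"
  local collection="${2:-legal-materials}"

  # 1. Reindex new materials (incremental).
  tia semantic index --root "$root" --collection "$collection" || return 1

  # 2. Quality check with a representative query.
  tia semantic search "recent work" --top-k 5 || return 1

  # 3. Status review.
  tia semantic info
}
```

Call it as `weekly_maintenance` for the defaults, or as `weekly_maintenance ~/Legal/NEW_CASE new-case-materials` for another collection.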

### **Monthly Tasks**  
- [ ] Database backup: `cp ~/.local/share/tia/semantic/vector_store.db ~/Legal/backup/`
- [ ] Performance review: Check search quality and indexing speed
- [ ] Collection cleanup: Remove obsolete collections if needed

### **After Major Case Updates**
- [ ] Full reindex: `tia semantic index --root ~/Legal --collection legal-materials --verbose`
- [ ] Test search quality: `tia semantic search "case key concepts"`
- [ ] Update documentation: Record new search patterns and useful queries

---

## 💡 **PRO TIPS**

### **Search Optimization**
- Use natural language: "find constitutional rights violations" works better than "constitutional rights"
- Combine concepts: "judge hawk civil rights strategy" finds targeted results
- Test similarity thresholds: 0.5 (balanced), 0.7 (precise), 0.3 (broad)

### **Indexing Efficiency**  
- Index incrementally: Only new or changed files are processed
- Use collections: Organize by case, document type, or topic
- Monitor costs: Check token usage for large indexing operations
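
One rough way to gauge token usage before a large indexing run is to count characters and divide by four, a common rule of thumb for English text with OpenAI tokenizers. The helper `estimate_tokens` below is a hypothetical sketch. The `*.md` default pattern is an assumption, since this guide does not list which file types TIA indexes, and the result is an approximation rather than an exact token count.

```bash
# Hypothetical helper: approximate token count for files under a directory.
# ASSUMPTION: ~4 characters (bytes) per token, a rough heuristic for English.
# Usage: estimate_tokens DIR [NAME_PATTERN]
estimate_tokens() {
  local dir="$1" pattern="${2:-*.md}"
  local chars
  # Concatenate all matching files and count bytes; prints 0 if none match.
  chars="$(find "$dir" -type f -name "$pattern" -exec cat {} + | wc -c)"
  echo $(( chars / 4 ))
}
```

For example, `estimate_tokens ~/Legal/CURRENT_CASE` gives a ballpark figure to multiply by the per-token rate before committing to a large run.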

### **Quality Assurance**
- Test searches after indexing: Verify new content is discoverable
- Use verbose mode: Debug indexing and search issues
- Regular maintenance: Weekly reindexing keeps content current

---

**🎯 Remember:** TIA semantic search transforms legal research from keyword hunting to intelligent content discovery. Search quality depends on a current index, so keep up the weekly reindexing.