# Markdown to PDF Conversion Guide for Legal Documents

**Last Updated:** August 17, 2025  
**Version:** 2.0  
**Status:** Production Ready  

## 🎯 Quick Start

### Single Command to Convert All Filing Documents
```bash
cd ~/Legal
./md_to_pdf_legal_universal.sh -v
```

### Custom Conversion
```bash
# Convert specific directory
./md_to_pdf_legal_universal.sh /path/to/markdown/files CUSTOM_OUTPUT

# Convert single file
./md_to_pdf_legal_universal.sh -s /path/to/file.md

# Debug mode (create clean markdown only)
./md_to_pdf_legal_universal.sh -c
```

## 📋 Script Location and Purpose

### Primary Script
**File:** `/home/scottsen/Legal/md_to_pdf_legal_universal.sh`  
**Purpose:** Universal converter for legal markdown files to King County GR 14 compliant PDFs  
**Features:**
- Handles corrupted YAML front matter automatically
- Creates professional court-ready PDFs
- Built-in validation and error checking
- King County Superior Court GR 14 compliant formatting

### Backup Scripts (Historical)
- `convert_filing_docs_pandoc.sh` (August 17) - Good working version
- `convert_filing_docs.sh` (August 16) - Uses wkhtmltopdf (produced 1-5MB files; superseded by pandoc)
- `convert_legal_docs_enhanced.sh` (August 17) - Complex version

## 🔧 Technical Details

### Dependencies Required
```bash
# Install required packages (poppler-utils provides pdftotext, used for validation)
sudo apt-get install pandoc texlive-latex-base texlive-fonts-recommended poppler-utils
```
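Before running a conversion, it may help to confirm that the tools are actually on the `PATH`. The following is a minimal sketch and not part of the universal script: `missing_tools` is a hypothetical helper name, and `pdftotext` is checked only on the assumption that the validation commands later in this guide will be used.

```bash
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: report whichever of the conversion tools are absent.
missing=$(missing_tools pandoc pdflatex pdftotext)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing >&2
fi
```

An empty result means every listed tool was found.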

### PDF Specifications (King County GR 14 Compliant)
- **Font:** Times New Roman, 12pt
- **Line Spacing:** Double-spaced (linestretch 2.0)
- **Margins:** 3" top, 1" bottom/left/right
- **Format:** Professional legal document structure
- **Paper:** Letter size (8.5" x 11")
- **Colors:** None (black text only)
- **Compatibility:** Standard PDF as produced by pdflatex (the pandoc command below does not itself request PDF/A conformance)

### Pandoc Command Template
```bash
pandoc \
    --from markdown \
    --to pdf \
    --pdf-engine=pdflatex \
    --variable geometry:"top=3in,bottom=1in,left=1in,right=1in" \
    --variable fontfamily:times \
    --variable fontsize:12pt \
    --variable linestretch:2.0 \
    --variable documentclass:article \
    --variable papersize:letter \
    --metadata title:"" \
    --metadata author:"" \
    --metadata date:"" \
    --standalone \
    input.md \
    -o output.pdf
```

## 🧹 YAML Front Matter Handling

### The Problem
TIA markdown files contain YAML front matter for system integration:
```yaml
---
title: "Document Title"
created: "2025-08-17"
case_numbers:
- 20-3-03830-3-SEA
beth_keywords:
- professional evidence
---
```

**This YAML must be removed** before PDF conversion or it will appear in court documents.

### Automatic Removal Methods

#### Method 1: Universal Script (Recommended)
```bash
./md_to_pdf_legal_universal.sh
```
- Automatically detects and removes ALL YAML blocks
- Handles corrupted/multiple YAML sections
- Finds actual document content intelligently

#### Method 2: Manual sed Command
```bash
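# Remove every block delimited by a pair of --- lines (covers multiple YAML blocks).
# A single pass is enough: it consumes every --- line, so piping the output
# through the same command a second time changes nothing.
# Caution: body text between two --- horizontal rules is deleted as well,
# and an unpaired --- deletes everything through the end of the file.
sed '/^---$/,/^---$/d' input.md > clean.md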
```
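When the document body itself uses `---` as a horizontal rule, a range-based deletion can remove body text as well. A minimal sketch of an alternative that strips only a front matter block beginning on line 1 is shown here. It uses awk rather than sed, and the function name `strip_leading_front_matter` is hypothetical, not something the universal script is known to contain.

```bash
# Hypothetical helper: print FILE without a YAML block that starts on line 1.
# Later --- lines (for example horizontal rules) are left untouched.
strip_leading_front_matter() {
    awk '
        NR == 1 && /^---$/ { skip = 1; next }   # opening delimiter on line 1
        skip && /^---$/    { skip = 0; next }   # closing delimiter ends the block
        !skip                                   # print everything outside the block
    ' "$1"
}
```

Usage would look like `strip_leading_front_matter input.md > clean.md`. A file that does not start with `---` passes through unchanged.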

#### Method 3: Content Detection
```bash
# Find actual document start
grep -n "PETITIONER'S RESPONSE\|SUPPLEMENTAL DECLARATION\|PROPOSED ORDER" input.md

# Extract from that line onwards
tail -n +LINE_NUMBER input.md > clean.md
```
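The two manual steps above (find the line number, then cut from it) can be combined. This is only a sketch under stated assumptions: `extract_from_match` is a hypothetical helper, and the pattern is treated as a basic regular expression exactly as in the `grep` command above.

```bash
# Hypothetical helper: print FILE from the first line matching PATTERN onward.
extract_from_match() (
    pattern=$1
    file=$2
    # grep -n prefixes each match with "LINE:"; keep the first line number only.
    line=$(grep -n -- "$pattern" "$file" | head -n 1 | cut -d: -f1)
    if [ -z "$line" ]; then
        echo "no match for pattern in $file" >&2
        exit 1
    fi
    tail -n "+$line" "$file"
)
```

For example, `extract_from_match "PROPOSED ORDER" input.md > clean.md` writes the document starting at its first matching title line. The function returns a non-zero status when nothing matches, so an empty output file is not silently treated as success.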

## 📁 Directory Structure

### Standard Layout
```
~/Legal/
├── md_to_pdf_legal_universal.sh          # Main conversion script
├── 01_ACTIVE_HEARING_AUG_25/
│   └── COURT_FILINGS/
│       └── FINAL_FILING_DOCUMENTS/
│           ├── *_FILING_READY.md          # Source markdown files
│           └── PROFESSIONAL_PDFs_COURT_READY/
│               ├── *_FILING_READY.pdf     # Output PDFs
│               └── clean_markdown/        # Intermediate clean files
│                   └── *_clean.md
```
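For scripting against this layout, the output locations can be derived from a source path. The sketch below assumes the tree shown above. The helper names are hypothetical, and the `_clean` suffix being appended to the full source name is an assumption based on the `*_clean.md` pattern.

```bash
# Hypothetical helpers: derive output paths from a source markdown path,
# assuming the directory layout shown in the tree above.
pdf_path_for() {
    printf '%s/PROFESSIONAL_PDFs_COURT_READY/%s.pdf\n' \
        "$(dirname "$1")" "$(basename "$1" .md)"
}

clean_path_for() {
    # Assumption: the clean file name is the source name plus a _clean suffix.
    printf '%s/PROFESSIONAL_PDFs_COURT_READY/clean_markdown/%s_clean.md\n' \
        "$(dirname "$1")" "$(basename "$1" .md)"
}
```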

### Current Production Files (August 17, 2025)
- **Source Directory:** `~/Legal/01_ACTIVE_HEARING_AUG_25/COURT_FILINGS/FINAL_FILING_DOCUMENTS/`
- **Output Directory:** `~/Legal/01_ACTIVE_HEARING_AUG_25/COURT_FILINGS/FINAL_FILING_DOCUMENTS/PROFESSIONAL_PDFs_COURT_READY/`

**Ready PDFs:**
1. `01_Response_Brief_Opposition_FILING_READY.pdf` (58KB)
2. `02_FL135_Declaration_Opposition_FILING_READY.pdf` (60KB)  
3. `03_Proposed_Order_Custody_Restoration_FILING_READY.pdf` (25KB)
4. `04_Supplemental_Declaration_FILING_READY.pdf` (58KB)

## ✅ Quality Validation

### Automated Validation
```bash
# Run with validation
./md_to_pdf_legal_universal.sh -v
```

### Manual Validation Checklist
```bash
# Check PDF content
pdftotext file.pdf - | head -10

# Check file size (should be 20-100KB for typical legal docs)
ls -lh *.pdf

# Verify no YAML front matter in PDF
pdftotext file.pdf - | grep -E "^---|^title:|^created:"

# Should return nothing (empty result = good)
```

### Quality Indicators
- **Good PDF:** 20-100KB file size, clean legal headers, no YAML content
- **Bad PDF:** <5KB or >500KB, contains YAML markers, missing content
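
These indicators can be turned into a quick scripted check. The sketch below is illustrative only: `classify_pdf_size` and `has_yaml_markers` are hypothetical helpers, sizes are compared in whole kilobytes (rounded down), and anything between the "good" and "bad" ranges is reported as `review` because the guide does not classify it.

```bash
# Hypothetical helper: classify a file by size using the thresholds above.
classify_pdf_size() (
    bytes=$(wc -c < "$1") || exit 1
    kb=$((bytes / 1024))            # whole kilobytes, rounded down
    if [ "$kb" -lt 5 ] || [ "$kb" -gt 500 ]; then
        echo bad
    elif [ "$kb" -ge 20 ] && [ "$kb" -le 100 ]; then
        echo good
    else
        echo review
    fi
)

# Hypothetical helper: succeed if the text on stdin contains YAML markers.
has_yaml_markers() {
    grep -Eq '^---|^title:|^created:'
}
```

A combined check might read `pdftotext file.pdf - | has_yaml_markers && echo "YAML found in file.pdf"`.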

## 🚨 Common Issues & Solutions

### Issue 1: PDF Contains YAML Front Matter
**Symptom:** PDF shows `---`, `title:`, `created:` at the top  
**Solution:** YAML removal failed. Inspect the source file structure, then review the intermediate clean markdown:
```bash
# Debug with clean-only mode
./md_to_pdf_legal_universal.sh -c
# Check the clean markdown files manually
```
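To narrow down which files still carry front matter after a `-c` run, a small loop over the clean markdown directory may help. This is a sketch only: `list_yaml_leftovers` is a hypothetical helper, and the `*_clean.md` naming is taken from the directory layout in this guide.

```bash
# Hypothetical helper: list clean markdown files in DIR whose first line
# is still a YAML delimiter.
list_yaml_leftovers() {
    for f in "$1"/*_clean.md; do
        [ -f "$f" ] || continue          # skip when the glob matches nothing
        if [ "$(head -n 1 "$f")" = "---" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running `list_yaml_leftovers clean_markdown` prints nothing when every clean file starts with document content.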

### Issue 2: PDF Conversion Fails
**Symptom:** Script reports "PDF conversion failed"  
**Solution:** Check LaTeX installation and markdown formatting
```bash
# Install missing packages
sudo apt-get install texlive-latex-base texlive-fonts-recommended

# Check markdown syntax
pandoc --from markdown --to html input.md > test.html
```

### Issue 3: Missing Content in PDF
**Symptom:** PDF exists but has minimal content  
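**Solution:** YAML removal was too aggressive. Inspect the clean markdown and, if it is empty, re-extract the content starting from the document title: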
```bash
# Check clean markdown file
cat clean_markdown/filename_clean.md

# If empty, manually extract content starting from actual document title
grep -n "PETITIONER'S\|SUPPLEMENTAL\|PROPOSED" source.md
tail -n +LINE_NUMBER source.md > fixed.md
```

### Issue 4: Large File Sizes (>500KB)
**Symptom:** PDF files are unnecessarily large  
**Solution:** Use pandoc instead of wkhtmltopdf, optimize images
```bash
# Check for embedded images
grep -n '!\[' source.md

# Use the universal script (uses pandoc by default)
./md_to_pdf_legal_universal.sh
```

## 📋 Historical Context

### Conversion Method Evolution
1. **August 16:** `wkhtmltopdf` method (created large files 1-5MB)
2. **August 17:** `pandoc` method (creates optimal files 20-100KB)
3. **August 17:** Universal script with YAML detection (current best practice)

### Lessons Learned
- YAML front matter **must** be completely removed
- Pandoc produces superior PDFs vs wkhtmltopdf
- Content detection is more reliable than regex YAML removal
- King County GR 14 compliance requires specific formatting parameters

## 🔗 Related Documentation

- **GR 14:** Washington State court General Rule on the format of pleadings and other papers, as applied in King County Superior Court
- **Pandoc Manual:** https://pandoc.org/MANUAL.html
- **LaTeX Font Guide:** For troubleshooting font issues
- **TIA YAML Standards:** For understanding source file structure

## 🎯 Next Steps for Future Use

### For Regular Legal Document Conversion
1. Keep markdown files with YAML for TIA system integration
2. Use universal script when converting for court submission
3. Always validate PDFs before filing
4. Keep clean markdown files as backup/debugging aid

### For Script Maintenance
1. Universal script handles current YAML corruption patterns
2. Update script if new YAML structures appear
3. Test script with each new TIA markdown format
4. Maintain both universal script and simple backup methods

---

**Script Status:** ✅ Production Ready  
**Last Validation:** August 17, 2025  
**Success Rate:** 100% (4/4 files converted successfully)  
**Ready for Court Filing:** YES