📝 Dideon - AI Essay Generation Pipeline
Transform raw chat logs from ChatGPT and Claude into comprehensive, publication-ready essays and long-form works
A 4-stage AI refinement process with RAG-enhanced context, semantic grouping, and multi-provider AI collaboration that transforms raw conversation exports into curated, high-quality essays.
🎯 What Dideon Does
Dideon processes raw chat logs from Anthropic Claude and OpenAI ChatGPT conversations, intelligently curating them into thoughtful, publication-ready essays. Instead of overwhelming you with noise, it finds the signal through a systematic refinement process.
Source Material:
- ChatGPT Export Files: conversations.json format from OpenAI
- Claude Export Files: JSON and TXT formats from Anthropic
- Individual Conversations: Raw chat logs and discussion transcripts
🏗️ The 4-Stage Architecture
Stage 1: Conversation → First Draft Essay + Classification
- Processes individual conversations using local models (cost-efficient)
- Generates preliminary essays with keyword classification
- Filters conversations worth developing further
Stage 2: Grouped Synthesis → Competing Drafts
- Groups related Stage 1 essays using keyword clustering
- Generates competing perspectives using local models
- Cross-pollinates ideas across conversations
Stage 3: Final Synthesis → Publication-Ready Essays
- Semantically groups Stage 2 essays using embeddings
- Uses frontier AI models (GPT-4, Claude) with extensive RAG context
- Produces polished, blog-ready content with proper structure
Stage 4: Iterative Long-Form Works → Books, Reports, Guides
- Takes Stage 3 essays as seeds for larger collaborative works
- Multi-model iterative generation for extended pieces
- Builds substantial works through guided collaborative iteration
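Conceptually, the four stages compose into one sequential pipeline. The sketch below is a toy illustration rather than the project's actual API: the stub keyword classifier and clustering stand in for the real model calls, and Stages 3-4 are reduced to a plain merge.

```python
def draft_and_classify(conversation):
    # Stage 1 (toy): "keywords" are just long words; a conversation is
    # worth developing only if it yields any keywords at all.
    keywords = {w for w in conversation.lower().split() if len(w) > 6}
    return {"text": conversation, "keywords": keywords}

def group_by_keywords(drafts):
    # Stage 2 (toy): drafts that share any keyword with a cluster's
    # first member join that cluster.
    clusters = []
    for draft in drafts:
        for cluster in clusters:
            if cluster[0]["keywords"] & draft["keywords"]:
                cluster.append(draft)
                break
        else:
            clusters.append([draft])
    return clusters

def run_pipeline(conversations):
    # Stage 1: draft + classify, filtering out thin conversations.
    drafts = [d for d in map(draft_and_classify, conversations) if d["keywords"]]
    # Stage 2: keyword clustering of the surviving drafts.
    clusters = group_by_keywords(drafts)
    # Stages 3-4 would invoke frontier models with RAG context here;
    # represented as a plain merge of each cluster's drafts.
    return [" | ".join(d["text"] for d in cluster) for cluster in clusters]
```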
🧠 RAG-Enhanced Context System
Dideon uses a sophisticated RAG (Retrieval-Augmented Generation) system to enhance essay generation:
Semantic Search Architecture
- Embedding Service: Local Ollama with nomic-embed-text model
- Vector Database: PostgreSQL with pgvector extension
- Similarity Threshold: 0.3 cosine similarity for relevance filtering
- No Artificial Limits: Retrieves all semantically relevant chunks
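A minimal sketch of this threshold-based retrieval, using toy vectors in pure Python (the real system computes similarity inside PostgreSQL via pgvector against nomic-embed-text embeddings):

```python
import math

SIMILARITY_THRESHOLD = 0.3  # the pipeline's relevance cutoff

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def retrieve(query_vec, chunks):
    # Keep every chunk above the threshold, sorted by similarity.
    # No top-k cap, mirroring the "no artificial limits" rule.
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    return [(score, text) for score, text in sorted(scored, reverse=True)
            if score >= SIMILARITY_THRESHOLD]
```

In production this corresponds roughly to a pgvector query along the lines of `SELECT ... WHERE 1 - (embedding <=> :query) >= 0.3`, where `<=>` is pgvector's cosine-distance operator; the project's exact SQL may differ.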
Context Enhancement by Stage
- Stage 1: No RAG context (focused on individual conversations)
- Stage 2: Moderate RAG context for cross-pollination ideas
- Stage 3: Extensive RAG context (up to 300K characters) for rich synthesis
- Stage 4: Iterative context building from previous iterations and source essays
Quality Controls
- Similarity-based filtering instead of arbitrary chunk limits
- Automatic deduplication of retrieved content
- Source conversation exclusion to prevent circular references
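These controls amount to a simple post-retrieval filter. A sketch, with hypothetical chunk field names (`conversation_id`, `text`):

```python
def filter_context(retrieved_chunks, source_conversation_id):
    seen = set()
    kept = []
    for chunk in retrieved_chunks:
        # Circular-reference guard: never feed a conversation back into
        # an essay that was derived from it.
        if chunk["conversation_id"] == source_conversation_id:
            continue
        # Deduplicate on whitespace- and case-normalized text.
        key = " ".join(chunk["text"].split()).lower()
        if key in seen:
            continue
        seen.add(key)
        kept.append(chunk)
    return kept
```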
🎛️ Multi-Provider AI Integration
Supported Providers
- OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo
- Anthropic: Claude 3.5 Sonnet, Claude 3 Haiku
- Ollama: Local models (Phi-4, Llama 3.2, Qwen 2.5, Gemma 2)
Cost Optimization Strategy
```shell
# Cost-conscious setup (local models for bulk processing)
AGENT_ANALYST_PROVIDER=ollama
AGENT_CRITIC_PROVIDER=ollama
AGENT_SYNTHESIZER_PROVIDER=openai  # frontier model for final polish

# Quality-focused setup (APIs for all stages)
AGENT_ANALYST_PROVIDER=openai
AGENT_CRITIC_PROVIDER=anthropic
AGENT_SYNTHESIZER_PROVIDER=anthropic
```
Model Selection by Stage
- Stage 1 & 2: Local models (phi4-mini, llama3.2) for cost efficiency
- Stage 3: Frontier models (GPT-4, Claude) for publication quality
- Stage 4: Alternating frontier models for collaborative iteration
- Embeddings: Local nomic-embed-text for privacy and cost
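This routing can be pictured as a small lookup keyed by stage. The sketch is illustrative only: the mapping of stages to the `AGENT_*_PROVIDER` roles and the default model names are assumptions, and the project's actual wiring may differ.

```python
import os

# Assumed mapping from pipeline stage to the agent role configured via
# the AGENT_*_PROVIDER environment variables.
ROLE_BY_STAGE = {1: "ANALYST", 2: "CRITIC", 3: "SYNTHESIZER"}

# Illustrative default model per provider.
DEFAULT_MODEL = {
    "ollama": "phi4-mini",            # local, cost-efficient
    "openai": "gpt-4",                # frontier, publication quality
    "anthropic": "claude-3-5-sonnet",
}

def model_for_stage(stage):
    # Stage 4 falls back to the synthesizer role in this sketch.
    role = ROLE_BY_STAGE.get(stage, "SYNTHESIZER")
    provider = os.environ.get(f"AGENT_{role}_PROVIDER", "ollama")
    return provider, DEFAULT_MODEL[provider]
```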
📖 Stage 4: Long-Form Iterative Works
Stage 4 enables collaborative creation of extended works by using Stage 3 essays as seeds for larger projects.
Key Features
- Iterative Generation: Multiple rounds of collaborative writing
- Multi-Model Collaboration: Alternating providers for diverse perspectives
- Custom Prompting: Define work-specific generation prompts
- Web-Based Management: Create and monitor works through the UI
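The iteration loop itself is simple: each round feeds the previous draft back in as context while the provider alternates. In this sketch the `call_model` argument is a hypothetical hook standing in for a real provider API call.

```python
from itertools import cycle

def generate_long_form(seed_essay, prompt, rounds, call_model,
                       providers=("openai", "anthropic")):
    # Each round passes the growing draft back as context, so the work
    # expands through guided, alternating-provider iteration.
    draft = seed_essay
    for _, provider in zip(range(rounds), cycle(providers)):
        draft = call_model(provider, prompt, draft)
    return draft
```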
Example Work Types
- Technical Guides: Multi-chapter technical documentation
- Research Reports: Comprehensive analysis documents
- Creative Writing: Short stories, novellas, or essay collections
- Academic Papers: Extended arguments with multiple sections
- Business Documents: Detailed proposals, strategy documents
📊 Output Quality & Process Results
From Raw Chat Logs to Publication-Ready Content
Example Generated Essays:
- "The Dawn of Digital Meta-Consciousness"
- "Building the Human-Centric Web"
- "Desire, Capital, and the Paradox of Repression"
- "The Future of Human Connection: Hair as Biological Antennas"
- "The Labyrinth of Desire, Art, and Artificial Minds"
- "Consciousness, Control, and the Creative Impulse"
- "Technology and the Architecture of Human Meaning"
- Plus additional comprehensive essays and Stage 4 extended works
🎯 Live Examples Available:
Example #1: "The Dawn of Digital Meta-Consciousness" - Explores AI consciousness, digital twins, and planetary meta-consciousness from raw chat logs using OpenAI GPT-4.1.
Example #2: "Building the Human-Centric Web" - Technical blueprint for privacy-preserving authentication and decentralized digital identity systems from conversation exports.
Example #3: "Desire, Capital, and the Paradox of Repression" - Philosophical exploration of capitalism, technology, and consciousness examining how desire becomes commodified and constrained.
Example #4: "The Future of Human Connection" - Speculative bioengineering essay on transforming human hair into biological antennas for electromagnetic communication and enhanced empathy.
These examples demonstrate the pipeline's range: from AI consciousness theory to privacy cryptography, philosophical analysis, and speculative bioengineering, all transformed from fragmented chat logs into publication-ready essays.
Quality Characteristics
- Unified Voice: Reads as a single author's perspective
- Cross-Conversation Synthesis: Combines insights from multiple sources
- Publication Ready: Proper structure, compelling titles, clear arguments
- No Meta-References: Never mentions source materials or AI generation
⚡ Performance & Economics
Processing Characteristics
Pipeline Efficiency:
- Intelligent Curation: Quality over quantity approach to content selection
- Cost Optimization: Local models for bulk processing, frontier models for refinement
- Scalable Processing: Handles varying volumes of input conversations
- Iterative Refinement: Multi-stage process ensures publication quality
Cost Strategy
Economic Model by Stage:
├── Stage 1: Local models for cost-efficient initial processing
├── Stage 2: Local models for bulk synthesis work
├── Stage 3: Frontier models for publication-quality refinement
├── Stage 4: Variable cost based on iteration complexity
└── Overall: Balanced approach optimizing quality vs. operational cost
🔧 Technical Implementation
Quick Start Example
```shell
# Clone and set up
git clone https://github.com/your-username/dideon.git
cd dideon

# Configure with your API keys
cp .env.example .env
nano .env

# Run everything with Docker
docker-compose up --build

# Run the complete 4-stage pipeline
docker-compose run --rm app python -m dideon generate

# Browse results in the web UI
open http://localhost:8081
```
Architecture Components
Core Systems
- Database: PostgreSQL with pgvector extension for embeddings
- Web Interface: Essay browser, Stage 4 works manager, pipeline monitor
- Docker Services: Containerized deployment with database and application
- Prompt System: Stage-specific templates with customizable outputs
Web Interface Features
- Essay Browser: View all generated essays by stage (1-3)
- Stage 4 Works Manager: Create and manage long-form iterative works
- Conversation Explorer: Browse source conversations
- Pipeline Monitor: Track generation progress and costs
- RAG Debugger: Inspect retrieved context and similarity scores
🛠️ Development Philosophy
- No artificial limits: Let natural constraints (similarity, context windows) govern processing
- Provider agnostic: Easy to swap AI providers and models
- Quality over quantity: Intelligent curation beats brute force generation
- User control: Extensive configuration without code changes
This system demonstrates intelligent content curation and multi-stage AI collaboration, transforming raw AI chat logs into polished publications.
Built with: Python, PostgreSQL, pgvector, Docker, OpenAI GPT, Anthropic Claude, Ollama (local models), FastAPI, semantic similarity search, and multi-provider AI orchestration.
Architecture: 4-stage refinement pipeline with RAG-enhanced context, semantic grouping, iterative long-form generation, and cost-optimized model selection.
License: WTFPL