A comprehensive technical deep-dive into CortexOS: zero-knowledge cryptography, on-device Llama 3.2 LLM, sovereign ML pipeline, health integration, and notification intelligence.
Modern AI applications function as thin clients streaming user data to centralized servers. Every prompt, every journal entry, every voice recording crosses a trust boundary the user cannot audit. CortexOS rejects this model entirely.
Sovereign AI means your thoughts stay on your device. CortexOS leverages hardware acceleration on modern mobile processors — including on-device LLM inference via Llama 3.2 — to deliver intelligence that rivals cloud services without ever transmitting personal content.
Your journal entries, voice recordings, health correlations, generative AI outputs, and ML-derived insights are processed entirely on your device. Zero bytes of personal content ever touch our servers.
CortexOS implements application-layer encryption with user-derived keys. No server, cloud provider, or CortexOS itself can read your journal. The encryption key exists only on your device and is never transmitted in any form.
Verify the implementation: The cryptographic layer (key derivation, encryption, vault) is open source — github.com/CortexOS-App/CortexOS-crypto-core
During onboarding, CortexOS generates a 6-word BIP39 recovery phrase combined with a 4-digit PIN you choose. These two factors together are the sole source of all cryptographic material. Neither is ever sent to any server.
| Factor | Format | Entropy |
|---|---|---|
| Recovery phrase | 6 words from BIP39 wordlist (2,048 words) | 2,048⁶ ≈ 7.4 × 10¹⁹ combinations |
| PIN | 4 digits | 10,000 combinations |
| Combined | "word1 word2 word3 word4 word5 word6-1234" | 2,048⁶ × 10,000 ≈ 7.4 × 10²³ — brute-force infeasible with Argon2id |
The combined phrase is normalized (lowercase, whitespace trimmed) and fed into Argon2id. Three independent keys are derived from it — each with a different purpose-bound domain salt so that compromising one key reveals nothing about the others.
Argon2id (winner of the Password Hashing Competition, OWASP’s current recommendation) is used for all key derivation. Its memory-hard design makes GPU and ASIC attacks economically infeasible: a single derivation requires 64 MB of RAM held for the full computation.
| Parameter | Value | Rationale |
|---|---|---|
| Algorithm | Argon2id v1.3 | Hybrid variant: resists both GPU brute-force and side-channel attacks |
| Memory cost | 65,536 KB (64 MB) | Memory-hard — each attempt requires 64 MB RAM |
| Time cost | 3 iterations | ~800 ms on mid-range mobile; negligible for users, prohibitive for attackers |
| Parallelism | 4 lanes | Matches mobile multi-core; attacker gains nothing from parallelism |
| Output length | 32 bytes (256 bits) | Exact AES-256 key size; no truncation or expansion |
Three separate Argon2id calls produce three independent keys. Domain separation is achieved via purpose-bound salts — a technique that ensures the same input never produces the same output for different purposes:
| Key | Salt | Used For |
|---|---|---|
| `accountId` | `"cortexos-account-id-v2-argon2id"` (fixed) | Opaque server identifier. A fixed salt ensures the same phrase produces the same ID on any device — enabling cross-device vault lookup without registration. |
| `encryptionKey` | `"cortexos-encryption-key-v2-argon2id"` + 32-byte random user salt | AES-256 key for all local and vault encryption. The per-user salt ensures uniqueness even if two users share the same phrase. |
| `authToken` | `"cortexos-auth-token-v2-argon2id"` + 32-byte random user salt | API authentication bearer token. Fully independent from the encryption key — a server compromise reveals no key material. |
// Argon2id parameters — MUST match Android's BouncyCastle implementation exactly
private static let argon2Memory: UInt32 = 65536 // 64 MB in KB
private static let argon2TimeCost: UInt32 = 3 // iterations
private static let argon2Parallelism: UInt32 = 4 // threads
private static let hashLength: Int = 32 // 256 bits
// Domain-separation salts (raw UTF-8 bytes — identical on iOS and Android)
private static let saltAccountId = "cortexos-account-id-v2-argon2id"
private static let saltEncryptionKey = "cortexos-encryption-key-v2-argon2id"
private static let saltAuthToken = "cortexos-auth-token-v2-argon2id"
public static func deriveAllKeys(from fullPhrase: String, userSalt: Data) throws -> DerivedKeys {
let normalized = normalizePhrase(fullPhrase) // lowercase, trim whitespace
// accountId: fixed salt — same result on every device for the same phrase
let accountIdBytes = try argon2id(password: normalized, salt: saltAccountId)
// encryptionKey: purpose prefix + per-user random salt
let encSalt = saltEncryptionKey.data(using: .utf8)! + userSalt
let encryptionKeyBytes = try argon2id(password: normalized, saltData: encSalt)
// authToken: purpose prefix + per-user random salt (independent from encryptionKey)
let authSalt = saltAuthToken.data(using: .utf8)! + userSalt
let authTokenBytes = try argon2id(password: normalized, saltData: authSalt)
return DerivedKeys(
accountId: accountIdBytes.hexString, // 64-char hex
encryptionKey: SymmetricKey(data: encryptionKeyBytes),
authToken: authTokenBytes.hexString // 64-char hex
)
}
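The derivation scheme above can be sketched end-to-end in a few lines of Python. This is an illustrative analogue, not the app's code: `hashlib.scrypt` stands in for Argon2id (the Python standard library has no Argon2 binding), so the outputs differ from the real app, but the normalization and purpose-bound-salt structure are the same.

```python
import hashlib

def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace, mirroring normalizePhrase."""
    return " ".join(phrase.lower().split())

def derive(phrase: str, salt: bytes) -> bytes:
    # scrypt stands in for Argon2id: both are memory-hard KDFs (32-byte output)
    return hashlib.scrypt(normalize(phrase).encode(), salt=salt,
                          n=2**14, r=8, p=1, dklen=32)

user_salt = bytes(32)  # per-user random salt (zeroed here for illustration)
phrase = "Apple banana cherry delta echo fox-1234"

account_id     = derive(phrase, b"cortexos-account-id-v2-argon2id")
encryption_key = derive(phrase, b"cortexos-encryption-key-v2-argon2id" + user_salt)
auth_token     = derive(phrase, b"cortexos-auth-token-v2-argon2id" + user_salt)

# Purpose-bound salts yield three unrelated 256-bit keys from one input
assert len({account_id, encryption_key, auth_token}) == 3
```

Because the `accountId` salt is fixed, the same normalized phrase reproduces the same identifier on any device, while the two keys vary with the per-user salt.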
All content is encrypted with AES-256-GCM (Galois/Counter Mode). GCM provides both confidentiality and authenticity in a single pass — any tampering with the ciphertext is detected and rejected before decryption proceeds. This protects against padding oracle attacks, bit-flipping attacks, and silent data corruption.
| Component | Value |
|---|---|
| Cipher | AES-256-GCM |
| Key size | 256 bits (32 bytes) |
| Nonce / IV | 96 bits (12 bytes), cryptographically random per operation |
| Authentication tag | 128 bits (16 bytes) |
| Output format | Nonce ‖ Ciphertext ‖ AuthTag — combined into a single opaque blob |
| Key storage | iOS: Keychain (kSecAttrAccessibleWhenUnlockedThisDeviceOnly) • Android: Keystore (hardware-backed TEE/HSM) |
public func encryptData(_ data: Data) throws -> Data {
let key = try getMasterKeySync() // loaded from Keychain
let sealedBox = try AES.GCM.seal(data, using: key)
guard let combined = sealedBox.combined else { throw EncryptionError.encryptionFailed }
return combined // nonce || ciphertext || tag
}
public func decryptData(_ data: Data) throws -> Data {
let key = try getMasterKeySync()
let sealedBox = try AES.GCM.SealedBox(combined: data)
return try AES.GCM.open(sealedBox, using: key) // fails if tag invalid
}
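For intuition about the combined blob format, the layout can be unpacked with simple slicing. `split_blob` is a hypothetical helper for illustration, not part of the app's API:

```python
NONCE_LEN, TAG_LEN = 12, 16   # 96-bit nonce, 128-bit GCM auth tag

def split_blob(blob: bytes):
    """Split nonce || ciphertext || tag out of a combined AES-GCM blob."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("blob too short to contain nonce and tag")
    return blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]

blob = bytes(12) + b"opaque ciphertext" + bytes(16)
nonce, ciphertext, tag = split_blob(blob)
assert (len(nonce), len(tag)) == (12, 16)
assert ciphertext == b"opaque ciphertext"
```

The tag travels with the ciphertext, so any modification to either is detected when `AES.GCM.open` verifies it.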
Every piece of user-generated content is encrypted before being written to disk:
AI analysis runs against plaintext in memory only. Results are encrypted before being written to the database. The on-device LLM never receives decrypted entries from a network source — it reads from the already-decrypted in-memory context only during an active session.
A zero-knowledge system has an inherent UX problem: a wrong passphrase silently produces a wrong key. CortexOS solves this with two independent mechanisms:
Canary verification: During onboarding, a known string is encrypted with the derived key and stored. On each login, the system attempts to decrypt the canary. A match confirms the correct key was derived — without the key, phrase, or PIN ever being stored anywhere.
Secure word hashes: Each of the 6 phrase words is hashed as SHA-256("position:word") and stored in the Keychain. On re-login, the user is challenged with 2 randomly selected positions. Only the correct word at the correct position produces the matching hash. Binding the position into the hash prevents a hash captured for one position from being replayed at another, and an attacker who obtains a single hash can at most brute-force that one word from the 2,048-word list — never the full phrase or the PIN.
// SHA-256("position:word") — position prevents cross-position hash reuse attacks
public func hashWordAtPosition(_ word: String, position: Int) -> String {
let input = "\(position):\(word.lowercased())"
let hash = SHA256.hash(data: Data(input.utf8))
return hash.compactMap { String(format: "%02x", $0) }.joined()
}
// Login challenge: 2 random positions from the 6-word phrase + PIN
public func verifyPhraseChallenge(
wordIndex1: Int, word1: String,
wordIndex2: Int, word2: String,
pin: String
) -> Bool {
guard verifyPin(pin) else { return false }
let hash1 = hashWordAtPosition(word1, position: wordIndex1)
let hash2 = hashWordAtPosition(word2, position: wordIndex2)
return hash1 == getPhraseWordHash(wordIndex1) &&
hash2 == getPhraseWordHash(wordIndex2)
}
CortexOS offers optional encrypted cloud backup. The vault is architecturally zero-knowledge: the server stores encrypted blobs it cannot decrypt. There is no registration, no email, no account management server-side — just opaque bytes identified by a deterministic hash.
The vault pipeline has three layers, each with a single responsibility: serialization to JSON, client-side encryption with the `encryptionKey`, and upload of the opaque blob. The key never leaves the device.

| What the server stores | What the server cannot determine |
|---|---|
| Encrypted vault blob (opaque AES-256-GCM ciphertext) | Any plaintext journal content |
| Opaque `accountId` (deterministic Argon2id hash) | The user's identity, email, or recovery phrase |
| The `authToken` presented for API access | The `encryptionKey` or any other key material |
Even under legal compulsion, the server operator cannot produce plaintext journal data. There is nothing to hand over except encrypted bytes.
The accountId uses a fixed domain salt for its Argon2id derivation. This means the same recovery phrase + PIN always produces the same accountId on any device, on any platform (iOS or Android) — enabling vault recovery on a new device without any server-side registration or account linking. No email or identifier is ever provided to the server.
// 1. Serialize entries to JSON
let vaultData = VaultData(version: .currentVersion, platform: "ios", entries: entryDTOs)
let jsonData = try JSONEncoder().encode(vaultData)
// 2. Encrypt client-side — server never sees plaintext
let encryptedData = try VaultEncryption.encrypt(data: jsonData, key: key)
// 3. Upload opaque blob — authenticated with authToken, NOT encryptionKey
try await api.uploadVault(data: encryptedData)
When a user installs CortexOS on a new device and enters their recovery phrase + PIN:
1. The app derives the `accountId` and `encryptionKey` deterministically from the phrase + PIN
2. It derives the `authToken` and downloads the encrypted vault blob from the server
3. It decrypts the blob locally with the `encryptionKey`

At no point does the server participate in key exchange, key storage, or key escrow. If the recovery phrase is lost, the vault cannot be decrypted by anyone — including CortexOS.
| Threat | CortexOS’s Protection |
|---|---|
| Server breach (Cloudflare R2 compromised) | Attacker gets encrypted blobs and opaque accountIds. No decryption possible without the user’s phrase + PIN. |
| Network interception (MitM) | All vault traffic is TLS + the payload is independently AES-256-GCM encrypted. Interception yields an encrypted blob over an encrypted channel. |
| Subpoena / legal compulsion of server | Server operator can only produce encrypted blobs. No key material exists on the server. No user identity beyond opaque accountId. |
| Brute-force of recovery phrase | 2,048⁶ × 10,000 ≈ 7.4 × 10²³ combinations. Each attempt requires 64 MB RAM × 3 iterations of Argon2id. Computationally infeasible. |
| Compromised authToken | authToken authenticates vault API access but is cryptographically independent from the encryptionKey. A leaked token allows downloading the blob but not decrypting it. |
| Malicious app update replacing the crypto | App Store code signing and Apple’s review process. Open-sourced crypto layer allows independent verification that the published code matches the app. |
| Attacker with physical device access | Data at rest is encrypted. Keychain keys are WhenUnlockedThisDeviceOnly — inaccessible when the device is locked or moved to another device. |
| Lost recovery phrase | Data cannot be recovered by the user or by CortexOS. This is by design — it is the cost of zero-knowledge encryption. |
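The brute-force figure in the table is easy to verify with back-of-the-envelope arithmetic. The guess rate below is an assumption for illustration (it is deliberately generous given the ~800 ms, 64 MB cost per Argon2id derivation stated earlier):

```python
combos = 2048 ** 6 * 10_000      # 6 BIP39 words x 10,000 possible PINs
assert combos > 7.3e23

# Assume a well-funded attacker sustaining 1,000 Argon2id guesses/sec
# (each guess costs 64 MB of RAM and ~800 ms on commodity hardware).
seconds_to_half_keyspace = combos / 2 / 1_000
years = seconds_to_half_keyspace / (365 * 24 * 3600)
assert years > 1e13              # on the order of ten trillion years
```

Even granting the attacker several more orders of magnitude of throughput, the expected search time remains far beyond any practical horizon.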
CortexOS runs AI models entirely offline using a four-system architecture, each serving distinct purposes while sharing common ML infrastructure.
- **Entry analysis (real-time).** Triggered when the user saves a journal entry. Routes through `AIAnalyzerFactory` to the tier-appropriate TFLite / Lexicon pipeline. Outputs sentiment, emotions, entities, and cognitive distortions.
- **Cross-entry analysis (background).** Runs via `InsightWorker` (9 AM) and `NightlyAnalysisWorker` (3 AM). Uses `MLCrossEntryAnalyzer` for semantic clustering, mood trajectories, and correlation detection.
- **Notification intelligence.** `MLContextualEngine` scores prompts by relevance (60%) + novelty (40%). Considers time of day, mood trajectory, and streak status for smart nudges.
- **Generative AI (on-demand).** Llama 3.2 via `LlamaGenerativeService`. Generates cognitive reframings, weekly summaries, streak celebrations, and therapeutic nudge messages. See Section 5.
| Model | Size | Purpose |
|---|---|---|
| `mobilebert_sentiment.tflite` | ~25 MB | Primary BERT-based sentiment analysis |
| `sentiment_model.tflite` | ~2 MB | Lightweight sentiment fallback |
| `emotion_model.tflite` | ~5 MB | 15-class emotion detection |
| `sentence_encoder.tflite` | ~22 MB | 512-dimensional embeddings (USE-Lite) |
| `llama-3.2-1b-q4_k_m.gguf` | ~700 MB | On-device generative AI (mid-range devices) |
| `llama-3.2-3b-q4_k_m.gguf` | ~1.8 GB | On-device generative AI (high-end devices, 6 GB+ RAM) |
| `whisper-tiny.bin` | ~75 MB | Voice-to-text transcription |
When models fail or device resources are constrained, the system degrades gracefully through five fallback levels, bottoming out at a pure lexicon analyzer. The lexicon contains sentiment-scored words from the AFINN and SentiWordNet datasets, enabling full offline analysis without any ML models loaded.
DeviceCapabilityChecker detects available RAM and CPU tier at startup. The ML pipeline adapts accordingly:
| Device Tier | RAM | Capabilities |
|---|---|---|
| HIGH_END | 6 GB+ | All TFLite models + Llama 3.2 3B + Whisper |
| MID_END | 4–6 GB | All TFLite models + Llama 3.2 1B + Whisper |
| LOW_END | 3–4 GB | Lightweight TFLite + Lexicon + Whisper Tiny |
| MINIMAL | < 3 GB | Lexicon only. LLM disabled entirely. |
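The tier decision reduces to a threshold ladder over available RAM. A sketch of the mapping, using the boundaries from the table (the exact `DeviceCapabilityChecker` logic also weighs CPU tier, which is omitted here):

```python
def device_tier(ram_gb: float) -> str:
    """Map available RAM to an ML capability tier (thresholds from the table)."""
    if ram_gb >= 6:
        return "HIGH_END"   # all TFLite models + Llama 3.2 3B + Whisper
    if ram_gb >= 4:
        return "MID_END"    # all TFLite models + Llama 3.2 1B + Whisper
    if ram_gb >= 3:
        return "LOW_END"    # lightweight TFLite + lexicon + Whisper Tiny
    return "MINIMAL"        # lexicon only; LLM disabled entirely

assert device_tier(8) == "HIGH_END"
assert device_tier(4.5) == "MID_END"
assert device_tier(3.5) == "LOW_END"
assert device_tier(2) == "MINIMAL"
```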
CortexOS v4.0 introduces full generative AI running entirely on-device using Meta’s Llama 3.2 Instruct models via llama.cpp. No text is ever sent to a server. This is the single largest capability addition since launch.
| Parameter | Value |
|---|---|
| Model | Llama 3.2 Instruct (1B or 3B, quantized q4_k_m) |
| Context Window | 2,048 tokens |
| Max Output Tokens | 256 (default) |
| Temperature | 0.7 |
| Top-P | 0.9 |
| Timeout | 30 seconds (configurable) |
Every LLM call uses a two-tier contract: try Llama first, fall back to a high-quality static template if inference fails, times out, or the model isn’t loaded.
`LlamaGenerativeService.kt`:
suspend fun generateOrFallback(
prompt: String,
fallback: String,
maxTokens: Int = 200,
temperature: Float = 0.7f,
timeoutMs: Long = 30_000L
): String
suspend fun generateOrNull(
prompt: String,
maxTokens: Int = 200,
temperature: Float = 0.7f,
timeoutMs: Long = 30_000L
): String?
fun isAvailable(): Boolean = MLCoreEngine.hasLlamaSupport()
This ensures the user experience is never degraded by model unavailability. Static templates in AIContentLibrary (500+ prompts, 200+ insights, 40+ emotion definitions) provide rich fallback content.
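The same two-tier contract can be expressed in a few lines of Python. This is a language-agnostic sketch of the pattern, not the app's implementation; `generate` stands in for the llama.cpp inference call:

```python
import concurrent.futures

def generate_or_fallback(generate, prompt: str, fallback: str,
                         timeout_s: float = 30.0) -> str:
    """Try LLM inference with a deadline; return the static fallback on
    timeout, model-not-loaded, or any inference error."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(generate, prompt).result(timeout=timeout_s)
    except Exception:            # TimeoutError, RuntimeError, ...
        return fallback
    finally:
        pool.shutdown(wait=False)

assert generate_or_fallback(lambda p: "reframe: " + p, "stuck", "template") == "reframe: stuck"
assert generate_or_fallback(lambda p: 1 / 0, "stuck", "template") == "template"
```

The caller always receives a usable string, which is what lets the UI treat LLM availability as an enhancement rather than a dependency.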
ML models are trained on general datasets, but every person writes differently. CortexOS includes a PersonalizedSentimentAdapter that learns per-user sentiment corrections over time without retraining the underlying model.
The adapter operates as a lightweight post-processing layer after TFLite inference. If a user’s entries about “gym” are consistently positive but the model scores them neutral, the adapter learns a topic→sentiment offset and adjusts future scores.
Learned offsets are refreshed by `NightlyAnalysisWorker`; at inference time the adapter applies them:

suspend fun adjust(text: String, baseSentiment: Float): Float {
    val keywords = extractKeywords(text)
    var totalOffset = 0f
    var matchCount = 0
    for (keyword in keywords) {
        learnedOffsets[keyword.lowercase()]?.let { offset ->
            totalOffset += offset
            matchCount++
        }
    }
    // Average the matched offsets; with no matches the base score is unchanged
    val averageOffset = if (matchCount > 0) totalOffset / matchCount else 0f
    return (baseSentiment + averageOffset).coerceIn(-1f, 1f)
}
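The snippet above shows how offsets are applied, not how they are learned. One plausible update rule — a hypothetical sketch, since the actual learning step isn't shown — is an exponential moving average over user corrections, which converges on a stable per-topic offset without storing history:

```python
LEARNING_RATE = 0.3  # assumed smoothing factor

def learn_offset(learned: dict, keyword: str,
                 model_score: float, user_score: float) -> None:
    """Nudge a topic -> sentiment offset toward the user's correction (EMA)."""
    target = user_score - model_score            # how far off the model was
    prev = learned.get(keyword.lower(), 0.0)
    learned[keyword.lower()] = prev + LEARNING_RATE * (target - prev)

offsets = {}
for _ in range(20):   # user repeatedly rates "gym" entries as positive
    learn_offset(offsets, "gym", model_score=0.0, user_score=0.6)
assert abs(offsets["gym"] - 0.6) < 0.01   # converges toward the +0.6 correction
```

An EMA keeps the adapter responsive to genuine shifts in how a user writes about a topic while damping one-off outliers.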
CortexOS offers three subscription tiers plus a 14-day free trial. Cortex Deep and Cortex Ultra have identical capabilities — Ultra is the one-time lifetime payment option.
| Feature | Nano (Free) | Deep (Premium) | Ultra (Lifetime) | Trial (14 days) |
|---|---|---|---|---|
| Entries / month | 50 | Unlimited | Unlimited | Unlimited |
| Sentiment analysis | 3-level | 5-level + confidence | 5-level + confidence | 5-level + confidence |
| Emotions detected | 5 | 15 | 15 | 15 |
| Sentiment lexicon | 1,200 words | 15,234 words | 15,234 words | 15,234 words |
| Context window | 1,000 words | 5,000 words | 5,000 words | 5,000 words |
| Cognitive distortions | — | 15 types + reframing | 15 types + reframing | 15 types + reframing |
| Cross-entry insights | Basic | ML-enhanced | ML-enhanced | ML-enhanced |
| Weekly summaries | — | ✓ | ✓ | ✓ |
| Semantic throwbacks | — | ✓ | ✓ | ✓ |
| Daily reflection | — | ✓ | ✓ | ✓ |
| Mood prediction | — | ✓ | ✓ | ✓ |
| Therapist export | — | ✓ | ✓ | ✓ |
| Voice journaling | ✓ | ✓ | ✓ | ✓ |
| Chapters | ✓ | ✓ | ✓ | ✓ |
| Health Integration | ✓ | ✓ | ✓ | ✓ |
| Cloud vault backup | ✓ | ✓ | ✓ | ✓ |
| Home screen widget | ✓ | ✓ | ✓ | ✓ |
fun getAnalyzer(context: Context): BaseAIAnalyzer {
val prefs = UserPreferences(context)
val effectiveTier = when {
prefs.isLifetime -> TIER_LIFETIME
prefs.isPremium -> TIER_PREMIUM // Includes trial
else -> prefs.subscriptionTier
}
return when (effectiveTier) {
TIER_LIFETIME, TIER_PREMIUM
-> LifetimeAIAnalyzer(context) // Cortex Ultra/Deep
else
-> FreemiumAIAnalyzer(context) // Cortex Nano
}
}
CortexOS implements semantic understanding using sentence embeddings, enabling discovery of entries that feel similar despite using entirely different words.
Each journal entry is encoded into a 512-dimensional vector using a quantized Universal Sentence Encoder Lite (USE-Lite). Embeddings capture semantic meaning beyond keywords and are stored as 2,048 bytes (512 float32 values) per entry.
`MLCoreEngine.kt`:
const val EMBEDDING_DIM = 512
const val EMBEDDING_BYTE_SIZE = EMBEDDING_DIM * 4 // 512 floats × 4 bytes = 2,048 bytes
Rather than matching by date alone, the system finds entries by meaning through cosine similarity with a 0.5–0.7 threshold.
Today’s entry about “feeling overwhelmed at work” surfaces a 6-month-old entry about “the pressure of deadlines” with 0.73 similarity — despite sharing no common words.
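Cosine similarity is the standard measure for this kind of matching. A minimal sketch with toy 4-dimensional vectors standing in for the 512-dimensional USE-Lite embeddings (the vectors and threshold band are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

THRESHOLD = 0.5  # lower bound of the 0.5-0.7 matching band

# Toy embeddings: semantically close entries point in similar directions
overwhelmed_at_work   = [0.9, 0.3, 0.1, 0.0]
pressure_of_deadlines = [0.8, 0.4, 0.2, 0.1]
weekend_hike          = [0.0, 0.1, 0.9, 0.4]

assert cosine(overwhelmed_at_work, pressure_of_deadlines) > THRESHOLD
assert cosine(overwhelmed_at_work, weekend_hike) < THRESHOLD
```

The same comparison over real embeddings is what lets entries with no shared vocabulary score as related.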
MLCrossEntryAnalyzer uses K-means clustering (k=5) on embeddings to automatically group entries by theme, enabling pattern detection without manual tagging. Themes are extracted from cluster centroids and labeled using the most representative entries.
NightlyAnalysisWorker scans for entries missing embeddings (created before ML was available, or migrated from older versions) and regenerates up to 50 per night. This ensures the semantic index gradually reaches full coverage without impacting daytime performance.
CortexOS transforms journaling from isolated entries into a continuous narrative. The Journey System organizes entries into 7-day chapters with AI-generated names, creating a sense of progression and story.
`ChapterFinalizationWorker` automatically finalizes expired chapters twice daily. `ChapterNameGenerator` aggregates emotions, themes, and mood trajectory across all entries in the chapter to produce a poetic title. When Llama 3.2 is available, names are generated by the LLM for richer, more personalized results.
ChapterAnalyzer provides ML-enhanced aggregation across a chapter’s entries.
CortexOS integrates with Apple HealthKit (iOS) and Android Health Connect (Android) to surface correlations between physical health and emotional state. All health data stays on-device and is processed locally.
| Data Type | What’s Read | Alert Threshold |
|---|---|---|
| Sleep | Sessions, total duration, sleep stages | < 5 hours triggers concern |
| Heart Rate | Average, resting, min/max | > 100 bpm triggers alert |
| Steps & Activity | Daily steps, distance, calories | — |
| Workouts | Exercise sessions, types, duration | — |
HealthSignalMonitor watches for concerning patterns (low sleep, elevated heart rate) and caches health concern flags for InsightWorker to surface. A 24-hour cooldown prevents alert fatigue. Health insights are integrated into chapter analytics, weekly summaries, and therapist exports.
Each data type is independently togglable. CortexOS requests only the permissions the user explicitly enables. Health data is encrypted alongside journal entries using the same AES-256-GCM pipeline.
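The monitoring logic reduces to threshold checks gated by a cooldown. A hedged sketch of a `HealthSignalMonitor`-style check, using the thresholds from the table (the function name and interface are assumptions for illustration):

```python
def health_concerns(sleep_hours, avg_heart_rate, hours_since_last_alert):
    """Flag concerning patterns; a 24-hour cooldown prevents alert fatigue."""
    if hours_since_last_alert is not None and hours_since_last_alert < 24:
        return []                                  # still in cooldown
    concerns = []
    if sleep_hours is not None and sleep_hours < 5:
        concerns.append("low_sleep")               # < 5 hours triggers concern
    if avg_heart_rate is not None and avg_heart_rate > 100:
        concerns.append("elevated_heart_rate")     # > 100 bpm triggers alert
    return concerns

assert health_concerns(4.2, 105, None) == ["low_sleep", "elevated_heart_rate"]
assert health_concerns(4.2, 105, 6) == []          # within the 24 h cooldown
assert health_concerns(7.5, 70, 48) == []
```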
Voice journaling uses OpenAI’s Whisper model ported for entirely on-device operation via Whisper.cpp. No audio ever leaves the device.
| Parameter | Value |
|---|---|
| Engine | Android: Whisper.cpp via JNI • iOS: WhisperKit |
| Models | whisper-tiny.bin (75 MB) or whisper-base.bin (150 MB) |
| Sample Rate | 16 kHz mono PCM |
| Languages | 99 languages supported |
| Processing | 100% on-device, zero cloud |
CortexOS has 7+ independent notification systems. Without coordination, worst-case days could produce 10–14 notifications. The Notification Intelligence layer ensures notifications are helpful, never overwhelming.
NotificationThrottler is a global gatekeeper that every notification must pass through before firing. It enforces a daily budget and minimum spacing.
| Priority | Types | Budget | Spacing |
|---|---|---|---|
| CRITICAL | User-set reminders | Always delivered | None |
| HIGH | Mood alerts, declining mood check-ins | Subject to budget | None |
| MEDIUM | Daily reflection, nudges, inactivity | Subject to budget | 2 hours minimum |
| LOW | Throwbacks, streaks, chapters, summaries | Subject to budget | 2 hours minimum |
Default daily budget: 4 notifications (user-configurable, range 2–8). Resets at midnight. CRITICAL reminders never count against the budget.
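The gatekeeping rules above combine into a small state machine. A sketch of a throttler honoring the budget, spacing, and CRITICAL-bypass rules (class shape and method names are assumptions, not the app's API):

```python
from datetime import datetime, timedelta

class NotificationThrottler:
    """Daily-budget + minimum-spacing gatekeeper for notifications."""
    def __init__(self, daily_budget: int = 4, spacing=timedelta(hours=2)):
        self.budget, self.spacing = daily_budget, spacing
        self.sent_today, self.last_sent = 0, None

    def allow(self, priority: str, now: datetime) -> bool:
        if priority == "CRITICAL":               # user reminders always fire
            return True
        if self.sent_today >= self.budget:       # daily budget exhausted
            return False
        if priority in ("MEDIUM", "LOW") and self.last_sent is not None \
                and now - self.last_sent < self.spacing:
            return False                         # 2-hour minimum spacing
        self.sent_today += 1
        self.last_sent = now
        return True

t = NotificationThrottler()
noon = datetime(2025, 1, 1, 12)
assert t.allow("HIGH", noon)
assert not t.allow("LOW", noon + timedelta(minutes=30))   # too soon
assert t.allow("CRITICAL", noon)                          # never throttled
```

A midnight reset of `sent_today` (omitted here) completes the behavior described above.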
InsightWorker runs 7 analysis checks daily but collects notification candidates into a priority queue rather than firing each one. Only the top 1–2 candidates are delivered. Unfired insights are still saved to the database and visible in the Insights tab.
Priority order: mood alert > streak milestone > throwback > weekly summary > smart nudge > health signal.
SessionMoodMonitor provides real-time intra-session detection during active journaling sessions.
| Parameter | Value |
|---|---|
| Session window | 4 hours for decline detection |
| Session timeout | 6 hours of inactivity |
| Decline trigger | 3+ increasingly negative entries in window |
| Sudden drop threshold | > 0.6 deviation from 7-day average |
| Alert cooldown | 3 days between session mood alerts |
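The two trigger conditions from the table can be sketched directly. Sentiment scores are assumed to lie in [-1, 1] as elsewhere in the document; the function shape is illustrative, not the app's API:

```python
def should_alert(session_scores, week_avg, drop_threshold=0.6):
    """Session decline: 3+ increasingly negative entries in the window, or a
    sudden drop of more than `drop_threshold` below the 7-day average."""
    declining = (len(session_scores) >= 3 and
                 all(b < a for a, b in zip(session_scores, session_scores[1:])))
    sudden = bool(session_scores) and (week_avg - session_scores[-1]) > drop_threshold
    return declining or sudden

assert should_alert([0.2, -0.1, -0.5], week_avg=0.1)   # steady decline
assert should_alert([-0.7], week_avg=0.2)              # sudden drop: 0.9 > 0.6
assert not should_alert([0.1, 0.3], week_avg=0.2)
```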
InactivityNudgeWorker sends escalating re-engagement nudges at 3, 7, 14, 30, and 60+ day inactivity thresholds. Each threshold fires only once; the counter resets when the user creates a new entry.
ReminderDetector uses NLP to extract reminders directly from journal text (“remind me in 2 hours”, “don’t forget tomorrow”). Detected reminders are automatically scheduled and delivered as CRITICAL-priority notifications.
CortexOS coordinates 13 background workers through a platform-native scheduler (WorkManager on Android, BGTaskScheduler on iOS):
| Worker | Schedule | Purpose |
|---|---|---|
| `NightlyAnalysisWorker` | 3 AM daily | Heavy ML analysis, cache pre-computation, embedding regeneration |
| `InsightWorker` | 9 AM daily | Throwbacks, mood alerts, weekly summaries, streak checks |
| `DailyReflectionWorker` | User-configured hour | Mood-aware journaling prompts |
| `NudgeWorker` | Every 8–24 h | Contextual journaling nudges |
| `InactivityNudgeWorker` | Daily | Progressive re-engagement (3/7/14/30/60+ days) |
| `ChapterFinalizationWorker` | Every 12 h | Finalize expired 7-day chapters |
| `DailyReminderScanWorker` | 6 AM daily | Reschedule missed reminders |
| `ReminderWorker` | On-demand | Fire individual reminders at their scheduled time |
| `VaultSyncWorker` | On write (debounced) | Encrypted cloud backup |
TherapistExportManager generates shareable reports for mental health professionals while preserving journal privacy. All processing is on-device.
Reports are generated in a secure temporary directory and programmatically wiped when no longer needed.
Unlike analytics-driven apps, CortexOS learns user behavior locally to optimize the journaling experience — without transmitting behavioral data to any server.
MLContextualEngine analyzes entry timestamps across the last 30 days to predict optimal journaling times. It counts entries by hour, identifies peak hours, calculates confidence, and provides natural-language reasoning (e.g., “You usually journal in the evening”).
Notifications adapt to user context using locally computed signals.
SmartVarietySelector in the AIContentLibrary tracks the last 10 prompts shown to each user and ensures no repetition. Prompts are scored by relevance (60%) and novelty (40%) before selection.
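The 60/40 scoring with a recency exclusion fits in a few lines. A sketch under stated assumptions — `pick_prompt` and its dictionary-based scores are illustrative stand-ins for the `SmartVarietySelector` internals:

```python
def pick_prompt(prompts, relevance, novelty, recent):
    """Score prompts as 0.6*relevance + 0.4*novelty, skipping the last 10 shown."""
    candidates = [p for p in prompts if p not in recent[-10:]]
    return max(candidates, key=lambda p: 0.6 * relevance[p] + 0.4 * novelty[p])

prompts   = ["gratitude", "stress", "sleep"]
relevance = {"gratitude": 0.9, "stress": 0.8, "sleep": 0.4}
novelty   = {"gratitude": 0.1, "stress": 0.9, "sleep": 0.9}
recent    = ["gratitude"]   # shown recently -> excluded from candidates

assert pick_prompt(prompts, relevance, novelty, recent) == "stress"
```

Weighting relevance above novelty keeps prompts on-topic while the recency window guarantees variety.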
NightlyAnalysisWorker runs heavy analysis at 3 AM while the user sleeps — pre-computing correlations, mood trajectories, emotional cycles, and embedding regeneration. Results are cached in AnalysisCache for instant access on wake. This ensures the Insights tab loads instantly without running expensive ML queries in the foreground.
CortexOS v4.0 represents the most comprehensive implementation of sovereign AI in a consumer application, combining zero-knowledge cryptography, on-device LLM inference via Llama 3.2, a tiered sovereign ML pipeline, health integration, and notification intelligence.
Your mind belongs to you. Your software should respect that.
Security researchers are welcome to review our approach. Responsible disclosure: info@cortexos.app