# Research Findings: Afraid to Ask
**Feature Branch**: `004-afraid-to-ask` | **Date**: 2025-10-13 | **Plan**: [plan.md](./plan.md)
## LLM Service for Semantic Comparison
**Decision**: Use Google Cloud Natural Language API for semantic similarity comparison.
**Rationale**: Google Cloud Natural Language API offers robust text analysis capabilities, including entity extraction, sentiment analysis, and content classification, which can be leveraged for semantic similarity. It provides a well-documented API and client libraries for Node.js, aligning with the project's backend technology stack. This choice balances flexibility, performance, and ease of integration.
**Alternatives Considered**:
- **OpenAI Embeddings**: Powerful for similarity, but would add cost and a second third-party dependency outside Google Cloud; since the project already leans on Google Cloud, consolidating on one provider simplifies operations.
- **Self-hosted LLM (e.g., using Hugging Face models)**: Offers maximum control and privacy, but significantly increases operational complexity, resource requirements, and development effort for deployment and maintenance. Not suitable for initial implementation given the project's current scope.
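As a rough illustration of how the API's entity extraction could back a similarity check, the sketch below compares two ideas by the overlap of their extracted entity names (Jaccard similarity). The `extractEntityNames` stub stands in for a call to `analyzeEntities` via the `@google-cloud/language` Node.js client; the function names and the threshold are illustrative assumptions, not part of the plan.

```typescript
// Jaccard similarity over entity-name sets: a simple stand-in metric
// for "semantic similarity" built on entity extraction.
// In production, extractEntityNames would call the Google Cloud
// Natural Language API (analyzeEntities); it is stubbed here with
// naive tokenization so the sketch is self-contained.
function extractEntityNames(text: string): Set<string> {
  return new Set(
    text.toLowerCase().split(/\W+/).filter((t) => t.length > 3)
  );
}

function jaccardSimilarity(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let intersection = 0;
  for (const item of a) if (b.has(item)) intersection++;
  const union = a.size + b.size - intersection;
  return intersection / union;
}

// Two ideas count as "matching" if their entity overlap exceeds an
// (illustrative) threshold.
function ideasMatch(ideaA: string, ideaB: string, threshold = 0.5): boolean {
  return jaccardSimilarity(
    extractEntityNames(ideaA),
    extractEntityNames(ideaB)
  ) >= threshold;
}
```

With real entity extraction, the comparison becomes robust to paraphrasing in a way raw token overlap is not; the threshold would be tuned during implementation.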
## Secure Storage Solution for Sensitive User Ideas
**Decision**: Implement ephemeral server-side storage for encrypted "Afraid to ask" ideas and their compliance metadata, purging all related data upon session termination.
**Rationale**: The privacy requirement states that ideas must be "not available in any way to other users, even via developer tools," and that all related data must be purged upon session termination. This rules out anything but server-side, session-bound storage. Encrypted "Afraid to ask" ideas are sent to the backend over WebSocket for semantic comparison, then held ephemerally (e.g., in a session store or in-memory cache) keyed by the *common session ID*. The raw sensitive data is therefore never persistently stored on the server, and everything tied to the session is removed when it ends. Access to these ideas is strictly controlled via authenticated WebSocket messages, and the client-side encryption key is ephemeral, derived from the session, and never persistently stored or transmitted.
**Alternatives Considered**:
- **Persistent Database (e.g., PostgreSQL)**: Rejected because the requirement is to purge data upon session termination, making persistent storage unsuitable for the sensitive raw content or its encrypted form.
- **Client-side only storage**: Rejected due to the impossibility of guaranteeing privacy from developer tools and performing semantic comparison without exposing raw data client-side.
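To make the "ephemeral, session-derived, never persisted" key property concrete, the sketch below derives a per-session key with Node's built-in HKDF. The `afraid-to-ask:` info label and the function name are illustrative assumptions.

```typescript
import { hkdfSync, randomBytes } from "node:crypto";

// Derive a per-session 256-bit key from an ephemeral session secret.
// The secret lives only in memory for the session's lifetime; once the
// session ends and the secret is discarded, ciphertexts stored under
// it are unrecoverable.
function deriveSessionKey(sessionSecret: Buffer, sessionId: string): Buffer {
  return Buffer.from(
    hkdfSync("sha256", sessionSecret, Buffer.alloc(0), `afraid-to-ask:${sessionId}`, 32)
  );
}

// Example: a fresh secret per session, never stored or transmitted.
const secret = randomBytes(32);
const key = deriveSessionKey(secret, "session-123");
```

Binding the key to both a random secret and the session ID means two sessions never share a key, and discarding the secret at session end is sufficient to satisfy the purge requirement for the ciphertext as well.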