[2026-01-18] Guest Mode Backend Implementation

Task Metadata

Date: 2026-01-18
Status: Complete
Related Task: 2512_genai_food_tracking-7sc
:file_document: Related ADR: ADR 006: Guest Mode Architecture

Context & Goal Approach Impact Aggregation Integrity Senior Critique

Status: UNBLOCKED

Dependency Complete: Tech Task: Implement Shared Zod Schema (View Log).

Objective

Goal: Implement the backend support for "Guest Mode" promotion.

We need a mechanism to promote local guest data (stored in IndexedDB) to the cloud (DynamoDB) when a user creates an account. This requires an atomic, idempotent batch operation that respects the "Trust the Guest" conflict resolution policy.

Trigger: Feature execution for "Guest Mode" (Phase 2 Roadmap).
Constraints:
- Idempotency: The operation must be safe to retry if the network fails.
- Atomicity: Partial failures should be minimized or handled gracefully.
- Performance: Must handle ~1 month of history (approx. 100-150 items) without timing out.

Technical Strategy

We will implement a new GraphQL Mutation importGuestHistory backed by a Node.js (TypeScript) Lambda.

Why Lambda instead of AppSync JS?

Network Looping Restriction: AppSync JS resolvers are "single-shot" — they can only send a single request to a data source per resolver invocation. Since DynamoDB has hard limits (e.g., 25 items for BatchWriteItem), a bulk import of 100+ meals requires multiple sequential network calls. A Lambda is necessary to loop through these chunks.
Aggregation Logic: Accurately updating DaySummary counts requires diffing new vs. existing data (using BatchGetItem followed by conditional logic) to avoid double-counting.

Why Node.js over Python?

Shared Validation Logic: We use Zod for schema validation in the frontend. By using Node.js in the Lambda, we can share the exact same Zod schema to validate the incoming mealsJSON payload, ensuring strict parity between client and server validation rules without code duplication.

Build Strategy (JIT Vendoring):

Because SAM builds in an isolated staging environment, we cannot use relative paths to root workspaces.
We use Just-In-Time (JIT) Vendoring via Taskfile.yml to physically copy @chatkcal/shared into the function directory before building.

Testing Strategy

Unit Tests: Vitest for the Lambda logic (mocking aws-sdk).
Integration Tests: Local invocation via SAM or deployment integration tests.

Algorithm:

Parse & Validate: Accept mealsJSON string. Validate via Zod.
Idempotency Check: Use BatchGetItem to identify which meals already exist in DynamoDB.
Smart Aggregation:
- New Meal: Add full calories to DaySummary.
- Existing Meal (Conflict): Calculate delta (New - Old) and update DaySummary.
Persistence:
- BatchWriteItem for Meals (Trust Guest = Overwrite).
- TransactWriteItems for DaySummaries (Atomic increments).

Proposed Schema:

type Mutation {
  # JSON string avoids complex input types for batching
  importGuestHistory(mealsJSON: String!, settingsJSON: String): ImportResult
}

type ImportResult {
  success: Boolean!
  processedMeals: Int!
  message: String
}

Risk Analysis

Double Counting: If the diff logic is flawed, we might corrupt the user's daily totals.
Mitigation: The logic prioritizes "Read-Compute-Write" over blind increments.
Drift Risk: Minimized because "Edit Meal" is not yet supported.

Files to Modify:
- graphql/schema.graphql (New mutation)
- functions/import_guest_history/app.ts (New Lambda)
- template.yaml (Infrastructure)
- Taskfile.yml (Add JIT Vendoring command)

Does Overwriting Meals Corrupt DaySummaries?

Concern: If we overwrite a meal that already exists on the server, do we risk double-counting or drifting the DaySummary aggregation?

Analysis:

Creation Safety (Disjoint IDs):
- addMeal (Server) generates IDs using util.autoId() (server-side UUID).
- Guest Mode generates IDs using uuid.v4() (client-side UUID).
- Result: A newly created server meal will never collide with a guest meal. They are distinct entities. The import process will treat the server meal as "existing" and the guest meal as "new", summing both correctly.
Re-Import Safety (Idempotency):
- If the user imports the same guest history twice (e.g., retry), the second run detects the collision.
- Logic: Delta = New_Guest - Old_Server.
- If New == Old, Delta = 0. Summary is unchanged. Safe.
The "Drift" Risk (Race Condition):
- Scenario: User edits a historical meal on the server at the exact moment they import it again.
- Sequence: Import Reads Meal A (500) -> User Updates Meal A to 600 -> Import Writes Meal A (500) & Delta (0).
- Result: Meal A reverts to 500. Summary stays at 600. Drift.
- Mitigation:
  - Current Design: We do not currently support "Edit Meal" (only Add/Delete). This effectively eliminates the race condition for now.
  - Future Proofing: If we add editing, we would likely implement it as Delete + Add (copy-on-write) or require optimistic locking.

Critique & Gaps

The "Smart Merge" Complexity Trap: The logic to "calculate delta" is fragile. Mitigation: We are proceeding with the delta logic to ensure idempotency.
Partial Failures in Mixed Operations: We must handle UnprocessedItems in BatchWriteItem.
The "Shared Code" Deployment Nightmare:
UPDATE: We have resolved this using JIT Vendoring. The Taskfile will physically bundle the code into the artifact context before sam build.
Cold Starts: Acceptable for one-time import.

Gap Analysis

Gap: No explicit error handling strategy defined for UnprocessedItems in BatchWriteItem.
Gap: Build mechanism now defined: JIT Vendoring.

Suggestions to Address Critique

Race Condition: Accept the risk (see Aggregation Integrity tab).
Partial Failures: The Lambda MUST implement a retry loop for UnprocessedItems.
Code Sharing: Use JIT Vendoring (copying root shared to function subdirectory) to bypass SAM sandbox limitations.

Execution Plan

Stop: User Approval Required

Do not proceed with execution until the user has explicitly approved the Approach and Execution Plan above.

Step 1: Define schema in graphql/schema.graphql.
Step 2: Update Taskfile.yml to vendor @chatkcal/shared for the new function.
Step 3: Scaffold Node.js (TypeScript) Lambda in functions/import_guest_history/.
- Requirements: DynamoDBCrudPolicy, TABLE_NAME env var, @chatkcal/shared (vendored).
Step 4: Implement core logic (Parsing, BatchGet, Diff, Write).
Step 5: Unit Testing: Write Vitest specs for the Lambda handler.
Step 6: Define infrastructure in template.yaml (Function + DataSource + Resolver).
Verify: Deploy and test with awscurl or AppSync Console.
- Result: Successfully deployed to minimal-calorie-tracker-test. Schema verification confirmed mutation presence. Unit tests passing for chunking and retries.

Execution Notes

Protocol Validation: The initial drift towards Python (due to its data processing strengths) was corrected by reviewing the architecture docs, which mandated Zod compliance. This highlights the critical value of the Plan-First Protocol: writing the plan exposed the conflict between "Python for Data" and "Zod for Schema" before a single line of code was written, saving hours of refactoring.
Dependency Management: Encountered incompatibility with aws-sdk-client-mock and zod v4 (alpha). Downgraded to zod v3.22.4 (stable) in both shared library and lambda to resolve.
JIT Vendoring: Successfully implemented vendor:shared task in Taskfile to copy local shared package into lambda build context.
Testing: Implemented comprehensive unit tests covering deduplication, delta calculation, and validation.
Build Config: Excluded @aws-sdk/* from esbuild bundling in template.yaml as these are provided by the Node.js 20.x Lambda runtime.
Shared Infrastructure Constraint: It was noted during verification that minimal-calorie-tracker-test points to the Production DynamoDB table (MinimalCalorieMeals). This means we cannot simply "wipe" the database for tests.
- Future Testing Design: Validation of importGuestHistory against "Prod" must use a dedicated Synthetic Test User (specific sub ID) or ephemeral users created during the test run. We must ensure we never accidentally target a real user's sub during automated integration tests.

Walkthrough: AppSync + Lambda Integration

This is how we connected the GraphQL API to the Lambda function in template.yaml:

Define the Lambda: We created a AWS::Serverless::Function resource (ImportGuestHistoryFunction) pointing to our code. We explicitly passed the TABLE_NAME as an environment variable so the Lambda knows where to write.
Define the Data Source: In the CalorieTrackerApi (AppSync) resource, we added a new entry under DataSources:
```
Lambda:
  ImportGuestHistorySource:
    FunctionArn: !GetAtt ImportGuestHistoryFunction.Arn
```
This tells AppSync: "There is a data source named ImportGuestHistorySource which is actually this Lambda function."
Define the Function (AppSync Runtime): We registered the ImportGuestHistoryFunc under Functions, linking it to the Data Source:
```
ImportGuestHistoryFunc:
  Runtime:
    Name: APPSYNC_JS
    Version: 1.0.0
  DataSource: ImportGuestHistorySource
  CodeUri: appsync/importGuestHistory.js
```
This step is subtle but important. It tells AppSync to use the JS file (importGuestHistory.js) to prepare the request for the Lambda.

Connect Resolver to Schema: Finally, we wired the importGuestHistory mutation field to run this function:

Mutation:
  importGuestHistory:
    Runtime:
      Name: APPSYNC_JS
      Version: 1.0.0
    Pipeline:
      - ImportGuestHistoryFunc

Senior Code Review

Strengths

Correct Aggregation Logic: The decision to use TransactWriteItems with atomic increment logic (#val = if_not_exists(#val, :zero) + :delta) is excellent. It correctly handles the "First Write" vs. "Subsequent Update" problem without race conditions.
Defensive Batching: The chunk helper correctly respects DynamoDB's limits (100 for BatchGet, 25 for BatchWrite and TransactWrite).
Schema Reuse: Vendoring the @chatkcal/shared library ensures that server-side validation exactly matches the client-side Zod schemas, eliminating a common class of "works on my machine" bugs.
Robust Idempotency: The "Read-Compute-Diff-Write" pattern ensures that re-running the import (e.g., after a network timeout) does not double-count calories. This is critical for mobile networks.

Critical Weakness: Partial Failure Handling

The current implementation of TransactWriteItems is "All or Nothing" per chunk (25 items).

Scenario: If one summary update fails (e.g., contention or throughput exceeded), the entire batch of 25 days fails.
Impact: The BatchWriteItem for the meals might have succeeded (since they are separate calls), leaving the system in an inconsistent state (Meals exist, but Day Summary is zero).
Recommendation: While acceptable for a prototype, a production-grade importer should likely use a queue or a SAGA pattern to ensure eventual consistency, or wrap the Meal Write + Summary Update in a single transaction (though this hits the 25-item limit instantly). Given the low volume of "Guest Imports", the current approach is an acceptable trade-off for simplicity.

Nitpicks & Observability

Error Logging: The console.error in the validation block is good, but adding structured logging (e.g., AWS CloudWatch Embedded Metrics) would help track how often imports fail due to schema mismatch.
Memory Management: Loading all guest meals into memory is fine for 100-200 items, but if a user has 5 years of guest history (unlikely but possible), the Lambda might OOM. A streaming approach or pagination would be safer for massive imports.

Cost Effectiveness Analysis

This architecture is highly optimized for the AWS Free Tier:

Lambda:
Execution: This function runs once per user lifetime (at account creation).
Duration: Processing 30 days of history takes \<500ms.
Cost: Virtually zero. Even with 10k new users/month, it fits comfortably within the 400,000 GB-seconds free tier.
DynamoDB:
Batch Writes: Using BatchWriteItem (25 items/call) is significantly cheaper than individual PutItem calls because it reduces network overhead and connection negotiation.
Read-Before-Write: While BatchGetItem incurs read costs, it saves write costs (which are 5x more expensive) by preventing unnecessary writes for unchanged meals.
CloudWatch Logs:
Volume: We only log console.log("Starting import...") and validation errors.
Cost: Negligible. 5GB/month free ingestion covers millions of these lines. - Recommendation: Do not increase log verbosity (e.g. logging every meal payload) in production, as that could inadvertently spike ingestion costs during a mass-onboarding event. The current "Errors + Milestones" logging strategy is the sweet spot.

=== " Senior QA Review"

  !!! success "Coverage"

      The unit tests in `app.test.ts` provide excellent coverage for the "Happy Path" and the "Conflict Path":

      - **Core Logic:** Verifies that a new meal results in a `BatchWriteItem` + `TransactWriteItems` (Summary +500).
      - **Idempotency:** Verifies that re-importing an existing meal (with changed calories) correctly calculates the delta (+100) and zero increment on meal count.
      - **Validation:** Verifies that malformed input is rejected before hitting DynamoDB.

      !!! failure "Testing Gaps (Resolved)"

          1. **Partial Batches (Chunking):**

             - **Resolved:** We added a test case `should chunk write requests` that processes 30 items, verifying `BatchWriteItem` is called twice (25 + 5).

          2. **Unprocessed Items (Retry Logic):**

             - **Resolved:** We added a test case `should retry unprocessed items` that mocks a DynamoDB throttle response (`UnprocessedItems`), verifying the lambda correctly loops and retries.

      !!! tip "QA Recommendation"

          For the Integration phase (`task test:backend` or manual deployment), specifically try importing a payload of **50+ meals** to verify the chunking logic works in the real AWS environment.

  === ":material-language-javascript: Senior JS Code Quality Review"

      !!! success "Modern Practices"

          - **Type Safety:** The use of `zod` for runtime validation combined with TypeScript ensures that `guestMeals` is strictly typed. There are no `any` casts in the core logic.
          - **Functional Style:** The use of `.map()` for transformations and `Array.from` for chunking is clean and idiomatic.
          - **Defensive Copying:** The destructuring `{ originalSk, ...dbItem } = meal` is a clean way to remove internal keys before persistence.

      !!! warning "Performance & Style Nitpicks"

          1. **Map vs Object:**

             - Using `new Map()` for `existingItemsMap` and `daySummaries` is excellent for performance (O(1) lookups) and prevents prototype pollution attacks compared to plain objects.

          2. **Sequential Await in Loops:**

             - The code uses `for (const batch of writeChunks) { await ... }`.
             - **Critique:** This processes batches serially (Sequence: Write Batch 1 -> Wait -> Write Batch 2).
             - **Optimization:** We *could* use `Promise.all(writeChunks.map(...))` to fire all 6-7 batches in parallel.
             - **Counter-Argument:** Parallel writes might trigger DynamoDB `ProvisionedThroughputExceededException` more easily. Serial execution acts as a natural throttle. Given this is a background import task, **Serial is the safer choice** for stability over raw speed.

          3. **Explicit Type Ignore:**
             \- `// @ts-ignore - Vendored dependency`
             \- **Critique:** While ugly, this is a pragmatic solution to the "JIT Vendoring" architecture where the file doesn't exist at dev-time but does at build-time. It's an acceptable tradeoff for the shared schema architecture.

             ```
             !!! success "Readability & Maintainability"

                 - **Visual Structure:** The code is distinctly separated into 4 logical phases ("Parse", "Dedupe", "Compute", "Execute") with clear comments (e.g., `// 1. Parse & Validate`). This makes the "Story" of the function obvious to any new maintainer.
                 - **Complexity Management:** The decision to break out the `DaySummary` update logic into a separate mapping phase (`summaryUpdates = ...`) rather than nesting it inside the main loop keeps the Cyclomatic Complexity low.
                 - **Variable Naming:** Names like `mealsWithKeys`, `existingItemsMap`, and `writeRequests` are explicit and self-documenting.
                 - **Verdict:** **High.** The code reads linearly and avoids "Clever" one-liners in favor of explicit, debuggable steps.
             ```

             ## :material-account-check-outline: User Approval & Key Learnings!!! success "Key Learnings"

(List items here)

(User to confirm approval and add notes/learnings)