[2026-02-07] Feature: Secure API Access & Cost-Protection Quotas
Task Metadata
Date: 2026-02-07
Status: Complete
- Beads Issue: 2512_genai_food_tracking-cs3
Objective
Goal: Implement Personal Access Tokens (PATs) for programmatic API access and establish a robust "User Quota" system to prevent AWS bill shock and malicious usage.
- Trigger: User request to enable CLI/Script access to food logging, combined with a strategic need to protect serverless infrastructure costs (AppSync/DynamoDB).
- Constraints:
- Must support existing Cognito Auth for Web.
- Must support ck_live_ tokens for CLI.
- Must maintain low latency for Web users.
Technical Strategy
We will implement a hybrid authentication model using an AWS Lambda Authorizer (Python) in AppSync. This allows us to intercept every request, verify PATs, and enforce quotas before expensive resolvers are executed.
- Key Decisions:
- Syntax Guard: The Lambda Authorizer will validate token format (ck_live_ prefix + length) BEFORE querying DynamoDB to prevent DB cost exhaustion from random junk traffic.
- Authorizer Caching (30s): Balances security with performance, reducing DynamoDB hits for quota checks.
- GSI1 Partitioning: A new Global Secondary Index on the token hash enables instant lookups without a full table scan.
- Weighted Quotas: Distinct limits for "Reads" (5,000/day) vs "Writes" (500/day) to match the AppSync cost model.
- DynamoDB Usage Tracker: Atomic counters in a USAGE#YYYY-MM-DD item with TTL for auto-cleanup.
- Budget Alarm: Set an AWS Budget Alarm ($5.00) as a final circuit breaker against large-scale DDoS or runaway costs.
Testing Strategy
- Must Test:
- Token verification logic (SHA-256 hash matching).
- Quota enforcement (rejecting requests after limit exceeded).
- Identity parity (ensure sub from PAT matches sub from Cognito in resolvers).
- Skip: CSS styling for the API settings tab until logic is verified.
Risk Analysis
- Unauthenticated Attack Vector: Malicious users could spam random tokens to trigger Lambda execution and AppSync costs. (Mitigation: Syntax Guard + Budget Alarm).
- Potential Regressions: Incorrect identity mapping could break existing resolvers if they strictly expect Cognito claims.
- Security Implications: Storing raw tokens is strictly forbidden; only SHA-256 hashes are stored.
- Files to Modify:
- template.yaml (DDB GSI, Lambda Authorizer, AppSync Config)
- template-test.yaml (Sync infra changes for dev environment)
- graphql/schema.graphql (PAT mutations, Usage queries)
- functions/api_authorizer/app.py (New Authorizer logic)
- frontend/src/components/SettingsModal.tsx (UI Integration)
- frontend/src/features/settings/ApiAccess.tsx (New Management UI)
1. Infrastructure: GSI & Authorizer (template.yaml)
We add GSI1 to DynamoDB for looking up users by PATHASH#... and attach the Lambda Authorizer to AppSync with 30s caching.
# DynamoDB Table
GlobalSecondaryIndexes:
  - IndexName: GSI1
    KeySchema:
      - AttributeName: GSI1PK # PATHASH#<hash>
        KeyType: HASH
      - AttributeName: GSI1SK # METADATA
        KeyType: RANGE
    Projection:
      ProjectionType: INCLUDE
      NonKeyAttributes:
        - scopes
        - lastUsedAt
    # CRITICAL: Lock GSI to Free Tier
    ProvisionedThroughput:
      ReadCapacityUnits: 5
      WriteCapacityUnits: 5
# AppSync API
Auth:
  Type: API_KEY
  AdditionalAuthProviders:
    - Cognito:
        UserPoolId: !Ref UserPool
    - Lambda:
        FunctionArn: !GetAtt ApiAuthorizerFunction.Arn
        AuthorizerResultTtlInSeconds: 30
        IdentityValidationExpression: ^ck_live_[a-f0-9]{64}$
2. Security: Lambda Authorizer Logic (functions/api_authorizer/app.py)
The authorizer enforces the syntax guard, hashes the token, and performs a Read-Only Quota Check. It injects identity context (including the isPat flag) for downstream resolvers.
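A minimal sketch of that flow, assuming the token item exposes sub, username, and scopes attributes and that the GSI1 query is wrapped in a hypothetical lookup callable (neither is confirmed by this document). The quota check is left out for brevity. The return shape is the AppSync Lambda authorizer format (isAuthorized plus resolverContext).

```python
import hashlib
import re
from typing import Callable, Optional

# Same pattern as IdentityValidationExpression in template.yaml.
TOKEN_PATTERN = re.compile(r"^ck_live_[a-f0-9]{64}$")


def hash_token(raw_token: str) -> str:
    """Return the SHA-256 hex digest; only this value is ever stored."""
    return hashlib.sha256(raw_token.encode("utf-8")).hexdigest()


def authorize(raw_token: str, lookup: Callable[[str], Optional[dict]]) -> dict:
    """Decide whether a PAT may call the API.

    `lookup` is a hypothetical callable that queries GSI1 for
    GSI1PK = "PATHASH#<hash>" and returns the token item or None.
    """
    # Syntax guard: reject junk before spending any DynamoDB reads.
    if not TOKEN_PATTERN.fullmatch(raw_token or ""):
        return {"isAuthorized": False}

    item = lookup(f"PATHASH#{hash_token(raw_token)}")
    if item is None:
        return {"isAuthorized": False}

    # AppSync expects isAuthorized + resolverContext, not an
    # API Gateway policyDocument. Values are kept as strings here.
    return {
        "isAuthorized": True,
        "resolverContext": {
            "sub": item["sub"],  # assumed attribute name
            "username": item.get("username", item["sub"]),
            "isPat": "true",
            "scopes": ",".join(item.get("scopes", [])),
        },
    }


def lambda_handler(event, context, lookup=lambda key: None):
    # AppSync passes the Authorization header value as authorizationToken.
    # The default lookup denies everything; a real one would query GSI1.
    return authorize(event.get("authorizationToken", ""), lookup)
```

Passing the lookup in as a parameter keeps the guard and hashing logic testable without a DynamoDB table.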
3. Backend: Token Management (Split Files & Pipeline Resolvers)
To ensure compatibility with sam validate and follow the project's "one file per function" pattern, patResolvers.js was split into listPATs.js, getUsage.js, and revokePAT.js. All are implemented as Pipeline Resolvers using the standard APPSYNC_JS runtime.
4. Backend: Reliable Quota Tracking (appsync/trackUsage.js)
A shared pipeline function that runs on every request but uses Probabilistic Sampling to reduce database write costs by 95-98%.
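import { util, runtime } from '@aws-appsync/utils';

export function request(ctx) {
  // With Lambda auth, PAT flags and identity arrive in identity.resolverContext
  // (see the 2026-02-11 execution notes), not directly on ctx.identity.
  const resolverContext = (ctx.identity && ctx.identity.resolverContext) || {};

  // Only PAT traffic is metered. runtime.earlyReturn skips the data source call
  // (the earlier { payload: {} } no-op was replaced per the execution notes).
  if (resolverContext.isPat !== 'true') {
    runtime.earlyReturn({});
  }

  const isMutation = ctx.info.parentTypeName === 'Mutation';

  // Strategy: Sample 1/20 (5%) for Writes, 1/50 (2%) for Reads
  const threshold = isMutation ? 0.05 : 0.02;
  // Each sampled write stands in for 1/threshold requests
  const increment = isMutation ? 20 : 50;

  // The "Indie-Scale" Guard: Skip DB write 95-98% of the time
  if (Math.random() > threshold) {
    runtime.earlyReturn({});
  }

  const counterField = isMutation ? 'writes' : 'reads';
  const sub = resolverContext.sub;
  const pk = `USER#${sub}`;
  // One usage item per UTC day, e.g. USAGE#2026-02-10
  const sk = `USAGE#${util.time.nowISO8601().split('T')[0]}`;
  // TTL attribute: expire the item 3 days from now (epoch seconds)
  const expiry = Math.floor(util.time.nowEpochMilli() / 1000) + (3 * 86400);

  return {
    operation: 'UpdateItem',
    key: util.dynamodb.toMapValues({ PK: pk, SK: sk }),
    update: {
      // ADD creates the counter if it is missing and increments it atomically
      expression: `ADD ${counterField} :inc SET #ttl = :expiry`,
      expressionNames: { '#ttl': 'ttl' },
      expressionValues: util.dynamodb.toMapValues({
        ':inc': increment,
        ':expiry': expiry
      })
    }
  };
}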
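The sampling trade-off can be checked with a small Python model of the counter. This is a sketch, not the resolver itself: simulate_tracked_count and relative_std are hypothetical helper names, and the model assumes each request is sampled independently.

```python
import math
import random


def simulate_tracked_count(requests: int, sample_rate: float, seed: int = 0) -> int:
    """Replay `requests` calls and return the counter value the sampler records."""
    rng = random.Random(seed)
    increment = round(1 / sample_rate)  # 50 for reads, 20 for writes
    counter = 0
    for _ in range(requests):
        # Mirrors the resolver: skip when the random draw exceeds the threshold.
        if rng.random() <= sample_rate:
            counter += increment
    return counter


def relative_std(requests: int, sample_rate: float) -> float:
    """Standard deviation of the recorded count, as a fraction of the true count."""
    return math.sqrt((1 - sample_rate) / (requests * sample_rate))


if __name__ == "__main__":
    print(simulate_tracked_count(100_000, 0.02))
    print(round(relative_std(5000, 0.02), 3))  # reads at the daily limit
    print(round(relative_std(500, 0.05), 3))   # writes at the daily limit
```

The recorded count equals the true count on average, but it is noisy near the limits: roughly 10% standard deviation at 5,000 reads and roughly 19% at 500 writes. That is one reason a percentage meter suits this data better than an exact number.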
5. Security: Scope Enforcement (appsync/trackUsage.js)
The shared pipeline function now verifies that the token has the necessary permissions (read, write, or *) before allowing the operation to proceed.
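The check itself lives in AppSync JS. The Python model below only illustrates the logic, assuming Queries need the read scope and Mutations need the write scope (as the curl examples in section 8 state) and that scopes may arrive as CSV, a JSON string, or a list (as the execution notes describe). parse_scopes and is_allowed are hypothetical names.

```python
import json
from typing import Iterable, Optional, Set, Union


def parse_scopes(raw: Union[str, Iterable[str], None]) -> Set[str]:
    """Normalize scopes from CSV, JSON-string, or list form into a set."""
    if raw is None:
        return set()
    if isinstance(raw, str):
        text = raw.strip()
        if text.startswith("["):
            try:
                raw = json.loads(text)
            except json.JSONDecodeError:
                return set()  # fail closed on malformed input
        else:
            raw = text.split(",")
    if not isinstance(raw, (list, tuple, set)):
        return set()
    return {str(scope).strip() for scope in raw if str(scope).strip()}


def is_allowed(scopes: Set[str], parent_type_name: Optional[str]) -> bool:
    """Mutations need 'write', everything else needs 'read'; '*' grants both."""
    required = "write" if parent_type_name == "Mutation" else "read"
    return "*" in scopes or required in scopes
```

Malformed scope strings yield an empty set, so a corrupted token record denies access instead of granting it.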
6. Schema: Dual Authorization (graphql/schema.graphql)
Allowing both User Pools (Web) and Lambda (CLI) access.
7. Frontend: Usage Meter (frontend/src/features/settings/ApiAccess.tsx)
Visualizing the quota usage (Reads and Writes).
8. API Examples: Fetch and Add Meals (CLI)
Minimal curl examples using the AppSync GraphQL endpoint. Authorization can be a Cognito ID token or a PAT (ck_live_...) with matching scopes.
# Required env vars
export APPSYNC_URL="https://77tzkmeswfcf3mxham6cx3pvk4.appsync-api.ap-southeast-2.amazonaws.com/graphql"
export AUTH_TOKEN="ck_live_your_token_here"
# Query: Get meals in a UTC range (requires read scope)
curl -sS "$APPSYNC_URL" \
-H "Content-Type: application/json" \
-H "Authorization: $AUTH_TOKEN" \
-d '{
"query":"query GetMeals($from: String!, $to: String!) { getMeals(from: $from, to: $to) { date count meals { sk createdAt mealSummary calories protein carbs fat notes userDate } } }",
"variables":{
"from":"2026-02-10T00:00:00.000Z",
"to":"2026-02-11T00:00:00.000Z"
}
}'
# Mutation: Add meal (requires write scope)
curl -sS "$APPSYNC_URL" \
-H "Content-Type: application/json" \
-H "Authorization: $AUTH_TOKEN" \
-d '{
"query":"mutation AddMeal($mealSummary: String!, $calories: Float!, $protein: Float!, $carbs: Float!, $fat: Float!, $notes: String!, $userDate: String!, $createdAt: String) { addMeal(mealSummary: $mealSummary, calories: $calories, protein: $protein, carbs: $carbs, fat: $fat, notes: $notes, userDate: $userDate, createdAt: $createdAt) { message item { sk createdAt mealSummary calories protein carbs fat userDate } } }",
"variables":{
"mealSummary":"Chicken salad",
"calories":420,
"protein":38,
"carbs":14,
"fat":22,
"notes":"API test insert",
"userDate":"2026-02-10",
"createdAt":"2026-02-10T12:30:00.000Z"
}
}'
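For script access, the same query can be sent from Python with only the standard library. This is a sketch under assumptions: build_request and call_api are hypothetical helpers, the field selection is trimmed for brevity, and the 15-second timeout simply mirrors the frontend timeout mentioned in the execution notes.

```python
import json
import urllib.request

GET_MEALS = """
query GetMeals($from: String!, $to: String!) {
  getMeals(from: $from, to: $to) { date count meals { sk mealSummary calories } }
}
"""


def build_request(url: str, token: str, query: str, variables: dict) -> urllib.request.Request:
    """Build the GraphQL POST; the PAT goes in the Authorization header as-is."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": token},
    )


def call_api(url: str, token: str, query: str, variables: dict, timeout: float = 15.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_request(url, token, query, variables)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A caller would read APPSYNC_URL and AUTH_TOKEN from the environment and pass them to call_api together with the from/to range used in the curl example.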
Critique & Gaps
- Latency: Cold starts on the Python Authorizer could add 500ms+ to initial API requests. (Mitigation: AppSync Authorizer Caching).
- Identity Parity: Cognito provides claims while Lambda Authorizer provides context. Resolvers must be updated to handle both.
- Unauthenticated Load: Syntax guard helps, but doesn't eliminate AppSync/Lambda invocation costs. (Mitigation: Budget Alarm).
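The identity parity gap can be pictured with a small Python model of the fallback chain. The real resolvers are AppSync JS; the lookup order here follows the fields listed in the execution notes and is otherwise an assumption, and resolve_user_id is a hypothetical name.

```python
from typing import Optional


def resolve_user_id(identity: Optional[dict]) -> Optional[str]:
    """Return the user id from either a Cognito or a Lambda-auth identity."""
    identity = identity or {}
    claims = identity.get("claims") or {}
    resolver_context = identity.get("resolverContext") or {}
    candidates = (
        identity.get("sub"),              # Cognito
        claims.get("sub"),                # Cognito claims
        identity.get("username"),         # Cognito fallback
        resolver_context.get("sub"),      # Lambda authorizer (PAT)
        resolver_context.get("username"),
    )
    # First non-empty candidate wins; None means unauthenticated.
    return next((value for value in candidates if value), None)
```

Both auth paths should yield the same id for the same user, which is what the "Identity parity" test item checks.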
1. Last Used Visibility (Conditional Updates)
To solve the "inactive token" problem without doubling write costs, the Authorizer now checks lastUsedAt and only updates it if it differs from the current date (once per day per active token).
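A sketch of the once-per-day check, assuming lastUsedAt is stored as a UTC date string in YYYY-MM-DD form (the storage format is not stated in this document). needs_last_used_update is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Optional


def needs_last_used_update(last_used_at: Optional[str], now: datetime) -> bool:
    """True only when the stored date differs from today's UTC date."""
    today = now.astimezone(timezone.utc).date().isoformat()
    return last_used_at != today


if __name__ == "__main__":
    now = datetime(2026, 2, 10, 12, 0, tzinfo=timezone.utc)
    print(needs_last_used_update("2026-02-09", now))  # True: first use today
    print(needs_last_used_update("2026-02-10", now))  # False: already recorded
```

Because lastUsedAt is projected into GSI1, the authorizer can make this decision from the lookup result without an extra read.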
2. UX Protection: 'Flash' Token Display
The UI enforces a "Copy it now" workflow with a persistent warning banner, ensuring users don't lose their raw token (which is never stored).
3. Timing Attack Mitigation
Although DynamoDB handles the lookup, we added secrets.compare_digest() for a constant-time comparison of the hash suffix as a defense-in-depth measure.
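A minimal sketch of that comparison, assuming the stored key has the form PATHASH#<hex digest>. token_matches is a hypothetical helper name.

```python
import hashlib
import secrets

PREFIX = "PATHASH#"


def token_matches(raw_token: str, stored_gsi1pk: str) -> bool:
    """Compare the presented token's hash to the stored hash suffix in constant time."""
    if not stored_gsi1pk.startswith(PREFIX):
        return False
    stored_hash = stored_gsi1pk[len(PREFIX):]
    candidate = hashlib.sha256(raw_token.encode("utf-8")).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return secrets.compare_digest(candidate, stored_hash)
```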
4. Empty State Handling (Zero-Usage Default)
The resolver logic for getUsage was hardened to handle null DynamoDB results (e.g., first-time users). It now guarantees a return object with zeroed values ({ reads: 0, writes: 0 }) instead of propagating nulls to the UI.
Execution Plan
- Step 1: Infrastructure & Authorizer
- Add GSI1 to template.yaml.
- Implement Python Lambda Authorizer with Syntax Guard and ck_live_ hashing.
- Enable Authorizer Caching (30s) in AppSync.
- Track AWS Budget Alarm ($5.00 limit) as follow-up issue 2512_genai_food_tracking-v3u.
- Update schema.graphql to enable @aws_lambda access for core operations and @aws_cognito_user_pools for token management.
- Step 2: Backend Token Management
- Implement createPAT, revokePAT, and listPATs mutations.
- Implement Quota tracking logic (atomic increments in DDB for Reads and Writes).
- Step 3: Resolver Hardening
- Update critical resolvers to handle both Cognito and Lambda identities.
- Enforce "Heavy Operation" limits (e.g., Imports).
- Step 4: Frontend & Documentation
- Add API Access management to Settings.
- Document the API and Quotas.
Execution Notes
Infrastructure & Structural Hardening
- SAM Template Validation: Fixed AdditionalAuthProviders structure and moved Type: AWS_LAMBDA to the correct level to resolve sam validate errors.
- Auth Provider Wiring Correction (2026-02-10): Updated both template-test.yaml and template.yaml to use SAM GraphQL auth shape (Auth.Additional + LambdaAuthorizer.AuthorizerUri). This ensures CloudFormation actually provisions AdditionalAuthenticationProviders for Lambda auth.
- deploy:test Build Unblock (2026-02-10): sam build -t template-test.yaml was stalling in NodejsNpmEsbuildBuilder:NpmInstall for Node Lambdas. To make deployments reliable, test-stack Node functions were switched to BuildMethod: makefile in template-test.yaml, with function-local Makefile builds for ImportGuestHistoryFunction and ManagePatsFunction using esbuild --external:@aws-sdk/*.
- Result: task deploy:test completed successfully end-to-end (build + CloudFormation stack update) for minimal-calorie-tracker-test.
- Frontend Usage Contract Alignment (2026-02-10): Local dev surfaced FieldUndefined errors because the frontend getUsage query still requested legacy fields (imports, genAi) that are no longer in UsageStats.
- Fix: Updated frontend GraphQL query/types/service mapping to only use date, reads, writes, and aligned PAT creation defaults to ['read', 'write'] to match resolver scope enforcement.
- Infinite Spinner Debug & Stabilization (2026-02-10): After auth, dashboard could spin indefinitely with no visible error.
- Fix: Added explicit settings error UI + retry in Dashboard, exposed/logged settings query errors in useSettings, and added 15s GraphQL timeouts in API meal/settings service calls to prevent silent hangs.
- Backend hardening: Updated appsync/trackUsage.js no-op paths to runtime.earlyReturn({}) (instead of returning payload objects) and redeployed test stack.
- Result: The local dev spinner issue is resolved for now and failures now surface actionable error text.
- Cloudflare Pages Build Isolation (2026-02-10): root_dir=docs failed because Pages disallows output directories outside the root (../site). Implemented a dedicated wrapper config at cf_pages/mkdocs.yml (docs_dir: ../docs, site_dir: site) and updated Pages build settings to root_dir=cf_pages, destination_dir=site, build command mkdocs build -f mkdocs.yml.
- Result: Pages no longer needs repo-root context, so backend /functions is not scanned by the docs build path and MkDocs output stays inside the configured root.
- Docs Build Warning Cleanup (2026-02-10): Fixed broken internal docs links and restored legacy target docs for historical references (003_component_granularity, 001_guest_mode_architecture) so MkDocs builds without link warnings.
- Pipeline Resolver Migration: Converted all PAT-related resolvers (listPATs, getUsage, revokePAT) from unit resolvers to Pipeline Resolvers. This ensures compatibility with SAM's validation engine and follows the project's "one file per function" pattern.
- GSI Projection Optimization: Configured GSI1 with ProjectionType: INCLUDE for scopes and lastUsedAt.
- Reasoning: While ALL is future-proof, INCLUDE is significantly more cost-effective as it prevents DynamoDB from duplicating unnecessary attributes (like name or createdAt) into the index, while still allowing the Authorizer to function without base-table lookups.
- Known Blocker (GSI1 Required for PAT Auth): PAT lookups depend on GSI1. On environments where the backing DynamoDB table has no GSI1, PAT requests fail authorization because token hash lookup cannot run.
- Observed failure: DynamoDB returns ValidationException: The table does not have the specified index: GSI1.
- Promotion Plan: Promote the updated template.yaml to production to ensure GSI1 and Lambda additional auth are active on the managed table/API.
- Project state (resolved): Production deploy completed and PAT-authenticated getMeals confirmed working via AppSync.
- IAM Policy Expansion: Upgraded Authorizer permissions to allow dynamodb:UpdateItem for daily lastUsedAt updates and atomic quota increments.
- PAT Auth Unblock in Test Stack (2026-02-11): PAT requests were still failing with UnauthorizedException even though the token hash existed in GSI1 and direct authorizer invocation returned allow decisions.
- Root cause: functions/api_authorizer/app.py returned API Gateway-style authorizer output (policyDocument), but AppSync Lambda Auth expects isAuthorized and resolverContext.
- Fix 1: Updated authorizer response contract to AppSync-native format (isAuthorized, resolverContext.sub, resolverContext.username, resolverContext.isPat, resolverContext.scopes).
- Fix 2: Updated resolver identity extraction to support both Cognito and Lambda-auth contexts (identity.sub, identity.claims.sub, identity.username, identity.resolverContext.sub, identity.resolverContext.username) across: appsync/addMeal.js, appsync/deleteMeal_step1.js, appsync/deleteMeal_step2.js, appsync/getMeals.js, appsync/getUserTargets.js, appsync/updateUserTargets.js, appsync/listPATs.js, appsync/getUsage.js, appsync/revokePAT.js.
- Fix 3: Hardened appsync/trackUsage.js to read PAT flags/scopes from resolverContext, then fixed an AppSync JS runtime bug (replace('[') regex interpretation) by switching scope sanitization to split().join().
- Verification: Ran ./scripts/evaluate_appsync.sh (all resolvers pass), task deploy:test (successful stack update), and live PAT query from /home/ben/.chatkcal_dev_env: getMeals returned count: 4 with errors: null.
- Working API call (validated):
set -a; source /home/ben/.chatkcal_dev_env; set +a; curl -sS -X POST "$API_ENDPOINT" -H "Content-Type: application/json" -H "Authorization: $API_KEY" --data '{"query":"query GetMeals($from: String!, $to: String!) { getMeals(from: $from, to: $to) { date count meals { sk createdAt mealSummary calories protein carbs fat userDate } } }","variables":{"from":"2026-02-10T00:00:00.000Z","to":"2026-02-10T23:59:59.999Z"}}' | jq '{data, errors}'
- Create meal API call (example, not executed in this session):
set -a; source /home/ben/.chatkcal_dev_env; set +a; curl -sS -X POST "$API_ENDPOINT" -H "Content-Type: application/json" -H "Authorization: $API_KEY" --data '{"query":"mutation AddMeal($mealSummary: String!, $calories: Float!, $protein: Float!, $carbs: Float!, $fat: Float!, $notes: String!, $userDate: String!, $createdAt: String) { addMeal(mealSummary: $mealSummary, calories: $calories, protein: $protein, carbs: $carbs, fat: $fat, notes: $notes, userDate: $userDate, createdAt: $createdAt) { message item { sk createdAt mealSummary calories protein carbs fat userDate } } }","variables":{"mealSummary":"PAT API test meal","calories":420,"protein":35,"carbs":28,"fat":18,"notes":"Created via PAT example command","userDate":"2026-02-10","createdAt":"2026-02-10T12:00:00.000Z"}}' | jq '{data, errors}'
- Observed response (validated getMeals call): data.getMeals.count = 4 and errors = null.
- AppSync Resolver Refactor (2026-02-11): Centralized duplicated user identity resolution logic into a shared Pipeline Function.
- Change: Created appsync/resolveUserContext.js to extract userId, userPK, isPat, and scopes from any auth context (Cognito or PAT) and store them in ctx.stash.
- Cleanup: Removed ~150 lines of redundant getUserId helper functions across 10 resolver files.
- Security: Hardened scope parsing to handle CSV, JSON-string, and Array formats safely without using regex literals (unsupported in AppSync JS).
- Infrastructure: Updated template.yaml and template-test.yaml to ensure ResolveUserContextFunc is the first step in every Query/Mutation pipeline.
- Testing Uplift (2026-02-11): Added 26 unit tests for the API Access feature.
- Coverage: Verified ApiAccessService (Amplify mocks), useApiAccess hook (optimistic state), and ApiAccess component (UI/Clipboard).
- Verification: All 26 frontend tests and all 30 backend resolver tests passed (bunx vitest run).
- Environment Fix: Fixed a rehash: command not found error in the persistent environment script by switching fnm env to --shell bash mode.
- DynamoDB Provisioned Capacity: Reverted to PROVISIONED billing mode with 5 WCU and 5 RCU for the table and GSI. This ensures the project stays within the AWS Free Tier (which covers 25 provisioned units but not on-demand requests).
- Language Choice (Python): The Authorizer was implemented in Python 3.13 rather than Node.js.
- Rationale: Since Authorizers are in the critical path of every API request, Cold Start Latency and Execution Overhead are the primary drivers.
- Comparison:
- Cold Starts: Python 3.13 typically provides faster initialization than Node.js (V8) for lightweight functions, reducing the impact of infrequent requests.
- Package Complexity: Node.js implementations in this project rely on esbuild bundling and the @aws-sdk, which increases deployment package size and complexity. Python uses the boto3 bundled with the Lambda runtime and a single script file, keeping the "overhead per request" as low as possible.
- Security API: Python's hashlib and secrets provide concise, readable one-liners for SHA-256 and constant-time comparisons (compare_digest). The Node.js crypto equivalent is more verbose and requires careful handling of Buffer types to ensure timing-safe behavior.
- Consistency: Python is already established in the project for identity-related logic (PreSignUp), reducing the mental overhead for security audits.
Quota Tracking Architecture Refinement
- Move to Resolver Pipeline: A critical refinement was made to move Quota Tracking from the Lambda Authorizer to the AppSync Resolver Pipeline (via a shared TrackUsageFunc).
- Reasoning:
- Accuracy: The Authorizer only runs once per TTL (30s), allowing users to bypass quotas by bursting traffic. Resolvers run on every request, so no request can skip the tracker; with sampling enabled, the counters are accurate in expectation rather than exact.
- Precision: Resolvers know exactly if an operation is a Query or Mutation via ctx.info.parentTypeName, eliminating the need for brittle string-matching heuristics in the Authorizer.
- Latency: Using native AppSync JS for tracking avoids Python cold starts for quota increments.
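The accuracy argument can be illustrated with a toy model. It assumes the authorizer only runs on cache misses and that a cached decision lasts for the full TTL. Both function names are hypothetical.

```python
from typing import Iterable


def counted_by_authorizer(request_times: Iterable[float], ttl_seconds: float = 30.0) -> int:
    """Requests seen if counting happens only when the authorizer actually runs."""
    count = 0
    cached_until = float("-inf")
    for t in sorted(request_times):
        if t >= cached_until:  # cache miss: authorizer is invoked
            count += 1
            cached_until = t + ttl_seconds
    return count


def counted_by_resolver(request_times: Iterable[float]) -> int:
    """Requests seen if counting happens in the resolver pipeline."""
    return len(list(request_times))


if __name__ == "__main__":
    burst = [i * 0.01 for i in range(1000)]  # 1,000 requests in 10 seconds
    print(counted_by_authorizer(burst))  # 1
    print(counted_by_resolver(burst))    # 1000
```

In this model a burst inside one cache window registers as a single request at the authorizer, while the resolver pipeline sees all of them.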
Indie-Scale API Architecture Rationale
This combination protects the project from unexpected costs while maintaining a professional experience.
1. The Chosen Architecture
- Authentication: Python Lambda Authorizer with 30s Caching and a Syntax Guard (ck_live_).
- Storage: DynamoDB configured with Provisioned Capacity (5 WCU / 5 RCU).
- Quota Tracking:
- Writes (Mutations): Tracked via 1/20 (5%) Probabilistic Sampling (write once, increment by 20).
- Reads (Queries): Tracked via 1/50 (2%) Probabilistic Sampling (write once, increment by 50).
- UI: Usage displayed as percentage-based meters, masking the coarse-grained sampling updates.
2. Why This is the Winning Design
- Maximizing the AWS Free Tier: AWS offers 25 WCU/RCU for free indefinitely, but only for provisioned tables. On-demand (Pay-Per-Request) mode does not offer free request units. By using Provisioned mode, we ensure the project remains truly free for low-to-medium volumes.
- The Physical Circuit Breaker: By explicitly setting DynamoDB to 5 WCU, we create a hard physical limit. If a script goes rogue, AWS throttles the requests rather than scaling costs. It is the ultimate safety net against "bill shock."
- Deep Cost Reduction (Sampling):
- Reads: 98% reduction (1/50 sampling).
- Writes: 95% reduction (1/20 sampling).
- This allows us to support ~5,000 daily reads and ~500 mutations with virtually zero impact on the 25 WCU Free Tier, as we only perform ~125 database writes per day for tracking.
- The "Syntax Guard" Shield: The Lambda Authorizer rejects malformed tokens before they hit the database, saving Read Capacity Units for real users.
- Developer-Centric UX: Displaying usage as percentages turns a technical trade-off (sampling steps) into a clean, professional UI.
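The ~125 writes per day figure follows directly from the quota limits and sampling rates above. As a worked check (expected_tracking_writes is a hypothetical helper, and the figure applies to one user at the daily limits):

```python
def expected_tracking_writes(daily_requests: int, sample_every: int) -> float:
    """Average number of DynamoDB writes when 1 in `sample_every` requests is recorded."""
    return daily_requests / sample_every


read_writes = expected_tracking_writes(5000, 50)   # 100.0
write_writes = expected_tracking_writes(500, 20)   # 25.0
total_writes = read_writes + write_writes          # 125.0

# Spread over a day, this is far below one write per second on average.
average_writes_per_second = total_writes / 86_400

if __name__ == "__main__":
    print(total_writes, average_writes_per_second)
```

Without sampling the same traffic would need 5,500 tracking writes per day, so the sampled design uses about 2.3% of that.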
3. Alternatives Dismissed
| Alternative | Why We Killed It |
|---|---|
| Redis / ElastiCache | Too Expensive. Requires ~$30/month instance and complex VPC networking. Overkill. |
| SQS + Worker Lambda | Over-Engineered. Adds infra maintenance and potential "stuck queue" failure points. |
| Real-Time DynamoDB | Too Costly. Writing to DB on every Read quintuples the cost of a Query. |
| AWS WAF | Not Granular Enough. Cannot handle per-user quota logic without expensive custom rules. |
Deviation from Plan
The AWS Budget Alarm was split into a follow-up hardening task to keep feature delivery unblocked: 2512_genai_food_tracking-v3u (Security: Add AWS Budget alarm for PAT/API quota protection).
Core PAT/quota capability is complete and validated.
User Approval & Key Learnings
Key Learnings
- SAM Validation & Pipeline Resolvers: sam validate is strict about DataSource usage in unit resolvers. Converting to Pipeline Resolvers is the most reliable way to handle multi-step logic and satisfy the validation engine.
- Caching vs. Quotas: Relying on a Lambda Authorizer for quota increments is unsafe if AppSync Caching is enabled (30s TTL). Always increment in the Resolver to ensure 100% accuracy, while using the Authorizer only for the "Check" and "Identity" steps.
- GSI Efficiency: Avoid ProjectionType: ALL for large tables. Using INCLUDE for specific metadata (like scopes and lastUsedAt) balances latency with storage/write costs.
- Language Choice: Python 3.13 remains the "gold standard" for lightweight Lambda Authorizers due to superior cold-start performance and a lean security standard library (secrets, hashlib).
- MkDocs Tab Syntax: Admonitions and code blocks inside MkDocs Material tabs are extremely sensitive to indentation. A single extra level of indent can break the layout.
(User to confirm approval and add notes/learnings)
Context Memory (AI-Only)
Summary for Future Context
Implementing PAT-based API access via a Python Lambda Authorizer with built-in daily quotas (500 writes / 5000 reads). Includes DynamoDB GSI1 for token lookups, 30s authorizer caching, and a "Syntax Guard" + Budget Alarm for bill protection.