
NFT Platform — Implementation Plan

Companion to: Architecture v5 · 4 phases (+ Phase 1.5) · 49 tasks


How to Read

Effort — S = 1–3d · M = 3–7d · L = 1–3w · XL = 3w+
Risk — High / Medium / Low
Depends — Task IDs that must be complete first

⚠ Do not start Phase 2 until all Phase 1 exit criteria pass. A broken dedup or idempotency mechanism produces phantom balances that are hard to detect and expensive to clean up.


Phase Summary

1 — Working ingestion for 2 networks (Polygon + TON): dedup, idempotency, ownership, projections, API — 8–12w
1.5 — Standalone admin panel service for internal operations — 3–5d
2 — Multi-network, metadata, scoring, ClickHouse, semantic search — 8–12w
3 — Kafka, full reorg suite, personalization, canary SLA — 6–10w
4 — Horizontal scaling, only where measured — Open

Phase 1 — Foundation

Infrastructure · Schema · Polygon + TON adapters · Normalizer · State updater · Projections · API · Reorg handler

Infrastructure & Schema

P1-01 — Postgres schema baseline migration M Risk: Medium 7 core schemas: ref, ingest, ledger, catalog, market, projection, system. All constraints and indexes from arch §9. Additive migrations only. user_content, social, scoring introduced in Phase 2. Depends: —

P1-02 — ref tables seed data S Risk: Low Populate chains, networks, token_standards, marketplaces for target network. Depends: P1-01

P1-03 — Redis + MinIO + Docker Compose S Risk: Low Local dev + prod compose. Redis Streams is the Phase 1 event bus; Redis is also used for cache/locks. Depends: —

P1-04 — Event envelope schema & validation library S Risk: Medium Canonical RawEvent as Pydantic model + JSON Schema. Frozen interface between adapters and normalizer. Include schema_version. Must be stable from day one. Depends: —
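The envelope contract can be sketched with a stdlib dataclass standing in for the Pydantic model. Field names, the version string, and the validation rules here are illustrative, not the frozen interface itself:

```python
from dataclasses import dataclass

SCHEMA_VERSION = "1.0"  # hypothetical version string


@dataclass(frozen=True)
class RawEvent:
    """Canonical envelope emitted by chain adapters (sketch only;
    the real model is the shared Pydantic + JSON Schema library)."""
    network_id: str
    block_number: int
    source_event_id: str  # stable natural key from the chain
    sub_index: int        # position within a batch event
    payload: dict
    schema_version: str = SCHEMA_VERSION

    def __post_init__(self):
        # Minimal structural checks; the real library validates
        # against the frozen JSON Schema.
        if self.sub_index < 0:
            raise ValueError("sub_index must be non-negative")
        if not self.source_event_id:
            raise ValueError("source_event_id is required")


ev = RawEvent("polygon", 1234567, "0xabc:12", 0, {"kind": "transfer"})
```

Freezing the dataclass mirrors the intent of the task: the envelope is an immutable contract between adapters and the normalizer, and schema_version travels with every event.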

Chain Adapter

P1-05 — ChainAdapter base class & contract tests M Risk: Medium Abstract interface: fetch_blocks, parse_raw_events. Contract tests (arch §16.5): schema validation, sub_index completeness, delta conservation, source_event_id stability, empty deltas for non-transfers. Every concrete adapter must pass all 5. Depends: P1-04
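A minimal sketch of the abstract interface plus one of the five contract checks (delta conservation). Method shapes and the dict-based event form are assumptions for illustration:

```python
from abc import ABC, abstractmethod


class ChainAdapter(ABC):
    """Abstract adapter contract (sketch; real interface per arch §16.5)."""

    @abstractmethod
    def fetch_blocks(self, start: int, end: int) -> list[dict]:
        """Return raw block data for the inclusive range [start, end]."""

    @abstractmethod
    def parse_raw_events(self, block: dict) -> list[dict]:
        """Return RawEvent-shaped dicts with stable source_event_id."""


def assert_delta_conservation(event: dict) -> None:
    # Contract-test sketch: a transfer's ownership deltas must net to
    # zero (sender −n, receiver +n). Mints/burns use the zero address
    # side, so the invariant still holds.
    assert sum(d["qty"] for d in event["deltas"]) == 0


transfer = {"deltas": [{"account": "a", "qty": -1}, {"account": "b", "qty": 1}]}
assert_delta_conservation(transfer)
```

Each concrete adapter would subclass ChainAdapter and be run through the same five checks before it is allowed to feed the normalizer.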

P1-06 — Polygon + TON adapters XL Risk: High Implement both adapters in Phase 1: Polygon (ERC-721/1155 + marketplace events) and TON (TEP-62 + GetGems). Cover: Transfer, TransferBatch (or TON equivalent batch semantics), Mint (from=zero), Burn (to=zero), Listing, Sale, Cancel. Record real mainnet tx fixtures. Both adapters must pass all contract tests. Address canonicalization per arch §3 — no universal lower(). Depends: P1-05
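The "no universal lower()" rule can be made concrete with a per-family dispatch. Family names and the sample addresses below are illustrative:

```python
def canonicalize_address(family: str, addr: str) -> str:
    """Per-family canonicalization (sketch; family names are illustrative).

    EVM hex addresses are case-insensitive, so lowercasing is a safe
    canonical form. Base58 (Solana) and TON friendly addresses encode
    data in their letter case, so lower() would produce a *different*
    address — they must pass through unchanged.
    """
    if family == "evm":
        return addr.lower()
    return addr  # case-sensitive families: keep as-is


assert canonicalize_address("evm", "0xAbCd") == "0xabcd"
assert canonicalize_address("solana", "4Nd1mXyZ") == "4Nd1mXyZ"
```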

P1-07 — Block fetcher & ingest cursor M Risk: Medium fetch_blocks with retry, rate-limit, circuit breaker. Write cursor to system.sync_cursors after each successful batch. Never advance cursor past a failed block. Depends: P1-06

Normalizer

P1-08 — Normalizer worker L Risk: High Consume chain.raw_events. UPSERT ledger.normalized_event_keys on (source_event_id, sub_index) — handles reorg re-inclusion, not just dedup. Write normalized_events + normalized_event_deltas. Create catalog stubs via uuidv5. Publish to ledger.normalized.

Critical: same event twice → second write updates chain metadata fields, does not create a second row, does not change normalized_event_id.

Depends: P1-04, P1-01
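The critical dedup property above can be sketched with an in-memory dict standing in for normalized_event_keys; the uuid5 namespace string is hypothetical:

```python
import uuid

# Hypothetical namespace; the real one is fixed in the normalizer config.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "ledger.normalized_events")


def upsert_normalized(store: dict, source_event_id: str,
                      sub_index: int, meta: dict) -> uuid.UUID:
    """Sketch of the dedup UPSERT keyed on (source_event_id, sub_index).

    A second delivery refreshes mutable chain metadata (e.g. block hash
    after a reorg re-inclusion) but never creates a second row and never
    changes the deterministic normalized_event_id."""
    key = (source_event_id, sub_index)
    event_id = uuid.uuid5(NAMESPACE, f"{source_event_id}:{sub_index}")
    if key in store:
        store[key]["meta"].update(meta)  # metadata refresh only
    else:
        store[key] = {"id": event_id, "meta": dict(meta)}
    return store[key]["id"]


db = {}
first = upsert_normalized(db, "0xabc:7", 0, {"block_hash": "h1"})
second = upsert_normalized(db, "0xabc:7", 0, {"block_hash": "h2"})
assert first == second and len(db) == 1
```

Because the ID is uuid5 over the natural key, replays and reorg re-inclusions converge on the same normalized_event_id with no coordination.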

State Updater

P1-09 — applied_events idempotency pattern M Risk: High Reusable library. INSERT INTO applied_events ON CONFLICT DO NOTHING before every state mutation. Write all 3 idempotency tests from arch §16.3 before marking done: single delivery, concurrent delivery, partial failure recovery. Depends: P1-01
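The gate pattern, sketched against in-memory SQLite (table shapes are simplified stand-ins for the Postgres schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
  CREATE TABLE applied_events (consumer TEXT, event_id TEXT,
                               PRIMARY KEY (consumer, event_id));
  CREATE TABLE balances (account TEXT PRIMARY KEY, qty INTEGER NOT NULL);
""")


def apply_once(consumer: str, event_id: str, account: str, delta: int) -> bool:
    """Sketch of the applied_events gate: mutate state only when the
    (consumer, event_id) claim row is inserted for the first time.
    Claim and mutation commit in one transaction, so a crash between
    them rolls back both (partial-failure recovery)."""
    with conn:
        cur = conn.execute(
            "INSERT INTO applied_events VALUES (?, ?) ON CONFLICT DO NOTHING",
            (consumer, event_id))
        if cur.rowcount == 0:  # already applied — replay is a no-op
            return False
        conn.execute(
            "INSERT INTO balances VALUES (?, ?) "
            "ON CONFLICT(account) DO UPDATE SET qty = qty + excluded.qty",
            (account, delta))
        return True


assert apply_once("state_updater", "ev-1", "alice", 1) is True
assert apply_once("state_updater", "ev-1", "alice", 1) is False  # replay
qty = conn.execute("SELECT qty FROM balances WHERE account='alice'").fetchone()[0]
assert qty == 1
```

The same single-transaction shape covers all three required tests: single delivery applies once, concurrent delivery loses the race on the PK, and a partial failure rolls back the claim so the retry succeeds.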

P1-10 — State updater — ownership M Risk: High Consume ledger.normalized. Fetch all normalized_event_deltas per event. UPSERT ownership_current qty += delta. DELETE where qty <= 0. Update finality_status on finalization. Publish ledger.ownership_changed. Depends: P1-08, P1-09
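The ownership mutation reduces to one invariant, sketched here with a dict standing in for ownership_current (the SQL UPSERT/DELETE is elided):

```python
def apply_delta(ownership: dict, key: tuple, delta: int) -> None:
    """Sketch: UPSERT qty += delta, then drop rows at qty <= 0
    (mirrors DELETE WHERE qty <= 0 on ownership_current)."""
    ownership[key] = ownership.get(key, 0) + delta
    if ownership[key] <= 0:
        del ownership[key]


own = {}
apply_delta(own, ("asset1", "alice"), 1)   # mint to alice
apply_delta(own, ("asset1", "alice"), -1)  # alice transfers out
apply_delta(own, ("asset1", "bob"), 1)     # bob receives
assert own == {("asset1", "bob"): 1}
```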

P1-11 — State updater — listings & sales M Risk: Medium Handle listing, sale, cancel event_kinds. Update market.listings_current. Insert sales_history. Publish market.listing_changed and market.sale_recorded. Depends: P1-10

Projection Pipeline

P1-12 — Projection pipeline — ownership_view M Risk: Medium Upsert projection.ownership_view with finality_status (pending/confirmed/finalized). Use applied_events. Template for all subsequent projections — write it correctly once. Depends: P1-10

P1-13 — Projection pipeline — asset_cards & collection_stats M Risk: Medium projection.asset_cards and collection_stats (floor, volume, owner count). Handle is_stub=true gracefully — show stub data, not errors. Depends: P1-12

P1-14 — Projection pipeline — portfolio_assets & listing_cards M Risk: Low projection.portfolio_assets and listing_cards. Remaining projections for minimal functional API. Depends: P1-13

Core API

P1-15 — Core API skeleton & auth module M Risk: Low FastAPI modular structure. Auth: API keys only (JWT deferred to Phase 2). Enforce from day one: on-chain modules read only from projection.*. user_content and social read their own tables. Depends: P1-01

P1-16 — API endpoints: ownership, catalog, marketplace M Risk: Low GET /assets/{id}, /assets/{id}/owners, /collections/{id}, /collections/{id}/listings. All responses include finality_status. Stubs return is_stub: true. Non-existent collections return 404 with X-Data-Status: pending. Depends: P1-14, P1-15

Reliability & Observability

P1-17 — Outbox publisher M Risk: Medium Poll system.outbox_events WHERE published_at IS NULL ORDER BY created_at. Publish to Redis Streams. Mark published_at on success. After 5 failures → system.dlq. Depends: P1-01
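One poll cycle can be sketched as follows, with SQLite standing in for Postgres and a stub for the Redis Streams XADD call; the DLQ move itself is elided:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE outbox_events (
    id INTEGER PRIMARY KEY, payload TEXT,
    created_at REAL, published_at REAL, failures INTEGER DEFAULT 0)""")


def publish(payload: str) -> None:
    """Stand-in for the Redis Streams XADD call."""


def drain_outbox(max_failures: int = 5) -> list[int]:
    """Sketch of one poll cycle: oldest-first, mark published_at on
    success, count failures otherwise (rows reaching max_failures
    would be moved to system.dlq — elided here)."""
    published = []
    rows = conn.execute(
        "SELECT id, payload FROM outbox_events "
        "WHERE published_at IS NULL ORDER BY created_at").fetchall()
    for row_id, payload in rows:
        try:
            publish(payload)
            conn.execute("UPDATE outbox_events SET published_at=? WHERE id=?",
                         (time.time(), row_id))
            published.append(row_id)
        except Exception:
            conn.execute(
                "UPDATE outbox_events SET failures = failures + 1 WHERE id=?",
                (row_id,))
    conn.commit()
    return published


conn.execute("INSERT INTO outbox_events (payload, created_at) VALUES ('e1', 1.0)")
assert drain_outbox() == [1]
assert drain_outbox() == []  # already marked published
```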

P1-18 — Metrics & structured logging M Risk: Medium All metrics from arch §15 with alert thresholds. Structured log contract in all workers. At least 4 dashboards. Alert rules for dlq_depth and outbox_unpublished_count fire at any non-zero value. Depends: P1-07, P1-10

P1-19 — Reorg handler L Risk: High Detect reorg in adapter. Emit ledger.reorg_detected. Mark normalized_event_keys.is_reverted=true scoped to affected asset_ids. Recompute ownership_current for affected asset_ids only. Write all 5 reorg scenarios from arch §16.2. Depends: P1-10

Phase 1 Exit Criteria

  • [ ] Contract tests pass for both adapters — all 5 types from arch §16.5 (10 tests total)
  • [ ] Idempotency tests pass: single delivery, concurrent delivery, partial failure recovery
  • [ ] Reorg scenarios 1–5 from arch §16.2 pass against real Postgres (no mocks)
  • [ ] End-to-end: chain event → API response with correct finality_status within SLA
  • [ ] ownership_view reconciliation: 0 mismatches for sample of 100 assets
  • [ ] Observability: pipeline lag dashboard visible, dlq_depth alert tested and firing
  • [ ] Code review: no direct SQL joins between on-chain domain modules in the API

Phase 2 — Multi-Network & Intelligence

Additional adapters · Metadata pipeline · Scoring · ClickHouse · Semantic search

P2-01 — Additional chain adapters (third+) L per adapter Risk: High Each must pass all contract tests before connecting to normalizer. Depends: P1 exit ✓

P2-02 — Metadata pipeline M Risk: Medium Consume asset_created. Fetch URI, compute content_hash, create version only on hash change. Handle: IPFS timeouts (30s / 3 retries), malformed JSON (log+skip+retry), oversized >5MB (store ref only). Depends: P1 exit ✓
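The version-on-hash-change rule is small enough to sketch directly; the list standing in for the versions table is an assumption:

```python
import hashlib


def content_hash(raw: bytes) -> str:
    """SHA-256 over the fetched metadata bytes (hash choice illustrative)."""
    return hashlib.sha256(raw).hexdigest()


def maybe_new_version(versions: list[str], raw: bytes) -> bool:
    """Sketch: append a new metadata version only when the content
    hash differs from the latest stored version."""
    h = content_hash(raw)
    if versions and versions[-1] == h:
        return False  # unchanged content — no new version row
    versions.append(h)
    return True


vs = []
assert maybe_new_version(vs, b'{"name":"A"}') is True
assert maybe_new_version(vs, b'{"name":"A"}') is False
assert maybe_new_version(vs, b'{"name":"B"}') is True
```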

P2-03 — Trait extraction & rarity scoring M Risk: Low Parse traits. Compute rarity_rank and rarity_pct per trait value within collection. Update projection.asset_cards. Depends: P2-02
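The per-trait-value share underlying rarity_pct can be sketched as a frequency count within the collection (trait names and values are illustrative):

```python
from collections import Counter


def rarity_pct(assets: list[dict], trait: str) -> dict:
    """Sketch: fraction of assets in the collection carrying each
    value of `trait`; rarer values get smaller percentages."""
    counts = Counter(a[trait] for a in assets)
    total = len(assets)
    return {value: count / total for value, count in counts.items()}


col = [{"bg": "gold"}, {"bg": "blue"}, {"bg": "blue"}, {"bg": "blue"}]
pct = rarity_pct(col, "bg")
assert pct["gold"] == 0.25  # rarest value in this toy collection
```

rarity_rank would then order assets by a combination of these per-trait percentages; the aggregation formula is defined by the scoring spec, not here.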

P2-04 — ClickHouse setup & CDC pipeline L Risk: Medium Deploy ClickHouse. CDC (Debezium) or export job for price_snapshots + sales_history. Postgres retains 30-day hot partition. Workers never write to ClickHouse directly. Depends: P1 exit ✓

P2-05 — Scoring pipeline — asset scores M Risk: Medium composite_score = rarity + liquidity + demand. Versioned via scoring_runs. Use applied_events. Update asset_cards.composite_score and trending_assets. Depends: P2-03

P2-06 — Scoring pipeline — collector scores M Risk: Low portfolio_value_usd, diversity_score, activity_score per account. Update portfolio_assets.estimated_value_usd. Depends: P2-05

P2-07 — Text embeddings & pgvector search L Risk: Medium Deploy semantic-encoder. HNSW index on embedding WHERE is_current=true. Hybrid search in search_docs. Expose /search endpoint. Depends: P2-03

P2-08 — API: scores, trending, search M Risk: Low GET /assets/{id}/score, /collections/{id}/trending, /search?q=. Volume stats from ClickHouse. Depends: P2-05, P2-07

P2-09 — Reconciliation job M Risk: Low Hourly (staging) / daily (prod). Sample 1000 assets, compare projection vs ground-truth. Alert if mismatch_rate > 0.1%. Depends: P1 exit ✓

P2-10 — Metadata re-fetch scheduler M Risk: Low Periodic re-fetch. After N failures → mark metadata_unreachable, stop retrying, surface in API. Depends: P2-02

P2-11 — Visual embeddings M Risk: Low (parallelizable) Image embeddings for visual similarity. Separate model_version from text embeddings. Depends: P2-07

P2-12 — user_content module M Risk: Low Wallet linking, folder CRUD, compose layouts. Reads user_content.* directly. Depends: P1-15

P2-13 — social module M Risk: Low Comments (threaded), reactions, notifications. Depends: P2-12

P2-14 — Price oracle integration S Risk: Low Native token → USD rates. All price_usd fields include rate timestamp. Alert if rate > 5min stale. Depends: P1 exit ✓

Phase 2 Exit Criteria

  • [ ] All contract tests pass for each new adapter
  • [ ] stub_assets_total trends to zero within 10min for test batch of 1000 assets
  • [ ] ClickHouse: 90-day floor price history < 100ms for collection with 50k sales
  • [ ] Search: precision@10 > 0.7 on test query set
  • [ ] Reconciliation: 0 mismatches for 3 consecutive daily runs across all networks
  • [ ] Per-network lag metrics visible for all active networks

Phase 3 — Production Hardening

Kafka · Full reorg suite · Personalization · Canary SLA · Backfill

P3-01 — Kafka / Redpanda migration L Risk: Medium Envelope schema frozen for this — no app logic changes. Dual-write 72h (matches exit criterion). Validate message counts match before cutover. Depends: P2 exit ✓

P3-02 — Full reorg test suite L Risk: High All 5 scenarios from arch §16.2 for every supported network. Automated simulation in staging. Depends: P2 exit ✓

P3-03 — Personalization engine L Risk: Medium User preference vectors from ownership + viewing history + folders. Blend with rarity + liquidity in trending and search. Depends: P2-07, P2-06

P3-04 — Qdrant migration M Risk: Low (conditional) Only if p99 semantic search > 200ms. Backend swap via abstraction layer. Validate precision@10 unchanged. Depends: P3-01

P3-05 — Advanced collection analytics L Risk: Medium ClickHouse: wash trading detection, price manipulation signals, whale concentration. Depends: P2-04

P3-06 — Achievements system M Risk: Low achievements_catalog + user_achievements, triggered by events. Depends: P2-13

P3-07 — API rate limiting & quota M Risk: Low Per-key rate limiting via Redis. Graceful degradation: serve stale projection data under DB pressure. Depends: P1-15

P3-08 — SLA monitoring & canary M Risk: Medium Automate freshness SLA checks from arch §17.2. Canary transaction every 5min. Alert if misses SLA window. Depends: P2 exit ✓

P3-09 — Backfill tooling L Risk: Medium CLI to backfill from configured start block. Uses same normalizer/state-updater pipeline. Idempotent. Depends: P2 exit ✓

P3-10 — Staging environment parity M Risk: Low Staging mirrors prod schema + pipeline exactly. Nightly reconciliation. All reorg tests run in staging. Depends: P3-01

Phase 3 Exit Criteria

  • [ ] Kafka: 0 message loss during 72h dual-write. Consumer lag < 1000 msgs/partition steady state
  • [ ] Full reorg test suite passes for all supported networks
  • [ ] Canary SLA: 99.5% of canary transactions visible in the API within the SLA window, measured over 7 days
  • [ ] p99 API latency < 200ms under 2x peak load

Phase 4 — Scale

Only where measured. No speculative scaling.

⚠ Each task requires a profiler trace identifying the specific bottleneck.

P4-01 — Ingest horizontal scaling by network M Shard adapter if ingest_lag fires consistently. Normalizer handles out-of-order via UPSERT — no changes needed there. Trigger: measured ingest lag

P4-02 — Projection pipeline parallelization M Partition by asset_id hash if projection_lag fires consistently. applied_events prevents cross-partition conflicts. Trigger: measured projection lag

P4-03 — normalized_events partition count increase L Risk: Medium Double partition count (8 → 16) if single partition exceeds ~100M rows. Requires data migration + maintenance window. Trigger: measured query degradation

P4-04 — Hot-path service extraction XL Risk: Medium Last resort. Profile first, extract second. Trigger: measured resource contention on a specific module

P4-05 — Read replica routing M Risk: Medium Route projection.* reads to replicas. Writes stay on primary. Monitor replication lag. Trigger: primary DB CPU > 70% sustained

P4-06 — applied_events archival S Archive rows > 90d to cold table if applied_events > 100M rows and INSERT latency degrades. Trigger: measured INSERT latency degradation


Risk Register

1. Dedup bug silently drops events (P1, Critical). Mitigation: contract test — same event twice → exactly 1 row; must pass before production data flows.
2. applied_events missing → double balance on replay (P1, Critical). Mitigation: code review gate on every ledger-mutating function; consider a linter rule.
3. Solana addresses corrupted by lower() (P2, Critical). Mitigation: canonicalization unit tests per network family; fixture tests with case-sensitive Solana addresses.
4. ERC-1155 TransferBatch: sub_index omitted → batch collapse (P1, Critical). Mitigation: contract test with 1-, 10-, and 100-item batches; assert contiguous sub_index from "0".
5. Reorg re-inclusion: is_reverted stays true (P1, High). Mitigation: reorg scenario 2 must pass before Phase 1 exit; UPSERT on normalized_event_keys updates is_reverted.
6. Projection diverges from ledger silently (P1+, High). Mitigation: reconciliation job (P2-09) + canary SLA monitor (P3-08).
7. ClickHouse lag causes stale analytics (P2, Medium). Mitigation: monitor CDC lag; surface data_as_of timestamp in analytics responses.
8. Broken metadata URI → stuck stubs (P2, Medium). Mitigation: after N failures mark metadata_unreachable, stop retrying, surface in API.
9. Kafka migration loses messages (P3, High). Mitigation: dual-write 72h; validate counts on both sides before cutover.
10. Premature Phase 4 scaling (P4, Medium). Mitigation: hard rule — no task without a profiler trace.

Decision Log

API architecture — Modular monolith (over microservices). Domains are tightly coupled; velocity > isolation at this stage.
Event bus, Phase 1 — Redis Streams (over Kafka). Lower ops overhead; envelope frozen for future migration.
Event bus, Phase 3 — Kafka / Redpanda (over staying on Redis). At-least-once guarantees, consumer groups, replay.
normalized_events partitioning — HASH(network_id) (over time partition). Time partitioning breaks the global unique PK in Postgres.
Dedup strategy — Unpartitioned keys table (over UNIQUE on the partitioned table). Postgres can't enforce cross-partition UNIQUE without the partition key.
Worker idempotency — applied_events in the DB (over in-memory / Redis). DB transaction guarantees; survives restart; auditable.
Surrogate IDs — uuidv5 from the natural key (over auto-increment). Deterministic; eliminates FK race conditions.
Address canonicalization — Canonical string at ingestion (over lower() at query time). lower() corrupts Solana addresses.
Embedding storage — pgvector (over Qdrant). Simpler ops; abstraction layer allows migration if p99 > 200ms.
ClickHouse timing — Phase 2 (over Phase 3). sales_history and price_snapshots outgrow Postgres faster than other tables.