Multi-platform identity stitching

You are not your Spotify account. Or your Steam profile. Or your Untappd check-ins. You are the pattern that connects all of them. Here is how Affinity Atlas builds a unified you from scattered data.

Deep Dive   By 0xBrewEntropy - 30 March 2026 · 14 min read

The fragmented self

Consider a person who listens to Deafheaven on Spotify, plays Hollow Knight on Steam, rates barrel-aged stouts on Untappd, has a Goodreads shelf full of Jeff VanderMeer, and contributes to Rust projects on GitHub. These platforms know fragments of this person. None of them know the whole picture.

Traditional dating apps ignore this entirely. They ask you to describe yourself in a text box. Affinity Atlas takes the opposite approach: it reads your actual behaviour across platforms and builds a compatibility profile from what you genuinely engage with - not what you claim to like.

But this creates a hard technical problem. How do you take data from a dozen different services - each with its own API, its own data format, its own authentication model, and its own concept of what a "user" is - and stitch it into a single, coherent profile?

In the tech industry, this is called identity resolution: the process of identifying individual users across digital touchpoints to construct a single, precise profile. But the way most companies do it is fundamentally different from how Affinity Atlas approaches it.

How the industry does it (and why that is creepy)

Identity resolution is a multi-billion-pound industry. Companies like LiveRamp, AWS Entity Resolution, and dozens of ad-tech firms specialise in connecting user data across platforms. The techniques fall into two broad categories:

Deterministic matching

Deterministic matching uses exact identifiers to link records. If the same email address appears in two databases, the records are matched. It is precise but requires a shared identifier - which users often do not knowingly provide.

In ad-tech, this means tracking cookies, device fingerprints, and email address harvesting. When you sign up for a service with your Gmail address, that address becomes a key that connects your behaviour across every other service where you used the same email - often without your awareness or consent.

Probabilistic matching

Probabilistic matching uses statistical inference to link records without a shared identifier. If two records share enough similar attributes (same city, similar browsing patterns, same device type, similar time zones), the system infers they are probably the same person. This is "fuzzy" matching - less precise but much broader.

Entity resolution systems combine both approaches, building identity graphs that connect records across databases by analysing patterns and relationships in the data, not just exact matches of individual fields.
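In code, the two approaches look roughly like this. This is a minimal sketch with hypothetical record fields and invented weights; real entity-resolution systems weigh far more attributes and tune their thresholds empirically:

```python
from difflib import SequenceMatcher

def deterministic_match(rec_a: dict, rec_b: dict) -> bool:
    """Link two records only if they share an exact identifier."""
    return bool(rec_a.get("email")) and rec_a.get("email") == rec_b.get("email")

def probabilistic_score(rec_a: dict, rec_b: dict) -> float:
    """Infer a match likelihood (0-1) from fuzzy attribute overlap.
    Weights here are illustrative, not calibrated."""
    score = 0.0
    if rec_a.get("city") == rec_b.get("city"):
        score += 0.3
    if rec_a.get("device_type") == rec_b.get("device_type"):
        score += 0.2
    # Fuzzy name similarity contributes up to half the score.
    name_sim = SequenceMatcher(None, rec_a.get("name", ""),
                               rec_b.get("name", "")).ratio()
    score += 0.5 * name_sim
    return score

a = {"email": "sam@example.com", "city": "Dublin",
     "device_type": "ios", "name": "Sam Byrne"}
b = {"email": "sam@example.com", "city": "Dublin",
     "device_type": "ios", "name": "S. Byrne"}
print(deterministic_match(a, b))       # exact email match links the records
print(probabilistic_score(a, b))       # no shared ID needed, only inference
```

Note that the probabilistic score never required a shared identifier: that is exactly why it can be run on people who never consented to being linked.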

โš ๏ธ The critical difference: The ad-tech industry performs identity resolution without user knowledge or consent. Users do not choose to be tracked across platforms. They do not see the identity graph that has been built about them. They cannot inspect it, correct it, or delete it. The entire system is designed to be invisible. Affinity Atlas rejects this model entirely.

Affinity Atlas uses deterministic matching only, and exclusively through a mechanism the user controls: OAuth-based platform linking.

There is no probabilistic matching. No email harvesting. No device fingerprinting. No background tracking. The identity graph is built entirely from explicit user actions: "I want to connect my Spotify account. I want to connect my Steam account. I want to connect my Untappd account."

🔗 The Identity Linking Flow

1. User initiates connection. The user taps "Connect Spotify" in their Affinity Atlas settings. No platform is ever connected by default.

2. OAuth handshake. The user is redirected to Spotify's own authentication page. They log in with their Spotify credentials (which Affinity Atlas never sees) and grant a scoped token - typically read-only access to top artists, top tracks, and recently played.

3. Platform ID stored. Affinity Atlas receives a platform-specific user ID (e.g., a Spotify user ID) and the scoped access token. This ID becomes a node in the user's identity graph, linking their Affinity Atlas profile to their Spotify account.

4. Data pull and derivation. The system pulls the permitted data, processes it into derived signals (genre affinities, niche weights, engagement levels), and discards the raw data. Only the derived signals are stored permanently.

5. Profile enrichment. The derived signals are merged into the user's unified compatibility profile. If they have also connected Steam and Untappd, the music signals join gaming signals and beer signals to form a multi-dimensional taste fingerprint.
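Step 2 relies on OAuth's PKCE extension (RFC 7636), which Spotify supports. A minimal sketch of the client-side challenge generation, using only the standard library - the client ID is a placeholder, and the authorisation redirect and token exchange are omitted:

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and its S256 code_challenge (RFC 7636)."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode()).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()

# The challenge goes in the authorisation URL; the verifier is sent later
# with the token request, proving both came from the same client.
auth_params = {
    "client_id": "YOUR_CLIENT_ID",        # placeholder
    "response_type": "code",
    "code_challenge": challenge,
    "code_challenge_method": "S256",
    # Read-only Spotify scopes matching the access described above.
    "scope": "user-top-read user-read-recently-played",
}
```

Because the verifier never leaves the client until the token exchange, an intercepted authorisation code is useless on its own - a useful property for a mobile app that cannot keep a client secret.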

This is a fundamentally different model from how ad-tech companies build identity graphs: every link is user-initiated, every token is scoped and read-only, and every connection is visible in settings and can be revoked at any time.

The API landscape: what each platform gives you

Not all APIs are created equal. Each platform Affinity Atlas integrates with has different capabilities, rate limits, data formats, and authentication models. Here is what the landscape actually looks like:

🎵 Spotify (OAuth 2.0 PKCE): Top artists, top tracks, recently played, saved library. Rich metadata including genres, popularity scores, audio features. Rate limit: ~180 req/min.

🎮 Steam (Steam Web API key): Owned games, playtime per game, achievement data. No genre data from the API (requires a supplementary IGDB/SteamSpy lookup). Rate limit: ~100k req/day.

🍺 Untappd (OAuth 2.0): Check-ins, ratings, beer/brewery details, style preferences. Rich taste data but lower rate limits: ~100 req/hr for authenticated users.

📚 Goodreads (RSS/scraping): Shelved books, ratings, review text. The official API was retired in 2020. The current approach uses RSS feeds or user-uploaded exports. Limited but workable.

💻 GitHub (OAuth 2.0): Public repos, starred repos, languages, contribution patterns. Rich metadata. Rate limit: 5,000 req/hr authenticated.

🏃 Strava (OAuth 2.0): Activity types, frequency, gear. No location data pulled (privacy). Rate limit: 200 req/15 min, 2,000/day.

🎬 Letterboxd (RSS/CSV export): Watched films, ratings, lists. No official API (as of 2026). CSV export or RSS feed parsing. Limited real-time sync.

🎨 Last.fm (API key): Scrobble history, top artists/tracks/albums with play counts. Extremely rich listening data. Rate limit: 5 req/sec.

Each integration is a distinct engineering challenge. The Spotify API is well-documented, rate-limited but generous, and provides rich structured data. The Goodreads API no longer exists. Letterboxd has no API at all. Steam's API gives you playtime but not genre information, requiring a secondary lookup against a games database.

This fragmentation is the core technical challenge of identity stitching. The data does not come in a standard format. There is no universal schema for "what does this person like?" Every platform answers that question differently.
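One practical consequence of the uneven rate limits above is that each integration needs its own request budget. A minimal token-bucket sketch - the per-platform figures mirror the table above, but the real sync scheduler is necessarily more involved:

```python
import time

class TokenBucket:
    """Simple rate limiter: allows `rate` requests per `per` seconds."""

    def __init__(self, rate: int, per: float):
        self.capacity = rate
        self.tokens = float(rate)
        self.refill_per_sec = rate / per
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Spend one token if available, refilling based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Rough per-platform budgets drawn from the table above (illustrative).
limits = {
    "spotify": TokenBucket(180, 60),       # ~180 req/min
    "untappd": TokenBucket(100, 3600),     # ~100 req/hr
    "lastfm":  TokenBucket(5, 1),          # 5 req/sec
}
```

A sync worker would check `limits[platform].allow()` before each API call and defer the request when it returns False, so a slow platform like Untappd never blocks a fast one like Last.fm.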

Data normalisation across domains

Once data is pulled from each platform, it needs to be normalised into a common format that the matching system can work with. This is where the engineering gets genuinely difficult.

The universal signal model

Affinity Atlas normalises all platform data into a common structure:

{
  "signal": {
    "entity_id": "spotify:artist:4Z8W4fKeB5YxbusRsdQVPb",
    "entity_name": "Radiohead",
    "domain": "music",
    "source": "spotify",
    "engagement_level": 0.82,      // 0-1 normalised
    "popularity": 78,               // platform-native scale
    "niche_weight": 1.15,           // derived from popularity
    "confidence": 0.91,             // how reliable this signal is
    "last_observed": "2026-03-28T14:22:00Z",
    "tags": ["alternative", "art-rock", "electronic"]
  }
}

Every interest, regardless of source platform, gets reduced to this structure. A Spotify artist, a Steam game, an Untappd beer, a Goodreads book - they all become signals with engagement levels, popularity scores, niche weights, and confidence values.
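As a sketch, here is the same structure as a typed value rather than raw JSON. The Signal class and its field names simply mirror the example above; it is not the production schema:

```python
from dataclasses import dataclass, field

@dataclass
class Signal:
    """Typed mirror of the universal signal structure shown above."""
    entity_id: str            # namespaced platform identifier
    entity_name: str
    domain: str               # music, gaming, beer, books, code, ...
    source: str               # originating platform
    engagement_level: float   # 0-1 normalised
    popularity: int           # platform-native scale
    niche_weight: float       # derived from popularity
    confidence: float         # how reliable this signal is
    last_observed: str        # ISO 8601 timestamp
    tags: list[str] = field(default_factory=list)

# The Radiohead example from the JSON above, as a Signal value.
signal = Signal(
    entity_id="spotify:artist:4Z8W4fKeB5YxbusRsdQVPb",
    entity_name="Radiohead",
    domain="music",
    source="spotify",
    engagement_level=0.82,
    popularity=78,
    niche_weight=1.15,
    confidence=0.91,
    last_observed="2026-03-28T14:22:00Z",
    tags=["alternative", "art-rock", "electronic"],
)
```

Reducing every platform to one type is what makes the rest of the pipeline tractable: matching, weighting, and decay logic only ever see Signals, never platform-specific payloads.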

The normalisation challenges

Engagement level. What does "engagement" mean across platforms? On Spotify, it is derived from play count, recency, and whether the artist appears in your top artists list. On Steam, it is playtime relative to the game's median. On Untappd, it is number of check-ins for that beer or brewery, combined with rating. On GitHub, it is contribution frequency and repo starring patterns. Each platform requires a custom normalisation function that maps platform-specific metrics to a 0-1 scale.

Popularity. Spotify provides artist popularity on a 0-100 scale. Steam has no equivalent - you need external data from sources like SteamDB or IGDB. Untappd has check-in counts and global ratings but no normalised popularity metric. The system must derive comparable popularity metrics from each platform's available data.
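As an illustration, one possible linear mapping from popularity to niche weight, with the pivot and scale chosen so a popularity of 78 yields the 1.15 shown in the example signal above - the actual derivation is not published, and the clamp bounds are assumed:

```python
def niche_weight(popularity: int, pivot: int = 90, scale: int = 80) -> float:
    """Hypothetical linear mapping from a 0-100 popularity score to a
    niche multiplier: obscure entities get weight above 1, mainstream
    entities below 1."""
    w = 1.0 + (pivot - popularity) / scale
    return max(0.5, min(w, 2.0))   # clamp to an assumed sane range
```

Whatever the real function is, the essential property is monotonicity: lower popularity must never produce a lower weight, or the niche-first matching philosophy breaks.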

Taxonomy. Music genres, game genres, book genres, and beer styles are all different taxonomies with different granularities. "Indie" means something in music, something different in gaming, and something else again in publishing. The system maintains domain-specific taxonomies rather than forcing a universal one - music:indie-rock is not the same as games:indie.

Temporal dynamics. Your Spotify top artists update frequently. Your Steam library changes slowly. Your Goodreads shelf may not update for months. The confidence score accounts for data freshness - a signal from a recently synced platform carries more weight than one from a stale sync six months ago.

🎯 The design principle: Normalisation should preserve meaning, not erase it. A 0.82 engagement level should mean roughly the same depth of interest whether it comes from Spotify listening patterns or Steam playtime. Getting this calibration right is one of the hardest problems in the entire system.

The identity graph

Once all platforms are connected and data is normalised, the user's identity graph looks something like this:

User: aa_user_7291
├── spotify:user:abc123 (connected 2026-01-15)
│   ├── 312 artist signals (music domain)
│   ├── 89 genre affinities
│   └── niche_score: 0.72 (more niche than 72% of users)
├── steam:user:76561198012345 (connected 2026-01-16)
│   ├── 47 game signals (gaming domain)
│   ├── 12 genre affinities
│   └── niche_score: 0.68
├── untappd:user:SeanOMahoney (connected 2026-02-01)
│   ├── 156 beer signals (beer domain)
│   ├── 23 style affinities
│   └── niche_score: 0.81
├── github:user:sean-o (connected 2026-02-03)
│   ├── 28 repo signals (code domain)
│   ├── 8 language affinities
│   └── niche_score: 0.55
└── Unified Profile
    ├── total_signals: 543
    ├── domains_active: 4/8
    ├── overall_niche: 0.71
    └── profile_confidence: HIGH

This graph is the foundation for matching. When two users are compared, the system finds overlapping signals across all shared domains, weights them by niche score, and produces the Affinity Score that appears on the match card.
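The comparison step can be sketched as follows, with each user's signals reduced to a map of entity ID to engagement level. This shows the shape of the computation - overlap, take the weaker engagement, weight by niche - not the production scoring function:

```python
def affinity_score(signals_a: dict[str, float],
                   signals_b: dict[str, float],
                   niche: dict[str, float]) -> float:
    """Illustrative pairwise score over shared entities, normalised to 0-1.
    `niche` maps entity IDs to niche weights (default 1.0)."""
    shared = set(signals_a) & set(signals_b)
    if not shared:
        return 0.0
    # The weaker of the two engagement levels caps each shared entity's
    # contribution; niche weight amplifies obscure overlaps.
    weighted = sum(min(signals_a[e], signals_b[e]) * niche.get(e, 1.0)
                   for e in shared)
    max_possible = sum(niche.get(e, 1.0) for e in shared)
    return weighted / max_possible
```

Taking the minimum of the two engagement levels means both users must genuinely care about an entity for it to count; one superfan and one casual listener score no higher than two casual listeners.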

Critically, the graph only contains derived signals. If this data were exposed in a breach, an attacker would learn that user aa_user_7291 has 312 artist taste signals with a niche score of 0.72. They would not learn which specific tracks were played, when, or how often. The raw data never enters the graph.

Cross-domain insights

The identity graph enables something no single-platform system can do: cross-domain pattern recognition.

A user who listens to post-metal on Spotify, plays Dark Souls on Steam, and reads Cormac McCarthy on Goodreads has a cross-domain pattern that suggests a preference for challenging, atmospheric, emotionally intense experiences. This is not something any individual platform can infer - it only becomes visible when the data is stitched together.

The matching system uses these cross-domain patterns as an additional compatibility signal. Two users who share this "intensity" pattern across three different domains are likely more compatible than two users who just happen to like the same popular artist.
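A minimal sketch of that cross-domain check, assuming each platform's signals have already been reduced to derived pattern tags per domain - the tag names here are invented for illustration:

```python
def cross_domain_overlap(user_a: dict[str, set[str]],
                         user_b: dict[str, set[str]]) -> int:
    """Count domains where both users share at least one pattern tag.
    Each user maps domain -> set of derived pattern tags."""
    return sum(
        1 for domain in user_a.keys() & user_b.keys()
        if user_a[domain] & user_b[domain]
    )

a = {"music": {"intense", "atmospheric"},
     "games": {"intense"},
     "books": {"bleak"}}
b = {"music": {"intense"},
     "games": {"cozy"},
     "books": {"bleak", "atmospheric"}}
```

Here the two users overlap in two of three domains (music and books), a stronger compatibility hint than any single shared artist would be.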

Edge cases and honest limitations

Identity stitching is not a solved problem. Here are the real limitations Affinity Atlas faces:

Shared accounts

If you share your Spotify account with a partner or housemate, the taste signals will be a blend of both people's preferences. The system has no reliable way to separate individual listening patterns from a shared account. The current mitigation is a "Shared account" toggle that applies a lower confidence score to signals from that platform, reducing their weight in matching.

Platform coverage gaps

Not every user has accounts on every platform. Some users will have rich data across five platforms; others might only connect Spotify. The system handles this through the confidence score - more connected platforms mean higher confidence in the overall match. A match between two users who have both connected four platforms is inherently more reliable than a match based on a single shared platform.

API deprecation

Goodreads killed its API in 2020. Letterboxd still does not have one. Platform APIs can change scope, introduce new restrictions, or disappear entirely. Each integration is a dependency on an external company's willingness to maintain an open API. The system is designed to degrade gracefully - if a platform API goes down or changes, existing derived signals remain valid, but new syncs for that platform are paused until the integration is updated.

Taste evolution

People change. The music you loved five years ago may not represent who you are now. The system handles this through recency weighting - recent signals carry more weight than older ones - and periodic re-syncing that refreshes the derived signals. But there is always a lag between your actual taste evolving and the system reflecting that change.

Cold start

A new user with no connected platforms has no signals. A user who connects one platform has signals in only one domain. The cold start problem is real and unavoidable. The system communicates this honestly: if your profile confidence is LOW, the match card tells you so, and suggests connecting additional platforms to improve match quality.

Gaming and manipulation

Could someone create fake Spotify listening history or inflate their Steam playtime to game the system? Technically, yes. Practically, the effort required is significant - you would need to genuinely listen to hundreds of hours of specific music or play specific games for hundreds of hours. The niche weighting system also makes this harder: you cannot just add popular things to your library, because popular things carry low niche weight. You would need to fake engagement with genuinely obscure content, which is a harder signal to manufacture.

🧩 The honest summary: Identity stitching in Affinity Atlas is consent-based, deterministic, and privacy-preserving. It is also imperfect, dependent on external APIs, and limited by the cold start problem. These trade-offs are the price of building a system that respects user autonomy. The alternative - ad-tech-style probabilistic tracking without consent - would be more comprehensive but fundamentally dishonest. We would rather have an honest, imperfect system than a comprehensive, creepy one.


See the stitched profile in action

The interactive demo shows how multi-platform data combines into a unified compatibility profile. Connect your interests and see the identity graph work.

Try the demo