The popularity problem
Imagine two people both have Taylor Swift in their top artists on Spotify. Does that tell you much about their compatibility? Probably not. Taylor Swift is one of the most-streamed artists on the planet - over 100 million monthly listeners. Sharing her as a favourite is about as informative as both people saying they enjoy breathing.
Now imagine two people both have the same artist with a Spotify popularity index of 12 out of 100. An artist that roughly 97% of listeners have never heard of. Both people independently found, listened to, and loved this artist. That overlap is not a coincidence - it is a genuine signal of shared taste, curiosity, and identity.
This is the core intuition behind niche weighting: the rarer the shared interest, the stronger the compatibility signal.
It is also a well-documented problem in recommendation science. Researchers call it popularity bias - the tendency of recommendation systems to over-promote popular items and under-represent niche ones. A 2023 survey paper found that popularity bias is pervasive across nearly all collaborative filtering systems, noting that it "often leads to a limited value of the recommendations for consumers and providers" and creates "undesired reinforcement effects over time."
In dating apps, this bias is even more insidious. If two users both like popular things, a naive matching system treats that as evidence of compatibility. It is not. It is evidence of being alive in 2026. The real signal is buried deeper, in the long tail of interests that most platforms ignore entirely.
💡 The key insight: Mainstream overlaps tell you two people exist in the same culture. Niche overlaps tell you two people share the same identity. Affinity Atlas is built to find the identity-level signals.
A concept borrowed from information science
The idea that rare things carry more information than common things is not new. It is one of the foundational principles of information retrieval, formalised in 1972 by Karen Spärck Jones as inverse document frequency (IDF), the weighting at the heart of TF-IDF (Term Frequency - Inverse Document Frequency).
Here is how TF-IDF works in its original context: when searching a collection of documents, you want to know which words are meaningful. The word "the" appears in every document - it tells you nothing. The word "mitochondria" appears in only a handful - it tells you a lot. TF-IDF assigns higher importance to rare, distinctive terms and lower importance to common ones.
The formula is elegant:
importance = frequency_in_document × log(total_documents / documents_containing_term)
The rarer a term is across all documents, the more weight it carries when it does appear.
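To make the formula concrete, here is a minimal Python sketch of it. The four-document corpus and the function name `tf_idf` are invented for illustration; real search systems typically add smoothing and normalisation that this sketch leaves out.

```python
import math


def tf_idf(term: str, document: str, corpus: list[str]) -> float:
    """importance = frequency_in_document * log(total_documents / documents_containing_term)"""
    frequency_in_document = document.split().count(term)
    documents_containing_term = sum(1 for doc in corpus if term in doc.split())
    if documents_containing_term == 0:
        return 0.0  # the term never appears, so it carries no weight
    return frequency_in_document * math.log(len(corpus) / documents_containing_term)


# Hypothetical toy corpus: "the" is in every document, "mitochondria" in one.
corpus = [
    "the cell contains mitochondria",
    "the cat sat on the mat",
    "the dog chased the cat",
    "the weather is mild",
]

common = tf_idf("the", corpus[1], corpus)         # log(4 / 4) = 0, so no weight
rare = tf_idf("mitochondria", corpus[0], corpus)  # 1 * log(4 / 1)
print(f"the: {common:.3f}, mitochondria: {rare:.3f}")
```

The word that appears everywhere scores zero no matter how often it is repeated, while the word that appears in a single document gets the full logarithmic boost.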
Spärck Jones's insight was transformative - it became the backbone of search engines, spam filters, and text classification systems. The same principle has been applied in recommendation systems, where researchers use TF-IDF-style weighting to identify which features of a user's behaviour are genuinely distinctive versus merely common.
Affinity Atlas adapts this principle to people and their passions. Instead of asking "how rare is this word in a collection of documents?", we ask: "how rare is this interest across all users?"
The answer determines how much that shared interest contributes to a compatibility score.
How niche weighting works in Affinity Atlas
Every interest in the Affinity Atlas system has a popularity metric associated with it. For music, this comes directly from Spotify's popularity index - a score from 0 to 100, updated regularly, reflecting recent streaming activity relative to all other tracks and artists on the platform. For gaming, it comes from Steam's concurrent player counts and ownership data. For beer, Untappd check-in volumes. For books, Goodreads rating counts. And so on.
Critically, Spotify's popularity index does not appear to scale linearly with audience size; it behaves more like a logarithmic scale. An artist at 50 is not just a bit more popular than an artist at 40 - they might be ten times more popular. An artist at 70 compared to 50 is an even bigger gulf. This means even a few points can represent a massive difference in reach.
Affinity Atlas converts these raw popularity metrics into a niche weight using an inverse curve:
NicheWeight = 1 + k × (1 - popularity_normalised)^n
Where popularity_normalised is 0-1, k is the maximum bonus multiplier, and n controls curve steepness.
In plain language:
- High popularity (close to 1.0) - the niche weight is close to 1.0. The overlap counts, but it is not exceptional.
- Mid popularity (around 0.5) - the niche weight starts climbing. This is the "interesting" zone - well-known enough to have a community, niche enough to be meaningful.
- Low popularity (close to 0.0) - the niche weight approaches 1 + k. Sharing this interest is a strong signal. The overlap is amplified significantly in the compatibility calculation.
The exponent n controls how aggressively the curve ramps. A higher exponent means the weight only spikes for truly obscure interests. A lower exponent spreads the bonus more evenly. After testing, Affinity Atlas uses a value that creates a gentle curve with a steep tail - rewarding niche overlap without ignoring mid-range interests entirely.
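The curve is easier to picture with numbers. The sketch below implements the formula as written; the defaults k = 2.0 and n = 2.0 are placeholder assumptions chosen for round results, not the values Affinity Atlas uses in production.

```python
def niche_weight(popularity_normalised: float, k: float = 2.0, n: float = 2.0) -> float:
    """NicheWeight = 1 + k * (1 - popularity_normalised) ** n.

    k (maximum bonus) and n (curve steepness) are illustrative defaults only.
    """
    if not 0.0 <= popularity_normalised <= 1.0:
        raise ValueError("popularity_normalised must be between 0 and 1")
    return 1.0 + k * (1.0 - popularity_normalised) ** n


# Walk from fully mainstream to fully obscure.
for popularity in (1.0, 0.5, 0.12, 0.0):
    print(f"popularity={popularity:.2f} -> weight={niche_weight(popularity):.3f}")
```

With these assumed parameters, a fully mainstream interest gets a weight of 1.0, a mid-popularity interest gets 1.5, and an interest at popularity 0 reaches the ceiling of 1 + k = 3.0. Raising n pushes more of the bonus towards the obscure end of the scale.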
How it fits into the overall score
Niche weight is one factor in the broader Affinity Score formula:
SignalScore = Commonality × NicheWeight × SignalWeight × Confidence
Each shared interest generates a signal. The overall Affinity Score is the weighted sum across all signals.
- Commonality - the raw overlap. Do both users share this interest? How strongly? (For music, this might factor in listening time, not just presence.)
- NicheWeight - the popularity-adjusted multiplier described above. Rare overlaps count more.
- SignalWeight - how much this category matters to each user. If someone has connected Spotify but not Steam, music signals are weighted more heavily for them.
- Confidence - how much data we have. A user with 500 Spotify artists gives a more reliable signal than someone with 5. This prevents thin data from producing overconfident scores.
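As a rough sketch of how the four factors might combine, the snippet below multiplies them per signal and sums the results. The `Signal` class, the field ranges, and the example numbers are all assumptions made for illustration; in particular, treating the overall score as a plain sum of signal scores is one simple reading of "weighted sum", not a confirmed implementation detail.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """One shared interest between two users (hypothetical structure)."""
    name: str
    commonality: float    # assumed range 0-1: strength of the overlap
    niche_weight: float   # assumed >= 1: popularity-adjusted multiplier
    signal_weight: float  # how much this category matters to the users
    confidence: float     # assumed range 0-1: how much data backs the signal

    @property
    def score(self) -> float:
        # SignalScore = Commonality x NicheWeight x SignalWeight x Confidence
        return self.commonality * self.niche_weight * self.signal_weight * self.confidence


def affinity_score(signals: list[Signal]) -> float:
    """Sum the per-signal scores into one overall score."""
    return sum(signal.score for signal in signals)


def breakdown(signals: list[Signal]) -> list[tuple[str, float]]:
    """Per-signal contributions, largest first, as a match card might list them."""
    return sorted(((s.name, s.score) for s in signals), key=lambda item: item[1], reverse=True)


signals = [
    Signal("Taylor Swift", commonality=0.8, niche_weight=1.0, signal_weight=1.0, confidence=0.9),
    Signal("Lingua Ignota", commonality=0.8, niche_weight=2.5, signal_weight=1.0, confidence=0.9),
]
print(affinity_score(signals))
print(breakdown(signals))
```

Both overlaps have identical commonality and confidence in this example, so the niche weight alone accounts for the difference in their contributions.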
The transparency promise means every factor is visible on a match card. Users can see exactly which shared interests contributed, what each one's niche weight was, and why the overall score landed where it did. No hidden variables. No opaque ranking.
Real examples across categories
Numbers are more intuitive with concrete examples. Here is how niche weighting plays out across different interest categories:
- Music, mainstream: 100M+ monthly listeners. Everyone knows Taylor Swift. Shared overlap is nice but tells you very little about unique compatibility.
- Music, niche: Experimental neoclassical darkwave. If two people both independently love Lingua Ignota, they almost certainly share a deep affinity for intense, boundary-pushing art.
- Gaming, mainstream: One of the best-selling games of all time. Shared ownership is essentially a given for any two gamers.
- Gaming, niche: A beloved indie exploration game with a dedicated cult following. Sharing this signals curiosity, patience, and a love for discovery-driven storytelling.
- Beer, mainstream: A globally iconic stout. Shared enjoyment is practically universal among beer drinkers. Meaningful? Barely.
- Beer, niche: A highly regarded pale ale from a small Cheltenham brewery. Two people who both rate this are probably craft beer enthusiasts with overlapping taste in hazy pales.
- Books, mainstream: One of the most-read books in history. Almost everyone who reads has encountered it. Not a distinguishing signal.
- Books, niche: A critically acclaimed novel with a dedicated but modest readership. Shared love signals an appreciation for atmospheric, literary fantasy.
Notice the pattern. The niche weight does not penalise mainstream overlaps - Taylor Swift still counts as a shared interest. It just recognises that rare overlaps carry disproportionately more information about who you are and what you care about.
The long tail is where compatibility lives
This is not just an intuition. It is backed by decades of research in recommendation science and behavioural psychology.
The long tail distribution
In virtually every domain - music, books, games, films, food - user preferences follow a long-tail distribution, sometimes called a Pareto distribution. A small number of items are massively popular (the "head"), while the vast majority of items are consumed by relatively few people (the "tail"). Research consistently shows that approximately 20% of items receive 80% of all interactions, while the remaining 80% of items share just 20% of engagement.
But here is the critical part: that long tail is where individuality lives. A study from NYU's Stern School of Business demonstrated that the long tail of recommender systems contains significant value that traditional systems fail to capture, and that clustering tail items can substantially improve recommendation quality.
Popularity bias actively harms matching
Most recommendation systems - including dating app matching engines - suffer from what researchers call popularity bias. Popular items dominate recommendations because they have more data points, more interactions, and therefore stronger signals in collaborative filtering models. This creates a feedback loop: popular items get recommended more, which makes them even more popular, which makes them get recommended even more.
A 2025 paper from the ACM RecSys community introduced the concept of "power-niche users" - users who actively engage with niche content and whose preferences contain the richest compatibility signals. The researchers found that upweighting these niche interactions significantly improved recommendation diversity and accuracy, particularly for users whose tastes extend beyond mainstream preferences.
In dating terms, this means: the users who have the most distinctive taste profiles are precisely the ones being worst served by conventional matching. Their most meaningful compatibility signals - the obscure shared interests that would create the strongest connections - are being drowned out by mainstream noise.
Similarity research supports the approach
The psychological evidence supports prioritising distinctive shared traits. Research on assortative mating patterns has consistently shown that couples with stronger similarity on specific characteristics - not just broad personality traits - report higher relationship satisfaction. A meta-analysis of speed-dating studies found that compatibility effects (unique person-to-person fit) were among the strongest predictors of later romantic interest, often outweighing broad attractiveness judgments.
The more specific and distinctive the shared trait, the more predictive it is. Two people who both like "music" share nothing. Two people who both love the same artist with a dozen monthly listeners share something real.
Edge cases and safeguards
Niche weighting is powerful, but naive implementation creates problems. Here are the edge cases Affinity Atlas accounts for:
The "only listener" problem
What if two users share an artist with a popularity of 0? Perhaps it is a friend's bedroom project with 3 monthly listeners. The niche weight formula would assign maximum weight - but should it? The interest might be so obscure that the overlap is meaningless (they both know the same person) or accidental (the artist has almost no catalogue).
Safeguard: Affinity Atlas applies a minimum popularity floor. Below a certain threshold, the niche weight is capped. Ultra-obscure overlaps are still noted and shown on match cards, but they do not disproportionately dominate the score. The system also uses the Confidence factor - an artist with minimal listening data produces a lower-confidence signal regardless of niche weight.
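One simple way to express a popularity floor is to clamp the popularity value before it enters the niche weight formula. The sketch below does exactly that; the floor of 0.05 and the k and n defaults are assumptions for illustration, and the real threshold is not stated here.

```python
def capped_niche_weight(
    popularity_normalised: float,
    k: float = 2.0,       # illustrative maximum bonus
    n: float = 2.0,       # illustrative curve steepness
    floor: float = 0.05,  # hypothetical minimum popularity floor
) -> float:
    """Niche weight with the popularity clamped so it never drops below `floor`."""
    effective_popularity = max(popularity_normalised, floor)
    return 1.0 + k * (1.0 - effective_popularity) ** n


bedroom_project = capped_niche_weight(0.0)  # treated as if popularity were 0.05
at_the_floor = capped_niche_weight(0.05)
print(bedroom_project, at_the_floor)
```

Anything below the floor receives the same weight as an interest sitting exactly on it, so the theoretical maximum of 1 + k is never reached and a zero-popularity artist cannot outscore a merely obscure one.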
The "one-hit wonder" problem
A user listened to an artist once, two years ago, and never returned. Does that count the same as an artist they play daily? It should not.
Safeguard: Commonality is not binary. For music, it factors in listening frequency, recency, and depth. A single play of a niche artist generates a weak commonality signal that, even when multiplied by a high niche weight, does not overwhelm genuine deep interests. The signal needs both rarity and engagement to be meaningful.
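A graded commonality score could be sketched as follows. Everything specific in it is an assumption for illustration: the saturation constants, the 180-day recency half-life, and the choice to take the weaker of the two users' engagement as the shared value.

```python
import math


def engagement(
    play_count: int,
    days_since_last_play: float,
    distinct_tracks: int,
    half_life_days: float = 180.0,  # hypothetical recency half-life
) -> float:
    """Score one user's engagement with an artist in the range 0-1."""
    frequency = 1.0 - math.exp(-play_count / 20.0)     # saturates with repeated plays
    recency = 0.5 ** (days_since_last_play / half_life_days)  # halves every half-life
    depth = 1.0 - math.exp(-distinct_tracks / 5.0)     # rewards exploring the catalogue
    return frequency * recency * depth


def commonality(engagement_a: float, engagement_b: float) -> float:
    """An overlap is only as strong as the less engaged of the two listeners."""
    return min(engagement_a, engagement_b)


one_hit = engagement(play_count=1, days_since_last_play=730, distinct_tracks=1)
daily = engagement(play_count=300, days_since_last_play=0, distinct_tracks=25)
print(f"one play two years ago: {one_hit:.4f}, daily listener: {daily:.4f}")
```

Under these assumptions the single stale play scores close to zero, so even a maximum niche weight leaves it far below the contribution of an artist both users actually listen to.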
The "niche but irrelevant" problem
Two users both own an obscure Steam game that they each played for 12 minutes before refunding. Technically a shared niche interest. Practically meaningless.
Safeguard: For gaming, commonality incorporates playtime thresholds. A game must have been played for a meaningful duration to register as a signal. For beer, a check-in with no rating (just a venue check-in) carries less weight than a rated check-in. Each data category has domain-specific rules about what constitutes a real engagement signal versus noise.
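Domain-specific rules like these can be expressed as small gate functions in front of the scoring formula. The sketch below is illustrative only: the 120-minute playtime threshold and the 0.3 weight for unrated check-ins are invented values, and the function names are hypothetical.

```python
from typing import Optional


def game_commonality(minutes_a: int, minutes_b: int, min_minutes: int = 120) -> float:
    """Register a shared game only if both users played past the threshold."""
    if minutes_a < min_minutes or minutes_b < min_minutes:
        return 0.0  # a tried-and-refunded game is noise, not a signal
    return 1.0


def beer_checkin_weight(rating: Optional[float]) -> float:
    """Give a rated check-in full weight and an unrated one a reduced weight."""
    return 1.0 if rating is not None else 0.3


print(game_commonality(12, 12))     # both refunded after 12 minutes
print(game_commonality(600, 300))   # both played for hours
print(beer_checkin_weight(None), beer_checkin_weight(4.25))
```

A production version would presumably grade playtime rather than using a hard cut-off, but the gate captures the core idea: engagement has to clear a bar before rarity is allowed to amplify it.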
The "category imbalance" problem
What if one user has connected six platforms and another has only connected Spotify? The first user has hundreds of potential signals; the second has a fraction. Without adjustment, the match score would be dominated by whichever category has the most data.
Safeguard: This is where SignalWeight comes in. The score is normalised by the number of active categories for each user. If someone has only connected Spotify, music signals carry more weight for that user - not less. The system adapts to the data it has, rather than penalising users for not connecting everything.
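One way to read this normalisation is sketched below: each active category receives an equal share of the total weight, and signals are averaged within a category before the shares are applied. The equal split and the within-category averaging are both assumptions made for illustration, as are the function names.

```python
def category_weights(active_categories: set[str]) -> dict[str, float]:
    """Split a total weight of 1.0 equally across a user's connected categories."""
    if not active_categories:
        return {}
    share = 1.0 / len(active_categories)
    return {category: share for category in active_categories}


def normalised_score(scores_by_category: dict[str, list[float]]) -> float:
    """Average within each category, then combine using the equal shares."""
    active = {category for category, scores in scores_by_category.items() if scores}
    weights = category_weights(active)
    return sum(
        weights[category] * sum(scores_by_category[category]) / len(scores_by_category[category])
        for category in active
    )


print(category_weights({"music"}))  # a Spotify-only user: music carries everything
print(normalised_score({"music": [1.0] * 100, "gaming": [2.0]}))
```

In the second call, the hundred music signals and the single gaming signal each contribute half of the result, so the category with the most data does not drown out the other.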
Why this matters for matching
Every dating app claims to have a good matching system. Most of them are doing some variant of collaborative filtering - "people who liked this also liked that" - often weighted heavily by photos, location, and broad demographic data. None of them, as far as we can determine, implement anything resembling niche weighting on real behavioural data.
The result is what you experience every day on mainstream apps: matches with people who share your broad cultural context but not your specific identity. You both like music and food and travel - because everyone does. The things that make you you - the specific artists, games, books, beers, and films that you have chosen to spend your time on - are invisible to the matching engine.
🎯 The Affinity Atlas thesis: The best compatibility signal is not what you have in common with everyone. It is what you have in common with almost nobody - and then finding the person who shares it.
Niche weighting is not the only thing that makes Affinity Atlas different. But it is the foundation. Everything else - the multi-platform data architecture, the transparent match cards, the privacy-first design - exists to make niche weighting possible, accurate, and trustworthy.
Because the best matches are not found by looking at what is popular. They are found by looking deeper.
See niche weighting in action
The interactive demo lets you explore how different interest overlaps affect compatibility scores. Connect mock profiles and see the niche weights for yourself.
Try the demo