Star Schema Done Right, Part 2: Dimension Tables and Why Cardinality Matters

← Previously: Star Schema Done Right, Part 1: What a Fact Table Actually Is (And Isn't) (4 Jun) → Next: Part 3: Slowly Changing Dimensions - The Bit Nobody Explains (18 Jun)

What this post covers: what a dimension table actually is, beyond "the lookup table on the other end of the relationship"; why the fact-to-dimension relationship has to be many-to-one and what the engine does with that promise; what goes wrong when you let Power BI talk you into many-to-many; and why bidirectional filtering is almost always the wrong answer to a question you should be asking differently.

A dimension is a controlled vocabulary, not a lookup

Part 1 ended with the move that defines the whole star schema: separate what happened from what describes the things that happened. The fact table got the first half. The dimensions get the second, and "lookup table" undersells what they do.

A lookup table answers "what's the label for key 1247". A dimension table does that too, but its real job is stricter: it defines the complete legal set of things that can exist. The customer dimension is the authoritative list of customers - every customer exactly once, one row each, with everything the business says about them. If a customer isn't in the dimension, they don't exist as far as the model is concerned. If they're in it twice, the model now has two opinions about who they are, and every number that touches them splits in half.

That's why the defining property of a dimension is uniqueness on its key. Not as a nice-to-have, as the entire contract. The fact table is allowed to mention customer 1247 fifty thousand times; the dimension must declare them exactly once. Controlled vocabulary: many usages, one definition.

Staying with the sales model from Part 1: FactSales holds the events, and around it sit DimCustomer, DimProduct, DimDate, each one the single authoritative statement of what exists on its axis. Every analytical question you'll ever ask - by region, by category, by month - is a question phrased in a dimension's vocabulary and answered from the fact's events.

Cardinality is a promise the engine holds you to

Connect FactSales to DimCustomer in Power BI and the relationship gets a cardinality: many-to-one, fact side many, dimension side one. It's tempting to read that as descriptive metadata, like a comment. It isn't. It's a promise, and the engine builds the model's entire behaviour on top of it.

Here's the mechanism, and it's the single most load-bearing sentence in this series: filters flow from the one side to the many side. When a user clicks "NSW" in a slicer, the filter lands on DimCustomer, reduces it to the customers in NSW, and flows down the relationship to FactSales, keeping only their sales rows. Every measure then aggregates what survived:

dax

Total Revenue = SUM ( FactSales[Revenue] )

No CALCULATE, no FILTER, no cleverness. The measure is one line because the model is doing the filtering, and the model can do it because the cardinality guarantees the route: any fact row resolves to exactly one customer, so "the customers in NSW" translates unambiguously to "their sales". The reason DAX feels effortless on a good model and impossible on a bad one is mostly this - on a good model, the relationships answer the filtering questions before the measure has to. (How filters and CALCULATE interact from the DAX side is the DAX Context series from April; this series is the model side of the same story.)

One row per entity on the one side is what makes that translation safe. Which brings us to what happens when it isn't true.

Many-to-many is almost always a data quality problem wearing a feature's clothes

Load a customer table with a duplicated key - one row per customer per year, say, or just a dirty extract - and relate it to the fact table. Power BI won't refuse. It will offer you many-to-many cardinality, the dialog will warn you mildly, and it will work, for visuals-on-a-page values of "work".

What you've actually done is dissolve the contract. "Filter to this customer" no longer names one row, so the engine downgrades the relationship to what the documentation calls a limited relationship, and the practical losses arrive immediately. RELATED stops working across it, so any calculated column reaching for a customer attribute breaks. Totals stop being trustworthy, because a fact row now matches multiple dimension rows and can be counted through each of them - the classic symptom is a table whose rows sum to more than its total, in front of your least favourite stakeholder.

The fix is almost never the cardinality setting. The fix is upstream, in Power Query: deduplicate, or decide which version of the customer is current, or recognise that "customer per year" is actually a slowly changing dimension - which is precisely Part 3's territory. Genuine many-to-many business relationships do exist (a sale belonging to two salespeople, a patient with many diagnoses), and the grown-up pattern for those is a bridge table, which keeps every individual relationship one-to-many and honest. What's almost never right is shrugging at duplicate keys and clicking through the warning. Power BI offering many-to-many is like a bartender offering you a sixth drink: technically a service, but they're not the one driving your model home.

A related symptom worth recognising: a blank member appearing in a slicer or a matrix row. That's the engine telling you the fact table contains keys the dimension doesn't define - orphaned events, a vocabulary with missing words. The model papers over it with a virtual blank row so the numbers still add up. Don't ignore it; it's referential integrity failing politely.

Bidirectional relationships: the setting that fixes one visual and breaks the model

The relationship dialog offers a cross-filter direction, and "Both" is sitting right there. The temptation is honest enough: you've got a products slicer, and you want it to show only products the filtered customers actually bought. Flip the direction to Both, the slicer tidies itself up, ship it.

Here's what you actually changed. Filters now flow up from the fact into the dimension, which means every dimension connected to that fact can now filter every other dimension, through the fact, in ways no one will ever ask for on purpose. With one fact table it's mostly a performance tax and some surprising slicer behaviour. Add a second fact table - and every real model grows a second fact table - and the filter paths multiply until the engine starts refusing to activate relationships because the routes have become ambiguous. You'll meet that as a relationship that mysteriously won't set to active, usually months later, with no obvious connection to the slicer you were tidying up in March.

The boring guidance survives contact with every model I've built: leave relationships single-direction, dimension filters fact. For the one visual that genuinely needs the reverse flow, do it in the measure, scoped to that calculation alone:

dax

Products Bought =
CALCULATE (
    DISTINCTCOUNT ( FactSales[ProductKey] ),
    CROSSFILTER ( FactSales[CustomerKey], DimCustomer[CustomerKey], BOTH )
)

Same effect, blast radius of one measure instead of the whole model. If a measure only returns the right answer with the relationship set to Both, treat that as the model telling you something is mis-shaped - usually a missing dimension or a bridge table trying to be born.

Dimension habits that keep DAX predictable

Four rules, all cheap at design time and expensive to retrofit.

One row per entity, enforced, not assumed. Deduplicate in Power Query as the last step before load, so a dirty source can't quietly break the contract. The day the key column gains a duplicate, you want a refresh error, not a many-to-many suggestion.

Hide the keys, expose the attributes. Key columns exist for relationships, not for reports. Hide them on both sides so nobody builds a visual on CustomerKey and asks why it's a number. The dimension's visible columns are the user's mental model of the data - curate them like it.

Dimensions go wide, facts go long. A new attribute about a customer is a new column on DimCustomer - never on the fact. Part 1's failure modes all started with descriptions living on events; the dimension is where descriptions belong, where they're stored once and changed once.

Every model gets a date dimension. Even when a "Dates" column on the fact feels sufficient. Time intelligence, fiscal calendars, and consistent filtering across multiple facts all hang off a proper conformed date table, and it's the dimension you'll share across every fact table you ever add.

Frequently Asked Questions

What's the difference between a dimension table and a lookup table?

Mechanically they sit in the same place: one side of a many-to-one relationship. The difference is the contract. A dimension is the complete, deduplicated, authoritative list of an entity, designed to be the vocabulary every report filters with. A lookup is often just a code-to-label mapping. Every dimension can serve as a lookup; very few casual lookups meet the bar of a dimension.

Is one-to-one cardinality ever the right answer?

Almost never in a star schema. If two tables relate one-to-one, they're the same table that got separated, and the usual fix is merging them in Power Query. The legitimate exceptions are mostly about security or column-count hygiene on very wide dimensions, and if you're there, you already know.

When is many-to-many actually legitimate?

When the business relationship is genuinely many-to-many - a sale credited to two reps, a project tagged to three departments - and the honest model for that is a bridge table sitting between the two dimensions, carrying the pairs, with ordinary one-to-many relationships on both sides. What's rarely legitimate is the cardinality setting as a shortcut past duplicate keys. The bridge pattern gets proper treatment when the series covers multiple fact tables.

Does filter direction matter for row-level security?

Yes, and more than people expect. RLS filters propagate through relationships the same way slicer filters do - one side to many side. A security filter on a dimension flows into the fact naturally. If your security model seems to need bidirectional filtering to reach the right tables, treat that as a design smell with sharper teeth than usual: there's a dedicated "apply security filter in both directions" setting, and needing it is usually the model asking to be reshaped.

What's a degenerate dimension?

An attribute that stays on the fact table because it belongs to the event itself and has no entity behind it - invoice numbers are the classic case. There's no DimInvoiceNumber to build, no attributes to hang off it, so it lives on the fact as a column. Fine, normal, not a violation. The test from Part 1 still governs: does this describe the event, or a thing attached to the event?

My visuals work fine on a flat table. Why am I doing any of this?

Because "works on this page" and "behaves predictably as the model grows" are different standards. The flat table works until the first ambiguous total, the first time-intelligence requirement, or the second fact table - and then every measure becomes a workaround for structure the model doesn't have. Part 1 covers those failure modes in full; this post is what the cure looks like on the dimension side.

Where the series goes from here

Part 3 (18 Jun): Slowly Changing Dimensions - The Bit Nobody Explains. The customer moved, the rep changed territory, the product was recategorised. Which reports should show the new truth, which should show what was true at the time, and how to model the difference without breaking everything in this post.
Part 4 (25 Jun): Migrating a Flat Table to a Star Schema in Power BI. The hands-on payoff: one flat table in, a fact and three dimensions out, relationships built, and the measures validated against the old numbers.

Part 3 lands 18 June.

Tags // Data Modeling Power BI DAX