Fact Table: Star Schema Core Concept Explained

What this post covers: the question that every star schema tutorial skips, which is what problem a fact table actually solves that a flat table doesn't. The mechanical answer (one row per event, foreign keys to dimensions, additive measures) makes no sense until you've felt the failure modes of the alternative. This post stays at the intuition level. The vocabulary follows in Part 2.

The diagram is the wrong place to start

Most introductions to star schema start with the picture. The star, the centre, the spokes. Five tables arranged around one, lines between them. It looks tidy. Then the tutorial points at the middle table, says "this is the fact table," and starts listing properties: contains the measures, grain is one row per event, foreign keys point outward, and so on.

This is the equivalent of being shown a circuit diagram and told "this is a transistor." It is true, and it does not help. The properties only make sense once you understand what the thing is for. And what a fact table is for is not visible in the diagram. It's visible in the problems that arise when you don't have one.

So this post starts with the question that every other star schema tutorial skips. Why is a flat table not good enough? What does separating data into a fact table and several dimension tables actually buy you? The answer is more interesting than "normalisation" or "performance," both of which are true and neither of which is the real reason.

What a flat table is, and where it stops being enough

A flat table is the shape you'd produce if a colleague asked for "all the sales data in one place." One row per sale, every attribute the business cares about as a column on that row. Customer name, customer city, product category, product price, sales rep name, sales rep manager, region, channel, date, year, quarter. Twenty columns. Five thousand rows. Comprehensive. Readable. You can open it in Excel and immediately see what's there.

The flat table is also where most analytics work begins, and for a long time, it's enough. You can group by region and sum the revenue. You can filter to a date range and average the order value. You can pivot it. You can put it in a Power BI report and the visuals work. The flat table answers questions.

The flat table also has three failure modes that become impossible to ignore once you cross a certain size or a certain rate of change. Understanding those failure modes is the entire argument for the star schema.

Failure mode one: the same attribute changes in different places

Suppose a customer's city changes. They moved from Sydney to Melbourne. In the flat table, the city is on every row. To update it, you have to find every row for that customer and rewrite the city column.

This is not just inconvenient. It is the source of an entire category of bug. If the update misses any rows (because the join went stale, or the export ran before the update, or because the customer is in your CRM under two slightly different spellings of their name), some of the rows say Sydney and some say Melbourne. The data is no longer internally consistent. A report that filters by city will return wrong numbers, and the wrong numbers will look right, because nothing about them is missing or null. They are simply wrong by a degree that depends on which rows happened to get updated and which didn't.

The flat table makes attribute updates dangerous, in proportion to how many rows the attribute appears on. The danger compounds with table size.

Failure mode two: the data tells the truth about today, not about then

A customer's city did change. Should the report show their old sales as belonging to Sydney (where they were when the sale happened) or Melbourne (where they are now)?

In a flat table, you don't get to choose. The city column holds whatever value is in it right now. If you update it in place, all historical sales now appear to have come from Melbourne, even the ones from three years ago when the customer was in Sydney. If you don't update it, the column is wrong for any new sales after the move.

There is no correct answer in the flat structure. The shape of the table forces a single point-in-time view of attributes, and analytical questions almost always want the historical view: which city were they at the time of the sale, not which city are they today.

This is the problem that Slowly Changing Dimensions exists to solve, and Slowly Changing Dimensions only makes sense as a concept once you've split your data into a fact table and a dimension table that can carry historical versions of an attribute independently of the events that reference them. Part 3 covers the SCD types in detail. They are unintelligible until you've felt this failure mode.

Failure mode three: the same attribute lives in twelve places at twelve costs

A customer's city is a single fact about that customer. In a flat sales table with five thousand rows for that customer, the city is stored five thousand times. The storage cost is repetition. The performance cost is that every filter on city has to scan a column that contains thousands of redundant copies of the same value.

This is the argument tutorials usually lead with, and it is the least interesting of the three. Modern columnar engines like the Power BI Vertipaq engine compress repeated values very efficiently. The storage and performance argument, while real, is not what justifies the star schema for most projects. The first two arguments are.

The reason redundancy still matters is editorial. A flat table tells you the same fact dozens of times, and any user reading it has to figure out which dimensions are stable facts about an entity versus which dimensions are properties of the event. A star schema makes that distinction structural. The dimension table says "this is the truth about the customer." The fact table says "this is the truth about what happened." A user who reads the schema can see which is which before they read a single row of data.

So what is a fact table

A fact table is the part of the model that records what happened. One row per event. The event is the smallest unit of activity the business needs to analyse. A sale. An invoice line. A page view. A clock-in.

Two things go in a fact table, and only two things.

The first is the measures. The numeric quantities that the business adds up, averages, counts, or computes against. Revenue. Units. Hours. The numbers that show up in the report's summary cards.

The second is the foreign keys. The pointers to the dimensions that describe the event. Which customer was involved. Which product. Which date. Which sales rep. The fact table does not store the customer's name or city. It stores a reference to the customer dimension, and the customer dimension holds the name and city.

That's it. A fact table is not the place where you describe the entities involved in the event. It is the place where you record that the event happened, how much it was worth, and which entities were attached to it. The description of those entities lives elsewhere, by design, because the description of the entity can change without the event changing.

dax

// Conceptually, a fact table row looks like:
// 
// SaleDate    CustomerKey  ProductKey  Quantity  UnitPrice  Revenue
// 2026-05-29  1247         88          3         42.50      127.50
//
// What it deliberately does NOT contain:
//
// CustomerName, CustomerCity, ProductName, ProductCategory, ...

The foreign keys are integer pointers, not the human-readable values. That separation is what gives the model its flexibility. The customer can move, the product can be renamed, the category hierarchy can be restructured, and every historical sale still points at exactly the customer and product it originally pointed at. Their attributes change in the dimension; the reference in the fact does not.

What a fact table isn't

It is not a table that contains everything you might want in a report. The temptation is constant. "I just need the customer's email on this row so the report can show it." Resist. The customer's email belongs in the customer dimension. Putting it on the fact table reintroduces failure mode two: if the customer changes their email tomorrow, every historical sale row now says the new email.

It is not the table that holds dimensional attributes that happen to be numeric. The customer's age is not a measure. It is a property of the customer, which lives in the customer dimension. The number of staff at a company is a property of the company, not a measure of a sale to that company. The test is: does this number describe the event, or describe one of the things attached to the event? If the latter, it goes in a dimension.

It is not the table you should denormalise "for performance." The Vertipaq engine in Power BI is built for star schemas. Denormalising a fact table into a flat table makes Vertipaq slower on most workloads, not faster. The instinct from SQL Server, where denormalisation was sometimes a real performance win, doesn't transfer.

It is not the table that exists in only one form. A model can have many fact tables: sales, returns, inventory snapshots, time entries, and so on. They share dimensions but each one records its own kind of event. Part 4 covers how multiple fact tables coexist in a single model.

The intuition before the vocabulary

The intuition for the whole star schema, in one sentence: separate the things that happened from the things that describe the things that happened, so that each can change independently without lying about the other.

That's the move. Everything else, the dimension tables, the relationship cardinality, the SCD types, the role-playing dimensions, the bridge tables, the snowflakes, is consequence of this one separation. Once the separation feels obvious, the rest of the model is a working-out of the implications.

The mechanical rules ("one row per event," "foreign keys to dimensions," "measures are additive across all dimensions") are not arbitrary. Each one falls out of the requirement to keep the events and the descriptions decoupled. A fact table row records an event. The descriptions are referenced, not embedded. Measures are additive because aggregating events across any combination of dimensions has to produce a meaningful answer.

Frequently Asked Questions

Why is it called a fact table?

The word "fact" here is technical, not colloquial. In dimensional modelling terminology (popularised by Kimball in the 1990s), a "fact" is a measurement of a business event. The fact table is the table that records those measurements. The naming is unfortunate because "fact" in everyday English means "a true statement," which is not what's meant here. If the terminology bothers you, "event table" or "measurement table" both convey the idea more cleanly, but the field uses "fact table" and the terminology is too entrenched to swap out now.

Can a fact table have zero measures?

Yes. These are called factless fact tables, and they record that an event happened without measuring anything about it. A common example is attendance: the row says "this student attended this class on this date" with no numeric quantity attached. Factless fact tables are useful for counting events (how many attendances happened) and for relating dimensions that don't have a natural transactional link.

What's the difference between a fact table and a transaction table?

A transaction table is one type of fact table, the one where each row is a single transactional event (a sale, an invoice line). There are other types: periodic snapshot fact tables record the state of something at regular intervals (inventory levels at end of day), and accumulating snapshot fact tables track an entity through a process (an order moving through fulfilment stages). All are fact tables; transaction grain is the most common variant.

Should a fact table always have a date dimension?

In practice, almost always yes. Events happen at points in time, and time is the most common dimension that analytical questions filter by. The exception is fact tables that record states or facts that don't have a meaningful time component, but those are rare. If you're building a fact table and there's no natural date to attach to each row, that's usually a sign that the table isn't recording an event and may not belong in the model as a fact table at all.

What does "grain" mean and why does it keep coming up?

Grain is the level of detail one row of the fact table represents. "One row per sale," "one row per invoice line," "one row per day per product per store." Stating the grain is the first thing you do when designing a fact table, because every subsequent decision (which dimensions attach, which measures make sense, how the table grows) depends on it. The cardinal rule: don't mix grains in one fact table. If you have monthly summaries and daily transactions, those are two fact tables, not one.

Is a star schema still relevant in 2026, given how powerful modern engines are?

Yes, more than ever. The argument that hardware has gotten fast enough to make modelling unnecessary has been made every five years for two decades. It hasn't held up. The Power BI Vertipaq engine is specifically optimised for star schemas; performance on flat tables is worse, not better, on most non-trivial models. And the editorial argument (a model that documents itself by its structure) has nothing to do with hardware. Star schemas are the standard because they encode a way of thinking about the data, and the way of thinking is still correct.

Where the series goes from here

Part 2 (11 Jun): Dimension Tables and the Lookup Side of the Star. What goes on the other side of those foreign keys, why a dimension table is structurally different from a lookup table, and the rules of conformed dimensions.
Part 3 (18 Jun): Slowly Changing Dimensions, the Types You'll Actually Use. SCD Types 1, 2, and 6, with the failure modes that justify each one. The post that makes failure mode two from above into a solvable problem.
Part 4 (25 Jun): Multiple Fact Tables and How They Coexist. Why "one fact table per workbook" is wrong, how conformed dimensions tie fact tables together, and the role of bridge tables.

Part 2 lands 11 June.