Table.Group Power Query: The Aggregation That Returns a Tabl

Q: Why is Table.Group slow on a large dataset?

The default is GroupKind.Global, which scans the whole table to identify groups. If your rows are already sorted by the grouping key, pass GroupKind.Local as the fourth argument. It only groups consecutive runs and is dramatically faster. The trade-off is correctness: Local returns wrong results if the data is not pre-sorted.

You hit the wall at Group By

You have a sales table with one row per order. You want the most recent order per customer, with the full row intact - order ID, date, amount, product, whatever else is sitting there. You open the Group By dialog, pick CustomerID, set the aggregation to Max on OrderDate, and you get exactly one thing back: CustomerID and a date.

The rest of the row is gone.

So you cobble something together. Sort by date descending, remove duplicates on CustomerID. It works, but it relies on the sort being stable through a remove-duplicates operation, which is an assumption worth questioning on a large dataset. Or you merge the table with itself and filter to matching max dates, which handles ties poorly. Or you give up and do it in DAX.

The thing you actually want, which is "keep the whole row where the date is latest, per customer," looks like it should be a one-liner. And it is. You just have to stop thinking of Group By as a collapse.

Table.Group doesn't collapse, it nests

Every other aggregation you have used takes a group of rows and returns a single value. Sum, count, average, max - input is many rows, output is one number. Table.Group can do that, but it does not have to. Under the hood, its aggregation operates on the full sub-table of each group, and it will return whatever you tell it to return, including another table.

That is the reframe. Group By in the UI looks like a list of collapsing operations, but the underlying function is asking you a different question: "For each group, here is a sub-table. What do you want to do with it?"

If you say Sum, it runs List.Sum on one column. If you say "All Rows" (which the UI hides under Advanced), it returns the sub-table itself, untouched. Once you have the sub-table, you can do anything to it, including sort by date and take the first row.

That is the whole trick.

Keeping the latest row per customer

Here is the M for it:

let Source = Sales, Grouped = Table.Group( Source, {"CustomerID"}, {{"Latest", each Table.First(Table.Sort(_, {{"OrderDate", Order.Descending}}))}} ), Expanded = Table.ExpandRecordColumn( Grouped, "Latest", {"OrderID", "OrderDate", "Amount", "Product"} ) in Expanded

What is happening at the interesting parts:

each ... is the aggregation function. The _ inside refers to the sub-table for the current group, every sales row for that customer.
Table.Sort(_, ...) sorts that sub-table by OrderDate descending.
Table.First(...) grabs the top row, which is now the most recent order. That returns a record.
Table.ExpandRecordColumn flattens the nested record back out into normal columns.

You can do the first part through the UI: Group By > Advanced > group on CustomerID, set the operation to "All Rows", give the new column a name. You get back a column where every cell is a mini-table. Then you replace the default each _ in the formula bar with the sort-and-take-first, and expand.

Same idea. Much less fragile than sort-then-remove-duplicates, because the sort is scoped to each group and explicitly tied to the aggregation.

Where it gets weird

GroupKind matters on big data

Table.Group has an optional fourth argument. GroupKind.Global (the default) scans the whole table to find groups. GroupKind.Local assumes rows are already sorted by the grouping key and only groups consecutive runs. Local is dramatically faster, but wrong if the data is not pre-sorted. If you are grouping a huge sorted table and the query feels slow, this is the first place to look.

Records vs tables when expanding

In the example above, Table.First returns a record, so you expand with Table.ExpandRecordColumn. If your aggregation returns a full sub-table (for "top 3 per customer", say), you expand with Table.ExpandTableColumn instead. Getting this wrong produces a baffling "we cannot convert a value of type Record to type Table" error.

Top N per group is the same pattern

Swap Table.First for Table.FirstN(_, 3) and you keep the three most recent orders per customer. No window function required.

Query folding

This pattern does not fold to SQL in most sources. The sub-table-then-sort logic is too Power-Query-specific. If you are pulling from a database that can do this natively, do it in SQL where it belongs and use Table.Group on the shaped result.

What changes when it clicks

Once you see Table.Group as "nest by key, then apply any function to the sub-table," the rest of the weird patterns in Power Query stop looking weird.

Top N per group. First or last row per category. Running totals within partitions, by combining Table.Group with List.Accumulate on each sub-table. Calculating a group-level metric and joining it back without an explicit merge. These all used to require helper columns or self-joins or a trip through DAX. Now they are a grouping followed by the right one-liner on _.

The broader shift is about how to read Power Query operations in general. Most of the UI sits on top of primitives that are more flexible than the dialog lets on. Group By is the most important of those. When you see a default aggregation in the UI, remember that the function underneath is a full table-to-anything operation, and that you can always drop into the formula bar and give it something more interesting to do.

Frequently Asked Questions

How do I keep the most recent row per category in Power Query?

Use Table.Group with an "All Rows" aggregation, then pick the top row after sorting each sub-table by date. The M pattern is Table.Group(Source, {"CategoryID"}, {{"Latest", each Table.First(Table.Sort(_, {{"Date", Order.Descending}}))}}), followed by expanding the resulting record column with Table.ExpandRecordColumn.

What does "All Rows" do in Power Query Group By?

"All Rows" is an aggregation option hidden under Advanced in the Group By dialog. Instead of collapsing each group to a single value, it returns the full sub-table of rows for that group as a nested column. From there you can apply any table operation - sort, filter, take top N, calculate a group-specific metric - before expanding back out.

What is the difference between Table.Group and the Group By UI?

Group By in the UI is a wrapper around Table.Group. The UI only surfaces a handful of aggregations (Sum, Count, Max, and a few others) plus a single "All Rows" option under Advanced. Dropping into the formula bar gives you the full function, where the aggregation can be any each expression that operates on the sub-table of each group.

Why is Table.Group slow on a large dataset?

The default is GroupKind.Global, which scans the whole table to identify groups. If your rows are already sorted by the grouping key, pass GroupKind.Local as the fourth argument. It only groups consecutive runs and is dramatically faster. The trade-off is correctness: Local returns wrong results if the data is not pre-sorted.

Can I get top N rows per group in Power Query without writing M?

Not through the UI alone. Use Group By > Advanced > "All Rows" to get the nested sub-tables, then edit the aggregation in the formula bar to each Table.FirstN(Table.Sort(_, {{"SortCol", Order.Descending}}), N) and expand the result with Table.ExpandTableColumn. No helper columns or window functions required.

Does Table.Group fold to SQL?

Simple aggregations (Sum, Count, Max grouped by a key) generally fold. The pattern in this post - "All Rows" followed by sort-and-take-first on each sub-table - does not fold in most data sources, because the sub-table-then-operate logic has no direct SQL equivalent. If you are pulling from a database that supports window functions, do this kind of transformation in SQL and bring the shaped result into Power Query.