Power Query is brilliant at one job: reshaping data on the way in.

The take is that "on the way in" is doing a lot of work in that sentence, and most of the M code I see in production is doing things that should have happened somewhere else entirely.

Why everything ends up in Power Query

Because Power Query is the first place you touch the data, and once you're already there, every problem looks like it can be solved with another step. Need a calculated column? Add it. Need to filter to last quarter? Add it. Need a running total? Now you're in Group By, Index Column, custom function territory, but you're committed.

It also feels productive. The applied steps panel grows. The query "does more." You can see the work. DAX is invisible by comparison: one measure that looks the same whether it's doing a sum or a context-transition somersault. Python is a separate window, a separate language, a separate skill. Power Query is right there.

So it gets used for everything.

Where it actually breaks

Three failure modes show up over and over.

Calculations that should be measures

A column added in Power Query is materialised. It sits in the model, takes up memory, and gets computed every refresh whether anyone looks at it or not. A measure runs at query time, only when a visual asks for it. If you've added a "Margin %" column to a fact table with thirty million rows, you've added thirty million numbers to the model that DAX could have computed on demand from two existing columns.
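As a sketch of the alternative (table and column names here are invented for illustration), the measure version of that thirty-million-row column is one line of DAX computed on demand from the two existing columns:

```dax
-- Illustrative measure; Sales[Revenue] and Sales[Cost] are assumed column names.
-- DIVIDE handles the divide-by-zero case without an explicit IF.
Margin % =
DIVIDE (
    SUM ( Sales[Revenue] ) - SUM ( Sales[Cost] ),
    SUM ( Sales[Revenue] )
)
```

Nothing is stored; the engine evaluates it only for the filter context a visual actually requests.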

Transformations that should be SQL

Power Query will happily join two tables of a million rows each. It will not happily fold that join back to the source, especially if you've added a custom step earlier in the chain. The query runs locally, in memory, on your machine, every refresh. If your source is a database that could have done the join in two seconds with an index, you've just moved a database job into a tool that isn't a database.

Logic that belongs in Python or a dataflow

Cleaning a single CSV before it lands in the model is fine. Implementing a fuzzy matching algorithm across three sources with a custom function and List.Accumulate is not. The M is unreadable, untestable, and the next person who opens the file has no chance of safely changing anything.
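For comparison, here is roughly what the core of a fuzzy match looks like in Python using nothing but the standard library. The reference list, input strings, and 0.6 cutoff are illustrative values, not anything from a real model:

```python
# Fuzzy matching with difflib instead of a hand-rolled M function.
# The reference names and the 0.6 similarity cutoff are illustrative.
from difflib import get_close_matches

reference = ["Acme Corporation", "Globex Inc", "Initech LLC"]

def best_match(name, candidates, cutoff=0.6):
    # get_close_matches ranks candidates by similarity ratio and
    # drops anything below the cutoff.
    matches = get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(best_match("Acme Corp", reference))   # Acme Corporation
print(best_match("Umbrella", reference))    # None
```

The point isn't that this is sophisticated; it's that it is readable, testable in isolation, and the next person can change the cutoff without reverse-engineering a List.Accumulate chain.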

The common thread: M code is hard to hand over. There's no debugger. There's no real test framework. The error messages refer to "the previous step" without telling you which step. A query someone wrote eighteen months ago with twelve nested Group By operations is functionally write-only software.

Pick the tool by where the work actually belongs

Reshaping the source is Power Query's job: column types, unpivoting, splitting, light cleaning. Do it there.
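That kind of work reads naturally in M. A minimal sketch, with the file path and column names invented:

```m
let
    Source    = Csv.Document(File.Contents("C:\data\sales.csv")),
    Promoted  = Table.PromoteHeaders(Source),
    // Wide layout arrives with one column per region; unpivot into tidy rows
    Unpivoted = Table.UnpivotOtherColumns(Promoted, {"Date"}, "Region", "Amount"),
    Typed     = Table.TransformColumnTypes(Unpivoted,
                    {{"Date", type date}, {"Amount", type number}})
in
    Typed
```

Four steps, each one obvious from its name. This is the shape Power Query queries should stay in.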

Calculations that depend on filter context, slicers, or user interaction are DAX. Measures, not columns, unless there's a specific reason.

Heavy joins, aggregations, or anything the source database could do faster: push it back to SQL. A view is more maintainable than a Power Query chain ten people will be afraid to touch.

Anything algorithmic, iterative, or needing proper data structures goes in Python, either a notebook or a dataflow Gen2. M is not the language for any of that.
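On the SQL side, the "view instead of a query chain" idea can be as small as this sketch (table and column names invented for illustration):

```sql
-- The database maintains the join and aggregation;
-- Power Query only has to read the result.
CREATE VIEW dbo.vw_SalesByRegion AS
SELECT
    s.OrderDate,
    c.Region,
    SUM(s.Amount) AS TotalAmount
FROM dbo.Sales AS s
JOIN dbo.Customers AS c
    ON c.CustomerID = s.CustomerID
GROUP BY s.OrderDate, c.Region;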

The test isn't "can Power Query do this?" Power Query can do almost anything. The real test is whether the next person to open the file can read what was written and change it without breaking it. Power Query is a transformation tool. Treat it like one, and stop asking it to be a database, a calculation engine, and a programming language at the same time.

Frequently Asked Questions

Should I do calculations in Power Query or DAX?

Default to DAX measures. A measure runs at query time and adds nothing to the model size. A calculated column in Power Query is materialised and sits in memory whether anyone uses it or not. Use Power Query for calculations only when the value needs to be sliceable as a dimension or fed into another transformation step.

Why is my Power Query refresh so slow?

Most often it's because the query isn't folding. Power Query tries to push transformations back to the source as native SQL, but custom steps, M-only functions, and certain operations break that fold. Once it breaks, the rest of the query runs locally and pulls all the data into memory. Right-click a step and choose "View Native Query" to see whether it's still folding.
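As an illustrative M sketch of where a fold typically breaks (server, database, and step names invented):

```m
let
    Source   = Sql.Database("server", "warehouse"),
    Sales    = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    Filtered = Table.SelectRows(Sales, each [Amount] > 100),  // folds: becomes a WHERE clause
    Indexed  = Table.AddIndexColumn(Filtered, "RowId", 1, 1)  // no SQL equivalent: folding stops here
in
    Indexed
```

Everything downstream of the index step runs locally, no matter how foldable it would have been on its own.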

When should I use SQL instead of Power Query?

When the source supports it. If the data lives in a database, push joins, aggregations, and heavy filtering into a view or a stored query. The database is built for that work and will do it faster than Power Query running on your local machine. Power Query then becomes a thin shaping layer over clean source data.

Is Power Query M code production-grade?

For straightforward shaping, yes. For complex business logic, no. M has no debugger, no real test framework, and limited tooling for handover. If a query has more than fifteen steps, custom functions, or List.Accumulate logic, the maintainability cost has already exceeded the convenience.

Can I use Python with Power Query?

Yes. Power Query has a "Run Python script" transformation, and Power BI dataflows Gen2 support Python scripts as well. For algorithmic work or anything needing proper libraries (regex beyond the basics, fuzzy matching, ML scoring), this is usually a better choice than fighting M to do it.