You Don't Need SQL Access. You Need a Better Mental Model.
Someone in your organisation has a spreadsheet problem and a politically fraught solution: file an IT ticket requesting read access to the database server.
The ticket takes three weeks. The DBA wants to know the business justification. Your manager needs to approve it. IT wants to know what tool you'll be using, and when you say "Excel," there's a pause on the call. Two more weeks pass. You get access to a view that shows you twelve of the forty columns you actually need, with the rest requiring a separate request for "data governance reasons."
By the time it's sorted, the deadline has passed, someone ran the report manually from a CSV export, and the whole exercise felt like a punishment for having the audacity to want to do your job properly.
This is not a made-up scenario. It happens constantly, and it has a cause that nobody talks about: the person filing the ticket doesn't yet know what they're actually asking for, which means they're asking for the wrong thing.
The row limit everyone quotes and what it actually means
Excel's worksheet row limit is 1,048,576 rows. This number gets cited so often that it's become a kind of technical folklore - the wall you'll eventually hit, the reason you "need a database."
What gets left out: Excel 2003 had a row limit too - 65,536 - and the jump to 1,048,576 came with Excel 2007. The more interesting architectural change came with Power Query, released as a downloadable add-in in 2013 and built into Excel from 2016 onward under the Get & Transform label.
Power Query doesn't store data in a worksheet. It stores a transformation recipe. The query runs against the source, applies your filtering and aggregation steps, and loads only the result into Excel. If you want total sales by region for the last ninety days out of a 20-million-row database, Power Query handles that outside the worksheet entirely, and what lands in your sheet is a table with one row per region.
The 1M row limit is irrelevant to that workflow. You never needed to import 20 million rows. You needed to ask a better question before the data came in.
Why the SQL access request actually kills momentum
When you file a ticket for database read access, you're implicitly signing up for a much larger scope than most people realise.
Direct database access means someone has to trust that you'll write queries that don't hammer the server at 9am on month-end. It means the DBA has to grant credentials that exist indefinitely until someone remembers to revoke them. It means your connection is subject to network policy, authentication infrastructure, timeout settings, and connection pooling rules that weren't designed for ad hoc Excel reports. It means you're now in scope for audit logging, access reviews, and potentially data classification procedures depending on what tables you're touching.
That's not bureaucracy for sport. That's what responsible data governance looks like at an organisation large enough to need it. The IT team isn't being obstructionist - they're being careful about something that genuinely requires care.
The problem is that most reporting needs don't actually require this. They require access to a specific result set on a recurring schedule, which is a much narrower ask with a much faster approval path.
If you can define your output before you file the ticket - here's the table shape I need, here's the filter logic, here's the refresh frequency - you're asking for a view or a report endpoint, not database credentials. That's a very different conversation with the data team, and it usually moves faster.
Filtering and aggregating 10 million rows without leaving Excel
Power Query can connect to most data sources directly: SQL Server, Azure SQL, SharePoint Lists, Dataverse, CSV files, APIs, and others. When it connects to a relational source, it doesn't pull everything and then filter - it sends the filtering instructions to the source as part of the initial query.
This is called query folding. When you apply a filter step in Power Query, the engine checks whether the data source can handle that operation natively. If it can - and for SQL databases, it usually can - Power Query translates your steps into SQL and runs them on the server. Only the result comes back across the network.
The practical upshot: a filter to the last 90 days, a group-by on region, and a sum of the amount column in Power Query over a 10-million-row SQL table will generate something like:
SELECT region, SUM(amount) AS total_amount
FROM sales
WHERE sale_date >= DATEADD(day, -90, GETDATE())
GROUP BY region
That query runs on the server in seconds. What arrives in your workbook is a small, aggregated table. You did not import 10 million rows. You did not wait for a 400MB file to parse. You wrote four Power Query steps and got the answer.
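For the curious, here's roughly what those four steps look like in M, the code Power Query writes for you as you click. The server, database, and column names are placeholders - a minimal sketch, not a drop-in query:

let
    // Placeholder connection - swap in your real server and database
    Source = Sql.Database("sql-prod-01", "SalesDb"),
    Sales = Source{[Schema = "dbo", Item = "sales"]}[Data],

    // Folds to the WHERE clause: the cutoff is computed locally, the filter runs on the server
    Cutoff = Date.AddDays(DateTime.LocalNow(), -90),
    Recent = Table.SelectRows(Sales, each [sale_date] >= Cutoff),

    // Folds to GROUP BY region with SUM(amount)
    ByRegion = Table.Group(Recent, {"region"}, {{"total_amount", each List.Sum([amount]), type number}})
in
    ByRegion

Each variable is one applied step in the editor; the whole chain collapses into the single SQL statement above.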
Does query folding mean I'm already writing SQL?
Not exactly - but you're thinking in the same terms, which is the part that matters.
SQL is a language for expressing what you want from a relational dataset: which table, which columns, which rows (WHERE), how to group them (GROUP BY), how to order them (ORDER BY). Power Query M, the language under the hood, is a functional transformation language that describes operations on data. They solve the same problems from different angles.
The reason query folding is worth understanding isn't technical. It's practical. Power Query steps that fold run on the server - fast, low bandwidth, efficient. Steps that don't fold pull data into Excel's memory first and process it there. If you add a step that breaks the fold chain - a custom column with a complex M expression, or a function that the source connector can't translate - everything from that point forward runs locally, against all the rows you've now pulled into memory.
You don't need to know exactly which steps fold and which don't. But you should know to: filter early (step 2 or 3, not step 12), aggregate before you merge, and be suspicious of any step that feels like it couldn't be described as a SQL clause. To verify, right-click a step in the Power Query editor: if View Native Query is available, that step is still folding. Query Diagnostics can give a fuller picture.
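Here's what a fold break looks like in practice - a sketch with the same placeholder names as above. The filter folds; the custom column probably doesn't:

let
    Source = Sql.Database("sql-prod-01", "SalesDb"),
    Sales = Source{[Schema = "dbo", Item = "sales"]}[Data],

    // Folds: the server discards the old rows before anything crosses the network
    Recent = Table.SelectRows(Sales, each [sale_date] >= #datetime(2024, 1, 1, 0, 0, 0)),

    // Likely breaks folding: most SQL connectors have no translation for Text.Proper,
    // so from this step on, Power Query processes the remaining rows in local memory
    Labelled = Table.AddColumn(Recent, "region_label", each Text.Proper([region]), type text)
in
    Labelled

Order is what saves you here: because the filter comes before the break, only the filtered rows get pulled locally. Reverse the two steps and you'd be downloading the whole table.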
How the technical limits actually compare
Here's the honest comparison, without the mythology:
Worksheet row limit: 1,048,576 rows. This is a real constraint on what sits in a grid. It's not a constraint on what Power Query can process or what Power Pivot can model. If you're hitting this limit, you're probably storing data in cells that should be stored in a model.
Power Query: No hard row limit on processing. The constraint is your machine's RAM and the performance characteristics of the source. A well-folded query against a SQL source can effectively handle any dataset the server can query. An unfolded query over a flat file is constrained by how much data you can hold in memory.
Power Pivot (the Data Model): Stores data in a compressed columnar format in memory. Realistically handles hundreds of millions of rows on a modern machine, with some caveats around cardinality and the number of tables. The xVelocity engine it uses is the same one under the hood of Analysis Services. It is not a toy.
SQL Server / Azure SQL: Handles billions of rows. Has proper indexing, partitioning, and query optimisation. Has a DBA who monitors it. Is the right tool when the data itself needs to be shared, version-controlled, or accessed by multiple systems simultaneously.
The decision point isn't about row counts. It's about who needs to use the data, how fresh it needs to be, and whether you're solving a personal reporting problem or a team infrastructure problem.
Frequently asked questions
Can Power Query refresh automatically without me opening the file?
Yes, via Power Automate or a scheduled task - but both require some setup. If automatic refresh is the core requirement, a Power BI dataset with a scheduled refresh is cleaner architecture than a scheduled Excel file. That said, for a single analyst maintaining their own reports, opening the file and clicking Refresh All takes about eight seconds.
What if I need data from two different systems?
This is exactly what Power Pivot is for. Connect to both sources in Power Query, load the result sets into the data model, and define the relationship between them. You're doing a relational join, but you defined it once and it runs every time you refresh. No database access required. No DBA involvement.
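The relationship itself is defined in the Power Pivot window rather than in code. But if a single flat table is all you need, Power Query can also do the join directly with a merge - a sketch with hypothetical sources, one SQL table and one CSV extract, all names and paths invented:

let
    // Two sources from different systems
    Orders = Sql.Database("sql-prod-01", "SalesDb"){[Schema = "dbo", Item = "orders"]}[Data],
    CustomersRaw = Csv.Document(File.Contents("C:\data\customers.csv"), [Delimiter = ",", Encoding = 65001]),
    Customers = Table.PromoteHeaders(CustomersRaw, [PromoteAllScalars = true]),

    // A relational left join on customer_id - note that a cross-source merge
    // can't fold, so it runs locally; aggregate each side first if they're large
    Joined = Table.NestedJoin(Orders, {"customer_id"}, Customers, {"customer_id"}, "customer", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Joined, "customer", {"customer_name", "segment"})
in
    Expanded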
At what point should I actually ask for database access?
When the data you need isn't surfaced by any existing export, API, or connector, and a view or endpoint would require more DBA time to set up than direct access. Or when you're building something multiple people will rely on, in which case Power BI with a proper semantic model over the database is the right answer - and that does require a service account with database access, but it's a one-time infrastructure request rather than a personal credential.
Is Power Query available in all versions of Excel?
It's built into Excel 2016 and every version since, and available as a downloadable add-in for Excel 2010 and 2013. If your organisation is on something older than that, the row limit is genuinely the least of your problems.
The actual thing to build before you file the ticket
Here's the sequence that sidesteps most of the access frustration:
- Get any sample of the data you need - a CSV export, a shared report, a manual extract someone runs monthly. This is usually achievable without any special access.
- Build the model in Power Pivot. Define your tables, your relationships, your measures. Use the data you have - and structure the query so the sample source can be swapped for the real one later (see the sketch after this list). What you're building is the shape of the solution - the actual query logic, the relationships, the calculations. This takes a day or two and produces something you can show.
- Show it to the data team - not as a request, but as a specification. Here's what I've built. Here's the table shape I need. Here's the filter logic. Here's how often it refreshes. Can you give me a connection to that, or set up a view?
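One pattern that makes this sequence work: isolate the source step so the sample can be swapped for the real connection without touching anything downstream. A minimal sketch, placeholder paths and names throughout:

let
    // Today: the CSV sample someone exported for you
    Raw = Csv.Document(File.Contents("C:\data\sales_sample.csv"), [Delimiter = ",", Encoding = 65001]),
    Source = Table.PromoteHeaders(Raw, [PromoteAllScalars = true]),

    // Later: when the data team grants a view, these two steps become one, e.g.
    // Source = Sql.Database("sql-prod-01", "SalesDb"){[Schema = "rpt", Item = "v_sales"]}[Data],
    // and everything downstream - including measures built on this table - keeps working

    // The shared shape both sources must satisfy
    Typed = Table.TransformColumnTypes(Source, {{"sale_date", type date}, {"amount", type number}, {"region", type text}})
in
    Typed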
That conversation is completely different from "I need read access to the database." You've done the hard thinking. You're not asking for credentials - you're asking for a data source that matches a design you've already validated. That request has a clear scope, a defined output, and takes minutes to grant instead of weeks.
The row limit was never the constraint. The missing mental model was. Once you have the model, the database access conversation almost takes care of itself.