Career Dish · Real jobs, real talk

What Data Analysis Is Actually Like

~21 min read · 3 voices

We talked to three data analysts. One at a national retailer. One at a startup where she is the data team. One at a healthcare company where a wrong number could end up in a clinical decision. Same title. Very different realities.

These characters are composites, built from dozens of real accounts, interviews, and community threads. The people aren't real. The experiences are.

What It's Like Being a Data Analyst at a Large Retailer

Marcus

35 · Senior Data Analyst at a national home goods retailer in Minneapolis · 5 years in · Came from an econ master's program
Maintains 43 dashboards in Looker. Knows for a fact that 7 of them have not been opened by anyone in over 90 days. Nobody has asked him to retire them.

Tell us about a specific day that stuck with you.

OK so, this was a Thursday in November. Right before Black Friday planning kicks in, so everyone in marketing is already tense. I get a Slack message at like 8:15 from a director in the e-commerce marketing team, a woman named Gina. The message says, "Hey can you pull me conversion rates by channel for the last 90 days? Need it for a deck by 2." Which, on the surface, sounds simple. Conversion rates by channel. That's one query, right?

No. Because "conversion" at our company means three different things depending on who you ask. Gina's team measures conversion as orders divided by sessions. The product team measures it as orders divided by unique visitors. The finance team measures it as revenue per session, which isn't even a conversion rate but they call it one. So step one is I have to ask Gina which definition she wants. She says "the normal one." I say which normal one. She says "the one we used in the Q2 review." I go find the Q2 review deck. It used sessions-based conversion. Fine.

Step two, "by channel." Our attribution model was rebuilt in August. Before August, paid social and organic social were lumped together. After August, they're split. So if I pull 90 days of data, the first 30 days have a different channel taxonomy than the last 60 days. If I just run the query, paid social will look like it didn't exist before August and organic social will look like it collapsed. Which would be wrong but it would look very dramatic in a slide and Gina might not notice until someone at the VP level asks why organic social dropped 60% in August.

So I have to map the old taxonomy to the new one for the first 30 days. That's not a query, that's a whole thing. I wrote a CTE that remaps the old channel labels, ran it against our Snowflake warehouse, QA'd it by spot-checking against the raw GA4 events, and put it in a temp table. That took about an hour and a half.
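(A rough sketch of that remap in Snowflake-style SQL. The table, channel labels, and cutover date are all invented; the point is the shape of the fix, not the exact schema.)

    -- Map the pre-cutover taxonomy onto the new one so all 90 days line up.
    WITH remapped AS (
      SELECT
        session_date,
        CASE
          WHEN session_date < '2024-08-01'::DATE  -- hypothetical cutover date
               AND channel = 'social'
            THEN IFF(utm_medium = 'cpc', 'paid social', 'organic social')
          ELSE channel
        END AS channel_mapped,
        sessions,
        orders
      FROM analytics.channel_daily
      WHERE session_date >= DATEADD(day, -90, CURRENT_DATE)
    )
    SELECT
      channel_mapped AS channel,
      SUM(orders) / SUM(sessions) AS conversion_rate  -- sessions-based, per the Q2 deck
    FROM remapped
    GROUP BY channel_mapped
    ORDER BY conversion_rate DESC;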

"Can you pull me conversion rates" turned into an hour and a half of work.

Yeah, and that's, like, that's the whole job in one sentence. People think data analysis is asking a question and getting an answer. But the question is never clean and the data is never clean and the space between the ask and the answer is where I live. I spend maybe 20% of my time doing anything you'd call "analysis." The rest is cleaning, mapping, QA'ing, and explaining why the numbers are what they are.

Anyway, I sent Gina the data at 1:30. A clean spreadsheet, conversion by channel, 90 days, properly mapped. She said "perfect, thanks." That was it. No follow-up questions. I have no idea if the number changed a decision. I don't know if the deck went well. I built a pipeline and QA'd it for three hours and the output was a table that someone glanced at and dropped into a slide. That is a very normal day for me.

I spend maybe 20% of my time doing anything you'd call analysis. The rest is cleaning, mapping, QA'ing, and explaining why the numbers are what they are.
— Marcus

You mentioned 43 dashboards. Tell us about that.

So this is, I think, the thing that would surprise people most about data analysis at a big company. I maintain 43 dashboards in Looker. Some of them I built from scratch. Some I inherited from the analyst before me, a guy named Patrick who left for Spotify. A few of them I've never actually looked at in detail, I just make sure the data pipelines feeding them don't break.

I checked the usage stats a few months ago out of curiosity. Seven of those dashboards had zero views in the last 90 days. Zero. Another twelve had fewer than five views each, and some of those views were just me checking to make sure they still worked. So roughly half my dashboards are either unused or barely used.

I brought this up with my manager, Yvonne. I said, can we retire the ones nobody's looking at? She said she understood, but some of those dashboards were built for specific VPs and even if the VP hasn't looked at it in three months, sunsetting it without their sign-off would be politically risky. One of them was built for the CMO's direct reports. Nobody opens it. But if someone asks for it and it's gone, that's a problem. So I maintain it. I update the data connections when the schema changes. I fix it when a join breaks. For a dashboard that nobody uses. Because someone important might someday look at it and it needs to be right when they do.

That sounds like it would drive you crazy.

It does and it doesn't. The part that's frustrating is the waste. The part that isn't frustrating is, I've gotten very fast at maintenance. Patrick, the guy before me, he didn't document anything. When I took over his dashboards, half of them had raw SQL with no comments and variable names like "tmp2" and "final_v3_REAL." I spent my first three months rewriting everything with proper CTEs and documentation. Now when something breaks, I can usually fix it in twenty minutes instead of two hours. So the 43 dashboards cost me maybe four hours a week in maintenance. It's not nothing but it's manageable. It's just not what I thought I'd be doing with a master's in economics.

The part nobody talks about

What's yours?

How often you know the answer before you run the query, and nobody believes you until you show them the data. Gina asked me about conversion by channel. I already knew paid search was going to be up and email was going to be down because I look at this data every week. I could have told her in fifteen seconds. But she didn't want my opinion. She wanted a spreadsheet. Because a spreadsheet is evidence and my opinion is just a guy in analytics saying words.

That's the weird thing about this job. You develop intuition over time. You see the patterns. You start to just know things about the business that other people don't. But that knowledge has no authority unless it's attached to a chart. I am a person who understands this business deeply and also a person whose understanding doesn't count until it's in Looker. Those are the same person and it's a strange way to exist.


What It's Like Being the Only Analyst at a Startup

Tessa

27 · Data Analyst (and data engineer, and BI developer, and sometimes DBA) at a Series B proptech startup in Denver · 2 years in · Was a math major who "fell into" data
Her official title is "Data Analyst." Her actual job is everything that touches a database. Built the company's entire analytics stack on a Saturday because the CEO wanted metrics for a Monday board meeting.

You said you're the entire data team. What does that actually mean?

It means if a number exists at this company, I'm the reason it exists. Our product helps property managers track maintenance requests and tenant communication. We have about 900 customers. And when anyone at this company says "the data says" they mean "Tessa wrote a query."

I maintain the Postgres database. I built the ETL pipeline that moves data from our production DB into our analytics warehouse, which is BigQuery. I built every dashboard in Metabase. I write the SQL for every investor report. When our CEO, James, has a board meeting, I'm the one who generates the slide that shows MRR growth, churn, net revenue retention, and expansion revenue. When the product team wants to know which features are sticky, I write the query. When the sales team wants to know the average deal cycle for property managers with over 200 units, I write the query. When marketing wants attribution data, I write the query. I am the query.

Give us a day.

Yeah OK so last Tuesday. I come in and there are four Slack messages waiting for me. First one is from James. He's having coffee with an investor from Insight Partners tomorrow morning and he wants an updated retention cohort chart. Not the one I built last quarter. A new one. With monthly cohorts going back eighteen months. And he wants it to show net revenue retention, not logo retention, because "NRR tells a better story." He's right, it does, but calculating NRR by monthly cohort requires me to track expansion and contraction revenue per customer per month, which I've never built before. So that's, like, a half-day project that he thinks is "can you update the chart."

Second message is from Rachel in customer success. She wants to know the average time from first maintenance request to second maintenance request for our top 50 customers by revenue. She's trying to prove that customers who submit their second request within 48 hours have higher retention. Which, actually, is a really interesting hypothesis. But the query to test it requires me to join our events table with the customer revenue table, filter for top 50 by trailing twelve-month revenue, then calculate the time delta between event one and event two per customer. And our events table has 14 million rows and no index on customer_id because I haven't gotten around to adding one. So the query takes seven minutes to run. And I'll probably need to run it four or five times to get the logic right.

Third message is from Priti on the engineering team saying the nightly ETL job failed. Which means yesterday's data isn't in BigQuery. Which means every dashboard is showing data through Monday, not Tuesday. Nobody has noticed yet but they will when James opens the retention chart and sees the numbers haven't moved.

Fourth message is from the new sales hire, Colton, asking me how to export a list from Metabase. That one I answer in thirty seconds with a screenshot. But it's still a context switch.

So how do you prioritize all of that?

ETL first. Always. If the pipeline is broken, nothing else matters because every number in the company is stale. I check the logs. The failure was a timeout on the maintenance_requests table: the job outgrew the timeout I set when I built the pipeline eight months ago. The table had 6.2 million rows then. It has 11.4 million now. I doubled the timeout, reran the job, and it finished in twenty-two minutes. Data's fresh by 9:30. Nobody noticed the gap.

Then James's cohort chart. This is the thing that eats my day. I spend from about 10 to 2:30 building the NRR cohort query. It's not algorithmically hard, it's just tedious. I have to calculate each customer's revenue in their signup month, then track how it changes month over month, then aggregate by cohort. Our billing data is in Stripe, which I pull into BigQuery via a Fivetran connector that sometimes misclassifies annual payments as twelve monthly payments, which inflates the MRR number. So I have to add a filter that identifies annual contracts and distributes the revenue evenly across months. That one edge case took me forty-five minutes to debug because I kept getting a cohort in August 2024 that showed 140% NRR, which is great but wrong.
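(The shape of that cohort query is something like the sketch below. BigQuery-style SQL, invented table names, and it assumes the annual-contract fix has already been applied upstream so monthly_revenue really is monthly.)

    -- NRR by monthly signup cohort: each cohort's MRR in month N divided by
    -- its MRR in its signup month. Expansion, contraction, and churn net out.
    WITH base AS (
      SELECT
        customer_id,
        revenue_month,
        mrr,
        MIN(revenue_month) OVER (PARTITION BY customer_id) AS cohort_month
      FROM analytics.monthly_revenue
    ),
    cohorts AS (
      SELECT
        cohort_month,
        DATE_DIFF(revenue_month, cohort_month, MONTH) AS months_in,
        SUM(mrr) AS cohort_mrr
      FROM base
      GROUP BY cohort_month, months_in
    )
    SELECT
      c.cohort_month,
      c.months_in,
      ROUND(c.cohort_mrr / m0.cohort_mrr, 3) AS net_revenue_retention
    FROM cohorts c
    JOIN cohorts m0
      ON m0.cohort_month = c.cohort_month AND m0.months_in = 0
    WHERE c.cohort_month >= DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 18 MONTH)
    ORDER BY c.cohort_month, c.months_in;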

I eat lunch at my desk at 2:30. By 3 the cohort chart is done and I've sent it to James. He says "this is great, can you add a line for the median?" which takes five minutes but requires me to rewrite the outer query because Metabase doesn't support median natively in its chart builder. I write a custom SQL view with a PERCENTILE_CONT function. The median line shows up. James says "perfect." That word, "perfect," is probably the best feedback I get in a given week.

James says "perfect." That word is probably the best feedback I get in a given week.
— Tessa
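(That median view, sketched in BigQuery-style SQL with invented names. In BigQuery, PERCENTILE_CONT is a window function rather than an aggregate, hence the DISTINCT.)

    -- Expose a per-month median NRR that Metabase can chart as a plain column.
    CREATE OR REPLACE VIEW analytics.nrr_cohort_median AS
    SELECT DISTINCT
      months_in,
      PERCENTILE_CONT(net_revenue_retention, 0.5)
        OVER (PARTITION BY months_in) AS median_nrr
    FROM analytics.nrr_by_cohort;  -- hypothetical view holding per-cohort NRR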

What about Rachel's hypothesis?

I get to it around 4. And this is actually my favorite part of the job, when I get to do it, which is maybe twice a week. Rachel's hypothesis is that customers who submit a second maintenance request within 48 hours of their first one have higher retention. The theory being that if a property manager is active in the product early, they stick around.

So I run the query. Seven minutes, like I said, because no index. And the answer is, she's right, but not the way she thinks. Customers who submit their second request within 48 hours have a 91% twelve-month retention rate compared to 74% for customers who take longer than a week. But when I dig deeper, I realize it's not really about the second request. It's about the first 48 hours in general. Customers who do anything, second request, invite a team member, customize their notification settings, within 48 hours have higher retention. The second request is just a proxy for early activation.
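(The query behind that finding probably looked something like this. BigQuery-style SQL, invented table and column names, and simplified: the top-50-by-revenue filter is dropped for brevity.)

    -- Hours between each customer's first and second maintenance request,
    -- bucketed, then joined to a hypothetical 12-month retention flag.
    WITH ranked AS (
      SELECT
        customer_id,
        created_at,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at) AS rn
      FROM product.maintenance_request_events
    ),
    gaps AS (
      SELECT
        customer_id,
        TIMESTAMP_DIFF(MAX(IF(rn = 2, created_at, NULL)),
                       MAX(IF(rn = 1, created_at, NULL)), HOUR) AS hours_to_second
      FROM ranked
      WHERE rn <= 2
      GROUP BY customer_id
    )
    SELECT
      CASE
        WHEN g.hours_to_second <= 48  THEN 'within 48h'
        WHEN g.hours_to_second <= 168 THEN '2-7 days'
        ELSE 'over a week'
      END AS second_request_speed,
      COUNT(*) AS customers,
      ROUND(AVG(IF(c.retained_12mo, 1, 0)), 2) AS retention_12mo
    FROM gaps g
    JOIN crm.customers c USING (customer_id)
    WHERE g.hours_to_second IS NOT NULL  -- customers with at least two requests
    GROUP BY second_request_speed
    ORDER BY retention_12mo DESC;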

I write this up in a Notion doc with two charts. I send it to Rachel and cc James and our head of product, Amir. Amir messages me an hour later and says this changes how they think about onboarding. They'd been focused on getting customers to submit their first request. Now they want to optimize for any activity within 48 hours. That is, like, that right there is why I do this job. A question from a CS manager turned into a product insight that might actually change the onboarding flow. That happened because I didn't just answer Rachel's question. I looked at what was underneath it.

A question from a CS manager turned into a product insight that might change the onboarding flow. That happened because I didn't just answer the question. I looked at what was underneath it.
— Tessa

Do you always get to dig like that?

No. That's the problem. Most days I'm doing the ETL fix and the cohort chart and the Colton screenshot and I never get to the Rachel question. The Rachel question is the reason I took this job. The other stuff is what the job actually is. On a good week I get maybe eight hours of real analysis. The rest is infrastructure, maintenance, and answering questions where the person already knows what they want the answer to be and they just need me to generate the spreadsheet that confirms it.

The part nobody talks about

What's yours?

The loneliness of being the only one who speaks your language. When I find the onboarding insight, I can explain it to Amir and Rachel and they get it. But the process of finding it, the seven-minute query, the PERCENTILE_CONT function, the Fivetran misclassification, the missing index on customer_id, nobody understands that part except me. There's nobody at this company who can review my work. If I make a mistake in a query, there's no one to catch it. If I have a question about the best way to model something, I ask ChatGPT or I post in the dbt Slack community and hope someone answers.

My college roommate, Karen, she's an analyst at Target. She has a team of twelve. They do code reviews on each other's SQL. They have a style guide. They have an analytics engineering manager who decides the data modeling approach. I have me. I am the style guide. I am the code review. It's the most autonomy I've ever had in a job and also the most isolating. Some weeks it feels like freedom. Some weeks it feels like nobody would know if I disappeared except when the dashboards stopped updating.


What It's Like Being a Data Analyst in Healthcare

Ian

31 · Data Analyst at a mid-size health system's analytics group in Pittsburgh · 4 years in · Background in public health
Triple-checks every query he writes because a wrong number here doesn't just lose someone money. It might change a treatment protocol. Describes this as "the weight."

Healthcare analytics. What's different about it?

The stakes. That's the short answer. At a retailer or a startup, if you get a number wrong, someone makes a bad business decision. That's real, I'm not minimizing it. But in healthcare, if I get a number wrong, it could influence how a hospital allocates its nursing staff, or whether a certain procedure is flagged as cost-effective, or whether a patient population is identified as high-risk. Those are clinical-adjacent decisions. People's health is downstream of my queries.

I work in the analytics group for a health system that operates four hospitals and about thirty outpatient clinics in western Pennsylvania. My team has six analysts. We report to the Chief Analytics Officer, a woman named Dr. Okafor, who is an epidemiologist by training and the most careful person I've ever met. She reviews our methodology on anything that goes to clinical leadership. Not our code, she doesn't read SQL. Our methodology. She'll ask things like, "Did you adjust for case mix when you compared readmission rates across facilities?" And if I didn't, the report doesn't go out.

Give us a specific situation where the stakes were real.

Yeah, so about a year ago. We were doing a report for the quality committee on 30-day readmission rates for heart failure patients. This is a CMS metric, Centers for Medicare and Medicaid Services. If your readmission rate is above the national average, CMS penalizes you. Financially. We're talking hundreds of thousands of dollars a year for a hospital our size. So getting this number right matters in a very direct, money-is-on-the-line way.

I pulled the data from our Epic EHR. Heart failure patients, discharged from our main hospital, readmitted within 30 days to any of our four facilities. The number I got was 22.4%. The national average is around 21.9%. So we were slightly above. Not catastrophically, but above. And if that number was right, the quality committee was going to recommend a new discharge protocol that would cost about $200,000 to implement. Extra nursing follow-ups, pharmacy consultations, post-discharge phone calls.

Before I sent the report, I did something I do with every high-stakes query, which is I ran the same analysis a different way. Instead of pulling from our curated analytics tables, I went back to the raw ADT data. ADT is admission, discharge, transfer. The raw event log. And I got 21.1%. A full point lower. Below the national average.

So which number was right?

That's the question I spent two days answering. The difference was in how we defined "readmission." Our curated analytics tables counted transfers between our facilities as readmissions. So if a heart failure patient was discharged from our main hospital and then transferred to our rehab facility three days later, that showed up as a readmission. But CMS doesn't count planned transfers as readmissions. They only count unplanned acute care readmissions.

Eleven patients in the dataset had been transferred to rehab, not readmitted for an acute episode. When I removed them, the rate dropped from 22.4% to 21.1%. Those eleven patients were the difference between "we're above average and need to spend $200,000 on a new protocol" and "we're below average and what we're doing is working."
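(A stripped-down sketch of the fix, in generic SQL with invented tables and columns. The real CMS methodology has many more exclusions; the line that matters here is the one that drops planned transfers.)

    -- 30-day heart failure readmission rate, counting only unplanned acute
    -- readmissions, per the CMS definition Ian describes.
    WITH index_discharges AS (
      SELECT encounter_id, patient_id, discharge_date
      FROM encounters
      WHERE diagnosis_group = 'heart_failure'
        AND facility_id = 'MAIN'
        AND encounter_type = 'inpatient_acute'
    ),
    readmits AS (
      SELECT DISTINCT d.encounter_id
      FROM index_discharges d
      JOIN encounters r
        ON  r.patient_id = d.patient_id
        AND r.admit_date >  d.discharge_date
        AND r.admit_date <= d.discharge_date + INTERVAL '30' DAY
        AND r.encounter_type = 'inpatient_acute'  -- rehab stays don't qualify
        AND NOT r.is_planned_transfer             -- the eleven-patient fix
    )
    SELECT
      COUNT(r.encounter_id) * 100.0 / COUNT(*) AS readmission_rate_pct
    FROM index_discharges d
    LEFT JOIN readmits r USING (encounter_id);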

I brought both numbers to Dr. Okafor. She spent about an hour walking through my logic. She pulled up the CMS methodology documentation on her screen and we compared it line by line to our table definitions. She agreed with the 21.1% number. The report went to the quality committee with the corrected figure. The $200,000 protocol was shelved.

Eleven patients. That's the difference between "spend $200,000 on a new protocol" and "what we're doing is working." I almost sent the wrong number.
— Ian

What would have happened if you hadn't double-checked?

The hospital would have spent $200,000 solving a problem that didn't exist. And the quality committee would have made a clinical decision based on a data error that was, honestly, subtle. The tables weren't wrong. They were counting correctly by their own definition. The definition just didn't match the CMS definition. That's not a bug. It's a semantic mismatch. And those are the scariest errors because they don't throw an exception. The query runs. The number looks plausible. 22.4% is not a crazy readmission rate. Nobody in the room would have questioned it if I'd presented it confidently.

That's the thing that I carry. The errors that look right. In retail analytics, if your conversion rate is wrong by a point, you might waste some ad spend. In healthcare analytics, if your readmission rate is wrong by a point, a committee makes a clinical decision based on a false premise. And the really terrifying part is, I catch the ones I catch. I have no way of knowing if there are errors I didn't catch. I've gone back and re-audited old reports when I can't sleep. I've found two more issues over four years. Both were minor. But the fact that they were there at all means there could be others.

How do you deal with that pressure?

Process. That's the honest answer. Dr. Okafor built a review system that I've basically made my religion. Every query that feeds a report that goes to clinical leadership gets run two ways. I call it the "two roads" approach. Write the query against the curated tables, then write it again from raw data. If the numbers match, ship it. If they don't, figure out why before anything leaves the team.
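(In practice, the "two roads" check can end in a diff query over the two builds. A sketch, with invented names, assuming each road produces a monthly metric table:)

    -- Same metric computed twice: once from the curated layer, once rebuilt
    -- from raw ADT events. Any month where they disagree blocks the report.
    SELECT
      c.report_month,
      c.readmission_rate AS curated_rate,
      r.readmission_rate AS raw_rate
    FROM analytics.hf_readmit_monthly_curated c
    JOIN analytics.hf_readmit_monthly_from_adt r USING (report_month)
    WHERE ABS(c.readmission_rate - r.readmission_rate) > 0.001;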

It takes longer. It probably doubles my analysis time on the high-stakes stuff. But I've caught four discrepancies in the last year using this method, and all four would have resulted in misleading reports. One of them would have overstated emergency department wait times at our smallest hospital by twelve minutes on average, which would have triggered a staffing review that the hospital didn't need.

My colleague, Nina, she does the same thing. We peer-review each other's high-stakes work. She's caught things in my queries and I've caught things in hers. Last month she found a date filter in one of my queries that was using the wrong timezone conversion. We're in Eastern but some of our data sources timestamp in UTC. That kind of thing. An hour of timezone offset that could shift which patients fall into which reporting period. Small, invisible, potentially consequential. That's the texture of this work.
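(The class of bug Nina caught, sketched in Postgres-style SQL with invented names and an example date. An 11 PM Eastern arrival is stamped 4 AM UTC the next day, so a naive date filter shifts it into the wrong reporting period.)

    -- Buggy: truncates the UTC timestamp straight to a date.
    SELECT COUNT(*)
    FROM ed_visits
    WHERE CAST(arrival_ts_utc AS DATE) = DATE '2025-03-01';

    -- Fixed: convert to the local reporting timezone before truncating.
    SELECT COUNT(*)
    FROM ed_visits
    WHERE CAST(arrival_ts_utc AT TIME ZONE 'America/New_York' AS DATE)
          = DATE '2025-03-01';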

The part nobody talks about

What's yours?

The weight stays with you even when you're not at work. I'll be at dinner with my girlfriend and I'll suddenly think, wait, did I filter out observation stays from that ED throughput report? And I'll check my laptop. Or I'll wake up at 2 AM and think about whether the diagnosis codes I used for the diabetes cohort included Type 1 and Type 2 or just Type 2. And sometimes I get up and check.

My girlfriend, Mara, she's an accountant. She says she triple-checks her numbers too. And I believe her. But if she's off by a dollar, someone's tax return is slightly wrong and it gets corrected. If I'm off by eleven patients, a hospital spends $200,000 on a protocol it doesn't need, or worse, doesn't spend it on a protocol it does. That asymmetry lives in me. It's not anxiety exactly. It's more like a permanent low hum of responsibility. My therapist calls it hypervigilance. I call it the job.


Would They Do It Again?

Marcus
Probably. But not here.

I like data work. I like understanding a business through numbers. I do not like maintaining 43 dashboards and being a human query executor. If I could find a role where the analysis-to-maintenance ratio was 50/50 instead of 20/80, I'd love this job. I think that role exists at smaller companies. I might go find it.

Tessa
Yes. For the Rachel moments.

When I find something real in the data and it actually changes what the company builds, there's nothing like it. The ETL fixes and the Colton screenshots and the cohort charts, those are the tax. The Rachel moments are the reason. The ratio is wrong right now but I'm twenty-seven and learning faster than anyone I know.

Ian
Yes. Because the weight is the point.

I could go to a tech company and make more money doing analysis that, honestly, matters less. The weight of healthcare data is what makes me careful, and being careful is what makes me good. I don't want to do analysis where the worst outcome is a bad dashboard. I want the work to matter. It just costs something to carry that.


Frequently Asked Questions About Data Analysis

What does a data analyst actually do all day?

Most of a data analyst's day is not building models or finding insights. It's cleaning messy data, writing SQL queries to answer ad-hoc questions from stakeholders, maintaining dashboards that may or may not get used, and translating business questions into data questions. At larger companies, a significant portion of time goes to fielding requests from non-technical people who want "a quick pull" that requires joining six tables and explaining why the numbers don't match what they saw in a different report.

Is data analysis boring?

Parts of it are. Data cleaning and dashboard maintenance can be repetitive. But the moments where you find something nobody expected, where a pattern in the data changes a business decision, those are genuinely thrilling. The ratio of tedious to exciting depends heavily on the company. At startups, analysts are closer to decision-making and see more impact. At large companies, the work can feel more like a reporting function.

Do data analysts need to know programming?

SQL is non-negotiable. Every data analyst job requires it. Python or R is increasingly expected, especially for anything beyond basic reporting. At startups, analysts often need Python for data pipeline work, not just analysis. At larger companies, you can get by with SQL and Excel for longer, but career advancement increasingly requires Python, dbt, or similar tools.

What is the difference between a data analyst and a data scientist?

In practice, the line is blurry. Generally, data analysts focus on descriptive and diagnostic work: what happened and why. Data scientists focus on predictive and prescriptive work: building models and running experiments. At many companies, especially smaller ones, the same person does both. The title often depends more on the company's naming conventions than the actual work.