Career Dish · Real jobs, real talk

What Data Science Is Actually Like

~26 min read · 3 voices

We talked to three data scientists. One is a senior DS at a health insurance tech company in Austin who has a text file on his desktop called "dumb_ideas.txt" with 47 entries. One runs client models at a 9-person analytics consultancy in Chicago and recently spent three weeks explaining why you cannot produce meaningful confidence intervals for a bank branch with 200 customers. One is a principal data scientist at a Seattle tech company who keeps a handwritten decisions log and is currently on Volume 4. Same job title. Very different Tuesdays.

These characters are composites, built from dozens of real accounts, interviews, and community threads. The people aren't real. The experiences are.

What It's Like Being a Data Scientist at a Health Tech Startup

Ravi

32 · Senior data scientist at a Series D health insurance tech company in Austin · 4 years total, 2 years as senior · Statistics degree from UT Austin, went straight in
Has a text file on his desktop called "dumb_ideas.txt." It contains 47 untested hypotheses he's accumulated over four years. Most are things like "what if member age at enrollment is less predictive than ZIP code in this cohort?" The ones that turn out to be right are satisfying. The ones he never gets to test stay in the file.

What does your company actually do, and what's your job inside it?

We build software for small and mid-size health insurance carriers, mainly the regional ones that aren't big enough to have full data science teams. So when a carrier in, I don't know, rural Georgia wants to understand their member retention or model risk, they're usually doing it in Excel. We sell them a platform and then do the data science that sits underneath it. I'm on the modeling team, which is four people. My manager is Vikram, and he came out of quantitative finance, which means he has extremely high standards for statistical rigor and is sometimes mystified by how slow everything moves in healthcare data.

My job is basically to build the predictive models that power the features our clients pay for. Right now I'm working on a churn prediction model for the renewals product. The question is: which members are likely to not renew their plan at the end of the year? Because if a carrier's retention team knows 90 days out, they can intervene. Reach out, offer support, flag clinical programs. It's a straightforward enough idea. The execution is not.

What's the execution part that's not straightforward?

Let me tell you about the last three weeks. I'm building a gradient boosting model, using 18 months of claims data pulled out of Redshift. Our data engineer is Sid, and Sid is the person who keeps our pipelines running and also the person I call when my query takes 38 minutes because someone added a badly written join to the shared cluster. Sid is very patient. Anyway, I'm running feature importance analysis, and the feature that keeps showing up as one of the strongest predictors is the number of billing disputes a member had in the last 90 days. Dispute count. Not claims frequency, not chronic condition flags, not age. Billing disputes.

And that's interesting because billing disputes are not a health variable. They're a customer service variable. A member who's arguing about charges is a member who is frustrated with the carrier. Which, when you think about it, makes perfect sense as a churn signal. Frustrated people leave. But I brought this to Priti, my PM, and she said she needed to show it to the director of renewals before she could use it in the product. The director's name is Carla, and Carla said "billing isn't a health metric, I don't want my reps calling people about insurance because they disputed a claim. That would feel punitive."

Which, I mean, Carla isn't wrong that it could feel that way in the wrong hands. But the signal is real. The dispute count variable has the highest SHAP value in the model. Removing it makes the model meaningfully worse. So now I've spent two weeks building a proxy version that captures the same underlying frustration signal through a different path, using a combination of call center contact frequency and claim resubmission rates. It works, roughly. The AUC dropped from 0.81 to 0.77. Not catastrophic but noticeable. The model is now slightly worse than it needed to be because of a business comfort decision, and that's the thing that stays with me.
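A sketch of the comparison Ravi is describing, for readers who want to see it in code: fit the gradient boosting model, inspect SHAP values, then refit without the contested feature and measure the AUC cost. The column names, the data source, and the split are hypothetical; this is the shape of the workflow, not his actual pipeline.

    # Hypothetical sketch of the dispute-feature comparison. Column names and
    # the data extract are invented; only the workflow mirrors what Ravi describes.
    import pandas as pd
    import shap
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    df = pd.read_parquet("member_features.parquet")  # hypothetical feature extract
    target = "churned"
    features = [c for c in df.columns if c != target]

    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[target], test_size=0.25, random_state=0, stratify=df[target]
    )

    # Full model, billing-dispute feature included.
    full = GradientBoostingClassifier().fit(X_train, y_train)
    auc_full = roc_auc_score(y_test, full.predict_proba(X_test)[:, 1])

    # SHAP values show which features drive the predictions.
    explainer = shap.TreeExplainer(full)
    shap.summary_plot(explainer.shap_values(X_test), X_test)

    # Refit without the contested feature and measure what it costs.
    reduced_features = [c for c in features if c != "dispute_count_90d"]
    reduced = GradientBoostingClassifier().fit(X_train[reduced_features], y_train)
    auc_reduced = roc_auc_score(
        y_test, reduced.predict_proba(X_test[reduced_features])[:, 1]
    )

    print(f"AUC with dispute feature:    {auc_full:.3f}")
    print(f"AUC without dispute feature: {auc_reduced:.3f}")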

How do you handle that? The model being worse than it could be?

You document it and you move on. I wrote up the tradeoff clearly in the model card so that six months from now when someone asks why we didn't use billing dispute count, there's a record. Vikram reviewed it and said "good documentation." That was the conversation.

But it's one instance of a pattern that is, I think, the thing nobody prepares you for in this job. You can have a correct answer and it can be ignored for reasons that are legitimate but have nothing to do with whether the answer is correct. Carla's concern was real. The business communication implication was real. And the model being worse is also real. All three things are true simultaneously and you have to sit with that.

My college roommate Marcus works in journalism. He files a story and it either runs or it doesn't, but when it runs, people read it. When I ship a model, there's this whole intermediate layer of whether the business trusts it enough to act on it. Even when it's working. Even when the evidence is strong. I've had models sitting in production for eight months where I can tell from the evaluation metrics that they're making good predictions, and the renewals team is still manually scoring members in a spreadsheet. The model exists, it's running, and the decision it's supposed to inform is being made by a different process. That's a thing that happens in this job.

You can have a correct answer and it can be ignored for reasons that are legitimate but have nothing to do with whether the answer is correct. And you have to sit with that.
— Ravi

What does a normal Tuesday look like for you?

I get to the office around 9:15. We're a hybrid team, I'm in three days a week. Tuesday usually starts with Vikram's standup at 9:30, which runs maybe 25 minutes. He's efficient. We go through blockers and priorities, and he'll usually ask at least one question that makes me realize I hadn't thought about something carefully enough. He does that with everyone, it's just how he works. Last Tuesday he asked why I was using 18 months of training data instead of 24, and I gave him my answer about distribution shift risk in healthcare data after the pandemic and he nodded and said "that's fine, just make sure it's in the writeup." It was a good catch. I'd thought about it but hadn't written it down.

After standup I usually have a 90-minute block for focused work. That's when I actually run models or write code. By noon there's usually at least one Slack from Priti asking for something directional, which means she wants a number or a chart she can put in a slide by Thursday. "Directional" in this context means "I know the analysis isn't finished but I need something I can show." I've gotten better at figuring out which directional answers are genuinely fine to give early and which ones will come back to haunt me if the full analysis changes them.

Afternoons are usually messier. More meetings. More Slacks. I'll often spend 45 minutes debugging a data issue that turns out to be a timezone conversion problem in the Redshift tables. I have an entry in dumb_ideas.txt from about six months ago that says "figure out once and for all whether our timestamp columns are UTC or local time." I still haven't resolved that globally. Sid and I agree on what each one is, but it's not documented anywhere official.
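For what it's worth, the fix Ravi keeps deferring is usually a one-time localize-and-convert, assuming the naive timestamps really are UTC. A hypothetical pandas sketch (the column name and values are made up):

    import pandas as pd

    # Hypothetical: timestamps pulled from Redshift arrive timezone-naive but are UTC.
    df = pd.DataFrame({"claim_submitted_at": ["2025-03-01 14:05:00", "2025-03-01 19:42:00"]})
    df["claim_submitted_at"] = (
        pd.to_datetime(df["claim_submitted_at"])
          .dt.tz_localize("UTC")             # declare what the naive values actually are
          .dt.tz_convert("America/Chicago")  # then convert to the timezone you report in
    )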

Is the job what you thought it would be when you were in school?

No. In school, data science was presented as this very clean pipeline. You get data, you clean it, you build a model, you evaluate it, you deploy it, you celebrate. The reality is that roughly half my time is spent on things that aren't in that pipeline at all. Explaining to a non-technical stakeholder why a model output is probabilistic and not binary. Writing documentation. Reviewing someone else's SQL. Sitting in a meeting where the business question is still being figured out while I'm in the room. Chasing down Sid about why the Redshift query took 38 minutes. None of that is on the flowchart from my ML textbook.

I don't think that's unique to my company. I have a friend who works as a DS at a large e-commerce company and she says the same thing. The technical work is probably 40% of the job. The rest is translation and coordination and a kind of institutional patience that nobody teaches you in a statistics program.

The part nobody talks about

What's yours?

How often you know the answer before you run the analysis, and then have to run the analysis anyway to give the answer legitimacy it already had. Vikram would call this "the documentation problem" and I think that's right. The insight usually comes pretty fast for someone who knows the data well. But the insight without the validation is just an opinion. So you spend three days building a rigorous statistical case for something you were pretty confident about on day one.

I'm not complaining about rigor. Rigor matters. The AUC really does tell me something I didn't fully know before. But there's a specific feeling that comes from spending three days confirming something you already suspected, and it's not the same as discovering something genuinely new. My first year I thought that feeling would go away as I got more experienced. It didn't. Vikram has it too. He's never said it explicitly but I've watched him do a full statistical writeup for something where he was clearly already certain. That's the job. You make the invisible evidence visible. The trick is staying genuinely curious about whether you might be wrong while you're doing it.


What It's Like Being a Data Scientist at a Small Consultancy

Morgan

28 · Data scientist at a 9-person analytics consultancy in Chicago · 2 years · CS from Northwestern, 1 year at a CPG company first
Color-codes every project folder on her laptop. Her desktop has exactly three items, all inside a folder called "working." She once reorganized her shared drive during a slow week and nobody noticed except Simon, who said "finally." That was the most approval she's ever heard from Simon.

What's different about doing data science at a consultancy versus a single company?

The client work is the whole job. At the CPG company I was at before this, my job was basically supporting the marketing analytics team. I built the same reporting models every week, and I got good at them, but I was on one domain for a year and I was getting a little stagnant. Here at Thornfield, every engagement is different. In the last two years I've built propensity models for a regional bank, a churn analysis for a regional grocery chain, a demand forecasting model for a mid-size industrial supplier. Simon, who founded the firm, says variety is the thing that keeps you sharp. He's right about that part.

The other side of it is that you're always an outsider. You don't know the internal politics. You don't know why the stakeholder is asking for what they're asking for. You learn to triangulate fast. Read the room in the first two client calls, figure out who the real decision-maker is, figure out what outcome would make them look good to their boss, because that is ultimately what a successful engagement looks like. Simon is extremely good at this. I am getting better at it.

Walk us through a project you're working on right now.

The bank one. First Midwest Regional, they have 47 branches across Illinois and Wisconsin. The engagement is about predicting which of their checking account customers are likely to open a savings product in the next 90 days. They want to identify high-propensity customers so their branch managers can do targeted outreach. Straightforward propensity model. I've done this type before.

The part that became not straightforward: my stakeholder over there is Tae-yang, and Tae-yang is the bank's VP of analytics. Smart guy, very business-focused, and about three weeks into the engagement he asked me to provide confidence intervals by branch. Which is reasonable to ask. He wants to know: when we predict 40 customers in the Waukegan branch are high-propensity, how confident are we?

The problem is the Waukegan branch has 200 customers. And the savings product open rate in their portfolio is around 12%. So for that branch I've got 200 observations with a 12% event rate. When you break the estimates down to branch level, your confidence intervals are not tight. They're, like, "we think the propensity is somewhere between 8% and 32%." That is not a useful range to give a branch manager who's trying to decide which 20 customers to call. The sample size issue is real and it's not going away just because the client wants the answer.
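To make the sample-size point concrete, here is a back-of-envelope check of how interval width scales with branch size, using Wilson intervals from statsmodels. The counts are invented for illustration; the takeaway is the width, not the exact numbers Morgan quotes.

    from statsmodels.stats.proportion import proportion_confint

    # Hypothetical branch counts: the same ~12% observed open rate, very different n.
    for label, opens, customers in [("small branch", 24, 200), ("large branch", 240, 2000)]:
        lo, hi = proportion_confint(opens, customers, alpha=0.05, method="wilson")
        print(f"{label}: {opens}/{customers} -> 95% CI [{lo:.1%}, {hi:.1%}]")

    # The small branch's interval comes out several times wider. Slice further,
    # say to the handful of customers the model actually flags at one branch,
    # and the range loosens toward the one Morgan describes.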

So what do you do?

I wrote up the statistical explanation and brought it to Tae-yang on a call. I showed him the confidence interval math for a 200-person branch versus a 2,000-person branch. At the big urban branches, the intervals are reasonable. At the small rural ones, you'd need to either pool observations across similar branches or accept that you're making predictions with high uncertainty. He listened carefully and then said, "can we pool by metro area?" Which is actually a smart suggestion. It's what a statistician would propose if they'd been thinking about it. But Tae-yang got there from pure business instinct, not statistics, which is something I've noticed: the best clients are the ones who can reason about tradeoffs even when they don't know the formal vocabulary.

We're now building a hierarchical model that borrows statistical strength across branches within the same metro cluster. The small branches benefit from the data in the larger branches to stabilize their estimates. Tae-yang said "so you're letting the big branches teach the small ones." I said yeah, basically. He liked that framing. That took three weeks of back-and-forth and one more statistics-to-English translation session with Simon, who is much better at those sessions than I am. I'm working on it.
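Morgan builds hers in Stan; as a rough illustration of the same partial-pooling idea, here is a toy version in PyMC with made-up branch counts. The priors and the two-metro structure are illustrative only, not her model.

    import numpy as np
    import pymc as pm

    # Invented data: opens and customer counts for six branches in two metro clusters.
    opens     = np.array([5, 24, 180, 9, 240, 31])
    customers = np.array([60, 200, 1500, 90, 2000, 310])
    metro_idx = np.array([0, 0, 0, 1, 1, 1])

    with pm.Model():
        # Metro-level mean propensity, on the log-odds scale.
        metro_mu = pm.Normal("metro_mu", mu=-2.0, sigma=1.0, shape=2)
        # How much branches within a metro are allowed to differ.
        branch_sd = pm.HalfNormal("branch_sd", sigma=0.5)
        # Branch-level log-odds, shrunk toward their metro mean.
        branch_logit = pm.Normal(
            "branch_logit", mu=metro_mu[metro_idx], sigma=branch_sd, shape=len(opens)
        )
        # Observed opens per branch.
        pm.Binomial("obs", n=customers, p=pm.math.invlogit(branch_logit), observed=opens)
        idata = pm.sample(1000, tune=1000, chains=2, random_seed=0)

The branch_sd parameter is what controls how hard the small branches get pulled toward their metro mean, which is the "big branches teach the small ones" effect in a single line.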

The best clients are the ones who can reason about tradeoffs even when they don't know the formal vocabulary.
— Morgan

You mentioned Simon. What's working for him like?

Simon is exacting. He was at McKinsey before founding Thornfield and he has the communication standards of someone who has spent twenty years writing for senior executives. Every deliverable I produce, he reviews it and leaves comments. Not just on the analysis, on the phrasing. He once changed "the model performs well" to "the model achieves an AUC of 0.84, which exceeds the 0.80 threshold we established in the project brief." Same information. Very different level of accountability in the language.

It's genuinely made me better. I now write client-facing documents differently than I wrote my work at the CPG company. I'm more precise about what the analysis can and can't say. I'm more careful about the difference between "this model predicts X" and "this model, under these assumptions and with this data, is predictive of X." Simon would say those aren't stylistic choices, they're intellectual honesty choices. He's not wrong.

It's also occasionally exhausting. My friend Becca works as a software engineer at Amazon and makes $240K and doesn't have someone rewriting her pull request messages for precision. I make substantially less than that. The work is more interesting to me than most SDE work I've seen, but there are days when the Simon review process lands on a Thursday afternoon and I think, I could just be shipping features and going home.

What does your average week look like in terms of how you're actually spending time?

Roughly: two full days of actual modeling and analysis work, one day of client communication and documentation, half a day of internal stuff like project management and status updates to Simon, and the rest is whatever that week's specific fire is. On the bank project it's been the hierarchical model work, which is technically interesting and also time-intensive because I'm writing it in Stan, and Stan is not a language you pick up in an afternoon. It's a probabilistic programming language. Bayesian. The documentation is good but the debugging error messages are, in Becca's words, "built for academics." I've had it running correctly for two weeks and I still can't tell you I fully understand one of the prior choices Simon recommended. I just know it's more stable than my first attempt.

The part nobody talks about

What's yours?

How much of this job is managing what the data can't say. The bank wants branch-level precision and the data doesn't support it. That's not a technical failure. That's just how statistics works. But when you tell a client "the data doesn't support the precision you want," there is a version of that conversation where they hear "the data scientist doesn't know how to do what I asked." Those are very different things and the gap between them is your communication problem, not theirs.

I spent the first year frustrated that clients couldn't tell the difference between a legitimate data limitation and a skill gap on my end. Now I understand that it's my job to make that distinction clear, proactively, before they've drawn the wrong conclusion. Simon calls this "setting the evidence ceiling at the start of the engagement." You tell the client early what questions their data can and cannot answer at what level of confidence. Then when you hit a limitation, you've already named it and it doesn't look like you're covering something up. That reframe took me about 18 months to fully internalize. It's the thing I wish someone had told me before I took a client call.


What It's Like Being a Principal Data Scientist at a Large Tech Company

Cora

44 · Principal data scientist, cloud monitoring product division, large tech company in Seattle · 7 years at the company, 3 as principal · PhD statistics from UW · 8 years academic research before this
Keeps a handwritten notebook she calls the Decisions Log. Every significant decision and the reasoning behind it, written at the end of the meeting where the decision was made. She's on Volume 4. Each volume is a standard composition notebook. She's re-read Volume 1 exactly once and found it disorienting, like reading her own emails from years ago.

You have a PhD and eight years of academic research. How did you end up in industry?

Grant funding. That's the honest answer. I was an assistant research scientist at UW working on spatiotemporal statistical methods, which is applied statistics for things that vary across space and time, like disease spread, environmental monitoring, climate data. The work was genuinely interesting and I was good at it. But the funding structure in academic research is a cycle of writing grants, waiting, maybe getting funded, doing the work, writing more grants. I had a colleague who wrote three grants in a year and got zero. Her lab closed. She pivoted to industry and within six months she was making twice her academic salary. I watched that and started paying attention.

I came to the company seven years ago as a senior data scientist. The tech was very different from my academic work, but the statistical reasoning underneath it wasn't. My first project was building an anomaly detection model for server latency data. Which is, if you squint, not that far from environmental anomaly detection. Data that arrives over time, with seasonality, with noise, and you're trying to separate signal from variance. I adapted faster than I expected to. The things I missed from academia I started to miss about two years in, once the novelty of industry wore off. But by then I had enough stake in what I was building that leaving wasn't obvious.

What are you working on now?

A new anomaly detection feature for our cloud monitoring product. The product helps companies understand the health of their cloud infrastructure. When things go wrong, usually metrics start behaving unusually before the actual failure happens. Higher latency, weird CPU spikes, gradual memory creep. The feature I'm building would flag those automatically, give operators a head start before something breaks.

The problem I ran into two months ago: I sent a survey to the four engineering teams that would be using the output, asking them to define "anomaly" so I could understand what the model needed to detect. The survey had seven questions and took about eight minutes. I got four meaningfully different answers.

Team Osprey wants rare one-off spikes. Ninety-ninth percentile, duration under five minutes, clear departure from baseline. Team Fulcrum wants gradual degradation over a 72-hour window, the kind of thing that looks normal on any given hour but is clearly trending wrong if you zoom out. Team Navarra wants contextual anomalies, things that would be normal in isolation but are anomalous given what else is happening in the system. And the fourth team, Kestrel, wants the model to learn each service's individual normal behavior and flag deviations from that, which is basically asking for a different model per service. Four teams, four definitions. I've now been writing what I'm calling an anomaly taxonomy document for the past three weeks before I've written a single line of model code.

That sounds frustrating.

It's the job. If I'd just picked one definition and built to it, I'd have shipped something that was wrong for three of the four teams and I'd be defending it in every stakeholder review. The taxonomy document is slower but it's the right order. My skip-level manager Lorenzo keeps asking for a timeline update and I keep saying "the definition work takes as long as it takes." Lorenzo came out of engineering and his instinct is that definition work should happen over a few days, not a few weeks. I've had to explain twice now why statistical definition precision matters for model validity. He's not dismissing it; he just has a different intuition about scope.

What I've learned about projects of this type is that the definition disagreement is almost never about the words. It's about what problem the team is actually trying to solve. Team Osprey gets paged at 2 AM for outages. They want something that fires only when it's real, because a false positive means someone waking up for nothing. Team Fulcrum deals with degradation that causes customer churn over weeks. Their tolerance for a slow-building false alarm is higher because the cost of missing it is also higher. Once I understood that, I stopped trying to build one model that satisfies all four definitions and started building a framework where teams configure sensitivity parameters based on their context. That reframe is what's in the taxonomy document. It took three weeks to get to something I believe in.
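One way to picture the "configure sensitivity by context" framing is a single detector whose parameters, not its logic, differ per team. A toy sketch with synthetic data; the thresholds and windows are invented for illustration.

    import numpy as np
    import pandas as pd

    # Synthetic latency series so the sketch runs: one point per minute for a week.
    idx = pd.date_range("2025-01-01", periods=7 * 24 * 60, freq="min")
    latency = pd.Series(50 + np.random.default_rng(0).normal(0, 5, len(idx)), index=idx)

    def flag_anomalies(s, window="1h", z_threshold=4.0,
                       trend_window=None, trend_threshold=None):
        """Toy detector: spike flags from a rolling z-score, plus optional
        slow-trend flags from the change over a longer window. Teams share
        the logic and differ only in the parameters."""
        roll = s.rolling(window, min_periods=5)
        z = (s - roll.mean()) / roll.std()
        flags = z.abs() > z_threshold                           # rare, sharp spikes
        if trend_window is not None:
            drift = (s - s.shift(freq=trend_window)).reindex(s.index)
            flags = flags | (drift.abs() > trend_threshold)     # slow creep
        return flags

    # Same function, different sensitivity profiles (numbers are illustrative).
    osprey_flags  = flag_anomalies(latency, window="5min", z_threshold=5.0)
    fulcrum_flags = flag_anomalies(latency, window="1h", z_threshold=8.0,
                                   trend_window="72h", trend_threshold=30.0)

Osprey effectively runs the spike branch with a high bar; Fulcrum turns on the trend term; per-service baselines and contextual rules would be further parameters layered on top, which is roughly the reframe the taxonomy document argues for.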

The definition disagreement is almost never about the words. It's about what problem the team is actually trying to solve.
— Cora

How has the job changed over seven years? What's different at the principal level?

The leverage. At senior level you're building things. At principal level you're deciding what gets built and why. I spend more time in design reviews, more time reviewing other people's modeling choices, more time in conversations like the Lorenzo one where I'm translating statistical reasoning into business language. I write more documentation per week than I write code, I think, though I don't track it precisely.

What I didn't expect is that the seniority also means you're one of the first to know when something isn't going to work. Not because anyone tells you. But because you've been around long enough to read the patterns. Lorenzo hasn't responded to my last two Slack messages about the anomaly project budget. The skip-level above him hasn't shown up to the last three roadmap reviews. When I was a senior DS, I'd have thought that was scheduling noise. Now I recognize it as a pattern that precedes a project getting deprioritized. I opened a new entry in the Decisions Log on Monday. It says: if this project gets paused, here is the statistical work we should preserve so the next team doesn't start from scratch. I don't know if I'll need that entry. But it exists now.

My partner teaches math at a high school in Bellevue, and she asks me sometimes what I'm working on. When I describe the anomaly taxonomy problem, she immediately says "oh, so you need a common definition before you can teach the concept." That's exactly it. She got there in twenty seconds. Fifteen years of teaching math, from geometry and algebra through statistics, and she understands the problem I've been circling for three weeks. That's either comforting or clarifying, I'm still deciding which.

The part nobody talks about

What's yours?

How much the job is about what you don't say. A PhD trains you to qualify everything. The confidence interval is always there in your head. The assumptions behind the model are always visible to you. In a boardroom or a roadmap review, you say the main finding and you hold the uncertainty in your head and you decide in real time how much of it to surface. Surface too little and you're misleading people. Surface too much and you're undermining the work. That judgment call happens dozens of times a week and there's no formula for it.

I have a grad student I advise remotely, Phuong, and she's finishing her dissertation on signal detection in sparse time series. She's brilliant. She's also going to struggle with this when she gets to industry, because she will want to communicate all of the uncertainty because that's the scientific standard, and in most business contexts that level of qualification reads as lack of confidence. I've been trying to help her understand that the audience shapes the communication without compromising the rigor. It's possible to be precise and concise at the same time. It just takes more practice than the dissertation teaches you. I'm on Volume 4 of the Decisions Log and I'm still figuring it out.


Would They Do It Again?

Ravi
Yeah. Mostly.

The frustration of the model-that-sits-in-production is real. But I'm also building things that will run on tens of thousands of member records, and when they work, they actually work. I don't know if I'll be at this company in two years. But I'd do data science again.

Morgan
Yes. But not the CPG job first.

The consultancy made me better in a year than the CPG would have in three. The client work is genuinely harder and I'm glad I have it. I'd skip the large-company year and come here directly. I didn't know enough to know that was the right path. I know now.

Cora
Yes. The anomaly taxonomy is the job.

I miss academia sometimes, mostly the freedom to work on hard problems without a roadmap review. But what I don't miss is not knowing if my work would still be funded in 18 months. The Decisions Log exists because I've built things worth documenting. That's not nothing.


Frequently Asked Questions About Data Science

What does a data scientist actually do all day?

Most data scientists spend roughly 40 to 60 percent of their time on data preparation, pipeline debugging, and translating business questions into answerable statistical problems. The actual modeling is a smaller fraction than job descriptions suggest. Stakeholder communication, documentation, and institutional work (defining what a question means, aligning teams on definitions, explaining why the data can't do what someone wants) take up the rest.

Is data science hard to break into?

It has gotten harder since 2022. A portfolio and a statistics background were often enough in 2019 and 2020. By 2025 the bar had shifted toward master's degrees, PhDs, domain expertise, or strong engineering skills alongside the statistics. People entering from other fields, like accounting, teaching, or operations, often find that domain knowledge is a meaningful differentiator, though it takes longer to demonstrate than a credential.

Is data science mostly coding?

Coding is the main tool but not the main skill. The main skill is translating ambiguous business questions into answerable statistical questions, then translating the answers back into language that changes a decision. The coding itself (SQL queries, Python models, pipeline automation) is significant and non-negotiable. But the data scientists who struggle most are often strong technically and weak at the business translation layer. The job requires both.

How much do data scientists make?

Base salaries range from roughly $90,000 for junior roles in mid-tier markets to $180,000 for senior roles at large tech companies. Total compensation at major tech firms can reach $250,000 to $350,000 or higher once stock is included, though vesting schedules and market performance affect actual take-home. At healthcare systems, nonprofits, and mid-size companies outside tech hubs, senior data scientist salaries typically run $100,000 to $140,000. The spread is large and driven heavily by company type and location.