Day in the Life of a Data Analyst
Three analysts wrote down everything they did on one ordinary workday. No interviews, no prompts. Just the day as it happened.
These characters are composites, built from dozens of real accounts, interviews, and community threads. The people aren't real. The experiences are.
What you'll learn
- A specific day in a data analyst's working life, broken down hour by hour
- The work that does not appear in the job description but takes up significant time
- What the role feels like from the inside on a normal day
Nora's Wednesday
Coffee from the Chemex, which takes six minutes and is the only part of my morning I control completely. Open laptop at the kitchen table. Seventeen Slack messages overnight, which is normal. Most of them are from the West Coast merch team reacting to the weekly sales dashboard I refreshed yesterday. One message from Deepa, my manager: "Can you look at the home textiles number? Kevin in merch says it doesn't match what he's seeing in Oracle." This is how most of my days start. Something doesn't match something.
Dig into the home textiles discrepancy. Kevin's Oracle report shows $4.2 million in home textiles revenue for February. My Looker dashboard shows $3.87 million. The difference is $330,000, which is too large to be a rounding issue and too specific to be a methodology difference. Something is actually wrong somewhere. Open BigQuery and start writing a reconciliation query. Pull every home textiles transaction for February from our warehouse, then pull the same from the Oracle extract. Compare row counts. My warehouse has 41,200 transactions. The Oracle extract has 43,800. So I'm missing 2,600 transactions.
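The reconciliation described above boils down to a set difference on transaction IDs: which records exist in the source extract but not in the warehouse. A minimal Python sketch of that logic (record shape and the `transaction_id` key are illustrative, not the actual schema):

```python
def missing_from_warehouse(warehouse_rows, source_rows, key="transaction_id"):
    """Return IDs present in the source extract but absent from the
    warehouse, sorted for stable diffing. Rows are dicts; the key
    column name is illustrative."""
    warehouse_ids = {row[key] for row in warehouse_rows}
    source_ids = {row[key] for row in source_rows}
    return sorted(source_ids - warehouse_ids)
```

In a warehouse this would run as an `EXCEPT` or anti-join in SQL rather than in Python, but the operation is the same set difference either way.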
Found it. The Fivetran connector that syncs Oracle to BigQuery had a hiccup on February 18th. The sync log shows a timeout at 3:47 AM. It retried and completed, but it skipped a batch of 2,600 records from the textiles product category because they were in a staging table that had already been flushed by the time the retry ran. So my warehouse is missing one day of home textiles data. I've been showing an incorrect number for three weeks. Nobody noticed until Kevin looked at both systems side by side yesterday.
Slack Deepa: "Found it. Fivetran sync gap on Feb 18. Missing 2,600 textile transactions. $330K delta. Backfilling now." She replies with a thumbs up. Kevin doesn't reply at all.
Weekly analytics team standup on Zoom. Three of us: me, Rafa, and Deepa. Rafa's working on a customer segmentation model. I mention the Fivetran issue. Deepa asks if we should set up an automated reconciliation check. I say yes, I'll build one. Add it to my list. The list has fourteen items on it. This one goes after "fix the mobile conversion funnel dashboard" and before "investigate the loyalty program anomaly from last week." The list grows faster than it shrinks. It always does.
Ad-hoc request from Layla in e-commerce marketing. She wants to know the average order value for customers who clicked a specific email campaign last Tuesday compared to customers who didn't receive it. This takes twenty minutes of SQL and five minutes to write up. The answer is $67.40 for clickers versus $52.10 for non-recipients, but I add a note that this isn't causal since the email was sent to high-value customers anyway. Layla says "great, thanks!" I don't think she read the note about causality. I'll see that $67.40 in a slide next week positioned as proof that the campaign "drove" higher spending.
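The comparison itself is just two group means; the causality note is the important part. A toy sketch, with order values invented so the averages happen to land on the figures above:

```python
def average_order_value(orders):
    """Mean order value, rounded to cents; None for an empty group."""
    return round(sum(orders) / len(orders), 2) if orders else None

# The clickers were high-value customers before the email went out,
# so a higher AOV shows the groups differ, not that the email caused
# the difference. (Values invented for illustration.)
clickers = [80.0, 60.0, 62.2]
non_recipients = [50.0, 55.0, 51.3]
```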
Lunch at my desk. Leftover pad see ew from last night. Read a dbt Slack thread about incremental models while eating. Someone is having the exact same Fivetran sync problem I just fixed. I type up my solution, then delete it because I realize I'd have to explain our entire warehouse architecture for the context to make sense. Eat the pad see ew instead.
Back to the backfill. Write a manual load script to pull the missing 2,600 records from Oracle and insert them into BigQuery. Test it in dev first. Row counts match. Revenue totals match to the penny. Push to production. Refresh the Looker dashboard. Home textiles now shows $4.2 million. Same as Kevin's number. Three hours of work to fix a number that was wrong for three weeks and that one person noticed.
Start building the automated reconciliation check Deepa asked for. Write a BigQuery scheduled query that compares daily row counts between the Oracle extract and the warehouse tables, by product category. If the delta exceeds 0.5%, it sends a Slack alert to our analytics channel. This is the kind of work I find satisfying: infrastructure that prevents future problems. Nobody will thank me for it. If it works, the sync gaps just silently get caught. If it doesn't work, nobody will know it existed.
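The check described above reduces to a per-category percentage delta against a threshold. A Python sketch of that comparison (the 0.5% threshold is from the text; the function name and input shapes are illustrative):

```python
def reconciliation_alerts(source_counts, warehouse_counts, threshold=0.005):
    """Compare one day's row counts per category between source and
    warehouse; return (category, delta) pairs where the relative gap
    exceeds the threshold. Inputs are dicts of category -> row count."""
    alerts = []
    for category, source_n in source_counts.items():
        if source_n == 0:
            continue  # skip empty categories to avoid dividing by zero
        warehouse_n = warehouse_counts.get(category, 0)
        delta = abs(source_n - warehouse_n) / source_n
        if delta > threshold:
            alerts.append((category, round(delta, 4)))
    return alerts
```

Wired into a scheduler, anything this returns becomes the Slack alert; on a normal day the list is empty, which is the point.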
Rafa pings me on Slack. "Hey, does the loyalty points table in BigQuery have a created_at or an updated_at?" I check. It has both but the updated_at hasn't been populated since October. Someone changed the Oracle schema and the Fivetran mapping didn't update. I tell Rafa. He says a word I won't write here. I add "fix loyalty points updated_at mapping" to the list. The list now has fifteen items.
Close the laptop. Walk to the living room. My roommate, Jen, asks how work was. I say, "I found 2,600 missing transactions and fixed a number that was wrong for three weeks." She says, "That sounds important." It was. Nobody will remember it happened by Friday.
Omar's Thursday
Board meeting is Monday. Which means today and tomorrow are board deck prep, which means my entire job for the next 48 hours is making numbers look correct and clear enough for eight people who will glance at them for ninety seconds each. Open the board deck template in Google Slides. Ten slides need updated data. Our CEO, Yara, sent me a list at 11 PM last night: "Can we refresh slides 3, 5, 7, 8, 9, 11, 12, 13, 14, and 16 with January actuals? And can you add a slide on the cohort retention thing we talked about?"
Start with slide 3, the MRR waterfall. Pull January numbers from our Stripe data in BigQuery. New MRR: $187,000. Expansion: $42,000. Contraction: $18,000. Churn: $31,000. Net new MRR: $180,000. These numbers are straightforward because Stripe is clean. I have a dbt model that does the categorization automatically. The hard part isn't getting the numbers. The hard part is that Yara will want to know why churn was $31,000 when the forecast was $22,000. So I have to dig into the churn.
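The waterfall arithmetic is a one-liner; the sketch below only makes the sign conventions explicit, using the figures from the slide:

```python
def net_new_mrr(new, expansion, contraction, churn):
    """Net new MRR: inflows minus outflows. Contraction and churn are
    passed as positive dollar amounts and subtracted here."""
    return new + expansion - contraction - churn

# January actuals from the slide: $187K + $42K - $18K - $31K = $180K
january_net_new = net_new_mrr(187_000, 42_000, 18_000, 31_000)
```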
The churn delta is one customer. A mid-market account called Brenten Financial that was paying $8,700 a month and canceled on January 23rd. I check the CRM. Our account manager, Dani, logged a note: "Brenten switching to in-house solution. Not competitive loss." So the churn spike is a single account making a build-vs-buy decision. I write a one-sentence annotation for the slide: "January churn elevated by single mid-market cancellation (Brenten Financial, $8.7K MRR, switched to in-house)." Yara will appreciate this because it turns a scary number into a story. Boards like stories more than numbers. Which is ironic because they asked for the numbers.
Stand-up with the product team. I'm not on the product team but I go to their standup because half the metrics they discuss come from dashboards I built and if I'm not there, someone will misread a chart and make a decision based on a number they don't understand. Today the PM, Kenji, says "activation rate dropped to 34% last week." I built that metric. I know that 34% is correct but misleading because we changed the activation definition two weeks ago to include a second event. Under the old definition, activation was 51%. I say this. Kenji says, "Oh. Can we see both?" I say I'll add a toggle. Add it to the list.
Back to the board deck. Slides 5 through 9 are the growth metrics: user signups, activation, revenue per user, LTV, and CAC. Each one requires a query, a check against last month's number, and a brief annotation explaining any change greater than 10%. This is routine work. I've done it for eight board meetings now. The queries are saved. The dbt models are stable. The actual analytical thinking required is about twenty minutes. The formatting, alignment, font sizing, and annotation writing takes three hours. I spend more time making the chart look right in Google Slides than I spend generating the data behind it.
Lunch from the taqueria on Mission. Carnitas burrito, extra salsa verde. Eat at my desk because the board deck won't finish itself. Our other analyst, Priti, is on PTO this week, which means I'm solo. Normally we'd split the deck. Today it's all me. I don't resent Priti for taking PTO. I resent the company for having two analysts when the workload clearly requires three.
The cohort retention slide. This is the one Yara added last night. She wants to show the board that our January cohort is retaining better than November and December. I pull the data. She's right, it is. January 30-day retention is 68% versus 61% for December and 57% for November. Good trend. But I look deeper and notice that January had a much smaller cohort because we paused paid acquisition for two weeks during a campaign restructure. Smaller cohort, self-selected, higher retention. It's not that the product got stickier. It's that we only acquired high-intent users for half the month.
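The composition effect suspected here is easy to reproduce with invented numbers: hold each segment's retention fixed and change only the mix, and the blended rate moves on its own.

```python
def blended_retention(segments):
    """segments: list of (cohort_size, retention_rate) pairs.
    Returns the size-weighted retention across segments."""
    total = sum(size for size, _ in segments)
    retained = sum(size * rate for size, rate in segments)
    return retained / total

# Invented rates: high-intent organic users retain at 70%, paid-acquired
# users at 50%, in both months. Only the acquisition mix changes.
december = blended_retention([(1000, 0.70), (1000, 0.50)])  # 50/50 mix
january = blended_retention([(800, 0.70), (200, 0.50)])     # paid paused
```

Blended retention rises from 60% to 66% even though neither segment changed, which is exactly what the footnote is for.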
I Slack Yara: "January cohort is smaller because of the paid pause. Retention is higher but it may be a composition effect, not a product improvement. Want me to note this?" She replies: "Add a small footnote but keep the chart. The trend line is real even if the cause is mixed." This is the negotiation that happens every board meeting. I want precision. Yara wants narrative. We meet in the middle, which is a footnote in 10-point font that no board member will read.
Slides 11 through 16 done. Revenue mix, cash burn, runway, headcount, and the competitive landscape slide that I don't actually update because it's marketing's job but they never do it so I add one new competitor that launched last week. Takes five minutes. Send the full deck draft to Yara. She'll review tonight and send me a list of changes at 11 PM, like she always does, and I'll make them tomorrow morning. That's the rhythm. I generate, she edits, I revise. Repeat until Monday at 8 AM when the deck gets sent to the board.
Start on the activation toggle Kenji asked for. Build a Metabase question with a parameter that switches between the old and new activation definitions. It takes forty minutes. Test it. Both definitions match the numbers I have in the board deck. Send Kenji the link. He says "perfect." He will use this toggle once, show it to his PM lead, and then forget it exists. I know this because I've built eleven toggles and filters for the product team and the usage logs show that nine of them have been accessed fewer than three times after the first week.
Walk home through the Mission. Call my sister, Leila, who is a middle school teacher in Sacramento. She asks what I did at work today. I say, "I put numbers in a slide deck and argued about a footnote." She says, "You sound tired." I am. But it's not the work that's tiring. It's the feeling that the work matters for about twelve hours, from when the deck is sent to when the board meeting ends, and then it doesn't matter at all until the next quarter when we do it again.
Tessa's Monday
Drive to the office because the hospital system still believes in offices. Twenty-two minutes from Fishtown. Park in the garage that costs $180 a month, which they don't reimburse because "parking is a personal expense." Badge into the analytics department on the fourth floor of the admin building. The fluorescent lights are the kind that make everyone look slightly unwell, which is fitting for a hospital.
Check the overnight data loads. Our Epic EHR data feeds into a SQL Server warehouse via nightly ETL jobs. Four jobs ran successfully. One failed: the surgical volume feed. Error log says a connection timeout at 2:14 AM. This happens about once a month. I restart the job manually. It runs for eleven minutes and completes. Check the row counts: 847 surgical cases for last week across our three hospitals. Compare to last Monday's load: 831. Close enough. No anomalies.
Weekly meeting with Dr. Pham, the CMO, and two other analysts on my team, Jackie and Rob. Dr. Pham wants a report on emergency department boarding times. Boarding time is how long a patient waits in the ED after being admitted before they actually get a bed on a floor. CMS tracks this. Our target is under four hours. Dr. Pham says the medical director at our Germantown campus, Dr. Okafor, is complaining that boarding times are up and wants data. I tell Dr. Pham I'll have a preliminary pull by end of day.
Start pulling ED boarding data from the warehouse. The boarding time calculation isn't in a standard report. I have to compute it from two timestamps: the admit decision time (when the ED physician decides to admit) and the bed assignment time (when the patient is actually placed in an inpatient bed). These timestamps live in two different tables in the Epic data model. The join is on the patient encounter ID, which should be straightforward, except that about 3% of encounters have duplicate admit decision records because the physician revised the admit order. I write a deduplication subquery that takes the earliest admit decision per encounter. This kind of thing is what healthcare data analysis actually is: ninety minutes of data cleaning disguised as a simple question.
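The two-timestamp join with duplicate admit decisions is the crux. A Python sketch of the same dedupe-then-join logic (encounter IDs and timestamps invented; in production this is a subquery in SQL Server):

```python
from datetime import datetime

def boarding_minutes(admit_decisions, bed_assignments):
    """admit_decisions: (encounter_id, timestamp) pairs, possibly with
    duplicates where the physician revised the admit order -- keep the
    earliest per encounter. bed_assignments: encounter_id -> timestamp.
    Returns encounter_id -> boarding time in minutes."""
    earliest = {}
    for encounter, ts in admit_decisions:
        if encounter not in earliest or ts < earliest[encounter]:
            earliest[encounter] = ts
    result = {}
    for encounter, decided in earliest.items():
        bed = bed_assignments.get(encounter)
        if bed is not None:  # some encounters never get an inpatient bed
            result[encounter] = (bed - decided).total_seconds() / 60
    return result
```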
Preliminary numbers. Mean boarding time at Germantown for the last 90 days: 5 hours 12 minutes. Our Center City campus: 3 hours 48 minutes. Northeast campus: 4 hours 31 minutes. Dr. Okafor is right. Germantown is the worst and it's over the CMS target by more than an hour. But I notice something. The Germantown boarding times are bimodal. Most patients board in under four hours. But there's a cluster of patients who board for eight to twelve hours, and they're pulling the mean up. I filter by admission service. The long boarders are almost all medicine patients, not surgical. Surgery patients get beds quickly. Medicine patients wait.
This makes sense clinically. Medicine beds are at higher occupancy than surgical beds at Germantown because they converted a med-surg unit to a surgical step-down unit six months ago. So they gained surgical capacity and lost medicine capacity. The boarding time problem is a bed allocation problem disguised as an ED throughput problem. Dr. Okafor is yelling at the ED. The answer is on the fourth floor.
Lunch in the cafeteria with Jackie. Turkey sandwich, apple, water. Jackie is working on a readmission model for heart failure patients. She's frustrated because the clinical data has so many missing fields. "43% of the hemoglobin A1c values are null," she says. I tell her to check whether the nulls correlate with insurance type, because in my experience, Medicaid patients are less likely to have outpatient labs in our system since they often get labs done at community health centers that don't feed into our Epic instance. She says she hadn't thought of that. She'll check after lunch. This is the kind of institutional knowledge that takes years to build and doesn't exist in any documentation.
Write up the boarding time analysis. Two pages. Executive summary at the top: "Germantown ED boarding times elevated due to medicine bed capacity reduction following surgical step-down conversion in September 2025." Three charts: boarding time by campus, boarding time distribution at Germantown showing the bimodal pattern, and boarding time by admission service at Germantown. I mark it draft and send it to Dr. Pham. He'll review it, ask me to soften the language about the bed conversion because it was his decision, and then present it to Dr. Okafor as though the insight was collaborative. This is fine. I don't need credit. I need Dr. Okafor to stop blaming the ED.
Ad-hoc from the quality team. They need thirty-day readmission rates by DRG for our annual quality report due to the state. I have a template for this. Run the query, which takes seven minutes because the readmission logic involves a self-join on the encounters table looking for any inpatient admission within thirty days of a prior discharge. 14,200 index admissions. 1,847 readmissions. 13.0% overall rate. Slightly better than last year's 13.4%. Break it down by DRG. Heart failure is the highest at 22.1%. COPD is second at 18.7%. Joint replacement is the lowest at 4.2%. None of this is surprising. These numbers are roughly the same every year. I format the report, triple-check the methodology note, and send it to quality.
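The self-join translates to: for each discharge, look for any later admission by the same patient within thirty days. A sketch with invented stays, one tuple per inpatient admission:

```python
from datetime import date, timedelta

def readmission_rate(stays, window_days=30):
    """stays: (patient_id, admit_date, discharge_date) tuples. Every
    stay is an index admission; it counts as readmitted if the same
    patient is admitted again within the window of its discharge."""
    by_patient = {}
    for pid, admit, discharge in stays:
        by_patient.setdefault(pid, []).append((admit, discharge))
    window = timedelta(days=window_days)
    index_count = 0
    readmitted = 0
    for patient_stays in by_patient.values():
        patient_stays.sort()
        for i, (_, discharge) in enumerate(patient_stays):
            index_count += 1
            # any later admission within the window marks this stay
            if any(timedelta(0) <= admit - discharge <= window
                   for admit, _ in patient_stays[i + 1:]):
                readmitted += 1
    return readmitted / index_count if index_count else 0.0
```

Real DRG-level logic carries exclusions (planned readmissions, transfers, observation stays) that this sketch ignores.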
Dr. Pham replies to the boarding time analysis. "Great work. Can you remove the sentence about the step-down conversion and instead say 'medicine bed capacity constraints'?" I change it. Same meaning, less blame. Send it back. He says thanks. I save the original version in my personal folder because I've learned to keep the unedited analysis separate from the version that goes upstairs.
Drive home. Twenty-eight minutes because of construction on Broad Street. Think about the boarding time analysis. Think about the fact that a bed allocation decision made by an executive six months ago is causing patients to wait five hours in an emergency department, and the executive asked me to remove the sentence that connects those two things. I'm not angry. This is how hospitals work. Everybody wants the data until the data points at them. My job is to find the truth and then help people present a version of it they can live with. Most days that feels like enough.