7:20 AM
I wake up to my phone buzzing. It's a PagerDuty alert from the churn prediction pipeline. The Airflow DAG that retrains the model nightly failed at 3:14 AM. I check the error log from bed. It's a Snowflake connection timeout. Not a code problem, an infrastructure problem. I tag it for our data engineer Kellen in Slack and close my eyes for another twenty minutes. This happens maybe twice a month. Kellen usually fixes it before I finish breakfast.
8:05 AM
Coffee. I use a French press because it's the only thing I can do in the morning that doesn't involve a screen. Five minutes of standing in my kitchen watching water darken. My roommate Jess, who's a nurse at UPMC, is already gone. Her shift started at 6. I used to think my job was intense and then Jess told me about her Tuesday and I never used the word "intense" about data science again.
8:25 AM
Open laptop. Kellen already fixed the pipeline. Retraining completed at 7:48 AM. The churn model is current. I pull up the model monitoring dashboard in Datadog and check the feature drift metrics. Everything within normal bounds. AUC is 0.81, which is where it's been for six weeks. I note the date and the metric in a tracking spreadsheet I keep. My manager Prashant doesn't require this, but I started doing it after a model degraded slowly over three months and nobody noticed until a VP asked why our churn predictions were wrong. Now I check every morning. Takes four minutes.
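If I ever turned the morning check into a script instead of a spreadsheet, it would look roughly like this. The file name and the alert floor are stand-ins I'm making up here, not anything we actually run:

```python
import csv
from datetime import date
from pathlib import Path

TRACKING_FILE = Path("churn_model_auc_log.csv")  # stand-in for my tracking spreadsheet

def log_daily_auc(auc: float, alert_floor: float = 0.78) -> None:
    """Append today's AUC; complain if it drops below a floor (the floor is made up)."""
    new_file = not TRACKING_FILE.exists()
    with TRACKING_FILE.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["date", "auc"])
        writer.writerow([date.today().isoformat(), auc])
    if auc < alert_floor:
        print(f"AUC {auc:.2f} is below {alert_floor:.2f}; go look at feature drift.")

log_daily_auc(0.81)
```

The point isn't the code, it's the habit: a slow three-month slide is invisible day to day unless something is writing the number down.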
8:40 AM
Standup. Seven people on the call; it takes eleven minutes. Prashant asks me where I am on the patient readmission model. I tell him I'm blocked on getting access to the ADT data from our hospital partner. The data sharing agreement was signed two weeks ago but the SFTP credentials haven't been set up yet. This is the third standup in a row where I've reported this same blocker. Prashant says he'll escalate. I believe him. I also know that "escalate" in a 120-person company means he sends a Slack message to someone who sends an email to someone at the hospital.
9:00 AM
Since I'm blocked on the readmission model, I pivot to the ad-hoc request queue. There are four requests in the Jira backlog. I pick the one from Elena, our head of customer success: "Can you pull the average time-to-value for customers onboarded in Q1 vs Q4 of last year?" Time-to-value is defined as the number of days between contract signing and the customer's first logged analysis in our platform. I know where this data lives because I've pulled it three times before. I open DataGrip and start writing the query.
9:35 AM
The query runs in 38 seconds, which is fast for this table. Q1 cohort: median 14 days. Q4 cohort: median 23 days. The Q4 number is higher because we onboarded a cluster of large enterprise customers in November who took longer to configure. I know this because I remember the onboarding calls. But Elena doesn't know that context. If I just send her "14 vs 23" she'll think Q4 onboarding got worse. So I segment by customer size: for SMB customers, it's 12 vs 13. For enterprise, it's 31 vs 34. For the November cluster specifically, it's 42. The story is "enterprise takes longer, and we had more enterprise in Q4," not "onboarding got worse." Writing this up in a Slack message that's clear without being condescending takes me about fifteen minutes. The query took 38 seconds. The explanation took fifteen minutes. That ratio is roughly representative of my job.
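The segmentation itself is the easy part. A rough pandas version, with made-up column names standing in for our actual schema:

```python
import pandas as pd

def time_to_value_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Median days from contract signing to first logged analysis,
    split by onboarding cohort and customer size so the enterprise mix shows up."""
    df = df.copy()
    df["days_to_value"] = (
        pd.to_datetime(df["first_analysis"]) - pd.to_datetime(df["contract_signed"])
    ).dt.days
    return (
        df.groupby(["onboarding_quarter", "customer_size"])["days_to_value"]
        .median()
        .unstack("customer_size")
    )
```

Something like `time_to_value_summary(onboarded)` would show the 12-vs-13 SMB and 31-vs-34 enterprise split instead of the misleading blended 14-vs-23.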
10:10 AM
Next ad-hoc: "What's the correlation between feature usage depth and renewal rate?" This one is more interesting. I pull the usage data from our product analytics warehouse and the renewal data from Salesforce. The join key is account_id, but the product analytics table uses a hashed version and Salesforce uses the raw ID. I have a mapping table but it hasn't been updated since February. I spend 25 minutes reconciling the two and find 14 accounts that are in Salesforce but not in the product analytics data, which means they have contracts but have never logged in. That's a finding worth flagging. I note it separately.
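The reconciliation is a plain left join through the mapping table plus an anti-join to catch the contracts with no usage at all. Roughly this, with hypothetical dataframe and column names:

```python
import pandas as pd

def reconcile(usage: pd.DataFrame, sf: pd.DataFrame, mapping: pd.DataFrame):
    """Join Salesforce accounts to product analytics through the mapping table,
    and surface contracts with no product-analytics rows at all."""
    sf_mapped = sf.merge(mapping, on="account_id", how="left")
    joined = sf_mapped.merge(usage, on="hashed_account_id", how="left", indicator=True)
    never_logged_in = joined.loc[joined["_merge"] == "left_only", "account_id"]
    matched = joined[joined["_merge"] == "both"].drop(columns="_merge")
    return matched, never_logged_in  # the second output is the finding worth flagging
```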
11:00 AM
I run the correlation analysis. Pearson r = 0.43 between feature usage depth (measured as distinct features used per month) and 12-month renewal probability. Moderate positive correlation, not surprising. I build a quick scatter plot in Python, add a trend line, and note the outliers: three accounts with high feature usage that churned anyway. I look into them. Two had leadership changes. One had a budget cut. Usage depth doesn't protect against organizational change. I include that caveat in my writeup.
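The analysis itself is a few lines. Something like this, with made-up column names for usage depth and renewal:

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import pearsonr

def usage_vs_renewal(df):
    """Pearson correlation plus a scatter with a least-squares trend line."""
    x = df["features_per_month"].to_numpy(dtype=float)
    y = df["renewed_12m"].to_numpy(dtype=float)
    r, p = pearsonr(x, y)

    slope, intercept = np.polyfit(x, y, 1)
    fig, ax = plt.subplots()
    ax.scatter(x, y, alpha=0.5)
    xs = np.linspace(x.min(), x.max(), 100)
    ax.plot(xs, slope * xs + intercept)
    ax.set_xlabel("distinct features used per month")
    ax.set_ylabel("12-month renewal")
    ax.set_title(f"Pearson r = {r:.2f} (p = {p:.3f})")
    return r, fig
```

The fifteen minutes go into the outliers and the caveat, not the function.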
11:45 AM
Lunch. I walk to a place two blocks away that does Korean rice bowls. I eat at a table by the window and read a blog post about causal inference that my grad school friend Yun sent me. It's about instrumental variables. I understand about 70% of it and save the rest for later. This is how I learn now. Not textbooks. Blog posts over lunch.
12:40 PM
Back at my desk. I have two hours of unscheduled time before my 2:45 meeting. This is rare and I protect it. I open my Jupyter notebook for the churn model v2 experiment. The current model uses logistic regression because it's interpretable and our clinical advisory board can audit the coefficients. But I want to test whether a gradient-boosted model would improve accuracy enough to justify the interpretability tradeoff. I've been running this experiment in 30-minute increments for two weeks. Today I tune the hyperparameters using Optuna. I set up 100 trials and let it run.
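The Optuna setup is nothing exotic. Roughly this, with a placeholder search space, sklearn's GBM standing in for whatever implementation we'd actually ship, and X_train/y_train assumed to already exist in the notebook:

```python
import optuna
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def objective(trial: optuna.Trial) -> float:
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 600),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 2, 6),
        "subsample": trial.suggest_float("subsample", 0.6, 1.0),
    }
    model = GradientBoostingClassifier(**params)
    # Cross-validated AUC is the thing Optuna maximizes.
    return cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_value, study.best_trial.params)
```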
1:15 PM
Optuna finishes. Best trial: AUC 0.86 vs the current model's 0.81. Five percentage points. That's meaningful. But the clinical advisory board meets quarterly, and Prashant warned me that they pushed back hard the last time someone proposed a non-interpretable model. The board chair is a physician named Dr. Lourdes who said, and I'm quoting from the meeting notes, "If I can't explain why the model flagged this patient, I won't use it." I need to think about how to present this. SHAP values might bridge the gap. I make a note to build SHAP explanations for the top 20 features and bring those to the next advisory meeting.
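The SHAP piece would be short. Something like this, assuming a fitted tree model and a feature matrix X; the point is a per-patient explanation Dr. Lourdes could actually look at, not the GBM internals:

```python
import shap

# `model` is the fitted gradient-boosted candidate, X the feature matrix.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global view: top 20 features by mean absolute contribution.
shap.summary_plot(shap_values, X, max_display=20)

# Local view: why the model flagged one specific patient.
i = 0  # row index of a flagged case
shap.force_plot(explainer.expected_value, shap_values[i], X.iloc[i])
```

Whether a SHAP plot counts as "explaining why the model flagged this patient" is the real question for the board, not the code.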
2:00 PM
Slack from Elena: "That usage analysis was really helpful, thank you. Quick follow-up: can you break it down by industry vertical?" I can. It will take about 45 minutes because the industry tags are in yet another table that needs reconciliation. I tell her I'll have it by end of day tomorrow. She says "perfect." The follow-up request is always where the real time goes.
2:45 PM
Weekly product sync. Twelve people in the room. The product manager, two engineers, the design lead, customer success, and me. I present the churn model's latest performance metrics: precision 0.74, recall 0.69 at the current threshold. The product manager asks if we can lower the threshold to catch more at-risk accounts. I explain that lowering the threshold would increase recall to 0.82 but drop precision to 0.58, which means 42% of the accounts flagged as at-risk would actually be fine, and customer success doesn't have the bandwidth to follow up on that many false positives. This is the conversation I have every three weeks. The answer is always the same. But each time someone new in the room hears it for the first time, so I explain it again.
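The tradeoff I keep re-explaining is easy to show directly from the model's scores. A rough sketch, with made-up thresholds and assuming y_true and y_score are already on hand:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

def tradeoff_at(threshold, y_true, y_score):
    """Precision/recall at a given score threshold, plus how many accounts get flagged."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}  "
          f"flagged={int(y_pred.sum())}  fine-but-flagged≈{1 - p:.0%}")

for t in (0.5, 0.4, 0.3):
    tradeoff_at(t, y_true, y_score)
```

Lower the threshold and the flagged list grows faster than customer success's capacity to work it. That's the whole conversation.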
3:30 PM
Meeting ends. I spend 40 minutes documenting the hyperparameter experiment in Notion. Model version, dataset hash, feature list, best parameters, performance metrics, next steps. Prashant reads these. Nobody else does. But six months from now when someone asks "did we try a GBM for churn," the answer will be findable.
4:15 PM
Start on Elena's industry breakdown. I pull the vertical tags from the CRM export, join to the usage data, and realize that 23% of accounts don't have an industry tag. I message Elena about this. She says "tag the untagged ones based on their website." I do not want to do this. But I also know that presenting results with a 23% "unknown" category will undermine the analysis. I pull up 15 of the untagged accounts, look at their websites, and assign verticals. It takes 35 minutes. This is the least scientifically rigorous thing I do on a regular basis, and it's also probably the most practically useful.
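Finding the gap is the only part of this that's actually code. Roughly, with a hypothetical accounts dataframe from the CRM export:

```python
import pandas as pd

def untagged_sample(accounts: pd.DataFrame, n: int = 15) -> pd.DataFrame:
    """Share of accounts with no industry tag, plus a sample to tag by hand."""
    missing = accounts["industry_vertical"].isna() | (accounts["industry_vertical"] == "")
    print(f"{missing.mean():.0%} of accounts have no industry tag")
    return accounts.loc[missing, ["account_id", "website"]].head(n)
```

Everything after that is me squinting at websites.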
5:10 PM
I close my laptop. The readmission model is still blocked. The churn v2 experiment needs a presentation strategy. Elena's follow-up is half done. The ad-hoc queue still has two items. I made progress on exactly none of the things I was supposed to prioritize this sprint and meaningful progress on three things that weren't on the sprint board. This is a normal Wednesday.
6:30 PM
Jess gets home. She asks how my day was. I say "fine, I explained the same precision-recall tradeoff for the fourth time and manually tagged 15 companies based on their websites." She says "I had a patient code in the elevator." We eat pad see ew on the couch and watch a renovation show. Neither of us talks about work again.