So You Want to Be a Data Scientist? Read This First
Lessons from the Trenches: What They Don’t Tell You in Courses (or Anywhere Else)
So, you want to be a data scientist? Or something in the data world? Just like… well, everybody else, right?
Well, who doesn’t? And let’s be real, I thought I did too (and to some extent, still do).
But first, ask yourself - why?
You don’t know if you don’t try - make your own mistakes and decide for yourself!
Or maybe you already are one and find yourself thinking, “Why didn’t anyone warn me about this?” Or better still, “what was I thinking?”
I’ve been there — straight out of a PhD, I thought there would be more “science” in data science than there really is. I thought data science, or at least a job in ML and AI, would be largely about novel algorithms, AI breakthroughs, and revolutionizing industries.
After five years in the field, I now know that half the battle is just acquiring or finding data, and then wrangling the messy data. Then comes the fun part: convincing stakeholders that you have added enough “business value”. Most stakeholders also need their ego stroked, so good luck convincing them that AI cannot replace them.
The other half? Convincing yourself that AI won’t replace you, either. What this actually means: staying on the hamster wheel, constantly keeping up with updates and “productivity hacks”. It is basically an endless race against time and the hustlers out there. It can be a lot of fun in the beginning (to be completely honest), but it isn’t for the fainthearted or the easily discouraged.
I started working with statistics and analytics during my PhD, and discovered even then that data science (the discipline, not the job) isn’t magic: it’s about having the right data, not just having more of it. Stepping into industry, I learned that the real challenge wasn’t just machine learning or statistics; it was figuring out what stakeholders actually wanted.
Here is some good news as a filler: they still don’t know what they want, which is why data jobs still exist, Gen AI or not!
So why am I writing this today? This is meant as a note and pointers to my younger self, but also to all of you who might need not just a reality check but actionable insights to prepare you for the journey ahead.
You will suffer anyway, you can only choose how.
After four years in academia and five in industry, here are 10 brutally honest lessons I wish someone had told me earlier.
1️⃣ Domain Knowledge >> Fancy Algorithms
If I had twenty EUR cents for every time somebody told me “focus on the data science fundamentals and you can work in any industry”, I would be a millionaire today. There is admittedly some (if little) truth to these statements, but for me, at least, they come with a huge caveat.
Firstly, you need to be interested (enough) in the domain to educate yourself about it thoroughly enough to see (and show) how the data science you do could possibly add value (for scientific disciplines, this can be an uphill climb). Secondly, and more importantly, data skills are transferable only if you allow them to be. Context switching and being able to zoom in and out of problems are just as important as skills.
Sure - good for you if you can build a neural network in your sleep. With AI-assisted coding and ChatGPT, the world of coding has changed anyway. In any case, if you don’t understand the industry you’re working in, none of it matters. Why? Here’s the hard truth: nobody cares.
No employer or stakeholder cares about how good a data scientist you are; they need to know how good you can be for them. And how you can help them make money while they’re paying you a meagre salary.
The best data scientists aren’t just good at algorithms, analysis, and coding: they understand the business problem, the background, and the context inside out. That’s what separates an impactful data professional with an analytical mindset from just another person (vibe) coding away through their day.
The first job I had was the hardest to find because I competed with the general market for a generalist role. And a PhD with a science degree is not going to beat the general market - definitely not right now. What I learnt the hard way: niche down, stand out.
TLDR: Consider building data science skills and a portfolio aligned with a sector and industry that excites you (or you have a background in). It will become exponentially easier for you to stand out.
✔ Tip: Spend time with domain experts, ask questions, and learn how the business operates. Understand bottlenecks and inform yourself about the industry and competitors. Your data science know-how will instantly become more valuable.
2️⃣ Welcome to Data Cleaning Hell: Enjoy your stay
Dreaming of cutting-edge AI? Guess what: I can only speak for the jobs I have witnessed or been in, but most of your time will be spent chasing data, followed by cleaning up the messy, inconsistent, and incomplete data that you have just been blessed with.
If only I had more data, everything would work. Really? You don’t say.
Being able to evaluate the data that is available against what an application in a niche field needs in order to work is what makes or breaks the show. And guess how we’re going to do that?
Remember the domain knowledge I kept harping on about? This is where it can pay dividends. One has to be able to analyze data requirements and access custom data to add business impact.
When the world digs for gold, I will sell shovels.
Once you know the data you need - and you have it - what’s next? Clean it and clean it fast. It’s the unsexy, unspoken reality of data science, but mastering data wrangling is what makes or breaks your projects.
✔ Tip: Get really, really good at Pandas, SQL, and regex. They will save your life more times than deep learning or Gen AI (insert suitable alternative if you can’t relate) ever will.
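To make the Pandas-plus-regex point concrete, here is a minimal sketch of the kind of cleanup I mean. Everything in it is made up for illustration: the column names, the sample rows, and the helper functions `parse_price` and `clean` are hypothetical, and the price pattern deliberately ignores thousands separators.

```python
import re

import pandas as pd

# Hypothetical raw export: inconsistent casing, stray whitespace,
# mixed price formats, a duplicate row, and a missing customer.
raw = pd.DataFrame(
    {
        "customer": ["  Alice ", "BOB", "bob", "Carla", None],
        "price": ["EUR 12.50", "12,50 €", "12,50 €", "n/a", "7"],
    }
)

# First run of digits, optionally followed by a decimal point or comma.
# Assumption: no thousands separators such as "1.234,56".
PRICE_PATTERN = re.compile(r"(\d+(?:[.,]\d+)?)")


def parse_price(value):
    """Extract the first number from a messy price string, or None."""
    if not isinstance(value, str):
        return None
    match = PRICE_PATTERN.search(value)
    if match is None:
        return None
    # Treat a decimal comma as a decimal point.
    return float(match.group(1).replace(",", "."))


def clean(df):
    """Return a cleaned copy; the input frame is left untouched."""
    out = df.copy()
    # Normalise names: trim whitespace and use consistent casing.
    out["customer"] = out["customer"].str.strip().str.title()
    out["price"] = out["price"].map(parse_price)
    # Drop rows with no customer, then exact duplicates.
    out = out.dropna(subset=["customer"]).drop_duplicates()
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Note that the "n/a" price survives as a missing value rather than being silently dropped. Whether to drop, impute, or chase the source for it is a domain decision, not a coding one.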
3️⃣ If You Can’t Explain It, It Doesn’t Matter
You’ve built a “state-of-the-art” AI model with 99.9% accuracy? Cool. But if you can’t explain it in simple terms to a non-technical audience, it’s useless.
Well, worse than useless actually - what happens to all the hours and money you burnt up? If you can’t explain to the people with money what you’ve done and why they need it for their business (like yesterday), it’s not happening.
“Data storytelling” is a term that has always caught my fancy for this reason: the individual narratives created around the data science we do are one of the few truly unique ways we can still be “human” today. Don’t let that chance slip away!
If you can’t explain it simply, you don’t understand it well enough.
The best data scientists don’t just crunch numbers and hack away at a computer—they tell compelling data stories, drive decisions and turn heads in business discussions. Being able to deliver that soft punch with data insights will command respect in a way that working with the next new tool can’t.
✔ Tip: Practice explaining your insights to a non-tech friend. If they don’t get it, simplify it further.
4️⃣ Garbage In, Garbage Out: Not All Data is Good Data
Okay, this one is self-explanatory. Even the best data science, carried out on incorrect, inaccurate, or biased data, leads to questionable conclusions.
As already established, data science is the process - a means to an end. It is relatively useless without the end in sight - it is the insight, the application and the value add at the end that justifies the effort.
With Gen AI, one can easily wreak havoc on a codebase and data architecture without checking details carefully. Remember the good old days when we would check for missing values and data completeness, and carry out rigorous exploratory data analysis, before writing a line of code? Turns out these lessons aren’t so useless after all.
If I had to summarize my learnings about data quality and preliminary checks into one line, it would be: don’t jump the gun.
Bias, missing values, duplicate records, and tampered data (among many other issues) lead to one inevitable outcome: bad data results in bad models.
If you don’t check your data quality, even the most sophisticated AI models and coding assistants will produce garbage results. And guess who will be held accountable?
✔ Tip: Always ask “Where did this data come from?” before trusting it. Make sense of the data and think from the user’s point of view. Your future self will thank you.
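The preliminary checks mentioned above (missing values, duplicates, completeness) take only a few lines in Pandas. This is a rough sketch rather than a complete audit: the `quality_report` helper and the sample sensor data are hypothetical, and checks for bias or tampering need domain knowledge that no generic function can supply.

```python
import pandas as pd


def quality_report(df):
    """Summarise basic data-quality signals before any modelling."""
    return {
        "rows": len(df),
        # Rows that repeat an earlier row exactly.
        "duplicate_rows": int(df.duplicated().sum()),
        # Share of missing values per column, as a fraction between 0 and 1.
        "missing_share": df.isna().mean().round(3).to_dict(),
        # Columns with at most one distinct value carry no information.
        "constant_columns": [
            col for col in df.columns if df[col].nunique(dropna=True) <= 1
        ],
    }


# Hypothetical sample with a duplicate row, gaps, and a constant column.
sample = pd.DataFrame(
    {
        "sensor_id": [1, 2, 2, 3],
        "reading": [0.5, None, None, 0.7],
        "unit": ["mV", "mV", "mV", "mV"],
    }
)

report = quality_report(sample)
print(report)
```

Running something like this before the first model (or the first AI-generated pipeline) is cheap, and it gives you concrete numbers to take back to whoever owns the data.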
5️⃣ Experiment Like a Scientist
With a scientific background, this is one thing that came to me naturally - to put the “science” back into data science. Corporate speak and the Gen AI wave would have you think data science is a bunch of scripted rules that can be executed in order (and by just about anybody).
As it turns out, not so much. Not with the kind of data science that will give you employment today anyway. The only way to remain valuable is to be able to explore the unknown, and that means inculcating curiosity.
Data science is part art, part science—meaning there’s no single best approach.
The best insights don’t have to come just from reading research papers—they come from tweaking, testing, and iterating. The “fail fast, learn fast” culture is predominant in AI and data science today and as such, it is one to be embraced.
✔ Tip: Don’t be afraid to fail fast and iterate—your best ideas might come from unexpected places.
6️⃣ Not Learning? You’re Falling Behind
Time to wake up and smell the coffee: working in tech today, especially in AI, is no bed of roses. If the job feels too comfortable, I’d say it won’t last for too long. How do I know? I was laid off from my first job because I was too naive to see the danger signs and stayed in my artificially constructed comfort zone.
Ever since ChatGPT joined the party, it has been the best of times and the worst of times for data science and AI: there is no shortage of new developments to home in on and upskill in. Staying stagnant is equivalent to going backward, and very fast at that.
My biggest regret: not working on enough pet projects, especially in NLP and GenAI. Learning by doing is massively underrated. I am here to change that perception.
AI and data science evolve faster than GPUs go out of stock. If you’re not continuously learning, you’ll quickly become outdated. Pick a niche aligned with your interests, but make sure you are at the top of your game. Fastest way to fail? Try to be good at everything. Guess how I know?
✔ Tip: Follow industry blogs, take online courses, carry out new pet projects and engage with the data science community on LinkedIn/X.
7️⃣ Networking Is a Cheat Code for Your Career
You can be the best data scientist in the world, but if no one knows you, it won’t matter. As an introvert straight out of academia, I took my time to learn this as well. And I am no expert now but the awkwardness wears off with time.
And one realizes that the fastest way to learn is by interacting with experts in the field.
Who you know is more important than what you know.
The best job opportunities, collaborations, and insights come from building relationships.
Your network is your net worth.
✔ Tip: Go to meetups, join data science Slack groups, and engage on LinkedIn. Your network is your superpower.
8️⃣ Simplicity Wins: Don’t Overcomplicate Things
Remember Occam’s razor? The tendency to overcomplicate things and make them sound more complex than they are is irresistible, I know, especially when there is little output to show.
In data science, it is important to take a practical and pragmatic approach that can stand the test of time. Several sub-fields of data science, chemometrics in particular, still use classical machine learning and statistical methods. And probably for good reason.
A model doesn’t have to be complex to be effective. In fact, simpler models are often:
✅ Faster to deploy
✅ Easier to interpret
✅ More reliable
✔ Tip: If a logistic regression works just as well as a deep learning model, go with the simpler option.
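One way to act on that tip is to benchmark the simple model first and only reach for something heavier if it clearly wins. The sketch below assumes scikit-learn is available and uses a synthetic dataset, a small `MLPClassifier` as a stand-in for a "deep learning model", and a made-up `TOLERANCE` for what counts as "just as well"; none of these choices come from a real project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real dataset (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "small_neural_net": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=1000, random_state=0),
    ),
}

# Mean 5-fold cross-validated accuracy per model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Hypothetical threshold: the complex model must beat the simple one
# by more than this margin to justify its extra cost.
TOLERANCE = 0.01
simple = scores["logistic_regression"]
complex_ = scores["small_neural_net"]
choice = "logistic_regression" if simple >= complex_ - TOLERANCE else "small_neural_net"
print(scores, "->", choice)
```

The comparison makes the trade-off explicit: the simpler model is the default, and the complex one has to earn its place.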
9️⃣ Document Everything: Your Future Self Will Thank You
Ever spent hours debugging only to realize you forgot what you did last week?
Documentation isn’t just an annoying chore—it’s a lifesaver.
✔ Tip: Keep clear, structured notes on your experiments, assumptions, and findings. You’ll thank yourself later.
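Structured notes do not need a fancy tool. As a minimal sketch, assuming a plain JSON Lines file is enough for your team, something like this keeps experiments, assumptions, and findings in one place. The function names, field names, and the example entry are all hypothetical.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_experiment(log_path, name, assumptions, params, findings):
    """Append one experiment record as a JSON line and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "name": name,
        "assumptions": assumptions,
        "params": params,
        "findings": findings,
    }
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


def read_log(log_path):
    """Load every record back, oldest first."""
    with Path(log_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Usage with a temporary file; in practice the log would live in the repo.
log_file = Path(tempfile.mkdtemp()) / "experiments.jsonl"
log_experiment(
    log_file,
    name="baseline-logreg",
    assumptions=["rows with missing target dropped"],
    params={"C": 1.0},
    findings="Hypothetical note: baseline trained, no tuning yet.",
)
history = read_log(log_file)
print(history[0]["name"])
```

Because each record is appended rather than overwritten, the file doubles as a timeline of what you tried and why.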
🔟 Failure Isn’t the End—It’s the Process
Not every approach will work. Not every algorithm will help you crack the code.
Not every project will succeed - and that’s okay.
The best data scientists fail, learn, and adapt—because every failure teaches you something new.
✔ Tip: Instead of fearing failure, ask yourself: "What did I learn from this?"
Final Thoughts: Data Science is a Marathon, Not a Sprint
Becoming a great data scientist isn’t just about technical skills—it’s about thinking strategically, adapting, and bridging AI with real-world problems.
If you’re in this for the long haul, build strong fundamentals, keep learning, and stay curious.
👉 If you found this useful, subscribe for more insights on AI, strategy, and career growth!