Data Engineering
Posts about Data Engineering
Step-by-Step Tutorial: Writing Python Extensions in Rust With PyO3
I hit a massive performance wall last Tuesday. I was tasked with parsing a 50GB dataset of nested JSON logs for a cybersecurity client doing malware.
Marimo vs Jupyter Notebook: Which Python Environment is Best?
I just spent three hours debugging a machine learning pipeline, only to realize I had executed cell 14 before cell 12.
How to Process Massive CSV Files With DuckDB and Python Fast
A 60GB CSV file lands in your AWS S3 bucket. Your data pipeline triggers, spins up a standard EC2 instance, and attempts to run pandas.read_csv() .
I Dropped Pandas for Polars. Here’s What Broke.
It was 11 PM on a Thursday last month. My data pipeline running on a t3.xlarge AWS instance crashed for the fourth time that week. The culprit?
Mastering Gil Removal: Advanced Techniques and Best Practices for Modern Developers
Introduction to Gil Removal In today’s rapidly evolving technological landscape, GIL removal has emerged as a critical skill for developers seeking to.
FastAPI on the Edge: Running Local LLMs on a Pi
I’m officially sick of renting $3/hour cloud GPUs just to parse text. For the last few weeks, I’ve been moving my background Learn about FastAPI news.
I Turned on the Python 3.14 JIT in Production (Well, Staging). Here’s the Truth.
Well, I have to admit, I was a bit skeptical about this whole Python JIT thing at first. In my experience, “free performance” usually comes …
TF 2.18 & Keras: Real-World Performance Review
I finally bit the bullet last week. After ignoring the notification icons for two months, I upgraded our main training pipeline to TensorFlow 2.18.
Taipy: Why I Finally Ditched Streamlit for Production Apps
Well, I have a confession to make. For the last five years, I’ve been utterly hooked on “script-to-web” tools.
LlamaCloud’s Multimodal RAG: Finally, No More Glue Code
Well, that’s not entirely accurate — I’ve actually been playing around with LlamaCloud for a while now. You know the drill.
