An ML competition for NTNU TDT4173
Out of Append's engagement with Hydro Aluminium, we wanted more eyes on a real forecasting problem from the smelter line. I designed and led a Kaggle-style competition for NTNU's TDT4173 ("Modern Machine Learning in Practice") course. Students built models, Hydro engineers benchmarked the results, and the metric measured the thing the business actually cared about.
The forecasting task
Each raw material has an rm_id and a stream of historical deliveries through the end of 2024. Given any end date between 1 January and 31 May 2025, the model has to predict the cumulative incoming weight of each raw material from 1 January up to that date.
The dataset was split into a required core (receivals.csv, sales_orders.csv) and optional extensions (material metadata, suppliers, transportation). Anonymised, but real.
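The target quantity is easiest to see on a toy slice of deliveries. A minimal sketch, assuming a per-material mapping from delivery date to weight — the material IDs, dates, and weights below are invented for illustration, not taken from the real dataset:

```python
from datetime import date

# Hypothetical mini-slice of receivals.csv: per-material delivery
# dates and weights in tonnes (all values here are made up).
receivals = {
    "rm_17": [(date(2025, 1, 6), 120.0), (date(2025, 2, 3), 80.0), (date(2025, 4, 20), 60.0)],
    "rm_42": [(date(2025, 1, 15), 300.0), (date(2025, 3, 1), 150.0)],
}

def cumulative_weight(deliveries, end):
    """Sum delivered weight from 1 January 2025 up to and including `end`."""
    start = date(2025, 1, 1)
    return sum(w for d, w in deliveries if start <= d <= end)

# Ground truth per material for an example end date of 31 March 2025:
# rm_17 counts its January and February deliveries (April falls outside),
# rm_42 counts both of its deliveries.
targets = {rm: cumulative_weight(rows, date(2025, 3, 31)) for rm, rows in receivals.items()}
# → {"rm_17": 200.0, "rm_42": 450.0}
```

A model is scored on exactly these per-material cumulative totals, one per (rm_id, end date) pair.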
The metric: asymmetric quantile loss
Per-material loss, with overestimation weighted at 0.8:
QuantileLoss_0.8(F, A) = max( 0.2·(A − F), 0.8·(F − A) )
averaged across all materials. The asymmetry is deliberate: underestimating available raw material is cheap (the smelter just runs with what is on hand); overestimating is expensive (you plan a smelt that can't be completed). Penalising overestimation 4× harder than underestimation pushes models toward cautious forecasts, which is what the operations team actually wants.
The metric was implemented and unit-tested in a Jupyter notebook so participants could reproduce their leaderboard score locally.
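A minimal sketch of such a scorer, weighting overestimation at 0.8 and underestimation at 0.2 in line with the asymmetry described above — the function names and the dict-based interface are illustrative, not the notebook's actual API:

```python
def quantile_loss(forecast: float, actual: float) -> float:
    """Asymmetric pinball-style loss: overestimation is penalised
    four times harder than underestimation (weights 0.8 vs 0.2)."""
    return max(0.2 * (actual - forecast), 0.8 * (forecast - actual))

def leaderboard_score(forecasts: dict, actuals: dict) -> float:
    """Average the per-material loss across all materials."""
    return sum(quantile_loss(forecasts[rm], actuals[rm]) for rm in actuals) / len(actuals)

# Overestimating by 10 t costs 4x what underestimating by 10 t does:
quantile_loss(110.0, 100.0)  # → 8.0
quantile_loss(90.0, 100.0)   # → 2.0
```

The 4:1 ratio falls straight out of the two weights (0.8 / 0.2), which is why a cautious model that slightly undershoots consistently beats an optimistic one on this leaderboard.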
What I built and led
- Brief. A course-ready task description with motivation, data definitions, and the metric, written for both NTNU students and Hydro engineers. Approved on both sides.
- Dataset. A curated slice of real operational data with a held-out test set the participants never saw.
- Scoring code. The quantile metric, in Python, packaged so anyone could compute their own leaderboard score before submitting.
- Rules & submissions. Who could submit what, how often, how leakage was prevented.
- Coordination with Hydro's domain experts, NTNU's course staff, and Append's team. The technical work was the smaller half.
What it taught me
Designing a competition is a useful inversion of doing one. You learn very fast which parts of an ML brief are ambiguous, because the ambiguity comes back as a hundred student questions in the first week. And the scoring metric is the whole project. If it does not reward what you actually want, the leaderboard will Goodhart its way to a wrong answer in days.
Course material
The brief handed to students and the data-table documentation are both in the repo: