Back to timeline
Spring 2026

A vision transformer that knows its salmon

The ML half of a five-person NTNU project on monitoring invasive pink salmon (pukkellaks) in Norwegian fjord straits. Underwater cameras watch fish pass through a 1×2-metre observation chamber; a small CNN detects them locally on a Raspberry Pi; a DINOv2 vision transformer, fine-tuned with LoRA, classifies species and estimates length on the server. I shared hardware responsibility for the camera rig and built the server-side classifier.

Why pink salmon

Norwegian wild salmon (laks) is classified as near threatened on the national red list. The fastest-growing threat is pink salmon, an invasive Pacific species that spawns earlier and more aggressively, displaces wild salmon and sea trout from their best spawning grounds, and then dies en masse (the species has a fixed two-year life cycle), polluting the watercourse with rotting fish. In 2025 control efforts ran in 63 Norwegian rivers.

Existing fish-monitoring tech is mostly designed for inside salmon farms or for river-mouth traps. Nobody is watching what comes into the fjords before the fish move upstream. That is the gap our project tries to address.

Hand-drawn top-down sketch of a river mouth: a green guide-net ('Ledenett') angles out from the land, funnelling fish into a black observation chamber. Fish drawings show the intended swim direction; a blue arrow labelled 'Svømmeretning' marks the natural current.
Concept sketch. A guide-net funnels fish through the observation chamber on their way upstream.

The system, briefly

Isometric CAD render of the chamber: an aluminium-profile frame, approximately 2 × 1 × 2 metres, with labelled numbers for the camera mounts, tarp back wall, depth-sensor positions, and overhead detail camera.
Chamber CAD. Aluminium-profile frame, white tarp back wall, camera mounts at three depths.
System dataflow diagram in Norwegian. Inputs 'Fisk' (fish) and 'Vann' (water) feed into 'Fanger video' and 'Innhenter data' nodes via cameras and a temperature/salinity sensor. The MPU forwards data to a 'Server' which performs 'Gjenkjenne lengde og art' (recognise length and species), then forwards to a 'Nettside' (website) and a 'Resultat' (result) node.
End-to-end dataflow. Cameras and sensors → edge MPU → server → web UI.

The two-stage ML pipeline

The interesting design choice was splitting detection from classification:

Stage 1, on the edge: the small CNN on the Raspberry Pi detects fish in the camera feed and sends candidate frames to the server. Stage 2, on the server: the LoRA-fine-tuned DINOv2 classifies each detection by species, with an explicit "no fish" class.

The "no fish" class is what filters the false positives (branches, waves, shadows) that the small local model lets through. Tracking IDs let the server pool multiple frames of the same fish during its passage and vote, which makes the final classification far more robust than any single frame.
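The report doesn't pin down the exact voting rule, but the idea can be sketched in a few lines. This is a minimal illustration, not the project's actual code; `classify_track` and the confidence-weighted vote are assumptions:

```python
from collections import defaultdict

def classify_track(frame_predictions):
    """Pool per-frame (species, confidence) predictions for one tracked fish.

    Frames predicted as "no fish" are dropped before voting, so spurious
    detections (branches, waves, shadows) never reach the final label.
    """
    votes = [(species, conf) for species, conf in frame_predictions
             if species != "no fish"]
    if not votes:
        return "no fish", 0.0

    # Confidence-weighted vote across the whole passage.
    weights = defaultdict(float)
    for species, conf in votes:
        weights[species] += conf
    best = max(weights, key=weights.get)
    mean_conf = weights[best] / sum(1 for s, _ in votes if s == best)
    return best, mean_conf

track = [("pink salmon", 0.91), ("no fish", 0.55),
         ("pink salmon", 0.88), ("atlantic salmon", 0.52)]
print(classify_track(track))  # confidence-weighted winner across the track
```

One shaky frame (here the low-confidence "atlantic salmon" guess) gets outvoted by the rest of the passage, which is exactly why pooling beats per-frame classification.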

A frame from an underwater camera inside the chamber. A dark salmon-shaped fish is in the lower-right, with a cyan YOLO bounding box around it labelled 'fish'. Tick-marks along the chamber wall act as a length reference.
Local YOLO output. The small detector running on the Pi, drawing a box around a fish in the chamber.
Demo of the full pipeline using a desk webcam: a person walks past holding paper printouts of two salmon. The web UI overlays bounding boxes on each printout, assigns 'FISK ID: 1' and 'FISK ID: 2', and reports estimated lengths (13.7 cm and 39.0 cm) and species confidence around 96–98%.
End-to-end demo. A test rig with paper-printout fish; the server returns IDs, estimated lengths, and species confidence.

The LoRA part

DINOv2 is a strong general-purpose ViT, but it has not seen pink salmon in low light through an acrylic dome. Full fine-tuning would update hundreds of millions of parameters; that is expensive and overkill for our dataset size.

LoRA (Low-Rank Adaptation) assumes the changes you actually need to make during adaptation have low intrinsic rank. For each frozen weight matrix W ∈ ℝ^{d×k}, you inject a trainable low-rank update ΔW = B·A, where A ∈ ℝ^{r×k} and B ∈ ℝ^{d×r} with r ≪ min(d, k). The original weights stay put; only the adapters are trained.
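As a concrete sketch (toy dimensions in plain NumPy, not our training code; the values of d, k, r are illustrative, though the zero-init of B is standard LoRA practice):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 16, 12, 2          # toy dims; a real ViT layer is e.g. 768x768

W = rng.normal(size=(d, k))              # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01       # trainable, small random init
B = np.zeros((d, r))                     # trainable, zero init => dW = 0 at start

def forward(x):
    # y = (W + B @ A) x : frozen base path plus low-rank adapter path
    return W @ x + B @ (A @ x)

x = rng.normal(size=(k,))
# At initialisation the adapter contributes nothing, so the model
# starts out behaving exactly like the pretrained backbone:
assert np.allclose(forward(x), W @ x)

# Trainable-parameter comparison for this one matrix:
full = d * k                 # full fine-tune: d*k params
lora = r * (d + k)           # LoRA adapters: r*(d+k) params
print(full, lora)
```

At r = 8 on a 768×768 attention matrix the same arithmetic gives roughly 590k parameters for a full update versus about 12k for the adapters, which is where the "hours instead of days" comes from.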

In practice this gave us a model that handled our domain well, trained in hours instead of days, and didn't need a massive amount of labelled data to land somewhere useful.

My slice of the project

Five-person team. From the report's work-allocation section, my share was the server-side species classifier (DINOv2 fine-tuned with LoRA) plus shared hardware responsibility for the camera rig.

Teammates owned sensors, detection and length estimation, frontend, and backend respectively. Everyone wrote.

Writeup

Full Norwegian report: system design, hardware bill of materials, ML method, validation, all of it.
