The automatic object detection.
The third stage of a MAIA job processes all images of a volume with a supervised object detection method. It uses the training proposals that were obtained in one of the three previous ways (novelty detection, existing annotations or knowledge transfer) to learn a model of the objects or regions that you marked as interesting. The object detection produces a set of "annotation candidates": image regions that the model considers interesting based on the training proposals you provided. When the object detection is finished, the MAIA job continues to the next stage, in which you can manually review the annotation candidates.
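The following sketch illustrates this train-then-detect flow under the assumption that the training proposals are available as bounding boxes per image. It uses torchvision's Faster R-CNN as a stand-in detector; the function names, the two-class setup and the score threshold are illustrative assumptions, not the actual MAIA/BIIGLE code.

```python
# Illustrative sketch only: train a detector on proposal boxes, then
# produce annotation candidates. Not the actual MAIA/BIIGLE implementation.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor


def build_detector(num_classes=2):
    # Two classes: background and "interesting object/region" (assumption).
    model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model


def train_on_proposals(model, loader, epochs=10, lr=0.005):
    # `loader` yields (images, targets); each target holds "boxes" [N, 4]
    # in (x1, y1, x2, y2) format and "labels" [N] (all 1 for "interesting").
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            losses = model(images, targets)  # dict of loss components
            loss = sum(losses.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model


@torch.no_grad()
def detect_candidates(model, images, score_threshold=0.5):
    # Returns one set of annotation candidates (boxes with scores) per image.
    model.eval()
    outputs = model(images)
    return [
        {"boxes": o["boxes"][o["scores"] >= score_threshold],
         "scores": o["scores"][o["scores"] >= score_threshold]}
        for o in outputs
    ]
```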
The object detection stage of a MAIA job can take many hours to complete. The exact runtime depends on the size of the dataset and on the capabilities of the computing hardware. In general, however, more training data results in a better object detector, so don't hold back annotated data just to reduce the training time.
In the original MAIA paper [1], Mask R-CNN was used for instance segmentation in this stage. However, this was not "real" instance segmentation, as the segmentation masks were converted back to bounding boxes. The implementation in BIIGLE uses Faster R-CNN for object detection, which directly produces bounding boxes (but is otherwise similar to Mask R-CNN).
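For illustration, converting a segmentation mask back to a bounding box amounts to taking the extent of the mask's foreground pixels. The sketch below shows one way to do this; the function name and the use of NumPy are assumptions, not code from the paper.

```python
# Hedged illustration: reduce a binary segmentation mask to a bounding box,
# as described above for the original MAIA pipeline.
import numpy as np


def mask_to_bbox(mask: np.ndarray):
    # `mask` is a 2D boolean array; returns (x1, y1, x2, y2) or None if empty.
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```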