In this paper, we present an effective deep learning approach for supervised detection and tracking of vocal tract (VT) contours in sequences of real-time magnetic resonance imaging (rtMRI) frames. We train a single-input, multiple-output deep temporal regression network (DTRN) to detect the VT contour and the separation boundary between different articulators. The DTRN learns the non-linear mapping from overlapping fixed-length sequences of rtMRI frames to the corresponding articulatory movements; the contour estimates from the overlapping sequences are then blended to define the detected VT contour. In a post-processing stage, the detected contour is refined using an appearance model to further improve detection accuracy.
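
As an illustration of the blending step, the following minimal sketch shows one way that per-window contour estimates could be combined into a single estimate per frame. It assumes a uniform average over all windows covering a frame; the blending weights are not specified here, so this choice is an assumption. The function name `blend_overlapping_estimates`, the `stride` parameter, and the array layout are likewise hypothetical.

```python
import numpy as np


def blend_overlapping_estimates(window_preds, stride=1):
    """Average contour estimates from overlapping windows, frame by frame.

    window_preds: array of shape (num_windows, window_len, num_points, 2),
        where window w covers frames [w * stride, w * stride + window_len).
    Returns an array of shape (num_frames, num_points, 2).

    Assumption: uniform averaging. A weighted blend (e.g. favoring the
    window center) would only change how `total` and `count` accumulate.
    """
    window_preds = np.asarray(window_preds, dtype=float)
    num_windows, window_len = window_preds.shape[:2]
    if stride < 1 or stride > window_len:
        raise ValueError("stride must satisfy 1 <= stride <= window_len")

    num_frames = (num_windows - 1) * stride + window_len
    total = np.zeros((num_frames,) + window_preds.shape[2:])
    count = np.zeros(num_frames)

    for w in range(num_windows):
        start = w * stride
        total[start:start + window_len] += window_preds[w]
        count[start:start + window_len] += 1

    # Reshape count so it broadcasts over the point and coordinate axes.
    return total / count.reshape((-1,) + (1,) * (total.ndim - 1))
```

With a stride of one frame, interior frames are covered by `window_len` windows and therefore receive the most smoothing, while the first and last frames of the sequence rely on a single estimate.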

