This paper proposes a new visual tracking algorithm, which leverages the merits of both template matching approaches and classification models for long-term object detection and tracking. To this end, a regression network is learned offline to detect a set of target candidates through target template matching. To cope with target appearance variations in long-term scenarios, a target-aware feature fusion mechanism is also developed, giving rise to more effective template matching. Meanwhile, a verification network is trained online to better capture target appearance and identify the target from potential candidates. During online update, contaminated training samples can be filtered out through a monitoring module, alleviating model degeneration caused by error accumulation. The regression and verification networks operate in a cascaded manner, which allows tracking to be performed in a coarse-to-fine manner and enforces the discriminative power. To further address the target reappearance issues in long-term tracking, a learning-based switching scheme is proposed, which learns to switch the tracking mode between local and global search based on the tracking results. Extensive evaluations on long-term tracking in the wild have been conducted. We achieve state-of-the-art performance on the OxUvA long-term tracking dataset. Our submission based on the proposed method has also won the 1st place of the long-term tracking challenge in VOT-2018 competition.