Optimization over a set of probability measures is the fundamental numerical problem in maximum likelihood statistics. In this talk I discuss the special case of fitting a finite mixture of parametric densities to observed data. This problem has broad applications and has been discussed extensively in the literature. I review several of the most prominent methods for fitting finite mixture models. I also introduce a general framework for viewing these models as non-convex, finite-dimensional embeddings of an infinite-dimensional convex optimization problem, and discuss a duality theory for each of these problems. I finish by discussing related applications and future directions.
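
As a rough sketch of the setting, the two problems can be written as follows. The notation is assumed here for illustration and is not taken from the talk: data x_1, ..., x_n, a parametric family p(x | θ) with θ in Θ, k mixture components with weights w_j, and P(Θ) the set of probability measures on Θ.

```latex
% Assumed notation (not from the talk): data x_1,...,x_n, parametric family
% p(x | \theta) with \theta \in \Theta, k components with weights w_j,
% and \mathcal{P}(\Theta) the probability measures on \Theta.
\begin{align*}
\text{(finite mixture)}\quad
  &\max_{w,\,\theta_1,\dots,\theta_k}\ \sum_{i=1}^{n} \log \sum_{j=1}^{k} w_j\, p(x_i \mid \theta_j)
  \quad \text{s.t. } w_j \ge 0,\ \sum_{j=1}^{k} w_j = 1, \\
\text{(mixing measure)}\quad
  &\max_{\mu \in \mathcal{P}(\Theta)}\ \sum_{i=1}^{n} \log \int_{\Theta} p(x_i \mid \theta)\, d\mu(\theta).
\end{align*}
```

In this formulation the second objective is concave in μ over a convex set, since each term is the logarithm of a linear functional of μ. The first problem is recovered by restricting μ to measures of the form Σ_j w_j δ_{θ_j}, a finite-dimensional parametrization that is in general not jointly concave in the weights and the θ_j.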