One of the seminal works on finding a fast way to compute good-enough estimates of the periods of planet transit events was carried out by Kovács, Zucker, and Mazeh. Their algorithm, Box Least Squares (or BLS for short), has been very successful in finding such periodic events (their paper has over 500 citations!).

This blog post has the goal of investigating the maths that underpin the BLS method from a different angle, and of verifying whether there are unexplored speed-ups or properties. More precisely, I derive the parameters of a box function that “best” fits (from a maximum likelihood perspective) a planet transit in time series brightness data of stars (these are often called “lightcurves” by astronomers).

Assume the following relation for the folded lightcurve \begin{align} Y_i = h - d\cdot \mathbb{I}(t_0 \leq t_i \leq t_0 + w) + \eta_i\nonumber, \end{align} where $\eta_i \sim \mathcal{N}(0, \sigma^2)$, $\sigma^2$ is known, and $\mathbb{I}(\cdot)$ is the indicator function. The parameters to be estimated on the basis of the observations $\boldsymbol{y} = (y_1, y_2, \ldots, y_n)$ are $\boldsymbol{\theta} = (h, d, t_0, w)$, i.e. the baseline flux $h$, the depth of the planet transit $d$, the start time of the transit $t_0$, and the duration or width of the transit $w$.
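
To make the setup concrete, here is a minimal NumPy sketch of this generative model. The parameter values and the uniform sampling of phases are my own illustrative choices, not values from the BLS paper:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative (made-up) parameters: baseline flux h, transit depth d,
# transit start t0, transit width w, and known noise scale sigma.
h, d, t0, w, sigma = 1.0, 0.01, 0.3, 0.1, 0.005

n = 500
t = np.sort(rng.uniform(0.0, 1.0, n))        # folded (phase) times t_i
in_transit = (t >= t0) & (t <= t0 + w)       # indicator I(t0 <= t_i <= t0 + w)
y = h - d * in_transit + rng.normal(0.0, sigma, n)  # Y_i = h - d*I(...) + eta_i
```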

Pictorially: [figure: a folded lightcurve with the box model of baseline $h$, depth $d$, start $t_0$, and width $w$ overlaid]

In this setting, $Y_i \sim \mathcal{N}(\mu_i, \sigma^2)$, and the joint probability density function of $\boldsymbol{Y} = (Y_1, Y_2, \ldots, Y_n)$ is given by \begin{align} p(\boldsymbol{y}) = \prod_{i=1}^{n}\dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\dfrac{(y_i - \mu_i)^2}{2\sigma^2}\right)\nonumber, \end{align} where $\mu_i = h - d\cdot \mathbb{I}(i \in I)$ and $I \triangleq \{i : t_0 \leq t_i \leq t_0 + w\}$ is the set of in-transit indices, with complement $I^c$.

The log-likelihood function (up to an additive constant) may be written as \begin{align} \log p(\boldsymbol{y}) = -\dfrac{1}{2\sigma^2}\left(\sum_{i \in I^c}(y_i - h)^2 + \sum_{i \in I}(y_i - (h - d))^2\right) \end{align}

Therefore, the maximum likelihood estimator may be written as \begin{align} {\theta}^{\star}({\boldsymbol{y}}) &= \arg \min_{h, d, t_0, w} \sum_{i \in I^c}(y_i - h)^2 + \sum_{i \in I}(y_i - (h - d))^2 \end{align}

Note that the objective above has continuous partial derivatives with respect to $h$ and $d$. Therefore, for fixed values of $t_0$ and $w$, the inner minimization reads \begin{align} (h^{\star}, d^{\star}) &= \arg \min_{h, d} \sum_{i \in I^c}(y_i - h)^2 + \sum_{i \in I}(y_i - (h - d))^2\nonumber \end{align}

Differentiating with respect to $h$, and solving $\frac{\partial}{\partial h}\log p(\boldsymbol{y}) = 0$, it follows that \begin{align} h^{\star} = \dfrac{1}{n}\left(N(I)\cdot d + \sum_{i=1}^{n}y_i\right),\nonumber \end{align} where $N(I)$ is the number of elements in the set $I$.

Now, differentiating with respect to $d$, solving $\frac{\partial}{\partial d}\log p(\boldsymbol{y}) = 0$, and substituting $h^{\star}$, it follows that \begin{align} d^{\star} = \dfrac{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}y_i - \dfrac{1}{N(I)}\sum_{i \in I}y_i}{1 - \dfrac{N(I)}{n}}\nonumber. \end{align}
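
In code, the two closed-form solutions are one line each. The sketch below continues the synthetic example above, with the boolean mask `in_transit` playing the role of the candidate set $I$:

```python
def closed_form_h_d(y, in_transit):
    """Closed-form (h*, d*) for a fixed candidate transit window I."""
    n = y.size
    N_I = in_transit.sum()              # N(I), number of in-transit samples
    if N_I == 0 or N_I == n:            # d* is undefined for an empty/full window
        return y.mean(), 0.0
    d_star = (y.mean() - y[in_transit].mean()) / (1.0 - N_I / n)
    h_star = (N_I * d_star + y.sum()) / n
    return h_star, d_star
```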

The formulae for $h^{\star}$ and $d^{\star}$ presented here are precisely the same as the ones developed by Kovács, Zucker, and Mazeh, provided one assumes that the “weights” $w_i$, as presented in the aforementioned paper, are constant w.r.t. $i$, e.g., $w_i = 1/n$.

Hence, one can alternate between numerically optimizing for $t_0$ and $w$ and analytically computing $h^{\star}$ and $d^{\star}$. In my experiments, I found that two iterations of this procedure are enough for convergence.

In the BLS paper, it seems to me that they used a grid-search approach in order to find $t_0$ and $w$.
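
For comparison, such a grid search is easy to sketch once the closed-form solutions are available: sweep candidate $(t_0, w)$ pairs, plug in $h^{\star}$ and $d^{\star}$, and keep the pair with the smallest residual sum of squares. The grids below are arbitrary choices of mine, not values from the paper:

```python
def grid_search_t0_w(t, y, t0_grid, w_grid):
    """Exhaustive search over (t0, w); h* and d* are computed in closed form."""
    best_rss, best_params = np.inf, None
    for t0_c in t0_grid:
        for w_c in w_grid:
            mask = (t >= t0_c) & (t <= t0_c + w_c)
            h_s, d_s = closed_form_h_d(y, mask)
            rss = np.sum((y - (h_s - d_s * mask)) ** 2)  # least-squares objective
            if rss < best_rss:
                best_rss, best_params = rss, (t0_c, w_c, h_s, d_s)
    return best_params

t0_hat, w_hat, h_hat, d_hat = grid_search_t0_w(
    t, y, np.linspace(0.0, 0.9, 90), np.linspace(0.02, 0.2, 19))
```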

Interestingly, the maximum likelihood estimator for the transit depth, $d^{\star}$, is an unbiased estimator for any sample size $n$. To see that, let’s take the expected value of $d^{\star}$ with respect to the distribution of the data \begin{align} \mathbb{E}\left(d^{\star}\right) = \dfrac{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}\mathbb{E}\left(Y_i\right) - \dfrac{1}{N(I)}\sum_{i \in I}\mathbb{E}\left(Y_i\right)}{1 - \dfrac{N(I)}{n}}, \end{align} noting that $\mathbb{E}\left(Y_i\right) = h - d\cdot\mathbb{I}(i \in I)$, so that the numerator simplifies to $\left(h - \frac{N(I)}{n}d\right) - (h - d) = d\left(1 - \frac{N(I)}{n}\right)$, one arrives at \begin{align} \mathbb{E}\left(d^{\star}\right) = d, \end{align} which proves that $d^{\star}$ is unbiased.

Another interesting statistical figure to look at is the variance of $d^{\star}$. Before proceeding, note that $d^{\star}$ can be rewritten as \begin{align} d^{\star} = \dfrac{\dfrac{1}{n}\displaystyle\sum_{i \in I^c}y_i + \left(\dfrac{1}{n} - \dfrac{1}{N(I)}\right)\sum_{i\in I}y_i}{1 - \dfrac{N(I)}{n}} \end{align} then, since the $Y_i$ are independent, it follows that \begin{align} \mathrm{var}\left(d^{\star}\right) = \dfrac{\dfrac{1}{n^2}\displaystyle\sum_{i \in I^c}\mathrm{var}\left(Y_i\right) + \left(\dfrac{1}{n} - \dfrac{1}{N(I)}\right)^2\sum_{i\in I}\mathrm{var}\left(Y_i\right)}{\left(1 - \dfrac{N(I)}{n}\right)^2}, \end{align} and, after algebraic simplifications, \begin{align} \mathrm{var}\left(d^{\star}\right) = \dfrac{\sigma^2}{N(I)\left(1 - \dfrac{N(I)}{n}\right)} \geq \dfrac{\sigma^2}{N(I)} \end{align}
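
Both results are easy to sanity-check by Monte Carlo. The sketch below (same illustrative parameters as before, with the true window assumed known) compares the sample mean and variance of $d^{\star}$ against $d$ and the closed-form variance:

```python
trials = 20_000
N_I = in_transit.sum()
d_samples = np.empty(trials)
for k in range(trials):
    y_k = h - d * in_transit + rng.normal(0.0, sigma, n)  # fresh noise realization
    _, d_samples[k] = closed_form_h_d(y_k, in_transit)

print(d_samples.mean(), d)                                # ~ d  (unbiasedness)
print(d_samples.var(), sigma**2 / (N_I * (1 - N_I / n)))  # ~ var(d*) formula
```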

The inequality shown above is particularly interesting if we rewrite it as \begin{align} N(I) \geq \left(\dfrac{\sigma}{\mathrm{std}\left(d^{\star}\right)}\right)^2 \end{align} which tells us the minimum number of samples in transit needed in order to achieve a given uncertainty on the transit depth $d$.
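
As an illustrative example (with made-up numbers): if the white-noise level is $\sigma = 1000$ ppm and we want $\mathrm{std}\left(d^{\star}\right) = 100$ ppm on the depth, the bound requires $N(I) \geq (1000/100)^2 = 100$ in-transit samples.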

Another interesting figure of merit is the asymptotic ratio \begin{align} \lim_{n\rightarrow\infty} \dfrac{\mathbb{E}\left(d^{\star}\right)}{\sqrt{\mathrm{var}\left(d^{\star}\right)}} = \dfrac{d\sqrt{N(I)}}{\sigma} \end{align} usually called the signal-to-noise ratio by astronomers.

More broadly, note that $d^{\star}$ is a linear combination of independent Gaussian random variables, so $d^{\star} \sim \mathcal{N}\left(d,\ \dfrac{\sigma^2}{N(I)\left(1 - \frac{N(I)}{n}\right)}\right)$. As a result, various other interesting quantities may be derived ;)

See the next blog post for an example of these maths. :)