# Open problems

Some problems that I can't solve (due to my low capability in higher mathematics) but stay in my mind.

## Q1. (since 2017-Oct) The expected number of disjoint tree of a randomly generated data set

The following question suddenly pops up in my head when I was queueing in a government building waiting to get some documents.

The question in one sentence :

What is the expected number of disjoint tree of a randomly generated data set that all the points are connected to their nearest neighbor?

Given several points (of any dimension). These points are connected to their “nearest neighbour” (consider Euclidean distance first). Several disjoint “trees” are formed after linking the dots. Let $$x$$ be the number of disjoint trees. Find the expected value of $$x$$.

The data points are generated under some distribution (e.g. uniform, Gaussian).

Sub-problems are :

• Find a “good” (fast, efficient, low complexity) algorithm to find $$x$$ given a data matrix.

• Generalize to other distance measures.

• Generalize to other distributions.

The following gives an example on two dimensional data points. Why the number is 108? It is because my queue ticket is 108 when I was waiting :)

## Q2. (Since 2018-July) Interchange interpolation and extrapolation

Matrix completion is basically a interpolation problem.
Can we, do something similar, but in extrapolation sense?