Large archives and digital sky surveys with dimensions of 10^12
bytes currently exist, while in the near future they will reach sizes of
the order of 10^15 bytes. Numerical simulations are also producing comparable
volumes of information. Data mining tools are needed for information
extraction from such large datasets. In this work we propose
a multidimensional indexing method, based on a static R-tree data
structure, to efficiently query and mine large astrophysical datasets.
We follow a top-down construction method, called VAMSplit, which
recursively splits the dataset on a near-median element along the dimension
with maximum variance. The obtained index partitions the
dataset into non-overlapping bounding boxes, with volumes inversely proportional
to the local data density. Finally, we show an application of this
method for the detection of point sources from a gamma-ray photon
list.
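
The recursive splitting described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function name `build_leaves`, the `capacity` parameter, and the rule that rounds the median index to a multiple of the leaf capacity are assumptions made for illustration, and only the leaf-level bounding boxes are built, not the full R-tree.

```python
from statistics import pvariance


def build_leaves(points, capacity):
    """Split `points` (a list of equal-length tuples) into leaf boxes.

    Returns a list of (lower_corner, upper_corner, leaf_points) tuples,
    each leaf holding at most `capacity` points. Hypothetical helper,
    written only to illustrate the variance-driven near-median split.
    """
    dims = range(len(points[0]))
    if len(points) <= capacity:
        # Leaf: record the minimum bounding box of its points.
        lower = tuple(min(p[d] for p in points) for d in dims)
        upper = tuple(max(p[d] for p in points) for d in dims)
        return [(lower, upper, points)]

    # Choose the dimension along which the data has maximum variance.
    axis = max(dims, key=lambda d: pvariance([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[axis])

    # Near-median split: round the median index down to a multiple of
    # `capacity` (assumed rule) so the left part fills whole leaves.
    n = len(ordered)
    split = max(capacity, (n // 2 // capacity) * capacity)

    return (build_leaves(ordered[:split], capacity)
            + build_leaves(ordered[split:], capacity))


if __name__ == "__main__":
    sample = [(i, (i * 7) % 10) for i in range(10)]
    for lower, upper, leaf in build_leaves(sample, capacity=3):
        print(lower, upper, len(leaf))
```

Because every split sorts along a single axis and cuts the ordered list in two, the resulting boxes can at most touch along a shared boundary, and regions with more points are cut more often, so they end up with smaller boxes.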