UW N-body Simulation Dataset and Benchmark



Background: N-body simulation

N-body simulations model a system composed of a large number ("N") of bodies that move under the influence of mutually induced physical forces. In astrophysical simulations, the bodies represent packets of self-interacting fluids that model the material structure of the universe. See Wikipedia for more detail.
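To make "mutually induced forces" concrete, here is a minimal sketch of one force evaluation, assuming Newtonian gravity with direct pairwise summation. The function name `accelerations`, the unit choice `G=1.0`, and the `softening` parameter are illustrative assumptions and are not taken from the simulation code that produced these datasets. Production codes typically replace the O(N^2) loop with tree or mesh methods.

```python
from math import sqrt


def accelerations(positions, masses, G=1.0, softening=1e-3):
    """Return the gravitational acceleration on each body by direct summation.

    Hypothetical helper for illustration only: every body feels the pull of
    every other body, which costs O(N^2) work per evaluation.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a body exerts no force on itself
            # Separation vector pointing from body i to body j.
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            # Softened squared distance avoids a singularity at zero separation.
            r2 = sum(c * c for c in d) + softening ** 2
            inv_r3 = 1.0 / (r2 * sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * d[k] * inv_r3
    return acc


# Two unit masses one unit apart attract each other with equal and opposite pull.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
masses = [1.0, 1.0]
acc = accelerations(positions, masses, softening=0.0)
```

A simulation repeats this evaluation at every time step and uses the result to update velocities and positions; each saved state of all particles is one snapshot.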


The three benchmark datasets were generated by the N-Body Shop research group at the University of Washington.

Summary of datasets
Dataset      Number of particles   Number of snapshots   Size of each snapshot   Total size
dbtest128g   4.2 million           128                   169 MB                  21 GB
cosmo50      33.6 million          9                     1.4 GB                  12.6 GB
cosmo25      916.8 million         2                     36 GB                   72 GB

The datasets are available only on request because of their size. If you want to use them, please contact us to arrange a download.


Ad-hoc analysis

In our IASDS 2009 paper, we presented five types of queries that astrophysicists frequently ask over simulation data. The paper includes detailed use cases as well as benchmark results. We release the benchmark scripts for Pig/Hadoop. For the SQL queries, please refer to page 6 of the IASDS 2009 paper.

Distributed Friends-of-Friends (dFoF)

The Friends-of-Friends algorithm (FoF and references therein) is a domain-specific clustering algorithm that is also a simplified version of the more general and commonly used DBSCAN algorithm. A concrete application that also uses FoF is kernel density estimation (KDE), which involves searching for all points whose kernel can contribute to the density at a given point. In astrophysics, KDE techniques are used for object classification in a multi-dimensional parameter space of sky survey data.
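In its standard form, FoF treats two particles as "friends" when they lie within a fixed linking length of each other, and a cluster is a connected group of friends. The sketch below illustrates that idea with a brute-force pair scan and a union-find structure. The function name `friends_of_friends` and the parameter `linking_length` are illustrative choices, and this is not the released implementation.

```python
from itertools import combinations
from math import dist


def friends_of_friends(points, linking_length):
    """Label each point with a cluster id (the smallest point index in its cluster)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force O(N^2) pair scan, kept for clarity only.
    # Practical codes use a spatial index (grid or tree) to find neighbors.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= linking_length:
            ri, rj = find(i), find(j)
            if ri != rj:
                # Attach the larger root to the smaller one to merge clusters.
                parent[max(ri, rj)] = min(ri, rj)

    return [find(i) for i in range(len(points))]


# Three points form a chain of friends; the fourth is isolated.
points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
labels = friends_of_friends(points, linking_length=0.6)
print(labels)  # [0, 0, 0, 3]
```

Note that the first and third points end up in the same cluster even though they are farther apart than the linking length, because the second point is a friend of both.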

The distributed Friends-of-Friends (dFoF) algorithm is an optimized implementation of the FoF algorithm that runs on shared-nothing computational platforms such as Hadoop and Dryad. Here we release an implementation using DryadLINQ.
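One common way to run a clustering algorithm like FoF on a shared-nothing platform is to partition space, cluster each partition independently, and then merge clusters that touch across partition boundaries. The single-process sketch below illustrates that generic partition-and-merge pattern only. It is an assumption about the overall shape of such an algorithm and not the released DryadLINQ code. The names `partitioned_fof`, `DisjointSet`, and `slab_width` are hypothetical, and the sketch assumes `slab_width >= linking_length` so that friends can only sit in the same or adjacent slabs.

```python
from collections import defaultdict
from itertools import combinations
from math import dist, floor


class DisjointSet:
    """Union-find over particle ids; the root is the smallest id in a set."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)


def partitioned_fof(points, linking_length, slab_width):
    # Phase 1 (partition): assign each particle to a slab along the x axis.
    slabs = defaultdict(list)
    for pid, p in enumerate(points):
        slabs[floor(p[0] / slab_width)].append(pid)

    sets = DisjointSet()

    # Phase 2 (local clustering): only pairs inside the same slab are compared.
    # On a cluster, each slab would be processed by a separate worker.
    for ids in slabs.values():
        for i, j in combinations(ids, 2):
            if dist(points[i], points[j]) <= linking_length:
                sets.union(i, j)

    # Phase 3 (merge): compare only particles close to a shared slab boundary.
    for k, ids in slabs.items():
        boundary = (k + 1) * slab_width
        right = slabs.get(k + 1, [])
        near_left = [i for i in ids if boundary - points[i][0] <= linking_length]
        near_right = [j for j in right if points[j][0] - boundary <= linking_length]
        for i in near_left:
            for j in near_right:
                if dist(points[i], points[j]) <= linking_length:
                    sets.union(i, j)

    return [sets.find(i) for i in range(len(points))]


# The first two points are friends across the boundary at x = 1.0.
points = [(0.9, 0.0, 0.0), (1.1, 0.0, 0.0), (1.5, 0.0, 0.0), (3.5, 0.0, 0.0)]
labels = partitioned_fof(points, linking_length=0.5, slab_width=1.0)
print(labels)  # [0, 0, 0, 3]
```

The merge phase is where a distributed version has to exchange data between workers, so limiting it to particles near the boundaries keeps that exchange small relative to the full dataset.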



It is a pleasure to acknowledge the help we have received from Tom Quinn, both during the project and in writing this publication. Simulations "Cosmo25" and "Cosmo50" were graciously supplied by Tom Quinn and Fabio Governato of the University of Washington Department of Astronomy. The simulations were produced using allocations of advanced NSF-supported computing resources operated by the Pittsburgh Supercomputing Center, NCSA, and the TeraGrid.

This work was funded in part by the NASA Advanced Information Systems Research Program grants NNG06GE23G, NNX08AY72G, NSF CAREER award IIS-0845397, NSF CRI grant CNS-0454425, the eScience Institute at the University of Washington, gifts from Microsoft Research, and Balazinska's Microsoft Research New Faculty Fellowship.
