Reproducible and Reusable Science
Connecting research results to the underlying data and analysis is central to the validation and extensibility of scientific discoveries. Where possible, our tools encourage open data and methodological transparency, and they promote and enable data citation.
Computationally Assisted Exploration
We build analytical tools, such as Consilience and TwoRavens, that help researchers understand their data and discover new insights in it by connecting their own knowledge, expertise, and judgement with the vast array of quantitative methods available in computational analysis.
Interdisciplinary Quantitative Scientific Scope
While social science research informs many of our goals, our tools and research frameworks address broad methodological issues in quantitative science and are employed in many other research domains, including through deep partnerships in the health and biomedical sciences, astronomy, and the humanities.
When Data are Not Open
While we support open data in all possible forms, the increasing ability of science to measure our lives brings increasing ethical responsibilities to safeguard privacy. We need solutions that preserve privacy while still giving science the fundamental ability to learn, access, and replicate findings. DataTags and PrivateZelig are two of our solutions toward these goals.
Large-Scale Data Sets
In the coming years, the Data Science team will be expanding its software applications to handle large-scale data sets, as Big Data science reaches all disciplines. This means extending Consilience to millions of text documents, and extending Zelig and Dataverse to handle terabyte- and petabyte-scale data sets.
Zelig: Everyone's Statistical Software is an interface that allows a large body of different statistical models in the R statistical language to be implemented and interpreted in a common framework.
For almost a decade, Dataverse has been at the forefront of data publication, citation and preservation. We continue to innovate and expand to more domains, and interoperate with more systems.
This is the first public release of a new, interactive Web application to explore data, view descriptive statistics, and estimate statistical models.
DataTags guides data contributors through the relevant legal regulations to set an appropriate level of sensitivity for a dataset. That level is expressed as a machine-actionable tag, which can then be coupled with the data and tracked and enforced throughout its future use.
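To make the idea of a machine-actionable tag concrete, here is a minimal sketch in Python. It is not the DataTags implementation: the sensitivity levels, the two questionnaire answers, the handling requirements, and the names `Sensitivity`, `Tag`, `assign_tag`, and `may_access` are all assumptions invented for illustration, and the rules are far simpler than any real legal framework.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical sensitivity levels, ordered from least to most restrictive."""
    PUBLIC = 0
    RESTRICTED = 1
    CONFIDENTIAL = 2


@dataclass(frozen=True)
class Tag:
    """A machine-readable label meant to travel with a dataset (illustrative)."""
    level: Sensitivity
    requires_encryption: bool
    requires_approval: bool


def assign_tag(contains_personal_data: bool, contains_health_records: bool) -> Tag:
    """Map a contributor's answers to a tag. The rules here are made up."""
    if contains_health_records:
        return Tag(Sensitivity.CONFIDENTIAL, requires_encryption=True, requires_approval=True)
    if contains_personal_data:
        return Tag(Sensitivity.RESTRICTED, requires_encryption=True, requires_approval=False)
    return Tag(Sensitivity.PUBLIC, requires_encryption=False, requires_approval=False)


def may_access(tag: Tag, user_clearance: Sensitivity, has_approval: bool) -> bool:
    """Enforce the tag at access time: check clearance first, then approval."""
    if user_clearance < tag.level:
        return False
    return has_approval or not tag.requires_approval
```

Because the tag is plain data rather than free text, the same object that records the contributor's answers can later be checked by any system that stores or serves the dataset.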
Consilience allows you to simultaneously explore every existing clustering method in the literature to help you discover new clusters and patterns in your text documents.
The first public version of the new RBuild application will provide a continuous integration build solution for developers and contributors of R packages, from freshly developed code in Git to archived, published code on CRAN.