Prior to joining Gonzaga, I worked as a Project Scientist at the UC Davis Genome Center
in the Data and Knowledge Systems Lab and as a Postdoctoral Researcher at the San Diego Supercomputer Center.
My research interests are broadly in the areas of database technology (conceptual modeling, data discovery, data integration) and scientific data management (observation models and scientific workflows). I am currently involved in the following projects.
Observational Data Semantics
Scientists often rely on observational data (i.e., sets of raw or derived "observations" and "measurements") to carry out analyses. While observational data is largely stored in spreadsheets or simple relational structures, the discovery, interpretation, and integration of observational data often require complex metadata (e.g., to capture contextual information, measurement scales, experimental methods, and so on). As part of the NSF-funded Semtools and SONet projects, I collaborate with members of NCEAS and researchers at UC Davis to develop ontology-based models for representing the semantics of observational data, approaches for semantically annotating observational data sets, and tools that leverage annotations and corresponding ontologies to improve the discovery and integration of ecological data. Our goal is to develop approaches and technology that help scientists more easily describe, find, and reuse observational data.
Scientific Workflow Modeling and Design
As a contributor to the Kepler Scientific Workflow System, I am interested in making scientific workflows easier for scientists and workflow engineers to specify, repurpose, and reuse. My work in this area explores typing mechanisms for scientific workflows, methods for composing dataflow and control-flow constructs, and support for processing nested data (e.g., XML) within Kepler. This work is being carried out within the NSF-sponsored Kepler/CORE and Processing PhyloData projects and the UC Davis Accelerating Genome-Scale Biological Research informatics project.
Scientific Workflow Provenance
An advantage of scientific workflow systems over traditional scripting approaches is their ability to automatically record the data and process dependencies introduced during a workflow run. With colleagues at UC Davis, we are developing approaches to efficiently store, query, and visualize the provenance of workflow runs. Our work largely focuses on capturing and storing explicit data dependencies for general classes of workflow models, including those that operate over structured (e.g., XML) data. This work has produced a new Query Language for Provenance (QLP) and corresponding storage and evaluation techniques for processing QLP queries.