Detailed Schedule

Additional details for talks and tutorials for the 2024 Collaboration Meeting.

Monday, April 15

- Opening Plenary: Tiny Galaxies (and So Much More) with Rubin Observatory (J. Carlin)

I will introduce the Vera C. Rubin Observatory’s Legacy Survey of Space and Time (LSST), and discuss the planned data products and the timeline for their availability to the data rights-holding community. I will discuss available resources, including the Data Previews and how to access the current Data Preview 0 data products via the Rubin Science Platform. Among the vast discovery space LSST will open is the potential for discovering and characterizing the faintest dwarf galaxies around the Milky Way and nearby systems, as well as the remnant stellar debris from tidally disrupted dwarf satellites. I will briefly highlight these “near-field cosmology” topics as well as contributions in a variety of scientific areas that this unprecedented survey will enable.

- Anomaly Detection Update (K. Malanchev)

This talk will present the current status and future plans of the ISSC Anomaly Detection interest group, the first interest group of the ISSC.

- Unsupervised Anomaly Detection in Astronomical Images: A Prelude to LSST with ZTF Data (F. Stoppa)

Initiated by the ISSC Anomaly Detection Interest Group, this project explores an unsupervised approach to anomaly detection in astronomical images, using data from the Zwicky Transient Facility (ZTF) as a foundation for developing methodologies for the forthcoming Legacy Survey of Space and Time (LSST). Moving beyond traditional real-bogus classifications, our project employs unsupervised learning techniques, specifically focusing on autoencoders, to identify outliers at the image level. Here, outliers refer to both astrophysical phenomena and instrumental artifacts that deviate from the norm, providing new insights or challenges in data interpretation. This presentation will detail our approach, from data selection and model training to the implications of our findings for future astronomical surveys.
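
For readers unfamiliar with the technique, a minimal sketch of image-level anomaly detection with an autoencoder follows. This is not the project's actual pipeline: the single-band 64×64 cutout size, the architecture, and the use of mean squared reconstruction error as the anomaly score are all placeholder assumptions.

```python
# Minimal sketch (not the authors' actual pipeline): a convolutional
# autoencoder whose reconstruction error serves as an image-level
# anomaly score. Cutout size, architecture, and scoring are
# illustrative assumptions.
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: 1x64x64 cutout -> compact latent representation
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # -> 32x32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # -> 16x16
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # -> 8x8
        )
        # Decoder mirrors the encoder back to the input size
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),    # -> 16x16
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),    # -> 32x32
            nn.ConvTranspose2d(16, 1, 2, stride=2),                # -> 64x64
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_scores(model, images):
    """Per-image mean squared reconstruction error: large = outlier."""
    with torch.no_grad():
        recon = model(images)
        return ((images - recon) ** 2).mean(dim=(1, 2, 3))

# Toy usage with random 'cutouts'; real training would minimize the
# reconstruction loss over a large sample of ordinary ZTF images, so
# that only atypical images reconstruct poorly.
model = ConvAutoencoder()
cutouts = torch.randn(8, 1, 64, 64)
print(anomaly_scores(model, cutouts))
```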

- How information theory can help guide survey design decisionmaking (A. Malz)

The degree to which LSST’s data can be used to answer the most pressing questions about the universe depends on many choices we as scientists and engineers make, from the observing strategy encompassing the frequency and duration of visits to each portion of the sky as a function of wavelength, to the realtime approach to selecting time-domain events to target with limited follow-up observational resources. In this brief talk, I will demonstrate how information theory can be applied to these two aspects of survey design decisionmaking with a goal of opening up a discussion of how we in the ISSC can assist the Rubin community in maximally gaining knowledge of the cosmos through optimizing such choices in preparation for the survey and throughout the decade of planned observations.
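
As a toy illustration of the follow-up selection problem (my construction, not the speaker's method): if a follow-up observation would fully resolve an event's class, the expected information gain from targeting it equals the Shannon entropy of its current classification probabilities, so the most ambiguous candidate is the most informative target.

```python
# Illustrative sketch: rank candidate time-domain events for follow-up
# by the Shannon entropy of their current classification probabilities.
# The broker outputs below are hypothetical numbers.
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical broker output: P(class) for three candidate events
candidates = {
    "event_a": [0.90, 0.05, 0.05],  # already confidently classified
    "event_b": [0.40, 0.35, 0.25],  # highly ambiguous
    "event_c": [0.60, 0.30, 0.10],
}

gains = {name: entropy(p) for name, p in candidates.items()}
best = max(gains, key=gains.get)
print(gains)               # expected information gain per event, in bits
print("follow up:", best)  # -> event_b
```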

- The Extended LSST Astronomical Time-Series Classification Challenge (ELAsTiCC) (G. Narayan)

ELAsTiCC, or how I learned to stop worrying and test broker and AI/ML infrastructure for Rubin

- Tutorial on the Active Anomaly Discovery (K. Malanchev)

In this tutorial I will demonstrate how an active machine learning approach can be combined with anomaly detection algorithms to create personalized discovery pipelines. We will use the SNAD coniferest package, which extends the Isolation Forest algorithm to incorporate expert-in-the-loop behavior. We will examine how the algorithm adapts to an expert’s feedback across various datasets, including ZTF time-domain data.

The presentation is based on an extended version of the SNAD coniferest tutorial, which was previously used at the Michigan Cosmology Summer School 2023 and will be used at the AISSAI Anomaly Detection workshop in March: https://coniferest.readthedocs.io/en/latest/tutorial.html
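
The expert-in-the-loop pattern the tutorial demonstrates can be sketched generically as below, using scikit-learn's IsolationForest as a stand-in; the coniferest package provides its own API for this loop (see the tutorial link above), and the expert oracle and down-weighting scheme here are illustrative assumptions.

```python
# Generic expert-in-the-loop sketch with scikit-learn's IsolationForest
# standing in for coniferest. The 'expert_is_interesting' oracle and the
# down-weighting of 'boring' outliers are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))       # stand-in for light-curve features

def expert_is_interesting(x):
    """Placeholder for the human expert's yes/no verdict."""
    return x[0] > 2.5                # e.g. only one kind of outlier matters

seen, budget = set(), 10
weights = np.ones(len(X))            # sample weights encode the feedback
for _ in range(budget):
    forest = IsolationForest(random_state=0).fit(X, sample_weight=weights)
    scores = forest.score_samples(X)  # lower = more anomalous
    # Show the expert the top-ranked object not yet inspected
    for idx in np.argsort(scores):
        if idx not in seen:
            break
    seen.add(idx)
    if expert_is_interesting(X[idx]):
        print(f"object {idx}: flagged as a discovery")
    else:
        weights[idx] = 10.0          # pull 'boring' outliers toward normal
```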

- Tutorial: ANTARES (G. Narayan)

Deploying a simple filter for ANTARES
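
For orientation, a simple ANTARES filter has roughly the following shape. This is a hedged sketch: consult the ANTARES devkit documentation for the exact class and attribute names, and note that the light-curve column names and the tag below are assumptions made for illustration.

```python
# Rough shape of an ANTARES filter (see the devkit docs for the exact
# API). A filter receives one locus (object) per alert and tags it when
# a condition holds; the column names and tag here are assumptions.
import antares.devkit as dk

class BrightenedFast(dk.Filter):
    NAME = "Brightened fast demo filter"
    OUTPUT_TAGS = [
        {"name": "brightened_fast",
         "description": "Rose by >1 mag between consecutive alerts."},
    ]

    def run(self, locus):
        # locus.lightcurve holds the alert history as a DataFrame
        df = locus.lightcurve.sort_values("ant_mjd")
        mags = df["ant_mag"].dropna()
        # Magnitudes decrease as objects brighten
        if len(mags) >= 2 and (mags.iloc[-2] - mags.iloc[-1]) > 1.0:
            locus.tag("brightened_fast")
```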

Tuesday, April 16

- Astronomy Re-envisioned: Investigating the Physics of Galaxy Evolution with Machine Learning (J. Wu)

Astronomical imaging of galaxies reveals how they formed and evolved. While spectroscopy is necessary for measuring galaxies’ physical properties, such as their cold gas content or metallicity, it is now possible to reliably predict these properties directly from three-color optical image cutouts by using convolutional neural networks (CNNs). Even the entire optical spectrum can be determined purely from galaxy images. Highly optimized CNNs can also robustly identify nearby dwarf galaxies from large-area imaging surveys, resulting in a dramatic increase in the total number of satellite galaxy systems we can study at low redshifts. Recent developments with cosmic graph neural networks (GNNs) are also able to reveal the impact of large-scale environment on the connection between galaxies and dark matter halos. These applications are prime examples of how deep learning with strong inductive biases can facilitate new science in galaxy evolution and near-field cosmology. With the upcoming Legacy Survey of Space and Time by the Rubin Observatory, cutting-edge machine learning techniques will further transform our ability to study the cosmos.
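
A minimal sketch of the basic setup, not the speaker's trained models (real applications use much deeper, often pretrained, architectures): a small CNN regressing one scalar physical property from three-color cutouts.

```python
# Toy CNN regression from 3-channel (e.g. gri) cutouts to a scalar
# property such as metallicity; sizes are placeholder assumptions.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 1),                # one scalar property per galaxy
)

cutouts = torch.randn(4, 3, 64, 64)  # batch of three-color cutouts
print(cnn(cutouts).shape)            # torch.Size([4, 1]) predictions
```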

- Introduction to LINCC Frameworks Software (M. DeLucchi)

The LINCC Frameworks project, supported by Schmidt Sciences, is an ambitious program to develop state-of-the-art analysis techniques capable of meeting the scale and complexity demanded by the Rubin Observatory Legacy Survey of Space and Time (Rubin LSST) data.

- Measurements of the time delays in distant quasars (A. Siemiginowska)

TBD

- Automatic generation of magnification maps for lensed quasars and supernovae using deep learning (S. Khakpash)

Better modeling of the microlensing variability in light curves of lensed quasars and supernovae enables more accurate measurements of time delays and the Hubble constant, while also improving our understanding of quasar structure and the stellar mass distributions in distant galaxies. In the era of Rubin LSST, there will be thousands of events that need microlensing modeling. Traditional modeling approaches use computationally intensive ray-tracing methods to generate microlensing magnification maps. While libraries of precomputed maps now exist, they only sample the parameter space on a fixed grid, and the data volume is challenging to handle in modeling. An efficient, automated approach will be needed to handle the large volume of data expected from large surveys like LSST. In this project, we have trained an autoencoder (a type of deep-learning model) on precomputed magnification maps to reduce their dimension and form a latent-space representation while optimizing for acceptable reconstruction of the maps. We then use a Convolutional Neural Network (CNN) to connect the lensing galaxy parameters to the latent-space dimensions of the maps. Given the trained autoencoder and CNN, we can then generate maps for a given set of lensing galaxy parameters in less than a second. This approach will improve the treatment of microlensing variability in the analysis of light curves for lensed quasars and supernovae.
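
Schematically, the two-stage design might look as follows; the layer sizes are placeholders, dense layers stand in for the trained convolutional architectures, and the choice of three lensing parameters is hypothetical.

```python
# Conceptual sketch of the two-stage emulator described above (sizes
# and dense layers are placeholders, not the authors' trained models):
# an autoencoder compresses magnification maps to a latent vector, and
# a second network maps lensing-galaxy parameters to that latent space,
# so decoder(param_net(p)) emulates a ray-traced map.
import torch
import torch.nn as nn

LATENT, MAP = 64, 128          # latent dimension and map side length

encoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(MAP * MAP, 512), nn.ReLU(),
    nn.Linear(512, LATENT),
)
decoder = nn.Sequential(
    nn.Linear(LATENT, 512), nn.ReLU(),
    nn.Linear(512, MAP * MAP),
    nn.Unflatten(1, (MAP, MAP)),
)
# Maps 3 hypothetical lensing parameters (e.g. convergence, shear,
# smooth-matter fraction) to the latent space learned by the autoencoder
param_net = nn.Sequential(
    nn.Linear(3, 128), nn.ReLU(),
    nn.Linear(128, LATENT),
)

# Stage 1 trains encoder/decoder on precomputed maps (reconstruction
# loss); stage 2 trains param_net against the frozen encoder's latents.
params = torch.tensor([[0.5, 0.4, 0.8]])   # hypothetical parameter set
with torch.no_grad():
    fast_map = decoder(param_net(params))  # sub-second map generation
print(fast_map.shape)                      # torch.Size([1, 128, 128])
```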

- Calibrated predictive distributions for photometric redshifts (B. Dey)

Many astrophysical analyses depend on estimates of redshifts (a proxy for distance) determined from photometric (i.e., imaging) data alone. Inaccurate estimates of photometric redshift uncertainties can result in large systematic errors. However, probability distribution outputs from many photometric redshift methods do not follow the frequentist definition of a Probability Density Function (PDF) for redshift — i.e., the fraction of times the true redshift falls between two limits z1 and z2 should be equal to the integral of the PDF between these limits. Previous works have used the global distribution of Probability Integral Transform (PIT) values to re-calibrate PDFs, but offsetting inaccuracies in different regions of feature space can conspire to limit the efficacy of the method. We leverage a recently developed regression technique that characterizes the local PIT distribution at any location in feature space to perform a local re-calibration of photometric redshift PDFs resulting in calibrated predictive distributions. Though we focus on an example from astrophysics, our method can produce predictive distributions which are calibrated at all locations in feature space for any use case.
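
The PIT diagnostic at the heart of this method can be demonstrated in a few lines; this is an illustrative toy with Gaussian photo-z PDFs, not the Cal-PIT implementation.

```python
# Toy PIT diagnostic (Gaussian photo-z PDFs standing in for a real
# estimator): the PIT value of one galaxy is the predicted CDF
# evaluated at its true redshift; for calibrated PDFs the PIT values
# are uniform on [0, 1]. All numbers here are illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 50_000
z_true = rng.uniform(0.1, 1.5, n)

# Toy estimator: unbiased means but overconfident (too-narrow) PDFs
z_pred = z_true + rng.normal(0.0, 0.05, n)
sigma_pred = 0.02                 # understates the true 0.05 scatter

pit = norm.cdf(z_true, loc=z_pred, scale=sigma_pred)

# Overconfident PDFs pile PIT values near 0 and 1 instead of being
# flat; local re-calibration reshapes the PDFs until the PIT
# distribution is uniform at every location in feature space.
hist, _ = np.histogram(pit, bins=10, range=(0, 1))
print(hist / n)                   # flat at 0.1 per bin if calibrated
```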

- Tutorial: Calibration of uncertainty estimates for astronomical analysis (B. Dey)

This tutorial will guide participants through some methodologies for calibrating uncertainty estimates from simulator-based inference and prediction algorithms. Attendees will learn the importance of calibrated uncertainties for ensuring the reliability and trustworthiness of predictions and parameter inferences in astrophysics. The session will cover theoretical foundations and practical applications of the LF2I and Cal-PIT frameworks. By the end of the tutorial, participants will be equipped to diagnose and calibrate the probabilistic outputs of their own analyses, enhancing the accuracy and dependability of their results.

- Tutorial: LINCC Frameworks Python Project Template (M. DeLucchi)

We will work through applying our team’s Python project template and addressing CI-related concerns.

This tutorial on using the LINCC Frameworks Python project template is adapted from a similar tutorial given at the DESC sprint week.

Wednesday, April 17

- Closing Plenary (D. Sculley)

TBD

- Galaxy number density weights are biased in power spectrum (T. Karim)

Galaxy redshift surveys such as Rubin LSST use the galaxy number density field as a proxy for the cosmic matter density field to constrain cosmological models. However, observational systematics such as Galactic dust and variation in imaging quality modulate the true galaxy number density field, and not accounting for these effects can lead to biased conclusions about our Universe. Existing methods attempt to correct for this issue by re-weighting the observed galaxy number density field. In this talk, I will argue that while the existing method of estimating these weights can yield an unbiased galaxy density field, these same weights do not yield an unbiased galaxy power spectrum, the statistic most commonly used in cosmology. I will additionally discuss how this bias can be corrected and how it affects the covariances used in cosmological likelihoods.
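
Schematically (in notation of my own choosing, not necessarily the speaker's), the argument is that the power spectrum is quadratic in the weighted field, so field-level unbiasedness of the weights does not carry over:

```latex
% Sketch of the argument (notation mine). Systematics f(x) modulate the
% observed counts, and weights w ~ 1/f are estimated from the data to
% undo the modulation:
\begin{align}
  \hat\delta(\mathbf{x}) &= \hat w(\mathbf{x})\,
      \frac{n_{\rm obs}(\mathbf{x})}{\bar n} - 1,
  \qquad
  \langle \hat\delta(\mathbf{x}) \rangle = \delta_g(\mathbf{x})
  \ \text{ if }\ \langle \hat w \rangle = 1/f, \\
  \hat P(k) &\propto \big\langle |\hat\delta(\mathbf{k})|^2 \big\rangle
  = P_g(k) + \text{terms involving } \operatorname{Var}(\hat w),
\end{align}
% i.e. weights that are unbiased at the field level still bias the
% quadratic statistic unless their variance is modeled or subtracted.
```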

- The Power of Large Numbers, ML & Robust Statistics: from HSC to LSST (A. Ghosh)

Many unsolved challenges in extragalactic astronomy are driven by the fact that galaxy evolution is stochastic in nature. Rubin-LSST will present us with a large, uniform dataset (20 billion galaxies over a decade) – perfect for settling these outstanding questions. I will outline how we can already leverage LSST pre-cursor datasets along with robust statistical analysis and machine-learning frameworks to unlock new insights into galaxy evolution.

One such issue that has remained enigmatic despite over a decade of conflicting results is the relationship between galaxy size and large-scale environmental density. Using 3 million Hyper Suprime-Cam galaxies, I will demonstrate how we can conclusively confirm, with >5σ confidence, that galaxies in denser environments are up to 25% larger than their counterparts with similar mass and morphology in less-dense regions of the universe. Compared to most previous studies, this sample is ~1000 times larger and goes ~1 dex deeper in mass completeness.

I will discuss how existing theoretical frameworks fall short in explaining the observed correlations and emphasize the need for more comprehensive investigations into the galaxy-halo connection. I will also touch upon outstanding challenges and infrastructure needs for replicating similar studies at LSST-scale.

- Building and using predictive catalogs (T. Loredo)

Bayesian probabilistic graphical models—also known as hierarchical Bayes (HB) models, or Bayes nets—are well-suited to modeling data with measurement errors and selection effects. They were introduced into astronomy in the 1990s for analyzing spectro-temporal data in particle astrophysics, and demographic data for gamma-ray burst and trans-Neptunian object population studies. That early work explicitly noted that the fundamental data products needed by HB models include member likelihood functions, i.e., likelihood functions for the properties of the members of the analyzed population. Since the likelihood function is the (conditional) predictive distribution for the data, we dub catalogs reporting likelihood functions predictive catalogs. HB models can produce probabilistic catalogs that use data to address a specific scientific question. Predictive catalogs instead are enabling technology allowing catalog users to address diverse scientific questions. We discuss issues arising in both building and using predictive catalogs. This talk describes work being done with Tamás Budavári and Matthew Graham.
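
For reference, the standard hierarchical factorization behind this picture, in textbook notation:

```latex
% Textbook form of the hierarchical (HB) factorization the talk builds
% on: theta = population parameters, psi_i = latent properties of
% member i, d_i = that member's data.
\begin{equation}
  p(\theta \mid D) \;\propto\; p(\theta)\prod_{i=1}^{N}
  \int \underbrace{p(d_i \mid \psi_i)}_{\text{member likelihood}}
  \, p(\psi_i \mid \theta)\,\mathrm{d}\psi_i
\end{equation}
% A predictive catalog stores the member likelihoods p(d_i | psi_i),
% letting any user plug in their own population model p(psi_i | theta)
% without returning to the raw data.
```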

- Hierarchical Bayesian Models with generative and physical components for inference with corrupted data (B. Remy)

Generative models have shown great promise in enabling inference where traditional physical models fall short. In this talk, I will discuss the possibility of training these generative models as part of Hierarchical Bayesian Models (HBMs), including physical processes. This approach allows for learning in a self- or unsupervised manner, solely from the data. I will demonstrate how the trained HBM can then be used to perform accurate probabilistic inference of physical quantities.

The astrophysical problem we address involves inferring the weak gravitational shear of galaxy images convolved with a telescope Point Spread Function (PSF). The main challenge in this problem is that there is currently no physical model that can accurately describe arbitrary galaxy light profiles, which translates into biases in shear estimates due to model misspecification.

To address this issue, we propose an HBM that combines a Variational Autoencoder (VAE) modeling arbitrary galaxy light profiles with physical equations that model sheared, noisy, and PSF-convolved galaxy images. This hybrid generative and physical model allows us to perform unbiased probabilistic inference of physical quantities, such as cosmic shear, while accounting for model misspecification.
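
Schematically, the proposed forward model combines the learned and physical components roughly as follows (notation mine):

```latex
% Schematic forward model (notation mine): a VAE decoder g_phi
% generates an unlensed light profile from latent z, which is then
% sheared, PSF-convolved, pixelized, and observed with noise n:
\begin{equation}
  z \sim p(z), \qquad
  d = \Pi\!\left[ K_{\rm PSF} * S_\gamma\, g_\phi(z) \right] + n
\end{equation}
% Inference of the shear gamma marginalizes over z, so flexibility in
% the learned profile g_phi(z) absorbs model misspecification instead
% of biasing the shear estimate.
```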