Institute for Advanced Study, Princeton
Australian National University
with Sihao Cheng and Brice Menard
Lecture at Tsinghua University, Dec 2020
[Figure: a neural network classifying images as "Cat" or "Dog"; its non-linear function (or "activation function") is, e.g., ReLU, tanh, or sigmoid]
Can we understand the ideas behind these operations?
[Figure: examples of astronomical data: Cosmic Microwave Background, Cosmic Reionization, Large-scale structure, Motions of a billion stars]
[Diagram with two axes: Complexity (or "structures") and Stochasticity ("uninformative" variances)]
Example on the Complexity vs. Stochasticity diagram: the Mandelbrot Set
Example: my niece
Projecting out uninformative variability, e.g., orientation
Sersic profile
Examples in cosmology: Cosmic Microwave Background, Weak Lensing, Reionization, Intergalactic Medium Tomography ("Cosmic Web")
Cosmic Microwave Background: a stationary Gaussian process, at the "Simple" end of the Complexity axis (as opposed to "Complex")
Physical parameters → Observations: the observations live in a very high-dimensional space and are impossible to characterize directly.
Cosmic Microwave Background (a stationary Gaussian process): eliminating uninformative variability
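Definition: A random process $X(x)$ is a Gaussian process iff, for any finite set of points $x_1, \dots, x_n$, the vector $(X(x_1), \dots, X(x_n))$ is jointly (multivariate) Gaussian.
Definition: A random process is stationary iff its statistics are invariant under translation; in particular, the two-point correlation $\langle X(x)\, X(x + r) \rangle = \xi(r)$ depends only on the separation $r$.
If the Gaussian process is stationary (with zero mean), then it is fully characterized by $\xi(r)$, or equivalently by its Fourier transform, the power spectrum $P(k)$.
Let the Fourier transform be $\tilde{X}(k) = \int X(x)\, e^{-ikx}\, dx$; then $\langle \tilde{X}(k)\, \tilde{X}^*(k') \rangle = 2\pi\, P(k)\, \delta(k - k')$.
Parseval's theorem: $\int |X(x)|^2\, dx = \frac{1}{2\pi} \int |\tilde{X}(k)|^2\, dk$.
Summary statistics from a single realization: if the process is ergodic, averages over one realization converge to ensemble averages, so $P(k)$ can be estimated by averaging $|\tilde{X}(k)|^2$ over modes of similar $|k|$.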
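The definitions above can be checked numerically. A minimal sketch, assuming a 1D periodic grid, NumPy's FFT normalization, and a hypothetical power-law spectrum $P(k) \propto 1/k$: it draws one realization of a zero-mean stationary Gaussian process, verifies the discrete form of Parseval's theorem, and, relying on ergodicity, estimates $P(k)$ from that single realization by averaging over bins of 64 modes (an arbitrary choice).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
k = np.fft.rfftfreq(n)                 # non-negative frequencies, in cycles per sample
p_true = np.zeros_like(k)
p_true[1:] = 1.0 / k[1:]               # hypothetical power-law spectrum; k = 0 set to 0 (zero mean)

# One realization: independent complex Gaussian Fourier modes with <|mode|^2> = P(k).
# Independent, uniformly distributed phases make the field Gaussian and stationary.
modes = (rng.standard_normal(k.size) + 1j * rng.standard_normal(k.size)) * np.sqrt(p_true / 2)
x = np.fft.irfft(modes * np.sqrt(n), n=n)   # real-valued field on a periodic grid

# Discrete Parseval: sum |x|^2 = (1/n) sum |FFT(x)|^2
parseval_ok = np.isclose(np.sum(x ** 2), np.sum(np.abs(np.fft.fft(x)) ** 2) / n)

# Ergodic estimate from this single realization: |X(k)|^2 / n, averaged within bins of 64 modes
p_hat = np.abs(np.fft.rfft(x)) ** 2 / n
ratio = p_hat[1:-1] / p_true[1:-1]     # skip k = 0 and the Nyquist mode
n_bins = ratio.size // 64
binned_ratio = ratio[: n_bins * 64].reshape(n_bins, 64).mean(axis=1)
```

Each individual $|\tilde{X}(k)|^2$ scatters by 100% around $P(k)$; only the averaging over modes (the ergodic step) brings `binned_ratio` close to 1.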
Taking the power spectrum (from "Complex" to "Simple"): losing structural information
Vary cosmological parameters
[Figure: maps on a grid of Dark Matter Density vs. Growth Amplitude]
The power spectrum fails to distinguish the intricate differences between the two maps.
The Fourier kernel is completely delocalized in real space, while the information is extremely localized in Fourier space (uncertainty principle).
We need to cross-correlate more than one point in Fourier space to define locality: a single Fourier coefficient $\tilde{X}(k)$ also has only 2 degrees of freedom (amplitude and phase).
The "locality" of a random process expresses itself as a dependence (degeneracy) among the Fourier phases.
Therefore, when performing a Fourier analysis, the second-order moment alone is not sufficient to extract the locality information.
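To see that locality lives in the phases, here is a small sketch (hypothetical signal length, spike position and seed): a single spike, the most localized signal possible, keeps exactly the same power spectrum after its Fourier phases are randomized, yet its energy is no longer concentrated at one point.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256
spike = np.zeros(n)
spike[100] = 1.0                      # a maximally "local" signal

coeffs = np.fft.rfft(spike)
phases = rng.uniform(0.0, 2.0 * np.pi, coeffs.size)
phases[0] = 0.0                       # the DC term must stay real
phases[-1] = 0.0                      # the Nyquist term must stay real (n is even)
scrambled = np.fft.irfft(coeffs * np.exp(1j * phases), n=n)

# Same amplitudes |X(k)|, hence the same power spectrum
power_spike = np.abs(np.fft.rfft(spike)) ** 2
power_scrambled = np.abs(np.fft.rfft(scrambled)) ** 2
```

Both signals have the same total energy (Parseval), but the scrambled one spreads it over the whole grid: the power spectrum alone cannot tell them apart.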
Consider a single random variable: in 1D, the power spectrum is equivalent to taking the second moment (the variance), but it is the skewness (the third moment) that defines locality.
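The single-variable analogy in a few lines, assuming a standard Gaussian and a zero-mean exponential as two hypothetical distributions: both have unit variance (the "power spectrum" analogue), but only the skewness tells them apart (0 vs. 2).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
gauss = rng.standard_normal(n)
expo = rng.exponential(1.0, n) - 1.0   # shifted to zero mean; its variance is also 1

def variance(v):
    """Second central moment."""
    return np.mean((v - v.mean()) ** 2)

def skewness(v):
    """Third central moment, normalized by variance**1.5."""
    d = v - v.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5
```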
Classical idea: characterize the distribution with all its moments.
In Fourier space, this means studying the dependency between phases, e.g., with the bispectrum.
Reducing the dimension
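A sketch of how the bispectrum picks up phase dependence that the power spectrum misses, assuming three hypothetical cosines at frequencies $k_1$, $k_2$, $k_3 = k_1 + k_2$ on a periodic grid. When the phase at $k_3$ is locked to $\phi_1 + \phi_2$, the estimator $\langle \tilde{X}(k_1)\, \tilde{X}(k_2)\, \tilde{X}^*(k_3) \rangle$ survives averaging over realizations; when that phase is independent, it averages to zero, even though the power spectrum is the same in both cases.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k1, k2 = 64, 3, 5
k3 = k1 + k2
t = np.arange(n)

def signal(phi1, phi2, phi3):
    """Three unit cosines at frequencies k1, k2 and k3 = k1 + k2 (cycles per n samples)."""
    return (np.cos(2 * np.pi * k1 * t / n + phi1)
            + np.cos(2 * np.pi * k2 * t / n + phi2)
            + np.cos(2 * np.pi * k3 * t / n + phi3))

def bispectrum(x):
    """Single-realization bispectrum estimate at the triangle (k1, k2, k3)."""
    c = np.fft.rfft(x)
    return c[k1] * c[k2] * np.conj(c[k3])

def power(x):
    return np.abs(np.fft.rfft(x)) ** 2

b_coupled, b_random = [], []
for _ in range(500):
    phi1, phi2, phi_free = rng.uniform(0, 2 * np.pi, 3)
    b_coupled.append(bispectrum(signal(phi1, phi2, phi1 + phi2)))  # phase at k3 locked
    b_random.append(bispectrum(signal(phi1, phi2, phi_free)))      # phase at k3 independent

norm = (n / 2) ** 3                    # |c[k]| = n / 2 for a unit cosine
mean_b_coupled = abs(np.mean(b_coupled)) / norm
mean_b_random = abs(np.mean(b_random)) / norm

# The power spectrum does not depend on the phase at k3 at all
same_power = np.allclose(power(signal(0.3, 1.1, 1.4)), power(signal(0.3, 1.1, 2.9)))
```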
Power spectrum vs. bispectrum: with the bispectrum, the "dimension reduction" is too inefficient.
Heavy tails (non-Gaussian): higher-order moments depend critically on the "outliers", which are usually not well sampled, so the estimates can be noisy.
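Consider a 1D distribution $p(x)$. The classical idea is to characterize $p(x)$ with all its moments $\langle x^n \rangle$.
But for distributions with "heavy tails", e.g., power-law distributions, moments fail:
e.g., if $p(x) \propto x^{-\alpha}$ for $x \ge 1$ (with $\alpha > 1$), then $\langle x^n \rangle \propto \int_1^\infty x^{n - \alpha}\, dx = \infty$ when $n \ge \alpha - 1$.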
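A quick numerical illustration, assuming NumPy's `Generator.pareto` (a Lomax distribution, shifted by 1 here to give the classical Pareto law $p(x) \propto x^{-\alpha}$ on $x \ge 1$) with a hypothetical $\alpha = 2.5$: the second moment diverges ($2 \ge \alpha - 1$), so its sample estimate varies wildly from batch to batch, while the Gaussian one is stable.

```python
import numpy as np

rng = np.random.default_rng(4)
a = 1.5                                # Lomax shape; classical Pareto pdf ∝ x^-(a+1), i.e. alpha = a + 1 = 2.5
n_batches, batch = 20, 100_000

# Generator.pareto samples the Lomax (Pareto II) law; adding 1 gives the classical Pareto on x >= 1
pareto = 1.0 + rng.pareto(a, size=(n_batches, batch))
gauss = rng.standard_normal((n_batches, batch))

# Sample second moment per batch: finite for the Gaussian, divergent for this power law
m2_pareto = np.mean(pareto ** 2, axis=1)
m2_gauss = np.mean(gauss ** 2, axis=1)

# Ratio of the largest to the smallest batch estimate
spread_pareto = m2_pareto.max() / m2_pareto.min()
spread_gauss = m2_gauss.max() / m2_gauss.min()
```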
Limitations of the power spectrum
Limitations of higher-order moments
Fourier transform: "delocalized" in real space.
Wavelets: preserve locality.
Alternative: wavelets. In Fourier space, a wavelet acts as a frequency mask.
A wavelet is an agglomerate of many Fourier eigenmodes, and therefore preserves locality.
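A sketch of a wavelet as a frequency mask, assuming a hypothetical Gabor-like (Gaussian) band-pass mask centered at 0.1 cycles per sample with width 0.02, not necessarily the wavelets used in this lecture. Its real-space kernel keeps almost all of its energy near the origin, whereas a single Fourier mode spreads its energy uniformly over the grid.

```python
import numpy as np

n = 512
freq = np.fft.fftfreq(n)               # cycles per sample, in [-0.5, 0.5)
xi, sigma = 0.1, 0.02                  # hypothetical center frequency and bandwidth

# Wavelet: a smooth frequency mask that agglomerates many neighbouring Fourier modes
mask = np.exp(-((freq - xi) ** 2) / (2 * sigma ** 2))
wavelet = np.fft.ifft(mask)            # real-space kernel (complex-valued)

# A single Fourier mode: a mask that selects exactly one frequency
single = np.zeros(n)
single[int(round(xi * n))] = 1.0
plane_wave = np.fft.ifft(single)

def energy_near_origin(kernel, half_width=32):
    """Fraction of |kernel|^2 within half_width samples of index 0 on the periodic grid."""
    idx = np.arange(n)
    dist = np.minimum(idx, n - idx)
    e = np.abs(kernel) ** 2
    return e[dist <= half_width].sum() / e.sum()
```

The narrower the mask in frequency, the wider the kernel in real space: the uncertainty principle sets the trade-off, and a wavelet sits between the two extremes.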
Number of summary statistics
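Simple averaging, to construct translation-invariant statistics: $\langle X * \psi \rangle$.
Convolution and expectation commute (both can be written as integrals): $\langle X * \psi \rangle = \langle X \rangle * \psi$.
The system is stationary, so $\langle X \rangle$ is a constant; by the properties of the wavelet ($\int \psi(x)\, dx = 0$), $\langle X \rangle * \psi = 0$. Simple averaging alone yields nothing.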
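The commutation argument can be checked on a periodic grid, assuming a hypothetical Gaussian frequency mask with its $k = 0$ value set to zero (a zero-mean wavelet) and a hypothetical field with non-zero mean: the spatial average of $X * \psi$ equals $\langle X \rangle$ times $\tilde{\psi}(0)$, which vanishes.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1024
freq = np.fft.fftfreq(n)
mask = np.exp(-((freq - 0.1) ** 2) / (2 * 0.02 ** 2))   # hypothetical wavelet frequency mask
mask[0] = 0.0                          # zero-mean wavelet: its Fourier transform vanishes at k = 0

def convolve(field, psi_hat):
    """Circular convolution on a periodic grid, computed in Fourier space."""
    return np.fft.ifft(np.fft.fft(field) * psi_hat)

field = 3.0 + rng.standard_normal(n)   # hypothetical stationary field with mean 3

avg_conv = convolve(field, mask).mean()   # <X * psi>: average of the convolution
conv_avg = field.mean() * mask[0]         # <X> * psi: the mean times the integral of psi
```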
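If the function applied after the wavelet convolution is linear, it commutes with the averaging and the statistic still vanishes.
It needs to be a non-linear function.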
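A minimal sketch of why the function must be non-linear, with two hypothetical white-noise fields that differ only in amplitude (by a factor of 5) and a hypothetical wavelet mask: after a linear map, the averaged statistic is zero for both; after the modulus, it tracks the amplitude.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1024
freq = np.fft.fftfreq(n)
mask = np.exp(-((freq - 0.1) ** 2) / (2 * 0.02 ** 2))   # hypothetical wavelet frequency mask
mask[0] = 0.0                                             # zero-mean wavelet

def wavelet_stat(field, nonlinearity):
    """Translation-invariant statistic: spatial average of nonlinearity(X * psi)."""
    y = np.fft.ifft(np.fft.fft(field) * mask)
    return np.mean(nonlinearity(y))

def linear(y):
    return 2.0 * y                     # a linear map: commutes with the averaging

quiet = rng.standard_normal(n)
loud = 5.0 * rng.standard_normal(n)

lin_quiet, lin_loud = wavelet_stat(quiet, linear), wavelet_stat(loud, linear)
mod_quiet, mod_loud = wavelet_stat(quiet, np.abs), wavelet_stat(loud, np.abs)
```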
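Let the non-linear function be the modulus squared. Then, with $\tilde{f}(k) = \int f(x)\, e^{-ikx}\, dx$,
$\langle |X * \psi|^2 \rangle = \frac{1}{2\pi} \int P(k)\, |\tilde{\psi}(k)|^2\, dk$,
i.e., the power spectrum, binned (averaged) with the frequency mask $|\tilde{\psi}(k)|^2$ as the weight.
The wavelet "operation block" retains the locality information, but we have not yet extracted it.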
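With the modulus squared, the statistic is exactly a weighted power spectrum. On a discrete periodic grid (NumPy FFT conventions, with a hypothetical mask and a white-noise field), Parseval's theorem gives $\langle |X * \psi|^2 \rangle = \frac{1}{N} \sum_k \frac{|\tilde{X}(k)|^2}{N}\, |\tilde{\psi}(k)|^2$, which this sketch verifies.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2048
freq = np.fft.fftfreq(n)
mask = np.exp(-((freq - 0.1) ** 2) / (2 * 0.02 ** 2))   # hypothetical wavelet frequency mask

x = rng.standard_normal(n)
x_hat = np.fft.fft(x)

# Spatial average of |X * psi|^2
lhs = np.mean(np.abs(np.fft.ifft(x_hat * mask)) ** 2)

# Power spectrum estimate |X(k)|^2 / n, averaged with weight |psi(k)|^2
power = np.abs(x_hat) ** 2 / n
rhs = np.sum(power * np.abs(mask) ** 2) / n
```

The identity holds exactly, for any field: with the modulus squared, the wavelet block brings nothing beyond a binned power spectrum.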
Higher-order moments are unstable. Instead, take, e.g., the modulus rather than the modulus squared.
The importance of choosing a function that is linear or sublinear at large $x$.
"Folding" = non-linear operation + averaging.
The modulus is of linear order with respect to the field, which gives stable and robust summary statistics.
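A rough sketch of the stability argument, assuming Student-t samples with 3 degrees of freedom as a hypothetical heavy-tailed field (finite variance, infinite fourth moment): the batch-to-batch scatter of $\langle |x|^p \rangle$ grows with the order $p$, so the modulus ($p = 1$) gives the most stable statistic.

```python
import numpy as np

rng = np.random.default_rng(8)
n_batches, batch = 50, 10_000
# Hypothetical heavy-tailed field: Student-t with 3 degrees of freedom
samples = rng.standard_t(3, size=(n_batches, batch))

def batch_scatter(p):
    """Coefficient of variation, across batches, of the batch mean of |x|**p."""
    stats = np.mean(np.abs(samples) ** p, axis=1)
    return stats.std() / stats.mean()

cv_modulus = batch_scatter(1)          # linear order in x
cv_squared = batch_scatter(2)          # second order (power-spectrum-like)
cv_fourth = batch_scatter(4)           # higher order
```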
Convolution with wavelets
Averaging
Low-order non-linear function
Iterate over these operations
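Putting the steps together, here is a minimal 1D scattering transform up to second order. It is a sketch under stated assumptions, not the implementation behind the results in this lecture: the dyadic bank of Gaussian (Gabor-like) frequency masks, the parameters `xi0` and `sigma0`, and the use of a plain spatial mean for the averaging are all hypothetical choices.

```python
import numpy as np

def filter_bank(n, J, xi0=0.35, sigma0=0.08):
    """Hypothetical dyadic bank of J band-pass wavelets, given as frequency masks.
    Wavelet j is centred at xi0 / 2**j cycles per sample with width sigma0 / 2**j."""
    freq = np.fft.fftfreq(n)
    masks = []
    for j in range(J):
        m = np.exp(-((freq - xi0 / 2**j) ** 2) / (2 * (sigma0 / 2**j) ** 2))
        m[0] = 0.0                     # zero-mean wavelet
        masks.append(m)
    return masks

def convolve(field, psi_hat):
    """Circular convolution with a wavelet given by its frequency mask."""
    return np.fft.ifft(np.fft.fft(field) * psi_hat)

def scattering_1d(x, J=4):
    """S0 = <x>, S1[j] = <|x * psi_j|>, S2[j1, j2] = <||x * psi_j1| * psi_j2|> for j2 > j1."""
    masks = filter_bank(len(x), J)
    u1 = [np.abs(convolve(x, m)) for m in masks]        # convolution + modulus
    s0 = [np.mean(x)]
    s1 = [np.mean(u) for u in u1]                       # averaging
    s2 = [np.mean(np.abs(convolve(u1[j1], masks[j2])))  # iterate on the coarser scales
          for j1 in range(J) for j2 in range(j1 + 1, J)]
    return np.array(s0 + s1 + s2)

rng = np.random.default_rng(9)
x = rng.standard_normal(1024)
coeffs = scattering_1d(x)              # 1 + J + J(J-1)/2 = 11 coefficients for J = 4
```

Because the convolutions are circular and the last step is a spatial mean, the coefficients are exactly translation invariant on the periodic grid. Established implementations (e.g., Kymatio's `Scattering1D`) use carefully designed Morlet filter banks and a low-pass filter in place of the plain mean.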
[Diagram: power spectrum, bispectrum, and scattering transform on the Complexity vs. Stochasticity axes]
Do you know how to describe these images now? Scattering Transform.
[Figure: Figure of Merit vs. galaxy number density (arcmin^-2) for the Power Spectrum, Peak Counts (state-of-the-art), and Scattering Transform (our study); coefficient counts shown: 15, 37, and 20; annotated gains: x2, x10, x3]
[Figure: Weak lensing: light from background galaxies is deflected by foreground dark matter (unlensed vs. lensed); maps for varying Dark Matter Density and Growth Amplitude]