
Given data samples $x$ and desired corresponding labels $y$, find a mapping function $f$ between $x$ and $y$ so that $f(x) = y$ is true most of the time.
Linear Regression
Which coefficients $a$ and $b$ best explain the relationship between $x$ and $y$, so that $y \approx a x + b$?
Brute-force learning: a modern way of
"testing plenty of solutions"
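To make "testing plenty of solutions" concrete, here is a minimal NumPy sketch that evaluates every $(a, b)$ pair on a grid and keeps the one with the smallest mean squared error. The synthetic data (true $a = 2$, $b = 1$), the grid ranges and the squared-error loss are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical synthetic data: y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, x.size)

# Candidate solutions: every (a, b) pair on a regular 201 x 201 grid.
a_grid = np.linspace(-5.0, 5.0, 201)
b_grid = np.linspace(-5.0, 5.0, 201)
A, B = np.meshgrid(a_grid, b_grid, indexing="ij")

# Mean squared error of each candidate line y_hat = a x + b, shape (201, 201).
loss = ((A[..., None] * x + B[..., None] - y) ** 2).mean(axis=-1)

# Keep the candidate with the smallest loss.
i, j = np.unravel_index(np.argmin(loss), loss.shape)
a_best, b_best = a_grid[i], b_grid[j]
print(f"a = {a_best:.2f}, b = {b_best:.2f}")
```

The cost grows as (grid points per axis) to the power of the number of parameters, so this approach stops being practical beyond a handful of parameters.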
$\hat{y} = a x + b$ is the label predicted by the model for a given data sample $x$.
With the true label $y$, we can define the prediction loss, e.g. the squared error $\mathcal{L}(y, \hat{y}) = (y - \hat{y})^2$.
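The task becomes finding $a$ and $b$
so that the prediction loss $\mathcal{L}$, summed over all samples with predicted labels $\hat{y}_i$, is minimal: $(a^*, b^*) = \arg\min_{a, b} \sum_i \mathcal{L}(y_i, \hat{y}_i)$.
How do we infer $a$ and $b$ efficiently?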
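One common answer is gradient descent: repeatedly step against the gradient of the loss. A minimal sketch, assuming the squared-error loss and hypothetical synthetic data; `fit_gradient_descent`, the learning rate and the number of steps are illustrative choices. For linear regression a closed-form least-squares solution also exists, so the result can be checked against `np.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, x.size)   # hypothetical data

def fit_gradient_descent(x, y, lr=0.01, n_steps=5000):
    """Minimise the mean squared error of y_hat = a*x + b by gradient descent."""
    a, b = 0.0, 0.0
    for _ in range(n_steps):
        err = a * x + b - y                 # y_hat - y for every sample
        a -= lr * 2.0 * np.mean(err * x)    # dL/da
        b -= lr * 2.0 * np.mean(err)        # dL/db
    return a, b

a_gd, b_gd = fit_gradient_descent(x, y)

# Closed-form least squares for comparison (polyfit returns [slope, intercept]).
a_ls, b_ls = np.polyfit(x, y, 1)
print(a_gd, b_gd, a_ls, b_ls)
```

Each step costs one pass over the data, independently of how many candidate values a grid would need, which is what makes gradient-based learning scale to models with many parameters.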


Overfitting
Complex regressions may use many parameters.
The green model has zero training error, yet it does not generalize.
Training & testing data
The parameters are learned from the training data by minimizing the training loss.
The model must also achieve a low testing loss on data it has not seen during training
(= generalization error).
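Overfitting and the training/testing split can be reproduced with polynomial regression. In this sketch (hypothetical data: 10 noisy samples of $\sin(2\pi x)$), a degree-9 polynomial passes through every training point, so its training loss is essentially zero; its testing loss, measured on unseen samples, is typically larger than that of a degree-3 fit.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

def f(x):
    return np.sin(2 * np.pi * x)                  # hypothetical ground truth

x_train = np.linspace(0.0, 1.0, 10)
y_train = f(x_train) + rng.normal(0.0, 0.2, x_train.size)
x_test = rng.uniform(0.0, 1.0, 200)               # samples unseen during training
y_test = f(x_test)

def mse(model, x, y):
    return np.mean((model(x) - y) ** 2)

results = {}
for degree in (3, 9):
    model = Polynomial.fit(x_train, y_train, degree)   # least-squares fit
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree}: train MSE = {results[degree][0]:.4f}, "
          f"test MSE = {results[degree][1]:.4f}")
```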
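Regularization penalty
Adding a penalty on the parameter magnitudes to the loss function, e.g. $\mathcal{L} + \lambda \lVert \theta \rVert^2$, prevents the model from fitting the noise in the training data.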
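A minimal sketch of one common penalty, the squared $L_2$ norm of the weights (ridge regression), assuming degree-9 polynomial features and hypothetical noisy data. Setting the gradient of $\lVert Xw - y \rVert^2 + \lambda \lVert w \rVert^2$ to zero gives $w = (X^\top X + \lambda I)^{-1} X^\top y$; the larger $\lambda$, the smaller the weights.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 20)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.2, x.size)   # hypothetical noisy data
X = np.vander(x, 10, increasing=True)                   # columns 1, x, ..., x^9

def ridge(X, y, lam):
    """Minimise ||X w - y||^2 + lam * ||w||^2 in closed form."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

lams = [1e-6, 1e-3, 1e-1, 10.0]
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in lams]
for lam, n in zip(lams, norms):
    print(f"lambda = {lam:g}: ||w|| = {n:.3f}")
```

Small weights keep the polynomial from oscillating wildly between training points, which is how the penalty limits fitting the noise.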
Support-vector machine
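Hyperplane
In two dimensions, a hyperplane is a straight line. The SVM separates two classes with the hyperplane that maximizes the margin, i.e. the distance to the nearest training samples (the support vectors).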
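The slides do not say how the SVM is trained; standard solvers work on a dual quadratic program. As a simpler stand-in, this sketch minimizes the soft-margin (hinge-loss) objective directly by subgradient descent on hypothetical 2-D blobs. The learned hyperplane is the line $w \cdot x + b = 0$; `train_linear_svm`, `lam`, `lr` and `n_epochs` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical 2-D data: two Gaussian blobs labelled -1 and +1.
X = np.r_[rng.normal(-2.0, 0.5, (100, 2)), rng.normal(2.0, 0.5, (100, 2))]
y = np.r_[-np.ones(100), np.ones(100)]

def train_linear_svm(X, y, lam=0.01, lr=0.1, n_epochs=500):
    """Subgradient descent on lam/2 ||w||^2 + mean(max(0, 1 - y (w.x + b)))."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0            # samples inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(X)
        grad_b = -y[active].sum() / len(X)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train_linear_svm(X, y)
accuracy = np.mean(np.sign(X @ w + b) == y)
print(w, b, accuracy)
```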
What if the data cannot be split by a line?


Kernel trick
Increase the data dimension with non-linear feature combinations, and find the hyperplane "there", in the high-dimensional space.
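A minimal sketch of the idea with an explicit feature map (the kernel trick itself never builds this map; it only evaluates inner products in the lifted space). Two concentric circles (hypothetical data) cannot be split by a line in 2-D, but appending the feature $x_1^2 + x_2^2$ makes them separable by a plane in 3-D. For brevity, a least-squares linear classifier replaces the SVM here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
theta = rng.uniform(0.0, 2 * np.pi, 2 * n)
radius = np.r_[np.full(n, 1.0), np.full(n, 3.0)]   # inner circle -> -1, outer -> +1
X = np.c_[radius * np.cos(theta), radius * np.sin(theta)]
y = np.r_[-np.ones(n), np.ones(n)]

def lift(X):
    """Non-linear feature combination: append x1^2 + x2^2 as a third dimension."""
    return np.c_[X, (X ** 2).sum(axis=1)]

def fit_linear(F, y):
    """Least-squares hyperplane w . f + b fitted to the +-1 labels."""
    A = np.c_[F, np.ones(len(F))]                   # extra column for the bias
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def accuracy(F, y, w):
    return np.mean(np.sign(np.c_[F, np.ones(len(F))] @ w) == y)

acc_2d = accuracy(X, y, fit_linear(X, y))
acc_3d = accuracy(lift(X), y, fit_linear(lift(X), y))
print(acc_2d, acc_3d)
```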

Here, we use the Radial Basis Function (RBF) kernel $K(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^2\right)$, with $\gamma > 0$.
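The RBF kernel can be evaluated for all pairs of samples at once. This sketch computes the kernel (Gram) matrix with NumPy; `rbf_kernel` is a hypothetical helper and `gamma = 0.5` an illustrative value. A kernel SVM only needs these pairwise values, never the (infinite-dimensional) RBF feature vectors themselves.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2) for all row pairs."""
    sq_dist = ((A ** 2).sum(axis=1)[:, None]
               + (B ** 2).sum(axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))   # clip tiny negative round-off

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 2))
K = rbf_kernel(X, X, gamma=0.5)                         # shape (50, 50)
```

Each entry lies in $(0, 1]$: identical samples give 1, distant samples give values close to 0, and $\gamma$ sets how fast the similarity decays with distance.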

Machine learning for signal analysis
An $N$-point waveform can be seen as a
single data sample with $N$ features.

These two waveforms are shifted by only a tiny amount in the time domain.
Yet, compared sample by sample, their features differ hugely.
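The effect is easy to quantify. In this sketch (a hypothetical Gaussian-windowed oscillation of 512 points), a 5-sample shift makes the point-by-point distance between the two feature vectors larger than the norm of the waveform itself, while a shift-invariant feature such as the amplitude spectrum is unchanged. The spectrum is only one illustrative choice of feature, not necessarily the one used in these slides.

```python
import numpy as np

n = 512
t = np.arange(n)
# Hypothetical waveform: Gaussian-windowed oscillation with a 25-sample period.
waveform = np.exp(-((t - 256) / 60.0) ** 2) * np.sin(2 * np.pi * t / 25.0)
shifted = np.roll(waveform, 5)           # 5-sample (about 1 % of the window) shift

# Point-by-point comparison of the two 512-feature vectors.
relative_distance = np.linalg.norm(waveform - shifted) / np.linalg.norm(waveform)

# Amplitude spectrum: unchanged by a circular time shift.
spectrum_gap = np.max(np.abs(np.abs(np.fft.rfft(waveform)) - np.abs(np.fft.rfft(shifted))))
print(relative_distance, spectrum_gap)
```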



Manual feature extraction for time-to-failure earthquake prediction
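The slides do not list the extracted features. As a hypothetical illustration, this sketch cuts a continuous record into fixed windows and computes a few summary statistics per window (mean, standard deviation, peak absolute amplitude, 5th and 95th percentiles); each row could then feed a regressor that predicts the time to failure. The window length, the feature list and the random stand-in signal are all assumptions.

```python
import numpy as np

def window_features(signal, window=4096, step=4096):
    """Hypothetical hand-crafted features, one row per window."""
    rows = []
    for start in range(0, len(signal) - window + 1, step):
        w = signal[start:start + window]
        rows.append([w.mean(), w.std(), np.abs(w).max(),
                     np.percentile(w, 5), np.percentile(w, 95)])
    return np.array(rows)

rng = np.random.default_rng(5)
signal = rng.normal(size=40_960)         # stand-in for a continuous seismic record
features = window_features(signal)       # shape (n_windows, 5)
print(features.shape)
```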




Deep neural networks
Which features should matter?
A layer computes $h = \sigma(W x + b)$, where $x$ is an input data sample, $W$ are the weights and $b$ the bias, and the function $\sigma$ is the non-linear activation. The output $h$ is a new feature vector.
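A minimal NumPy sketch of one layer, assuming ReLU as the non-linear activation $\sigma$ (the slides do not name one). The worked example uses hand-picked weights so the result can be checked by hand; stacking layers turns each layer's output into the next layer's input features. The stacked weights are random, not trained.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def dense(x, W, b, activation=relu):
    """One fully connected layer: h = sigma(W x + b)."""
    return activation(W @ x + b)

# Worked example with hand-picked (hypothetical) weights.
x = np.array([3.0, 1.0])
W = np.array([[1.0, -1.0],
              [2.0,  0.0]])
b = np.array([0.0, -5.0])
h = dense(x, W, b)       # W x = [2, 6]; + b = [2, 1]; ReLU -> [2, 1]

# Stacking: the output features of one layer are the inputs of the next.
rng = np.random.default_rng(6)
W1, b1 = rng.normal(size=(16, 2)), np.zeros(16)
W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)
h2 = dense(dense(x, W1, b1), W2, b2)
print(h, h2)
```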









ConvNetQuake – a CNN for earthquake detection and location


PhaseNet – a seismic phase detector


Unsupervised learning
Clustering – most common definitions

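Find cluster centroids $\mu_1, \dots, \mu_K$ that minimize
the within-cluster variance. In other words, with $C_k$ the set of samples assigned to centroid $\mu_k$:
$\min_{\{C_k\}, \{\mu_k\}} \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$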
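The standard way to (locally) minimize the within-cluster sum of squares is Lloyd's k-means algorithm, which alternates an assignment step and a centroid update. A minimal sketch with k-means++ seeding; `kmeans` and the two-blob test data are hypothetical.

```python
import numpy as np

def sq_dists(X, centroids):
    """Squared distance from every sample to every centroid, shape (n, k)."""
    return ((X[:, None, :] - centroids[None]) ** 2).sum(axis=-1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with k-means++ seeding; returns centroids, labels, inertia."""
    rng = np.random.default_rng(seed)
    # k-means++: pick each new seed with probability proportional to its squared
    # distance to the nearest seed chosen so far.
    centroids = X[[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = sq_dists(X, centroids).min(axis=1)
        centroids = np.vstack([centroids, X[rng.choice(len(X), p=d2 / d2.sum())]])
    for _ in range(n_iter):
        # Assignment step: each sample goes to its nearest centroid.
        labels = sq_dists(X, centroids).argmin(axis=1)
        # Update step: each centroid moves to the mean of its samples (kept if empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = sq_dists(X, centroids).argmin(axis=1)
    inertia = ((X - centroids[labels]) ** 2).sum()   # within-cluster sum of squares
    return centroids, labels, inertia

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)), rng.normal(10.0, 0.5, (100, 2))])
centroids, labels, inertia = kmeans(X, k=2)
```

Because the result depends on the seeds, practical implementations usually run several initializations and keep the one with the lowest inertia.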



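Principal component analysis
Let us consider a centered data matrix $X$ (one sample per row); find a decomposition $X = U \Sigma V^\top$ so that the rows of $V^\top$ (the principal components) are orthonormal and ordered by decreasing explained variance.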
Principal component analysis for compression
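A minimal sketch of PCA compression via the singular value decomposition of the centered data: keep the first $k$ principal components, store $k$ numbers per sample, and reconstruct by projecting back. The data are synthetic (close to a 5-dimensional subspace of a 50-dimensional space), and the helper names are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Principal components of X (rows = samples) via SVD of the centred data."""
    mean = X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)          # variance fraction per component
    return mean, Vt[:n_components], explained

def pca_compress(X, mean, components):
    return (X - mean) @ components.T             # k numbers per sample

def pca_decompress(codes, mean, components):
    return codes @ components + mean

rng = np.random.default_rng(7)
# Hypothetical data: 200 samples, 50 features, lying near a 5-D subspace.
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 50)) + 0.01 * rng.normal(size=(200, 50))

mean, components, explained = pca_fit(X, n_components=5)
codes = pca_compress(X, mean, components)        # (200, 5): 10x fewer numbers
X_hat = pca_decompress(codes, mean, components)
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(explained[:5].sum(), rel_error)
```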

Independent component analysis

Independent component analysis for blind source separation
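A compact sketch of one well-known ICA algorithm, symmetric FastICA with the tanh non-linearity, written from scratch in NumPy: whiten the mixtures, then iterate a fixed-point update that keeps the unmixing rows orthonormal. The two sources (a sine and a square wave) and the mixing matrix are hypothetical; ICA recovers the sources only up to order, sign and scale.

```python
import numpy as np

def whiten(X):
    """Centre the mixtures (rows = channels) and decorrelate them to unit variance."""
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    return E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ X

def sym_decorrelate(W):
    """W <- (W W^T)^(-1/2) W, which keeps the unmixing rows orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fast_ica(X, n_iter=200, seed=0):
    """Symmetric FastICA with g = tanh (log-cosh contrast)."""
    Z = whiten(X)
    n, m = Z.shape
    W = sym_decorrelate(np.random.default_rng(seed).normal(size=(n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E{g(Wz) z^T} - diag(E{g'(Wz)}) W
        W = sym_decorrelate(G @ Z.T / m - np.diag((1.0 - G ** 2).mean(axis=1)) @ W)
    return W @ Z                                  # estimated sources

# Two hypothetical sources, observed only through two linear mixtures.
t = np.linspace(0.0, 8.0, 2000)
S = np.vstack([np.sin(2.0 * t), np.sign(np.sin(3.0 * t))])
A = np.array([[1.0, 1.0],
              [0.5, 2.0]])
X = A @ S
S_hat = fast_ica(X)
```

In practice a maintained implementation, such as scikit-learn's `FastICA`, would normally be preferred over this sketch.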


Deep unsupervised learning
Task: learn to reconstruct the data (output $\hat{x}$) from itself (input $x$) under constraints

Learn to encode a 512-point seismogram into a 32-point code with the best reconstruction quality

Reject the seismograms that cannot be decoded with a low reconstruction error.
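Training a deep autoencoder is beyond a short example, so this sketch swaps in a linear encoder/decoder (truncated PCA, 512 to 32 numbers) to illustrate the rejection rule: traces whose reconstruction error exceeds a threshold are flagged. The synthetic "normal" traces, the pure-noise outlier and the threshold (10 times the median training error) are all hypothetical; a trained autoencoder would replace `encode` and `decode`.

```python
import numpy as np

n_points, n_code = 512, 32
t = np.arange(n_points)
rng = np.random.default_rng(8)

def make_traces(n):
    """Hypothetical 'normal' traces: random mixes of 10 low-frequency harmonics plus small noise."""
    k = np.arange(1, 11)[:, None]
    basis = np.vstack([np.cos(2 * np.pi * k * t / n_points),
                       np.sin(2 * np.pi * k * t / n_points)])        # (20, 512)
    return rng.normal(size=(n, 20)) @ basis + 0.01 * rng.normal(size=(n, n_points))

train = make_traces(500)

# Linear encoder/decoder (truncated PCA) standing in for a trained autoencoder.
mean = train.mean(axis=0)
_, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
encoder = Vt[:n_code]                              # (32, 512)

def encode(x):
    return (x - mean) @ encoder.T                  # 512 -> 32 numbers

def decode(code):
    return code @ encoder + mean                   # 32 -> 512 numbers

def reconstruction_error(x):
    return np.mean((decode(encode(x)) - x) ** 2, axis=-1)

threshold = 10.0 * np.median(reconstruction_error(train))   # hypothetical rule

new = np.vstack([make_traces(20), rng.normal(size=(1, n_points))])   # last row: pure noise
rejected = reconstruction_error(new) > threshold
print(rejected)
```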