Signal & Noise - Nate Silver
One of the best books that I read in 2015 was “The Signal and the Noise” by Nate Silver. The book is about forecasting. The author is famous for his work on baseball and political forecasting. The book includes insights from interviews with experts in weather/climate forecasting and finance, as well as decision makers and policy makers whose work depends heavily on forecast data. Here are some of the interesting insights from this book:
Is it possible to forecast?
One famous statement of scientific determinism is Laplace’s demon: if someone (the demon) knew the precise location and momentum of every atom in the universe, their past and future values could be calculated from the laws of classical mechanics.
Despite the uncertainty at the quantum level, macroscopic behaviour is deterministic (e.g. the movement of a rain cloud and the movement of a chess pawn are deterministic).
Success stories:
· Weather forecasting: the physical model is well known, and its PDEs can be solved numerically (e.g. with finite elements). As for improving accuracy, roughly speaking a 4D model (x, y, z, t) needs 16x the computational power to double the grid resolution.
· Games (e.g. baseball, chess) are easier to forecast than stock markets or politics: the rules are clear and consistent (unlike human behaviour), and plenty of data is available. Chess is a deterministic game that can be tackled with search algorithms.
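The 16x figure for the weather model follows from simple arithmetic, sketched below. It assumes that the cost grows in proportion to the number of grid cells times the number of time steps; the function name is made up for illustration.

```python
def cost_factor(resolution_factor, dimensions=4):
    """Relative compute cost when every dimension is refined by the same factor.

    Assumption: cost is proportional to (grid cells) x (time steps).
    """
    return resolution_factor ** dimensions


print(cost_factor(2))     # doubling x, y, z and t: 2**4
print(cost_factor(2, 3))  # doubling the three spatial dimensions only: 2**3
```

Doubling the resolution in all four dimensions multiplies the work by 2 four times over, which is where the factor of 16 comes from.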
Chaos
Deterministic dynamic systems can still be impossible to predict. A small difference in the initial point (e.g. because of data truncation or noise) leads to hugely different predictions (due to the nonlinear nature of the system), and with every time iteration the divergence becomes bigger and bigger, so that the trajectories seem unpredictable.
For example, despite successful weather modelling, the prediction is unreliable for horizons of more than 5 days.
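This sensitivity to the starting point can be seen in a toy system. The sketch below uses the logistic map, a textbook chaotic system that is not taken from the book, and the starting value 0.2 and the perturbation of 1e-10 are arbitrary choices for illustration.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x), a fully deterministic rule."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)  # same start, tiny "measurement error"
gaps = [abs(p - q) for p, q in zip(a, b)]

print(f"gap after 1 step:        {gaps[1]:.2e}")
print(f"largest gap in 60 steps: {max(gaps):.2f}")
```

Both runs follow exactly the same rule, yet after a few dozen iterations the two trajectories no longer resemble each other.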


---------------
If it’s difficult to model, it doesn’t mean you can neglect it
For example: financial models of credit risk
· The models assumed that defaults were independent of each other. Investment-bank management added a 50% margin, but in fact the effect was much greater than had been imaginable (e.g. 600%).
· Hedge fund firms advertise that their risk (volatility) is much lower than that of other risky assets. In fact the risk is not constant, as this claim assumes: in a crisis, volatility jumps far above its average.
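To get a feel for how much the independence assumption matters, here is a minimal sketch that compares two extremes: loans that default independently, and loans that all default together. The pool of 5 loans and the 5% default rate are illustrative numbers, not figures from this post.

```python
def prob_all_default(p_default, n_loans, correlated):
    """Probability that every loan in a pool defaults.

    correlated=False: defaults are independent events.
    correlated=True: extreme case where all loans default together.
    """
    return p_default if correlated else p_default ** n_loans


p, n = 0.05, 5  # hypothetical pool: 5 loans, each with a 5% default chance
independent = prob_all_default(p, n, correlated=False)
together = prob_all_default(p, n, correlated=True)

print(f"independent:          {independent:.2e}")
print(f"perfectly correlated: {together:.2e}")
print(f"underestimated by a factor of {together / independent:,.0f}")
```

Real portfolios sit somewhere between the two extremes, but the gap between them shows why a fixed safety margin on top of an independence-based model can be far too small.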
Overfitting
Overfitting is more likely to happen when:
· the data is noisy (e.g. for earthquake prediction it is almost impossible to accurately measure displacement/temperature 20 km below the earth's surface)
· the events are sparse (e.g. earthquakes/tsunamis, terrorist attacks, flu/Ebola outbreaks, economic crashes)
· the models are not well understood or are changing (e.g. the stock market)
· the model is complex (e.g. earthquakes)
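The combination of noisy data and a complex model can be illustrated with a small sketch. Everything in it is made up for illustration: the underlying signal is a hypothetical straight line, the noise is a fixed +/-1 pattern, and the "complex model" is a polynomial forced through every data point.

```python
def fit_line(xs, ys):
    """Simple model: ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def fit_interpolant(xs, ys):
    """Complex model: polynomial through every training point (Lagrange form)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def signal(x):
    return 2.0 * x + 1.0  # hypothetical true relationship


def mean_abs_error(model, xs, ys):
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)


train_x = [float(i) for i in range(8)]
noise = [(-1.0) ** i for i in range(8)]    # made-up noise of +/-1
train_y = [signal(x) + e for x, e in zip(train_x, noise)]
test_x = [x + 0.5 for x in train_x[:-1]]   # unseen points in between
test_y = [signal(x) for x in test_x]

errors = {}
for name, fit in [("line", fit_line), ("interpolant", fit_interpolant)]:
    model = fit(train_x, train_y)
    errors[name] = (mean_abs_error(model, train_x, train_y),
                    mean_abs_error(model, test_x, test_y))
    print(f"{name:12s} train error {errors[name][0]:.2f}   "
          f"test error {errors[name][1]:.2f}")
```

The interpolating polynomial reproduces the noisy training data perfectly, but it has fitted the noise instead of the signal, so it does worse than the plain line on the unseen points.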
Bias
· Different weather channels present the same government data (NOAA/KNMI) with their own adjustments. They tend to exaggerate rainy weather (e.g. in their graphics) to avoid being punished for false negatives.
Unethical bias
· political polls
· investment firms' recommendations
Media bias
The media want news (TV weather, politics, the Ebola epidemic) that keeps them interesting, so the most controversial statements (whether from political candidates or weathermen) get the most attention, regardless of how accurate they are.
Distinguish noise from signal

· Select only the variables that are most relevant and least noisy. Leave out variables that are noisy or difficult to measure.
· Noise (e.g. false alarms) is difficult to distinguish from signal until after the event has happened (e.g. the intelligence reports before the 9/11 attack on the WTC).

Communicate uncertainty clearly
When it was forecast that the water level would be 4.5 +/- 1 m, people ignored the evacuation order: the news reporters and their audience thought that the water would be 4.5 m at most, and since the dike was 5 m high they believed they would be safe. In fact the stated range went up to 5.5 m, above the top of the dike.
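The risk hidden in "4.5 +/- 1 m" can be made explicit with a short calculation. The sketch below rests on two assumptions that are not in the forecast itself: that the forecast error is normally distributed, and that "+/- 1 m" means one standard deviation.

```python
import math


def prob_exceeds(threshold, mean, sd):
    """P(X > threshold) for a normally distributed X (assumed error model)."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Assumption: "+/- 1 m" is read as one standard deviation.
p_overtop = prob_exceeds(threshold=5.0, mean=4.5, sd=1.0)
print(f"chance that the water tops the 5 m dike: {p_overtop:.0%}")
```

Under these assumptions the chance of the water topping the dike is about 31%. A statement like "roughly a one in three chance that the dike is overtopped" communicates the danger far better than a central estimate with a margin attached.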



