Wednesday, January 13, 2016

Signal and Noise by Nate Silver (book summary)



One of the best books that I read in 2015 was “The Signal and the Noise” by Nate Silver. This book is about forecasting. The author is famous for his work on baseball and political forecasting. The book includes insights from interviews with experts in weather/climate forecasting and finance, as well as decision makers / policy makers whose work depends heavily on forecasting data. Here are some of the interesting insights from this book:

Is it possible to forecast?

One famous statement of scientific determinism is Laplace’s demon: if someone (the Demon) knows the precise location and momentum of every atom in the universe, their past and future values can be calculated from the laws of classical mechanics.

Despite uncertainty at the quantum level, macroscopic behaviour is effectively deterministic (e.g. the movement of rain clouds and the movement of chess pawns are deterministic).


Success stories:

·         Weather forecasting: the physical model is well known, and the PDEs can be solved numerically with finite elements. As for improving accuracy, roughly speaking a 4D model (x, y, z, t) needs 16x the computational power to double the grid resolution, since 2^4 = 16.


·         Games (e.g. baseball, chess) are easier to forecast than stock markets / politics: the rules are clear & consistent (unlike human behaviour), and a lot of data is available. Chess is a deterministic game that can be tackled with search algorithms.
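The 16x figure for the weather model can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes that the cost grows in proportion to the number of grid cells times the number of time steps, and that the time step is refined together with the three spatial axes. The function name cost_multiplier is made up for this illustration.

```python
def cost_multiplier(refinement: int, dimensions: int = 4) -> int:
    """Relative compute cost when every dimension is refined by `refinement`.

    Assumption (not from the book): cost is proportional to the number of
    grid points, and all `dimensions` axes (x, y, z, t) are refined equally.
    """
    return refinement ** dimensions


print(cost_multiplier(2))  # doubling the resolution: 2**4 = 16
print(cost_multiplier(4))  # quadrupling the resolution: 4**4 = 256
```

Under these assumptions every further doubling multiplies the cost by 16 again, which is why resolution improves much more slowly than raw computing power.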

Chaos

Deterministic dynamic systems can still be impossible to predict. A small difference in the initial conditions (e.g. because of data truncation / noise) leads to hugely different predictions (due to the nonlinear nature of the system), and with every (time) iteration the divergence becomes bigger & bigger, so that the trajectories seem unpredictable.


For example, despite successful weather modelling, the prediction is unreliable for horizons of more than 5 days.
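The sensitivity to initial conditions can be seen in a toy system. The sketch below uses the logistic map, a standard textbook example of chaos that I picked for illustration (it is not a weather model). The starting value 0.2, the perturbation of 1e-10 and the parameter r = 4 are arbitrary assumptions.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two runs that differ only by a tiny "measurement error" in the start value.
a = logistic_trajectory(0.2, 60)
b = logistic_trajectory(0.2 + 1e-10, 60)

# Absolute gap between the two trajectories at every step.
gaps = [abs(p - q) for p, q in zip(a, b)]

for step in (0, 10, 20, 30, 40, 50, 60):
    print(step, gaps[step])
```

The gap stays tiny for the first iterations and then grows until the two runs have nothing to do with each other, even though every step is fully deterministic.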

---------------

If something is difficult to model, it doesn’t mean you can neglect it.

For example: financial models of credit risk
·         The model assumed that the risks were independently distributed. Investment-bank management added a 50% margin, but in fact the effect was much greater than was imaginable (e.g. 600%).
·         Hedge fund firms advertise that their risk (volatility) is much lower than that of other risky assets. In fact the risk is not constant, as this claim assumes: in a crisis, volatility jumps much higher than its average.
These mistakes cost 15 trillion in global wealth during the 2008 financial crisis.
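To see how much the independence assumption matters, here is a small sketch comparing the chance that every loan in a pool defaults under two extreme assumptions. The pool of 5 loans and the 5% default probability are illustrative numbers that I chose, and the function names are hypothetical.

```python
def p_all_default_independent(p_default: float, n_loans: int) -> float:
    """Chance that all loans default if defaults are independent."""
    return p_default ** n_loans


def p_all_default_correlated(p_default: float) -> float:
    """Chance that all loans default if they are perfectly correlated:
    when one loan defaults, all of them do."""
    return p_default


p_default = 0.05  # assumed default probability per loan
n_loans = 5       # assumed pool size

independent = p_all_default_independent(p_default, n_loans)
correlated = p_all_default_correlated(p_default)
ratio = correlated / independent

print(independent)   # about 3.1e-07
print(correlated)    # 0.05
print(round(ratio))  # 160000
```

Real portfolios sit somewhere between these two extremes, but the gap shows why a fixed safety margin on top of an independence-based model can be far too small.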







Overfitting


Overfitting is more likely to happen when:
·         the data is noisy (e.g. it is almost impossible to accurately measure displacement/temperature 20 km below the earth’s surface for earthquake prediction)
·         the events are sparse (e.g. earthquake/tsunami, terrorist attack, flu/ebola outbreak, economic crash)
·         the models are not well understood or are changing (e.g. stock market)
·         the model is complex (e.g. earthquake)
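The combination of noisy data and a complex model can be illustrated with a toy example. The sketch below fits the same 10 noisy points with a straight line and with a degree-9 polynomial that passes through every point. The true signal y = 2x + 1 and the alternating +1 / -1 "noise" are assumptions that I hand-picked so that the result is reproducible. They are not data from the book.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line (the simple model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every training point (the complex model)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


def true_signal(x):
    return 2.0 * x + 1.0  # assumed underlying signal


def max_error(model, xs, targets):
    return max(abs(model(x) - t) for x, t in zip(xs, targets))


train_x = [float(i) for i in range(10)]
noise = [1.0 if i % 2 == 0 else -1.0 for i in range(10)]  # hand-picked noise
train_y = [true_signal(x) + e for x, e in zip(train_x, noise)]

# Held-out points between the training points, compared with the true signal.
test_x = [x + 0.5 for x in train_x[:-1]]
test_y = [true_signal(x) for x in test_x]

line = fit_line(train_x, train_y)
wiggly = fit_interpolant(train_x, train_y)

print("train error, line:       ", max_error(line, train_x, train_y))
print("train error, polynomial: ", max_error(wiggly, train_x, train_y))
print("test error, line:        ", max_error(line, test_x, test_y))
print("test error, polynomial:  ", max_error(wiggly, test_x, test_y))
```

The polynomial has zero error on the training points because it has memorised the noise, and it is far worse than the straight line on the held-out points.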


Bias

·         Different weather channels present government data (NOAA/KNMI) with their own adjustments, and tend to amplify rainy weather (e.g. in the image presentation) to avoid being punished for false negatives.
·         Investment firms tend to give positive recommendations, also due to asymmetric rewards.

Unethical bias









·         climate lobby
·         political polls
·         investment firms recommendations

Media bias

The media want news (TV weather, politics, the Ebola epidemic) that keeps their coverage interesting, so the most controversial statements (whether from political candidates or weathermen) get the most attention regardless of how accurate they are.

Distinguish noise from signal




·         Select only the variables that are most relevant and least noisy. Neglect noisy / difficult-to-measure variables.
·         Noise (e.g. false alarms) is difficult to distinguish from signal until the event has happened (e.g. intelligence reports before the 9/11 attack on the WTC).





Communicate uncertainty clearly

When it was forecast that the water level would be 4.5 +/- 1 m, people ignored the evacuation order because the news reporters & audience thought that the water would be 4.5 m at most while the dike was 5 m high, so they would be safe.
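A forecast like this is easier to act on when it is turned into a probability. The sketch below assumes that the forecast error is normally distributed and that the "+/- 1 m" is one standard deviation. The summary above does not say what the interval means, so this is only one possible reading.

```python
from statistics import NormalDist

# Assumption: water level ~ Normal(mean 4.5 m, standard deviation 1 m).
forecast = NormalDist(mu=4.5, sigma=1.0)
dike_height = 5.0  # metres

# Probability that the water rises above the top of the dike.
p_overtop = 1.0 - forecast.cdf(dike_height)

print(round(p_overtop, 3))  # 0.309
```

Under this reading, "the water will probably stay below the dike" still leaves roughly a 3 in 10 chance of flooding, which is a very different message from "the water will be 4.5 m".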
