Wednesday, 23 October 2013

Data Science for Business [book review]

Data Science for Business: What you need to know about data mining and data-analytic thinking
Foster Provost and Tom Fawcett
O'Reilly Media 

Data science is the new best thing but, like Aristotle’s elephant, people struggle to define 
exactly what data science is and what skills it requires.

When we see data science we tend to recognise it for what it is: that mixture 
of analysis, inference and logic that pulls information out of numbers, be it social 
network analysis, plotting interest in a topic over time, or predicting the impact of the 
weather on supermarket stock levels.

This book serves as an introduction to the topic. It’s designed for use as a 
college textbook, perhaps aimed at business management courses. It starts at a very 
low level, assuming little or no knowledge of statistics or of any of the more advanced 
techniques such as cluster analysis or topic modelling.

If all you ever do is read the first two chapters you’ll come away with enough 
high-level knowledge to fluff your way through a job interview, as long as you’re 
not expected to get your hands dirty.

From chapter three, things get a bit more rigorous. The book noticeably changes 
gear and takes you through some fairly advanced mathematics, discussing 
regression, cluster analysis and the overfitting of mathematical models, all of 
which are handled fairly well.

It’s difficult to know where this book sits. The first two chapters are most 
definitely ‘fluffy’; the remainder demand some knowledge of probability theory 
and statistics of the reader, plus an ability not to be scared by equations embedded 
in the text.

It’s a good book and a useful one. It probably asks too much to be ideal for the 
general reader or even the non-numerate graduate. I’d position it as an 
introduction to data analysis for beginning researchers and statisticians 
rather than as a backgrounder on data science.

[originally written for LibraryThing]
