Web Picks (week of 1 August 2016)

Posted on August 9, 2016

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

The 4 Mistakes Most Managers Make with Analytics
HBR identifies four common mistakes managers make when it comes to data.

It’s ML, not magic: simple questions you should ask to help reduce AI hype
This post aims to lay out a few simple rules that, without restricting one’s ability to “dream big”, should be able to cull the most ludicrous examples of AI-prefixing.

Deep Reinforcement Learning: Pong from Pixels
Introductory post showing off reinforcement learning.

Don’t Replace People. Augment Them.
If we let machines put us out of work, it will be because of a failure of imagination and the will to make a better future!

JupyterLab: the next generation of the Jupyter Notebook
“It’s been a long time in the making, but today we want to start engaging our community with an early (pre-alpha) release of the next generation of the Jupyter Notebook application, which we are calling JupyterLab.”

Approaching (Almost) Any Machine Learning Problem
Abhishek Thakur, a Kaggle Grandmaster, outlines some general concepts on the process of data science.

Google Cuts Its Giant Electricity Bill With DeepMind-Powered AI
Google is using technology from the DeepMind artificial intelligence subsidiary for big savings on the power consumed by its data centers, according to DeepMind Co-Founder Demis Hassabis.

A neural network tried to write a 9th Harry Potter book, and the results are hilarious
Do you remember that memorable scene in the Harry Potter books when a person seeking revenge on Ron turns out to be Dumbledore hiding behind a cream cake? You don’t?

The rectangularness of countries
“A Facebook friend recently noted that Turkey was “a remarkably rectangular country.” I wondered how it compared to other countries, and this post shows my answers.”

This mind-blowing photo app makes Instagram’s filters look so lame
Neural network style transfer gets its first customer-oriented apps.

The UX Secret That Will Ruin Apps For You
Facebook servers crunch your data in milliseconds, but the user interface takes longer to load. That’s by design.

An Introduction to Model-Based Machine Learning
“This blog post follows my journey from traditional statistical modeling to Machine Learning (ML) and introduces a new paradigm of ML called Model-Based Machine Learning (Bishop, 2013). Model-Based Machine Learning may be of particular interest to statisticians, engineers, or related professionals looking to implement machine learning in their research or practice.”

Critical Behavior from Deep Dynamics: A Hidden Dimension in Natural Language (paper)
“We show that in many data sequences – from texts in different languages to melodies and genomes – the mutual information between two symbols decays roughly like a power law with the number of symbols in between the two. In contrast, we prove that Markov/hidden Markov processes generically exhibit exponential decay in their mutual information, which explains why natural languages are poorly approximated by Markov processes.”
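
To make the quantity in that abstract concrete, here is a minimal Python sketch of a plug-in estimate of the mutual information between symbols that are d positions apart in a sequence. The function name `mutual_information` and the toy sequences are illustrative assumptions, not code from the paper, and this naive estimator is biased on short sequences.

```python
from collections import Counter
from math import log2


def mutual_information(seq, d):
    """Plug-in estimate (in bits) of I(X; Y) for symbols d positions apart."""
    # Collect every pair of symbols separated by distance d.
    pairs = [(seq[i], seq[i + d]) for i in range(len(seq) - d)]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (left * right).
        mi += (count / n) * log2(count * n / (left[x] * right[y]))
    return mi


periodic = "ab" * 50   # each symbol fully determines its neighbours
constant = "a" * 50    # a single symbol carries no information

print(mutual_information(periodic, 1))
print(mutual_information(constant, 1))
```

Evaluating the function over a range of d on a real text and plotting the result on log-log axes is one way to look for the power-law decay the authors describe.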

Why I’m Not a Fan of R-Squared
“People sometimes use R² as their preferred measure of model fit. Unlike quantities such as MSE or MAD, R² is not a function only of model’s errors, its definition contains an implicit model comparison between the model being analyzed and the constant model that uses only the observed mean to make predictions. As such, R² answers the question: does my model perform better than a constant model? But we often would like to answer a very different question: does my model perform worse than the true model?”
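
The implicit comparison the quote describes can be seen in a short Python sketch that uses the standard definition, one minus the ratio of the model's squared error to the squared error of the constant mean model. The helper name `r_squared` and the toy data are assumptions made for illustration, not taken from the linked post.

```python
def r_squared(y_true, y_pred):
    """R-squared as 1 - SSE(model) / SSE(constant mean model)."""
    mean_y = sum(y_true) / len(y_true)
    sse_model = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    sse_const = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - sse_model / sse_const


y = [1.0, 2.0, 3.0, 4.0]

print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))  # constant mean model: 0.0
print(r_squared(y, y))                     # perfect predictions: 1.0
print(r_squared(y, [4.0, 3.0, 2.0, 1.0]))  # worse than the mean: -3.0
```

The score is anchored at the constant model (zero) and says nothing about how far the model is from the true data-generating process.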

I Accidentally Some Machine Learning – My Story of A Month of Learning Elixir

The New Data Scientist Venn Diagram
Where do you fall in the new Venn Diagram?

Data Scientist’s Toolbox for Data Infrastructure I

Spark ADMM
The code in this repository provides a framework for solving arbitrary separable convex optimization problems with the Alternating Direction Method of Multipliers (ADMM). In particular, the algorithm implemented is the generalized consensus algorithm described in the paper “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers” by S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein.
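
For readers new to ADMM, the following is a minimal single-machine Python sketch of the simpler global consensus variant (scaled form) on a toy problem: each worker i holds a number a_i and the local objective 0.5 * (x - a_i)^2, so the consensus solution is the mean of the a_i. It is not the repository's Spark code or its generalized consensus algorithm; the function name `consensus_admm`, the penalty `rho`, and the data are assumptions chosen for illustration.

```python
def consensus_admm(a, rho=1.0, iterations=100):
    """Global consensus ADMM for minimizing sum_i 0.5 * (x - a_i)^2."""
    n = len(a)
    z = 0.0            # shared consensus variable
    u = [0.0] * n      # scaled dual variable, one per worker
    for _ in range(iterations):
        # Local step: each worker minimizes its own objective plus the
        # penalty (rho / 2) * (x_i - z + u_i)^2, which has a closed form here.
        x = [(a_i + rho * (z - u_i)) / (1.0 + rho) for a_i, u_i in zip(a, u)]
        # Consensus step: average the local estimates and duals.
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n
        # Dual step: accumulate each worker's disagreement with z.
        u = [u_i + x_i - z for u_i, x_i in zip(u, x)]
    return z


data = [1.0, 2.0, 6.0]
print(consensus_admm(data))  # approaches the mean of data, 3.0
```

In a distributed setting the local step is the part that runs in parallel on each partition, while the consensus step needs only an average across workers.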

Apartment Prices in Finland
With all the data science and big data hype going on, it’s always nice to see real case examples. At Reaktor we have created Kannattaakokauppa.fi, a probabilistic modeling-based interactive visualisation of regional apartment price trends in Finland.