Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Kaggle Joins Google Cloud
Big news in the past two weeks: Kaggle is joining Google Cloud!
- Ideas on interpreting machine learning
Mix-and-match approaches for visualizing data and interpreting machine learning models and results. Excellent article!
- How DeepMind’s Memory Trick Helps AI Learn Faster
While AI systems can match many human capabilities, they take 10 times longer to learn. Now, by copying the way the brain works, Google DeepMind has built a machine that is closing the gap.
- Learning to communicate
Very interesting blog post from OpenAI in which agents develop their own language.
- The Graveyard of Empires and Big Data
The Pentagon’s secret plan to crowdsource intelligence from Afghan civilians turned out to be brilliant — too brilliant.
- The wired brain: how not to talk about an AI-powered future
The way we talk about AI is a mess. It starts with the most obvious, the imagery. Just like stock photos of happy people pointing at whiteboards were a symbol of the modern workplace, wired brains and robots have now come to represent “the AI”. But the visual messaging is only a small part of a much larger problem.
- FairML: Auditing Black-Box Predictive Models
FairML is a Python toolbox for auditing machine learning models for bias; a minimal usage sketch follows below. “Predictive models are increasingly being deployed for the purpose of determining access to services such as credit, insurance, and employment. Despite societal gains in efficiency and productivity through deployment of these models, potential systemic flaws have not been fully addressed, particularly the potential for unintentional discrimination. This discrimination could be on the basis of race, gender, religion, sexual orientation, or other characteristics. This project addresses the question: how can an analyst determine the relative significance of the inputs to a black-box predictive model in order to assess the model’s fairness (or discriminatory extent)?”
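To give a feel for the workflow, here is a minimal sketch of auditing a scikit-learn classifier. The `audit_model` call mirrors the project’s README; the toy data, column names, and return values are illustrative assumptions, so check the repository for the current API.

```python
# Hypothetical FairML audit of a scikit-learn classifier.
# audit_model perturbs each input column and measures the effect on the
# model's predictions, yielding a relative dependence score per feature.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from fairml import audit_model  # signature assumed from the project README

# Toy data: two legitimate features plus a sensitive attribute.
X = pd.DataFrame({
    "income": [30, 45, 60, 25, 80, 50],
    "debt":   [10, 20,  5, 15,  2,  8],
    "gender": [ 0,  1,  0,  1,  0,  1],  # sensitive attribute
})
y = [0, 1, 1, 0, 1, 1]

clf = LogisticRegression().fit(X, y)

importances, _ = audit_model(clf.predict, X)
print(importances)  # a large score on "gender" would flag potential bias
```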
- Stitch Fix’s algorithm tour
A beautiful explorable document detailing how data science is woven into the fabric of Stitch Fix. Great example of top-notch visualisation!
- Apache Spark MLlib 2.x: How to Productionize your Machine Learning Models (presentation)
Apache Spark has rapidly become a key tool for data scientists to explore, understand and transform massive datasets and to build and train advanced machine learning models. The question then becomes: how do I deploy these models to a production environment? How do I embed what I have learned into customer-facing data applications? A sketch of one common deployment pattern follows below.
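The slides cover several serving options; as one concrete illustration (not the presenters’ exact recipe), the sketch below fits a PySpark ML pipeline, persists it, and reloads it as a separate scoring step. The model path and toy data are placeholders.

```python
# Train a pipeline, persist it, and reload it for scoring -- one common
# MLlib 2.x productionization pattern.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("train-and-persist").getOrCreate()

df = spark.createDataFrame(
    [(1.0, 0.5, 1.0), (0.0, 2.0, 0.0), (1.5, 0.1, 1.0)],
    ["f1", "f2", "label"],
)

pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2"], outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])
model = pipeline.fit(df)

# Pipelines serialize to a directory; a downstream serving job can load
# the identical model without retraining.
model.write().overwrite().save("/models/example")  # placeholder path
reloaded = PipelineModel.load("/models/example")
reloaded.transform(df).select("prediction").show()
```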
- Clickbaits Revisited: Deep Learning on Title + Content Features to Tackle Clickbaits
Obtaining 99.2% accuracy on test data by combining title and content features; a hedged sketch of such a two-input architecture is shown below.
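The post’s exact architecture isn’t reproduced here, but a two-input network of the general kind the title describes could look like the Keras sketch below: title and content token sequences are embedded separately, merged, and fed to a sigmoid output. All layer sizes and sequence lengths are illustrative assumptions.

```python
# Illustrative two-input clickbait classifier (not the article's model).
from tensorflow.keras import layers, models

title_in = layers.Input(shape=(20,), name="title_tokens")    # padded title ids
body_in = layers.Input(shape=(200,), name="content_tokens")  # padded body ids

# One shared embedding table for both inputs.
emb = layers.Embedding(input_dim=20000, output_dim=64)
title_vec = layers.GlobalAveragePooling1D()(emb(title_in))
body_vec = layers.GlobalAveragePooling1D()(emb(body_in))

merged = layers.concatenate([title_vec, body_vec])
out = layers.Dense(1, activation="sigmoid")(merged)  # P(clickbait)

model = models.Model(inputs=[title_in, body_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```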
- SentencePiece is an unsupervised text tokenizer and detokenizer
An unsupervised text tokenizer and detokenizer aimed mainly at neural-network-based text generation systems where the vocabulary size is predetermined prior to model training. Developed by Google. A minimal usage sketch follows below.
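Here is a minimal train/encode/decode loop using the Python wrapper; the corpus file and vocabulary size are placeholder choices, and call names may differ across versions.

```python
# Train a subword model on a plain-text corpus (one sentence per line),
# then segment and losslessly reconstruct a sentence.
import sentencepiece as spm

spm.SentencePieceTrainer.Train(
    "--input=corpus.txt --model_prefix=m --vocab_size=8000"  # placeholders
)

sp = spm.SentencePieceProcessor()
sp.Load("m.model")

pieces = sp.EncodeAsPieces("This is a test.")
print(pieces)                   # subword pieces, e.g. ['▁This', '▁is', ...]
print(sp.DecodePieces(pieces))  # round-trips to "This is a test."
```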
- How We’re Predicting AI—or Failing To (paper)
“This paper will look at the various predictions that have been made about AI and propose decomposition schemas for analyzing them. It will propose a variety of theoretical tools for analyzing, judging, and improving these predictions.”
- The Case For HAL’s Sanity
A fun weekend read: “Some viewers of Stanley Kubrick’s film “2001: A Space Odyssey” have theorized that HAL, the computer genius turned villain of the spaceship Discovery, went mad during the Jupiter mission. However, there is an alternative theory: that HAL acted rationally and logically, indeed with cold, calculating precision befitting a machine of his intelligence. This alternative theory will be presented here, with supporting evidence.”