Web Picks (week of 29 May 2017)

Posted on June 2, 2017

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

Google’s AlphaGo Defeats Chinese Go Master in Win for A.I.
Google’s AlphaGo has done it again, defeating the world’s best human Go player 3-0. The New York Times covered the first game, with some thoughts on AlphaGo’s performance as well as the politics behind the match.
Garry Kasparov on the interplay between machine learning and humans
Meanwhile, Garry Kasparov, one of the greatest chess players of all time, was recently interviewed about AI and how things have evolved since his defeat by Deep Blue in 1997.
How A Data Scientist Can Improve Productivity
From KDnuggets: Data science projects are iterative and may require changes to the data at every iteration. Data versioning, data pipelines, and data workflows make a data scientist’s life easier; this article shows how.
New Leader, Trends, and Surprises in Analytics, Data Science, Machine Learning Software Poll
KDnuggets has announced the results of its 18th annual data science poll. “Python caught up with R and (barely) overtook it; Deep Learning usage surges to 32%; RapidMiner remains top general Data Science platform; Five languages of Data Science.” Very interesting results and more trustworthy than your typical Gartner report.
Everything that Works Works Because it’s Bayesian: Why Deep Nets Generalize?
Great article: “For too long we Bayesians have, quite arrogantly, dismissed deep neural networks as unprincipled, dumb black boxes that lack elegance. We said that highly over-parametrised models fitted via maximum likelihood can’t possibly work, they will overfit, won’t generalise, etc.”
What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? (paper)
Speaking of Bayesian inference: this paper is worth a read. “Traditionally it has been difficult to model epistemic uncertainty in computer vision, but with new Bayesian deep learning tools this is now possible. We study the benefits of modeling epistemic vs. aleatoric uncertainty in Bayesian deep learning models for vision tasks.”
Using Machine Learning to Explore Neural Network Architecture
“At Google, we have successfully applied deep learning models to many applications. Typically, our machine learning models are painstakingly designed by a team of engineers and scientists. This process of manually designing machine learning models is difficult because the search space of all possible models can be combinatorially large. For this reason, the process of designing networks often takes a significant amount of time and experimentation by those with significant machine learning expertise.”
Democratizing Data at Airbnb
“However [data] also creates a new challenge: effectively navigating a sea of data resources of varying quality, complexity, relevance, and trustworthiness. In this post we describe our observation of this problem and the Dataportal, a novel data resource search and discovery tool that addresses this issue.”
Automated Machine Learning — A Paradigm Shift That Accelerates Data Scientist Productivity
Another article from Airbnb’s team that has been making the rounds.
Deploying Machine Learning Models to Production (Slideshare)
Machine learning techniques are powerful, but building and deploying such models for production use require a lot of care and expertise.
DeepChatModels: Conversation models in TensorFlow
An interesting project that builds a chatbot with TensorFlow.
Learning Deep Nearest Neighbor Representations Using Differentiable Boundary Trees (paper)
“We introduce a new method called differentiable boundary tree which allows for learning deep kNN representations. We build on the recently proposed boundary tree algorithm which allows for efficient nearest neighbor classification, regression and retrieval. By modelling traversals in the tree as stochastic events, we are able to form a differentiable cost function which is associated with the tree’s predictions. Using a deep neural network to transform the data and back-propagating through the tree allows us to learn good representations for kNN methods.”
How does Airbnb’s Airflow compare to Spotify’s Luigi?
An interesting discussion on which workflow manager to use.
NotHotdog-Classifier
“Do you watch HBO’s silicon valley? Because I do and I was inspired by Mr. Jian-Yang to make my own not hotdog classifier”