Web Picks (week of 27 February 2017)

Posted on March 12, 2017

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

Analytics as a Source of Business Innovation
The 2017 Data & Analytics Report by MIT Sloan Management Review finds that the percentage of companies deriving competitive advantage from analytics increased for the first time in four years.

Inside Facebook’s AI Machine
The Applied Machine Learning group helps Facebook see, talk, and understand. It may even root out fake news.

Hadoop Is Falling – Why?
“Three years ago, looking beyond Hadoop was insanity, and there was little else that could come close. Recently, adoption of Hadoop has slowed down considerably. We examine why.”

Will Democracy Survive Big Data and Artificial Intelligence?
“We are in the middle of a technological upheaval that will transform the way society is organized. We must make the right decisions now.”

PathNet: a new Modular Deep Learning (DL) architecture
PathNet, from (who else but) DeepMind, is a new modular deep learning architecture. It reflects a recent trend in DL research: combining modular deep learning, meta-learning and reinforcement learning to build more capable DL systems.

DeepCoder: Learning to Write Programs (paper)
“We develop a first line of attack for solving programming competition-style problems from input-output examples using deep learning. The approach is to train a neural network to predict properties of the program that generated the outputs from the inputs.” Just don’t say it “steals code from StackOverflow”.

Is attacking machine learning easier than defending it?
An interesting post on adversarial examples in machine learning, and on why defending models against them is harder than attacking them.

Facebook scales back AI flagship after chatbots hit 70% f-AI-lure rate
Facebook has scaled back its ambitions and refocused its application of “artificial intelligence” after its AI bots hit a 70 per cent failure rate.

Google Research: Assisting Pathologists in Detecting Cancer with Deep Learning
“The prediction heatmaps produced by the algorithm had improved so much that the localization score (FROC) for the algorithm reached 89%, which significantly exceeded the score of 73% for a pathologist with no time constraint.”

Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US
“Here, we present a method that determines socioeconomic trends from 50 million images of street scenes, gathered in 200 American cities by Google Street View cars. Using deep learning-based computer vision techniques, we determined the make, model, and year of all motor vehicles encountered in particular neighborhoods.”

In fintech, China shows the way
By just about any measure of size, China is the world’s leader in fintech (short for “financial technology”, and referring here to internet-based banking and investment). It is far and away the biggest market for digital payments, accounting for nearly half of the global total.

IPython Or Jupyter?
A great post by DataCamp on data science notebooks.

Prophet: forecasting at scale
Facebook is open sourcing Prophet, a forecasting tool available in Python and R. Forecasting is a data science task that is central to many activities within an organization. For instance, large organizations like Facebook must engage in capacity planning to efficiently allocate scarce resources and goal setting in order to measure performance relative to a baseline.

Stream Analytics with SQL on Apache Flink (presentation)
Talk by Fabian Hueske on the future of Apache Flink’s relational APIs for stream analytics.

Discovering Anomalies in Real-Time with Apache Flink
“In Summer 2016, Mux began work on a system for real-time anomaly-detection and alerting. This is the challenge we’ve taken on and achieved with the help of Apache Flink.”

Surprise Maps: Showing the Unexpected
“For geographic data, our proposed solution is called a Surprise Map: a form of heat map that gives more weight to surprising data. The idea behind Surprise Maps is that when we look at data, we often have various models of expectation: things we expect to see, or not see, in our data. If we have these models, we can also measure deviation or difference from these models. This deviation is the unexpected, the data that surprise us. Such surprising data is sometimes important, and at the very least justifies follow-up analysis.”
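To make “measuring deviation from models of expectation” a bit more concrete, here is a minimal Python sketch of per-region Bayesian surprise: the KL divergence from a prior over several expectation models to the posterior after seeing one region's data. It is an illustration under stated assumptions, not the authors' implementation. The function name, the model names, the example counts, and the likelihood 1 - |observed share - expected share| are all hypothetical choices made for this sketch.

```python
import math

def bayesian_surprise(observed, expectations, prior=None):
    """Per-region surprise as KL(posterior || prior) over expectation models.

    observed: observed counts per region.
    expectations: model name -> expected share per region (shares sum to 1).
    prior: model name -> prior probability (defaults to uniform).
    ASSUMPTION: likelihood of a region under a model is 1 - |observed share - expected share|.
    """
    models = list(expectations)
    if prior is None:
        prior = {m: 1.0 / len(models) for m in models}
    total = sum(observed)
    surprises = []
    for i, count in enumerate(observed):
        share = count / total
        # Unnormalized posterior: prior times the (assumed) likelihood of this region's share
        unnorm = {m: prior[m] * (1.0 - abs(share - expectations[m][i])) for m in models}
        z = sum(unnorm.values())
        posterior = {m: unnorm[m] / z for m in models}
        # KL divergence: how far this region moved our beliefs about which model holds
        kl = sum(p * math.log(p / prior[m]) for m, p in posterior.items() if p > 0)
        surprises.append(kl)
    return surprises

observed = [10, 10, 80]  # hypothetical event counts for three regions
models = {
    "uniform": [1 / 3, 1 / 3, 1 / 3],   # every region expected to get an equal share
    "population": [0.1, 0.1, 0.8],      # shares proportional to a hypothetical population
}
print(bayesian_surprise(observed, models))
```

In this toy example the third region, whose large share is explained well by the population model but poorly by the uniform one, shifts beliefs the most and so gets the highest surprise; a Surprise Map would shade regions by a score of this kind rather than by raw counts.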

Char2Wav: End-to-End Speech Synthesis
“We present Char2Wav, an end-to-end model for speech synthesis. Char2Wav has two components: a reader and a neural vocoder. The reader is an encoder-decoder model with attention. The encoder is a bidirectional recurrent neural network that accepts text or phonemes as inputs, while the decoder is a recurrent neural network (RNN) with attention that produces vocoder acoustic features.”

Deep Voice: Real-Time Neural Text-to-Speech for Production
Baidu Research presents Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. According to the authors, the biggest obstacle to building such a system has been the speed of audio synthesis: previous approaches took minutes or hours to generate only a few seconds of speech. Deep Voice synthesizes audio in real time, which amounts to a speedup of up to 400X over previous WaveNet inference implementations.

Finding the most depressing Radiohead song with R
… using the Spotify and Genius Lyrics APIs.

Announcing ggraph: A grammar of graphics for relational data
“I am absolutely thrilled to announce that ggraph has finally been released on CRAN. ggraph is my most ambitious package to date and its very early genesis has been described in a prior post. If any mention of ggraph is completely new to you, then in short terms ggraph is an extension of the ggplot2 API to support relational data such as networks and trees.”

Predicting food preferences with sparklyr
“The question I want to address with machine learning is whether the preference for a country’s cuisine can be predicted based on preferences of other countries’ cuisines.”

Image-to-Image demo in the browser
The pix2pix model works by training on pairs of images such as building facade labels to building facades, and then attempts to generate the corresponding output image from any input image you give it.

Self-driving cars in the browser
The goal of this project was to create a fully self-learning agent that can control a car in a 2D top-down environment. It is written entirely in JavaScript.