Web Picks (week of 3 October 2016)

Posted on October 11, 2016

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

Google swallows 11,000 novels to improve AI’s conversation
As writers learn that the tech giant has processed their work without permission, the Authors Guild condemns ‘blatantly commercial use of expressive authorship’.

Ten Myths About Machine Learning by Pedro Domingos
Some important warnings contained herein.

Announcing YouTube-8M: A Large and Diverse Labeled Video Dataset for Video Understanding Research
Google announces the release of YouTube-8M, a dataset of 8 million YouTube video URLs (representing over 500,000 hours of video), along with video-level labels from a diverse set of 4,800 Knowledge Graph entities.

What is hardcore data science—in practice?
The anatomy of an architecture to bring data science into production. A great article for anyone trying to understand the issues in bringing data science from development to production.

The Fundamental Limits of Machine Learning
“As a human, the challenge is to find any pattern at all. Of course, we have intuitions that limit our guesses. But computers have no such intuitions. From a computer’s standpoint, the difficulty in pattern recognition is one of surplus: with an endless variety of patterns, all technically valid, what makes one ‘right’ and another ‘wrong’?”

TensorFlow for R
A new package providing access to the complete TensorFlow API from within R. Also see sparklyr for a great R interface for Apache Spark, including dplyr operations.

Generative Visual Manipulation on the Natural Image Manifold
“Realistic image manipulation is challenging because it requires modifying the image appearance in a user-controlled way, while preserving the realism of the result. Unless the user has considerable artistic skill, it is easy to ‘fall off’ the manifold of natural images while editing. In this paper, we propose to learn the natural image manifold directly from data using a generative adversarial neural network.”

Instagram photos reveal predictive markers of depression (pdf)
“Using Instagram data from 166 individuals, we applied machine learning tools to successfully identify markers of depression.”

traces – A Python library for unevenly-spaced time series analysis
Taking measurements at irregular intervals is common, but most tools are designed primarily for evenly-spaced measurements. In the real world, time series also have missing observations, or you may have multiple series with different frequencies; it can be useful to model these as unevenly-spaced.
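To make the idea concrete: one common way to handle irregular samples is to treat the series as a step function, so each reading is weighted by how long it stayed in effect rather than counted once. The plain-Python sketch below is not the traces API; the function name `time_weighted_mean`, the sample readings, and the last-observation-carried-forward assumption are ours, for illustration only.

```python
from datetime import datetime


def time_weighted_mean(points, start, end):
    """Time-weighted average of an irregular series over [start, end).

    points: list of (datetime, value) pairs sorted by timestamp.
    Assumption: each value holds until the next measurement
    (last observation carried forward). Time before the first
    reading inside the window is simply not counted.
    """
    if not points or start >= end:
        raise ValueError("need at least one point and start < end")
    total = 0.0
    covered = 0.0
    for i, (t, value) in enumerate(points):
        # The span during which this reading is the current value,
        # clipped to the requested window.
        seg_start = max(t, start)
        seg_end = points[i + 1][0] if i + 1 < len(points) else end
        seg_end = min(seg_end, end)
        if seg_end > seg_start:
            duration = (seg_end - seg_start).total_seconds()
            total += value * duration
            covered += duration
    if covered == 0:
        raise ValueError("no measurements fall inside the window")
    return total / covered


# Hypothetical sensor readings taken at irregular times.
readings = [
    (datetime(2016, 10, 3, 8, 0), 20.0),
    (datetime(2016, 10, 3, 8, 10), 22.0),
    (datetime(2016, 10, 3, 9, 0), 21.0),
]
window_mean = time_weighted_mean(
    readings, datetime(2016, 10, 3, 8, 0), datetime(2016, 10, 3, 10, 0)
)
naive_mean = sum(v for _, v in readings) / len(readings)
print(window_mean, naive_mean)
```

Here the naive mean treats the three readings equally, while the time-weighted mean gives the 22.0 reading (in effect for 50 minutes) more weight than the 20.0 reading (in effect for 10 minutes).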

A Neural Network for Machine Translation, at Production Scale
Google has been busy: “Today we announce the Google Neural Machine Translation system (GNMT), which utilizes state-of-the-art training techniques to achieve the largest improvements to date for machine translation quality.”

Deep learning goes wide
A company called Bonsai joins a movement to democratize machine learning. Get ready to build your own neural net.

Generating Faces with Deconvolution Networks
Another generative setup: using deep learning (deconvolution networks) to generate faces.

Prof. Yann LeCun – Deep Learning and the Future of AI (YouTube)
Recording of LeCun’s recent talk on deep learning.

AI generates abstract diagrams of IQ tests as good as 10th grade students
Amazing work: “So, we can say that our model is [a] very good generator and comparable to even 10th grade humans. An interesting aspect is that the model is never trained on the correct answers, it is just trained on multiple sequences from the problem images and still performs remarkably well.”

Image Compression with Neural Networks
In “Full Resolution Image Compression with Recurrent Neural Networks”, Google expands on previous research on data compression using neural networks, exploring whether machine learning can provide better results for image compression like it has for image recognition and text summarization.

Microsoft expands artificial intelligence (AI) efforts with creation of new Microsoft AI and Research Group
Computer vision luminary Harry Shum to lead more than 5,000 (!) people worldwide.

Bitmap & tilemap generation from a single example with the help of ideas from quantum mechanics
Not really data science, but an impressive generative algorithm nonetheless!

Stealing Machine Learning Models via Prediction APIs
Interesting new research on “model extraction attacks”, in which an adversarial client learns a close approximation of your model from as few prediction queries as possible.

The great question of the 21st century: Whose black box do you trust?
Interesting read: “A lot of attention has been paid to the role of algorithms in shaping the experience of consumers. Much less attention has been paid to the role of algorithms in shaping the incentives for business decision making.”

Scalable Stream Processing: A Survey of Storm, Samza, Spark and Flink
“With this article, we would like to share our insights on real-time data processing we gained building Baqend.”

Three Challenges for Artificial Intelligence in Medicine
“Why is the world’s most advanced AI used for cat videos, but not to help us live longer and healthier lives? A brief history of AI in Medicine, and the factors that may help it succeed where it has failed before.”

Running Graph Analytics with Spark GraphFrames: A Simple Example
“Below, I will cover how to set up PyCharm to run a standalone Spark application using GraphFrames, which were recently released with Apache Spark’s 2.0 distribution.”