Web Picks (week of 15 May 2017)

Posted on May 19, 2017

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

AI-powered trading raises new questions
The growth in AI-directed investing could have radical consequences, especially in a scenario where a single investor or investment fund using proprietary AI is able to secure an unfair advantage over other market actors. Call it “stock market singularity.” And the groundwork for such an occurrence has already been laid.
Magic AI: These are the Optical Illusions that Trick Computers
Very interesting article about the dangers and impact of adversarial machine learning.
Resisting the Habits of the Algorithmic Mind
Clearly, it’s important that we grapple with the power of algorithms, real and imagined, but where do we start?
Understanding deep learning requires re-thinking generalization
What is it that distinguishes neural networks that generalize well from those that don’t? A satisfying answer to this question would not only help to make neural networks more interpretable, but it might also lead to more principled and reliable model architecture design.
The Myth of Superhuman AI
We’ve read that, in the future, computerized AIs will become so much smarter than us that they will take all our jobs and resources, and humans will go extinct. But is this true?
The mind in the machine: Demis Hassabis on artificial intelligence
The co-founder of DeepMind explains how AI will help us make unimaginable leaps in understanding the world.
The great British Brexit robbery: how our democracy was hijacked
A shadowy global operation involving big data, billionaire friends of Trump and the disparate forces of the Leave campaign influenced the result of the EU referendum. As Britain heads to the polls again, is our electoral process still fit for purpose?
Using Deep Learning at Scale in Twitter’s Timelines
Twitter explains how their ranking algorithm is powered by deep neural networks.
Hidden Technical Debt in Machine Learning Systems (pdf paper)
“Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems. We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns.”
Facebook presents a novel approach to neural machine translation
The Facebook Artificial Intelligence Research (FAIR) team published research results using a novel convolutional neural network (CNN) approach for language translation that achieves state-of-the-art accuracy at nine times the speed of recurrent neural systems.
ParlAI (pronounced “par-lay”) is a framework for dialog AI research, implemented in Python
From Facebook Research: a new toolkit for training and testing dialog models.
Overkill Analytics (presentation)
OKA favors volume over precision, utility over elegance, and CPU over IQ… hmm.
Navigating the Unsupervised Learning Landscape
Interesting review of state-of-the-art unsupervised ML approaches.
What if Google searches were used to award points in the Eurovision Song Contest?
Seems like this was pretty close to the actual result!
Scaling Airbnb’s Experimentation Platform
Airbnb talks about their experimentation reporting platform.
The Blockchain Immutability Myth
If you ask someone well-informed about the characteristics of blockchains, the word “immutable” will invariably appear in the response: once a blockchain transaction has received a sufficient level of validation, some cryptography ensures that it can never be replaced or reversed. This marks blockchains as different from regular files or databases, in which information can be edited and deleted at will. Or so the theory goes…
Bringing interactive BI to big data
“SQL on Hadoop is continuously improving, but it’s still common to wait minutes to hours for a query to return. In this post, we will discuss the open source distributed analytics engine Apache Kylin and examine how it speeds up big data query orders for interactive BI.”
A Guide to Time Series Forecasting with Prophet in Python 3
This guide will cover how to do time series analysis on either a local desktop or a remote server using Facebook’s Prophet library.
Network Dissection: Quantifying Interpretability of Deep Visual Representations
“Network Dissection is our method for quantifying interpretability of individual units in a deep CNN (i.e., our answer to question #1). It works by measuring the alignment between unit response and a set of concepts drawn from a broad and dense segmentation data set called Broden.”
Word2GM (Word to Gaussian Mixture)
Word2vec? Meet Word2GM: “We represent each word in the dictionary as a Gaussian Mixture distribution and train it using a max-margin objective based on expected likelihood kernel energy function.”
Flight Paths Edge Bundling
Neat d3.js example for flight path visualisation.
Mosaic: Processing a Trillion-Edge Graph on a Single Machine (pdf presentation)
Interesting graph analytics presentation from the Georgia Institute of Technology.
Top 15 Python Libraries for Data Science in 2017
Not a lot of surprising entries in here, but an interesting starter’s overview nonetheless.
Griffon Data Science Virtual Environment (coming soon)
Griffon is a VM environment for data science, based on Ubuntu MATE and including numerous data science tools, all installed and configured for immediate use.
100 Days of Algorithms
A Medium blog collection showcasing 100 algorithms, day by day.
Machine Learning with Robotics (pdf)
Huge pdf of lecture notes on machine learning with robotics (124 pages).