Web Picks (week of 2 May 2016)

Posted on May 7, 2016

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

DeepMind moves to TensorFlow
The DeepMind team announces that they’ll start using TensorFlow in future projects and move away from Torch.

The Humans Hiding Behind the Chatbots
Another article in the same vein as ‘Humans pretending to be computers pretending to be humans’ by Oscar Schwartz, this time from Bloomberg’s Ellen Huet. Again, the article talks about “the actual people behind virtual assistants, reading e-mails and ordering Chipotle.”

How to Prevent a Plague of Dumb Chatbots?
Staying in the world of AI bots: MIT Technology Review notes that “the best (and least annoying) chatbots will be those that recognize their limitations and occasionally turn to humans for help.”

Explorable explanations
What if a book didn’t just give you old facts, but gave you the tools to discover those ideas for yourself and invent new ones? What if, while reading a blog post, you could insert your own knowledge, challenge the author’s assumptions, and build things the author never even thought of… all inside the blog post itself? Explorable explanations is an attempt at answering some of those questions, and it’s a great way to visualize ideas and concepts.

Back to the Future of Handwriting Recognition
The reason why we mention “explorable explanations”: this post describes and shows how the Graphical Input Language software system (GRAIL) worked, a handwriting recognition system from fifty years ago!

Thought Experiments in the Browser
Another recent post capturing the idea of explorable explanations, using agent-based visualisations.

The amazing power of word vectors
“For today’s post, I’ve drawn material not just from one paper, but from five! The subject matter is ‘word2vec’ – the work of Mikolov et al. at Google on efficient vector representations of words (and what you can do with them).”

Modern pandas
Best practices and more in this modern pandas series. Worth a read for anyone starting out with pandas.

OpenAI Gym
OpenAI announces Gym: a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Go.

Demystifying Deep Reinforcement Learning
Wondering what is meant by deep reinforcement learning as used in OpenAI Gym? This post provides a good starting point on Q-learning and more!
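
To give a flavour of the Q-learning idea mentioned here, below is a minimal tabular sketch in Python. It is not taken from the linked post and does not use OpenAI Gym: the corridor environment, the function names and the hyperparameters are all assumptions made for illustration. The deep variant replaces the lookup table with a neural network.

```python
import random


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def choose_action(q, state, actions, epsilon, rng):
    """Epsilon-greedy choice with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(q.get((state, a), 0.0) for a in actions)
    return rng.choice([a for a in actions if q.get((state, a), 0.0) == best])


def train_corridor(episodes=200, length=5, alpha=0.5, gamma=0.9,
                   epsilon=0.1, seed=0):
    """Hypothetical toy task: walk a 1-D corridor from cell 0 to the last
    cell. The only reward (1.0) is given on reaching the last cell."""
    rng = random.Random(seed)
    actions = [-1, 1]  # step left, step right
    goal = length - 1
    q = {}
    for _ in range(episodes):
        state = 0
        while state != goal:
            action = choose_action(q, state, actions, epsilon, rng)
            next_state = min(max(state + action, 0), goal)  # walls clamp moves
            reward = 1.0 if next_state == goal else 0.0
            q_learning_update(q, state, action, reward, next_state,
                              actions, alpha, gamma)
            state = next_state
    return q


q_table = train_corridor()
# Greedy policy read off the learned table, one action per non-goal cell.
policy = [max([-1, 1], key=lambda a: q_table.get((s, a), 0.0))
          for s in range(4)]
```

After training, stepping right should have the higher value in every cell, because each step to the left only delays the discounted reward.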

Dat is an open source, decentralized data tool for distributing datasets small and large
Dat is a new, grant-funded, open-source, decentralized data sharing tool for efficiently versioning and syncing changes to data.

csv-schema: analyze a CSV file and generate a database table schema, all within the browser
This application parses CSV files (including huge ones) within the browser. It analyzes each field to suggest the best database field type, max length, and whether or not there are any null values. From there, you can rename fields, ignore them, override field types/lengths, etc. and generate database table creation SQL for MySQL, MariaDB, Postgres, Oracle, or SQLite3.
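
The kind of per-field analysis described above can be sketched in a few lines of Python. This is not the csv-schema code (which runs in the browser); the function names, the three candidate types and the generic SQL output are assumptions for illustration only, and edge cases such as dates, booleans or values like "nan" are deliberately ignored.

```python
import csv
import io


def infer_column(values):
    """Suggest a type, max length and nullability for one column of strings."""
    non_null = [v for v in values if v.strip() != ""]
    has_nulls = len(non_null) < len(values)
    max_len = max((len(v) for v in non_null), default=0)

    def all_parse(cast):
        # True if every non-empty value can be converted with `cast`.
        try:
            for v in non_null:
                cast(v)
            return True
        except ValueError:
            return False

    if non_null and all_parse(int):
        sql_type = "INTEGER"
    elif non_null and all_parse(float):
        sql_type = "REAL"
    else:
        sql_type = f"VARCHAR({max(max_len, 1)})"
    return {"type": sql_type, "max_length": max_len, "nullable": has_nulls}


def suggest_schema(csv_text, table_name="my_table"):
    """Return per-column suggestions and a generic CREATE TABLE statement."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = {}
    for i, name in enumerate(header):
        # Short rows are padded with empty strings, i.e. treated as nulls.
        cells = [row[i] if i < len(row) else "" for row in data]
        columns[name] = infer_column(cells)
    lines = []
    for name, info in columns.items():
        null_sql = "" if info["nullable"] else " NOT NULL"
        lines.append(f'  "{name}" {info["type"]}{null_sql}')
    ddl = f'CREATE TABLE "{table_name}" (\n' + ",\n".join(lines) + "\n);"
    return columns, ddl


sample = "id,name,score\n1,Ann,3.5\n2,Bob,\n3,Christine,4\n"
columns, ddl = suggest_schema(sample, table_name="people")
print(ddl)
```

A real tool would also need dialect-specific type names for each target database, which this sketch leaves out.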

Your Friendly Guide to Colors in Data Visualisation
A light article on choosing colors in data visualisations.

TensorFlow Examples
Code examples for some popular machine learning algorithms, using the TensorFlow library. This tutorial is designed to make it easy to dive into TensorFlow through examples. It includes both notebooks and code with explanations.

Cookiecutter Data Science: a logical, reasonably standardized, but flexible project structure for doing and sharing data science work
Very interesting article outlining a sensible project structure for data science projects. The author makes some insightful points, such as the dangers of notebooks in production (they are for communication and exploration) and the fact that analysis is a DAG. Worth a read!

Sorry ARIMA, but I’m Going Bayesian
In which the author makes the case for Bayesian structural time series models versus ARIMA.

6 Lesser Known Python Data Analysis Libraries
Discusses prettytable, vincent, tinydb, natsort, delorean and mrjob.

BetaGo: AlphaGo for the masses
BetaGo lets you run your own Go engine. It downloads Go games for you, preprocesses them, trains a model on data, for instance a neural network using keras, and serves the trained model to an HTML front end, which you can use to play against your own Go bot.

Movidius Announces Deep Learning Accelerator and Fathom Software Framework
For when your neural network just doesn’t train fast enough: another compute USB stick, this time for deep learning.

On Nested Models
“Current data science practice is quietly losing statistical power through inappropriate re-use of data in different stages of the process (the analyst looking, variable pruning, variable treatment, dimension reduction, and so on).”

A Different Approach to Low-Rank Matrix Completion: Part 1

Sketch Simplification
“We present a novel technique to simplify sketch drawings based on learning a series of convolution operators. In contrast to existing approaches that require vector images as input, we allow the more general and challenging input of rough raster sketches such as those obtained from scanning pencil sketches.”

Containerized Data Science and Engineering – Part 1, Dockerized Data Pipelines
This is part 1 of a two-part series of blog posts about doing data science and engineering in a containerized world.

Where Will Your Country Stand in World War III?
“In the recent Panama Papers scandal, journalists analyzed 11.5 million documents using network graphs to trace the use of offshore tax structures. In this chapter, we use a network graph technique called Social Network Analysis (SNA) to map weapons transfer between countries. By analyzing bilateral weapons trade, a network of multilateral ties can be distilled, providing insights into the complex arena of international politics.”

AlphaGo under a Magnifying Glass
Where the author zooms in on the workings of AlphaGo.

Machine Learning Meets Economics, Part 2
Follow-up to the first part, zooming in on classification where a reject option is available, i.e. where the classifier can choose not to classify an instance.
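
As a rough illustration of a reject option, the Python sketch below abstains whenever the expected cost of guessing exceeds the cost of rejecting. It follows the textbook threshold rule usually attributed to Chow (with a misclassification cost of 1), which is not necessarily how the linked article sets things up; the function name, the `reject_cost` parameter and the example probabilities are assumptions.

```python
def classify_with_reject(probabilities, reject_cost=0.2):
    """Return the most likely label, or None to reject the instance.

    probabilities: dict mapping label -> estimated posterior probability.
    reject_cost:   cost of abstaining, relative to a misclassification
                   cost of 1 (an assumed, illustrative cost model).
    """
    label, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    expected_error_cost = 1.0 - confidence  # chance the best guess is wrong
    if expected_error_cost > reject_cost:
        return None  # cheaper to hand the case to a human or another system
    return label


confident = classify_with_reject({"spam": 0.95, "ham": 0.05})
uncertain = classify_with_reject({"spam": 0.55, "ham": 0.45})
```

Raising `reject_cost` makes the classifier abstain less often; at 0.5 or above, a two-class model never rejects under this rule.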

Solving 4×4 KenKen Puzzles with Computer Vision