Web Picks (week of 16 May 2016)

Posted on May 23, 2016

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

Researchers just released profile data on 70,000 OkCupid users without permission
One of the bigger stories of the past weeks. Or: how not to publish research…

Announcing SyntaxNet: The World’s Most Accurate Parser Goes Open Source
The other big story of the past weeks: Google has released SyntaxNet, an open-source neural network framework implemented in TensorFlow that provides a foundation for Natural Language Understanding (NLU) systems.

Let’s stop writing like it’s 1995
Promising article on upcoming AI-driven advances in word processing and writing. Many of our PhD students are looking forward to recommendations of the type “if you replace this word with ****, more reviewers will accept your paper”. Also see Writing with the machine, in the same vein.

Extreme Style Machines: Using Random Neural Networks to Generate Textures
Wait, what! Generating high-quality images based on completely random neural networks? That’s the unreasonable effectiveness of deep representations…

Artistic style transfer for videos (YouTube)
Supplementary video accompanying the paper “Artistic style transfer for videos” by Manuel Ruder, Alexey Dosovitskiy and Thomas Brox http://arxiv.org/abs/1604.08610 — impressive work!

Number plate recognition with Tensorflow
Fun read on recognizing number plates using deep learning.

Using Machine Learning to Predict Out-Of-Sample Performance of Trading Algorithms
Interesting article on backtesting: “[…] it became clear that while the Sharpe ratio of a backtest was a very weak predictor of the future performance of a trading strategy, we could instead […] train a classifier on a variety of features to predict out-of-sample performance with much higher accuracy.”
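
For readers unfamiliar with the metric mentioned in the quote, here is a minimal sketch of how a Sharpe ratio is commonly computed from a series of per-period returns. The function name, the per-period `risk_free_rate` parameter, the 252-trading-day annualization factor and the sample returns are all illustrative assumptions, not taken from the linked article.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns.

    risk_free_rate is expressed per period (assumption), and
    periods_per_year=252 assumes daily returns on trading days.
    """
    excess = [r - risk_free_rate for r in returns]
    mean_excess = statistics.mean(excess)
    volatility = statistics.stdev(excess)  # sample standard deviation
    return mean_excess / volatility * math.sqrt(periods_per_year)


# Made-up daily returns, purely for illustration.
daily_returns = [0.01, -0.005, 0.007, 0.002, -0.003]
backtest_sharpe = sharpe_ratio(daily_returns)
```

The article's point is that a number like `backtest_sharpe` on its own says little about future performance, which is why the authors feed a wider set of features to a classifier instead.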

sqlitebiter
sqlitebiter is a CLI tool to convert CSV/JSON/Excel/Google-Sheets to a SQLite database.
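
To give an idea of what such a conversion involves, here is a rough sketch of loading CSV data into SQLite using only the Python standard library. This is not sqlitebiter's implementation or interface: the helper `csv_to_sqlite`, the table name and the sample data are hypothetical, and unlike a real tool the sketch does no type inference, so every value is stored as text.

```python
import csv
import io
import sqlite3


def csv_to_sqlite(csv_text, conn, table):
    """Load CSV text (first row is the header) into a new SQLite table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    # Quote identifiers so column names with spaces still work.
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    conn.commit()


conn = sqlite3.connect(":memory:")
csv_to_sqlite("name,score\nalice,10\nbob,7\n", conn, "scores")
rows = conn.execute("SELECT name, score FROM scores ORDER BY name").fetchall()
```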

Why rainbow colour scales can be misleading
It is because rainbow scales are not ‘perceptually uniform’ – they create sharp artificial boundaries between colours (particularly involving yellow) that are not necessarily representative of the underlying data.
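
A small sketch can make this concrete. It assumes a naive rainbow built by sweeping hue from blue to red in HSV space (not any particular plotting library's colour map) and uses sRGB relative luminance as a rough proxy for perceived lightness. Perceptual uniformity is properly assessed in a colour space such as CIELAB, so treat this as an illustration only.

```python
import colorsys


def relative_luminance(rgb):
    """Relative luminance of an sRGB colour with channels in [0, 1]."""
    def linearize(c):
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def is_monotonic(values):
    """True if the sequence never changes direction."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


steps = 64
# Naive rainbow: hue runs from blue (2/3) down to red (0) at full saturation.
rainbow = [
    colorsys.hsv_to_rgb((2 / 3) * (1 - i / (steps - 1)), 1.0, 1.0)
    for i in range(steps)
]
# Reference scale: a plain grey ramp from black to white.
gray = [(i / (steps - 1),) * 3 for i in range(steps)]

rainbow_lum = [relative_luminance(c) for c in rainbow]
gray_lum = [relative_luminance(c) for c in gray]

print("rainbow lightness is monotonic:", is_monotonic(rainbow_lum))
print("grey ramp lightness is monotonic:", is_monotonic(gray_lum))
```

In this sketch the rainbow's luminance rises and falls several times along the scale, with its brightest point in the yellow region, while the grey ramp increases steadily. Those swings in lightness are what the eye reads as boundaries that are not in the data.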

Shot Blocking in the NHL Playoffs
Interesting NHL analysis post.

The R Data I/O Shootout
Domino Labs pits feather, a newcomer among R data I/O packages, against the popular packages data.table and readr and the venerable saveRDS/readRDS functions from base R.

Evaluating Hyperparameter Optimization Strategies
Hyperparameter optimization is a common problem in machine learning. Machine learning algorithms, from logistic regression to neural nets, depend on well-tuned hyperparameters to reach maximum effectiveness. Different hyperparameter optimization strategies have varied performance and cost (in time, money, and compute cycles). So how do you choose?
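
As a toy illustration of two of the simplest strategies, the sketch below runs a grid search and a random search with the same evaluation budget. Everything in it is an assumption made for illustration: `validation_loss` is a made-up stand-in for training and scoring a real model, and the hyperparameter names and ranges are hypothetical. It is not the evaluation method used in the linked article.

```python
import itertools
import random


def validation_loss(learning_rate, regularization):
    """Hypothetical objective; a real one would train and score a model."""
    return (learning_rate - 0.03) ** 2 * 1000 + (regularization - 0.5) ** 2


def grid_search(lr_values, reg_values):
    """Evaluate every combination on a fixed grid and keep the best."""
    candidates = itertools.product(lr_values, reg_values)
    return min(candidates, key=lambda p: validation_loss(*p))


def random_search(n_trials, seed=0):
    """Sample settings at random (learning rate log-uniformly) and keep the best."""
    rng = random.Random(seed)
    candidates = [
        (10 ** rng.uniform(-4, 0), rng.uniform(0.0, 1.0))
        for _ in range(n_trials)
    ]
    return min(candidates, key=lambda p: validation_loss(*p))


# Both strategies get the same budget of 12 evaluations.
grid_best = grid_search([0.001, 0.01, 0.1, 1.0], [0.0, 0.5, 1.0])
random_best = random_search(n_trials=12)

print("grid search best:", grid_best, validation_loss(*grid_best))
print("random search best:", random_best, validation_loss(*random_best))
```

Which strategy wins depends on the objective and the budget, which is exactly the trade-off the article examines.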

How to get into the top 15 of a Kaggle competition using Python
The Expedia Kaggle competition is making the rounds, and this blog post discusses some potential strategies for it.

Minecraft, ENHANCE! Neural Networks to Upscale & Stylize Pixel Art
How about taking pixelated graphics and using a neural network to increase their resolution, using example photos or textures?

TakeNote: Transforming online lectures into customized course notes
Very impressive work.