Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Deploying Machine Learning at Scale
A great article that accurately pinpoints the unique challenges of machine learning deployment compared to traditional software projects.
- High-Skilled White-Collar Work? Machines Can Do That, Too
“One of the best-selling T-shirts for the Indian e-commerce site Myntra is an olive, blue and yellow colorblocked design. It was conceived not by a human but by a computer algorithm — or rather two algorithms.”
- The rise of ‘pseudo-AI’: how tech firms quietly use humans to do bots’ work
“Using what one expert calls a ‘Wizard of Oz technique’, some companies keep their reliance on humans a secret from investors”
- Layoffs at Watson Health Reveal IBM’s Problem With AI
Axed engineers say IBM isn’t always smart about artificial intelligence.
- Yes, Amazon Is Tracking People
“According to documents obtained by American Civil Liberties Union affiliates in three states, Amazon is providing police departments in Orlando, Fla., and Washington County, Ore., with powerful facial recognition technology.”
- An AI system for editing music in videos
“Given a video of a musical performance, CSAIL’s deep-learning system can make individual instruments louder or softer.”
- AI Can Smell Illnesses in Human Breath
“The sense of smell is used by animals and even plants to identify hundreds of different substances that float in the air. But compared to that of other animals, the human sense of smell is far less developed and certainly not used to carry out daily activities.”
- The unreasonable effectiveness of Deep Learning Representations
Building an image search service from scratch.
- Facebook Patent Imagines Triggering Your Phone’s Mic When a Hidden Signal Plays on TV
The patent is titled “broadcast content view analysis based on ambient audio recording.”
- AdamW and Super-convergence is now the fastest way to train neural nets
“When you hear people saying that Adam doesn’t generalize as well as SGD+Momentum, you’ll nearly always find that they’re choosing poor hyper-parameters for their model. Adam generally requires more regularization than SGD, so be sure to adjust your regularization hyper-parameters when switching from SGD to Adam.”
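As a rough illustration of that point about re-tuning regularization when you swap optimizers, here is a minimal PyTorch sketch; the model and the hyper-parameter values are placeholders, not the article's settings.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder model

# Baseline: SGD with momentum and some weight decay (L2 regularization).
sgd = torch.optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=1e-4)

# Switching to Adam-style optimization: don't reuse the SGD settings blindly.
# AdamW applies decoupled weight decay, and the regularization strength
# usually needs to be re-tuned (often increased) relative to SGD.
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```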
- A New Angle on L2 Regularization
A beautiful explorable explanation covering regularization, overfitting, and adversarial attacks using easy-to-understand concepts. A recommended read that also illustrates why even non-deep models can be prone to adversarial attacks and overfitting.
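As a plain, non-interactive illustration of the mechanism the article explores, a tiny NumPy sketch of a squared-error loss with an L2 penalty; the data and the regularization strength are arbitrary, made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # toy features
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)   # toy targets

lam = 0.1  # regularization strength (illustrative value)

def ridge_loss(w):
    # Mean squared error plus the L2 penalty lam * ||w||^2 that shrinks weights.
    return np.mean((X @ w - y) ** 2) + lam * np.sum(w ** 2)

# Closed-form minimizer of the penalized loss: (X^T X + lam * n * I)^{-1} X^T y
n, d = X.shape
w_ridge = np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)
print(ridge_loss(w_ridge))
```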
- How to make MongoDB not suck for analytics
“TLDR: Row stores are fast to write but slow to read. Column stores are fast to read but slow to write. Load data from Mongo into Parquet files for fast querying using AWS Athena.”
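A rough sketch of that Mongo-to-Parquet step using pymongo and pandas; the connection string, database, and collection names are placeholders, and the post's own pipeline may well differ.

```python
import pandas as pd
from pymongo import MongoClient

# Placeholder connection details.
client = MongoClient("mongodb://localhost:27017")
collection = client["mydb"]["events"]

# Pull documents out of the row-oriented store...
docs = list(collection.find({}, {"_id": 0}))
df = pd.DataFrame(docs)

# ...and write them to a columnar Parquet file that Athena (or any other
# Parquet-aware engine) can scan efficiently. Requires pyarrow or fastparquet.
df.to_parquet("events.parquet", index=False)
```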
- Making beautiful maps with Rayshader (part 2)
“There was some great initial positive feedback, but also a bunch of notes from a few people describing what they thought was missing. One person was aghast that my raytracer didn’t have Lambertian reflectance, which is the technical term for the fact that surfaces pointing towards the light are brighter than those askew. Reasonable suggestion, I thought, so–I’ll implement it.”
- shap: A unified approach to explain the output of any machine learning model
Another model-explanation library after eli5, Skater, LIME, and DALEX, though this one looks very powerful as well!
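A minimal usage sketch of shap for a tree-based model; the model and dataset are stand-ins, not taken from the paper or repo.

```python
import shap
import xgboost as xgb
from sklearn.datasets import load_breast_cancer

# Fit any tree-based model; the dataset is just for illustration.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = xgb.XGBClassifier(n_estimators=50).fit(X, y)

# SHAP values attribute each individual prediction to the input features.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global summary of feature importance and direction of effect.
shap.summary_plot(shap_values, X)
```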
- Pandas on Ray – Early Lessons from Parallelizing Pandas
“In our last blog post, we introduced Pandas on Ray with some preliminary progress for making Pandas workflows faster by requiring only a single line of code change. Since then, we have received a lot of feedback from the community and in response we worked to significantly improve the functionality and performance. In this blog post, we will go over a few of the lessons we learned along the way and talk about performance and how we plan to continue improving the library moving forward.”
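The “single line of code change” is the import: Pandas on Ray exposes a drop-in replacement for the pandas namespace. The project has since been packaged as Modin, whose import path is used in the sketch below; the file and column names are placeholders, and the module name at the time of the post was different.

```python
# Drop-in replacement: the rest of the pandas code stays unchanged.
import modin.pandas as pd

df = pd.read_csv("big_file.csv")  # partitions the file and reads it in parallel
print(df.head())                  # heavier operations run through the same API
```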
- Introducing plotly.py 3.0.0
“I don’t know of any that can both display a million points this quickly, and support zooming, panning, and selection at interactive speed.”
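To give an idea of what that kind of point count looks like in plotly, a small sketch using the WebGL-backed Scattergl trace with the FigureWidget class added in the 3.0 release; the data is random and purely illustrative.

```python
import numpy as np
import plotly.graph_objs as go

# A million random points rendered through the WebGL-backed Scattergl trace.
n = 1_000_000
x = np.random.randn(n)
y = np.random.randn(n)

fig = go.FigureWidget(data=[go.Scattergl(x=x, y=y, mode="markers",
                                         marker=dict(size=2, opacity=0.3))])
fig  # in a Jupyter notebook this renders an interactive, zoomable widget
```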
- TensorFlow: The Confusing Parts
“I’m writing this blog post as a message-in-a-bottle to my former self: it’s the introduction that I wish I had been given before starting on my journey. Hopefully, it will also be a helpful resource for others.”
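The confusing parts in question are largely (pre-2.0) TensorFlow's graph-then-session execution model; a minimal sketch of that pattern, with made-up tensor names:

```python
import tensorflow as tf  # TensorFlow 1.x API

# Building the graph does not compute anything yet...
a = tf.placeholder(tf.float32, name="a")
b = tf.constant(2.0, name="b")
c = a * b

# ...values only flow once a Session runs the graph.
with tf.Session() as sess:
    print(sess.run(c, feed_dict={a: 3.0}))  # -> 6.0
```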
- Customising Airflow: Beyond Boilerplate Settings
“Apache Airflow is a data pipeline orchestration tool. It helps run periodic jobs that are written in Python, monitor their progress and outcome, retry failed jobs and convey events in a colourful and concise Web UI. In this blog post I’m going to walk through six features that should prove helpful in customising your Airflow installations.”
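For readers new to Airflow, a minimal DAG sketch of the kind of periodic Python job it orchestrates; the import paths are Airflow 1.x style, and the DAG id, schedule, and task are made up rather than taken from the article.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.x path


def extract():
    print("pulling data...")  # placeholder task body


default_args = {"owner": "data-team", "retries": 2,
                "retry_delay": timedelta(minutes=5)}

# A daily pipeline with automatic retries, visible in the Airflow Web UI.
with DAG(dag_id="example_pipeline",
         default_args=default_args,
         start_date=datetime(2018, 7, 1),
         schedule_interval="@daily") as dag:
    PythonOperator(task_id="extract", python_callable=extract)
```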
- uwot: An R package implementing the UMAP dimensionality reduction method
An R implementation of the Uniform Manifold Approximation and Projection (UMAP) method for dimensionality reduction (McInnes and Healy, 2018).
- robosat: Semantic segmentation on aerial and satellite imagery.
Extracts features such as buildings, parking lots, roads, and water.
- mlflow: An open source platform for the complete machine learning lifecycle
Mainly geared towards the development side of the lifecycle for now, though monitoring support is rumored to be in the works as well!
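A minimal sketch of mlflow's experiment-tracking side; the parameter and metric names are placeholders.

```python
import mlflow

# Log the parameters and metrics of a training run so experiments stay
# comparable; artifacts (models, plots, data files) can be logged the same way.
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)  # placeholder hyper-parameter
    mlflow.log_metric("val_accuracy", 0.93)  # placeholder result
```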
- onnx: Open Neural Network Exchange
Similar in spirit to PMML, ONNX provides an open, common interchange format for deep learning models.
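As an example of the kind of interchange ONNX enables, a sketch of exporting a PyTorch model to the format; the model and file name are placeholders.

```python
import torch
import torch.nn as nn

# Placeholder model; any traceable PyTorch model works similarly.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
dummy_input = torch.randn(1, 10)  # example input used to trace the graph

# Export to ONNX so other runtimes and frameworks can load the model.
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["features"], output_names=["logits"])
```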
- hnatt: Train and visualize Hierarchical Attention Networks
This is a Keras implementation of the Hierarchical Attention Network architecture (Yang et al., 2016), and comes with a webapp for easily interacting with the trained models.
- The Matrix Calculus You Need For Deep Learning
“This paper is an attempt to explain all the matrix calculus you need in order to understand the training of deep neural networks. We assume no math knowledge beyond what you learned in calculus 1, and provide links to help you refresh the necessary math where needed.”
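As a small taste of the material, a standard identity of the kind the paper derives (not an excerpt from it): the gradient of a least-squares loss follows from the vector chain rule applied to $u(\mathbf{w}) = X\mathbf{w} - \mathbf{y}$ and $g(\mathbf{u}) = \mathbf{u}^\top \mathbf{u}$.

```latex
f(\mathbf{w}) = \lVert X\mathbf{w} - \mathbf{y} \rVert^{2}
\quad\Longrightarrow\quad
\nabla_{\mathbf{w}} f = 2\, X^{\top} (X\mathbf{w} - \mathbf{y})
```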
- Keras or PyTorch as your first deep learning framework
“We strongly recommend that you pick either Keras or PyTorch. These are powerful tools that are enjoyable to learn and experiment with.”
- OpenAI Five
“Our team of five neural networks, OpenAI Five, has started to defeat amateur human teams at Dota 2.”