Web Picks (week of 18 November 2019)

Posted on November 25, 2019

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

The never-ending issues around AI and bias
Who’s to blame when AI goes wrong?
A.I. Systems Echo Biases They’re Fed, Putting Scientists on Guard
Experts claim BERT could be picking up on and mimicking biases that are found in the sources it learns from — potentially decades of biases.
How A Week With Chauffeurs Showed The Major Flaw In Our Self-Driving Car Future
“The industry was making big promises about how great self-driving cars would be for society, and those claims were attracting billions of dollars of research from the world’s biggest companies to make the technology a reality.”
How 20th Century Fox uses ML to predict a movie audience
“When it comes to movies, analyzing text taken from a script is limiting because it only provides a skeleton of the story, without any of the additional dynamism that can entice an audience to see a movie. The team wondered if there was some way to use modern, advanced computer vision to study movie trailers.”
Spleeter is the Deezer audio source separation library
This neural network can separate out vocals, drums, bass and piano from songs.
Query2vec: Search query expansion with query embeddings
“At Grubhub we leverage recent advancements in Representation Learning — namely Sequential Modeling and Language Modeling — to learn a Latent Food Graph.”
How To Turn Physics into an Optimization Problem?
This post is mostly about a tool called Lagrangian mechanics, which lets you treat physics problems as optimization problems.
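To give a flavor of the idea, here is our own minimal sketch (not code from the linked post). It assumes a ball thrown straight up that lands back at height 0 after one second, splits the flight into short time segments, and uses plain gradient descent to find the path that minimizes the discretized action, i.e. the sum of kinetic minus potential energy over time. All names and parameter values (G, MASS, T_TOTAL, N, the learning rate) are illustrative assumptions.

```python
# Sketch: recover a ball's trajectory by minimizing the discretized action.
# Every constant and function name below is an illustrative assumption.

G = 9.81       # gravitational acceleration, m/s^2
MASS = 1.0     # kg
T_TOTAL = 1.0  # flight time in seconds; the ball starts and ends at height 0
N = 20         # number of time segments
DT = T_TOTAL / N


def action(x):
    """Discretized action: sum of (kinetic - potential) * dt over all segments."""
    total = 0.0
    for i in range(N):
        velocity = (x[i + 1] - x[i]) / DT
        height = 0.5 * (x[i] + x[i + 1])
        total += (0.5 * MASS * velocity ** 2 - MASS * G * height) * DT
    return total


def action_gradient(x):
    """Derivative of the action w.r.t. each interior point (endpoints stay fixed)."""
    grad = [0.0] * (N + 1)
    for k in range(1, N):
        grad[k] = MASS * (2 * x[k] - x[k - 1] - x[k + 1]) / DT - MASS * G * DT
    return grad


def minimize_action(steps=5000, learning_rate=0.02):
    """Plain gradient descent, starting from a path that never leaves the ground."""
    x = [0.0] * (N + 1)
    for _ in range(steps):
        grad = action_gradient(x)
        x = [xi - learning_rate * gi for xi, gi in zip(x, grad)]
    return x


if __name__ == "__main__":
    path = minimize_action()
    for k in range(0, N + 1, 5):
        t = k * DT
        exact = 0.5 * G * t * (T_TOTAL - t)  # textbook parabola for comparison
        print(f"t={t:.2f}s  optimized={path[k]:.4f} m  exact={exact:.4f} m")
```

For this simple setup the optimized path should match the familiar parabola from Newton's laws. In general the physical path only makes the action stationary rather than strictly minimal, so treat this as an illustration of the viewpoint rather than a general-purpose solver.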
OpenAI releases their largest GPT-2 model
Want to see it in action? Check out https://transformer.huggingface.co/doc/gpt2-large
Time Series Prediction – A short introduction for pragmatists
Are you trying to predict time series but don’t know where to start? This blog post will provide a comparison of the most prominent techniques and show you how to implement them.
Bar Chart Race, Explained in d3
This is a pedagogical implementation of an animated bar chart race. Read on to learn how it works, or fork this notebook and drop in your data!
Naïve Bayes for Machine Learning – From Zero to Hero
Bayes’ theorem is a lot more than just a theorem based on conditional probability.
bamboolib, a GUI for pandas
Sadly not a free offering.
Don’t Take Their Word For It: The Misclassification of Bond Mutual Funds
We provide evidence that mutual fund managers misclassify their holdings, and that these misclassifications have a real and significant impact on investor capital flows.
Audio samples from “Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis”
This is an older one, but the results are still amazing. Perhaps a similar technique was behind the story below?
Scammers deepfake CEO’s voice to talk underling into $243,000 transfer
“Welcome to a hybrid version of those hoodwinks: deepfake audio, which was recently used in what’s considered to be the first known case of an AI-generated voice of a CEO to bilk a UK-based energy firm out of €220,000 (USD $243,000).”
How to turn the complex mathematics of vector calculus into simple pictures
Feynman diagrams revolutionized particle physics. Now mathematicians want to do the same for vector calculus.
Teaching a neural network to use a calculator
This article explores a seq2seq architecture for solving simple probability problems in Saxton et al.’s Mathematics Dataset. A transformer is used to map questions to intermediate steps, while an external symbolic calculator evaluates intermediate expressions.
Engineering Uncertainty Estimation in Neural Networks for Time Series Prediction at Uber
Uncertainty estimation in deep learning remains a less trodden but increasingly important component of assessing forecast prediction truth in LSTM models.
Scientists have found a way to decode brain signals into speech
It’s a step towards a system that would let people send texts straight from their brains.
Effectively Using Matplotlib
“The python visualization world can be a frustrating place for a new user. There are many different options and choosing the right one is a challenge.”
Neurons spike back
This article retraces the history of artificial intelligence through the lens of the tension between symbolic and connectionist approaches.
Detecting Glaucoma Using 3D Convolutional Neural Network of Raw SD-OCT Optic Nerve Scans
“We propose developing and validating a three-dimensional (3D) deep learning system using the entire unprocessed OCT optic nerve volumes to distinguish true glaucoma from normals in order to discover any additional imaging biomarkers within the cube through saliency mapping.”