Web Picks (week of 4 March 2019)

Posted on March 9, 2019

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

Focus: NLP

A lot has been happening in the world of data science and AI. An arms race is underway in NLP, with new models overtaking one another in quick succession.

It started with fast.ai introducing ULMFiT: a new method to classify documents by making heavy use of transfer learning
This in turn was followed by ELMo, which constructs representations that are contextual, deep and character-based
Google then introduced BERT (Bidirectional Encoder Representations from Transformers), a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks (blog post)
This was followed by Microsoft’s MT-DNN, which outperformed Google’s BERT on the GLUE benchmark
OpenAI then decided not to release the full version of their large-scale GPT-2 language model, deeming the risk of misuse for fake text generation too high. Lots of discussion followed: http://deliprao.com/archives/314, https://thegradient.pub/openai-please-open-source-your-language-model/, http://approximatelycorrect.com/2019/02/17/openai-trains-language-model-mass-hysteria-ensues/, https://arstechnica.com/information-technology/2019/02/twenty-minutes-into-the-future-with-openais-deep-fake-text-ai/
For a summary, this post from AllenNLP (a new NLP framework which has been rapidly making the rounds) is a good starting point

Focus: data and society

‘You can track everything’: the parents who digitise their babies’ lives
Socks that record heart rate and cots that mimic the womb might promise parents peace of mind – but is the data given to tech firms a fair exchange?
China bans 23m from buying travel tickets as part of ‘social credit’ system
People accused of social offences blocked from booking flights and train journeys
AI Safety Needs Social Scientists
“Properly aligning advanced AI systems with human values will require resolving many uncertainties related to the psychology of human rationality, emotion, and biases. These can only be resolved empirically through experimentation — if we want to train AI to do what humans want, we need to study humans.”
Nadella: Microsoft will sell war tech to democracies to “protect freedoms”
A growing number of employees feel that the military project crosses a line.
Life and society are increasingly governed by numbers
When everything is quantified, power accrues to whoever is keeping score
Algorithmic Justice Could Clear 250,000 Convictions in California
When Algorithms Think You Want to Die
Don’t Let Robots Pull the Trigger
Weapons that kill enemies on their own threaten civilians and soldiers alike
Farmworker vs Robot
Agricultural workers of the future may soon be made of tech and steel. Can a robot pick a strawberry better, faster, and cheaper than a seasonal farmworker?
How did the police know you were near a crime scene? Google told them
Google and Microsoft Warn That AI May Do Dumb Things
Artificial intelligence, algorithmic pricing, and collusion
China’s Tech Firms Are Mapping Pig Faces
As a devastating disease afflicts the country’s swine, companies are scrambling to roll out facial and voice recognition and other unproven ways to save them.

Focus: on management and other thoughts

Seven Myths in Machine Learning Research
Machine learning can boost the value of wind energy
Data science is different now
Facebook’s chief AI scientist: Deep learning may need a new programming language
SQL: One of the Most Valuable Skills
Machine Learning for Everyone
In simple words. With real-world examples
Don’t believe the hype: the media are unwittingly selling us an AI fantasy
Journalists need to stop parroting the industry line when it comes to artificial intelligence

Focus: reinforcement learning

Focus: tools and other news

xforest: A super-fast and scalable Random Forest library based on a fast histogram decision tree algorithm and a distributed bagging framework
Interpretable Machine Learning: A Guide for Making Black Box Models Explainable
Christoph Molnar has been wrapping up his fantastic book!
Lingvo: A TensorFlow Framework for Sequence Modeling
MLJ: A Julia machine learning framework
A Python implementation of LightFM, a hybrid recommendation algorithm
Microsoft AutoML: An open source AutoML toolkit for neural architecture search and hyper-parameter tuning
This Waifu Does Not Exist
“I describe how I made the website ThisWaifuDoesNotExist.net (TWDNE) for displaying random anime faces generated by StyleGAN neural networks, and how it went viral.”
Best Deep Learning Books: Updated for 2019
XKCD-style plots in Matplotlib (an oldie from 2012 that has resurfaced)
Beyond Local Pattern Matching: Recent Advances in Machine Reading
Amazon Personalize
Real-time personalization and recommendation, based on the same technology used at Amazon.com
AdaBound: An optimizer that trains as fast as Adam and as good as SGD
Photo-Sketching: Inferring Contour Drawings from Images
Forecasting in Python with Prophet
Forecasting is often considered a natural progression from reporting. Reporting helps us answer, what happened? Forecasting helps answer the next logical question, what will happen?
The why and how of nonnegative matrix factorization
Audio AI: isolating vocals from stereo music using Convolutional Neural Networks
The Unreasonable Effectiveness of Deep Feature Extraction
Terrapattern
“This is the alpha version of Terrapattern, a visual search tool for satellite imagery. The project provides journalists, citizen scientists, and other researchers with the ability to quickly scan large geographical regions for specific visual features.”
Introducing Ludwig, a Code-Free Deep Learning Toolbox
Datashader: Turns even the largest data into images, accurately
Spektral: Deep learning on graphs with Keras
Putting neural networks under the microscope
Researchers pinpoint the “neurons” in machine-learning systems that capture specific linguistic features during language-processing tasks.
Finding Kafka’s throughput limit in Dropbox infrastructure
Hacker News discussion on Apache Flink
“Apache Flink, Flume, Storm, Samza, Spark, Apex, and Kafka all do basically the same thing. I feel like this is a bit overboard. And this is before we talk about the non-Apache stream-processing frameworks out there.”