Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
Focus: NLP
A lot has been going on in the world of data science and AI. There is an arms race going on in NLP, with lots of algorithms competing against each other.
- It started with fast.ai introducing ULMFiT: a new method to classify documents by making heavy use of transfer learning
- This in turn was followed by ELMo, which constructs representations that are contextual, deep, and character-based
- Google then introduced BERT (Bidirectional Encoder Representations from Transformers), a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks (blog post)
- This was followed by Microsoft’s MT-DNN, outperforming Google BERT
- OpenAI then decided not to release their GPT-2 large scale model for fake text generation because it was deemed too dangerous, followed by lots of discussion: http://deliprao.com/archives/314, https://thegradient.pub/openai-please-open-source-your-language-model/, http://approximatelycorrect.com/2019/02/17/openai-trains-language-model-mass-hysteria-ensues/, https://arstechnica.com/information-technology/2019/02/twenty-minutes-into-the-future-with-openais-deep-fake-text-ai/
- For a summary of these developments, this post from AllenNLP (a new NLP framework which has been rapidly making the rounds) is a good starting point
Focus: data and society
- ‘You can track everything’: the parents who digitise their babies’ lives
Socks that record heart rate and cots that mimic the womb might promise parents peace of mind – but is the data given to tech firms a fair exchange?
- China bans 23m from buying travel tickets as part of ‘social credit’ system
People accused of social offences blocked from booking flights and train journeys
- AI Safety Needs Social Scientists
“Properly aligning advanced AI systems with human values will require resolving many uncertainties related to the psychology of human rationality, emotion, and biases. These can only be resolved empirically through experimentation — if we want to train AI to do what humans want, we need to study humans.”
- Nadella: Microsoft will sell war tech to democracies to “protect freedoms”
A growing number of employees feel that the military project crosses a line.
- Life and society are increasingly governed by numbers
When everything is quantified, power accrues to whoever is keeping score
- Algorithmic Justice Could Clear 250,000 Convictions in California
- When Algorithms Think You Want to Die
- Don’t Let Robots Pull the Trigger
Weapons that kill enemies on their own threaten civilians and soldiers alike
- Farmworker vs Robot
Agricultural workers of the future may soon be made of tech and steel. Can a robot pick a strawberry better, faster, and cheaper than a seasonal farmworker?
- How did the police know you were near a crime scene? Google told them
- Google and Microsoft Warn That AI May Do Dumb Things
- Artificial intelligence, algorithmic pricing, and collusion
- China’s Tech Firms Are Mapping Pig Faces
As a devastating disease afflicts the country’s swine, companies are scrambling to roll out facial and voice recognition and other unproven ways to save them.
Focus: management and other thoughts
- Seven Myths in Machine Learning Research
- Machine learning can boost the value of wind energy
- Data science is different now
- Facebook’s chief AI scientist: Deep learning may need a new programming language
- SQL: One of the Most Valuable Skills
- Machine Learning for Everyone
In simple words. With real-world examples
- Don’t believe the hype: the media are unwittingly selling us an AI fantasy
Journalists need to stop parroting the industry line when it comes to artificial intelligence
Focus: reinforcement learning
- Introducing PlaNet: A Deep Planning Network for Reinforcement Learning
- Long-Range Robotic Navigation via Automated Reinforcement Learning
- Controlling a 2D Robotic Arm with Deep Reinforcement Learning
Focus: tools and other news
- xforest: A super-fast and scalable Random Forest library based on fast histogram decision tree algorithm and distributed bagging framework
- Interpretable Machine Learning: A Guide for Making Black Box Models Explainable
Christoph Molnar has been wrapping up his fantastic book!
- Lingvo: A TensorFlow Framework for Sequence Modeling
- MLJ: A Julia machine learning framework
- A Python implementation of LightFM, a hybrid recommendation algorithm
- Microsoft AutoML: An open source AutoML toolkit for neural architecture search and hyper-parameter tuning
- This Waifu Does Not Exist
“I describe how I made the website ThisWaifuDoesNotExist.net (TWDNE) for displaying random anime faces generated by StyleGAN neural networks, and how it went viral.”
- Best Deep Learning Books: Updated for 2019
- XKCD-style plots in Matplotlib (an oldie from 2012 that has resurfaced)
- Beyond Local Pattern Matching: Recent Advances in Machine Reading
- Amazon Personalize
Real-time personalization and recommendation, based on the same technology used at Amazon.com
- AdaBound: An optimizer that trains as fast as Adam and as good as SGD
- Photo-Sketching: Inferring Contour Drawings from Images
- Forecasting in Python with Prophet
Forecasting is often considered a natural progression from reporting. Reporting helps us answer, what happened? Forecasting helps answer the next logical question, what will happen?
- The why and how of nonnegative matrix factorization
- Audio AI: isolating vocals from stereo music using Convolutional Neural Networks
- The Unreasonable Effectiveness of Deep Feature Extraction
- Terrapattern
“This is the alpha version of Terrapattern, a visual search tool for satellite imagery. The project provides journalists, citizen scientists, and other researchers with the ability to quickly scan large geographical regions for specific visual features.”
- Introducing Ludwig, a Code-Free Deep Learning Toolbox
- Datashader: Turns even the largest data into images, accurately
- Spektral: Deep learning on graphs with Keras
- Putting neural networks under the microscope
Researchers pinpoint the “neurons” in machine-learning systems that capture specific linguistic features during language-processing tasks.
- Finding Kafka’s throughput limit in Dropbox infrastructure
- Hacker News discussion on Apache Flink
“Apache Flink, Flume, Storm, Samza, Spark, Apex, and Kafka all do basically the same thing. I feel like this is a bit overboard. And this is before we talk about the non-Apache stream-processing frameworks out there.”