Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- What Data Folk Were Saying about Zillow
  NOT about what happened AT Zillow, but about what lots of professional numbers people (data scientists, risk managers, finance folks, quants, etc.) have to say about and around the situation.
- Zillow, Prophet, Time Series, & Prices
  “So I made a mildly controversial tweet.”
- A Non-Technical Guide to Interpreting SHAP Analyses
  With interpretability becoming an increasingly important requirement for machine learning projects, there’s a growing need for the complex outputs of techniques such as SHAP to be communicated to non-technical stakeholders. (See the first code sketch after this list.)
- Avoiding Data Disasters
  Things can go disastrously wrong in data science and machine learning projects when we undervalue data work, use data in contexts it wasn’t gathered for, or ignore the crucial role that humans play in the data science pipeline.
- How and why we built a custom gradient boosted-tree package
  In order to make accurate and fast travel-time predictions, Lyft built a gradient boosted tree (GBT) package from the ground up. It is slower to train than off-the-shelf packages, but can be customized to treat space and time more efficiently and to yield less volatile predictions.
- Machine learning is not nonparametric statistics
  Not only can I never get a consistent definition of what “nonparametric” means, but the jump from statistics to machine learning is considerably larger than most expect.
- I’ve Stopped Using Box Plots. Should You?
  “After having explained how to read box plots to thousands of workshop participants, I now believe that they’re poorly conceived.”
- Coloring your ggplot2 art like a pro
  ggplot2 makes strong assumptions about how many different color scales there can be in a plot (exactly one) and about how color palettes should be designed.
- The Breathing K-Means Algorithm
  Breathing K-Means is an approximation algorithm for the k-means problem that is, on average, better (higher solution quality) and faster (lower CPU time) than k-means++. (See the second code sketch after this list for the baseline it is compared against.)
- Improving a Machine Learning System
  Making measurable improvements to a mature machine learning system is extremely difficult. In this post, we will explore why.
- Hierarchical Transformers Are More Efficient Language Models (paper)
  “These large language models are impressive but also very inefficient and costly, which limits their applications and accessibility. We postulate that having an explicit hierarchical architecture is the key to Transformers that efficiently handle long sequences.”
- Large Language Models: A New Moore’s Law?
  Yet, should we be excited about this mega-model trend? I, for one, am not. Here’s why.
- Just Ask for Generalization
  Generalizing to what you want may be easier than optimizing directly for what you want. We might even ask for “consciousness”.
- Ask Delphi
  Delphi is a research prototype designed to investigate the promises and, more importantly, the limitations of modeling people’s moral judgments on a variety of everyday situations.
- Where are China’s Recent AI Ethics Guidelines Coming From?
  Examining Three Guidelines from the Ministry of Science and Technology’s Recent Document in Context.
- The theory-practice gap
  I want to claim that most of the action in theoretical AI alignment is people proposing various ways of getting around these problems by having your systems do things that are human-understandable instead of things that are justified by working well.
- AI can see through you: CEOs’ language under machine microscope
  CEOs and other managers are increasingly under the microscope as some investors use artificial intelligence to learn and analyse their language patterns and tone, opening up a new frontier of opportunities to slip up.
- A Gentle Introduction to Graph Neural Networks
  Neural networks have been adapted to leverage the structure and properties of graphs. We explore the components needed for building a graph neural network and motivate the design choices behind them. (See the third code sketch after this list.)
- Bayesian histograms for rare event classification
  Bayesian histograms are a stupidly fast, simple, and nonparametric way to find how rare event probabilities depend on a variable (with uncertainties!). (See the fourth code sketch after this list.)
- AI Recognises Race in Medical Images
  Previous studies have shown that AI can predict your sex and age from an eye scan, or your race from a chest X-ray.
- DeepMind AI predicts incoming rainfall with high accuracy
  Having flexed its muscles predicting kidney injury, toppling Go champions, and solving 50-year-old science problems, artificial intelligence company DeepMind is now dipping its toes into weather forecasting.
- ClickHouse vs TimescaleDB
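A few hedged code sketches for the more hands-on items above. First, for “A Non-Technical Guide to Interpreting SHAP Analyses”: a minimal example of producing the raw SHAP outputs that the guide then helps you explain. The synthetic data, the random forest regressor, and the feature count are illustrative assumptions, not taken from the linked post; it assumes the `shap` and `scikit-learn` packages.

```python
# Fit a small tree model on synthetic data and compute per-feature SHAP
# attributions (the raw output that the linked guide helps you communicate).
# Data, model choice, and feature count are illustrative, not from the post.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                      # 500 rows, 4 synthetic features
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)              # exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X)             # array of shape (n_samples, n_features)

# Mean absolute SHAP value per feature is a common global-importance summary.
print(np.abs(shap_values).mean(axis=0))
```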
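Second, for “The Breathing K-Means Algorithm”: the claims in that post are relative to k-means++ on solution quality (inertia) and CPU time. Below is a small scikit-learn baseline for that comparison on made-up blob data; the breathing k-means implementation itself is not reproduced here.

```python
# The k-means++ baseline (scikit-learn) that breathing k-means is benchmarked
# against: record solution quality (inertia) and wall-clock fit time.
# Data and parameters are made up for illustration.
import time

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=10_000, centers=50, n_features=2, random_state=0)

start = time.perf_counter()
km = KMeans(n_clusters=50, init="k-means++", n_init=10, random_state=0).fit(X)
elapsed = time.perf_counter() - start

print(f"k-means++  inertia: {km.inertia_:.1f}   fit time: {elapsed:.2f}s")
```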
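Third, for “A Gentle Introduction to Graph Neural Networks”: a minimal NumPy sketch of the component most introductions build up to, a single message-passing layer (here in GCN-style normalised form). The toy graph, feature sizes, and random weights are made up for illustration.

```python
# One GCN-style message-passing layer in plain NumPy: each node's new
# representation is a transformed, degree-normalised average over itself and
# its neighbours. The toy graph, sizes, and random weights are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected graph on 4 nodes with edges 0-1, 1-2, 2-3, 3-0.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

H = rng.normal(size=(4, 8))    # node features: 4 nodes x 8 dims
W = rng.normal(size=(8, 16))   # layer weights: 8 -> 16 dims (learned in practice)

# Add self-loops and symmetrically normalise: A_hat = D^(-1/2) (A + I) D^(-1/2).
A_tilde = A + np.eye(4)
d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Message passing + update: H' = ReLU(A_hat @ H @ W).
H_next = np.maximum(A_hat @ H @ W, 0.0)
print(H_next.shape)            # (4, 16): one new embedding per node
```

Stacking several such layers, with learned weights, is what lets information propagate beyond immediate neighbours.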
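Finally, for “Bayesian histograms for rare event classification”: a sketch of the underlying idea as described in the blurb: bin the variable, then treat each bin’s event rate as Beta-distributed, which gives a posterior mean and a credible interval per bin. The equal-width bins, the Beta(1, 1) prior, and the synthetic data are assumptions for illustration; the linked post’s implementation may differ (for example, in how bins are chosen).

```python
# Bayesian histogram sketch for rare-event probabilities: fixed bins over a
# single variable x, a Beta(1, 1) prior per bin, and a Beta posterior from the
# event/non-event counts in each bin. Bins, prior, and data are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50_000
x = rng.uniform(0, 1, size=n)
y = rng.random(n) < (0.001 + 0.02 * x ** 2)    # rare events, rate rising with x

edges = np.linspace(0, 1, 11)                  # 10 equal-width bins
bin_idx = np.digitize(x, edges[1:-1])          # bin index 0..9 per observation

for b in range(10):
    in_bin = bin_idx == b
    events, total = int(y[in_bin].sum()), int(in_bin.sum())
    post = stats.beta(1 + events, 1 + total - events)   # posterior for the bin's rate
    lo, hi = post.ppf([0.025, 0.975])                   # 95% credible interval
    print(f"x in [{edges[b]:.1f}, {edges[b + 1]:.1f}): "
          f"p = {post.mean():.4f}  (95% CI {lo:.4f} to {hi:.4f})")
```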