Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Bart’s new video on Benford’s law for Fraud Analytics (video)
  Benford’s law predicts the distribution of leading digits in many naturally occurring datasets; deviations can hint at manipulated numbers. (A minimal sketch follows this list.)
- Data Science at Spotify Camp Nou
  How data powers ticketing at FC Barcelona’s iconic stadium.
- Using graph neural networks to recommend related products
  Dual embeddings of each node, as both source and target, and a novel loss function enable 30% to 160% improvements over predecessors. (A toy dual-embedding sketch follows this list.)
- How Transformers Seem to Mimic Parts of the Brain
  Neural networks originally designed for language processing turn out to be great models of how our brains understand places.
- How undesired goals can arise with correct rewards
  Exploring examples of goal misgeneralisation, where an AI system’s capabilities generalise but its goal doesn’t.
- Why the Future of Open Source AI is So Much Bigger Than Stable Diffusion 1.5 and Why It Matters to You
  “There is a reason we’ve taken a step back at Stability AI and chose not to release version 1.5 as quickly as we released earlier checkpoints. We also won’t stand by quietly when other groups leak the model in order to draw some quick press to themselves while trying to wash their hands of responsibility.”
- AI will replace middle management before robots replace hourly workers
  AI managers are the missing link between the when and how of implementing retail-level automation.
- AI Data Laundering: How Academic and Nonprofit Researchers Shield Tech Companies from Accountability
  In addition to a massive chunk of Shutterstock’s video collection, Meta is also using millions of YouTube videos collected by Microsoft to make its text-to-video AI.
- NVIDIA’s Implicit Warping Is a Potentially Powerful Deepfake Technique
  Implicit Warping has extraordinary potential to create hyper-realistic deepfake motion, to an extent that none of its predecessors could achieve.
- This Danish Political Party Is Led by an AI
  The Synthetic Party in Denmark is dedicated to following a platform churned out by an AI, and its public face is a chatbot named Leader Lars.
- TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second
  … as long as your data set is small. This release was all the rage, but various people have since claimed it is not as good as it seems. (A usage sketch follows this list.)
- TabLLM: Few-shot Classification of Tabular Data with Large Language Models
  There’s a lot of attention being given to making deep learning work on tabular data.
- You’re probably monitoring your models wrong
  “Model monitoring is a necessary (but not sufficient) part of operating production models. In this post, we’ll explore what actually works for model monitoring.”
- A Look at Tesla’s Occupancy Networks
  “In 2022, Tesla announced a brand new algorithm about to be released in their vehicles. This algorithm is named occupancy networks, and it is supposed to improve the HydraNets from Tesla. But how does it work?”
- An Introduction to Poisson Flow Generative Models
  “Poisson Flow Generative Models (PFGMs) are a new type of generative Deep Learning model, taking inspiration from physics much like Diffusion Models. Learn the theory behind PFGMs and how to generate images with them in this easy-to-follow guide.”
- CNNs—not transformers—now dominate the hardest sequence modeling benchmark
  “In short, there’s a simple CNN that crushes every transformer on a variety of sequence modeling tasks. And the only other methods to come close on this benchmark—state space models—are also CNNs under the hood.” Also see the paper.
- MLOps for Foundation Models: Whisper and Metaflow
  “Whisper is a new open-source system for audio-to-text and text-to-text learning tasks by OpenAI. We show how to use large models like Whisper in a production-grade workflow with Metaflow.”
- LangChain
  A Python library for building applications with language models through composability. (A small composition sketch follows this list.)
- Explainpaper
  An online tool that uses a language model to explain research papers.
- Large language models are different (presentation)
  An interesting presentation from Google Brain.
- Can language models learn from explanations in context? (paper)
  “We therefore investigate whether explanations of few-shot examples can help LMs. We find that explanations can improve performance — even without tuning.”
- DistilBERT Classifier as Feature Extractor
  In this feature-based approach, the embeddings from a pretrained transformer are used to train a random forest and a logistic regression model in scikit-learn. (A minimal sketch follows this list.)
- Charts.css
  A CSS data visualization framework.
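
For readers who want to try the Benford’s-law idea from Bart’s video on their own numbers, here is a minimal Python sketch. It is not taken from the video; the helper names and the toy amounts are illustrative, and it only prints observed versus expected frequencies rather than applying any formal test:

```python
import math
from collections import Counter

def first_digit(x):
    # First significant digit of a number (skips signs, zeros, decimal points).
    for ch in str(abs(x)):
        if ch in "123456789":
            return int(ch)
    return None  # x was zero

def benford_report(amounts):
    # Compare observed leading-digit frequencies with Benford's expectation
    # P(d) = log10(1 + 1/d); large gaps can flag data worth a closer look.
    digits = [d for d in map(first_digit, amounts) if d is not None]
    counts = Counter(digits)
    for d in range(1, 10):
        observed = counts[d] / len(digits)
        expected = math.log10(1 + 1 / d)
        print(f"digit {d}: observed {observed:.3f} vs expected {expected:.3f}")

benford_report([1234.56, 13.37, 190.00, 2048.0, 99.95, 31.40, 17.10, 1.23])
```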
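
The related-products item hinges on giving each node two embeddings. As a rough illustration of why that helps, here is a toy NumPy sketch; the paper’s actual GNN architecture, loss function, and data are not reproduced, and the random vectors merely stand in for learned embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
n_products, dim = 1000, 64

# Each product gets two vectors: one as a query "source", one as a
# recommendation "target". The asymmetry lets "phone -> case" score
# highly without implying "case -> phone" scores highly too.
src_emb = rng.normal(size=(n_products, dim))
tgt_emb = rng.normal(size=(n_products, dim))

def related_score(u, v):
    # Directed relatedness of product v as a recommendation for product u.
    return src_emb[u] @ tgt_emb[v]

def top_k_related(u, k=5):
    # Score u against every candidate target; in practice you would
    # exclude u itself and already-purchased items.
    scores = src_emb[u] @ tgt_emb.T
    return np.argsort(scores)[::-1][:k]

print(top_k_related(42))
```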
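
A usage sketch for the TabPFN item, assuming the `tabpfn` package’s scikit-learn-style interface (`TabPFNClassifier` with `fit`/`predict`). The original release only handles small problems (on the order of 1,000 rows, 100 features, 10 classes), which is exactly the caveat in the item above:

```python
# pip install tabpfn
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# TabPFN is a prior-fitted transformer: "fit" essentially stores the
# training set, and prediction is a single forward pass, hence "in a second".
clf = TabPFNClassifier(device="cpu")
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
```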
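
To make “composability” in the LangChain item concrete, the sketch below chains a prompt template and an LLM into one reusable unit. The library’s API has changed considerably across versions, so treat the class names (`PromptTemplate`, `LLMChain`, the `OpenAI` wrapper) as a snapshot of the early interface rather than current usage:

```python
# pip install langchain openai; requires OPENAI_API_KEY in the environment.
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

# A prompt template and a model are composed into a chain, which can
# itself be composed with other chains, tools, and retrievers.
prompt = PromptTemplate(
    input_variables=["paper"],
    template="Summarize the key idea of {paper} in one sentence.",
)
chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
print(chain.run(paper="TabPFN"))
```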
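
Finally, a minimal sketch of the feature-based DistilBERT approach: freeze the transformer, pool its hidden states into fixed-size vectors, and fit a scikit-learn model on top. Mean pooling is one common choice (the linked post may pool differently, e.g. via the [CLS] token), and the toy texts are illustrative:

```python
# pip install transformers torch scikit-learn
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")
model.eval()

def embed(texts):
    # Mean-pool the last hidden states into one fixed-size vector per text,
    # masking out padding tokens.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, tokens, 768)
    mask = batch["attention_mask"].unsqueeze(-1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

texts = ["great movie, loved it", "terrible plot, awful acting",
         "an instant classic", "a waste of two hours"]
labels = [1, 0, 1, 0]

# The transformer stays frozen; only the classical model is trained.
clf = LogisticRegression().fit(embed(texts), labels)
print(clf.predict(embed(["what a wonderful film"])))
```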