Web Picks (week of 12 August 2024)

Posted on August 23, 2024

Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

Meta introduces Segment Anything Model 2 (SAM 2)
SAM 2 is a segmentation model that enables fast, precise selection of any object in any video or image. The results are spectacular.
OpenAI is introducing Structured Outputs in the API
“We are introducing Structured Outputs in the API—model outputs now reliably adhere to developer-supplied JSON Schemas.”
FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention
“We provide a flexible API that allows implementing many attention variants in a few lines of idiomatic PyTorch code.”
Questionable practices in machine learning
Fantastic paper: “Evaluating modern ML models is hard. The strong incentive for researchers and companies to report a state-of-the-art result on some metric often leads to questionable research practices (QRPs): bad practices which fall short of outright research fraud. We describe 43 such practices which can undermine reported results, giving examples where possible.”
AI models collapse when trained on recursively generated data
“We demonstrate that it must be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, the value of data collected about genuine human interactions with systems will be increasingly valuable in the presence of LLM-generated content in data crawled from the Internet.”
How I Use “AI”
“Most of the people online I find who talk about LLM utility are either wildly optimistic, and claim all jobs will be automated within three years, or wildly pessimistic, and say they have contributed nothing and never will.”
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
“Can we use foundation models to automate the entire process of research itself?”
Brands should avoid [AI]. It’s turning off customers
“Even as tech giants pour billions of dollars into what they herald as humanity’s new frontier, a recent study shows that tacking the ‘AI’ label on products may actually drive people away.”
Artificial Intelligence Gives Weather Forecasters a New Edge
“The brainy machines are predicting global weather patterns with new speed and precision, doing in minutes and seconds what once took hours.”
Predicting social science experimental results using LLMs
This is wild. It turns out LLMs are good enough to predict the results of social science experiments.
TinyML: Why the Future of Machine Learning is Tiny and Bright
“Early TinyML solutions consisted of simple ML models like decision trees and SVMs. But since then, the field has progressed to deep-learning-based TinyML models, typically small, efficient convolutional neural networks.”
VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
This paper presents a novel method for building scalable 3D generative models utilizing pre-trained video diffusion models.
Introducing Stable Fast 3D: Rapid 3D Asset Generation From Single Images
“Stable Fast 3D generates high-quality 3D assets from a single image in just 0.5 seconds.”
ceLLama
ceLLama is a streamlined automation pipeline for cell type annotations using large language models (LLMs).
LLM-Aided OCR Project
“By leveraging cutting-edge natural language processing techniques and large language models (LLMs), this project transforms raw OCR text into highly accurate, well-formatted, and readable documents.”
oTranscribe
A free web app to take the pain out of transcribing recorded interviews.