Web Picks (week of 2 December 2024)

Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

  • llama.cpp guide – Running LLMs locally, on any hardware, from scratch
    A fantastic guide for getting started with running LLMs locally!
  • Multimodal interpretability in 2024
    “This post emphasizes mechanistic and causal interpretability, in contrast to ‘traditional’ interpretability methods such as saliency maps, visualizations, and input-based techniques.”
  • 2:4 Sparse Llama: Smaller Models for Efficient GPU Inference
    “Sparse models, though underexplored in the LLM space due to the high compute demands of pretraining, offer an increasingly promising dimension in model compression and efficiency.” (A toy sketch of the 2:4 sparsity pattern follows after this list.)
  • Alibaba releases an ‘open’ challenger to OpenAI’s o1 reasoning model
    A new so-called “reasoning” AI model, QwQ-32B-Preview, has arrived on the scene. It’s one of the few to rival OpenAI’s o1, and it’s the first available to download under a permissive license.
  • An Intuitive Explanation of Sparse Autoencoders for LLM Interpretability
    Machine learning models and LLMs are becoming more powerful and useful, but they are still black boxes: we don’t understand how they do the things they are capable of. This post gives an intuitive explanation of sparse autoencoders, one of the main tools for opening that box. (A minimal sketch follows after this list.)
  • World Labs Image to 3d world
    “Today we’re sharing our first step towards spatial intelligence: an AI system that generates 3D worlds from a single image. This lets you step into any image and explore it in 3D.” Tip: bypass the “Out of bounds” message by setting a Javascript breakpoint after `let t = JSON.parse(d[e].config_str)` and then running `Object.values(t.camera.presets).forEach(o => o.max_distance = 50)` in the console.
  • What Is the Best Topology of Them All?
    “HammingMesh: A Network Topology for Large-Scale Deep Learning” proposes a novel network topology that provides high bandwidth at low cost for deep learning training jobs.
  • Unlocking the power of time-series data with multimodal models
    “We compare the performance of multimodal models on the understanding of time-series data when presented visually as plots compared to numerical values. We find significant performance improvements when presented with plots on tasks like fall detection.” (A sketch of the plot-rendering step follows after this list.)
  • Google Labs GenChess
    Though European users will need to use a VPN :(
  • When Machine Learning Tells the Wrong Story
    “The paper also has two competing stories: one about how machine learning models can be used to attack web browsers, and another about how these same models are often misunderstood, leading them to be applied incorrectly. But there’s also a third story embedded in this paper, about how this paper completely altered the trajectory of my life.”
  • URAvatar: Universal Relightable Gaussian Codec Avatars
    URAvatar is a high-fidelity universal prior for relightable avatars: you can create your own avatar from a phone scan.
  • Pushing the frontiers of audio generation
    SoundStream is a neural audio codec that efficiently compresses and decompresses an audio input, without compromising its quality. AudioLM treats audio generation as a language modeling task to produce the acoustic tokens of codecs like SoundStream.
  • Nous DisTrO
    “Training large scale neural networks typically involves sharing gradients between all accelerators, which necessitates specialized, high-speed interconnects. To address this, we introduce DisTrO, a family of architecture-agnostic and network-agnostic distributed optimizers that reduces the inter-GPU communication requirements by four to five orders of magnitude without relying on amortized analysis.”
  • Flow: A lightweight task engine for building AI agents that prioritizes simplicity and flexibility.
    “Flow is lightweight, bloat-free, and has no external dependencies for the engine. It is designed to be simple, flexible and very powerful, and is maintained by the Laminar team.”
  • SmolLM2
    SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device. (A quick-start sketch follows after this list.)
  • NeuralDEM – Real-time Simulation of Industrial Particulate Flows
    NeuralDEM presents an end-to-end approach to replace Discrete Element Method (DEM) routines and coupled multiphysics simulations with deep learning surrogates.
  • AutoFlow
    An open-source GraphRAG (knowledge graph) built on top of TiDB Vector, LlamaIndex, and DSPy.
  • Foursquare’s new POI Dataset
    Foursquare just released an open dataset of over 100M global points of interest.
  • Large Language Models as Markov Chains (paper)
    “We draw an equivalence between generic autoregressive language models with vocabulary of size T and context window of size K and Markov chains defined on a finite state space of size O(T^K).” (A toy illustration follows after this list.)
  • Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models (paper)
    “We study what kind of generalisation strategies LLMs employ when performing reasoning tasks by investigating the pretraining data they rely on.”
  • Accelerated AI Inference via Dynamic Execution Methods (paper)
    “In the case of LLMs, we provide more efficient sampling methods that depend on the complexity of the data. In the case of diffusion model generation, we provide a new method that also leverages the difficulty of the input prompt to predict an optimal early stopping point.”
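
Below, a few minimal code sketches to make some of the items above concrete.

The “2:4” in Sparse Llama means that in every contiguous group of four weights, two are zeroed out, a structured pattern that recent GPUs can accelerate. A toy numpy sketch of a magnitude-based 2:4 mask (an illustration of the pattern only, not Neural Magic’s actual pruning pipeline):

```python
import numpy as np

def two_four_sparsify(w):
    """2:4 structured sparsity: zero out the 2 smallest-magnitude
    weights in every contiguous group of 4."""
    groups = w.reshape(-1, 4)                          # groups of 4 weights
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]   # 2 smallest per group
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (groups * mask).reshape(w.shape)

w = np.random.randn(2, 8)
print(two_four_sparsify(w))  # exactly two non-zeros per group of four
```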
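
The sparse-autoencoder recipe from the interpretability explainer is remarkably small: an overcomplete autoencoder trained to reconstruct an LLM’s internal activations under an L1 penalty that keeps most features silent. A minimal PyTorch sketch (the dimensions and penalty coefficient are illustrative assumptions, not the post’s exact setup):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder with an L1 sparsity penalty on the
    hidden code, used to decompose LLM activations into features."""
    def __init__(self, d_model=768, d_hidden=768 * 8):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))  # sparse feature activations
        return self.dec(f), f

sae = SparseAutoencoder()
x = torch.randn(32, 768)   # stand-in for residual-stream activations
x_hat, f = sae(x)
l1_coeff = 1e-3            # hypothetical sparsity coefficient
loss = ((x_hat - x) ** 2).mean() + l1_coeff * f.abs().sum(-1).mean()
loss.backward()
```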
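
The time-series finding hinges on one preprocessing step: render the numbers as a plot image and hand the image, rather than the raw values, to the multimodal model. A matplotlib sketch of that step (the signal and axis labels are made up for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

# Render a time series as an image, the representation the study found
# multimodal models understand better than raw numerical values.
t = np.linspace(0, 10, 500)
signal = np.sin(t) + 0.1 * np.random.randn(t.size)  # stand-in sensor trace
plt.figure(figsize=(6, 2))
plt.plot(t, signal)
plt.xlabel("time (s)")
plt.ylabel("sensor magnitude")
plt.savefig("series.png", dpi=150, bbox_inches="tight")
# series.png is what gets sent to the multimodal model, not the numbers.
```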
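
Trying SmolLM2 takes a few lines with Hugging Face transformers. A quick-start sketch, assuming the model ids as published on the Hub (e.g. HuggingFaceTB/SmolLM2-360M-Instruct):

```python
# pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-360M-Instruct"  # 135M and 1.7B variants also exist
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Summarize what a neural audio codec does."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
out = model.generate(inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```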
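
Finally, the Markov-chain equivalence is easy to make concrete: a model with vocabulary of size T and context window of size K can only condition on one of T^K possible contexts, so its next-token distribution is a transition kernel over that finite state space. A toy character-level illustration of the counting argument (not the paper’s formal construction):

```python
from collections import Counter, defaultdict
import random

# Vocabulary {'a','b','c'} of size T = 3, context window K = 2:
# the model is a Markov chain over the T**K = 9 possible contexts.
K = 2
text = "abcabcaabbccabca"

counts = defaultdict(Counter)
for i in range(len(text) - K):
    state, nxt = text[i:i + K], text[i + K]
    counts[state][nxt] += 1  # empirical transition counts

state = "ab"
for _ in range(10):  # sample a trajectory from the chain
    chars, weights = zip(*counts[state].items())
    nxt = random.choices(chars, weights)[0]
    print(state, "->", nxt)
    state = state[1:] + nxt  # slide the context window
```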