Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Notable AI Models
“Our most comprehensive database, containing over 800 models that were state of the art, highly cited, or otherwise historically notable. It tracks key factors driving machine learning progress and includes over 400 training compute estimates.”
- Movie Gen sets a new standard for immersive AI content
Meta’s latest research breakthroughs demonstrate how you can use simple text inputs to produce custom videos and sounds, edit existing videos, or transform your personal image into a unique video. Crazy how fast this is going.
- Diffusion for World Modeling: Visual Details Matter in Atari
DIAMOND is a reinforcement learning agent trained entirely in a diffusion world model.
- This Homemade Drone Software Finds People When Search and Rescue Teams Can’t
British Mountain Rescue workers have developed an automated drone system that can scour a landscape far quicker and more thoroughly than human eyes.
- Qdrant Plays Mario Kart 64
An image search application using vector databases.
- Google’s AI thinks I left a Gatorade bottle on the moon
Google’s NotebookLM is really good. But it’s also really easy to trick.
- Do Large Language Models make accurate personalized recommendations?
Adding Large Language Models to Kumo’s Graph Transformer for Improved Recommendations
- GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models (paper)
Researchers from Apple come out with a banger: “We hypothesize that this decline is because current LLMs cannot perform genuine logical reasoning; they replicate reasoning steps from their training data. Adding a single clause that seems relevant to the question causes significant performance drops (up to 65%) across all state-of-the-art models”.
- FLUX is fast and it’s open source
“We used Alex Redden’s flux-fp8-api as a starting point, then optimized it with torch.compile and used fast CuDNN attention kernels in the nightly Torch builds.”
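The torch.compile part of that recipe is easy to try on your own models. A minimal sketch with a made-up toy module (TinyDenoiser is a hypothetical stand-in for illustration, not anything from the FLUX codebase):

```python
import torch
import torch.nn.functional as F

class TinyDenoiser(torch.nn.Module):
    """Hypothetical stand-in for a diffusion transformer block."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.qkv = torch.nn.Linear(dim, 3 * dim)
        self.out = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # scaled_dot_product_attention dispatches to a fused attention kernel
        # (flash, memory-efficient, or cuDNN, depending on build and hardware).
        return self.out(F.scaled_dot_product_attention(q, k, v))

model = TinyDenoiser()
# torch.compile traces the module and fuses ops into faster kernels;
# mode="max-autotune" trades longer warm-up for a wider kernel search.
compiled = torch.compile(model)
out = compiled(torch.randn(2, 16, 64))  # (batch, tokens, dim)
```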
- Training Diffusion Transformers Is Easier Than You Think
The performance of generative diffusion models can be improved dramatically when they are supported by an external high-quality representation from another model, such as a self-supervised visual encoder.
- LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA (paper)
RAG systems are a standard method for retrieving information from documents. But sometimes it may be more convenient and effective to just use long-context LLMs, as this paper argues.
- supervision – democratising computer vision
“Whether you need to load your dataset from your hard drive, draw detections on an image or video, or count how many detections are in a zone. You can count on us!”
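The detect-annotate-count loop is only a few lines. A minimal sketch, assuming an Ultralytics YOLO model as the detector; the weights file and image path are placeholders, and annotator details vary between supervision versions:

```python
import cv2
import supervision as sv
from ultralytics import YOLO

model = YOLO("yolov8n.pt")        # placeholder weights; any supported detector works
image = cv2.imread("frame.png")   # placeholder image path

result = model(image)[0]
detections = sv.Detections.from_ultralytics(result)

# Draw bounding boxes on a copy of the frame and count what was found.
annotated = sv.BoxAnnotator().annotate(scene=image.copy(), detections=detections)
print(f"{len(detections)} objects detected")
cv2.imwrite("annotated.png", annotated)
```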
- Running Llama locally with minimal dependencies
“I was a bit surprised Meta didn’t publish an example way to simply invoke one of these LLM’s with only torch (or some minimal set of dependencies)”
- Aria
Aria is a multimodal native MoE model.
- MACHINA
A CCTV viewer with a realtime object tagger.
- Grokking at the Edge of Linear Separability (paper)
“We study the generalization properties of binary logistic classification in a simplified setting, for which a “memorizing” and “generalizing” solution can always be strictly defined, and elucidate empirically and analytically the mechanism underlying Grokking in its dynamics” – very interesting!
- Differential Transformer (paper)
“Experimental results on language modeling show that Diff Transformer outperforms Transformer in various settings of scaling up model size and training tokens. More intriguingly, it offers notable advantages in practical applications, such as long-context modeling, key information retrieval, hallucination mitigation, in-context learning, and reduction of activation outliers”
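The core trick is computing two softmax attention maps and subtracting one from the other, which cancels common-mode attention noise. A rough single-head sketch of that mechanism (shapes are simplified and lambda is a fixed constant here, whereas the paper learns and reparameterizes it):

```python
import torch

def diff_attention(x, wq, wk, wv, lam: float = 0.5):
    """x: (batch, seq, dim). wq/wk project to 2*d_head, giving two Q/K pairs."""
    d_head = wv.out_features
    q1, q2 = wq(x).chunk(2, dim=-1)
    k1, k2 = wk(x).chunk(2, dim=-1)
    v = wv(x)
    a1 = torch.softmax(q1 @ k1.transpose(-2, -1) / d_head**0.5, dim=-1)
    a2 = torch.softmax(q2 @ k2.transpose(-2, -1) / d_head**0.5, dim=-1)
    # Differential attention: the difference of two attention maps weights V.
    return (a1 - lam * a2) @ v

dim, d_head = 64, 32
x = torch.randn(2, 10, dim)
wq, wk = torch.nn.Linear(dim, 2 * d_head), torch.nn.Linear(dim, 2 * d_head)
wv = torch.nn.Linear(dim, d_head)
out = diff_attention(x, wq, wk, wv)  # (2, 10, 32)
```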
- Were RNNs All We Needed? (paper)
“By removing their hidden state dependencies from their input, forget, and update gates, LSTMs and GRUs no longer need to BPTT. Building on this, we introduce minimal versions (minLSTMs and minGRUs) that (1) use significantly fewer parameters than their traditional counterparts and (2) are fully parallelizable during training (175x faster for a sequence of length 512). Lastly, we show that these stripped-down versions of decade-old RNNs match the empirical performance of recent sequence models”
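To see how small these minimal cells are, here is a rough sequential minGRU sketch following the equations in the paper (the parallel scan that makes training fast is omitted):

```python
import torch

class MinGRU(torch.nn.Module):
    """Sequential minGRU sketch: gate and candidate depend only on x_t, not on h_{t-1}."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.to_z = torch.nn.Linear(dim, hidden)   # update gate
        self.to_h = torch.nn.Linear(dim, hidden)   # candidate state

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        z = torch.sigmoid(self.to_z(x))
        h_tilde = self.to_h(x)
        h = torch.zeros(x.size(0), h_tilde.size(-1), device=x.device)
        outs = []
        for t in range(x.size(1)):
            # h_t = (1 - z_t) * h_{t-1} + z_t * h~_t
            h = (1 - z[:, t]) * h + z[:, t] * h_tilde[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1)            # (batch, seq, hidden)

y = MinGRU(dim=16, hidden=32)(torch.randn(4, 12, 16))
```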
- Alternatives to cosine similarity
Cosine similarity is the recommended way to compare vectors, but what other distance functions are there? And are any of them better?
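For reference, here is a quick numpy sketch of a few common alternatives (nothing here is specific to the linked post):

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean(a, b):
    return np.linalg.norm(a - b)

def manhattan(a, b):
    return np.abs(a - b).sum()

def dot(a, b):
    # Unnormalized inner product; matches cosine similarity for unit-length vectors.
    return a @ b

a, b = np.array([1.0, 2.0, 3.0]), np.array([2.0, 2.0, 1.0])
print(cosine(a, b), euclidean(a, b), manhattan(a, b), dot(a, b))
```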
- How the GapEncoder works (video)
The GapEncoder is an estimator from the skrub library that can do feature generation and topic modelling at the same time. Being able to do both is great for utility, but it also comes with some benefits for accuracy.
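A minimal usage sketch, assuming skrub’s GapEncoder follows the usual scikit-learn fit/transform API (the toy column of messy job titles is made up, and older skrub/dirty_cat versions may expect a 2-D column such as df[["job_title"]] rather than a Series):

```python
import pandas as pd
from skrub import GapEncoder

# Messy, overlapping category strings of the kind GapEncoder is meant for.
jobs = pd.Series([
    "senior data scientist", "data science manager",
    "software engineer", "senior software engineer", "ml engineer",
], name="job_title")

encoder = GapEncoder(n_components=3)     # 3 latent "topics"
features = encoder.fit_transform(jobs)   # one activation column per topic
print(encoder.get_feature_names_out())   # topics labeled by their frequent words
print(features)
```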
- Hierarchical Navigable Small World: a scalable nearest neighbor search
What is a “Small World” (SW) graph? It’s a graph structure that achieves a unique balance between regularity and randomness.
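The search primitive underneath NSW/HNSW is greedy routing on such a graph: start at an entry point and keep hopping to whichever neighbor is closer to the query until no neighbor improves. A toy single-layer sketch (the adjacency list is hand-made for illustration, not built with HNSW’s insertion heuristic):

```python
import numpy as np

def greedy_search(vectors, neighbors, query, entry=0):
    """Greedy routing on a proximity graph: move to a closer neighbor until stuck."""
    current = entry
    current_dist = np.linalg.norm(vectors[current] - query)
    while True:
        best, best_dist = current, current_dist
        for n in neighbors[current]:
            d = np.linalg.norm(vectors[n] - query)
            if d < best_dist:
                best, best_dist = n, d
        if best == current:              # local minimum: no neighbor is closer
            return current, current_dist
        current, current_dist = best, best_dist

rng = np.random.default_rng(0)
vectors = rng.normal(size=(8, 2))
# Hand-made adjacency list standing in for a real small-world graph.
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5],
             4: [2, 6], 5: [3, 7], 6: [4, 7], 7: [5, 6]}
print(greedy_search(vectors, neighbors, query=rng.normal(size=2)))
```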
- Addition is All You Need for Energy-efficient Language Models (paper)
“Our numerical analysis experiments agree with the theoretical error estimation, which indicates that L-Mul with 4-bit mantissa achieves comparable precision as float8_e4m3 multiplications, and L-Mul with 3-bit mantissa outperforms float8_e5m2”
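The gist of L-Mul is that the mantissa product in a floating-point multiply, (1 + m_x)(1 + m_y), can be approximated by adding the mantissas instead of multiplying them. A rough full-precision sketch of that idea only; the paper’s bit-level correction offset and low-bit mantissas are omitted, so this is not the actual L-Mul kernel:

```python
import math

def mantissa_add_mul(x: float, y: float) -> float:
    """Approximate x*y by adding mantissas and exponents instead of multiplying."""
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)
    fx, ex = math.frexp(abs(x))      # abs(x) = fx * 2**ex, with fx in [0.5, 1)
    fy, ey = math.frexp(abs(y))
    mx, my = 2 * fx - 1, 2 * fy - 1  # rewrite as (1 + m) * 2**(e - 1), m in [0, 1)
    # (1 + mx) * (1 + my) ≈ 1 + mx + my; the dropped mx*my term is what the
    # paper replaces with a cheap constant offset.
    return sign * (1 + mx + my) * 2.0 ** ((ex - 1) + (ey - 1))

print(mantissa_add_mul(3.7, -2.25), 3.7 * -2.25)  # close, not exact
```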
- The Second edition of Geocomputation with R is complete
And it’s completely free to read.
- The PyData 2024 Challenge – Winner did some clever hacking (video)
This video highlights and explains some of the winning techniques.