Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku
“Developers can direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text”
- Large Language Models Reflect the Ideology of their Creators
“The results suggest that LLMs do indeed reflect the ideological biases of their creators, raising important questions about the societal impact of these models.”
- Cerebras Inference now 3x faster: Llama3.1-70B breaks 2,100 tokens/s
“We’ve re-written or optimized the most critical kernels such as MatMul, reduce/broadcast, element wise ops, and activations”
- Introducing quantized Llama models with increased speed and a reduced memory footprint
“Today, we’re releasing our first lightweight quantized Llama models that are small and performant enough to run on many popular mobile devices”
- Microsoft BitNet
bitnet.cpp is the official inference framework for 1-bit LLMs (e.g., BitNet b1.58). It offers a suite of optimized kernels that support fast and lossless inference of 1.58-bit models on CPU (with NPU and GPU support coming next). A rough sketch of what such ternary, roughly 1.58-bit weights look like follows after this list.
- A return to hand-written notes by learning to read and write
“We present a model to convert photos of handwriting into a digital format that reproduces component pen strokes, without the need for specialized equipment.”
- Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Exploring the Frontiers of Efficient Generative Foundation Models
- Amortized Planning with Large-Scale Transformers: A Case Study on Chess
“This paper uses chess, a landmark planning problem in AI, to assess transformers’ performance on a planning task where memorization is futile — even at a large scale”
- State-space models can learn in-context by gradient descent
“Deep state-space models (Deep SSMs) have shown capabilities for in-context learning on autoregressive tasks, similar to transformers. However, the architectural requirements and mechanisms enabling this in recurrent networks remain unclear. This study demonstrates that state-space model architectures can perform gradient-based learning and use it for in-context learning”
- Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers
“We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles — because the prevailing definition of bias does not capture differences in the expressivity of the hypothesis classes formed by trees and forests”
- JAX – Why is Everyone So Excited About This Framework
“JAX was developed by Deepmind to meet a simple goal which is to balance rapid prototyping, quick iteration with the ability to deploy experiments at scale” (a minimal JAX example follows below)
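To make the BitNet and quantized-Llama items above a little more concrete, here is a rough NumPy sketch of ternary (roughly 1.58-bit, since log2(3) ≈ 1.58) weight quantization using the absmean rule described in the BitNet b1.58 paper. This is an illustration of the general idea only, not bitnet.cpp's kernels or Meta's actual quantization scheme, and the function and variable names are our own.

```python
# Rough illustration (not bitnet.cpp or Meta's actual scheme) of why "1.58-bit"
# weights are ternary: each weight is snapped to {-1, 0, +1} times a per-tensor
# scale, following the absmean rule sketched in the BitNet b1.58 paper.
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Return ternary weights in {-1, 0, +1} and the scale that dequantizes them."""
    scale = np.mean(np.abs(w)) + eps           # absmean scale
    w_q = np.clip(np.round(w / scale), -1, 1)  # snap each weight to {-1, 0, +1}
    return w_q.astype(np.int8), scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
w_q, scale = ternary_quantize(w)
w_hat = w_q * scale                            # dequantized approximation of w
print(np.unique(w_q))                          # [-1  0  1]
print(np.abs(w - w_hat).mean())                # mean reconstruction error
```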
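And for the JAX item, a minimal sketch (our own, not taken from the article) of the workflow the post describes: write an ordinary NumPy-style Python function, then get its gradient with `jax.grad` and compile it with `jax.jit`. The toy loss and parameter names are assumptions for illustration.

```python
# Minimal JAX sketch: differentiate and jit-compile a tiny least-squares loss.
import jax
import jax.numpy as jnp

def loss(params, x, y):
    # Simple linear model: predictions = x @ w + b
    w, b = params
    preds = x @ w + b
    return jnp.mean((preds - y) ** 2)

# grad builds the gradient function w.r.t. params; jit compiles it with XLA.
grad_loss = jax.jit(jax.grad(loss))

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 3))
y = jnp.ones((32,))
params = (jnp.zeros((3,)), 0.0)

grads = grad_loss(params, x, y)  # gradients with the same structure as params
print(jax.tree_util.tree_map(lambda g: g.shape, grads))
```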