Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- How I program with LLMs
“This document is a summary of my personal experiences using generative models while programming over the past year.”
- Things we learned about LLMs in 2024
“A lot has happened in the world of Large Language Models over the course of 2024. Here’s a review of things we figured out about the field in the past twelve months, plus my attempt at identifying key themes and pivotal moments.”
- Best Data Visualization Projects of 2024
“These are my favorite data visualization projects from 2024.”
- Converting handwritten, maths-heavy lecture notes to markdown using large language models
“Recent Large Language Models (LLMs) have stunning vision capabilities and so it occurred to me that they might be able to convert even old notes into beautifully formatted markdown and equations.”
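As a rough illustration of the kind of pipeline the post describes (a minimal sketch, assuming an OpenAI-style vision endpoint; the original write-up may use a different model, prompt, and post-processing), each scanned page can be sent to a vision-capable LLM with a transcription prompt:

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def notes_page_to_markdown(image_path: str) -> str:
    """Send one scanned page of handwritten notes to a vision-capable LLM
    and ask for Markdown with LaTeX-style equations back."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any vision-capable model would do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Transcribe these handwritten lecture notes into "
                         "Markdown. Use $...$ and $$...$$ for equations."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


# markdown = notes_page_to_markdown("lecture_page_01.png")  # hypothetical file name
```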
- AIs Will Increasingly Fake Alignment
“In a series of experiments with Redwood Research, we found that Claude often pretends to have different views during training, while actually maintaining its original preferences.”
- Alignment faking in large language models
A new paper from Anthropic’s Alignment Science team, in collaboration with Redwood Research, provides the first empirical example of a large language model engaging in alignment faking without having been explicitly—or even, as we argue in our paper, implicitly—trained or instructed to do so.
- OpenAI announces new o3 models
The company unveiled o3, the successor to the o1 “reasoning” model it released earlier in the year.
- OpenAI o3 Breakthrough High Score on ARC-AGI-Pub
OpenAI’s new o3 system – trained on the ARC-AGI-1 Public Training set – has scored a breakthrough 75.7% on the Semi-Private Evaluation set at our stated public leaderboard $10k compute limit. A high-compute (172x) o3 configuration scored 87.5%.
- OpenAI announces plan to transform into a for-profit company
OpenAI’s new structure will put its for-profit arm in control.
- Alibaba slashes prices on large language models by up to 85% as China AI rivalry heats up
Alibaba Cloud, the Chinese e-commerce firm’s cloud computing division, said Tuesday it’s cutting prices on its visual language model Qwen-VL by up to 85%.
- DeepSeek-AI Just Released DeepSeek-V3: A Strong Mixture-of-Experts (MoE) Language Model
DeepSeek-AI just gave a Christmas present to the AI world by releasing DeepSeek-V3, a Mixture-of-Experts (MoE) language model featuring 671 billion parameters, with 37 billion activated per token.
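The “37 billion activated per token” figure comes from the MoE design: a small router sends each token to only a few experts, so most of the 671 billion parameters are never touched for any given token. The toy top-k routing layer below is a generic PyTorch sketch of that idea, not DeepSeek-V3’s actual implementation (which adds shared experts, fine-grained experts, and load balancing, among other things):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoELayer(nn.Module):
    """Toy mixture-of-experts layer: each token is routed to top_k experts,
    so only a fraction of the layer's parameters are used per token."""
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e       # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


# moe = TinyMoELayer()
# y = moe(torch.randn(10, 64))  # each of the 10 tokens only touches 2 of the 8 experts
```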
- You wouldn’t download an AI
Extracting AI models from mobile apps
- deepface
DeepFace is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for Python. It is a hybrid face recognition framework wrapping state-of-the-art models: VGG-Face, FaceNet, OpenFace, DeepFace, DeepID, ArcFace, Dlib, SFace and GhostFaceNet.
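Basic usage is just a couple of calls; a minimal sketch following the project’s README (the image file names here are placeholders):

```python
from deepface import DeepFace

# Verify whether two photos show the same person (uses VGG-Face by default).
result = DeepFace.verify(img1_path="img1.jpg", img2_path="img2.jpg")
print(result["verified"])

# Estimate age, gender, emotion and race for a single photo.
analysis = DeepFace.analyze(img_path="img1.jpg",
                            actions=["age", "gender", "emotion", "race"])
print(analysis)
```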
- Zasper
Zasper is an IDE designed from the ground up to support massive concurrency. It provides a minimal memory footprint, exceptional speed, and the ability to handle numerous concurrent connections.
- Qwen Team Releases QvQ: An Open-Weight Model for Multimodal Reasoning
QvQ is an open-weight model specifically designed for multimodal reasoning. Building on the foundation of Qwen2-VL-72B, QvQ integrates architectural improvements that enhance cross-modal reasoning. Its open-weight design underscores the team’s commitment to making advanced AI more accessible.
- Trying out QvQ—Qwen’s new visual reasoning model
“I’ve tried it out with a bunch of things, with mixed results—but it’s really fun seeing how it works through a problem.”
- Deliberation in Latent Space via Differentiable Cache Augmentation (paper)
“In this work, we demonstrate that a frozen LLM can be augmented with an offline coprocessor that operates on the model’s key-value (kv) cache. This coprocessor augments the cache with a set of latent embeddings designed to improve the fidelity of subsequent decoding.”
- Automating the Search for Artificial Life with Foundation Models (paper)
The Automated Search for Artificial Life (ASAL) approach (1) finds simulations that produce target phenomena, (2) discovers simulations that generate temporally open-ended novelty, and (3) illuminates an entire space of interestingly diverse simulations.