Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
First of all, all about DeepSeek…
What a wild week it has been… Not so long ago, OpenAI’s o1 model showed us that when LLMs are trained to use more compute during inference, they get significantly better at solving reasoning tasks like mathematics, coding, and logic. The details behind OpenAI’s reasoning models have been kept under wraps. That is, until last week, when DeepSeek released their DeepSeek-R1 model and promptly broke the internet (and the stock market). Besides performing as well as or better than o1, the DeepSeek-R1 release was accompanied by a detailed tech report that outlines the steps of their training recipe. Here’s what else you should know:
- DeepSeek-R1
  DeepSeek’s first generation of reasoning models, with performance comparable to OpenAI-o1. Also available in Ollama, so it runs locally (a minimal sketch of calling it is shown after this list).
- Open-R1: a fully open reproduction of DeepSeek-R1
  An initiative to systematically reconstruct DeepSeek-R1’s data and training pipeline.
- The Illustrated DeepSeek-R1
  In this post, we’ll see how it was built.
- What is the DeepSeek company?
  This tweet also provides some further info.
- Nvidia’s $589 Billion DeepSeek Rout Is Largest in Market History
- Run DeepSeek R1 Dynamic 1.58-bit
  DeepSeek-R1 has been making waves recently by rivaling OpenAI’s o1 reasoning model while being fully open-source. We explored how to enable more local users to run it and managed to quantize DeepSeek’s R1 671B-parameter model down to 131GB, an 80% reduction from the original 720GB, whilst keeping it very functional (the arithmetic behind that size is sketched after this list).
- DeepSeek-R1 surges to the top-3 in Arena
- DeepSeek is limiting sign-ups after a ‘large-scale’ cyberattack
  The “malicious attacks” come as the Chinese startup rattles tech stocks and threatens U.S. AI dominance.
- Want to understand how DeepSeek works? Check out these two posts: one, two
- And DeepSeek doesn’t stop: Janus Pro is a novel autoregressive framework that unifies multimodal understanding and generation. Try it out here.
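Since the R1 models are published on Ollama, the quickest way to poke at one locally is through Ollama's Python client. A minimal sketch, assuming the `ollama` package is installed, the Ollama server is running, and a `deepseek-r1` tag has already been pulled (the tag name and size are assumptions; use whichever distill you actually pulled):

```python
# Minimal sketch: chat with a locally pulled DeepSeek-R1 model via Ollama.
# Assumes `pip install ollama`, a running Ollama server, and that
# `ollama pull deepseek-r1` has completed (the tag name is an assumption).
import ollama

response = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Is 1001 a prime number? Explain briefly."}],
)

# R1-style models emit their reasoning before the final answer.
print(response["message"]["content"])
```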
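As a back-of-the-envelope check on the 1.58-bit item above: the reported 131GB file for a 671B-parameter model works out to roughly 1.6 bits per parameter, in line with the "1.58-bit" label (the quantization is mixed-precision, so the average is not exact). A small sketch of that arithmetic:

```python
# Back-of-the-envelope check of the numbers quoted in the 1.58-bit item:
# a 671B-parameter model quantized from 720GB down to 131GB.
params = 671e9            # parameters in DeepSeek-R1
original_gb = 720.0       # reported original size
quantized_gb = 131.0      # reported dynamic 1.58-bit size

reduction = 1 - quantized_gb / original_gb
bits_per_param = quantized_gb * 1e9 * 8 / params

print(f"size reduction: {reduction:.0%}")                    # ~82%
print(f"average bits per parameter: {bits_per_param:.2f}")   # ~1.56
```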
The rest of the news
- Kimi 1.5
  After DeepSeek R1, there is a new OpenAI o1-level model from China that outperforms Claude 3.5 Sonnet and GPT-4o.
- Alignment Faking in Large Language Models
  What happens when you tell Claude it is being trained to do something it doesn't want to do?
- Hunyuan3D-2
  "Living out everyone's imagination on creating and manipulating 3D assets."
- OpenAI has upped its lobbying efforts nearly sevenfold
  The firm's spending makes clear how much it wants to shape the new rules around government AI policy.
- Google is making AI in Gmail and Docs free — but raising the price of Workspace
  The B2B AI wars are heating up, and Google's trying to make sure everyone gets a taste of Gemini.
- Don't use cosine similarity carelessly
  This post shows you how to be more intentional about similarity and get better results (a small sketch after this list illustrates what cosine similarity actually computes).
- Accurate predictions on small data with a tabular foundation model
  TabPFN is a foundation model for tabular data that outperforms traditional methods while being dramatically faster. This repository contains the core PyTorch implementation with CUDA optimization (a usage sketch follows this list).
- Interactive LLM-Powered Data Processing with DocWrangler
  Getting LLMs to do what you want involves constantly cycling between writing prompts, inspecting results, and refining. DocWrangler is an open-source interactive development environment (IDE) that makes this process much easier.
- Microsoft relaunches Copilot for business with free AI chat and pay-as-you-go agents
  Microsoft 365 Copilot Chat is designed to tempt more businesses into relying on and paying for AI.
- Infinigen
  Infinigen is a procedural generator of 3D scenes, developed by the Princeton Vision & Learning Lab. Infinigen is optimized for computer vision research and generates diverse, high-quality 3D training data.
- ChatGPT can remind you to do stuff now
  OpenAI is rolling out a beta feature called Tasks to ChatGPT that lets users schedule future actions and reminders.
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  "Experiments on both standard and in-house benchmarks show that our models match the performance of state-of-the-art models like GPT-4o and Claude-3.5-Sonnet while offering 20-32 times longer context window."
- Tensor Product Attention Is All You Need
  "We introduce the Tensor ProducT ATTenTion Transformer (T6), a new model architecture for sequence modeling. Through extensive empirical evaluation of language modeling tasks, we demonstrate that T6 exceeds the performance of standard Transformer baselines including MHA, MQA, GQA, and MLA across various metrics."
- Transformer²: Self-adaptive LLMs
  "A novel self-adaptation framework that adapts LLMs for unseen tasks in real-time by selectively adjusting only the singular components of their weight matrices."
- FLAME: A small language model for spreadsheet formulas
  "We evaluate FLAME on formula repair, formula completion, and similarity-based formula retrieval. FLAME can outperform much larger models."
- Foundations of Large Language Models
  "The book is structured into four main chapters, each exploring a key area: pre-training, generative models, prompting techniques, and alignment methods."
- Machine Learning in Production (free course)
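On the cosine-similarity item above: cosine similarity is just the dot product of length-normalized vectors, so a score only means whatever the underlying embedding makes it mean. A minimal sketch, with made-up toy vectors standing in for real embeddings:

```python
# Minimal sketch: cosine similarity is the dot product of unit-normalized
# vectors, so scores depend entirely on how the embeddings were produced.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for sentence embeddings (made up for illustration).
query = np.array([0.9, 0.1, 0.0])
doc_a = np.array([0.8, 0.2, 0.1])   # points in roughly the same direction
doc_b = np.array([0.1, 0.1, 0.95])  # points in a different direction

print(cosine_similarity(query, doc_a))  # high, ~0.98
print(cosine_similarity(query, doc_b))  # low, ~0.12
```

Whether "same direction" translates into "same topic", "same style", or "same intent" depends on the embedding model and the task, which is the carelessness the linked post warns about.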
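And on the TabPFN item: the model ships as a scikit-learn-style estimator, so it can be tried on a small classification dataset in a few lines. A minimal sketch, assuming the `tabpfn` package is installed (constructor arguments vary between releases, so defaults are used here):

```python
# Minimal sketch: TabPFN used as a drop-in scikit-learn-style classifier
# on a small tabular dataset. Assumes `pip install tabpfn`; API details
# may differ between releases.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()      # no task-specific training of the network itself
clf.fit(X_train, y_train)     # "fit" mostly stores the training data as context
pred = clf.predict(X_test)    # predictions come from a single forward pass

print("accuracy:", accuracy_score(y_test, pred))
```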