Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- NotebookLM’s automatically generated podcasts are surprisingly effective
Audio Overview is a fun new feature of Google’s NotebookLM which is getting a lot of attention right now.
- Llama 3.2: Revolutionizing edge AI and vision with open, customizable models
… but not all models are available in the EU.
- OpenAI to remove non-profit control and give Sam Altman equity
OpenAI is working on a plan to restructure its core business into a for-profit benefit corporation that will no longer be controlled by its non-profit board.
- PerpetualBooster is a gradient boosting machine (GBM) algorithm which doesn’t need hyperparameter optimization unlike other GBM algorithms
“Similar to AutoML libraries, it has a budget parameter. Increasing the budget parameter increases the predictive power of the algorithm and gives better results on unseen data.”
- Getting started with generative art
There are many reasons why learning about generative art can be beneficial for data scientists.
- Generalized Additive Models (GAMs) for Meta-Regression
Meta-regression is a modeling approach used in meta-analysis to see how effect sizes vary across studies.
- EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer
EzAudio is an advanced text-to-audio (T2A) generation model that creates high-quality audio from text prompts.
- Bias/Variance is not the same as Approximation/Estimation
“It is commonly stated that they are “closely related”, or “similar in spirit”. However, sometimes it is said they are equivalent. In fact they are different, but have subtle connections cutting across learning theory, classical statistics, and information geometry”
- Fast TRAC
A Parameter-free Optimizer for Lifelong Reinforcement Learning
- Jupyter Scatter
An interactive scatter plot widget for Jupyter Notebook, Lab, and Google Colab that can handle millions of points and supports view linking.
- rerankers: A Lightweight Python Library to Unify Ranking Methods
“Re-ranking is an integral component of many retrieval pipelines; however, there exist numerous approaches to it. rerankers is a Python library which provides a simple, easy-to-use interface to all commonly used re-ranking approaches.”
- thepi.pe
“Extract clean markdown from PDFs, URLs, slides, videos, and more, ready for any LLM”
- Buckaroo – The Data Table for Jupyter
Buckaroo is a modern data table for Jupyter that expedites the most common exploratory data analysis tasks.
- Agentic Patterns
Implementing the main agentic patterns using Groq – no LangChain, no LangGraph, no LlamaIndex, no CrewAI. Pure and simple API calls to Groq.
- uv
A single tool to replace pip, pip-tools, pipx, poetry, pyenv, virtualenv, and more.