Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Scientists are using AI to dream up revolutionary new proteins
Huge advances in artificial intelligence mean researchers can design completely original molecules in seconds instead of months.
- A Short Chronology Of Deep Learning For Tabular Data
“I want to emphasize that no matter how interesting or promising deep tabular methods look, I still recommend using a conventional machine learning method as a baseline.”
- ACT-1: Transformer for Actions
“ACT-1 is a large-scale Transformer trained to use digital tools — among other things, we recently taught it how to use a web browser.”
- Meta moves PyTorch to Linux Foundation
Facebook parent Meta is shifting its PyTorch AI tools, which are already available under open source license, to an outside governance model overseen by a new independent board under the auspices of the Linux Foundation.
- Digitizing Smell: Using Molecular Maps to Understand Odor
“The embedding space of this model contains a representation of each molecule as a fixed-length vector describing that molecule in terms of its odor, much as the RGB value of a visual stimulus describes its color.”
- Causality and its implications for supervised learning
In this note I will try to explain when and why supervised learning fails because of causality-related effects. I will present it through the prism of probabilistic modelling, which I find to be a very efficient framework for analyzing ML problems.
- Why is Vector Search so fast?
Vector Search engines can run semantic queries on multi-million datasets in milliseconds. How is that possible?
- Stable Diffusion web UI
A browser interface based on Gradio library for Stable Diffusion.
- Have I Been Trained?
Search 5.8 billion images used to train popular AI art models
- YouTube Taps Machine Learning to Convert Landscape Videos to Square, Vertical Formats
Its ML model detects key elements in the video (faces, key objects, logos, motion, text) and breaks that video into scenes, ensuring that those key elements appear properly in the reformatted video.
- From Deep Learning Foundations to Stable Diffusion
fast.ai’s upcoming course will take you from the foundations of neural networks to stable diffusion.
- MLJAR Automated Machine Learning for Humans
The mljar-supervised is an Automated Machine Learning Python package that works with tabular data. It is designed to save time for a data scientist.
- On Device Learning
“Before asking the question, how do I build AGI, you first must ask the question, what would I recognize as AGI?”
- A huge gallery of visualization tools
Hundreds and hundreds of them.
- Efficient data visualization with faded raincloud plots
Fadeclouds, anyone?
- Better models by measuring
Guild AI brings systematic control to machine learning to help you build better models faster. It’s freely available under the Apache 2.0 open source license.
- Time Series Forecasting with the Temporal Fusion Transformer
In Time Series Forecasting, deep learning neural networks have just lately outperformed conventional techniques and have done so by a smaller margin than in image and language processing.
- FairGBM
FairGBM is an easy-to-use and lightweight fairness-aware ML algorithm with state-of-the-art performance on tabular datasets. FairGBM builds upon the popular LightGBM algorithm and adds customizable constraints for group-wise fairness (e.g., equal opportunity, predictive equality) and other global goals (e.g., specific Recall or FPR prediction targets).
- DeepDPM: Deep Clustering With An Unknown Number of Clusters
“DeepDPM is a nonparametric deep-clustering method which unlike most deep clustering methods, does not require knowing the number of clusters, K; rather, it infers it as a part of the overall learning. Using a split/merge framework to change the clusters number adaptively and a novel loss, our proposed method outperforms existing (both classical and deep) nonparametric methods.”
- A Friendly(-ish?) Introduction to Neural Representations of Uncertainty
Years of anatomical and experimental evidence showed that the brain maintains an internal generative model of the world.
- Causal Forecasting at Lyft
“We require a mechanism for evaluating counterfactual historical decisions, so we may be assured of the model’s decisioning capabilities at least retrospectively.”
- Layer Recycling and Fine-tuning Efficiency
“A recent paper by Allen AI has attracted attention in the NLP community as they cache the output of a certain intermediate layer in the training and inference phases to achieve a speedup of ~83% with a negligible loss in model performance.”
- Oxidizing Machine Learning
The Rust ML community is alive, and doing well.
- pythae
This library implements some of the most common (Variational) Autoencoder models under a unified implementation.
- How to Visualize a Graph with a Million Nodes
Meet Cosmograph, the fastest network graph visualization tool that works in the browser. It’s capable of visualizing networks that have a million nodes and edges, and that’s not the limit.
- Game Emulation via Neural Network
“Although this looks like a video game, I did not write any game code. This program is actually a neural network mimicking a video game.”