Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
In the past weeks: large language models continue to grow, with libraries to deploy “autonomous” agents being all the rage now. Meta releases SAM: a zero-shot computer vision model. And Twitter AI experts continue to debate whether and how AI is or will be dangerous, and how it should or should not be aligned, with the general media enjoying the new drama to be farmed for content.
- Segment Anything Model (SAM): a new AI model from Meta AI that can “cut out” any object, in any image, with a single click
This is wild: “SAM has learned a general notion of what objects are — this understanding enables zero-shot generalization to unfamiliar objects and images without requiring additional training.”
- AI is entering an era of corporate control
“A new report on AI progress highlights how state-of-the-art systems are now the domain of Big Tech companies. It’s these firms that now get to decide how to balance risk and opportunity in this fast-moving field.”
- We need a much more sophisticated debate about AI
“Twentieth-century ways of thinking will not help us deal with the huge regulatory challenges the technology poses.”
- $335,000 Pay for ‘AI Whisperer’ Jobs Appears in Red-Hot Market
This is hilarious: “The fast-growing apps have created a seller’s market for anyone — even liberal arts grads — capable of manipulating its output.”
- fast.ai course part 2: implement the astounding Stable Diffusion algorithm from scratch
Lots of topics covered in the latest edition of fast.ai’s course!
- LangChain
“LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an API, but will also be data-aware and be agentic.”
- AgentGPT
AutoGPT-style agents are all the rage now: configure and deploy autonomous AI agents.
- Automatic Chain of Thought Prompting in Large Language Models
- Why think step-by-step? Reasoning emerges from the locality of experience
- Can GPT 4 Prompt Itself? MemoryGPT, AutoGPT, Jarvis, Claude-Next [10x GPT 4!] and more… (video)
- Generative Agents: Interactive Simulacra of Human Behavior
A team from Stanford & Google describe a group of generative agents that simulate human behaviour. Based on LLMs, the agents autonomously generate their own behaviour: a must-read paper!
- Building LLM applications for production
“A question that I’ve been asked a lot recently is how large language models (LLMs) will change machine learning workflows.”
- A Recipe for Training Large Models
“When needing an AI model, you should always start by seeing if there is already an existing one that satisfies your needs and aim for the smallest possible model…”
- Automatic Gradient Descent
Deep learning without hyperparameters…
- Graph classification with Transformers
Explore how you can do graph classification using the Transformers library.
- Edge AI Just Got Faster
“We modified llama.cpp to load weights using mmap() instead of C++ standard I/O. That enabled us to load LLaMA 100x faster using half as much memory.”
- Boosting Tabular Data Predictions with Large Language Models
What happens when you unleash GPT-4 on a tabular Kaggle competition to predict home prices?
- Towards Reinforcement Learning with AI Feedback (RLAIF)
What open-sourced foundation models, instruction tuning, and other recent events mean for the future of AI.
- A New Approach to Computation Reimagines Artificial Intelligence
“By imbuing enormous vectors with semantic meaning, we can get machines to reason more abstractly — and efficiently — than before.”
- How Randomness Improves Algorithms
“Unpredictability can help computer scientists solve otherwise intractable problems.”
- From Deep to Long Learning?
There’s been a lot of great work on scaling Transformers to longer sequences, but many of these approaches seem to sacrifice accuracy.
- Defamed by ChatGPT: My Own Bizarre Experience with Artificiality of “Artificial Intelligence”
ChatGPT appears to have manufactured baseless accusations against professors.
- MLOps is Mostly Data Engineering
“We don’t need MLOps engineers, we need tools that will allow ML Engineers to package their work in a way that the platform and release engineers will be able to consume and produce the artifacts needed for the product engineers to integrate into the product.”
- tidyverse 2.0.0 is released
There’s only really one big change in tidyverse 2.0.0: lubridate is now a core member of the tidyverse.