Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- AI-Implanted False Memories
Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews
- Farewell pandas, and thanks for all the fish
“pandas was the best option at the time, and it allowed new users to try out Ibis. But, it never fit well into the model of data analysis that Ibis strives for”
- What’s Really Going On in Machine Learning? Some Minimal Models
“The basic structure of neural networks can be pretty simple. But by the time they’re trained up with all their weights, etc. it’s been hard to tell what’s going on—or even to get any good visualization of it. Well, what I’m going to try to do here is to get “underneath” this—and to “strip things down” as much as possible.”
- Diffusion is spectral autoregression
A bit of signal processing swiftly reveals that diffusion models and autoregressive models aren’t all that different: diffusion models of images perform approximate autoregression in the frequency domain!
- Police officers are starting to use AI chatbots to write crime reports. Will they hold up in court?
Oklahoma City’s police department is one of a handful to experiment with AI chatbots to produce the first drafts of incident reports
- Anthropic publishes the ‘system prompts’ that make Claude tick
“The prompt for Claude 3 Opus, for instance, says that Claude is to appear as if it “[is] very smart and intellectually curious,” and “enjoys hearing what humans think on an issue and engaging in discussion on a wide variety of topics.” It also instructs Claude to treat controversial topics with impartiality and objectivity, providing “careful thoughts” and “clear information” — and never to begin responses with the words “certainly” or “absolutely.””
- How I Use “AI”
“Most of the people online I find who talk about LLM utility are either wildly optimistic, and claim all jobs will be automated within three years”
- Diffusion Models Are Real-Time Game Engines
‘Can it run Doom?’ Diffusion models can, it turns out
- Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs
We introduce Splatt3R, a feed-forward model that can directly predict a 3D Gaussian Splat from a stereo pair of input images with unknown camera parameters.
- Vincent D. Warmerdam – Scikit-Learn can do THAT?! (video)
Many of us know scikit-learn for its ability to construct pipelines that can do .fit().predict(). It’s an amazing feature for sure. But once you dive into the codebase … you realise that there is just so much more.
- supertree – Interactive Decision Tree Visualization
supertree is a Python package designed to visualize decision trees in an interactive and user-friendly way within Jupyter Notebooks
- A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals
“In this paper, we present empirical evidence of skills and directed exploration emerging from a simple RL algorithm long before any successful trials are observed.”
- Loss Rider (for fun)
Finally, a Python plotting library that can (only) output Line Rider maps!