Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
Crazy Times for Large Language Models
Things have moved so fast in 2023 that it’s almost impossible to keep up. Here’s a short recap: at the end of last year, OpenAI released ChatGPT. Apart from buying heavily into ChatGPT, Microsoft has also introduced KOSMOS-1, a multimodal large language model, through Microsoft Research. NVIDIA then followed up with Prismer, its multimodal vision-language model. Google, feeling the heat, then released PaLM-E, also a multimodal language model. Just about a week ago, Microsoft came back with Visual ChatGPT, which connects ChatGPT with Visual Foundation Models so that images can be both sent and received while chatting. Check out the demo animation on their GitHub page!
A LLaMA of My Own
Frustrated by the lack of access to large AI compute capabilities, the community has been working hard on downscaling large language models in order to be able to run them on consumer hardware.
- This started with Meta AI open-sourcing LLaMA, a family of “small” models with up to 65B parameters
- The weights of this model then got “leaked”
- llama.cpp then implemented a very nimble and clean version in C++
- And finally, the release of Stanford Alpaca, a cheap-compute model, fine-tuned from the Meta AI LLaMA 7B model
This has allowed people to run language models with near-ChatGPT-like capabilities on consumer-grade GPUs (some have even gotten them to run on a Raspberry Pi, albeit with extremely slow inference):
- Local Stanford Alpaca, train it and run on your own machine
- Meta AI LLaMa in your M1 Mac
- MiniLLM: Run modern LLMs on consumer-grade GPUs, codebase mostly in Python
- Alpaca-LoRA: Run a model that performs like OpenAI text-davinci-003 on a Raspberry Pi
- Int-4 LLaMa is not enough. A new way to lower LLMs RAM requirements and to easily build Python apps using faster LLM inference
- alpaca.cpp, run a fast ChatGPT-like model locally on your device
- Dalai: Automatically install, run, and play with LLaMA on your computer
And then GPT-4 Releases
As if things couldn’t get wilder… From what we know so far, GPT-4 now has a context window of up to 32K tokens, can engage in visual chat, is better at reasoning and math, has improved at coding and many other tasks, and is more “aligned” with human morals and values.
- @taranjeet has listed all that people are doing with GPT-4 in this awesome-gpt4 repo
- Can GPT-4 *Actually* Write Code?
- What happens if you give GPT-4 a budget of $100 and tell it to make as much money as possible?
- GPT-4 does have a world model
- And of course, people came up with clever prompt hacks to jailbreak GPT-4… also this one, and this one
- If you don’t have access to GPT-4, you can try it (for now) in Replit for free
- GPT-4 architecture: what we can deduce from research literature
- The 99-page GPT-4 Technical Report is also a treasure trove of information
And Everything Else…
- AI is Eating The World
  “Enterprises: Plan Not for One, but Thousands of AI Touchpoints in Your Systems”
- What is Temperature in NLP?
  “Temperature is a parameter used in natural language processing models to increase or decrease the “confidence” a model has in its most likely response.”
- LLMs and SQL
  “To provide the LLM with enough information for it to generate reasonable queries for a given database, we need to effectively describe the database in the prompt.”
- Machine Learning Ops. Project Scaffold
  “This project contains the scaffold for MLOps of a machine learning project.”
- Building an Efficient Machine Learning API
  “Learn the techniques we used to build a performant and efficient product categorization endpoint that will be used within our product data pipeline.”
- Understanding the attention mechanism in sequence models
  “This architecture innovation dramatically improved model performance for sequence-to-sequence tasks such as machine translation and text summarization.”
- OpenAI’s Whisper speech model – an overview
  “A lap around OpenAI’s Whisper speech model and examples on how to use it for transcription.”
- Online gradient descent written in SQL
  Not sure one should really use this, but it’s pretty creative…
- tidypredict
  “The main goal of tidypredict is to enable running predictions inside databases. It reads the model, extracts the components needed to calculate the prediction, and then creates an R formula that can be translated into SQL.”
- Inverse Reinforcement Learning
  Transferring Task Sequencing Policies from Humans to Robots in Manufacturing Applications
- MeshDiffusion: Score-based Generative 3D Mesh Modeling
  “Compared to other 3D representations like voxels and point clouds, meshes are more desirable in practice”
- High-resolution image reconstruction with latent diffusion models from human brain activity
  “Here, we propose a new method based on a diffusion model (DM) to reconstruct images from human brain activity obtained via functional magnetic resonance imaging (fMRI)”
- NeRFMeshing: Distilling Neural Radiance Fields into Geometrically-Accurate 3D Meshes
  “Our final 3D mesh is physically accurate and can be rendered in real time on an array of devices.”
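As a quick illustration of the “What is Temperature in NLP?” item in the list, here is a minimal sketch of how temperature is commonly applied: the model’s raw scores (logits) are divided by the temperature before the softmax. The function name `softmax_with_temperature` and the example logits are made up for this illustration and do not come from the linked article.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, scaled by a temperature.

    Illustrative helper (hypothetical name). Temperatures below 1 sharpen
    the distribution, temperatures above 1 flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate tokens.
example_logits = [2.0, 1.0, 0.1]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

With a low temperature, most of the probability mass moves to the top-scoring token, which is the “more confident” behavior the quote describes. A high temperature spreads probability more evenly, so sampling becomes more varied.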