Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
Are State Space Models the Next Transformers?
For now, Transformers are on top of the world for sequence modelling, but some researchers and companies are starting to use state space models (SSMs) to solve some of the issues that Transformers suffer from.
- Want to know how these models work? Both Hugging Face and Maarten Grootendorst (great name) have fantastic introductions – and a minimal sketch of the core recurrence follows this list
- The latest breakthrough came just a few weeks ago in the form of Mamba-2
- SSMs have already been used for time series forecasting, as well as for voice generation and learning from audio. There’s also this survey, which lists more application areas
- Before you get too hyped up, however, some researchers have also been critical, as in the paper “The Illusion of State in State-Space Models” in which the authors claim that SSMs still suffer from some fundamental limitations
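For readers who want a quick feel for what these models compute before diving into the introductions above, here is a minimal NumPy sketch of the discretized linear recurrence at the heart of SSMs. The shapes and random parameters are purely illustrative, not the parameterisation used by Mamba or any specific paper.

```python
import numpy as np

# Minimal discretized state space model (SSM) recurrence:
#   h_t = A h_{t-1} + B x_t
#   y_t = C h_t + D x_t
# Shapes and random parameters are illustrative only.
rng = np.random.default_rng(0)
d_state, d_input, seq_len = 16, 4, 100

A = rng.normal(scale=0.1, size=(d_state, d_state))  # state transition
B = rng.normal(size=(d_state, d_input))             # input projection
C = rng.normal(size=(d_input, d_state))             # output projection
D = rng.normal(size=(d_input,))                     # skip connection

x = rng.normal(size=(seq_len, d_input))             # input sequence
h = np.zeros(d_state)                               # hidden state
outputs = []
for x_t in x:
    h = A @ h + B @ x_t              # state update
    outputs.append(C @ h + D * x_t)  # readout
y = np.stack(outputs)
print(y.shape)  # (100, 4)
```

Even in this toy version the appeal is visible: generation is a recurrence that is linear in sequence length with a fixed-size state, whereas attention scales quadratically and has to keep the whole context around.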
Graphs and RAGs
Are graphs the ideal match to improve Retrieval-Augmented Generation, or a new way for graph databases to generate more sales? A rough sketch of what graph-based retrieval looks like follows the list below.
- According to Neo4j, the future of AI lies in knowledge graphs
- Some recent papers also see potential in using graphs together with RAG, e.g. GRAG and GNN-RAG
- LlamaIndex introduces the Property Graph Index – “A Powerful New Way to Build Knowledge Graphs with LLMs”
- LangChain also has its own LangGraph: “Harnessing the Power of Adaptive Routing, Corrective Fallback, and Self-Correction”
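To make the “graphs + RAG” idea a bit more concrete, here is a rough sketch of the retrieval step: match entities mentioned in the question against nodes in a knowledge graph and hand their local neighbourhood to the LLM as grounding context. The toy graph, the naive entity matching, and the commented-out `ask_llm` call are hypothetical placeholders – this is not the API of Neo4j, LlamaIndex, or LangChain.

```python
import networkx as nx

# Toy knowledge graph; in practice this would live in a graph database.
kg = nx.Graph()
kg.add_edge("TimeGPT-1", "Nixtla", relation="developed_by")
kg.add_edge("TimeGPT-1", "time series forecasting", relation="used_for")
kg.add_edge("Chronos", "Amazon", relation="developed_by")
kg.add_edge("Chronos", "time series forecasting", relation="used_for")

def graph_retrieve(question: str, graph: nx.Graph, radius: int = 1) -> str:
    """Naive entity linking plus neighbourhood expansion for graph-based RAG."""
    # Keep graph nodes that are literally mentioned in the question.
    entities = [n for n in graph.nodes if n.lower() in question.lower()]
    facts = []
    for entity in entities:
        # Pull the k-hop neighbourhood around each matched entity.
        neighbourhood = nx.ego_graph(graph, entity, radius=radius)
        for u, v, data in neighbourhood.edges(data=True):
            facts.append(f"{u} --{data.get('relation', 'related_to')}--> {v}")
    return "\n".join(facts)

question = "Who developed TimeGPT-1 and what is it used for?"
context = graph_retrieve(question, kg)
prompt = f"Answer using only these facts:\n{context}\n\nQuestion: {question}"
# answer = ask_llm(prompt)  # hypothetical LLM call; the graph facts ground the answer
print(prompt)
```

Compared with plain vector search, the graph neighbourhood keeps the relations between entities explicit, which is exactly what papers like GRAG and GNN-RAG try to exploit.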
The foundation models keep coming
Not just for text and language, but also for time series and now even graphs!
- Nixtla has already been benchmarking the time series foundation models and concludes that TimeGPT-1 ranks first in terms of accuracy and inference speed compared to the latest foundation models, including TimesFM (Google), Chronos (Amazon), Moirai (Salesforce), and Lag-Llama (ServiceNow)
- GraphAny: A Foundation Model for Node Classification on Any Graph is a bit surprising, but the authors do show that the technique works better than GCN and GAT
Other interesting links
- Scalable MatMul-free Language Modeling – This is the paper that caused a lot of people to start tweeting about “the downfall of Nvidia?” (see the sketch at the end of this list for the trick that makes it possible)
- ReLU-KAN: New Kolmogorov-Arnold Networks that Only Need Matrix Addition, Dot Multiplication, and ReLU – KANs have likewise been getting a lot of traction and can now be efficiently trained on-GPU… they still need MatMul, however
- We used data drift signals to estimate model performance – Fantastic article on MLOps and the first time we heard about “Probabilistic Adaptive Performance Estimation (PAPE)”
- Agents aren’t all you need – Parcha shares lessons from their journey of using GPT to automate compliance workflows
- Extracting Concepts from GPT-4 – OpenAI used new scalable methods to decompose GPT-4’s internal representations into 16 million oft-interpretable patterns
- A Picture is Worth 170 Tokens: How Does GPT-4o Encode Images? – An incredibly interesting read on how GPT-4o works
- LeRobot – State-of-the-art Machine Learning for real-world robotics
- Is this the ChatGPT moment for recommendation systems? – Researchers at Meta recently published a ground-breaking paper that combines the technology behind ChatGPT with Recommender Systems
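Coming back to the MatMul-free paper above: the way such models typically avoid multiplications (BitNet-style) is by constraining weights to the ternary values {-1, 0, +1}, so a matrix product reduces to signed additions. The snippet below is only a toy illustration of that idea, not the paper’s actual implementation, which also replaces self-attention with a recurrent token mixer.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(8, 16))  # ternary weight matrix with values in {-1, 0, +1}
x = rng.normal(size=16)

# Standard dense computation (needs multiplications):
y_matmul = W @ x

# Equivalent "MatMul-free" computation: additions and subtractions only.
y_add_only = np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

print(np.allclose(y_matmul, y_add_only))  # True
```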