Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Multimodal Neurons in Artificial Neural Networks
“We’ve discovered neurons in CLIP that respond to the same concept whether presented literally, symbolically, or conceptually. This may explain CLIP’s accuracy in classifying surprising visual renditions of concepts, and is also an important step toward understanding the associations and biases that CLIP and similar models learn.”
- Multimodal deep learning approach for event detection in sports using Amazon SageMaker
“Our solution uses a multimodal architecture utilizing video, static images, audio, and optical flow data to develop and fine-tune a model, followed by boosting and a postprocessing algorithm.”
- Understanding Deep Learning (Still) Requires Rethinking Generalization
“Our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points as it usually does in practice.”
- AI Teaches Itself Diplomacy
Another classic game deepens a skillset with broad applications: diplomatic savvy often needed in a pinch, crunch, or imbroglio.
- ‘Deep Nostalgia’ Can Turn Old Photos of Your Relatives Into Moving Videos
“A company called MyHeritage who provides automatic AI-powered photo enhancements is now offering a new service that can animate people in old photos creating a short video that looks like it was recorded while they posed and prepped for the portrait.”
- AI is killing choice and chance – which means changing what it means to be human
“As AI increasingly shapes the human experience, how does this change what it means to be human? Central to the problem is a person’s capacity to make choices, particularly judgments that have moral implications.”
- Supercharging Apache Superset
How Airbnb customized Superset for business intelligence at scale.
- How to break a model in 20 days. A tutorial on production model analytics.
“It is a story of how we trained a model, simulated production use, and analyzed its gradual decay.”
- Is Facebook’s “Prophet” the Time-Series Messiah, or Just a Very Naughty Boy?
“Funny thing is though, that if you poke around a little you’ll quickly come to the conclusion that few people who have taken the trouble to assess Prophet’s accuracy are gushing about its performance.”
- Feature Stores – A Hierarchy of Needs
“Feature stores have gotten a lot of attention lately. In December 2020, Amazon Web Services released its SageMaker Feature Store. Last month, Splice Machine, a big data platform, launched its own feature store too.”
- Visualizing Data Timeliness at Airbnb
“Over the last year, multiple teams came together to build SLA Tracker, a visual analytics tool to facilitate a culture of data timeliness at Airbnb.”
- Gartner Top 10 Data and Analytics Trends for 2021
“These data and analytics trends can help organizations and society deal with disruptive change.”
- A View Of The Future Of Our Data
Welcome to the era of data coalitions.
- Can Auditing Eliminate Bias from Algorithms?
A growing industry wants to scrutinize the algorithms that govern our lives, but it needs teeth.
- Do Language Models Know How Heavy an Elephant Is?
Humans have a pretty good sense of scale, or reasonable ranges of numeric attributes such as weight or length, for different objects, but do pre-trained language representations?
- YouTube AI Blocked Chess Channel after Confusing ‘Black’ and ‘White’ for Racist Slurs
“Although the channel was restored within 24 hours, YouTube did not explain why the Croatian chess player Antonio Radic was blocked.”
- Beauty is in the brain: AI reads brain data, generates personally attractive images
Researchers have succeeded in making an AI understand our subjective notions of what makes faces attractive.
- Kazuo Ishiguro Uses Artificial Intelligence to Reveal the Limits of Our Own
In his latest novel, the gaze of an inhuman narrator gives us a new perspective on human life, a vision that is at once deeply ordinary and profoundly strange.
- Medical chatbot using OpenAI’s GPT-3 told a fake patient to kill themselves
“Various tasks, ‘roughly ranked from low to high sensitivity from a medical perspective,’ were established to test GPT-3’s abilities.”
- An Introduction to Hierarchical Modeling
This visual explanation introduces the statistical concept of Hierarchical Modeling, also known as Mixed Effects Modeling, among other names.
- Querybook is Pinterest’s open-sourced big data IDE via a notebook interface.
Querybook’s core focus is to make composing queries, creating analyses, and collaborating with others as simple as possible.
- State-of-the-Art Image Generative Models
“I have aggregated some of the SotA image generative models released recently, with short summaries, visualizations and comments.”
- Introduction to Bias in AI
Since it’s impractical to create a dataset with all possible permutations and domains, all datasets have some form of bias in them.
- How to Efficiently Choose the Right Database for Your Applications
“We categorized these databases by application scenario and database interface, and we built a matrix”
- Dolt is Git for Data!
Dolt is a SQL database that you can fork, clone, branch, merge, push and pull just like a git repository.
- Modern Text Features in R
“I’m extremely pleased to present the culmination of several years of work spanning the systemfonts, textshaping, and ragg packages.”
- Gradient-free-optimizers
Simple and reliable optimization with local, global, population-based and sequential techniques in numerical discrete search spaces.
- ELT for the DataOps era
Meltano is open source, self-hosted, CLI-first, debuggable, and extensible.
- Clickhouse as an alternative to ElasticSearch and MySQL
“This post is about the major reasons why we chose Clickhouse and not ElasticSearch (or MySQL) as a storage solution for ApiRoad.net essential data – request logs”
- Apache AGE
Apache AGE is a PostgreSQL extension that provides graph database functionality. AGE is an acronym for A Graph Extension and is inspired by Bitnine’s fork of PostgreSQL 10, AgensGraph, which is a multi-model database. The goal of the project is to create a single storage layer that can handle both relational and graph model data, so that users can use standard ANSI SQL along with openCypher, the graph query language.
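The theoretical construction mentioned in the Understanding Deep Learning (Still) Requires Rethinking Generalization excerpt is easy to try out. The sketch below is a plain-Python illustration in the spirit of that result, not the authors' code: it projects n inputs onto a random direction a (d weights), places n ReLU biases between the sorted projections, and solves a lower-triangular system for n output weights. The resulting depth-two network with 2n + d parameters reproduces any labels exactly, random ones included. The function names and toy data are hypothetical.

```python
import random


def fit_depth_two_relu(xs, ys, seed=0):
    """Return (a, b, w) such that f(x) = sum_j w[j] * relu(<a, x> - b[j]) equals ys on xs."""
    rng = random.Random(seed)
    d = len(xs[0])
    # d weights: a random projection direction (projections are distinct almost surely).
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    z = [sum(ai * xi for ai, xi in zip(a, x)) for x in xs]
    order = sorted(range(len(xs)), key=lambda i: z[i])
    zs = [z[i] for i in order]
    ys_sorted = [ys[i] for i in order]
    if any(zs[k] == zs[k + 1] for k in range(len(zs) - 1)):
        raise ValueError("two samples share a projection; try another seed")
    # n biases interleaved with the sorted projections:
    # b[0] < zs[0] < b[1] < zs[1] < ... < b[n-1] < zs[n-1]
    b = [zs[0] - 1.0] + [(zs[k - 1] + zs[k]) / 2 for k in range(1, len(zs))]
    # A[i][j] = relu(zs[i] - b[j]) is zero for j > i and positive for j == i,
    # so forward substitution solves A w = y exactly (n more weights).
    w = []
    for i in range(len(zs)):
        partial = sum(w[j] * max(zs[i] - b[j], 0.0) for j in range(i))
        w.append((ys_sorted[i] - partial) / (zs[i] - b[i]))
    return a, b, w


def predict(a, b, w, x):
    z = sum(ai * xi for ai, xi in zip(a, x))
    return sum(wj * max(z - bj, 0.0) for wj, bj in zip(w, b))


# Toy data: 20 random points in 5 dimensions with completely random 0/1 labels.
rng = random.Random(42)
xs = [[rng.random() for _ in range(5)] for _ in range(20)]
ys = [rng.choice([0, 1]) for _ in xs]
a, b, w = fit_depth_two_relu(xs, ys)
print("parameters:", len(a) + len(b) + len(w))
print("max training error:", max(abs(predict(a, b, w, x) - y) for x, y in zip(xs, ys)))
```

The labels carry no signal at all, yet the fit is exact up to floating-point error, which is the point the paper makes: fitting the training set says little about generalization once parameters outnumber samples.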
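To make the idea behind the hierarchical modeling explainer concrete, here is a minimal sketch of partial pooling in a random-intercept model, assuming the within-group standard deviation sigma and the between-group standard deviation tau are known (real mixed-effects software estimates them from the data). Each group's estimate is a precision-weighted average of its own sample mean and the overall mean, so groups with few observations are pulled further toward the overall mean. The data and the function name are made up for illustration.

```python
from statistics import mean


def partially_pooled_means(groups, sigma, tau, mu=None):
    """Posterior means of group intercepts in y_ij ~ N(alpha_j, sigma^2), alpha_j ~ N(mu, tau^2).

    sigma and tau are treated as known; mu defaults to the grand mean (an assumption).
    """
    if mu is None:
        mu = mean(v for obs in groups.values() for v in obs)
    estimates = {}
    for name, obs in groups.items():
        data_precision = len(obs) / sigma**2  # more observations -> trust the group mean more
        prior_precision = 1 / tau**2          # smaller tau -> pull harder toward mu
        estimates[name] = (data_precision * mean(obs) + prior_precision * mu) / (
            data_precision + prior_precision
        )
    return estimates


# Made-up data: three groups of very different sizes.
groups = {"A": [10, 12, 11], "B": [20], "C": [14, 16, 15, 15]}
for name, est in partially_pooled_means(groups, sigma=2.0, tau=3.0).items():
    print(f"{name}: raw mean {mean(groups[name]):.2f} -> pooled {est:.2f}")
```

With these toy numbers, group B, which has a single observation, moves from 20 to roughly 18.2, a much larger relative shift toward the overall mean than groups A and C.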
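The Gradient-free-optimizers entry mentions local techniques for discrete search spaces. As a rough idea of the simplest such technique, the following is a plain-Python hill climber over a grid of parameter values; it is a hypothetical sketch and does not use or mirror the library's own API.

```python
import random


def hill_climb(objective, search_space, n_iter=1000, seed=0):
    """Maximize objective over a discrete grid {param_name: ordered list of values}."""
    rng = random.Random(seed)
    names = list(search_space)
    # Represent the current point as an index into each parameter's list of values.
    pos = {n: rng.randrange(len(search_space[n])) for n in names}

    def to_params(p):
        return {n: search_space[n][p[n]] for n in names}

    best = objective(to_params(pos))
    for _ in range(n_iter):
        cand = dict(pos)
        name = rng.choice(names)
        # Move one parameter one step left or right, clamped to the grid.
        cand[name] = min(max(cand[name] + rng.choice((-1, 1)), 0), len(search_space[name]) - 1)
        score = objective(to_params(cand))
        if score >= best:  # accepting ties lets the search drift across plateaus
            pos, best = cand, score
    return to_params(pos), best


search_space = {"x": list(range(-10, 11)), "y": list(range(-10, 11))}
params, score = hill_climb(lambda p: -((p["x"] - 3) ** 2) - (p["y"] + 1) ** 2, search_space)
print(params, score)
```

Because this toy objective is a concave quadratic on the grid, every local optimum is global, so the climber settles at x = 3, y = -1. On bumpier objectives a purely local method like this can get stuck, which is the motivation for the global and population-based techniques the library also offers.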