Web Picks (week of 28 September 2020)

Posted on October 17, 2020

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

Algorithm Helps New York Decide Who Goes Free Before Trial
Point system used by judges awards defendants better chance at release based on data including convictions—and if they are reachable by phone
Why Twitter’s image cropping algorithm appears to have white bias
Several users posted photos showing that when an image contains people with different skin tones, Twitter’s cropping tends to show those with lighter skin when it fits the image to the display parameters of its site and embeds
AI Democratization in the Era of GPT-3
The exclusivity agreement and the prior decision by OpenAI to not open source the GPT-3 code represent troubling developments for the notion of democratizing AI
Advancing NLP with Efficient Projection-Based Model Architectures
“We present one such model, pQRNN, and show that this new architecture can nearly achieve BERT-level performance, despite being 300x smaller and being trained on only supervised data”
Amazon Drivers Are Hanging Smartphones in Trees to Get More Work
Someone seems to have rigged Amazon’s system to get orders first
A college student used GPT-3 to write fake blog posts and ended up at the top of Hacker News
He says he wanted to prove the AI could pass as a human writer
Data Science Meets Devops
MLOps with Jupyter, Git, & Kubernetes
Effective testing for machine learning systems
“In this blog post, we’ll cover what testing looks like for traditional software development, why testing machine learning systems can be different, and discuss some strategies for writing effective tests for machine learning systems.”
Differentiable Dithering
Reducing image colors using gradient descent
GPT3 Empowered Recommendation System
Recommendation as given by an NLP model
fast.ai releases new deep learning course, four libraries, and 600-page book
Artificial Intelligence is stupid and causal reasoning won’t fix it
The Computational Limits of Deep Learning
“Extrapolating forward this reliance reveals that progress along current lines is rapidly becoming economically, technically, and environmentally unsustainable. Thus, continued progress in these applications will require dramatically more computationally-efficient methods, which will either have to come from changes to deep learning or from moving to other machine learning methods.”
Developers, Choose Wisely: a Guide for Responsible Use of Machine Learning APIs
Don’t use machine learning APIs blindly, especially if they are black boxes
minGPT
“A PyTorch re-implementation of GPT training. minGPT tries to be small, clean, interpretable and educational, as most of the currently available ones are a bit sprawling. GPT is not a complicated model and this implementation is appropriately about 300 lines of code”
InvoiceNet
Deep neural network to extract intelligent information from invoice documents.
Taming the Tail: Adventures in Improving AI Economics
AI has enormous potential to disrupt markets that have traditionally been out of reach for software. These markets – which have relied on humans to navigate natural language, images, and physical space – represent a huge opportunity, potentially worth trillions of dollars globally.
sktime
sktime is a Python machine learning toolbox for time series with a unified interface for multiple learning tasks
pumas.ai
Powerful tools and services for pharmaceutical innovation and healthcare delivery, driven by scientists.
Stopping deepfake news with an AI algorithm that can tell when a face doesn’t fit
A new artificial intelligence technique can automatically detect face-swapped videos of politicians
Generative Bad Handwriting
Show this tweet to a doctor
Gigapixel AI Accidentally Added Ryan Gosling’s Face to This Photo
Word to the wise, if you’re using Gigapixel AI to upscale your landscape or cityscape images—and we have to admit it usually works really well—you may want to uncheck detect faces…
DuckDB
DuckDB is an embeddable SQL OLAP database management system
Deequ – Unit Tests for Data
Deequ is a library built on top of Apache Spark for defining “unit tests for data”, which measure data quality in large datasets
Norfair
Norfair is a customizable lightweight Python library for real-time 2D object tracking.
Eiten – Algorithmic Investing Strategies for Everyone
Eiten is an open source toolkit by Tradytics that implements various statistical and algorithmic investing strategies such as Eigen Portfolios, Minimum Variance Portfolios, Maximum Sharpe Ratio Portfolios, and Genetic Algorithms based Portfolios. It allows you to build your own portfolios with your own set of stocks that can beat the market. The rigorous testing framework included in Eiten enables you to have confidence in your portfolios.
How to win Kaggle competitions with Anthony Goldbloom
Plus which jobs we should be worried about losing to AI in the next few decades.
octo
Expose data from any database as a web service
handcalcs
Python calculations in Jupyter, as though you wrote them by hand.
Fixing the analytics models that COVID-19 broke
COVID-19 has upended life as we knew it, and our new behaviors are wreaking similar havoc on some analytics models that rely on historical data. How can leaders enable teams to get analytics back on track?
einops
Flexible and powerful tensor operations for readable and reliable code. Supports numpy, pytorch, tensorflow, and others.