Web Picks (week of 8 June 2020)

Posted on June 21, 2020

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

40 Years on, PAC-MAN Recreated with AI by NVIDIA Researchers
GameGAN, a generative adversarial network trained on 50,000 PAC-MAN episodes, produces a fully functional version of the dot-munching classic without an underlying game engine.
Identifying and mitigating liabilities and risks associated with AI
As AI and machine learning become more widely deployed, lawyers and technologists need to collaborate more closely so they can identify and mitigate liabilities and risks associated with AI.
Why is Artificial Intelligence So Useless for Business?
AI research has structural problems that limit how much it can impact business. But understanding why offers a way to determine what will work and what won’t, and reveals new business opportunities.
DeepFaceDrawing Generates Photorealistic Portraits from Freehand Sketches
A team of researchers from the Chinese Academy of Sciences and the City University of Hong Kong has introduced a local-to-global approach that can generate lifelike human portraits from relatively rudimentary sketches.
Acme: A new framework for distributed reinforcement learning
Acme is a framework for building readable, efficient, research-oriented RL algorithms.
AutoSweep: Recovering 3D Editable Objects from a Single Photograph
This paper presents a fully automatic framework for extracting editable 3D objects directly from a single photograph.
How to Build your own Feature Store
“We have many conversations with companies and organizations who are deciding between building their own feature store and buying one. Given the increasing interest in building feature stores, we thought we would share our experience of building one and motivate some of the decisions and choices we took”
An Introduction to Apache Airflow
Airflow is a platform created by the community to programmatically author, schedule, and monitor workflows.
GPT-3, a Giant Step for Deep Learning And NLP
“A few days ago, OpenAI announced a new successor to their Language Model (LM) – GPT-3. This is the largest model trained so far, with 175 billion parameters.”
GPT-3: a disappointing paper
““GPT-3” is just a bigger GPT-2. In other words, it’s a straightforward generalization of the “just make the transformers bigger” approach that has been popular across multiple research groups since GPT-2.”
Language Models are Few-Shot Learners
“Humans can generally perform a new language task from only a few examples or from simple instructions – something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.”
Self Supervised Representation Learning in NLP
“While Computer Vision is making amazing progress on self-supervised learning only in the last few years, self-supervised learning has been a first-class citizen in NLP research for quite a while. Language Models have existed since the 90’s even before the phrase “self-supervised learning” was termed. The Word2Vec paper from 2013 popularized this paradigm and the field has rapidly progressed applying these self-supervised methods across many problems.”
In Mathematics, It Often Takes a Good Map to Find Answers
“Mathematicians try to figure out when problems can be solved using current knowledge — and when they have to chart a new path instead.”
Penrose: from mathematical notation to beautiful diagrams
“We introduce a system called Penrose for creating mathematical diagrams. Its basic functionality is to translate abstract statements written in familiar math-like notation into one or more possible visual representations.”
How to calculate the alignment between BERT and spaCy tokens effectively and robustly
“Natural Language Processing (NLP) has made great progress in recent years because of neural networks, which allow us to solve various tasks with end-to-end architectures. However, many NLP systems still require language-specific pre- and post-processing, especially in tokenization. In this article, I describe an algorithm that simplifies calculating the correspondence between tokens (e.g. BERT vs. spaCy), one such process.”
Apache Drill is winding down
aitextgen: A robust Python tool for text-based AI training and generation using GPT-2
aitextgen is a Python package that leverages PyTorch, Hugging Face Transformers and pytorch-lightning with specific optimizations for text generation using GPT-2, plus many added features.
Newscatcher: Programmatically collect normalized news from (almost) any website.
Filter by topic, country, or language.
DuckDB is an embeddable SQL OLAP Database Management System
Real time Image Animation
The project is a real-time image animation application built with OpenCV, using the first-order model.
Surfboard: Audio Feature Extraction for Modern Machine Learning
“We introduce Surfboard, an open-source Python library for extracting audio features with application to the medical domain.”
julia as a cli calculator
“I have been using the Julia repl as my day-to-day calculator on the command line for more than a year now. I am surprised that I haven’t seen more posts talking about how powerful and extensible the language is for simple numerical calculations and scripting”