Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Lessons from becoming a data-driven organization
The article to read this week: “Organizations across the business spectrum are awakening to the transformative power of data and analytics. They are also coming to grips with the daunting difficulty of the task that lies before them. It’s tough enough for many organizations to catalog and categorize the data at their disposal and devise the rules and processes for using it. It’s even tougher to translate that data into tangible value. But it’s not impossible, and many organizations, in both the private and public sectors, are learning how.”
- Breaking the black box: How Machines Learn to Be Racist
The fourth installment in a series of articles from ProPublica that aims to explain and peer inside the black-box algorithms that increasingly dominate our lives. See also “Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks.”
- The problem with p-values
A solid long read on the problems with significance tests and p-values.
- China’s plan to organize its society relies on ‘big data’ to rate everyone
Imagine a world where an authoritarian government monitors everything you do, amasses huge amounts of data on almost every interaction you make, and awards you a single score that measures how “trustworthy” you are.
- European Machine Intelligence Landscape (direct link to image)
“We @ProjectJunoAI are big fans of landscapes. That’s why we’ve created a machine intelligence landscape focused entirely on Europe.”
- TSFRESH: Automatic extraction of relevant features from time series
Data scientists often spend most of their time either cleaning data or building features. While we cannot change the first thing, the second can be automated. TSFRESH frees the time you spend on building features by extracting them automatically. Hence, you have more time to study the newest deep learning paper, read Hacker News or build better models.
- A look at trends and anomalies in 1.5 million baby names
Interesting analysis and nice presentation of results!
- Building an efficient neural language model over a billion words
Facebook AI Research (FAIR) designed a novel softmax function approximation tailored for GPUs to efficiently train neural network based language models over very large vocabularies.
- What to Know Before You Get In a Self-driving Car
Uber thinks its self-driving taxis could change the way millions of people get around. But autonomous vehicles aren’t anywhere near to being ready for the roads.
- Chatbots with Social Skills Will Convince You to Buy Something
Virtual assistants that can read social cues and nonverbal signals are less jarring—and surprisingly persuasive.
- Neural Enhance
What if you could increase the resolution of your photos using technology from CSI laboratories? Thanks to deep learning, it’s now possible to train a neural network to zoom in to your images at 2x or even 4x. The catch? The neural network is hallucinating details based on its training from example images. It’s not reconstructing your photo exactly as it would have been if it was HD. That’s only possible in Hollywood, but using deep learning as “Creative AI” works and it’s just as cool! Here’s how you can get started…
- Data Recycling
“Data recycling is using data from other contexts to bootstrap your initial statistical models until you can collect live data.”
- Scheduling Spark jobs with Airflow
This post gives a walkthrough of how to use Airflow to schedule Spark jobs.
- Analyzing Clickstream Data with Markov Chains in R
This R package models lists of clickstreams as zero-, first- and higher-order Markov chains.
- DummyRDD: A test class that walks like an RDD, talks like an RDD but is just a list
Helpful library for experimentation and testing.
- A Gremlin Implementation of the Gremlin Traversal Machine
An introduction to Apache TinkerPop and GraphML using Gremlins.
- The fog of data (presentation)
Strategies for finding data where you have gaps.
- Horror imagery generated by artificial intelligence
Since it’s Halloween…