Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- The Brutal Fight to Mine Your Data and Sell It to Your Boss
Silicon Valley makes billions of dollars peddling personal information, supported by an ecosystem of bit players. One of them, an upstart called HiQ, is going up against LinkedIn in a battle for your lucrative professional identity.
- The Ivory Tower Can’t Keep Ignoring Tech
“These days, big data, artificial intelligence and the tech platforms that put them to work have huge influence and power. Algorithms choose the information we see when we go online, the jobs we get, the colleges to which we’re admitted and the credit cards and insurance we are issued. It goes without saying that when computers are making decisions, a lot can go wrong.”
- How Facebook’s Oracular Algorithm Determines the Fates of Start-Ups
“The platform is so good at ‘microtargeting’ that many small e-commerce companies barely even bother advertising anywhere else.”
- What do China’s police collect on citizens in order to predict crime? Everything
“These databases can scoop up everything from addresses, to medical history, supermarket membership, and delivery records. Data analytics and cloud computing allow security bureau authorities to then look for patterns in personal data far more efficiently than was possible even a few years ago, and make predictions.”
- Machine learning: a different (dashboard) light on the Paradise Papers
Interesting column by our former colleague, Dr. Véronique Van Vlasselaer.
- This AI Can Spot Art Forgeries by Looking at One Brushstroke
A new system can break a work down into individual brush or pencil lines and figure out the artist behind it.
- A Year After Pledging Openness, Apple Still Falls Behind On AI
A year ago, Apple pledged it would engage more with the academic research community. But today, AI experts say it’s still not enough.
- Improving TripAdvisor Photo Selection With Deep Learning
“We could ask property owners to rate photos, select the main photo for their listing, and tag photos by their scene type. We could hire an army of photo moderators to tag, rank, and select photos, but that would be slow and expensive. Instead, we’ve found that Deep Learning networks, trained on fast GPU hardware, were surprisingly good at improving our photo selections.”
- How China’s meshing ride-sharing data with smart traffic lights to ease road congestion
Chinese ride-hailing giant Didi Chuxing is lending its data to authorities as part of a new initiative to ease traffic congestion.
- This AI Learns Your Fashion Sense and Invents Your Next Outfit
A new kind of AI system could create personalized clothing based on a shopper’s taste.
- A neural algorithm for a fundamental computing problem
“Flies use an algorithmic neuronal strategy to sense and categorize odors. Dasgupta et al. applied insights from the fly system to come up with a solution to a computer science problem. On the basis of the algorithm that flies use to tag an odor and categorize similar ones, the authors generated a new solution to the nearest-neighbor search problem that underlies tasks such as searching for similar images on the web.”
- A Visual Guide to Evolution Strategies
It’s all about escaping those local optima.
- MSpaint to Terrain Map with GAN
Interesting project using GANs to convert sketches to terrain maps.
- Augmented reality with Python and OpenCV
Very interesting post on getting started with OpenCV and AR.
- Understanding LSTM and its diagrams
At first sight they look intimidating, but LSTM diagrams are a powerful way to communicate neural network architectures.
- A Beginner’s Guide to Optimizing Pandas Code for Speed
“It’s true that your Pandas code is unlikely to reach the calculation speeds of, say, fully optimized raw C code. However, the good news is that for most applications, well-written Pandas code is fast enough; and what Pandas lacks in speed, it makes up for in being powerful and user-friendly.”
- IBM’s Schneier: It’s Time to Regulate IoT to Improve Cyber-Security
In a keynote address at the SecTor security conference, IBM Resilient Systems CTO Bruce Schneier makes a case for more regulatory oversight for software and the internet of things.
- Modern Media Is a DoS Attack on Your Free Will
How the attention economy is subverting our decision-making and our democracy.
- The Girl with the Brick Earring
Combining one part web scraping, one part color theory, one part pandas, and many Lego bricks.
- Evolving Stable Strategies
“In this article, I will explore applying ES to some of these RL problems, and also highlight methods we can use to find policies that are more stable and robust.”
- Datasette: instantly create and publish an API for your SQLite databases
“I just shipped the first public version of datasette, a new tool for creating and publishing JSON APIs for SQLite databases.”
- Trying to organize my Twitter timeline, using unsupervised learning
Using web scraping and DBSCAN, with an interesting Julia implementation.
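The fly-inspired nearest-neighbor search in the Dasgupta et al. item can be made concrete with a small sketch. Assuming the general scheme described for the fly circuit (a sparse random projection into many more dimensions, followed by a winner-take-all step that keeps only the most active units as a binary tag), similar inputs end up sharing many active units, so tag overlap can stand in for similarity. The `FlyHash` class, the `nearest` helper, the mean-centering step and all parameter values below are hypothetical choices for illustration, not the paper's code.

```python
import numpy as np


class FlyHash:
    """Sketch of a fly-inspired similarity hash (hypothetical helper).

    Each of the m output units sums a small random subset of the d inputs
    (a sparse binary random projection that expands d -> m dimensions);
    a winner-take-all step then keeps only the k most active units as a
    binary tag. All parameter defaults are illustrative assumptions.
    """

    def __init__(self, d, m=2000, sample=6, k=32, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        # Sparse binary projection matrix: every row has exactly `sample` ones.
        self.proj = np.zeros((m, d))
        for row in self.proj:
            row[rng.choice(d, size=sample, replace=False)] = 1.0

    def hash(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        # Center each vector (an assumption) so tags reflect its shape, not its offset.
        Xc = X - X.mean(axis=1, keepdims=True)
        act = Xc @ self.proj.T                          # shape (n, m)
        # Winner-take-all: indices of the k largest activations in each row.
        top = np.argpartition(act, -self.k, axis=1)[:, -self.k:]
        tags = np.zeros(act.shape, dtype=bool)
        np.put_along_axis(tags, top, True, axis=1)
        return tags


def nearest(db_tags, query_tags):
    """For each query tag, the index of the database tag sharing the most active units."""
    overlap = query_tags.astype(int) @ db_tags.T.astype(int)  # shape (q, n)
    return overlap.argmax(axis=1)


# Usage: hash 100 random 50-dimensional vectors, then look up a noisy copy of item 7.
rng = np.random.default_rng(1)
db = rng.normal(size=(100, 50))
fh = FlyHash(d=50)
db_tags = fh.hash(db)
query = db[7] + 0.01 * rng.normal(size=50)
print(nearest(db_tags, fh.hash(query)))
```

Unlike classic random-projection hashing, which compresses to fewer dimensions, this scheme expands to more dimensions and relies on sparsity of the resulting tag.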
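Both evolution strategies items (the visual guide and “Evolving Stable Strategies”) build on one core loop: sample random perturbations of the parameters, score each one, and move toward the better-scoring ones without computing gradients by backpropagation. Below is a minimal sketch of a basic isotropic Gaussian ES with a fixed noise scale; the `simple_es` name, the reward standardization, the hyperparameters and the toy objective are illustrative assumptions, and this is not CMA-ES or any other specific variant from those articles.

```python
import numpy as np


def simple_es(f, x0, sigma=0.1, lr=0.01, popsize=50, iters=300, seed=0):
    """Maximize f with a basic isotropic Gaussian evolution strategy (sketch).

    Each iteration samples `popsize` Gaussian perturbations of the current
    parameters, scores them with f, and steps along the reward-weighted
    average of the perturbations: a Monte Carlo estimate of the gradient
    of a Gaussian-smoothed version of f.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        eps = rng.standard_normal((popsize, x.size))
        rewards = np.array([f(x + sigma * e) for e in eps])
        # Standardize rewards so the step size does not depend on the scale of f.
        std = rewards.std()
        adv = (rewards - rewards.mean()) / std if std > 0 else np.zeros(popsize)
        grad = eps.T @ adv / (popsize * sigma)
        x += lr * grad
    return x


# Toy objective (hypothetical): a single peak at (3, -2).
target = np.array([3.0, -2.0])
best = simple_es(lambda p: -np.sum((p - target) ** 2), x0=np.zeros(2))
print(best)
```

The noise scale `sigma` sets how widely each generation samples around the current point. Because the update follows the gradient of a Gaussian-smoothed objective, a larger `sigma` can carry the search across narrow local optima, at the cost of precision near a peak.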
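The boxes and arrows in an LSTM diagram map onto a short set of equations. As a rough companion to the diagram post, here is a NumPy sketch of one forward step of a standard LSTM cell; the `lstm_step` name, the stacked weight layout (forget, input, candidate, output) and the random weights are assumptions made for this example, not the post's notation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, b):
    """One forward step of a standard LSTM cell (sketch).

    W has shape (4 * hidden, input + hidden) and b has shape (4 * hidden,);
    the four row blocks hold the forget, input, candidate and output weights.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[:hidden])                  # forget gate: how much of c_prev to keep
    i = sigmoid(z[hidden:2 * hidden])        # input gate: how much new content to write
    g = np.tanh(z[2 * hidden:3 * hidden])    # candidate values for the cell state
    o = sigmoid(z[3 * hidden:])              # output gate: how much of the cell to expose
    c = f * c_prev + i * g                   # new cell state
    h = o * np.tanh(c)                       # new hidden state
    return h, c


# Usage: run a randomly initialized cell over a short sequence of 5 inputs.
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in + n_hidden))
b = np.zeros(4 * n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```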
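The pandas speed-up guide contrasts row-by-row processing with vectorized operations on whole columns. A minimal sketch of that contrast follows, using a haversine (great-circle) distance as the per-row computation; the column names, the reference point and the random coordinates are made up for illustration, and the printed timings will depend on the machine.

```python
import time

import numpy as np
import pandas as pd

EARTH_RADIUS_MILES = 3959


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles; accepts scalars, NumPy arrays or pandas Series."""
    lat1, lon1, lat2, lon2 = map(np.deg2rad, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))


# Hypothetical data: 20,000 random points scattered around New York City.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "latitude": 40.7 + rng.normal(scale=0.1, size=20_000),
    "longitude": -74.0 + rng.normal(scale=0.1, size=20_000),
})
ref_lat, ref_lon = 40.671, -73.985

# Row by row: apply(axis=1) calls the Python function once per row.
start = time.perf_counter()
slow = df.apply(lambda row: haversine(ref_lat, ref_lon, row["latitude"], row["longitude"]), axis=1)
t_apply = time.perf_counter() - start

# Vectorized: a single call on whole columns, executed as NumPy array arithmetic.
start = time.perf_counter()
fast = haversine(ref_lat, ref_lon, df["latitude"].to_numpy(), df["longitude"].to_numpy())
t_vec = time.perf_counter() - start

print(f"apply: {t_apply:.3f}s, vectorized: {t_vec:.4f}s")
```

Both paths compute the same distances; the vectorized one simply avoids a Python-level function call per row.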
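The Twitter-timeline item clusters with DBSCAN, which grows clusters from “core” points that have at least `min_pts` neighbors (counting themselves) within radius `eps`, and labels points reachable from no core point as noise. The post uses Julia; to keep one language across these examples, here is a minimal brute-force Python/NumPy sketch of the algorithm on made-up 2-D points. The `dbscan` name and the parameter values are illustrative, and turning tweets into vectors, which real data would need first, is not covered here.

```python
import numpy as np

NOISE = -1


def dbscan(X, eps, min_pts):
    """Brute-force DBSCAN (sketch). Returns one label per point; -1 marks noise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances: O(n^2) memory, fine for small examples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]  # includes i itself
    is_core = np.array([len(nb) >= min_pts for nb in neighbors])

    labels = np.full(n, NOISE)
    cluster = 0
    for i in range(n):
        if labels[i] != NOISE or not is_core[i]:
            continue
        # Start a new cluster from an unlabeled core point and expand it.
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster                 # border or core point joins the cluster
                if is_core[j]:
                    frontier.extend(neighbors[j])   # only core points keep expanding it
        cluster += 1
    return labels


# Usage: two tight blobs plus one far-away outlier.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(20, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(20, 2))
outlier = np.array([[10.0, -10.0]])
labels = dbscan(np.vstack([blob_a, blob_b, outlier]), eps=0.5, min_pts=4)
print(labels)
```

Unlike k-means, DBSCAN does not need the number of clusters up front and can leave outliers unassigned, which suits messy data such as a timeline of tweets.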