On machine learning, job security, professional pride, and network trolling

If you are a data scientist, I am sure you have wondered whether deep neural networks will replace you at your job one day. Every time I read reports of researchers who managed to trick neural networks, I wonder whether the researchers were thinking about their job security or their professional pride while performing the experiments. I think that the first example of such a report is a 2014 paper by Christian Szegedy and his colleagues called “Intriguing properties of neural networks”. The main goal of this paper, so it seems, was to peek into the black box of neural networks. In one of the experiments, the authors designed minor perturbations of the original images, invisible to the human eye. These perturbations diminished the classification accuracy of a trained model.

In the recent post “5 Ways to Troll Your Neural Network”, Ben Orlin describes five different ways to “troll a network”.

Image credit: Figure 5 from “Intriguing properties of neural networks”.
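To get a feel for how a tiny perturbation can flip a prediction, here is a minimal sketch in plain Python. It is not the method of the Szegedy paper: it uses the simpler gradient-sign idea on a toy linear classifier, and the 100-pixel "image", the weights, and the step size `eps` are all made-up values chosen for illustration.

```python
def score(weights, bias, pixels):
    """Linear score: positive means class 1, otherwise class 0."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def predict(weights, bias, pixels):
    return 1 if score(weights, bias, pixels) > 0 else 0


def sign(value):
    return (value > 0) - (value < 0)


def perturb(weights, pixels, label, eps):
    """Shift every pixel by at most eps in the direction that works
    against the current label. For a linear score, the gradient with
    respect to the input is simply the weight vector."""
    push = -1 if label == 1 else 1
    return [p + push * eps * sign(w) for w, p in zip(weights, pixels)]


# Hypothetical 100-pixel "image" and hand-picked weights (assumptions).
N_PIXELS = 100
weights = [0.5 if i % 2 == 0 else -0.5 for i in range(N_PIXELS)]
bias = 0.0
image = [0.52 if i % 2 == 0 else 0.48 for i in range(N_PIXELS)]

label = predict(weights, bias, image)
adversarial = perturb(weights, image, label, eps=0.03)
largest_change = max(abs(a - b) for a, b in zip(adversarial, image))

print("original prediction:   ", label)
print("adversarial prediction:", predict(weights, bias, adversarial))
print("largest pixel change:  ", round(largest_change, 3))
```

In this toy setup the prediction flips from class 1 to class 0 although no pixel moves by more than 0.03. Many small, coordinated changes add up, which is the intuition behind such "invisible" perturbations.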

Interactive Network Visualization in Python with NetworkX and PyQt5 Tutorial

Unfortunately, there is no widely accepted, ready-to-use, standard way to interactively visualize networks in Python. The following post shows yet another attempt to build an ad hoc app.

Sonia Kopel

My boss came to me the other day with a new type of project. This time we would not be doing our usual predictive modeling in R, but instead we would be solving a graph theory problem… and we would be doing it in Python.

Our end goal was to create a visualization of a network that a user could click on that would do the following things: display the immediate subgraph of a selected node, display the shortest path between two selected nodes, and display the most likely path between two selected nodes. We decided the best approach would be to start with a small test network, and set up a graphical system that would visualize the network and allow for user interactivity.
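The three queries listed above can be sketched without any GUI. The snippet below is only an illustration in plain Python (standard library only, not NetworkX or PyQt5, and not the code from the original post). The edge list is hypothetical, and it assumes that "most likely path" means the path with the highest product of edge probabilities.

```python
import heapq
import math
from collections import deque

# Hypothetical (origin, dest, probability) rows; not the data from the post.
EDGES = [
    ("home", "office", 0.3),
    ("home", "gym", 0.7),
    ("gym", "office", 0.9),
    ("gym", "home", 0.1),
    ("office", "cafe", 0.5),
    ("office", "home", 0.5),
    ("cafe", "office", 1.0),
]


def build_graph(edges):
    """Return {origin: {dest: probability}} for a directed network."""
    graph = {}
    for origin, dest, probability in edges:
        graph.setdefault(origin, {})[dest] = probability
        graph.setdefault(dest, {})
    return graph


def immediate_subgraph(graph, node):
    """All edges that start or end at `node`."""
    outgoing = [(node, dest) for dest in graph[node]]
    incoming = [(origin, node) for origin in graph if node in graph[origin]]
    return sorted(set(outgoing + incoming))


def walk_back(previous, target):
    """Rebuild a path from the `previous` links, or None if unreachable."""
    if target not in previous:
        return None
    path = [target]
    while previous[path[-1]] is not None:
        path.append(previous[path[-1]])
    return path[::-1]


def shortest_path(graph, source, target):
    """Fewest hops, found with breadth-first search."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in graph[current]:
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return walk_back(previous, target)


def most_likely_path(graph, source, target):
    """Highest product of probabilities: Dijkstra on -log(probability).
    Assumes every probability is in the range (0, 1]."""
    cost = {source: 0.0}
    previous = {source: None}
    heap = [(0.0, source)]
    while heap:
        current_cost, current = heapq.heappop(heap)
        if current_cost > cost[current]:
            continue  # stale heap entry
        for nxt, probability in graph[current].items():
            new_cost = current_cost - math.log(probability)
            if new_cost < cost.get(nxt, math.inf):
                cost[nxt] = new_cost
                previous[nxt] = current
                heapq.heappush(heap, (new_cost, nxt))
    return walk_back(previous, target)


graph = build_graph(EDGES)
print(immediate_subgraph(graph, "gym"))
print(shortest_path(graph, "home", "office"))
print(most_likely_path(graph, "home", "office"))
```

With these made-up numbers the two path queries disagree: the shortest route from home to the office is the direct edge, while the most likely route goes through the gym. That difference is why the two are worth displaying separately.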

The test network featuring my boss’s daily movements was a .CSV and is shown below:

Networks network

The origin and dest columns represent the nodes (places my boss goes) connected by…

Another set of ruthless critique pieces

You know that I like reading a ruthless critique of others’ work. I like telling myself that by doing so I learn good practices (in reality, I suspect I’m just a case of what we call in Hebrew שמחה לאיד, the joy of someone else’s failure).

Anyhow, I’d like to share a set of posts by Lior Pachter in which he calls bullshit on several reputable people and concepts. Calling bullshit is easy. Backing it up with arguments is not. Lior Pachter worked hard to justify his opinion.

Unfortunately, I don’t publish academic papers. But if I ever do, I will definitely want Prof. Pachter to read them and let the world know what he thinks about them, for better or for worse.

Speaking of calling bullshit: believe it or not, the University of Washington has a course with this exact title. The course is available online at http://callingbullshit.org/ and is worth watching. I watched all the course’s videos during my last flight from Canada to Israel. The featured image of this post is a screenshot of this course’s homepage.

We’re Reading About Simplifying Without Distortion and Adversarial Image Classification

Weekly reading list from the data.blog team

Data for Breakfast

Boris Gorelik

Recently, I heard an interview with Desmond Morris, the author of The Naked Ape. He reveals that the goal of his writing has always been to “simplify without distortion.” This interview reminded me of the (EXCELLENT) blog “Math with Bad Drawings” by Ben Orlin, a math teacher from Birmingham, England. On his blog, Ben Orlin does exactly that: he simplifies without distorting. I highly suggest following this blog. At the bare minimum, read the latest post, “5 Ways to Troll Your Neural Network.”

Do you know of other blogs that educate readers about mathematics, statistics, machine learning, and other, related fields? Share your favorites in the comments.

Carly Stambaugh

In image classification tasks, an adversarial example is one that has been altered in small ways which are imperceptible to the human eye, in a targeted manner with the intention of “fooling” the classifier. Over at…

Good information + bad visualization = BAD

I was going through my Machine Learning tag feed when I stumbled upon a pie chart that looked so terrible, I was sure the post would be about bad practices in data visualization. I was wrong. The chart was there to convey some information. The problem is that it is bad in so many ways. It is very hard to appreciate the information in a post that shows charts like that, especially when the post talks about data science, a field that relies so much on data visualization.

via Math required for machine learning — Youth Innovation

I would write a post about good practices in pie charts, but Robert Kosara of https://eagereyes.org does this so well that I don’t think I need to reinvent the wheel. Pie charts are very powerful in conveying information. Make sure you use this tool well. I strongly suggest reading everything Robert Kosara has to say on this topic.

What are the best practices in planning & interpreting A/B tests?

Screenshots of the reading mentioned in this post

Compiled by my teammate Yanir Seroussi, the following is a reading list on A/B tests that you should read even if you don’t plan to perform an A/B test anytime soon. The list is Yanir’s. The reviews are mine. Collective intelligence in action 🙂

  • If you don’t pay attention, data can drive you off a cliff
    In this post, Yanir lists seven mistakes that are common to any data-based analysis. At some point, you might think that this is a list of trivial truths. Maybe it is. The fact that Yanir’s points are trivial doesn’t make them less correct. Awareness doesn’t exist without knowledge. Unfortunately, knowledge doesn’t assure awareness. Which is why reading trivial truths is a good thing to do from time to time.
  • How to identify your marketing lies and start telling the truth
    This post was written by Tiberio Caetano, a data science professor at the University of Sydney. If I had to summarize this post in a single phrase, it would be “confounding factors”. A confounding variable is a variable hidden from your eye that influences a measured effect. For example, you start an ad campaign for ice cream, your sales go up, and you conclude that the ad campaign was effective. What you forgot is that the campaign started at the beginning of the summer, when people start buying more ice cream anyway.
    See this link for a detailed textbook-quality review of confounding variables.
  • Seven rules of thumb for web site experimenters
    I read this review back in 2014, shortly after it was published by, among others, researchers from Microsoft and LinkedIn. Judging by the title, one would expect yet another list of trivial truths in a self-promoting product blog. This is not the case here. In this paper, you will find several real-life case studies, many references to marketing studies, and no advertising of shady products or schemes.
  • A dirty dozen: Twelve common metric interpretation pitfalls in online controlled experiments
    Another academic paper by Microsoft researchers. This one lists a lot of “don’ts”. As in the previous paper, every piece of advice the authors give is based on established theory and backed up by real data.
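The ice cream example from the list above can be put into numbers. This is only a toy sketch: the seasonal baselines and the "true" ad effect are invented, and the model is deliberately simplistic (sales are just the seasonal baseline plus the ad effect).

```python
# Invented numbers: weekly ice cream sales per store.
SEASON_BASELINE = {"spring": 100, "summer": 150}
TRUE_AD_EFFECT = 10


def weekly_sales(season, sees_ad):
    """Toy model: seasonal baseline plus the ad effect, if any."""
    return SEASON_BASELINE[season] + (TRUE_AD_EFFECT if sees_ad else 0)


# Naive before/after comparison: the campaign starts with the summer,
# so the season (the confounder) is mixed into the estimate.
naive_estimate = weekly_sales("summer", True) - weekly_sales("spring", False)

# A/B comparison: both groups are measured in the same season,
# so the seasonal effect cancels out.
ab_estimate = weekly_sales("summer", True) - weekly_sales("summer", False)

print("naive before/after estimate:", naive_estimate)
print("A/B estimate:               ", ab_estimate)
```

The naive comparison credits the campaign with 60 extra sales, of which 50 are just the summer. Comparing a group that sees the ad with a group that doesn't, during the same period, recovers the 10 that the campaign actually adds in this toy model.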

How to make a racist AI without really trying (a reblog)

Perhaps you heard about Tay, Microsoft’s experimental Twitter chat-bot, and how within a day it became so offensive that Microsoft had to shut it down and never speak of it again. And you assumed that you would never make such a thing, because you’re not doing anything weird like letting random jerks on Twitter re-train […]

via How to make a racist AI without really trying — ConceptNet blog

Data Science or Data Hype?

In his blog post Big Data Or Big Hype?, Rick Ciesla asks whether the “Big Data” phenomenon is “a real thing” or just hype. I must admit that, until recently, I was sure that the term “Data Science” was hype too: an overbroad term for various engineering and scientific activities. As time passes, I become more and more confident that Data Science is maturing into a separate profession. I haven’t yet decided whether the word “science” is fully appropriate in this case.

We have certainly heard a lot about Big Data in recent years, especially with regards to data science and machine learning. Just how large of a data set constitutes Big Data? What amount of data science and machine learning work involves truly stratospheric volumes of bits and bytes? There’s a survey for that, courtesy of […]

via Big Data Or Big Hype? — VenaData