Joining me on this week’s episode of Ethical Voices is Chris Penn, the co-founder and chief data scientist for Trust Insights, which he founded after leaving SHIFT Communications. We discuss PR ethics and artificial intelligence. He’s an authority on analytics, digital marketing and marketing technology. Frankly, he is one of the most insightful communications professionals I know when it comes to technology, AI and analytics.
In this interview, he discusses:
- What to do when your personal ethics conflict with your company’s ethics
- Ethical issues PR people will face with artificial intelligence (AI)
- What questions communications pros should ask their data scientists
- The scary thing about FaceApp people have overlooked
Why don’t we start off by telling us a little bit more about yourself and your career?
I am originally an IT person. I graduated from Boston University with a masters in information systems management, and I worked in IT until I got to a startup in 2003 in financial services, where I was the CIO, the CTO, and the guy who cleaned the restroom on Fridays.
Over that early decade, technology became marketing, and marketing adopted technology, and I found myself in marketing… where the parties are better. I had probably one of the earliest marketing podcasts, Marketing Over Coffee, which is still on the air.
And then, right around 2010, I jumped full time into marketing. I really started digging into data analysis, analytics and things, and then went to SHIFT for a while, and started building out my personal data science and machine learning capabilities.
And at a certain point, it was like, “You know what? This is what I want to do. I don’t want to work in public relations. I want to just focus on machine learning and AI and stuff.” And that’s how we got to where we are today.
One of the biggest challenges I actually had was when I was working in financial services. I worked at a company that fundamentally resold, or created and resold, student loans. Our job was to put people in debt.
Banks and lending companies paid thousands of dollars per loan application, particularly for federal student loans, because they were guaranteed by the government. They were super low risk financial contracts that they could then blend into higher risk contracts and sort of create these things that eventually led up to the 2008 recession.
And so, the personal challenge was, okay, I, and this company that I was working for, fundamentally, you’re making the world a worse place. Yes, people are getting access to education, but at an extremely high cost. They may not be able to pay it back. So, the challenge was, how do we balance the business need with the human need?
And what ended up happening was, I went the route of creating content for free. Here’s how you do things like scholarship search. I created a podcast, one of the earliest, called The Financial Aid Podcast. This is 2005. Did 934 episodes of that, 15 minutes a day, every weekday, for five years.
I created seven editions of a scholarship search ebook, which is still mostly relevant today. And so my ethical balance was: I have given you five years of daily information, and seven books on how you can go to college for free. It requires a lot of work to apply for scholarships. You treat it like a full-time job.
If you don’t want to do that work, then here’s a loan application. Now, you understand the trade-off. You can put in the hard work now and not have to pay back the money, or you can take the easy path and then end up having to pay the money. But at least, ethically, I gave people the choice. You can do this without the burden later on if you take on the burden now.
There’s a famous quote from the movie Jurassic Park from Doctor Ian Malcolm’s character, played by Jeff Goldblum, who said, “Your scientists were so preoccupied with whether they could that they never stopped to think about whether they should.” And while he was talking about cloning dinosaurs, that quote equally applies to artificial intelligence, to machine learning and to data science.
Just because you can doesn’t mean you should. And one of the things that I see as a massive gap in our industry is ethical oversight of the machines, and of the people who are creating the code and the frameworks and things for them. There are very few companies that are baking ethics into this, IBM being one of them, and we are an IBM registered business partner.
Fundamentally, people are not asking two questions.
- How could this be misused?
- What could go wrong?
We see this so many times. I was working with a customer not too long ago who sent us de-identified data on their customers, meaning the personal information had been removed. In a second file, they had some healthcare-related data, all de-identified, and in a third file, they had some purchase location or warranty registration data.
Well, guess what? If you bind, and this is something that’s a trivial task in data science, you bind all three files together, you can reverse engineer where a person lives, what their health conditions are, et cetera, et cetera. And so the number one question that we asked, because we were thinking about this, is, how could this be misused?
If this fell into the hands of a hostile actor, competitive company, hacking group, foreign power, whatever. How could this be misused? And the answer is clear. You can reverse engineer someone’s identity and get healthcare information about that person. That is incredibly dangerous.
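The binding step described above can be sketched in a few lines of Python. This is only an illustration, not the customer's actual data or Trust Insights' process: the three tables, the `record_id` key, and every field name and value are invented, and the sketch assumes the files share a common pseudonymous key.

```python
from collections import Counter

# Hypothetical, invented records. None of these fields come from the interview.
customers = [
    {"record_id": "a1", "birth_year": 1975, "gender": "F"},
    {"record_id": "b2", "birth_year": 1988, "gender": "M"},
]
health = [
    {"record_id": "a1", "condition": "asthma"},
    {"record_id": "b2", "condition": "diabetes"},
]
warranty = [
    {"record_id": "a1", "zip_code": "02134"},
    {"record_id": "b2", "zip_code": "01701"},
]


def bind(*tables, key="record_id"):
    """Merge rows from several tables that share the same key value."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[key], {}).update(row)
    return merged


def count_unique_profiles(profiles, fields):
    """Count profiles whose combination of `fields` appears exactly once."""
    combos = Counter(tuple(p[f] for f in fields) for p in profiles.values())
    return sum(1 for n in combos.values() if n == 1)


profiles = bind(customers, health, warranty)

# Each merged profile now pairs a health condition with a location.
for profile in profiles.values():
    print(profile)

# Profiles that are unique on these coarse fields are candidates for
# re-identification against any outside list that has the same fields.
unique = count_unique_profiles(profiles, ["birth_year", "gender", "zip_code"])
print(unique, "of", len(profiles), "profiles are unique")
```

Running a check like this before sharing files is one rough way to answer "How could this be misused?" with a number rather than a guess.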
One of my concerns, personally, is seeing lots and lots of people going through these crash courses in data science, crash courses in AI, six-week crash courses in AI from Coursera or Udemy or whoever. They’re getting a very specific, narrow skill set, usually a type of coding, like how you use TensorFlow, or Keras, or whatever.
But they’re getting none of the experience in data science. None of the ethics of data science. None of the, “Gosh, should we do this? Is this a good idea? How could this be misused? If someone were to testify against us in court, what would this look like? If this information or this database or this algorithm or a model leaked out, what would the consequences of that be?” None of those questions are being answered today.
I always say, “Do you know what your data scientists are doing?” Because I think that is an area of significant exposure for companies.
It’s not just that. If you want to have a fun time, go onto YouTube. Search for Def Con 19, which is 2011, and there’s a speaker, Jayson Street, I think. His talk was something like “Steal Everything, Kill Everyone, Cause Total Financial Ruin.”
And it’s a 45 minute video of him reviewing all of the things that he was paid to do, as sort of like a red team consultant, to essentially break into places and take advantage of the things that everybody does in their company. Like he said, “The number one thing I do is I just walk around every printer and pick up sheets of paper people leave everywhere.”
He’s like, “Yeah, anything you could want just comes right out of that nice recycling blue bin. And even for those offices that are super secured and the premises are locked, what do they do? 5:00 PM every day, they put the little blue bin outside their door.”
So it’s not just your data scientist. It’s everybody, from literally the janitor to the CEO, who has an obligation to understand what information and data you have, who has access to it, and how it’s being controlled.
What questions should communications professionals ask data scientists about how they are using AI in addition to, how can this be misused?
There are two questions that indicate the level of sophistication of a real data scientist. Number one is somebody asking about the business requirements of the project. What is the business outcome that you’re looking to engineer? What are the consequences? What is the chain of evidence from the data to the KPI to the goal? So, the outcome, the reported measurement.
If you’re not being asked about the why, and you’re only getting questions about the how (how do you want to do this?), you’re dealing with somebody who’s not a very experienced data scientist.
The second question, and the place where I see data science projects go off the rails all the time, is in data requirements gathering.
This is when you and the data scientists sit down to say, “Okay, what data do you have? What data do you need? Where can we go get it from, and then what is the governance of that data?” And it is that last part that, again, everybody who is inexperienced in the field misses. How are you going to govern this data? Who has access to it, how should it be secured, what formats is it already in, and so on and so forth?
And it is that management of data that is the differentiator between someone who is probably just a developer who took a data science course versus someone who has grown up working with data and knows the ways that your project is most likely going to go wrong.
That’s one of the things that I talk about in a lot of the keynotes I give now. It’s like, “In your first AI project, these are the four things that you are almost certainly going to do wrong. So I’m just telling you now so that later on, when you make those mistakes, you’re like, ‘Oh yeah, Chris warned me about that.’”
You’re talking about safeguarding and governance and compliance. Should consumers have any expectation of privacy moving forward?
No. And the consumer shouldn’t have any expectation of privacy now.
I agree with you and I think we’re going to give away more of our privacy when it comes to biometrics in the coming years, as well.
Absolutely. So, take a look at this. For example, the big topic that people were discussing for a couple of weeks was the whole FaceApp thing.
A lot of the commentary around that focused on the wrong things, and not on what you can do with a single image. What are the pieces of data that are encoded in an image? You can infer, with reasonable probability, a person’s age, gender, ethnicity.
You can see how much stress they have in their life based on the patterns of wrinkles and lines on their face. You can guess their general health. Depending on the type of photo, you can even infer the type of socioeconomic background they come from, because there are certain telltale signs, like dental work, that indicate a person’s status growing up through life.
Because people who come from more affluent backgrounds tend to have better dental work. So all of these things can be inferred from just one photo, and people are like, “Well, you know, how much damage can they do with that?” Well, it’s not the one photo.
It’s now, with a good machine learning model, if I can infer age, gender, ethnicity, and I have your location, and I have your socioeconomic background, and I have how much stress you have in your life, I could create a segment in a marketing database for advertising purposes to target only angry, white, middle-aged men and show them political propaganda to manipulate an election.
So all of that comes from just a single photo. But again, no one was asking the question, “What can go wrong with this specific piece of data?”
What other questions should people be asking about when it comes to AI and machine learning?
As a consumer, one of the things that you should be constantly asking yourself is, what is the minimum requirement of truthfulness? One of the things that I teach my kids is, there are levels of honesty that are required in life. What is the minimum level of honesty?
When you go to a strange website that you don’t know and it asks you all these questions, like your date of birth and stuff like that, the minimum level of honesty for that site is zero. You don’t have to tell them anything correct at all.
Tell them you’re 99 years old. Tell them that you’re of Zimbabwean origin. Tell them that you happen to be 18 feet tall, and as a result, you still get access to whatever the service is. But there’s no commitment of honesty. However, on the other end, if you’re trying to get, say, a passport from the U.S. Government, the minimum level of honesty is high.
You have to tell the truth, because they will do extensive checking. So teaching people of all ages to think about, what is the minimum level of honesty I need to have here in order to access the services that I want and not trade my personal information? Understanding that I have to give some information, but there’s no obligation that it has to be correct.
So what’s that minimum level of honesty that corporations should provide?
That is a very good question. From the data science perspective, the number one question that data scientists should be pushing back on marketers with is, well, what do you need this information for? How useful is this information? Why do we need to collect something which just becomes a security risk?
One of the things I see marketers do the most wrong is they collect a ton of data and never use it. They never model it. They never build from it, and as a result, they have all this stuff which is a massive security risk, and get no benefit from it. There’s a massive opportunity cost, but no ROI.
So the question that every marketer should be asking themselves is what is the minimum level of data we need to be effective? One of the axioms of marketing data science and marketing AI is that the more granular a piece of information, the less useful it is for modeling.
Knowing someone’s gender is a useful thing to model. Knowing that someone’s first name is Mark, not that helpful. Knowing the year somebody was born could be useful. Knowing the day, the month and the year they were born? Not that helpful. So you collect less data, but you collect data that is useful for modeling.
You will be better off, you’ll reduce the security risks, and frankly, you’ll make it easier on the consumer, because it’s less information they have to hand over.
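As a rough illustration of collecting less but more useful data, the sketch below keeps only the coarse fields mentioned above (gender and birth year) and drops the granular ones (first name and full date of birth). The record layout and the `minimize` helper are hypothetical, made up for this example.

```python
from datetime import date


def minimize(record):
    """Keep only coarse fields useful for modeling; drop granular ones."""
    return {
        "gender": record["gender"],
        "birth_year": record["birth_date"].year,  # year only, not day or month
    }


# Hypothetical form submission; the values are invented.
raw = {
    "first_name": "Mark",
    "gender": "M",
    "birth_date": date(1980, 5, 17),
}

stored = minimize(raw)
print(stored)  # {'gender': 'M', 'birth_year': 1980}
```

Whatever never gets stored cannot leak later, so a step like this lowers the security risk without taking away the fields that tend to matter for modeling.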
You’ve talked a lot about the fake information and the bad data. So how do you see communicators thriving in this new era of fake data and fake news? I’ve worked with some of the most shorted companies in America, and with the ability for AI to potentially churn out fake data, how are we going to prepare ourselves to thrive in this new era?
I blogged about this the other day. The number one thing a marketer and a communicator needs to understand is that your brand is, first and foremost, a symbol of trust. Brand is trust. If you don’t understand that, you need to exit the communications profession right now. Perhaps take up cooking.
Brand is trust, which means that your obligation is not to be the curator or the gatekeeper of information, or even the conveyor of information. Your obligation is to be the guarantor, the validator of information. Whatever you are, whatever function you’re in, your goal is to be the most trustworthy authority, so that when fake news of some kind happens, people know who to phone or text.
If you can be that trusted resource to your customers, to the media, to the influencers that are out there, you will do really well. The challenge is, trust takes, appropriately, a long time to build, and it is very, very, very fragile.
So you, and everybody who works with you, have to be stewards of that trust and make it one of the first guiding principles of your communications. The New York Times is not in the information business. It is not in the news business. It is in the trusted news business. And there’s a big difference between the two.
Thinking back to your career development, what is the best piece of ethics advice you were ever given?
Tell the truth. That’s it. Tell the truth. Because with AI and machine learning and the ability for our systems to infer, you will not get away with falsehoods for long. You will be found out, and with relatively few exceptions, there are always consequences for being found out. And for those exceptions, it’s just a matter of time before it catches up with them.
Is there anything else I didn’t ask you about that you think is important to share?
I would reiterate, you need to have a checklist of those questions. You should always be asking about your data. And if you want to do some more reading on this, a fantastic, totally free book from Dr. Hilary Mason and Mike Loukides is called Ethics and Data Science. It’s on Amazon for the grand sum of $0.
If you are doing work in data, I think it’s mandatory reading. And even if you’re not doing work in data, you should still read it, so that you know what questions to ask and you have some starting checklists to work with.
And if people have questions to ask you, and want to know more about Trust Insights, where should they go?
Go to trustinsights.ai. That’s where you can find us and ask questions, and even feel free to join our free Slack group if you want to have conversations with us.
What are you blogging about there?
Let’s see. Right now, we’re in the middle of a blog series on the first things that will go wrong with all of your AI projects and stuff. And I have a personal blog over at christopherspenn.com, where I do daily videos answering people’s questions, like “What’s your go-to process for analyzing business data?” So I try to answer people’s interesting questions all the time.