Hearst's VP of Data on Connecting the Data Dots
Rick McFarland likes to compare his work at Hearst Corporation's data services to the creation of the Interstate Highway System. Before the Interstate System, it was a bumpy ride from coast to coast, fraught with dead-ends, detours, and sluggish speeds. Improving the nation's roads had been a long-standing ideal, but it really wasn't until President Eisenhower championed the military and economic value of a robust interstate highway that funding was put in place and action was taken.
McFarland views his mission as Hearst's vice president of data services similarly. "We're trying to take a very complex road system and add in the infrastructure for interstate highways so we can connect Los Angeles to New York faster and more efficiently. You need to create the platforms for people to start to travel on." To create that platform, McFarland is primarily using cloud-based Amazon Web Services (AWS), allowing him to quickly build his highway system. "That's the infrastructure but you also have to create standards: We all have to agree to use cars and follow rules together. Those are the standards and governances."
Since starting with Hearst in 2013, McFarland has worked to create a central repository of data, develop standards, put the proper tools in place to work with the data, and facilitate collaboration between data services and publishing teams and among business units. "Right now, each individual business unit is housing data their own way, storing it in their own databases. The taxonomy that one magazine uses for classification might be a totally different taxonomy from another magazine. So it makes it very hard for you to create connections across the different business units."
Before joining Hearst, McFarland spent five years at Amazon, where he managed analytics teams for the global marketing and Kindle departments. There, McFarland recognized the value of connecting data across an organization in order to build a large-scale data resource that can be used in innovative ways. McFarland was drawn to Hearst by the rich opportunities he sees in the type of data publishers have. "When I started looking at Hearst I noticed they had a lot of really fascinating data sources, and especially 'content data,' which is something I was very interested in at Amazon. I saw this huge plethora of data in very disparate sources, really spread out across the organization in small pockets. I thought to myself, 'What a powerful data resource this could be if Hearst could bring it all together and leverage it at scale.'"
McFarland intends for Hearst's data highways to create greater efficiencies and enable the sprawling organization to better analyze its content and customer data. Hearst CTO Phil Wiser has been a proponent of building up the company's data capability in a way that benefits and enables all the business units, says McFarland, and the urgency around data has CEO-level endorsement.
McFarland emphasizes that many of his plans are still in the early stages and what he lays out in this Q&A is the leading edge of his work and Hearst's overall data strategies. Here he shares his thoughts on healthy data governance, how data can help us "find the customer," and some of the wisdom he gleaned from Jeff Bezos.
What opportunities does data present for the publishing industry?
I think data is a means to an end. What has happened with the publishing industry is that in the 80s and 90s we actually knew our customers very well. There was one channel they used and we knew our subscribers. And with the advent of the computer, that created another channel and people started migrating from the physical paper to online and we "lost the customer" for a little while.
We started having to get data from two different sources and try to collect that on the customer and find them again. Then in 2007 the iPhone came out and the device revolution happened. We had just found the customer online and we suddenly lost them again with all of the mobile products because that data was totally different from what we were used to.
It's what I call "the big data explosion." With every new device and every time there's a new product out there, there's an explosion of data and there's a loss of the customer. We have to find the customer again.
Data is going to continue to grow and we're going to continue to lose the customer, but we have to find ways to find them again and connect all the data dots together. The companies that do that are the ones that are going to win. The Googles, the Facebooks, the Apples -- they're all connecting the data. They're connecting the dots and finding their customers again and finding ways to not lose them as the data gets bigger. Publishers need to do the same thing. The publishers that can see a customer across the plethora of devices and data are the ones that are going to come out on top. That's my philosophy: Find the customer.
How is data at Amazon different from data at Hearst?
Amazon has data on its customers, what they buy, data on recommendations, but it doesn't really have data on what's in the books or what's in the discs or anything like that, which is what publishers actually have. Publishers have subscriber data, but they also have all this other information on content and especially the history of that. That's the asset that publishers have that all other industries are kind of taking from publishers -- with tags, scraping publishers' websites, reusing publishers' content -- which is really the value publishers bring: all the interesting articles you read every day.
So that's what I mean when I say content data. We also have customer data and customer behavior data that's not really linked with the content. And wouldn't it be cool if you could actually link behavioral data with the content and say that "Rick" really likes to read these types of articles, give him more of those, and be able to connect that across television and newspapers?
Do you think publishing is experiencing a "data awakening" of sorts?
Amazon's a dotcom and has built itself up around data and really makes all of its decisions using data. I think publishing grew up in a different environment. I kind of view magazines as blogs -- they started off like a blog, where you have something you're passionate about and you write about it and you're guided by your instincts. And as these blogs mature and you start to formalize them and you start making advertising revenue, it becomes a business and you have to make decisions based on financial reasoning. But then we move into the new world where it's all digital and your business is now dependent on how you use and manipulate data.
There's a huge trend in the industry to start making business decisions using data. However, I think because publishing grew up in more of a creative environment, it has had a little bit more of a challenge moving towards that data-driven decision-making. That's not to say that some of the digital teams aren't. Troy Young has joined as the president of Hearst Magazines' digital media and his team is focused a lot more on data and computer science expertise. Publishers as a whole are starting to infuse data into their decisions.
What are you doing at Hearst to advance the data-driven mindset?
I find a really fun challenge in trying to take business owners that are creative and come at them as an engineer and help infuse data into the creative process. I like that challenge. I think the neat new challenge is data visualization and how to translate data so that people that aren't data-centric can start to use it. That's what's really exciting about the publishing industry: You've got to work a little harder from an analytical person's point-of-view to translate it.
That's been a challenge for me hiring as well. A lot of data science engineers that I work with are very good at analytics and data and computer science, but where they have a challenge is the translation to non-technical people. One of the hardest steps is that last step: taking an analysis and making it useful and not seem like it's so complicated. I like that challenge of bringing technology and data to publishers and editors.
Where do you see an opportunity to use data more on the creative side?
I think data can be used by editors to decide what to write about. When you write an article about Miley Cyrus and you get a lot of hits on it, you start to see there's a trend. Then you can break that article down and figure out what the core topics are behind it. You could end up writing about Miley Cyrus forever, but if you dug into that article and did some text processing and text analytics, you'd realize the core theme of all these articles was something deeper -- a latent concept, such as "young actress having fun." You can start getting ahead of that trend and make decisions using data for what you should write about.
I think a lot of these companies like BuzzFeed and the more nimble companies are using data to make decisions on what to write about. Trying to bring data from our vast data resources at Hearst to help editors in their decision-making is really my challenge.
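As a rough illustration of the text-analytics idea in this answer, here is a minimal Python sketch that looks for terms shared across most high-traffic articles instead of focusing on a single celebrity name. It is not Hearst's method: the headlines, the stopword list, and the function names are all invented for illustration, and a production system would more likely use a proper topic model than simple term counting.

```python
from collections import Counter

# Hypothetical high-traffic headlines, invented for this sketch.
TOP_ARTICLES = [
    "Miley Cyrus dances at a beach party",
    "Young actress spotted having fun at a party",
    "Selena Gomez enjoys a fun night out",
    "Miley Cyrus has fun on a night out",
]

# Tiny illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "at", "on", "the", "has", "out"}


def terms(text):
    """Return the set of distinct non-stopword terms in one article."""
    return {word for word in text.lower().split() if word not in STOPWORDS}


def shared_themes(articles, min_share=0.75):
    """Return terms that appear in at least `min_share` of the articles."""
    doc_freq = Counter()
    for article in articles:
        doc_freq.update(terms(article))  # each term counted once per article
    threshold = min_share * len(articles)
    return sorted(word for word, n in doc_freq.items() if n >= threshold)


themes = shared_themes(TOP_ARTICLES)
print(themes)
```

In this toy data the celebrity name appears in only half of the headlines, while the term that survives the threshold is the one common to most of them, which is the kind of underlying theme the answer describes.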
What is a specific data tool you've implemented to enable this?
We've collected all of Hearst's historical content from newspapers, television, and magazines from 1995 onwards and put it into a central repository in a standardized format. Then we have a content mining tool that we expose internally to editors. So let's say you have two articles that you want to put on Cosmopolitan. We've looked at every article Hearst has written and we can model the behaviors of those articles. You can put the two articles into the tool and it'll come back with a score that says we think this article will create this much traffic on Facebook and this other article will create this much traffic, and you can pick the article with the higher model score.
It's all just model-based of course. I want to make sure the editor still has the ability to gut check it. But sometimes you boil it down to two choices and you want something to break the tie. So we built a model based on the universal corpus [the set of all Hearst articles] that helps predict how that article might perform in terms of Facebook, or Twitter, or on a Cosmopolitan website.
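The tie-breaking workflow described here can be sketched in a few lines of Python. This is only a toy under stated assumptions, not Hearst's actual tool or model: the historical headlines and share counts are invented, and the "model" simply averages the past performance of each word, whereas a real system would be trained on the full corpus with far richer features.

```python
from collections import defaultdict

# Hypothetical history of (headline, social shares); all numbers are invented.
HISTORY = [
    ("ten summer beauty tips", 1200),
    ("summer travel deals", 800),
    ("beauty trends for fall", 1000),
    ("tax filing deadline guide", 200),
]


def tokenize(text):
    return text.lower().split()


def fit_word_scores(history):
    """Average the observed shares of every headline containing each word."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for headline, shares in history:
        for word in set(tokenize(headline)):
            totals[word] += shares
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


def score(headline, word_scores, default):
    """Predict shares as the mean word score; unseen words get `default`."""
    words = tokenize(headline)
    if not words:
        return default
    return sum(word_scores.get(w, default) for w in words) / len(words)


def pick_higher(a, b, word_scores, default):
    """Return the candidate with the higher model score, and that score."""
    score_a = score(a, word_scores, default)
    score_b = score(b, word_scores, default)
    return (a, score_a) if score_a >= score_b else (b, score_b)


word_scores = fit_word_scores(HISTORY)
baseline = sum(shares for _, shares in HISTORY) / len(HISTORY)
winner, winner_score = pick_higher(
    "summer beauty guide", "tax deadline tips", word_scores, baseline
)
print(winner, round(winner_score, 1))
```

The score is meant as a tiebreaker between two candidates, so the editor can still overrule it, as the answer stresses.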
Is it worth making historical content part of your data repository?
I'm a data guy, so I would say yes to anything that grows our first-party database and would make our content minable and searchable in the future. I think it would be really fascinating to be able to read an article and if you're interested in that topic you can kind of mine through the history.
That's what data services does for Hearst: I'm here for teams that want to grow their data and have better access to it, and I will help fund them, I'll help support them, and give them tools -- whatever they need to do that. If that helps grow our data resources in a centralized place, where we can all share and use it in a collaborative manner, I'm all for that.
What are you doing to hire and retrain for greater data intelligence?
At Amazon there was this concept of two-pizza-box teams, where Bezos said you want to prevent groupthink so you don't want a team that you can't feed with two pizzas. There was also the concept of efficiency at Amazon where if you hire the right people that can do lots of different things like a lean startup, you can really be much more agile.
Hearst, as a corporate group, is not really a big government model. It really promotes individual teams to do their own thing and help out where you can and not get in the way. I really like that collaborative support model. You can get a lot done with a small but nimble team. I try to hire individuals that are really capable across the board. They can get into the data but they can also be salesmen and get in front of clients and actually promote stuff. So that's kind of hard because you can typically find people that are one leg of that stool and then if you do find them you have to hire three more people.
The real goal here is to empower the individual teams. I'd rather the magazine team hire a ton of people that are top talent in data and I will pay for infrastructure that's shared across the organization and kind of be the spaceship that they fly in. I want the teams to hire their own people. So I give them the spaceship, they hire the people that work on the spaceship, and we all go to space together.
How do you overcome the data siloing that exists in many publishing companies?
I think that's maybe an easy answer. The reason it ends up the way it ends up is that each individual team builds their own data resources and they don't use a shared platform and a shared resource. I think the number one thing I would do is agree upon a standard throughout the company around your platform. For example, you could say we're all going to use AWS cloud or we'll only use this data resource and we'll store our data in a standard location using these governance rules. It really starts with creating a platform that the individual teams can share.
That's the first step, and then once people are sharing you have to have one person who's a bridge, whose role is not to work within each of the individual groups but to work across them. Kind of a matrix role. And their job is to make sure that as they work on projects that cross the groups, they're the steward for making sure everything fits together. My role at Hearst is to make sure that my team works with at least two business units, and our job is not really to do the business units' work but to make their work more efficient and find ways for the business units to collaborate.
You also need to have some executive sponsorship. You need to have one data steward for a company that has executive sponsorship that can bring it all together. Otherwise you end up with all these little microcosms of data and when you try to leverage your scale it's going to be really hard because you have to find ways to knit it together. You don't want to be ten years down the road and say, "You know what? We should all work together," but your data is so different that you can't.
How does data management play into publishers' core missions?
The goal of publishing is to feed content to a customer and keep that customer interested in what you're feeding them. That requires that you really know them and that you are maintaining a connection with them and making sure they are reading your content. If you have that, the advertising dollars will flow and all the other stuff will happen, but you really need to know your customer and keep an eye on them. Today that requires knowing a lot about data because the customer is everywhere.
I think a lot of publishers are chasing revenue. At Amazon, [Bezos] always said to us, "It's about the customer." That's the number one thing you think about, and all the rest will follow. You don't spend your day asking, "How do I make revenue?" A lot of people are sitting here trying to chase revenue they're losing on the web and that is following the hockey puck rather than getting in front of the puck.
You have to take a step back and ask, "How can I look at my customer in a more holistic way across all of the data resources?" The customer has been exploded into bits of data all across the internet, and you have to collect them back together. That's the real challenge publishers have and I try to get them thinking back to basics.