Strata Data Conference is where cutting-edge science and new business fundamentals intersect—and merge. It's a deep dive into emerging techniques and technologies. You'll dissect case studies, develop new skills through in-depth tutorials, share emerging best practices in data science, and imagine the future.
Formerly known as Strata + Hadoop World, the conference was created in 2012, when O'Reilly and Cloudera brought together their two successful big data conferences.
Program Chairs Doug Cutting (Chief Architect at Cloudera and founder of Apache Hadoop), Ben Lorica (Chief Data Scientist, O'Reilly), and entrepreneur Alistair Croll have created a program that covers the entire range of big data tools and technologies. Strata Data Conference covers current hot topics like AI and machine learning, and focuses on how to implement data strategies.
The data industry is growing fast, and Strata Data Conference has grown right along with it. We've added new sessions and tracks to reflect challenges that have emerged in the data field—including security, ubiquitous computing, collaboration, reproducibility, new interfaces, emerging architecture, building data teams, machine data—and much more.
Strata is the largest data conference series in the world, yet it’s kept the informal, collegial spirit that makes it one of the best places to connect and collaborate.
Experience Strata Data Conference
Inspiring keynotes and practical, information-rich sessions, tutorials, and training courses exploring the latest advances, case studies, and best practices
Networking opportunities with thousands of other business leaders, data professionals, designers, and developers
A vibrant "hallway track" for attendees, speakers, journalists, and vendors to debate and discuss important issues
Fun evening events, receptions, and more, giving you more face time with attendees and speakers
What people are saying
If you can only attend one event and want to maintain your edge in big data & machine learning, choose this one.
Val U., Data Engineer at eMAG IT Research
By far the best conference for everything data...from business to technical.
V. Verma, Computer Scientist at Adobe
I got more insights on big data, ML, data science, and AI by attending Strata. This amount of information couldn’t be acquired in 2-4 days any other way.
Krishna Kamisetty, Sr. Software Engineer at Lockheed Martin
An excellent opportunity to learn about emerging technologies, AI, big data, and analytics. Thanks for providing so much learning in one place.
A. Mishra, Solution Architect at Tata Consultancy Services
Conference Chairs
Ben Lorica
Ben Lorica is the chief data scientist at O'Reilly Media, Inc. He has applied business intelligence, data mining, machine learning, and statistical analysis in a variety of settings including direct marketing, consumer and market research, targeted advertising, text mining, and financial engineering. His background includes stints with an investment management company, internet startups, and financial services.
Lean Analytics: Using Data to Build a Better Startup
Zadok Krouz
09/05/2019
Data-driven product development doesn’t have to be scary, and it doesn’t have to be all-in. There are a lot of ways that using data to build and grow your product can add to both the top and bottom line, all without having to hire a data science team.
If an organization isn’t using data to develop and grow its products, then its days are probably numbered.
But while everyone talks about data, rarely does anyone explain what to do with all of it. So companies wind up skipping a data-driven approach because there isn’t any time. Or they use data incorrectly and get a false sense of potential success. Or they go totally data-driven and leave the human element out.
Let’s fix that, because every new product, every new feature, and every growth experiment should be using data to make decisions.
Data Science Is Not Rocket Science
The science of data is made out to be far more intimidating than it really is. It’s also overhyped as a kind of nerd quest that will make selling tons of product as simple as punching a magic algorithm into a computer.
The truth is it’s easier than ever to collect, analyze, and react to data — from sales data to performance data to marketing data.
Get Started: It doesn’t matter where you get your data or how you track it
I’ve been building products with data since 1999 — before Google Analytics, before Hubspot, before anyone had an API. I’ve gotten it pretty much down to a science at this point, and I can get a decent data analysis out of a rock and some string like MacGyver.
Right now, as you’re reading this, I’m using data to build two completely different product lines. One uses a lot of data, the other uses just a little. I’m going to use each of these as examples in the hopes that you can find your happy data place somewhere in the middle.
In the first scenario, let’s call it the “Lite” scenario, all I’ve got is a collection of unrelated web pages that show me basic usage stats. But one of those stats is revenue, and when I have revenue, I have the one single truth. I can trace everything else back to dollars.
I’ve got no APIs, so what I have to do is check in regularly to get daily, weekly, and monthly stats, plus any reference points I want to track for any experiments I want to run. All of this goes into one massive spreadsheet with a dozen tabs.
It’s not automated. At all. But once I have the structure down, all it takes is a few minutes to maintain it. Plus it’s fun. I geek out on this shit.
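To make the Lite setup concrete, here is a minimal sketch of what that weekly log could look like if it lived in a flat file instead of a spreadsheet. The file name, the columns, and the numbers are placeholders invented for illustration, not the actual tabs described above.

```python
# A minimal sketch of the "Lite" weekly log: hand-collected numbers go into a
# flat file, and a couple of lines of pandas turn them into week-over-week deltas.
# The file name and the metric columns are hypothetical placeholders.
import pandas as pd

LOG = "weekly_stats.csv"  # one row per week, filled in by hand on Data Day

# This week's numbers, copied by hand from the various dashboards.
this_week = {"week": "2019-09-02", "visits": 4210, "signups": 118, "revenue_usd": 5600.0}

try:
    log = pd.read_csv(LOG)
    log = pd.concat([log, pd.DataFrame([this_week])], ignore_index=True)
except FileNotFoundError:
    log = pd.DataFrame([this_week])  # first week: start the log

log.to_csv(LOG, index=False)

# Week-over-week change for every numeric column (NaN on the very first week).
print(log.set_index("week").pct_change().tail(1).round(3))
```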
In the second scenario, the “Heavy” scenario, I’ve got a software platform that cost millions of dollars to build and is constantly being developed, upgraded, and maintained by a team of excellent software engineers. Everything is in the cloud, it’s totally flexible, APIs everywhere, and it even has a replicated read-only database that I can hit with SQL in real time and not screw everything up.
In the Heavy scenario, all I need to do is fire up something like TablePlus or SQL Server Management Studio and run stored SQL statements to generate reports.
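For the Heavy setup, the same weekly pull can be sketched as a stored query run against the read replica. The connection string, table, and column names below are assumptions for illustration only, not a real schema.

```python
# A sketch of the "Heavy" version: the weekly report pulled straight from the
# read-only replica with a saved query. Connection string, table, and columns
# are placeholders, not an actual production schema.
import pandas as pd
from sqlalchemy import create_engine, text

# Read-only replica, so a heavy ad-hoc query can't interfere with production writes.
engine = create_engine("postgresql://readonly:secret@replica.example.com:5432/appdb")

WEEKLY_REVENUE = text("""
    SELECT date_trunc('week', created_at) AS week,
           count(*)                       AS orders,
           sum(amount_usd)                AS revenue_usd
    FROM   orders
    WHERE  created_at >= now() - interval '12 weeks'
    GROUP  BY 1
    ORDER  BY 1
""")

with engine.connect() as conn:
    report = pd.read_sql(WEEKLY_REVENUE, conn)

print(report.to_string(index=False))
```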
Data Day!
In both scenarios, I have a weekly Data Day. That’s when I spend an hour or two aggregating all my data and running analysis on every bit of it — which I’ll describe over the rest of the post.
In the Lite scenario, I’m logging into various websites and copying the most recent numbers into the spreadsheet. I don’t get revenue until the end of the month, so I’m doing a lot of extrapolating as to what that dollar figure will be so I can grade performance continually in “real time.” Then I have a monthly Revenue Data Day when I get the revenue numbers.
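That extrapolation is simple run-rate math; here is a sketch with made-up numbers, assuming the rest of the month looks like the part already observed.

```python
# Naive run-rate projection for month-end revenue, used only to grade weekly
# performance before the real number arrives. Inputs are illustration values.
import calendar
from datetime import date

def projected_month_revenue(month_to_date: float, as_of: date) -> float:
    """Assume the remaining days of the month perform like the days seen so far."""
    days_in_month = calendar.monthrange(as_of.year, as_of.month)[1]
    return month_to_date / as_of.day * days_in_month

# e.g. $3,400 booked by September 12 projects to $8,500 for the month.
print(round(projected_month_revenue(3400.0, date(2019, 9, 12))))
```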
In the Heavy scenario, I’m getting revenue in real time, so I get to spend much more time on analysis.
Once I’ve collected all my data, my analysis serves to:
Catch errors — I’m looking for spikes in the data that suggest glitches in the software, the process, or some external market factor I don’t know about yet (a simple version of this check is sketched after this list).
Catch opportunities — I’m looking for patterns that suggest my customers are doing something new and different.
Keep score — I’m comparing incoming data to my previously defined expectations for new products, new features, and any growth experiments that are currently underway.
Make plans — I’m rewriting goals, dreaming up new ideas for the product, and considering new experiments.
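The error-catching pass can be as simple as comparing each metric’s latest value to its trailing average. Here is a rough sketch with an invented weekly log and an arbitrary 30 percent threshold; a real check could be stricter or tuned per metric.

```python
# Crude spike check over a weekly log: flag any numeric column whose latest
# value is far off its trailing mean. Thresholds and data are illustrative.
import pandas as pd

def flag_spikes(log: pd.DataFrame, window: int = 8, tolerance: float = 0.30) -> pd.Series:
    """True for each numeric column whose latest value deviates more than
    `tolerance` (as a fraction) from its mean over the previous `window` rows."""
    numeric = log.select_dtypes("number")
    baseline = numeric.iloc[-(window + 1):-1].mean()   # trailing weeks, excluding the latest
    deviation = (numeric.iloc[-1] - baseline).abs() / baseline
    return deviation > tolerance

# Stand-in for the hand-maintained weekly log.
log = pd.DataFrame({
    "visits":      [4000, 4100, 3900, 4200, 9800],   # the last week is the kind of spike to chase
    "signups":     [110, 118, 105, 120, 122],
    "revenue_usd": [5300, 5600, 5100, 5700, 5800],
})
suspicious = flag_spikes(log)
print(suspicious[suspicious].index.tolist())   # -> ['visits']: worth a Forensics pass
```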
Here’s what I’m doing during Data Day analysis:
Sales Data: Getting more revenue
The first set of data I look at is sales. I total the entire revenue number first to get a sense of whether this was a good week, a bad week, or an inconclusive week. This will color the rest of my analysis, dictating whether I’m looking for problems, opportunities, or both. If any number is way off, I’ll skip down to the end and do Forensics, then come back.
The next thing I want to understand is where those sales came from, so I trace the revenue back as far as I can by using the data to answer these questions:
How many customers were in a position to buy, but ultimately decided not to?
How many customers were in a position to buy, but didn’t get to the offering?
How many customers came into the “store,” but never got into a position to buy?
How many customers were made aware of the store, but never entered?
Then I take all of the breakdown and use it to confirm or adjust my goals for the month, the quarter, and the year.
I take the results of all this analysis to the executive team for discussion. If we need new initiatives or a shift in focus, we use this analysis as our guide.
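Those four questions describe a funnel, and the breakdown is just the drop-off between stages. A sketch with hypothetical weekly counts:

```python
# Funnel drop-off between the four stages described above. Stage names and
# counts are invented for illustration.
funnel = [
    ("aware of the store", 12000),   # saw an ad, an email, a mention
    ("entered the store",   3100),   # landed on the site or app
    ("reached the offer",    900),   # got to a page where they could actually buy
    ("bought",               140),
]

for (stage, count), (next_stage, next_count) in zip(funnel, funnel[1:]):
    rate = next_count / count
    lost = count - next_count
    print(f"{stage:>20} -> {next_stage:<18} {rate:6.1%}  ({lost} lost)")
```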
Performance Data: Increasing revenue and margin at the same time
The next step is to compare productivity against revenue to get performance. I need to confirm where we’re strong, where we’re weak, and that we’re burning efficiently as we grow.
These are like long-term growth experiments, except I’m looking at things we’re already doing and customers we already have. I’m looking for patterns in their usage that give me hints as to which features and which customers we should be focusing on.
This analysis sets up a lot of the Growth Experiments I’ll be going over next.
For example: At Spiffy, my performance data analysis led me to discover that a good number of our customers were declining a suggested and needed upgrade during their service, but then they’d add the same upgrade the next time they booked their service. We were missing some of those upgrades, right? From the customers who simply forgot about it. So that led to an experiment to prompt those customers to add the upgrade when they book their next service so they won’t forget.
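That kind of pattern can be pulled out of service history with a couple of group-bys. The sketch below assumes an invented table layout, one row per service with flags for whether the suggested upgrade was offered and accepted; it is not Spiffy’s actual data model.

```python
# Find customers who declined an offered upgrade on one service but accepted
# the same upgrade on their next booking. The DataFrame layout is assumed.
import pandas as pd

services = pd.DataFrame({
    "customer_id":      [1, 1, 2, 2, 3, 3],
    "service_date":     pd.to_datetime(["2019-06-01", "2019-07-01", "2019-06-10",
                                        "2019-07-12", "2019-06-20", "2019-07-22"]),
    "upgrade_offered":  [True, True, True, True, True, True],
    "upgrade_accepted": [False, True, False, False, True, True],
})

services = services.sort_values(["customer_id", "service_date"])
# Was the upgrade offered and declined on this customer's previous visit?
services["declined_last_time"] = (
    services.groupby("customer_id")["upgrade_offered"].shift(1).eq(True)
    & services.groupby("customer_id")["upgrade_accepted"].shift(1).eq(False)
)

# Declined once, then added the upgrade on the very next booking:
regretters = services[services["declined_last_time"] & services["upgrade_accepted"]]
print(regretters["customer_id"].unique())   # candidates for a "don't forget" prompt
```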
Growth Experiments: Expanding market share by building a better product
The experiment I mentioned just now is one of the most basic I can run. The likelihood is very high that the experiment will succeed, but we’ll test it first anyway because you never know what you’ll find out.
On the other end of the spectrum, I’m also in the middle of testing a new product offering with a large corporate partner that has much broader implications and is a lot trickier. More reward, more risk, same analysis.
I can use the same set of sales and usage data for most of the growth experiments I want to run.
To run these experiments, I narrow down a customer segment, in this case by location. The size of the sample should be small enough not to be painful if we mess up, but large enough to matter. In fact, I’ve already had to add locations because the sample size was too small to reach statistical significance. Then I run the experiment.
I usually check in on growth experiments daily, or even a couple of times a day in the beginning, to surface problems like a sample size that’s too small. Then I try to get to success or failure as quickly as possible.
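Calling success or failure quickly usually comes down to a significance check on the conversion rates of the experiment and control segments. Here is a bare-bones two-proportion z-test with invented numbers; a real experiment might also warrant a proper power calculation up front.

```python
# Two-proportion z-test on conversion counts: control locations (A) vs.
# experiment locations (B). All numbers are made up for illustration.
from math import sqrt, erf

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # normal approximation
    return z, p_value

z, p = two_proportion_z(conv_a=80, n_a=1000, conv_b=110, n_b=1000)
print(f"z={z:.2f}, p={p:.3f} -> {'keep collecting data' if p > 0.05 else 'call the experiment'}")
```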
Marketing Data: Creating new customers using the same value proposition
While all the previous data analysis is about increasing the size of our market, marketing data analysis is about increasing the size of our megaphone to the market.
For the Heavy scenario, we’ve got Hubspot and MailChimp and Google Analytics and all the social media accounts and everything is integrated and automated. In the Lite scenario, I’m just grabbing metrics from the ad server and from Google Analytics and throwing them in one of the spreadsheets. Google’s reports leave a lot to be desired, so I do it myself.
Marketing data analysis is actually just another review of the sales and performance data, but this time I go all the way back to first interaction, or where the customer first became aware of the product offering. I’m looking at impressions, email opens, click-throughs, and conversions, all compared to ad spend. I’m adding the cost of acquiring the customer to the cost of serving the customer.
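The marketing pass reduces to a few ratios per channel: click-through rate, conversion rate, and cost of acquisition against spend. A sketch with placeholder figures; adding the cost of serving each customer on top of CAC gives the full cost per customer.

```python
# Per-channel marketing ratios from spend and funnel counts. All figures are
# hypothetical placeholders, not numbers from the post.
channels = [
    # name,        spend ($), impressions, clicks, conversions
    ("search ads",    2500.0,      180000,   3600,          90),
    ("email",          300.0,       40000,   2200,          55),
    ("social",        1200.0,      220000,   1900,          25),
]

print(f"{'channel':<12} {'CTR':>7} {'conv rate':>10} {'CAC ($)':>9}")
for name, spend, impressions, clicks, conversions in channels:
    ctr = clicks / impressions
    conv_rate = conversions / clicks
    cac = spend / conversions if conversions else float("inf")
    print(f"{name:<12} {ctr:7.2%} {conv_rate:10.2%} {cac:9.2f}")
```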
The main difference with this analysis is that if marketing data is telling me to change something, that usually means changing either the marketing channel or the messaging or both. It’s only in rare cases that marketing data analysis leads to a product change, usually when my customers are confirming a hunch I already had from doing all the prior analysis.
With marketing data, I’m also running experiments, but these are growth hacking experiments, not growth experiments. They’re A/B testing offers, discounts, and messaging. They’re varying the channels, the audience, and the spend. But it all eventually ties back to sales and revenue.
Forensics: Chasing down problems and issues
Forensics is the special projects part of Data Day. When I start to see patterns I like or don’t like, especially ones I don’t like, I pull the corresponding data and comb through it to figure out what to do.
This usually means drilling down to, and eventually combing through, individual records: maybe single transactions, or customers, or product specs, or even code.
For example: In the Heavy scenario, we noticed anecdotally that chargebacks were becoming an issue. We didn’t have any idea where the spike came from, so I spent a couple hours at the end of a data day pulling transaction data that correlated to chargeback data and drilling down into the individual transactions themselves.
Turns out it was a combination of a single product, a single feature, and a couple customers exploiting a loophole. Rather than spend thousands of dollars on sophisticated software to sniff out bad actors with machine learning, we just closed the loophole. Problem solved.
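The drill-down itself is mostly a join and a group-by: tie each chargeback back to its transaction, then see where the disputes concentrate. The tables and columns below are invented for illustration, not the platform’s real schema.

```python
# Join chargebacks to their transactions and group by product, feature, and
# customer to see where the disputes pile up. Data and schema are invented.
import pandas as pd

transactions = pd.DataFrame({
    "txn_id":      [101, 102, 103, 104, 105, 106],
    "customer_id": [7, 7, 8, 9, 7, 8],
    "product":     ["wash", "wash", "detail", "wash", "wash", "detail"],
    "feature":     ["promo_code", "promo_code", "standard", "standard", "promo_code", "standard"],
    "amount_usd":  [29.0, 29.0, 120.0, 29.0, 29.0, 120.0],
})
chargebacks = pd.DataFrame({"txn_id": [101, 102, 105]})

disputed = chargebacks.merge(transactions, on="txn_id", how="left")

summary = (disputed
           .groupby(["product", "feature", "customer_id"])
           .agg(disputes=("txn_id", "size"), dollars=("amount_usd", "sum"))
           .sort_values("dollars", ascending=False))
print(summary)   # the cluster at the top is where to start digging
```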
Thanks, Data Day!
When I think about saving thousands of dollars with a few hours of data analysis, it almost overshadows all the additional revenue these product changes generate without the days and weeks of guesswork trying to figure out how to grow.
Use Data to Build Better Schools
Andreas Schleicher is the head of the Programme for International Student Assessment (PISA) at the Organisation for Economic Co-operation and Development (OECD). His PISA test is used in nearly 70 nations and given to hundreds of thousands of 15-year-olds around the world. A vocal advocate for educational policy changes, Schleicher has helped to translate hard data into usable guidance for those shaping educational policy.
It’s all about what’s measurable, and that applies to the success of school systems as well. In his TED Talk, Use Data to Build Better Schools, Andreas Schleicher describes PISA, a global measurement system that compares student performance around the world and then uses data science techniques to extract valuable insights and send them to schools, helping to improve the systems that need it most. PISA shows how international comparisons have globalized the field of education. In a global economy, success is no longer measured by national improvement alone, but by international ranking. The weakness of traditional evaluation methods is that time in school or the type of degree earned is not a true measure of how well students do with the education they receive, as evidenced by the high numbers of unemployed graduates.
PISA changes this dynamic by directly measuring the knowledge and skills that people possess through an applied-knowledge approach, rather than the reproduction of learned information. It tests students’ ability to extrapolate from what they have learned and apply it to situations they have never encountered, which is much closer to real life after graduation. The value of PISA is in its ability to show countries which school systems are succeeding on the global landscape. This can lead to collaboration, mentoring, and the replication of successful practices. The PISA outcomes show that “data can be more powerful than administrative control or financial subsidy” in creating successful school systems throughout the world.
Use Data to Build a Better CX
At a recent event in New York City, ‘Back to the Future with Digital, Omnichannel Experiences,’ hosted by digital transformation consultancy Marlabs Inc., CX experts discussed how omnichannel organizations should leverage data to transform the way they reach their customers.
Marketers are flush with data on consumer behavior - from their search activity on Google to the videos they watch on YouTube to their daily travel patterns tracked by various GPS apps.
In fact, some are even privy to kitchen table conversations overheard by smart home appliances like Amazon’s Alexa, where a brand mentioned in passing appears in banner ads the next time the name-dropper opens their Internet browser. Despite this repository of empirical customer insights, many organizations hesitate to act upon the data. Why? The simple answer is bias.
Either the data doesn’t square with what the brand has already assumed about its customers, or the inertia required to act on those insights is overpowering. But by ignoring what the data reveals about their customers, businesses miss opportunities to deliver market-proven, customer-centric experiences.
Sadly, Barnes & Noble may have overestimated Amazon’s threat to its market share. In fact, when Barnes & Noble first launched its e-commerce service in the mid-1990s, Amazon investors believed Amazon was doomed to extinction. Amidst the uproar, founder Jeff Bezos convened all 135 of his employees and delivered his famous refrain: “Don’t focus on the competitor; focus on the customer.” Just a few years later, the tables turned, and Barnes & Noble was (and still is) fighting to recapture market share from Amazon.
Some businesses use data to buttress entrenched biases and justify the status quo, said Julie Lyle, an entrepreneur, investor and former CMO who’s consulted for Barnes & Noble, Prudential and Walmart. Lyle likens it to how a drunk person uses a lamppost. “Are you leaning up against [the data] and is it bolstering your argument for what you want to do anyway?” she said. “Or is it truly guiding you forward and helping light and illuminate the decision-making process?”
Marketers have to be opinionated, creative and knowledgeable about their customers, so they are especially prone to confirmation bias. Others are so fixated on pleasing the median customer or felling a competitor that they follow the data blindly by trying to introduce a new product without the requisite inventory or supply chain - or they push for inauthentic rebrands that alienate their core customer base.
“I see marketers fall into that trap where they want so much to satisfy that customer need,” said Lyle. “We all want to be where they want us to be but we just can’t.”
Like many retailers, Barnes & Noble made the fatal mistake of going above its pay grade and competing with Amazon. By acquiring the inventory and real estate of a big-box retailer, the brand went from being the trusted corner bookstore to a superstore staffed by minimum-wage workers rather than knowledgeable booksellers earning $20-30 per hour. Per-unit profit margins on books are small given their high shipping costs, so a big-box business model is risky for a bookseller.
By the time Lyle was hired to revitalize the brand, there was no turning back. “They couldn’t go back to being the corner bookstore competitor because by then they had the mass infrastructure and they had to continue to support it,” said Lyle.
Let the data surprise you
Data can reveal unexpected insights into who your customers are, how they use your product, and how they interact with you at numerous touch points throughout the customer journey. One Marlabs. Inc client, a financial services company based in a developing country, wanted to boost profits on its loans and mortgages.
They started by using analytics to understand who purchased the loans and what they used the money for. In a traditionally male-dominated market where men were the primary breadwinners who made financial decisions for the household, the company was surprised to discover that its core customers were in fact middle-class women.
“Women were out in the workforce earning their own money and they wanted to reinvest it in their family for entertainment, holidays and vacations,” said Chris Clegg, Digital CX Lead of Marlabs. “A lot of women were taking out these loans to take their families on trips.”
Armed with this insight, the company changed tack and pulled its banner ads from financial services websites and instead began advertising on the travel, tourism and events websites used by their newly discovered target market. Within two years, the company racked up a 70 percent year over year boost in revenue because it understood who its customers were, why they were buying, and how to target them.
“This is really a shift in cognitive view,” said Alan Hart, podcast host and public speaker at Marketing Today. “And I think it’s interesting because with a lot of analytical tools that have come on board, it’s incumbent on the person doing the analysis to set the search parameters.” Using AI to iterate and reiterate on processes with data requires a mindset shift among marketers: they have to let the tools search for solutions they never even thought of.
A few years ago, Marlabs had a client that was drilling for oil offshore. When an oil rig goes offline, the company loses a quarter of a million dollars in downtime, so the oil company used AI-driven predictive analytics to help its engineers forecast breakdowns and diagnose what caused them. The AI discovered that the engineers were doing preemptive maintenance too often in an effort to suppress the costs of unscheduled maintenance.
“The AI was able to predict when things were going to break and it found that there were correlations that the engineers weren’t aware of,” said Clegg. “So by finding those correlations, they improved the reliability of the system, reduced the downtime for maintenance and their payback was hundreds of millions of dollars.”
The future of omnichannel and customer data
Research shows that customers are willing to share their data if companies are transparent about what they do with it and actually use the data to better meet customer needs. “The true commerce of e-commerce is convenience,” said Lyle. “That’s what you’re selling online.”
Customers don’t think of targeted advertising, predictive analytics, and multimodal contact centers as “omnichannel,” but they have been primed to expect the seamlessness of omnichannel, and to expect the customer experience and value delivery system to be the same whether they’re buying online or at a brick-and-mortar store.
AI-powered analytics tools help businesses not only visualize the customer journey but also convert those data points into recommendations for improving the customer experience. In fact, the ability of an AI system to say, “Here’s what the data says and here’s what to do about it,” makes it harder for businesses to justify inaction simply because they are biased.
“I think the takeaway from us is don’t go in with a pre-built assumption around what users are doing,” said Clegg. “They had their minds open about what the data could reveal about their customers [...] and by responding to that data in a meaningful way they were able to become more effective at marketing.”
When Lyle was contracted to consult for Prudential’s Asia operations, also in a male-dominated market, the data revealed that the primary purchasers of its insurance policies were, in fact, men aged 45 to 46, but the person driving that purchasing decision was the wife. “When the oldest child hit 12 years of age, that was when the wife began the panic of, I have to buy a car, I have to buy a college fund, what if something happens to the primary breadwinner?” said Lyle. “And she began the nag, and the nag drove the head of the household to sign the insurance policy.”
Prudential shifted 40 percent of its marketing budget towards a series of ten 30-second Prudential-branded music videos on financial literacy targeted at children. The videos were introduced in schools as part of a financial literacy curriculum and aired on TV, and were purposely produced in English for a non-English-speaking market. “We knew that the mothers would sit and watch children’s TV with their kids to learn English as a second language,” explained Lyle. “It was a common shared family experience.”
Prudential Asia went from representing just 23 percent of global revenues to 56 percent in two years, and went from unranked to number 69 among the top 2,000 brands in Asia. “This was at the height of the global financial crisis. And we used AI to do it,” said Lyle. “We used AI to find those true nuggets.”