
Are VCs just stabbing around in the Big Data dark? – Part II

This post is a continuation of Part I of this blog, where I covered “Database / Platform” and “Querying / Analytics”.

Business Intelligence

Once we have some kind of standardized SQL-like language for analyzing the data stored in the “big data” databases/platforms (see Part I of this blog), I think a lot of investors are concluding that people will want nice technologies to visualize and interact with that data, ergo, BI companies also make a good investment. I partially agree with this. The specific issue I have is that most of the BI companies getting funded today look exactly the same as all the old BI companies, and they all make the exact same set of claims (that’s a topic for another post, but in short, everyone says they are special because they are “beautiful”, “fast”, “easy”, “no IT”, “cloud”, etc.). Nobody is able to stand out from a messaging point of view since everyone says the same thing, and nobody stands out from a product point of view since there are few real differentiators between any of the products. As a result, I’m not sure why any of those companies is likely to be able to disrupt the existing BI firms.

I do believe, however, that there is huge potential in the BI sector as a result. QlikTech and Tableau were the last vendors who really innovated, and their market caps in the billions show what can happen when you do something special here (I’m always on the lookout for people doing innovative work, so please do ping me if you feel I’m overlooking anyone). But I don’t think it’s fair to say no one is innovating without offering up my own view of where the opportunity in BI lies, so I’ll try to cover quickly where I see the future of BI. First, though, remember:

“An innovation that is disruptive allows a whole new population of consumers access to a product or service that was historically only accessible to consumers with a lot of money or a lot of skill.” – Clayton Christensen, The Innovator’s Dilemma

Despite the marketing hype, BI today is still limited to those with skill, and as a result it is usually rolled out to only a small subset (<20%) of a company. So I personally think the big opportunity lies in making BI as simple to use as Google. Specifically, it should be a combination of Google Search PLUS Google Knowledge Graph. I don’t just mean a search interface to BI (an NLP search interface for a BI tool sounds like a great concept, but is actually terrible when you see it implemented).

[Image: Google Knowledge Graph result for a “forecast” query, showing the weather inline in the search results]

What I mean is a system that understands the actual meaning of the data (i.e. it understands the difference between a customer and a product, and what operations might be relevant to each), as well as the context of who you are and what you typed into the search box. It should work the same way the Knowledge Graph understands that when I type in “forecast 94104”, I want to see the weather for San Francisco right there in my search results… not a link to webpages that contain the keywords “forecast” and “94104”. This type of search has little to do with keywords and a lot to do with semantics, and BI needs the same level of understanding going forward.
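To make the “semantics, not keywords” point a bit more concrete, here is a minimal sketch in Python of how a semantic layer might map a typed query plus user context to a structured analytical request rather than a keyword match. Every entity, intent and rule here is hypothetical, made up purely for illustration; a real system would use a semantic model and machine learning rather than hard-coded rules.

```python
# Minimal sketch: map free-text + user context to a structured analytical
# request instead of matching keywords. All names and rules are hypothetical.
from dataclasses import dataclass

@dataclass
class UserContext:
    role: str    # e.g. "sales_manager"
    region: str  # e.g. "AMER"

@dataclass
class AnalyticalQuery:
    entity: str    # what the question is about (customer, product, ...)
    measure: str   # what to compute
    filters: dict  # inferred from the query text and the user's context

def interpret(query: str, ctx: UserContext) -> AnalyticalQuery:
    """Toy interpreter: recognise meaning (entities, measures, intent),
    not keywords."""
    text = query.lower()
    if "churn" in text:
        # "churn" implies the customer entity and a rate measure,
        # scoped to the asker's region unless they say otherwise.
        return AnalyticalQuery(entity="customer", measure="churn_rate",
                               filters={"region": ctx.region})
    if "forecast" in text:
        return AnalyticalQuery(entity="revenue",
                               measure="forecast_next_quarter",
                               filters={"region": ctx.region})
    # Fall back to a plain search-style request.
    return AnalyticalQuery(entity="unknown", measure="keyword_search",
                           filters={"terms": text})

print(interpret("how is churn trending?", UserContext("sales_manager", "AMER")))
```

The point of the sketch is only that the same two people typing the same words should get different, contextual answers, the way “forecast 94104” does in Google.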

A BI system should also be able to leverage public data sets as easily as private ones. There is a huge amount of data available online these days, and an intelligent system should be able to leverage it automatically, without someone needing to identify the data, “ETL” it into the system, and model how it relates to your existing data sets. Data, whether private or public, should be automatically connected together where it makes sense to do so. Think of a graph of entities rather than tables/views and star schemas (see the sketch below).
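As a rough illustration of “a graph of entities rather than tables and star schemas”, the sketch below links a private CRM record to a public company record automatically, instead of through hand-built ETL and join modeling. The node names, attributes and matching rule are all hypothetical stand-ins for what a real entity-resolution model would do.

```python
# Sketch: nodes are business entities (private or public), edges record how
# and why they were linked. All names and the matching rule are hypothetical.
import networkx as nx

g = nx.Graph()

# Private data: a customer pulled from an internal CRM.
g.add_node("crm:cust-42", kind="customer", name="Acme Corp", source="private")

# Public data: the same company as it appears in an open registry.
g.add_node("open:acme-corp", kind="company", name="Acme Corporation",
           employees=1200, hq="San Francisco", source="public")

# The system, not an ETL developer, proposes the connection (a trivial
# name-containment rule stands in for a real entity-resolution model).
for a, b in [("crm:cust-42", "open:acme-corp")]:
    if g.nodes[a]["name"].lower() in g.nodes[b]["name"].lower():
        g.add_edge(a, b, relation="same_entity", confidence=0.9)

# Private facts can now be enriched with public ones by walking the graph.
for neighbour in g.neighbors("crm:cust-42"):
    print(g.nodes[neighbour]["name"], g.nodes[neighbour].get("employees"))
```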

Note that I said the data would be automatically connected together, which means no modeling/joins/etc. No modeling and no ETL work sounds like a recipe for poor data quality, right? Sort of. I am skipping details for brevity, but I am also suggesting that business analytics has pursued the vision of a “single version of the truth” for a long time, that it’s an unobtainable goal, and that we need to think differently going forward. Chris Anderson describes this best in The Long Tail:

“These probabilistic systems [Google, Wikipedia, etc.] aren’t perfect, but they are statistically optimized to excel over time and large numbers. They’re designed to scale, and to improve with size. And a little slop at the microscale is the price of such efficiency at the macroscale.”

Business analytics has always followed the Britannica model: highly curated to be as accurate as possible, but exceptionally slow to react to changes in the business, and always falling further and further behind. We need to start favoring speed over precision in analytics (which sounds scary to some – e.g. what about my financial reports?). The way we achieve that speed and flexibility is to build highly adaptive systems that are intelligent and can do much of the heavy lifting behind the scenes (hint: through semantics + machine learning, instead of IT users configuring the system). For the first time, we have technology capable of actually pulling all of those things together into a single product, which I believe would completely disrupt the BI market again. I’m sure people have other ideas, but that’s where I see the future of BI.
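To show what “a little slop at the microscale” might look like in practice, here is a small sketch of linking records probabilistically, with a confidence score, instead of waiting for a curated, hand-modelled join. The scoring function and threshold are hypothetical placeholders for the semantics + machine learning that should be doing the heavy lifting.

```python
# Sketch of "speed over precision": accept probable matches with a confidence
# score rather than perfect, hand-curated joins. The string-similarity scorer
# and threshold below are illustrative placeholders, not a real matcher.
from difflib import SequenceMatcher

private_customers = ["Acme Corp", "Globex Inc", "Initech"]
public_companies  = ["Acme Corporation", "Globex Incorporated", "Umbrella LLC"]

def match_score(a: str, b: str) -> float:
    """Cheap string similarity standing in for a learned entity matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

THRESHOLD = 0.65  # some slop at the microscale, speed at the macroscale

for cust in private_customers:
    best = max(public_companies, key=lambda comp: match_score(cust, comp))
    score = match_score(cust, best)
    if score >= THRESHOLD:
        print(f"{cust!r} -> {best!r} (confidence {score:.2f})")
    else:
        print(f"{cust!r} -> no confident match")
```

The trade-off is exactly the one Anderson describes: individual links may occasionally be wrong, but the system connects data immediately and improves as it sees more of it.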

Overall: I think investments in the BI market can make a lot of sense, but only if there is something truly innovative there. This doesn’t mean it has to be the same vision I have for where BI should go next, but if there’s no more to it than another “we make BI beautiful and simple”, then I’d skip it.

In Part III of this blog, I’ll cover the final category: Analytic Applications.

Are VCs just stabbing around in the Big Data dark? – Part I

[Image: Big Data. Source: The Internet (i.e. I don’t remember where I found it)]

For the past few years, we’ve seen one Big Data company after another getting funded. For a while it was my job to stay on top of all the new technologies, but I still do it because I’m interested in the space and where things are going. Unfortunately, I am almost always underwhelmed by what I see getting funded, as there is a significant amount of “me too” out there. It seemed like if you had “Big Data” in your pitch and your founders came from Google, Facebook or a few other companies, you could probably raise a few million. Unfortunately, I believe a lot of these companies lack true enterprise experience and end up as solutions chasing a problem. So what’s going on? Well, I can’t explain the investment rationale behind what has happened (even though I have a few theories), so instead I am going to try to categorize the big data market into a few high-level segments and talk about where I see the over-funded vs. under-funded (i.e. opportunity!) areas.

Database / Platform

Between NoSQL, NewSQL, Hadoop and its affiliated technologies, etc., there is a plethora of products in this category (10gen/MongoDB, DataStax/Cassandra, Couchbase, MarkLogic, Redis, Cloudera and Hortonworks for Hadoop, MapR, Neo4j, Basho/Riak, VoltDB, etc.). And this is a very small list of a few of the more popular ones – there are hundreds more – and that’s without even including the MPP or in-memory crowd. It is easily the most over-funded category of “big data” (IMHO)… although the good news is that the rate of new investment here finally appears to have slowed.

I remember when these technologies first came out, I’d often try them out, or at least try to understand which scenarios each one was best suited for. There were lots of discussions about the CAP theorem then, and how each new product tied back to it. You’d see lots of blog posts/questions/etc. on NoSQL A vs. NoSQL B, or “why I moved from NoSQL A to NoSQL B”, and so on. But after a while, I would (and I think most people would) ignore a lot of the new products coming out, because the amount of differentiation between each new company/product was asymptoting towards zero.

The market is tired of these products. Most people have had their fill of playing with new technologies and want to get back to the business of actually building their own product. As a result, my personal belief is that the success of these companies will now be driven more by network effects than by the value of their underlying technology. Yes, developers want to use the best tools available to them, but the reality is that if there is no community, no information, no tooling, etc. behind something newly built, you probably aren’t going to invest your time and skills in it unless it is massively better (or cheaper) than what’s already out there. If it’s only incrementally better, most will go with the safer “mature” options that already have traction. So I see the established companies continuing to get bigger and bigger from here. Of course, there will always be exceptions, and new technologies can still break through, but the market today is not the same one as in ~2008/2009.

Overall: I’d personally want to see some extremely compelling reason to fund another product in this space right now.

Querying / Analytics

Given the proliferation of database companies, we are beginning to see investment go into the next layers up the stack. That is, with so many databases out there, there has been a dearth of standardization in querying them, so SQL is getting a lot of love again. Further, since many of these databases were designed for transactional processing rather than analytics, we have also seen investment go towards the problem of analyzing “big” data with real-time performance instead of batch analysis. Some of the technologies I put into this category are Impala, Stinger, Spark/Shark, Apache Drill, Platfora (although they span the next category too), Google BigQuery, and Hadapt. The query language and analytical processing problem is a hard and necessary one, and a huge market too, but I’m skeptical about it making a good investment for a startup right now. Let me explain why.

The combination of ever more powerful computing at lower and lower costs means that a single technology stack handling transactions and analytics, batch and real-time, is the direction the market is swimming in. All of the vendors who own the underlying platforms are well aware of the need for querying, analytics and real-time performance, and most of them are furiously working towards solutions. I believe they are the ones who will ultimately dominate, and a startup that tries to do just analytics (as a standalone platform) will have to fight upstream against this market direction.

Anyone who has been around the analytics world also knows how hard it actually is to manage data by moving it between systems (integration, quality, transforms, etc.), which is why getting one platform to do more and more in the same stack is an attractive value proposition. At this point in time, Hadoop in particular has created enough publicity (and value) that you (as NewCo) would have to fight really hard to get mindshare with most managers/IT departments (i.e. you’d have to put a lot of marketing dollars to work). I’d love to be proven wrong on this, as I think this market has huge potential while being under-served by startups (there are tons of existing companies with expensive solutions here, but few startups). But I don’t know that I personally would want to bet on a company in this space unless it had some seriously interesting technology (which I haven’t seen yet).

You could argue: but wait, a startup could still build the best solution for an existing DB/platform instead of rolling a completely new product. Sure it could, but then I would (especially as an investor) be worried about building a company that is completely dependent on somebody else’s platform (a la Zynga on Facebook). It might work for a while, but the platform may release something that changes the game on you. Perhaps in this category you could argue this is the case with Platfora, which built its product exclusively for Hadoop, only to see Impala and Stinger come out. To be fair to Platfora, this is their message on how they see themselves relative to Impala, and nobody “owns” the Hadoop platform, so you have a bit more control than with a truly private platform (or at least a sense of control, anyway).

Overall: I think this market has huge potential, and there aren’t a ton of startups here, but I’m not sure it would be a good area to invest in either. This is because (1) the incumbents are all working towards solutions, (2) it’s not clear which platform to bet on, (3) tying your company to one platform is very risky anyway, and (4) people are unlikely to buy a standalone product if the integrated solution from the DB/platform vendor is already good enough.

In Part II of this blog, I cover Business Intelligence. Part III of this blog will discuss Analytic Applications.