For the past few years, we’ve seen Big Data company after Big Data company get funded. For a while it was my job to stay on top of all the new technologies, and I still do it because I’m interested in the space and where things are going. Unfortunately, I am almost always underwhelmed by what I see getting funded, as there is a significant amount of “me too” out there. It seemed like if you had “Big Data” in your pitch, and your founders came from Google, Facebook or a few other companies, you could probably raise a few million. I believe a lot of these companies lack true enterprise experience and end up with solutions chasing a problem. So what’s going on? Well, I can’t answer the investment rationale behind what has happened (even though I may have a few theories), so instead I am going to try to categorize the big data market into a few high-level segments, and talk about where I see the over-funded vs. under-funded (i.e. opportunity!) areas.
Database / Platform
Between NoSQL, NewSQL, Hadoop + affiliated technologies, etc., there is a plethora of products in this category (10gen/MongoDB, DataStax/Cassandra, Couchbase, MarkLogic, Redis, Cloudera & Hortonworks for Hadoop, MapR, Neo4j, Basho/Riak, VoltDB, etc.). And this is a very small list of a few of the more popular ones (there are hundreds more), and that’s without even including the MPP or in-memory crowd. It is easily the most over-funded category of “big data” (IMHO)… although the good news is that the rate of new investment here appears to have finally slowed.
I remember when these technologies first came out, I’d often try them out or even just try to understand which scenarios each one was better suited for. There were lots of discussions about the CAP theorem then, and how each new product tied back to it. You’d see lots of blog posts, questions, etc. on NoSQL A vs. NoSQL B, or “why I moved from NoSQL A to NoSQL B”. But after a while, I would (and I think most people would) ignore a lot of the new products coming out, because the amount of differentiation between each new company/product was asymptotically approaching zero.
The market is tired of these products. Most people have had their fill of playing with new technologies, and want to get back to the business of actually building their own product. As a result, my personal belief is that the success of these companies will now be driven more by network effects than by the value of their underlying technology. Yes, developers want to use the best tools available to them, but the reality is that if there is no community, no information, no tools, etc. behind something newly built, you probably aren’t going to invest your time and skills in it unless it is massively better (or cheaper) than what’s already out there. If it’s only incrementally better, most will likely go with the safer “mature” options that already have traction. So I expect the established companies to keep getting bigger and bigger. Of course, there will always be exceptions, and new technologies can still break through, but the market today is not the same one as in ~2008/2009.
Overall: I’d personally want to see some extremely compelling reason to fund another product in this space right now.
Querying / Analytics
Given the proliferation of database companies, we are beginning to see investment go into the next layers up the stack. That is, since there are so many databases out there, there was a dearth of standardization in querying them, so SQL is getting much love again. Further, since many of these databases were designed for transactional processing and not analytics, we have also seen investment go towards the problem of analyzing “big” data with real-time performance instead of batch analysis. Some of the technologies I put into this category are Impala, Stinger, Spark/Shark, Apache Drill, Platfora (although they span the next category too), Google BigQuery, and Hadapt. The query language and analytical processing problem is a hard and necessary one, and a huge market too, but I’m skeptical about it making a good investment for a startup right now. Let me explain why.
With ever more computing power available at lower and lower cost, the market is swimming towards a single technology stack that can do transactions and analytics, batch analysis and real-time, etc. all in one system. And all the vendors who own the underlying platforms are well aware of the need for querying, analytics and real-time performance, so most of them are furiously working towards building solutions. I believe they are the ones who will ultimately dominate, and a startup that tries to do just analytics (as a standalone platform) will have to swim upstream against this market direction.
Anyone who has been around the analytics world should also know how hard it is to actually manage data by moving it between systems (integration, quality, transforms, etc.), which is why getting one platform to do more and more in the same stack is an attractive value prop. At this point in time, Hadoop in particular has created enough publicity (and value) that you (as NewCo) would have to fight really hard to get mind share with most managers/IT depts (i.e. you’d have to put a lot of marketing dollars to work now). I’d love to be proven wrong on this, as I think this market has huge potential while being under-served by startups (there are tons of existing companies with expensive solutions here, but few startups). But I don’t know that I personally would want to bet on a company in this space unless they had some seriously interesting technology (which I haven’t seen yet).
You could argue: but wait, a startup could still build the best solution for an existing DB/platform, instead of rolling a completely new product. Sure they could, but then I would (especially if I were an investor) be worried about building a company that was completely dependent on somebody else’s platform (à la Zynga on Facebook). It might work for you for a while, but the platform may release something that changes the game on you. In this category, you could argue this might be the case with Platfora building their product exclusively for Hadoop, and then Impala and Stinger coming out. To be fair to Platfora here, this is their message on how they see themselves relative to Impala, and nobody “owns” the Hadoop platform, so you have a bit more control than on a truly private platform (or at least you have a sense of control anyway).
Overall: I think this market has huge potential, and there aren’t a ton of startups here, but I’m not sure it would be a good area to invest in either. This is because (1) the incumbents are all working towards solutions, (2) it’s not clear which platform to bet on, (3) tying your company to one platform is very risky anyway, and (4) people are unlikely to buy standalone if the integrated solution from the DB/platform vendor is already good enough.
In Part II of this blog, I cover Business Intelligence. Part III of this blog will discuss Analytic Applications.