One common comment we hear from our customers is “My projects don’t involve search, at least not yet.” But in fact, many seemingly “non-search-based” applications that handle large volumes of data could benefit greatly by using a search engine.
Doug Cutting, creator of Hadoop and Lucene, said, “You know, people today think that search and big data are separate but in two or three years, everyone will wonder why we ever thought that.”
This is surely true.
In part one of this two-part blog series, I will explain why search and big data work so well together. Part two will focus on why search and big data thrive in the same environment.
You Need to Find Stuff
Any organization that has data will eventually need to find things (rows, cells, files) inside that data, so you should load it into a search engine and make it searchable.
Search engines can search over structured content (e.g., tables of data) better than relational databases, and they are typically much faster and more flexible than other searching techniques. You can also search on portions of fields, such as addresses, names, and patient notes, with far better results and at far greater speed than the “LIKE” keyword allows. A “LIKE” pattern with a leading wildcard generally has to scan every row, while a search engine looks each term up in a prebuilt index.
Really, for any sort of search for any sort of data, a search engine is best.
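To make the difference concrete, here is a minimal sketch of the idea behind a search index compared with a “LIKE”-style scan. It is a toy illustration, not how any particular search engine is implemented; the function names (`tokenize`, `build_index`, `search`, `like_scan`) and the sample patient rows are invented for this example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def build_index(rows, field):
    """Build an inverted index: token -> set of row ids containing it."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        for token in tokenize(row[field]):
            index[token].add(row_id)
    return index


def search(index, query):
    """Return ids of rows containing every query token, in any order."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


def like_scan(rows, field, fragment):
    """Rough equivalent of WHERE field LIKE '%fragment%': checks every row."""
    needle = fragment.lower()
    return {i for i, row in enumerate(rows) if needle in row[field].lower()}


# Hypothetical sample data, made up for this sketch.
patients = [
    {"name": "Ann Lee", "notes": "Patient reports mild chest pain after exercise"},
    {"name": "Bob Ray", "notes": "Follow-up for knee pain; no chest symptoms"},
    {"name": "Cy Dunn", "notes": "Routine check, no complaints"},
]

notes_index = build_index(patients, "notes")
by_index = search(notes_index, "chest pain")          # rows 0 and 1
by_scan = like_scan(patients, "notes", "chest pain")  # row 0 only
```

The index lookup touches only the entries for “chest” and “pain”, and it matches both notes regardless of word order. The scan has to read every row and finds only the exact phrase.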
You Have Lots of Data
Search engines are scalable. They can handle huge volumes of data quickly and easily, and they scale better than just about any other type of data access system.
Everything in Big Data today is all about “distributed” this and “sharded” that, i.e., dividing jobs into pieces and spreading them over a cluster.
When I wrote my own search engine in 1993 (RetrievalWare), I also created a distributed, sharded search system. Yes, search engines were running as sharded, distributed clusters of machines long before the Cloud went mainstream and gave rise to the Big Data revolution.
Of course, the reason search engines are so scalable has to do with two things:
- the index structure, which is easily sharded
- the search mechanism, which is easily distributed to large clusters of machines
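The two points above can be sketched in a few lines. This is a simplified, single-process illustration in which threads stand in for machines; the class names `Shard` and `ShardedIndex`, the modulo routing rule, and the sample documents are assumptions made for this example, not a description of RetrievalWare or any other product.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """One slice of the index, holding its own small inverted index."""

    def __init__(self):
        self.index = defaultdict(set)

    def add(self, doc_id, text):
        for token in text.lower().split():
            self.index[token].add(doc_id)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


class ShardedIndex:
    """Routes documents to shards and fans queries out to all of them."""

    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, doc_id, text):
        # Sharding: each document lives on exactly one shard (id modulo N).
        self.shards[doc_id % len(self.shards)].add(doc_id, text)

    def search(self, term):
        # Scatter: every shard searches its own slice in parallel.
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            partials = list(pool.map(lambda shard: shard.search(term), self.shards))
        # Gather: merge the partial result lists into one answer.
        return sorted(doc_id for partial in partials for doc_id in partial)


# Hypothetical sample documents, made up for this sketch.
docs = [
    "big data search",
    "search engine",
    "data cluster",
    "distributed search",
    "sharded index",
    "big data cluster",
]

cluster = ShardedIndex(num_shards=3)
for doc_id, text in enumerate(docs):
    cluster.add(doc_id, text)

data_hits = cluster.search("data")  # [0, 2, 5]
```

Because each shard answers from its own slice of the index, adding shards spreads both the storage and the query work. The only shared step is the final merge.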
You Need Analytics
Most likely, every organization needs some form of business intelligence and business analytics.
What I’m talking about (mostly) are simple things like histograms, counts, sums, averages, etc. The advantage of a search engine is that it can often do this over hundreds of millions (even billions) of records in a second or two, which can be thousands of times faster than the alternatives.
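As a rough picture of what these analytics look like, the sketch below runs a filter and then computes counts, a sum, an average, and a histogram over the matching records. The helper names (`facet_counts`, `stats`, `histogram`) and the order data are invented for illustration. A real search engine computes such aggregations from its index structures rather than by looping over Python dictionaries, which is where its speed comes from.

```python
from collections import Counter


def facet_counts(records, field):
    """Count how many records fall under each distinct value of a field."""
    return dict(Counter(record[field] for record in records))


def stats(records, field):
    """Compute count, sum, and average of a numeric field."""
    values = [record[field] for record in records]
    total = sum(values)
    average = total / len(values) if values else None
    return {"count": len(values), "sum": total, "avg": average}


def histogram(records, field, bucket_width):
    """Group a numeric field into fixed-width buckets keyed by lower bound."""
    buckets = Counter(
        (record[field] // bucket_width) * bucket_width for record in records
    )
    return dict(sorted(buckets.items()))


# Hypothetical sample data, made up for this sketch.
orders = [
    {"region": "east", "amount": 120},
    {"region": "west", "amount": 80},
    {"region": "east", "amount": 45},
    {"region": "north", "amount": 210},
    {"region": "west", "amount": 130},
]

# "Search" step: keep only orders of at least 50.
matching = [order for order in orders if order["amount"] >= 50]

by_region = facet_counts(matching, "region")  # {'east': 1, 'west': 2, 'north': 1}
amount_stats = stats(matching, "amount")      # count 4, sum 540, avg 135.0
amount_hist = histogram(matching, "amount", 100)  # {0: 1, 100: 2, 200: 1}
```

A dashboard is essentially many of these small aggregations, each recomputed whenever the user changes the filter.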
Do you remember “online analytical processing” (OLAP)? Multi-dimensional hypercubes? There is no reason to use RDBMS technologies for any of that anymore. Search engines have made OLAP for business intelligence and business analytics obsolete. They can execute searches for dashboards, business reports, exploratory analysis, online responsive analysis, and self-service analytics much faster and in a much more user-friendly manner than any other technology. Search engines coupled with visualization tools like ZoomData and Banana can provide all (or most) of the graphs, charts, bubble diagrams, etc. that you will ever need.
In the next part of this blog series, I will discuss three more reasons why search and big data are a perfect match.