CrateDB packs NoSQL flexibility, SQL familiarity

The open source database includes Elasticsearch-style full-text search, SQL querying, uncomplicated clustering, and unpack-and-go installation

CrateDB, an open source, clustered database designed for missions like fast text search and analytics, released its first full 1.0 version last week after three years in development.

It's built upon several existing open source technologies -- Elasticsearch and Lucene, for instance -- but no direct knowledge of them is needed to deploy it, as CrateDB offers more than a repackaging of those products.

The database caught the attention of InfoWorld's Peter Wayner back in 2015 because it promised "a search engine like [Apache] Lucene [and 'its larger, scalable, and distributed cousin Elasticsearch'], but with the structure and querying ease of SQL."

The idea is to provide more than a full-text search system. CrateDB's use cases include big data analytics and scalable aggregations across large data sets. It allows querying via standard ANSI SQL, but it uses a distributed, horizontally scalable architecture, so that any number of nodes can be spun up and run side by side with minimal work.

CrateDB gets two major advantages from the NoSQL side. One is support for unstructured data via JSON documents and BLOB storage, with JSON data queryable through SQL as well. The other is support for high-speed writes, which makes the database a suitable target for high-speed data ingestion a la Hadoop.
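To make the "JSON queryable through SQL" idea concrete, here is a minimal Python sketch. It assumes a hypothetical `events` table with a dynamic object column named `payload`, and it assumes the statements are sent as JSON to CrateDB's HTTP `_sql` endpoint; the table, column, and field names are illustrative only and do not come from CrateDB's sample data. The sketch only builds the request body, so it runs without a cluster.

```python
import json

# Hypothetical table: one typed column plus a dynamic object column
# that holds arbitrary JSON documents.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS events (
    id STRING PRIMARY KEY,
    payload OBJECT(DYNAMIC)
)
"""

# Nested JSON fields are addressed with subscript notation inside
# otherwise ordinary SQL. The '?' is a positional parameter placeholder.
SELECT_BY_FIELD = (
    "SELECT id, payload['device']['name'] AS device "
    "FROM events WHERE payload['reading'] > ? ORDER BY id"
)


def build_sql_request(stmt, args=None):
    """Return the JSON body for a POST to the _sql endpoint.

    Whitespace in the statement is collapsed so multi-line SQL
    serializes as a single line.
    """
    body = {"stmt": " ".join(stmt.split())}
    if args:
        body["args"] = list(args)
    return json.dumps(body)


request_body = build_sql_request(SELECT_BY_FIELD, [20.5])
print(request_body)
```

The resulting string could be posted to a node (for example with `urllib.request` against an address such as `http://localhost:4200/_sql`, assuming default settings), and the response would come back as JSON rows.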

But CrateDB's biggest draw may be the setup process and the overall level of get-in-and-go usability. The only prerequisite is Java 8, or you can use Docker to run a provided container image. Nodes automatically discover each other as long as they're on a network that supports multicast. The web UI can bootstrap a cluster with sample data (courtesy of Twitter), and the command-line shell uses conventional SQL syntax for inserting and querying data. Also included is support for PostgreSQL's wire protocol, although any actual SQL commands sent through it need to adhere to CrateDB's implementation of SQL.
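Because CrateDB speaks PostgreSQL's wire protocol, a standard PostgreSQL driver can in principle serve as the client. The sketch below is one way that might look in Python, under several assumptions not stated in the article: the third-party `psycopg2` driver is installed, the PostgreSQL protocol is enabled on the node and listening on port 5432, and a user named `crate` is accepted. The helper names `fetch_rows` and `connect_to_crate` are hypothetical, and driver compatibility may vary by CrateDB version.

```python
# Query against CrateDB's system tables; the SQL itself must follow
# CrateDB's dialect even though it travels over the PostgreSQL protocol.
NODE_QUERY = "SELECT name FROM sys.nodes ORDER BY name"


def fetch_rows(connection, sql, params=()):
    """Run one statement over any DB-API 2.0 connection and return all rows."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql, params)
        return cursor.fetchall()
    finally:
        # Always release the cursor, even if the statement fails.
        cursor.close()


def connect_to_crate(host="localhost", port=5432, user="crate"):
    """Open a connection over the PostgreSQL wire protocol.

    Host, port, and user are assumed defaults; adjust for your cluster.
    """
    import psycopg2  # third-party driver, imported lazily so the module loads without it

    return psycopg2.connect(host=host, port=port, user=user)


# Example usage against a running node (not executed here):
#   conn = connect_to_crate()
#   for (name,) in fetch_rows(conn, NODE_QUERY):
#       print(name)
#   conn.close()
```

Keeping `fetch_rows` independent of the driver means the same helper works with any DB-API connection, which makes it easy to swap clients if one driver sends commands that CrateDB's SQL implementation does not accept.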

CrateDB is one of a flood of recent database products that each target specific pain points: scalability, resiliency, mixing modalities (NoSQL vs. SQL, document vs. graph), high-speed writes, and so on. The philosophy behind such products generally runs like this: Existing solutions are too old, hidebound, or legacy-oriented to solve current and future problems, so we need a clean slate. The open question is whether the benefits of the clean slate outweigh the difficulties of moving to it -- hence, CrateDB's emphasis on usability and quick starts.