SQL-powered MapD 3.0 woos enterprise developers

MapD 3.0 appeals to enterprises with native scale-out, high availability, and ODBC connectivity, but hybrid cloud deployments will have to wait

MapD, the SQL database and analytics platform that uses GPU acceleration for performance orders of magnitude ahead of CPU-based solutions, has been updated to version 3.0.

The update provides a mix of high-end and mundane additions. The high-end goodies consist of deep architectural changes that enable even greater performance gains in clustered environments. But the mundane items are no less important, as they’re aimed at making life easier for enterprise database developers—those most likely to use MapD.

Previous versions of MapD (not to be confused with Hadoop/Spark vendor MapR) were able to scale vertically but not horizontally. Users could add more GPUs to a box, but they couldn’t scale MapD across multiple GPU-equipped servers. Version 3 removes that limit: an online demo lets users explore, in real time, an 11-billion-row database of ship movements across the continental United States through MapD’s web-based graphical dashboard app.

Version 3 adds a native shared-nothing distributed architecture to the database—a natural extension of the existing shared-nothing architecture MapD used to split processing across GPUs. Data is automatically sharded in round-robin fashion between physical nodes. MapD founder Todd Mostak noted in a phone call that it ought to be possible in the future to manually adjust sharding based on a given database key.
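To make the sharding idea concrete, here is a minimal Python sketch of round-robin distribution, with a key-based variant of the kind Mostak described as a possible future option. It is an illustration only, not MapD's implementation: the function names, the in-memory lists standing in for nodes, and the CRC32 hash are all assumptions.

```python
import zlib
from typing import Any, Callable, Dict, Iterable, List


def shard_round_robin(rows: Iterable[Any], num_nodes: int) -> Dict[int, List[Any]]:
    """Assign row i to node i % num_nodes (hypothetical helper, not a MapD API)."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    shards: Dict[int, List[Any]] = {node: [] for node in range(num_nodes)}
    for i, row in enumerate(rows):
        shards[i % num_nodes].append(row)
    return shards


def shard_by_key(
    rows: Iterable[Any], num_nodes: int, key: Callable[[Any], Any]
) -> Dict[int, List[Any]]:
    """Assign each row by a stable hash of its key, so equal keys share a node.

    CRC32 is an arbitrary choice made for this sketch; it is deterministic
    across runs, unlike Python's built-in hash() for strings.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    shards: Dict[int, List[Any]] = {node: [] for node in range(num_nodes)}
    for row in rows:
        digest = zlib.crc32(str(key(row)).encode("utf-8"))
        shards[digest % num_nodes].append(row)
    return shards
```

With seven rows and three nodes, `shard_round_robin(range(7), 3)` returns `{0: [0, 3, 6], 1: [1, 4], 2: [2, 5]}`. Round-robin keeps node sizes within one row of each other but scatters related rows, whereas key-based placement keeps rows with the same key together at the cost of possible skew.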

The big advantage to using multiple shared-nothing nodes, according to Mostak, isn’t only a linear speed-up in processing—although that happens. It also means a linear acceleration for ingesting data into the cluster, which is useful in lowering the bar to entry for database developers who want to try their data out on MapD.

Other features in MapD 3.0—chief among them high availability—are what you’d expect from a database aimed at enterprise customers. Nodes can be clustered into HA groups, with data synchronized between them via a distributed file system (typically GlusterFS) and a distributed log (through an Apache Kafka record stream or “topic”).
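The role of the distributed log can be pictured with a small Python sketch: every write is appended to an ordered log, and each member of an HA group replays the log from its own offset until it has caught up. This is a toy in-memory model under stated assumptions, not MapD's code and not the Kafka client API; the `Log` and `Replica` classes are hypothetical.

```python
from typing import Dict, List, Tuple


class Log:
    """Append-only, ordered record stream (an in-memory stand-in for a topic)."""

    def __init__(self) -> None:
        self._records: List[Tuple[str, int]] = []

    def append(self, key: str, value: int) -> int:
        """Store a record and return its offset in the log."""
        self._records.append((key, value))
        return len(self._records) - 1

    def read_from(self, offset: int) -> List[Tuple[str, int]]:
        """Return every record at or after the given offset."""
        return self._records[offset:]


class Replica:
    """A node that rebuilds its state by replaying the log in order."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.offset = 0  # index of the next record this replica has not applied

    def catch_up(self, log: Log) -> None:
        """Apply all unseen records, advancing the offset one record at a time."""
        for key, value in log.read_from(self.offset):
            self.state[key] = value
            self.offset += 1
```

Because every replica applies the same records in the same order, two replicas that have both caught up hold identical state, even if one of them was offline for part of the write stream.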

Another addition aimed at attracting a general database audience is a native ODBC driver. Third-party tools such as Tableau or Qlik Sense can now plug into MapD without the overhead of the previous JDBC-to-ODBC solution.

A hybrid architecture is not yet possible with MapD’s scale-out system. MapD has cloud instances available in Amazon Web Services, IBM SoftLayer, and Google Cloud, but Mostak pointed out that MapD doesn’t currently support a scenario where nodes in an on-prem installation of MapD can be mixed with nodes from a cloud instance.

Most of MapD's customers, he explained, have “either-or” setups—either entirely on-prem or entirely in-cloud—with little to no demand to mix the two yet.

Serdar Yegulalp is a senior writer at InfoWorld, focused on machine learning, containerization, devops, the Python ecosystem, and periodic reviews.