Guest View: OLAP + OLTP = …PostgreSQL?

by Jeremy


Those features include parallel query processing, columnar storage (which greatly reduces the time required to scan and filter query data), and more mature query planning, optimization, and execution, which give you faster performance out of the box with less tuning.

Within the last 18 months, these technical shortcomings have largely been addressed. The most recent release of PostgreSQL, version 13, included improvements in parallel processing and query planning. More importantly, PostgreSQL, thanks to its wonderful extensibility and enthusiastic community, now benefits from a number of extensions that add further parallel processing and columnar storage. So technically, we now see PostgreSQL performing on par with most commercial data warehouse systems, but at 50% to 90% lower annual cost.

Trend three: Goodbye OLAP, hello HTAP
The definition of a data warehouse is changing. Traditionally, an analytic system would copy data from one or more sources, via extract-transform-load (ETL) programs, into a separate data warehouse database. The problems with this approach are a) the cost of maintaining a separate database, b) reporting latency, and c) the massive effort of maintaining ETL scripts. Minutes, hours, or even days can pass before the data is ETL'd into the data warehouse, which means people end up analyzing data that may already be outdated.

The solution to these problems is hybrid transactional/analytical processing (HTAP). Instead of having separate databases for transactions and for analytics, you have a single database that serves both needs. In the database management system market, we see two main paths to HTAP: you either add some "T" to an OLAP database or, more commonly, add some "A" to an OLTP database.

In PostgreSQL, for instance, HTAP is primarily enabled by columnar indexes. Some of the extensions I mentioned earlier enhance PostgreSQL with columnar indexes on its base tables, which accelerate query performance by greatly reducing I/O. This is much easier for developers, and it gives businesses the ability to analyze data that is fresher, even in real time.
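To make the idea concrete, here is a minimal sketch of keeping transactional and analytical workloads in one PostgreSQL instance. It assumes the Citus extension (version 10 or later), whose `columnar` table access method provides column-oriented storage; the table and column names are hypothetical, and other extensions expose columnar storage through their own syntax.

```sql
-- A sketch, assuming the Citus extension is installed (hypothetical schema).
CREATE EXTENSION IF NOT EXISTS citus;

-- Row-oriented table: handles the OLTP side (inserts, updates, point lookups).
CREATE TABLE orders (
    id         bigserial PRIMARY KEY,
    customer   int,
    amount     numeric,
    created_at timestamptz DEFAULT now()
);

-- Column-oriented table with the same shape: handles the OLAP side.
-- Analytic scans read only the columns a query references, cutting I/O.
CREATE TABLE orders_columnar (LIKE orders) USING columnar;

-- An aggregate over one column touches a fraction of the data
-- that a full row-store scan would.
SELECT date_trunc('day', created_at) AS day, sum(amount)
FROM orders_columnar
GROUP BY day;
```

Because both tables live in the same database, the analytic copy can be refreshed continuously rather than through a batch ETL pipeline, which is what narrows the freshness gap the previous paragraphs describe.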


