
What You Need to Know

The data warehouse database management system (DBMS) market continues to show intense, increasing competition. Interestingly, 2007 saw the market continuing to embrace the appliance solution: while many organizations are deploying technically elegant solutions, a larger part of the market is following the mantra of "easy to install, rapid time-to-productivity and 80% of needs met." IBM, Oracle and Teradata continue to battle for the largest part of the market with increased marketing and new functionality. Microsoft has entered the fray with a more competitive DBMS, which has seen rapid uptake in midsize businesses. This is especially important, as these businesses will grow into large companies. Although some organizations are using Netezza for larger deployments, it continues to be opportunistic in delivery to organizations needing point solutions for specific analytic or end-user group needs.
The march toward a mission-critical status for the data warehouse continues, with data warehouses serving in an increasingly mixed workload capacity. "Deep mining" analysts, business analysts running less-structured but equally complex queries, and fast-running tactical queries are all competing for CPU, memory and disk access. Each has differing service-level expectations while, at the same time, data latency is changing from batch to continuous demand.
Ignore marketing claims and base your decisions on customer references and proofs of concept to ensure that claims made by vendors will hold true in a real-life environment and, more specifically, in your own environment. Although this is a mature market with the full attention of large vendors seeking to make inroads with scale and innovation, smaller entrants often deliver a more focused, innovative solution (evidenced by the fact that IBM, Oracle and HP have all learned their lesson and deployed an appliance solution of some type).

Magic Quadrant

Figure 1. Magic Quadrant for Data Warehouse Database Management Systems, 2007
Source: Gartner (September 2007)

The data warehouse DBMS market has evolved from a traditional information store supporting business intelligence (BI) users and tools into an analytics infrastructure repository of the enterprise. Organizations are adding workload from online transaction processing (OLTP) applications and are increasing the frequency of data loading, in many cases to the level of, or approaching, continuous loading. The customer base of the entire market is demanding more from the vendors overall, which is leading to an increase in white space in the top right corner of the Magic Quadrant as these needs and requirements are initially not met.
The focus of the market has moved from execution (supporting real-time applications) to a vision of mission-criticality. Mission-critical systems are those that support the generation of revenue or execute cost control in support of business processes such that, if they are unavailable for longer than one hour, they must be replaced by personnel manually performing formerly automated tasks to prevent loss of revenue or unacceptable increased costs. This has resulted in some vendors improving their vision but actually moving left when compared with last year. One example of this vision is moving the computational capability closer to the storage level (for example, vertical DBMSs such as Sand Technology, Sybase IQ and Vertica support efficient sub-selection of stored data and reduced input/output [I/O]). In addition, the size of the database is becoming less important. In the past, buyers believed that the vendor with the largest database was the leader. Today, smaller data warehouses of 5TB to 20TB are commonly solving organizations' analytic needs. Overall, these issues are making it more difficult for IT to meet the customer's vision of data warehouse use today, including the customer's required service-level agreements (SLAs) (see "Mission-Critical Data Warehouses Demand New SLAs").
We continue to see new vendors entering the market and mature vendors offering new solutions. For the coming year, we will be watching several new vendors of DBMS engines that currently meet some, but not yet all, of the criteria required to be included here: EnterpriseDB GridSQL, i-lluminate, ParAccel and Vertica. The lessons learned from DATAllegro, Netezza and Teradata about the advantages of preconfigured, balanced "appliance" solutions (see "Data Warehouse Appliances Are More Than Just Plug and Play") have caught the interest of some more traditional vendors, such as Bull (DataWarehouse Parallel Server using DATAllegro), HP (Neoview), IBM (Balanced Warehouse), Oracle (Oracle Optimized Warehouse for Dell and EMC) and Sun (using the Greenplum DBMS with the Sun Fire X4500 data server). The latter two are examples of a hardware company partnering with a software company to produce a data warehouse appliance.

Data Warehouse Mixed Workloads
The traditional data warehouse workload of queries and reporting is changing to a data warehouse with several distinct workloads:
- Continuous (near-real-time) data loading, which is similar to an OLTP workload (due to the updating of indexes and other optimization structures in the data warehouse) and which forces issues in summary and aggregate management to support dashboards and pre-built reports.
- Batch data loading, which persists as the market matures and begins to realize that not all data requires "right time" latency and that some information, being less volatile, does not need records refreshed as frequently as the more dynamic real-time data elements.
- Large numbers of standard reports, ranging into the thousands per day, requiring Structured Query Language (SQL) tuning, index creation, new types of storage partitioning and other types of optimization structures in the data warehouse.
- Tactical business analytics, in which business process professionals with limited query language experience utilize pre-built analytic data objects with aggregated data (pre-joins) and designated dimensional drill downs (summary). They rely on a BI architect to develop commonly used cubes or tables.
- An increasing number of true ad hoc query users (data miners) with a random, unpredictable use of the data, implying a lack of ability to specifically tune for these queries.
- The use of analytics and BI-oriented functionality in OLTP applications, creating a highly tactical use of the data warehouse as a source of information for the OLTP applications requiring high-performance queries. This is one force driving the requirement of high availability in the data warehouse.
These six workload types are creating more issues for vendors than the actual size of the data warehouse, and the issues manifest even in databases smaller than 1TB. In addition to service-level expectations (see "Mission-Critical Data Warehouses Demand New SLAs"), the size and duration of "useful" data for each community often differ significantly, forcing every aspect of the data warehouse environment to become involved, from I/O channel balancing through disk management and into memory and processor allocation. Through 2010, mixed workload performance will remain the single most important performance issue in data warehousing. As a direct effect of the complex mixed workload, with continuous loading and the increase in automated transactions from the functional analytics in OLTP, the transactional DBMSs may be able to erode the performance edge that was formerly attributed to specialized data warehouse DBMS solutions.

Optimization and Performance
Another effect of this growing complexity and size is that organizations are reporting differing estimates of the resources required to support the enterprise data warehouse and accompanying warehouse-dependent data marts. Some of this variation can be attributed to the mixing of resources required to support the physical warehouse (such as storage management, database reorganization and resource balancing) vs. the resources needed to support logical modeling, business process understanding, and optimization and performance enhancements. However, all data warehouses require optimization techniques. In some cases, combinations of logical designs create physical approaches in the database, such as automated summary-type tables and different types of physical views.
In other cases, optimization requires the design and deployment of physical tables bulk-loaded from extraction, transformation and loading (ETL) processes in micro-batches throughout the day, or even intra-hour. DBMSs formerly classified as "general purpose" (such as DB2, Oracle and SQL Server), which once required manually maintained solutions, can be run with fewer administrative and database administrator (DBA) staff as vendors automate many of the physical DBMS support functions. At the same time, even the specialized data warehouse DBMSs are beginning to require more automated system management. As the size of the "source system extracted" data grows beyond the 5TB to 10TB level, the use of industry-standard best practices in data warehouse design becomes extremely important. The need for additional people and storage (for structures used to tune the data warehouse design) should not be seen as a negative, but as part of the process of increasing the value of information. In fact, optimization techniques and access layers should be viewed as information semantic layers that increase the usability of the data in the warehouse. Multiple application types are accessing the data warehouse, and application optimization should not always be considered a cost of the data warehouse. These applications can issue queries in rapid succession, far faster and with significantly more demand than human end-user queries issued manually. For example, high-level summary tables should be considered an application development burden, as opposed to low-level summary tables that can be used by different applications.
A growing issue has been the amount of storage required to optimize a data warehouse, measured as a factor of the source system extracted data. Because optimization techniques can, and should, be used on all DBMSs, all data warehouses are larger than the originally loaded data volumes. The ratio of optimization storage to source system extracted data volume changes throughout the life of the data warehouse. Initially, data warehouses exhibit a large ratio, both because many different applications drive multiple optimization strategies and because the data warehouse itself is still small, which makes the ratio numerically higher. As the data warehouse ages, the ratio begins to drop because larger detail data volumes do not necessarily require additional optimization space. In addition, as the size of the overall data warehouse grows, the amount of storage for indexes begins to level off, causing the ratio to drop. In all cases, optimization requires diligence on behalf of the DBA staff to manage the storage growth. This includes working closely with the business units to determine the real need for detail records vs. summary records in the data warehouse. Many clients report situations involving detail history stored in the data warehouse for more than 10 years when, in reality, it is needed for only five or fewer. Also, with more organizations deploying data warehouses, there is a greater variance in the skills of the individuals implementing them, and even the best warehouse platform can be negated by an inferior data architecture.
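The way this ratio falls over the life of a data warehouse can be illustrated with a short calculation. The following is a minimal Python sketch: the function name and all of the volumes in `history` are hypothetical figures chosen for illustration, not measurements from any client.

```python
def optimization_ratio(ssed_tb: float, optimization_tb: float) -> float:
    """Storage used by optimization structures (indexes, summaries, cubes)
    per terabyte of source system extracted data (SSED)."""
    if ssed_tb <= 0:
        raise ValueError("source system extracted data must be positive")
    return optimization_tb / ssed_tb


# Hypothetical growth path: (year, SSED in TB, optimization structures in TB).
# Detail data grows quickly while index and summary storage levels off.
history = [(1, 1.0, 3.0), (3, 5.0, 7.5), (5, 20.0, 12.0)]

ratios = [(year, optimization_ratio(ssed, opt)) for year, ssed, opt in history]
for year, ratio in ratios:
    print(f"year {year}: {ratio:.2f} TB of optimization storage per TB extracted")
```

In this made-up example the optimization storage keeps growing in absolute terms, yet the ratio falls because the detail data grows faster.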

New Developments and Best Practices
A recent development (beginning in 2006 and continuing into 2007) has seen the growing popularity of "distributed data warehouses." These utilize an unvaried logical model with multiple physical locations. The data is logically divided into domains (for example, all records for a single region in exactly the same model as the next region, or all customer data in one place and all products in another) and distributed without duplicate records across the various locations. Reasons given for this approach vary from the creation of physical data security zones to global operations requiring 24/7, time-zone-based analytics. This approach should not be confused with a federated design, which combines discrete logical models using translation tables and remains a data warehouse worst practice.
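The distribution rule described above (one logical model, records divided by domain, no duplicates across locations) can be sketched in a few lines of Python. The region codes, site names and record fields below are hypothetical and serve only to illustrate the idea; a real deployment would rely on the DBMS's own distribution mechanisms.

```python
from collections import defaultdict

# Hypothetical mapping of a domain (here, a sales region) to a physical site.
SITE_FOR_REGION = {"EMEA": "frankfurt", "AMER": "dallas", "APAC": "singapore"}


def distribute(records):
    """Assign each record to exactly one site based on its region.

    Every site stores rows in the same logical model (the same fields),
    and no record is copied to more than one location.
    """
    sites = defaultdict(list)
    for record in records:
        sites[SITE_FOR_REGION[record["region"]]].append(record)
    return dict(sites)


records = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "AMER", "amount": 75.5},
    {"order_id": 3, "region": "EMEA", "amount": 42.0},
]
placement = distribute(records)
for site, rows in placement.items():
    print(site, [row["order_id"] for row in rows])
```

A federated design, by contrast, would keep a different logical model at each site and reconcile them through translation tables at query time.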
Performing a proof of concept (POC) with the "shortlist" of vendors during the selection phase of the data warehouse infrastructure has become a best practice. This is especially important when considering one or more of the newer entrants to this market. We recommend POCs utilizing as much real source system extracted data from the operational systems as is reasonable to use. We also recommend performing the POC with as many users as possible, creating a data warehouse workload approaching the environment to be used in production. In addition, we recommend not giving all the sample queries to the vendors ahead of the actual POC. Save some of the more complex queries to use during the POC to be certain the DBMS has not been "pre-tuned" for your queries. Finally, we recommend data loading as part of the POC, even if that is not one of the important requirements of the system. If continuous loading is a requirement, it must be part of the POC, adding to the query and reporting load. If batch loading is desired, then understanding the vendor's requirements and/or restrictions is important, as the size of the window for loading in most organizations is diminishing as the data warehouse becomes 24/7.
The concept of data warehousing as a managed service emerged in 2006, primarily with Kognitio (see "Magic Quadrant for Data Warehouse Database Management Systems, 2006"). Under this model, the DBMS vendor develops and runs the data warehouse for the customer. This began at the business unit level, where the business unit purchased a managed service instead of using the organization's data warehouse, usually with the assistance of the IT organization. Our references normally reported that the IT organization did not have the bandwidth to support the specific application due to overloading of the data warehouse and/or the IT staff. In 2007, we have seen an increase in the use of this model, with one reference using it for the entire organization's data warehouse. Greenplum now has several references also using this model, with Greenplum managing the data warehouse at its site. We expect the use of this model to increase over the next few years, especially for single business units or a specific data warehouse application. Further, we believe this model will develop into a software as a service (SaaS) model over the next few years for organizations in the small and midsize business category that lack the expertise and funds to support their own data warehouse.

Due to the wider acceptance and increasing variety of data warehouse-driven applications and workload variations, data mart proliferation has become a primary concern. A data mart is defined as an application-specific analytic repository of any size, normally with a specific, smaller group of users (see "Of Data Warehouses, Operational Data Stores, Data Marts and Data 'Outhouses'"). An application can be a traditional set of applications (for example, SAP) or a workload-specific application (such as business analysts using a subset of the data for data mining purposes). Data marts can be used to optimize the enterprise data warehouse (EDW) by offloading part of the workload to the data mart, returning greater performance to the EDW.
In addition, over the past year, we have seen a growing number of data marts specifically used for analytics. This is not only due to the workload that analytics can place on the EDW, but also because some of the data warehouse DBMS engines excel in analytic applications. Specifically, the column-oriented DBMS engines such as ParAccel, Sand/DNA Analytics, Sybase IQ analytics server and Vertica have shown superior performance in analytic applications over the more traditional row-based DBMSs. Several references reported an improvement in performance of up to 100 to 1. However, in a complex query workload (for example, one with many columns returned in a SELECT, that is, complete rows containing all the columns, and with complex joins of more than 10 tables in a single SELECT), column-based DBMSs can perform no better, and on occasion worse, than a row-based DBMS. Column-oriented DBMSs are normally not a good choice for an EDW.
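The trade-off between the two storage layouts can be shown with a toy model. The Python sketch below is only an illustration: the table, its field names and the "values read" counter are assumptions standing in for real I/O, and no actual DBMS engine works on in-memory dictionaries like this.

```python
# The same hypothetical table held in two layouts.
rows = [
    {"id": 1, "region": "EMEA", "amount": 120.0, "units": 3},
    {"id": 2, "region": "AMER", "amount": 75.5, "units": 1},
    {"id": 3, "region": "EMEA", "amount": 42.0, "units": 2},
]
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_amount_row_store(table):
    """Sum one column; a row layout still reads every field of every row."""
    values_read = sum(len(row) for row in table)
    return sum(row["amount"] for row in table), values_read


def total_amount_column_store(table):
    """Sum one column; a column layout reads only that column's values."""
    return sum(table["amount"]), len(table["amount"])


def full_rows_column_store(table):
    """Rebuild complete rows; every column must be read and stitched together."""
    names = list(table)
    rebuilt = [dict(zip(names, values)) for values in zip(*table.values())]
    values_read = sum(len(column) for column in table.values())
    return rebuilt, values_read


print(total_amount_row_store(rows))
print(total_amount_column_store(columns))
print(full_rows_column_store(columns)[1])
```

For the single-column aggregate the column layout touches a quarter of the values in this example, while returning complete rows touches all of them and adds the work of reassembling each row, which is consistent with the pattern the references describe.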

Market Definition/Description
The data warehouse DBMS market consists of the vendors supplying DBMS products that provide the database infrastructure of the data warehouse.
For the purpose of this definition, a DBMS is a complete software system that supports and manages one or more logical databases in storage. Enterprise data warehouse DBMSs are those systems that, in addition to supporting the relational data model (extended to support new structures and data types such as materialized views and XML), also support data availability to independent front-end application software, and include mechanisms to isolate workload requirements and control various parameters of end-user access within a single instance of the data. It is important to note that a DBMS cannot "be" a data warehouse. It is the platform on which a data warehouse (solution/data architecture) is deployed. This market is specific to DBMSs that are used as a platform for an enterprise data warehouse.
An enterprise data warehouse is one in which two or more disparate data sources are brought together in an integrated, time-variant repository. Its logical design includes the flexibility to introduce additional disparate data without significant modification of its existing entity design. An enterprise data warehouse can be of any size, though Gartner defines a small data warehouse as less than 5TB; a medium data warehouse as 5TB to 20TB; and a large data warehouse as greater than 20TB. For purposes of measuring the size of a data warehouse database, we define data as source system extracted data, excluding all data warehouse design-specific structures (such as indexes, cubes, stars and summary tables). Source system extracted data is the actual row/byte count of data extracted from all sources.
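The size bands above can be expressed as a small helper. The following is a minimal Python sketch; the function name and the example volumes are hypothetical, and the handling of the boundaries (exactly 5TB and exactly 20TB counted as medium) follows the wording of the definition.

```python
def classify_warehouse(ssed_tb: float) -> str:
    """Return the size band for a warehouse, given its source system
    extracted data (SSED) in terabytes.

    Indexes, cubes, star schemas and summary tables are excluded from SSED,
    so pass only the volume extracted from the source systems.
    """
    if ssed_tb < 0:
        raise ValueError("data volume cannot be negative")
    if ssed_tb < 5:
        return "small"
    if ssed_tb <= 20:
        return "medium"
    return "large"


# Hypothetical warehouse: 12TB on disk, of which 4TB is indexes and summaries.
total_on_disk_tb = 12.0
design_structures_tb = 4.0
print(classify_warehouse(total_on_disk_tb - design_structures_tb))
```

Note that classifying by total disk usage instead of source system extracted data would overstate the size of a heavily indexed warehouse.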

Inclusion and Exclusion Criteria
- Vendors in this market must have DBMS software that has been in general availability for at least one year.
- Vendors must have generated revenue from a minimum of 10 distinct organizations with data warehouse DBMSs in production.
- Customers in production must have deployed enterprise-scale data warehouses that integrate data from at least two operational source systems for more than one end-user community (such as separate business lines or differing levels of analytics).
- Support for these data warehouse DBMS products must be available from the vendor; community-supported open-source software (OSS) products are not included.
- Data warehouse DBMS systems or products that support an integrated front-end tool, but can also open their DBMS to competing applications, are included if access is achieved via open-access technology, as opposed to custom-built application programming interfaces (APIs).
- Vendors participating in the data warehouse DBMS market must demonstrate their ability to deliver the necessary infrastructure and services to support an enterprise data warehouse.
- Vendors must be commonly considered by Gartner clients as an option for supporting an enterprise data warehouse.
- Products that include unique file management systems embedded in the front-end tools, or that exclusively support an integrated front-end tool, do not qualify for this market.

In this version of the Data Warehouse DBMS Magic Quadrant, we have added Greenplum. This vendor was formed by the merger of two DBMS vendors (Metapa and Didera) and has been quietly adding clients on several different hardware platforms for about two years. As mentioned in the 2006 version, it has a massively parallel processing (MPP) DBMS based on the PostgreSQL OSS DBMS and, with its partnership with Sun, has been gaining clients rapidly.


Ability to execute is primarily concerned with the ability and maturity of the product and the organization. These criteria also consider the portability of the product and its ability to run and scale in different operating environments, giving the customer a range of options. This also includes the differentiation between data warehouse DBMS solutions and data warehouse appliances. The ability to execute criteria are critical to the level of satisfaction and success the customer has attained with the product, and so customer references are weighted heavily throughout these criteria.
- Product and service includes the technical attributes of the DBMS. We include scalability, manageability, security, high availability/disaster recovery, support of mixed workloads and data loading. These attributes are measured across a variety of database sizes and workloads. Also, we consider the resources necessary to manage the data warehouse, especially as the data warehouse scales to larger sizes and more complex workloads.
- Overall viability includes the corporate aspects of ability to execute, such as the skill level of the personnel, financial stability, R&D investment, and merger and acquisition activity. This also includes management's ability to be responsive to market changes and, therefore, the ability of the company to survive through market difficulties (critical to the long-term survival of the vendor).
- Under sales execution and pricing, we examine the price and different pricing models of the DBMS, the ability of the sales force to manage accounts and if the sales team is compensated appropriately in line with the corporate marketing initiatives. We also include the channel partnerships here, and the ability of the vendor to create and use the partner model.
- Market responsiveness and track record covers the issue of references (for example, how many, what sizes, what configurations and workload mix). Also included is the ability of the vendor to adapt to market changes and its history of being flexible to market dynamics.
- Marketing execution explores how well the vendor understands and builds its products in response to customers' needs, in addition to targeting offerings to these needs and to the needs of the market in general. This criterion includes the completeness of the vendor's offering as well.
- Customer support and professional services are evaluated as part of the customer experience criterion, together with input from customer references as described earlier. Also included are the track record for proofs of concept and customer perceptions of the product, as well as aspects of customer loyalty to a given vendor. This demonstrates customer tolerance of vendor practices and may indicate satisfaction.
- Operations cover the alignment of the company's operations, as well as whether and how they enhance the ability of the company to deliver.
Table 1. Ability to Execute Evaluation Criteria
Evaluation Criteria | Weighting
Product/Service | high
Overall Viability (Business Unit, Financial, Strategy, Organization) | high
Sales Execution/Pricing | standard
Market Responsiveness and Track Record | high
Marketing Execution | standard
Customer Experience | standard
Operations | low
Source: Gartner

Completeness of vision encompasses the ability of the vendor to understand the functionality necessary to support the data warehouse workload design, the product strategy designed to meet market requirements, and the ability to understand overall market trends and influence or lead the market when necessary. A visionary leadership role is necessary for the long-term viability of the product and the company. A vendor's vision is enhanced by its willingness to extend its influence throughout the market by working with independent, third-party application software vendors who deliver data warehouse-driven solutions (such as BI). A successful vendor will be able not only to understand the competitive landscape of data warehouse, but also to shape the future of this field. However, Gartner clients are cautioned to be wary of vendors with extremely good vision (including the communication of that vision) but with low execution capability. Data warehouses are mission-critical and poor execution will begin to hurt the overall viability of the organization.
- Market understanding covers the ability of the vendor to understand and shape the market and show leadership in the data warehouse DBMS market. In addition to examining the core competencies of the vendor in the data warehouse DBMS market, we also consider the awareness of the vendor of new trends in the market.
- Marketing strategy refers to the vendor's marketing messages and its ability to choose appropriate target markets and third-party software vendor partnerships to enhance the marketability of the product. For example, does the vendor encourage and support independent software vendors (ISVs) in its effort to support the DBMS in native mode?
- An important criterion for vision is the sales strategy. This encompasses all the channels and partnerships developed to assist with sales. This is especially important for younger organizations, allowing them to greatly increase their presence in the market while maintaining a lower cost of sales. This criterion also includes the company's ability to communicate its vision to its field organization and, therefore, to clients and prospects.
- Offering (product) strategy covers the areas of portability and packaging of the products. Vendors must demonstrate a strategy that enables customers to choose what they need to build a complete data warehouse solution. We also consider the availability of the vendor's DBMS as a data warehouse appliance.
- The business model covers how the vendor's model of a target market combines with product offerings and pricing, and whether it has the ability to produce profits with this model based on the packaging and offerings.
- Specifically for the data warehouse DBMS market, we do not believe that vertical industry strategy is a major focus, but it does affect the ability of the vendor to understand its clients. Specific models for the data warehouse, however, belong in a discussion of applications.
- Innovation is a major criterion for evaluating the vision of data warehouse DBMS vendors: developing new functionality, spending on R&D and pushing the market in new directions. This also includes the vendor's ability to innovate and develop new functionality in the DBMS, specifically for the data warehouse. Increasingly, users are expecting the DBMS to become more self-managing and self-tuning, reducing the resources involved in optimizing the data warehouse, especially as the mixed workload increases.
- The worldwide reach of the organization and its geographic strategy are evaluated by considering its ability to leverage the resources in geographic regions, as well as subsidiaries and partners in other geographies. This is becoming increasingly important as regionally distributed data warehouses increase (as discussed in the Market Overview). A vendor's success increasingly depends on its ability to market and support its data warehouse DBMS in a geographically dispersed area, using subsidiaries or distributors. This criterion also includes the ability of the vendor to support clients throughout the world, around the clock, in many languages.
Table 2. Completeness of Vision Evaluation Criteria
Evaluation Criteria | Weighting
Market Understanding | high
Marketing Strategy | standard
Sales Strategy | standard
Offering (Product) Strategy | high
Business Model | standard
Vertical/Industry Strategy | low
Innovation | high
Geographic Strategy | high
Source: Gartner

The Leaders quadrant for data warehouse DBMS contains those vendors that demonstrate the greatest degree of support for data warehouses of all sizes (small to large), with large numbers of concurrent users and a high degree of mixed data warehousing workloads. These vendors lead the market in data warehousing by consistently demonstrating customer satisfaction, strong support and professional services, as well as longevity in the data warehouse DBMS market, with strong hardware alliances. Because of this track record, Leaders also represent the lowest risk for successful data warehouse implementations. Additionally, the maturity of this market demands that Leaders maintain a strong vision regarding the key points emerging during the past year: mixed workload management for end-user service-level satisfaction and data volume management.

The Challengers quadrant typically represents vendors with strong offerings for the client base. They have a market presence in the data warehouse DBMS space but have not yet shown or proved their vision or leadership in the market. Challengers generally have a highly capable execution model. Ease of implementation, clarity of message and end-client engagement all contribute to making these vendors successful. Challengers show a wide variety of data warehousing implementations across different sizes of data warehouses with mixed workloads. Organizations often demonstrate concern regarding these vendors' ability to deliver at the enterprise level in cases where growing data volume or high end-user counts are involved. This quadrant includes offerings with a weaker marketing message whose products nonetheless exhibit the potential to move into the Leaders quadrant by demonstrating strong, new client acceptance.

Visionaries are those vendors that represent a forward-thinking approach to managing the hardware, software and end-user aspects of the data warehouse. Visionaries frequently suffer from a lack of global or even strong regional presence. They normally exhibit a smaller market share. New entrants with exceptional technology may appear in this quadrant very early after their general availability release, but more typically, unique or exceptional technology will emerge in this quadrant after several quarters of general availability. The Visionaries quadrant is often populated by new entrants that have new architectures and functionality that is yet unproven in the market. The requirement for production customers and general availability of at least one year indicates they must be more than a startup with a good idea. Vendors must demonstrate customers in production proving the value of the new functionality and architecture. Frequently, Visionaries will drive the leaders toward new concepts and engineering enhancements.

A Niche Player has low market share or low market appeal. Frequently, a niche player provides an exceptional data warehouse DBMS product, but it is isolated or limited to a specific end-user community, a specific geography or a specific vertical industry. Although the solution itself may be without limitations, market adoption is limited. This quadrant contains vendors in two categories: smaller vendors with data warehouse DBMS products that lack the customer base or smaller vendors with a data warehouse DBMS that lacks the functionality of leaders. They typically offer smaller, specialized solutions that are used for specific data warehouse applications depending on the needs of the client. This quadrant also includes new data warehouse DBMS products that lack general customer acceptance or proven functionality to move beyond niche status. This is the starting point for many new entrants.

Vendor Strengths and Cautions
- DATAllegro continues to demonstrate the ability to support large configurations of data (50TB to 100TB) at price/performance points at the same level as, or below, the competition. DATAllegro's references report excellent performance in a relatively complex workload of queries and reporting.
- DATAllegro has continued to improve its software. It has released version 2 of the DBMS and recently, with version 3, has become more hardware independent, taking advantage of new industry advancements. V3 uses standard Dell and EMC technology, making use of blade technology.
- Growth has been consistent, and the company currently has about 15 to 20 customers. Although it is not growing as fast as the competition, its customers are typically larger data warehouse implementations. The company is also developing several channel partnerships (especially in the telco space), and Bull has just announced its DataWarehouse Parallel Server, utilizing the DATAllegro DBMS on Bull's NovaScale hardware, to be sold in Europe.
- DATAllegro is based on Ingres, an OSS DBMS and one of the original relational DBMS (RDBMS) products of the 1990s. Not only has DATAllegro added value to the Ingres DBMS through many enhancements to the functionality (now supplied as part of the Ingres OSS DBMS), but it has also added a software layer over the top of Ingres, creating the MPP architecture to parallelize queries across the processors and to manage the workload of the appliance.

- DATAllegro is gaining new customers, though at a slower rate than other young companies of the same age. It will need to increase market awareness in 2008 and show additional growth in the customer base to prove its long-term viability. The Bull partnership will certainly assist in Europe.
- Ingres could be an issue with potential customers. Although Ingres has a small but solid revenue stream, it has been struggling to add new customers. In addition, the majority of its customers are OLTP, and even with the enhancements for data warehousing (provided primarily by DATAllegro), Ingres does not have many data warehouse implementations. The worst case scenario is that DATAllegro would have to assume responsibility for development of the core DBMS, making DATAllegro similar to many of its competitors with a proprietary DBMS.
- As with other small, startup vendors, there is additional risk with a decision to use the DATAllegro appliance.

- Greenplum, our newest entrant to the Magic Quadrant, has been quietly adding production clients over the past two years, and now has more than 20. As an MPP data warehouse DBMS based on PostgreSQL, it runs on Linux and Unix. As a stand-alone DBMS engineered for data warehousing, it has demonstrated scalability in production to hundreds of terabytes and internally to over a petabyte (1,000TB). It has also demonstrated the ability to run and manage the mixed workload in a number of references.
- Just over a year ago, Greenplum announced its first partnership with Sun to supply a data warehouse appliance on the x4500 server (see "Sun's Low-Priced DW Appliance Seeks to Disrupt Market"). Not only has this made Greenplum visible as a DBMS, but it has already produced production references. These initial clients report performance equal to Teradata in a live application environment, at a fraction of the cost.
- A strength of Greenplum's is its strategy of selling through partners and system integrators. This relieves it from growing a sales force (an expensive proposition for a small startup vendor) and allows it to concentrate on development and support. As in the case of Sun, the partners provide the balanced, packaged configuration of hardware with the Greenplum DBMS as an appliance offering, along with the single point of service required of an appliance.
- The company's use of an OSS DBMS as the core work engine also helps to reduce costs while it concentrates on the management software surrounding the data warehouse and the optimization features necessary for a complex, mixed workload environment.

- As with all new startup companies, there is risk inherent with Greenplum growing sales, support, marketing and other functions while trying to grow a customer base. This risk may be reduced for Greenplum thanks to its sales model of selling through partners and not needing to grow a "feet-on-the-street" sales force.
- Greenplum must add additional partners and not depend on only one (Sun). Then if it loses a partner, the impact on revenue will be minimized. Further, we expect to see the major hardware vendors (including HP and Sun) having more than one data warehouse appliance.
- Greenplum also faces competition from other appliance vendors such as DATAllegro, HP Neoview and Netezza. It needs to clearly distinguish itself from these other appliances.

- IBM can engage organizations that desire a preloaded solution or those that want to build out their own hardware environment. IBM's DB2 Warehouse is a software-only solution. Its data warehouse appliance solution, the Balanced Warehouse, is a combined server and storage hardware solution (using the System p server with AIX, or the System x server with Linux), DB2 Warehouse (using the IBM DB2 9 DBMS) with service and support sold in Balanced Configuration Units (BCU).
- IBM's professional services continue to support most data warehouse DBMS platforms and include specific training and support for IBM data warehouse DBMS offerings, with worldwide support for IBM warehouse solutions.
- The DB2 Warehouse includes additional workload management software, data transformations in the data warehouse, integration with SAS and SPSS supporting data mining and data visualization capabilities, OLAP support for data warehouse modeling with bridges to common BI tools, and logical and physical data partitioning.
- IBM has hundreds of customers running data warehouses in the medium range of source system extracted data, primarily on System z and System p. IBM also has customer references running data warehouses with DB2 Warehouse in the large range of source system extracted data (see Market Overview).

- IBM DB2 Warehouse total volumes range from 1.3 times to as much as 5 times the size of the source system extracted data, depending on several characteristics of the data warehouse and the expertise of the data warehouse staff. As explained in the Market Overview section, poor design and/or smaller database sizes can lead to larger ratios.
- Based on Gartner end-user inquiries, organizations generally are not adopting IBM products for the data warehouse unless DB2 is already present in the client site. Gartner does receive inquiries from current DB2 customers seeking data warehouse alternatives.
- The IBM Software Group also continues to face the issue of "co-opetition." The hardware divisions not only sell hardware for other DBMS platforms, but are also beginning to openly support other DBMSs (see "MySQL Will Open IBM System i to New Applications and Customers") and even partner with other DBMSs with competing appliance offerings.

- Kognitio comes from a strong DBMS appliance background (Whitecross) that has a track record of solid performance. References report that performance is excellent, with large multiterabyte databases in an analytics and reporting environment for many users (ranging in the thousands).
- Kognitio is the original data warehouse DBMS to be primarily used as a managed service, with most of its clients buying its data warehousing services from Kognitio, while Kognitio hosts the database. Recently, we have seen more activity in this model. There are two reasons for this increased activity: 1) Some business units are dissatisfied with the IT department and are buying a managed service for their BI needs; and 2) Some small companies cannot afford their own data warehouse infrastructure but have large volumes of data to process for analytics. This is a growing market.
- Kognitio is taking this model to other managed services vendors, and we believe that in 2008 we will see additional sources of managed services on Kognitio (with the added advantage of Kognitio breaking out of the Europe-only market).
- In 2007, Kognitio gained several large clients installing the system on-site, as opposed to taking it as a managed service. These installations are large, analytic data warehouses. This demonstrates Kognitio's ability to supply a data warehouse DBMS capable of competing with many of the market incumbents.

- Today, Kognitio remains primarily a European-only vendor with most of its accounts in the U.K. This greatly limits its growth potential, unless it develops new regional markets (especially in North America) and adds partners in other regions or worldwide.
- Kognitio is normally not a candidate for supporting an on-premises, enterprise-scale data warehouse. Today it is primarily used as a data warehouse (managed by Kognitio), and most IT organizations want to have their own data warehouse, not purchased as a service. If the industry moves toward more managed services for warehousing, this could become a strength over time.

- The use of SQL Server 2005 for data warehousing is accelerating. It has now been generally available for two years. At first, most early adopters were in OLTP. Now, we see from inquiries that SQL Server is also being used in data warehousing, especially for databases up to 5TB or 6TB in size.
- Microsoft offers value for the price paid. The purchase of SQL Server 2005 Enterprise Edition includes SQL Server Analysis Services (SSAS), SQL Server Reporting Services (SSRS) and SQL Server Integration Services (SSIS), which means OLAP, reporting and data integration for ETL are included in the low starting price.
- SQL Server 2005 scales from small warehouses to midsize ones without a great deal of effort. As data warehousing becomes more prominent in growing midsize businesses, SQL Server is expected to grow with the business relative to data warehousing.
- Worldwide support from Microsoft is extensive (including partners, value-added resellers, third-party software and tools, and the wide availability of the SQL Server skill base), and with the recent purchase of companies such as ProClarity, it is increasing its focus on BI as a core enterprise application.
- SQL Server 2008 has been announced for release in the first half of 2008 (and is beta testing now), with many new enhancements for data warehousing, demonstrating Microsoft's intent to be a major presence in this market.

- Microsoft has a short history in the large data warehouse category for SQL Server. But it does have large enterprise warehouse references for SQL Server, and these require server environment management as well as database management to achieve success. This also leads to a small body of best practices and a skills base in large implementations. It will take two to three more years before organizations building enterprise mission-critical data warehouses regularly consider SQL Server as a competitive solution.
- When appropriately including the storage requirements of SQL Server Analysis Services cubes in the total warehouse size, SQL Server data warehouse total volumes can range from two to as much as five or six times the size of the source system extracted data. With proper use of SQL Server Analysis Services cubes, this ratio can be reduced to somewhere nearer the lower end of this range. However, customers are often using SQL Server Analysis Services cubes when they are not required, artificially increasing storage requirements to nearer the top end of this range.
- SQL Server only runs on Windows Server, and therefore lacks the portability of many of its competitors. Many IT organizations do not consider SQL Server since they are not willing to run Windows Server in the datacenter environment.

- MySQL has continued to mature, with new functionality, growth of professional services, a growing sales force, an alliance with IBM, and the addition of many new third-party software vendors. With its new MySQL Enterprise offering (an installable system from a set of discs like most other DBMSs), it has seen rapid market acceptance. Many clients are beginning to use MySQL as a data warehouse engine for small data warehouses, up to about 200GB to 500GB in size. However, many data warehouse implementations begin small and grow over time. MySQL will see the same growth as its scalability is proven over time.
- MySQL has several references with multiterabyte data warehouses in production using a technique MySQL calls "sharding." This technique splits the database into smaller pieces of less than a terabyte. Although this requires more resources to manage the database and associated storage, it does represent another step in the direction of large data warehouse capabilities.
- MySQL still maintains a low price point: a free license with support subscriptions ranging from $599 per year per server to $40,000 per year (for the unlimited server license of MySQL Enterprise).
- The recent announcement by IBM and MySQL to port the MySQL DBMS to the System i opens MySQL to many new clients and can be expected to be used here for OLTP and data warehousing on the System i (see "MySQL Will Open IBM System i to New Applications and Customers"). Similarly, BrightHouse, Infobright's column-oriented engine, uses MySQL to create an analytic data warehouse solution. These possibilities are due to the architecture of the MySQL DBMS, allowing MySQL to work with multiple storage engines.
- The number of downloads is not relevant to market growth (you cannot distinguish between experimental and educational downloads versus downloads for production). However, the increasing number of clients purchasing support services and MySQL Enterprise has led to revenue doubling year over year.
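
The "sharding" technique mentioned above (splitting a database into smaller pieces, each under a terabyte) can be illustrated with a minimal sketch. This is not MySQL's own mechanism: the shard names, the hash-based routing rule and the function names below are all hypothetical, chosen only to show how rows might be routed to shards and how a warehouse-style aggregate would then be assembled from per-shard partial results.

```python
import hashlib
from collections import defaultdict

# Hypothetical shard names; in practice each would be a separate database instance.
SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]


def shard_for(key, shards=SHARDS):
    """Route a row key to one shard using a stable hash (an assumed routing rule)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def distribute(rows, key_field="customer_id"):
    """Group rows by the shard that owns each row's key."""
    placement = defaultdict(list)
    for row in rows:
        placement[shard_for(row[key_field])].append(row)
    return dict(placement)


def total_amount(placement):
    """Aggregate per shard first, then combine the partial sums."""
    partials = {
        shard: sum(row["amount"] for row in rows)
        for shard, rows in placement.items()
    }
    return sum(partials.values())


# Illustrative data only: 100 made-up rows keyed by customer_id.
rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 101)]
placement = distribute(rows)
print(total_amount(placement))
```

The sketch also hints at the management overhead noted above: every query that spans shards needs a routing step and a combine step that a single-instance DBMS would handle internally.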

- MySQL continues to lack references for data warehousing that break the 1TB barrier in a single instance of the DBMS (see "sharding," mentioned earlier). To become a strong player in the overall DBMS market, and specifically the data warehouse DBMS market, it will need to spend 2008 concentrating on developing these accounts as referenceable data warehousing customers with a range from 1TB to 5TB. In addition, it will need to begin to demonstrate scaling above the 10TB range in a mixed workload to dispel the perception of a lack of scalability of MySQL.
- The company is facing increased competition from some of the new entrants using OSS DBMS technology, such as EnterpriseDB (just beginning to support data warehousing with EnterpriseDB GridSQL), ParAccel and Vertica, all of which are using PostgreSQL as a base.
- Currently, MySQL still lacks many of the special features necessary to be a serious contender for large data warehouses. For example, current production version 5.0 does not have partitioning, which is due for version 5.1. Although MySQL has some basic functionality for workload management (such as storing query statistics), it will need to add more control and automatic management functionality to handle large data warehouses and the mixed workload.
- The low entry cost of using MySQL does not always equate to low total cost of ownership (TCO), as the cost to manage a large data warehouse without the broad availability of management tools (as with the larger, more mature data warehouse DBMSs) leads to the use of resources to perform these management tasks manually.

- Netezza offers an appliance solution that largely eliminates the need to balance hardware and software implementations in the environment. Because of this, Netezza actually plays in two segments of the market: as an add-on for existing warehouses (an appliance-based data mart) or as an enterprise warehouse.
- Netezza has remained tightly focused on its product and the market it serves for more than four years. Each time Netezza has announced a new research and development focus, the company has pursued the effort with efficiency. It began delivering features (such as zone maps and short query bias) to address the data warehousing mixed workload with Release 3 of its software in December 2006. The latest performance enhancement is the Netezza Developer Network, a group of third-party vendors (such as SAS Institute) developing software to run on the Snippet Processing Unit (SPU), moving DBMS and analytic functionality closer to the storage.
- Netezza's disruptive lower prices, combined with its performance, continue to challenge Teradata and (when hardware requirements are included) even Microsoft on overall price and performance. Netezza's price and performance simplicity also challenges the other DBMS vendors, such as IBM, Oracle and Sybase.
- Netezza has a strong track record of new customers, with more than 100 customers at the end of its fiscal 2007. And as of July 2007, it was trading as a public company.

- Customers report that queries spend very little time in the processing queue awaiting resources; however, no customer has reported more than 20 concurrent queries. Netezza references report conflicting performance regarding a full query queue. Some customers report performance efficiency so high the query queue remains empty. But other customers report gradually degraded performance as the query queue grows. Meanwhile, some Netezza references claim that the query processing is completed so quickly that the queue never exceeds 20 queries.
- With more than 100 customers, Netezza has reached a large enough customer base to generate a solid support and maintenance base. However, this is still a fraction of its mainstream competition (IBM, Oracle and Microsoft number in the thousands or even tens of thousands) and even Teradata is an order of magnitude larger. Netezza will need to continue its strong growth and, most importantly, become profitable.
- Netezza presents a conundrum: an appliance presenting a solution to hardware and software balancing that is beneficial to implementation and overall management efforts for the warehouse environment. However, organizations that exceed the current Netezza appliance volume can face issues in upgrading to larger configurations. Older Netezza configurations will need to be replaced. Recently, Netezza added the ability to add new racks, making the system field upgradable. It also offers storage on demand, where the customer can purchase a larger system than needed, paying only for what they need today and purchasing additional space as needed.

- Worldwide support and customer experience make Oracle a solid choice for those organizations seeking access to a wide experience base.
- Oracle Real Application Clusters (RAC) with Automatic Storage Management (ASM) is becoming accepted as an enterprise-level DBMS platform for data warehousing capable of supporting large data warehouses (defined in the Market Definition as those bigger than 20TB). The scale-out configuration allows for flexibility (adding servers and storage without downtime) while providing a base for the high availability required by the new data warehouse SLAs being implemented.
- With the release of Oracle Database 11g, enhanced materialized view and cube management (notably transparent SQL access and incremental update) increases Oracle's capability to deploy end-user optimization layers with features not found in other DBMSs. It also includes enhancements to Oracle's partitioning option, including a Partition Advisor that will suggest types of partitioning to enhance performance based on the database schema. Although this new version does not have a great deal of experience in the market (its general release came in August 2007), early references show the usefulness, performance and resource savings of these features as expected (see "Oracle Database 11g Could See Early Adoption").
- Oracle supplies Oracle Optimized Warehouse Reference Configurations for several different server vendors as a "bill of materials" for building a balanced data warehouse platform, including servers, storage and DBMS software based on desired data warehouse requirements. Recently, Oracle announced the availability of the Oracle Optimized Warehouse for Dell and EMC, an actual data warehouse appliance (see "Data Warehouse Appliances Are More Than Just Plug-And-Play") with a balanced, prepackaged configuration of Dell servers, EMC storage and the Oracle DBMS software packaged, sold and supported by Dell.
- Oracle is one of the most portable data warehouse platforms on the market (running on most hardware with Linux, Unix or Windows) and includes a free ETL tool (Oracle Warehouse Builder) with optional Data Quality.

- Oracle requires manual management of the optimization and storage needs in the data warehouse. Oracle has many data warehouse references that report source system extracted data volumes for small, midsize and large enterprise data warehouses. As expected (see Market Overview), these references report a range of storage sizes, primarily resulting from optimization, from five times the source system extracted data to as low as 1.5 times the source system extracted data.
- Oracle's pricing and contract practices continue to present issues for customers. One issue is the high renewal costs for maintenance, as Oracle may charge the 22% maintenance fee on a higher base than the original contract. Another issue is knowing which features are priced as part of the DBMS and which are chargeable options. Organizations are encouraged to remain aware of which options are licensed and priced separately, such as the Management Packs. Be sure to discuss support and contract negotiations with Oracle references.
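
The storage ratios quoted in this section (total warehouse volume as a multiple of source system extracted data) lend themselves to a quick sizing calculation. The sketch below is illustrative only: the ratio ranges are the ones cited in the text for IBM DB2 Warehouse, SQL Server and Oracle, while the 10TB source volume and the function and variable names are assumptions made for the example. For SQL Server, the text gives an upper bound of "five or six times," and the sketch uses six.

```python
# Ratios of total warehouse volume to source system extracted data, as cited in the text.
RATIO_RANGES = {
    "IBM DB2 Warehouse": (1.3, 5.0),
    "Microsoft SQL Server": (2.0, 6.0),  # text says "five or six times"; 6 assumed here
    "Oracle": (1.5, 5.0),
}


def total_volume_range(source_tb, low_ratio, high_ratio):
    """Return the (low, high) total warehouse volume in TB for a given source volume."""
    return source_tb * low_ratio, source_tb * high_ratio


SOURCE_TB = 10.0  # hypothetical volume of source system extracted data

estimates = {
    name: total_volume_range(SOURCE_TB, low, high)
    for name, (low, high) in RATIO_RANGES.items()
}

for name, (low_tb, high_tb) in estimates.items():
    print(f"{name}: {low_tb:.1f}TB to {high_tb:.1f}TB total for {SOURCE_TB:.0f}TB of source data")
```

Where a given warehouse lands within each range depends, as noted in the text, on design quality, optimization choices and staff expertise rather than on the DBMS alone.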

- Sand Technology is a small DBMS vendor with a long history: 23 years, with seven in the DBMS market. Its DBMS is a column DBMS, much like both Sybase IQ and the new startup Vertica. It has tried many different marketing approaches over the years, including being branded as an archive engine, due to the high compression ratios achieved with column storage DBMSs. In fact, due to its use of tokens in addition to the column store, it achieves greater compression than other DBMSs.
- Sand now has a partnership with SAP as a near-line data store (as opposed to an offline archive) for the SAP Business Information Warehouse (BW) in large installations where the size of the BW has grown to a degree that it is affecting performance. By integrating with Netweaver (SAP's middleware), an SQL query to the BW can be routed automatically to either the BW or the Sand near-line storage engine. The performance degradation is minimal and the transparency to the end user is an excellent feature.
- Sand continues to have a loyal client base, which, together with the new clients slowly being added through SAP, will keep it in the market, although in the Niche Players quadrant.

- Due to Sand's small size, just over 60 customers, it will continue to struggle against the larger vendors and venture-funded startups that can invest more in R&D, marketing and sales.
- Over the next several years, watch for SAP to acquire Sand as a near-line storage engine specifically for the SAP BW, as well as integrating the Sand DBMS as an analytic engine (due to the analytic engine capabilities of a column-based DBMS, as described in the Market Overview). This could be an issue for those Sand clients not using it with SAP, as SAP tends to focus on its own products.
- Although the SAP relationship is working well, having only one major partnership can be a risk. If the partnership should dissolve, Sand will be left without a major channel.
- Sand, as a column-based DBMS, faces the challenge of proving performance in an enterprise data warehouse environment (see Market Overview).

- Sybase has rebranded Sybase IQ as "Sybase IQ, the analytic server." Due to the column database structure, Sybase IQ achieves excellent data compression, ranging from 2x compression to, in some cases, 5x compression, depending on the structure of the data. Because analytics typically makes use of fewer columns but larger numbers of rows, Sybase IQ performs very well for analytic applications. The company has been consistently winning POCs with analytic applications, sometimes crushing the competition with greater performance by a factor of 100. This makes Sybase IQ an extremely desirable DBMS platform for an analytic data mart to optimize and enhance an organization's overall data warehouse architecture.
- Sybase has focused its sales force on the areas in which it stands out: the Sybase IQ analytic server and Mobility. The company has seen revenue growth for Sybase IQ as high as 40% quarter over quarter. This focus and rebranding have allowed Sybase to gain new customers. Its strong financial stability (14 straight quarters of revenue growth) has also helped dispel prospects' perception that Sybase is in decline and counter the negative marketing from the competition.
- Sybase IQ has also demonstrated an ability to support a more complete data warehouse environment. After entering an organization as an analytic server, it is able to prove the DBMS and then up-sell to a more complete solution.
- Recent alliances with the IBM System p division have also given Sybase a sorely needed new sales channel, which had been lacking in the past. Sybase has also publicly discussed plans for an Analytic Data Warehouse Appliance with IBM System p.
- Sybase has added its ETL tool to the Sybase IQ analytic server, expanding its capabilities and possibly leading to additional prospects for its Data Integration products.

- Sybase continues to struggle with the image of being a DBMS vendor. Although it has made great strides in changing the market perception, it still needs to work hard to change the perception that it is small and that it could be an acquisition target or may sell off parts of the business. This requires Sybase to have a strong marketing program, which costs money.
- Sybase faces potential competition from new column-based DBMS vendors such as ParAccel and Vertica, both using the OSS DBMS PostgreSQL as a base. In addition, MySQL has entered the column-based DBMS competition with Infobright's column-oriented engine (BrightHouse), creating an analytic data warehouse solution.
- Sybase IQ, as a column-based DBMS, faces the challenge of proving performance in an enterprise data warehouse environment (see Market Overview).

- Teradata has been in the data warehouse business for more than 27 years, delivering only data warehouse solutions that have always been a data warehouse appliance. Now, with the acceptance of the term appliance, it will benefit from being the first with its longevity and track record. It is well positioned to lead in functionality for large, complex, mixed workload environments for several years, allowing it time to develop products that can compete with all the vendors in the market at the lower levels of complexity and database size.
- Teradata has about 1,000 customers, with annual revenue of about $1.6 billion, all from data warehousing solutions. It has specific strengths (for example, strong penetration, data models and professional services) in the vertical markets such as retail, financial and banking, telco and manufacturing. Teradata also continues to show solid growth, with more than 15 quarters of revenue growth, specifically from data warehousing.
- Because of Teradata's architecture, it is well positioned to support the new, modern mixed workload, as proven with both its Active Data Warehouse and Dual-Active Data Warehouse. Also, its complete solutions, including the Teradata Data Warehouse data models and professional services dedicated to data warehousing, set it apart from the rest of the market. Finally, it has had a SUSE Linux version in production for more than a year and, with the new dual-core 5500 model, is now well positioned to take advantage of the price competitiveness of Linux, with most new sales going to the Linux platform and clients reporting better performance on Linux than with MP-RAS (Teradata's proprietary Unix system).
- Teradata's management software is a clear strength, as it manages the entire data warehouse environment from the operating system to the workload, with software to manage the mixed workload, including a priority scheduling manager to prioritize the workload by application and/or groups of users.

- The primary concern for Teradata for the next three years is the fierce competition from the mature DBMSs (IBM DB2, Microsoft and Oracle) as they become stronger in supporting a mixed workload in the sub-10TB range. We are seeing from our inquiries that this is causing smaller Teradata clients to rethink their strategy and use their OLTP DBMS of choice for the data warehouse. This will pressure Teradata into creating new offerings at the lower end of the database size to prevent this erosion, and also to allow Teradata to make the shortlist for new sub-10TB data warehouses.
- The second level of competition comes from the newcomers (DATAllegro, Greenplum and Netezza) that have shown they can compete at many sizes of databases, including hundreds of terabytes, at a price point well below that of Teradata. This will force Teradata to extend its product line to the lower end, both in size and price. This could lead to a period of reduced profitability as it adjusts to a lower price point.
- As Teradata leaves NCR, there will be additional challenges as it creates its own company and image. It will need to prove it can "make it" on its own, as it did in the past before being acquired by NCR. It will also need to deal with a sales force that is accustomed to selling high-priced solutions, as the price points drop due to competition. This transition will have to happen quickly so it can stay in the competition for new data warehouses.

The Magic Quadrant is copyrighted 10 October 2007 by Gartner, Inc. and is reused with permission. The Magic Quadrant is a graphical representation of a marketplace at and for a specific time period. It depicts Gartner’s analysis of how certain vendors measure against criteria for that marketplace, as defined by Gartner. Gartner does not endorse any vendor, product or service depicted in the Magic Quadrant, and does not advise technology users to select only those vendors placed in the “Leaders” quadrant. The Magic Quadrant is intended solely as a research tool, and is not meant to be a specific guide to action. Gartner disclaims all warranties, express or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
© 2007 Gartner, Inc. and/or its Affiliates. All Rights Reserved. Reproduction and distribution of this publication in any form without prior written permission is forbidden. The information contained herein has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner's research may discuss legal issues related to the information technology business, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner shall have no liability for errors, omissions or inadequacies in the information contained herein or for interpretations thereof. The opinions expressed herein are subject to change without notice.

- ASM: automated storage management
- BCU: Balanced Configuration Unit
- BI: business intelligence
- BW: Business Information Warehouse
- DBA: database administrator
- DBMS: database management system
- DW: data warehouse
- EDW: enterprise data warehouse
- ETL: extraction, transformation and loading
- I/O: input/output
- JDBC: Java Database Connectivity
- ODBC: Open Database Connectivity
- OLTP: online transaction processing
- OSS: open-source software
- RAC: real application clusters
- RDBMS: relational database management system
- SQL: Structured Query Language