Tuesday, June 5, 2007

PostgreSQL More Functional for Commercial Open Source SaaS

At Lumen Software we selected PostgreSQL as the embedded database for our Lumenation environment and applications four years ago. This means we run on a LAPP stack rather than a LAMP stack. This decision has proven to be the right choice as other options such as MySQL even today do not match PostgreSQL’s capabilities. MySQL has made great strides in functionality in recent years and is NOT a bad solution by any means. However, currently, it still lacks some of the more advanced features we need within the Lumenation environment. PostgreSQL provided the database functionality and performance we were used to from commercial solutions. The limitations in MySQL made it actually a very easy decision to choose PostgreSQL.

Here is a very short list of the missing features in MySQL when we made our decision four years ago:
• No Sub-queries
• No Views
• No Triggers
• No Procedures and Functions (except if developed in C++)
• Record Locking for whole table but not for record only

We have re-evaluated MySQL several times over the years. We attended the MySQL conference last year and heard about many changes MySQL was going make during the coming year, so we just recently assembled the team for another review (early 2007). In summary, we’re sticking with PostgreSQL. Below are the findings of this current evaluation.

• Record locking in MySQL can only be set for the whole table. PostgreSQL however is able to record lock at the record level. This is a crucial factor in commercial, heavy traffic transaction based applications. For example, if a user accesses a record for update, in MySQL all other users will wait until that user has updated their information.

• PostgreSQL has Schemas, Triggers and Checks and MySQL does not. These allow us to create logic at the database level which guarantees data integrity.

• PostgreSQL permits the creation of User Types such INT, REAL, etc., as arrays, arrays of arrays, etc. It also uses it as the Type for a Database Field. Developers are able to store and work in these database objects without conversion to regular data types. This allows developers to work with objects on an application level, read and write directly to the database. This provides better performance and data integrity.

• MySQL provides no Inheritance of tables. This allows us to use an Object Oriented Method while developing a Database Structure. This is used when creating and maintenance databases structures.

• There are no sequences in MySQL, only auto_increment. PostgreSQL supports sequences, and this is a very useful tool if you need to store «Master-Detail» joined data. This is also useful if you need to obtain a unique sequence of numbers and guarantee they are unique.

In general our team felt PostgreSQL was more stable and a better solution for complex applications. MySQL is better suited for less complex web applications. Cost was also a factor. MySQL is not free when used commercially. PostgreSQL is under the BSD license, so it may be distributed freely. Because of this we were also able to create a very easy to use installation process for our clients. And, after all, that’s what it’s all about. Providing the best tools for our customers.

6 comments:

Unknown said...

Hi! I think you may want to assemble another team to do your reviews... :)

"so we just recently assembled the team for another review (early 2007). In summary, we’re sticking with PostgreSQL. Below are the findings of this current evaluation.

• Record locking in MySQL can only be set for the whole table. PostgreSQL however is able to record lock at the record level. This is a crucial factor in commercial, heavy traffic transaction based applications. For example, if a user accesses a record for update, in MySQL all other users will wait until that user has updated their information."

This is not true. InnoDB storage engine uses MVCC and row-level locking.

"• PostgreSQL has Schemas, Triggers and Checks and MySQL does not. These allow us to create logic at the database level which guarantees data integrity."

Again, not true. By "Schemas", I believe you refer to INFORMATION_SCHEMA? If so, MySQL has had this for a while now (however, it does not "allow us to create logic at the database level which guarantees integrity"). Triggers are fully supported by MySQL, but CHECK constraints are not.

"• PostgreSQL permits the creation of User Types such INT, REAL, etc., as arrays, arrays of arrays, etc. It also uses it as the Type for a Database Field. Developers are able to store and work in these database objects without conversion to regular data types. This allows developers to work with objects on an application level, read and write directly to the database. This provides better performance and data integrity."

While it is true that MySQL does not support user defined data types, it is patently untrue that UDDT have anything to do with performance.

"• MySQL provides no Inheritance of tables. This allows us to use an Object Oriented Method while developing a Database Structure. This is used when creating and maintenance databases structures."

This doesn't make any sense to me. "Inheritance of tables"? You mean relationships? If so, yes, absolutely, MySQL provides FOREIGN KEY CONSTRAINTS for relating an inheritance of data keys. Heck, if you want an object-oriented database, use Cache, not PG! :)

"• There are no sequences in MySQL, only auto_increment. PostgreSQL supports sequences, and this is a very useful tool if you need to store «Master-Detail» joined data. This is also useful if you need to obtain a unique sequence of numbers and guarantee they are unique."

This is completely bogus. AUTO_INCREMENT keys provide uniqueness up to 2^64 using BIGINT. Master-detail relationships having nothing to do with the data type, and everything to do with foreign key constraints, which MySQL fully supports. In essence, statements like these make your eval team look kind of amateurish.

"In general our team felt PostgreSQL was more stable and a better solution for complex applications."

Without benchmarks and real data, statements like this are simply silly.

"MySQL is better suited for less complex web applications. Cost was also a factor. MySQL is not free when used commercially. PostgreSQL is under the BSD license, so it may be distributed freely. Because of this we were also able to create a very easy to use installation process for our clients. And, after all, that’s what it’s all about. Providing the best tools for our customers."

This, IMHO, is the real reason you went with PostgreSQL. You want to charge for your software and not pay for the software you are using internally.

Cheers,

Jay

rasputnik said...

So why do you think people still prefer MySQL?

I ask because you said you had re-evaluated it a few times, so I'm assuming you had a reason to want to switch.

rasputnik said...

Jay: the guy says he felt it was better. That's not 'just silly', that IMO is the reason most people choose one over the other.

The MySQL licensing model is much more hassle to understand than PostgreSQLs. If you understand it, well done - but it puts off a lot of people (and we don't all want to 'steal' code as most GPLers seem to think).

Unknown said...

@Dick:

Hi! You said:

"The MySQL licensing model is much more hassle to understand than PostgreSQLs. If you understand it, well done - but it puts off a lot of people (and we don't all want to 'steal' code as most GPLers seem to think)."

Please provide us examples of how MySQL's licensing model (GPL for open source distribution, commercial license for embedded, non-open source) is a hassle to understand. We can't make the licensing more "hassle-free" unless we get specific examples of what is hard to understand. Help us be better! :)

Also, I have expanded on my comments above here:

http://www.jpipes.com/index.php?/archives/167-Faulty-Statements-on-Lumen-Software-Decision-Irk-Me.html

Cheers,

Jay

Tad Gordon said...

Jay,

Thanks very much for the detailed comments. I always enjoy interactions with people who know and love open source.

Some of the features mentioned in the article relate to selecting the RDMS in 2003. In 2003 the stable version of MySQL was version 4. And that version was very poor in features. Nested selects were not allowed, (could not make statements like SELECT ... FROM (SELECT ... FROM ) ...). There was limited use on stored procedures, foreign keys, no triggers, etc. The latest stable version of MySQL does have some of those features, but it still doesn't measure up to what PostgreSQL has to offer.

> Hi! I think you may want to assemble another team to do your
> reviews... :)
>
> "so we just recently assembled the team for another review (early
> 2007). In summary, we’re sticking with PostgreSQL. Below are the
> findings of this current evaluation.
>
> • Record locking in MySQL can only be set for the whole table.
> PostgreSQL however is able to record lock at the record level. This
> is a crucial factor in commercial, heavy traffic transaction based
> applications. For example, if a user accesses a record for update,
> in MySQL all other users will wait until that user has updated
> their information."
>
> This is not true. InnoDB storage engine uses MVCC and row-level
> locking.

Your comment is true, InnoDB in MySQL 5 supports row level locking, but you should not underestimate the importance of record vs. table locking for high volumes of insert, update, delete operations.

> "• PostgreSQL has Schemas, Triggers and Checks and MySQL does not.
> These allow us to create logic at the database level which
> guarantees data integrity."
> Again, not true. By "Schemas", I believe you refer to
> INFORMATION_SCHEMA? If so, MySQL has had this for a while now
> (however, it does not "allow us to create logic at the database
> level which guarantees integrity"). Triggers are fully supported by
> MySQL, but CHECK constraints are not.

Here is why schemes are important for us. In Lumenation we have over 600 tables divided into over 30 modules. Each module has it own set of functions, views, sequences, etc. How easy would it be to work with those tables if they were one giant list? Furthermore, if we had database applications from 2003 with MySQL 4, which has no triggers and limited foreign key functionality, etc... and now I am migrating to MySQL 5 and want to use foreign keys. When I try to create them, it will error off due to data integrity. I have to extensively clean my data before migrating. The better the RDMS allows to incorporate business logic on the database level the easier it is to write sophisticated commercial applications. Therefore triggers, foreign keys, checks, user-defined data types, rules, sequences are all handy to have.

> "• PostgreSQL permits the creation of User Types such INT, REAL,
> etc., as arrays, arrays of arrays, etc. It also uses it as the Type
> for a Database Field. Developers are able to store and work in
> these database objects without conversion to regular data types.
> This allows developers to work with objects on an application
> level, read and write directly to the database. This provides
> better performance and data integrity."
>
> While it is true that MySQL does not support user defined data
> types, it is patently untrue that UDDT have anything to do with
> performance.

IMHO, if a developer has flexible data types it will make development easier and business logic more straight forward. And there will be no need to emulate a custom data type by default types. Let me illustrate the point by an example. There is a contrib module for PostgreSQL for implementing Full Text Search - TSearch2. Once installed, it will create custom data type TSVector, which is a sentence broke down into words by word stems. Additionally, it will create custom comparison operator @@ for use in the select statements. Now, as a developer, it is easy for me to understand and work with this module because it follows the rules of logic. Would it be easy if it was done using the standard data types? I do not think so. Besides, since TSearch2 was mentioned, the last time I googled, I could not find a decent full text search library for MySQL that would do a bit more then imitate LIKE operator with some hints. But according to your comments, one might think, who need this extra function, the only thing the extra features do is slow down the database performance. Nowadays, it is not just the performance that is important, but also the functionality of applications.

> "• MySQL provides no Inheritance of tables. This allows us to use
> an Object Oriented Method while developing a Database Structure.
> This is used when creating and maintenance databases structures."

> This doesn't make any sense to me. "Inheritance of tables"? You
> mean relationships? If so, yes, absolutely, MySQL provides FOREIGN
> KEY CONSTRAINTS for relating an inheritance of data keys. Heck, if
> you want an object-oriented database, use Cache, not PG! :)

IMHO, I agree :)

> "• There are no sequences in MySQL, only auto_increment. PostgreSQL
> supports sequences, and this is a very useful tool if you need to
> store «Master-Detail» joined data. This is also useful if you need
> to obtain a unique sequence of numbers and guarantee they are unique."
>
> This is completely bogus. AUTO_INCREMENT keys provide uniqueness up
> to 264 using BIGINT. Master-detail relationships having nothing to
> do with the data type, and everything to do with foreign key
> constraints, which MySQL fully supports. In essence, statements
> like these make your eval team look kind of amateurish.

It is not the point that MySQL cannot generate sequences using AUTO_INCREMENT type, but the usefulness of sequences in PostgreSQL. If in PostgreSQL I can get next value of the sequence, in MySQL I need to emulate it by creating an additional field or breaking sql into two transactions. "Master-detail" was used as an example of such task where it can be used. For example, in one transaction I can insert into both the master and the detail table, using sequence feature cur_value which returns current value of the sequence that was given to the master table (and if something's wrong I can roll back the entire transaction). Additionally, if I need the value to be incremented by a value other than 1 or based on a function, I can easily do it in PostgreSQL, and it would be harder to do in MySQL. The comment that it is harder to use sequences is not true because you can define it as SERIAL type, which creates the sequence on the background. The word SERIAL shorter then AUTO_INCREMENT and in this sense it is even easier.

> "In general our team felt PostgreSQL was more stable and a better
> solution for complex applications."
>
> Without benchmarks and real data, statements like this are simply
> silly.

It is our experience and of course "IMHO" while we were creating Data Warehousing systems with large volumes of data and complex queries, PosgreSQL was showing excellent performance. I do not argue that for simple tasks MySQL may offer faster performance.

> "MySQL is better suited for less complex web applications. Cost was
> also a factor. MySQL is not free when used commercially. PostgreSQL
> is under the BSD license, so it may be distributed freely. Because
> of this we were also able to create a very easy to use installation
> process for our clients. And, after all, that’s what it’s all
> about. Providing the best tools for our customers."
>
> This, IMHO, is the real reason you went with PostgreSQL. You want
> to charge for your software and not pay for the software you are
> using internally.

No question that it is also an important fact. Why should we pay more if we can use the same or better product for free? Why should I buy a Toyota Camry (new models even come with GPS), when I can get a Land Cruiser free of charge?

The discussions what is better may go on and on, with multiple points adding up on both sides. In this post we shared our experience why we think PostgreSQL is more suited for SaaS model.

Thanks, Jay.

Unknown said...

@Jay:

My understanding of MySQLs INFORMATION_SCHEMA is that it is similar to PostgreSQL's pg_catalog system database - a catalog of system data regarding databases, tables and other objects. This is NOT to be confused with PGs support for SQL's schema concept, which is effectively a namespace within a database. Each object in the db belongs to a schema. This greatly eases the organisation of objects in a complex database by grouping them into logical schemas.


Table inheritance is an optional clause in CREATE TABLE, and it has nothing to do with FK relationships. From the PG manual:

"The optional INHERITS clause specifies a list of tables from which the new table automatically inherits all columns. Use of INHERITS creates a persistent relationship between the new child table and its parent table(s). Schema modifications to the parent(s) normally propagate to children as well, and by default the data
of the child table is included in scans of the parent(s). "


AUTO_INCREMENT is not equivalent to sequences. Sequences are db objects in their own right, typically (but not exclusively) used to generate column ids. They have their own 'next value', interval and min/max parameters and can be updated 'manually' using normal SQL.

No mention has been made of transaction support either. PG has had this from Day 1, and MySQL's recently-added support is still deficient for serious use. Not being able to COMMIT/ROLLBACK DDL statements makes running complex db build/patch scripts unacceptably risky.


In the end it comes down to choosing the right tools for the job, and despite your (mostly incorrect) assertions it sounds like Tad's team made a well-considered choice based on their criteria. In fact I think it is rather insulting to suggest their only criteria was that they were unwilling to pay for the software.

BTW Tad, I have a data-migration tip using PostgreSQL's Query Rewrite (Rules) system on my blog if you're interested :) http://voidstar.wordpress.com/2007/01/26/quick-tip-for-database-upgrade-migration-hell/

sim