After we migrated to PostgreSQL last week, some commenters asked for more detail about technical rationale. I said I'd write something up, and since at this writing I'm sitting in a hotel room in Florida with nothing to do until tomorrow's (Monday's) MAVEN launch window starts, here it is.
The rest of this is more in the way of rough notes on the ups and downs than it is a coherent essay. It's not intended as propaganda for PostgreSQL. Right now I'm pretty happy we decided to put the effort into using it, but be sure to ask me what I think in six months.
(Note that I use "MySQL" herein as a shorthand for "MySQL and its derivatives/forks." No disrespect is intended for anybody's project or trademark. We've been using MariaDB specifically for a while, which I suspect is a good idea for people planning to stay on the MySQL train.)
Ok, so it's a million little things. MariaDB was starting to scare us. We
had this really bad afternoon after Randy realized that a straightforward
reporting query was yielding results that were super-explicitly excluded by
a condition in the WHERE
clause. ("I hate blue things," we says. "Here
are thousands of blue things!" says the database.) We kept getting into
situations where it seemed obvious that a halfway intelligent query planner
would have behaved much differently.
As I have mentioned before, we were sitting around drinking beer in the mountains one day and said well, what about PostgreSQL?
We're not doing any kind of crazy volume. We're working at a level where most of our data can be relational, or at least sort of relational, and we get to do things like join a bunch of tables all the time. There are features it's nice to have in a database if you're going to treat it like an actual relational database. (I have, for example, never had anything but a bad time with MySQL foreign keys. I am positively giddy about materialized views.)
PostgreSQL has a couple of features which make writing an ORM much nicer.
(RETURNING *
on inserts and updates, for example.) The one place
it seems to fall down is that it doesn't have a simple equivalent of
DESCRIBE
.
There's better GUI tooling for the MySQL family (phpMyAdmin is frustrating in a number of ways but good enough that everyone uses it anyway; the Mac users in the house seem to swear by Sequel Pro), but the stock psql CLI is actually pretty good.
The PostgreSQL query planner seems pretty smart.
Case sensitivity. String equality and LIKE
being case sensitive bit us
all over the place after the deploy, but this is clearly a saner default
approach. If you're migrating from MySQL and your users rely on search
features that look at a bunch of strings, you'll want to spend some extra
time on this, because it's one of those places that syntax doesn't really
change and nothing overtly breaks, but people are going to notice right away
that they aren't finding the stuff they used to.
We really need to look into the full text search stuff. And get a better handle on indexes on expressions.
Character encoding is hell.
I haven't used the regex syntax yet, but I'm sure I will sooner or later, if only in one-offs.
Not having a query cache has highlighted some places we were being pretty sub-optimal (read: stupid and/or the wrong kind of lazy). These have pretty much been solvable by chucking stuff at Memcached. For the one really tricky thing, consistent stock numbers on product listings, Casey wrote a little materialized view and a couple of triggers. It's rad. We will almost certainly solve a few other problems the same way.
Documentation and community. The PostgreSQL manual is a thing of quiet beauty. The mailing lists are, so far, pretty good.
Types. If there's an interesting technical story in a project like this, it's probably that we subjected a bunch of PHP code to a stronger, more deterministic type system and immediately discovered entire classes of bugs in our existing work. Many readers are presently saying "well, duh," but I think it's worth emphasizing that even if you're writing schlocky web code in some hacked-together terminally unhip gets-no-respect dynlang, you might still get a lot of mileage out of a more rigorous data store.
I used to think systems that wanted me to be real specific about types were kind of creepy and fascist in outlook, or at best sort of officious and bureaucratic. I'm not sure that's untrue of a number of them, but I'm starting to realize I need all the help I can get when it comes to consistency and precision, which puts the whole question in a different light. I also didn't used to care about unit tests, and now I clutch at them as a drowning man clutches a rope, because I write code in languages that offer almost no built-in mechanisms for formally specifying behavior.
(Shorter alternative to preceding paragraph: When I was younger, I didn't know anything.)
Using PDO against PostgreSQL gives us PHP native types for database values where before we were getting strings for everything (it may be possible to coax something like this behavior out of the MySQL drivers, but we basically stumbled into it here).
PostgreSQL has a boolean type, and does not think that true
is equivalent to 1
.
PostgreSQL cares about NOT NULL
constraints in a way that MySQL does not.
In dealing with this, we changed a certain amount of schema. Just about every time it
came up, we found ourselves fixing a bug we hadn't known existed (or hadn't had any
luck tracking down in the past).
(A strange bug: With prepared queries, PDO will resolutely insist that you're trying to use a null when you give it an empty string for a parameter. This is infuriating to diagnose.)
PostgreSQL hews substantially closer to the ANSI SQL standard than does
MySQL. You can get ANSI-like syntax out of MySQL by setting
SQL_MODE
, but this doesn't mean ANSI-compatible behavior.
SET SQL_MODE='ANSI'
was helpful because it let us do this project in a
couple of phases. We had to change up a bunch of queries, but the vast
majority of them just needed syntax changes and could continue running
against the existing MariaDB install. In exactly 63 places (I just checked)
we had to conditionalize something on type of database. A bunch of those
are reporting queries, which tend to be longer and more complicated; a few
are in libraries that generate SQL. All around, this doesn't seem too bad.
I guess that's about it for now. I'll amend this post if anything else comes to mind.
What do you miss about DESCRIBE, and how do you use it? I ask because we're working on a related feature for PostgreSQL 9.4, and your use case would be helpful. Thanks!
--Josh Berkus, PostgreSQL Project
Nothing super complicated: We just use the results to generate a bit of PHP that describes the structure of a table (column names and types, default values, null constraints, primary-key-or-not). All of this can be extracted from the
pg_*
catalogs with a more verbose query, but it didn't seem like there was a concise equivalent for getting the resultset we wanted. (I'll say here that maybe we missed something obvious, though I didn't think so at the time.)Thanks for asking. :)
Have you tried the standard
information_schema
tables? They're available in both PostgreSQL and MySQL with very similar semantics.I need to give this another look. I feel like there's some reason we didn't use
information_schema
for both databases, but if so I don't remember offhand what it was.Once I got used to the postgres catalog, I found it much more informative than information_schema. I do, however, miss 'show create table' or 'show create whatever'.
Now I know why database admins are an odd bunch of folks. (Odd being a good thing. If it wasn't for db admin or sys admin, I wouldn't be able to do my job).
I have to wonder why such a large company (I know you do multiples of revenue of other companies in the sector) didn't have a support contract with Oracle for MySQL. They are cheap in the big scheme of things and it seems like most of your gripes could have been solved by opening support cases.
Disclaimer, I do MySQL support for Oracle, so your reasons for switching give me even more questions.
I'll start with Postgresql is a very good product that I like a lot.
That being said, you guys should avoid using stored procedures with postgres. Using them can make it nearly impossible to migrate to any thing else in the future. Stored procedures can make some things super fast and awesome, but they can be a trap.
Keep up the good work there.
Stored procedures in postgres can be a very good solution for gaining tons of performance because postgres has a bit more transaction overhead than a dumb database like mysql. Using a stored procedure gets around this by putting a bunch of work into a single transaction.
I disagree that it makes things less portable, it just makes things less portable to a database that doesn't also have a decent procedure language. If you were to port this data to oracle or even mssql, you simply write a proc in that database to produce the same data and press on.
The point is that moving from mysql to postgres isn't super difficult (though it is time consuming), but moving from postgres back to mysql is pretty much impossible.
If / when you use the Regex syntax, you might want to try out RegExBuddy and RegExMagic from Just Great Software which compliment each other. The do basically all flavours and make things much easier. I don't have any affiliation with them.
If you want, you can try Adminer, it's simpler but more powerful than phpMyAdmin!
And to make things easier, it's a single 300kb php file.
Adminer
HeidiSQL has been great for my for pulling data out of MySQL on an ad-hoc basic
Did you use any of the replication features in MySQL and are you planning to do so on PostgreSQL?
Over the years we've worked with several different replication techniques available to MySQL, mysql-cluster, master-slave, DRBD, remote replicas on a time delays, etc. Each option has its own set of advantages over the next. However, the dominating theme across the board was brittleness. We like our systems to be as close to maintenance free as possible. At any given point a system should be able to restart and come back up cleanly, ready to serve.
As for replication in PostgreSQL, we're using the built-in streaming replication available since 9.1. We have multiple nodes in the datacenter ready to serve as master should the primary go down. Standby nodes are configured to provide cascading replication to other backup and non-production systems outside the datacenter.
Thanks for the followup. Glad the move has worked out for you.
I've had similar experiences with changing one fundamental thing challenging a lot of assumptions you hadn't even realized you'd made in seemingly unrelated areas.
Edit: Which storage engine were you using under MariaDB?
Pretty sure it was all InnoDB.
Yup.
I'm glad to see PDO is being used and a good thing. Around here, university students, professors, web develops, etc. mostly have little or no knowledge about PDO, and thus shun it as if it is some kludgey basement hackery not meant for proper development.
Generally, I feel like it's a pretty decent library. It feels more or less like a clone of the Perl DBI, which I've always liked.
Edit: There are definitely gotchas, though. The aforementioned empty-strings-get-treated-like-nulls glitch, for example. You need to be really aware of configuration like the oh-so-confusingly named
ORACLE_ATTR_NULLS
. Behavior varies a lot more widely between database drivers than you'd really want it to. So on and so forth. Still, in the context of PHP, you learn to expect this kind of thing.Hey now, I'm an Oracle DBA and I'm not weird. (or am I?)