This is the second part in a series of blog posts describing PostgreSQL analogs of common Oracle queries
One of the most intricate Oracle specific constructions is "START WITH ... CONNECT BY". According to Oracle's documentation, the syntax is:
SELECT [query] [START WITH initial_condition] CONNECT BY [nocycle] condition.
This statement is commonly used to traverse hierarchical data in the parent-child order. It's easier to illustrate how it works with an example.
Consider a table that stores opponents moves in a game of chess. Each table row contain coordinates (in algebraic notation) of a single move by whites and the move in response by blacks, as well as a column that references a preceding move, making it possible to keep multiple continuations of a specific move for the post-game analysis.
CREATE TABLE moves(id integer, parent integer, white varchar(10), black varchar(10));
The following statements describe 2 variants of a very short game, the first one leading to the early checkmate (known as a scholar's mate), and the second one to the position where black successfully avoids being checkmated.
INSERT INTO moves VALUES(1, 0, 'e4', 'e5'); INSERT INTO moves VALUES(2, 1, 'Qh5', 'Nc6'); INSERT INTO moves VALUES(3, 2, 'Bc4', 'g6'); INSERT INTO moves VALUES(4, 3, 'Qf3', 'Nf6'); -- checkmate is avoided INSERT INTO moves VALUES(5, 2, 'Bc4', 'Nf6'); INSERT INTO moves VALUES(6, 5, 'Qxf7#', NULL); -- blacks being checkmated
Let's build an Oracle query showing a sequence of moves that leads to the checkmate:
SELECT DISTINCT id AS final_move_id, LTRIM(SYS_CONNECT_BY_PATH(NVL(white,'')||':'||NVL(black,''),';'),';')||';' AS moves, LEVEL AS mate_in FROM moves WHERE white LIKE '%#' OR black LIKE '%#' START WITH id = 1 CONNECT BY PRIOR id = parent;
FINAL_MOVE_ID | MOVES | MATE_IN |
|---|---|---|
6 | e4:e5;Qh5:Nc6;Bc4:Nf6;Qxf7#:; | 4 |
The query instructs Oracle to look for a checkmate:
As a result, Oracle goes from one row to another only if the parent column of the new row contains the id of the current row, accumulating all visited rows in a result set. The SYS_CONNECT_BY_PATH clause produces a string out of the specified columns of the visited rows, connecting each (parent, child) pair by the designated character (';' in our case).
Being Oracle SQL extension, CONNECT BY is not available in PostgreSQL. Recent versions of PostgreSQL implement Common Table Expressions (CTE), SQL-standard way of dealing with hierarchical data. Here's one possible rewrite of the query above for PostgreSQL using recursive CTEs:
Read more...
Postgres-XC has been around for a while, it is primarily developed by NTT and EnterpriseDB. It has a small community but a dedicated engineering/hacker backing. Postgres-XC is interesting because it keeps reasonably up to date with the latest Postgres (1.0 is set to be based on 9.1 of PostgreSQL) but provides a shared nothing clustering architecture. This type of infrastructure is one of the holy grails of web based applications.
Should Postgres-XC deliver on its promises (hint: it does), you will be able to scale out (as opposed to up which Postgres already does extremely well) at an almost 1 to 1 ratio. This means that instead of having to purchase 2 large machines at 10-12k a piece you could purchase 4 machines at 1.5k a piece and achieve similar performance (theoretically, I need to test this). It also means that scaling out in the "cloud" will be easier.
I invite everyone interested in PostgreSQL to take a look at Postgres-XC. It is going to 1.0 soon and it needs community members to help flesh out the warts that haven't been found yet.
Another Postgres fork that has recently appeared is tPostgres. tPostgres (doesn't that look wrong at the beginning of a sentence?) is set to do to Microsoft SQL what EnterpriseDB did to Oracle, with one minor, small, interesting, exception: tPostgres is Open Source. Further Microsoft SQL is more in line with PostgreSQL in the types of workloads you usually see it performing. Imagine a tPostgres with Postgres-XC. Imagine an open source way to easily port Microsoft SQL apps to PostgreSQL.
Now don't get me wrong, the latest versions of Microsoft SQL are actually good products. Yes, I did just say that. However, they are not Open Source, they are expensive (comparatively) and let's get real, we want everyone to run Postgres.
Unfortunately tPostgres is only just announced and they are literally at the beginning of building their community but as it is being initiated by Denis Lussier (co-founder of EnterpriseDB), I imagine that he will come through with something very interesting indeed.
That video represents why I would put on the conferences. They were fun. We had a good time.
If you are looking for other Postgres conferences there are the following:
Personally, I would suggest staying local and attending or help organize a local PUG day for PostgreSQL. PUG days are the best in small conferences. You are meeting with many locals, quite a few contributors usually show up, and you get to go home at night. The content is always top notch and chances are you know many of the people there. There are many. We recently had them in NYC, DC/Maryland, and Austin. There is a Denver PgDay on the 26th of October (no website yet) as well.
Why does this matter? It doesn't really. I am just rambling because my sister asked me today something that surprised me, "What is UNIX?". I had to just kind of stare at the screen for a moment. Of course she asked me this as she was happily proclaiming that she received an iPhone for her birthday. How far we have come.
I explained what UNIX was, the basic history, it's involvement in the Internet and it occurred to me that for me, there was one very specific point in life that my professional world went from, "huh.... give me my 7.50/hr" to, "Hey, I can actually become educated in something useful.". It was the mental absorption of this book.
That book, allowed me to learn Unix, which allowed me to learn Linux (back when SLS was king), which brought me to Postgres95, which brought me to PostgreSQL, which brought me to co-writing this book, which lead me to be a major contributor to PostgreSQL not only through my work with the Fundraising group (via SPI)but also . I would also bring up the conferences but those are already mentioned today.
While waxing nostalgia I am reminded of a recent blog post by Bruce Momjian where he mentions, "Postgres adoption is probably five years behind Linux's adoption.". I would agree with him, and would add that a lot of it is directly contributed to our development model. Many in the community have argued for years that time based releases of PostgreSQL would help development, many others... have argued for years that this is a bad idea. Many of those opponents of time based releasing, and one very influential one at that (TGL) are now starting to come around. More on that later, I have work to do!
We are still actively working on PgNext: The Next PostgreSQL Conference. The folks on the organizing team have been an invaluable resource at helping us determine the direction of the conference. We have also been receiving a lot of emails thanking us for the selection of Denver as the location, many of them from new attendees. If you haven't submitted a talk yet, now is the time!
The FKLocks patch was unfortunately pushed to 9.3 due to some outstanding issues not the least of which was a performance regression under normal FK use. This is a large patch that team member Alvaro Herrera has been working on for a very long time. It is a patch that has the potential to greatly increase the performance of foreign keys. It has been a lesson in patience, evaluation of sponsored work (it was partially, and only partially sponsored), and resource allocation. Hopefully we can be done with this soon.
We have been working on a large Oracle 8i conversion to PostgreSQL. Our customers were not concerned with the data conversion: there are tools like ora2pg and oracle foreign data wrapper to accomplish this. They do, however, have a significant number of queries that needs to be converted.
Apparently, most queries from Oracle and PostgreSQL look similar; after all, both are relational database systems, as opposed to Cassandra or MongoDB, seeking to adhere to the same standards. Unfortunately, Oracle is known for the non-standard syntax extensions that are widely used by DB developers. Fortunately, the PostgreSQL community went great lengths not only to rely on SQL standards, but to use them as a source for the new powerful features.
How do recent versions of PostgreSQL stack up against Oracle syntax extensions, you may ask. The answer is very well; in fact, there is hardly anything in Oracle 8i syntax that cannot be emulated by PostgreSQL. Let's consider a couple of examples, starting from the one of the most peculiar syntax construction Oracle is known for, outer joins:
Oracle versions before 9 used a non-standard syntax for left and right outer joins; support for full outer joins was missing altogether. Suppose we have a schema to track orders:
CREATE TABLE users(id INTEGER, email VARCHAR2(200));
CREATE TABLE products(id INTEGER, name VARCHAR2(200), price FLOAT(126));
CREATE TABLE orders(id INTEGER, userid INTEGER, productid INTEGER,
quantity INTEGER);
A typical LEFT JOIN, returning all users that haven't made any orders, looks like this:
SELECT u.id FROM users u, orders o WHERE o.userid IS NULL AND u.id = o.userid (+);
Oracle's OUTER JOIN is represented by a special WHERE clause with a '+' sign added. The '+' denotes the nullable side of the join: we are looking for users that don't have any rows associated with their ids in the orders table. The PostgreSQL equivalent of this query will be a straightforward:
SELECT u.id FROM users u LEFT JOIN orders.o ON (u.id = o.userid) WHERE o.userid IS NULL;
Clearly, the rule is simple: if you see a '(+)' in the Oracle's WHERE clause - it's really an outer join; if a '+' sign is on the left side of the binary operator - it's a RIGHT JOIN and vice versa.
Let's consider an Oracle construction that is a little more complex to emulate, Oracle's rownum column. This is a special pseudo-column that numbers rows returned by an Oracle query. For instance, to get positions of all products in the product list ordered by price, one can issue the following Oracle query:
SELECT rownum as position, id, name, price FROM (SELECT * FROM products ORDER BY price);
If 'products' stores data in no particular order, the rownum value would be different from the product id.
Unlike Oracle, PostgreSQL doesn't have a rownum column, so what would we do with the query above? Turns out, we can emulate it with the help of window functions, a feature available since PostgreSQL 8.4. These functions are capable of performing calculations over related group of rows called partitions. In our example, we need to apply the function called row_number() over the whole result set (denoted by the empty PARTITION BY clause), using ordering by price to compute the row number:
SELECT row_number() OVER (ORDER BY price) as position, id, name, price FROM products ORDER BY price;
Note that there is a subtle problem in the queries above. Different positions will be assigned to the items with equal price points. While there is no easy way to fix this in Oracle 8i, we can assign equal positions to the items with no price differences by switching the window function from row_number() to rank() in the PostgreSQL case:
SELECT rank() OVER (ORDER BY price) as position, id, name, price FROM products ORDER BY price
There is another common use case of Oracle's rownum column, to limit the number of rows returned from a query like this (used to extract the 5 most expensive products):
SELECT * FROM (SELECT * from products ORDER BY price DESC) WHERE rownum <= 5;
We need neither window functions, nor subqueries in the PostgreSQL equivalent of this query:
SELECT * FROM products ORDER BY price DESC LIMIT 5;
Easy, right?
You are probably wondering why am I comparing the recent versions of PostgreSQL to Oracle 8i, being almost 15 years old now? The reason is, there are a number of customers still running such an old products not ready to shell out hundreds of thousands for an upgrade to a new Oracle major version. PostgreSQL might be the best and cost-effective way to get their data to a modern relational database system and, of course, upgrades to new major versions are free (and can be performed in-place with pg_upgrade).
There are more examples I'd like to demonstrate (including the conversion of Oracle's CONNECT BY clause), but this post is already getting too long, so I'm wrapping up for now. Stay tuned for further posts!
This is something I have seen many times now: a customer calls us because they
lost some data and they want help recovering.
Now you must be wondering: surely if they lost data they can just recover from
their last backup, right? Right — they had that. However we know that
pg_dump takes a while to run and is stressful on the server, so it's normally
run just once a day or so. What happens if you've been running almost a full
work day since your last backup? It's a lot of data to lose.
Come on, admit it: you've always wanted to display the infomask bits from a tuple header in a human-readable manner, but you've never gotten around to it and you still keep htup.h in display while you peek around tuples.
Fortunately, that time is now past! Here's a short and simple recipe to decode the bits for your reading pleasure. Gone is the htup.h cheat sheet. Here's what you need: