Discovering less-known Postgres v12 features (cybertec-postgresql.com)
243 points by samsk on Nov 28, 2019 | hide | past | favorite | 30 comments


It was added in 11, but I want to mention again how great of an addition the websearch_to_tsquery() function was. You can now get a respectable search system immediately in PostgreSQL without having to implement any sort of query-parsing or special data indexing yourself. All you need to do is add a column/index for fulltext search, which you can even do now using the generated columns added in 12 (instead of a trigger).

Then just pass user-entered queries through websearch_to_tsquery() and have access to multiple search capabilities easily. It supports searching for alternatives with "or", phrases in quotes, and excluding terms with a minus sign, which is all syntax that a lot of people are already used to. For example it can handle a query like:

    blizzard or overwatch -"hong kong"
That would find anything with "blizzard" or "overwatch" in it, but exclude anything with "hong kong" (if you wanted to avoid results about that recent controversy).

It's really useful, and makes it so that you can add reasonable search functionality with almost no work at all.
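To give a feel for how little work it is, here's a sketch assuming a hypothetical `posts` table with a text `body` column:

```sql
-- A generated column (new in 12) keeps the tsvector in sync without a trigger.
ALTER TABLE posts
    ADD COLUMN body_tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;

CREATE INDEX posts_body_tsv_idx ON posts USING gin (body_tsv);

-- User-entered query string passed straight through:
SELECT *
FROM posts
WHERE body_tsv @@ websearch_to_tsquery('english', 'blizzard or overwatch -"hong kong"');
```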


That is really cool and definitely a massive timesaver. (I've written my own primitive query NLP for SQLite full-text indices, so I can definitely appreciate this feature!)

That said (and only because your example reminded me), I absolutely hate the NLP `or` operator. I feel like any language that offers an `or` operator needs to require the simultaneous use of a grouping operator (parentheses, etc.) to clarify the order of operations. A space is an implicit `and`, so I never know whether the `or` applies to the directly adjacent keywords or the query as a whole.


Inlining CTEs is a big deal, but they also finally have a way to see the progress of an index build without a PhD in postgresology!

Check it out here: https://www.postgresql.org/docs/12/progress-reporting.html
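While a CREATE INDEX (or REINDEX) is running in another session, you can watch it from psql; something like this against the view documented there:

```sql
SELECT pid, phase, blocks_done, blocks_total,
       tuples_done, tuples_total
FROM pg_stat_progress_create_index;
```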


I've found CTEs to be one of the few ways to reliably force Postgres's evaluation order, which is sometimes quite poor (I have specific queries that run 100x faster with a CTE), so retaining the old behaviour with MATERIALIZED in 12 is something I didn't know about.
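In 12 the old fencing behaviour is opt-in per CTE. A sketch with a hypothetical `orders` table:

```sql
-- MATERIALIZED forces the CTE to be evaluated once, as pre-12 always did;
-- without it, the planner is now free to inline it into the outer query.
WITH totals AS MATERIALIZED (
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id
)
SELECT * FROM totals WHERE total > 1000;
```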


How this should be done was the big roadblock; the actual inlining was a relatively simple patch.


My #1 request for indexes from PostgreSQL (or a 3rd-party tool) is the ability to rebuild indexes concurrently.

I have had several cases where query performance dropped significantly (unfortunately in managed AWS Aurora DB, so I have limited inspection abilities).

An index rebuild is the solution, but it locks the entire table.

You can get a concurrent rebuild by creating a new index concurrently, then transactionally dropping the old one and renaming the new one. It's error-prone, though.
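Roughly, with hypothetical index names (note the CONCURRENTLY step cannot itself run inside a transaction block, so only the drop/rename pair can be grouped):

```sql
CREATE INDEX CONCURRENTLY my_index_new ON my_table (my_column);

BEGIN;
DROP INDEX my_index;                          -- brief exclusive lock here
ALTER INDEX my_index_new RENAME TO my_index;
COMMIT;
```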


You will need to pick a new #1 request since PostgreSQL 12 finally delivered REINDEX CONCURRENTLY. :)
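With hypothetical names, it's now just:

```sql
-- Rebuild without holding a long exclusive lock on the table:
REINDEX INDEX CONCURRENTLY my_index;
REINDEX TABLE CONCURRENTLY my_table;  -- rebuilds all indexes on the table
```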


!!!!!

Wonderful! (Now I only have to wait 12 months for AWS Aurora to upgrade.)


PostgreSQL 12 has REINDEX CONCURRENTLY.


> You can get a concurrent rebuild by creating a new index concurrently, then transactionally drop the old one and rename the new one. It's error prone though.

Actually, you can't, since you can't create/drop indexes concurrently inside a transaction.

Btw, my #1 request would be a better materialized view (which supports indexes), which would also make the performance of some things really better. Unfortunately you can't refresh it without a full seq scan. It would be great if Postgres had an auto-updating materialized view.
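For context, here's what's possible today, with hypothetical names: a concurrent refresh needs a unique index on the view, and it still recomputes the full query underneath:

```sql
CREATE MATERIALIZED VIEW order_totals AS
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id;

CREATE UNIQUE INDEX ON order_totals (customer_id);

-- Non-blocking for readers, but still a full recompute of the view query:
REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals;
```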


You can certainly drop, or at least rename, inside a transaction. Only the rename part needs to be transactional.

> it would be great if postgres would have an auto updating materialized view.

100%. Check out Incremental View Maintenance: https://wiki.postgresql.org/wiki/Incremental_View_Maintenanc...

There's been some work done on it, but it's a pretty difficult problem. If/when it comes to PostgreSQL, it'll likely support only very simple queries.


Thanks for the interesting link on IVM - I hadn’t heard about that work. Crikey, reading through that page, it’s a humbling problem to solve in the general case.

I have a case of this where I do the incremental updates myself. In my case, figuring out which rows have changed isn’t that tough; I’ve got timestamps to lean on that simplify the process. More complex is that even plucking the data for a single row is slow if you’re not careful. I have a number of large subqueries that Postgres doesn’t optimise well if I just restrict the rows once in the outer query. Instead I need to be very specific with my restrictions in a number of duplicated locations throughout the query to get fast incremental updates. It works well now that it’s up and running, but it’s a little clumsy to set up.


It gets much easier if you are willing to perform "extra" query work to update the view.

IVM is taking a sophisticated approach that results in a minimum of required work, even in cases of (some) aggregations.


I wouldn't recommend dropping large indexes within a transaction. Use DROP INDEX CONCURRENTLY instead. Without CONCURRENTLY, it has to take a lock on the table, which can cause issues in high-traffic databases.

But this is mostly a moot point since 12 has REINDEX CONCURRENTLY.


> it has to lock the table which can cause issues in high traffic databases

In my case, I'm okay with a seconds-long lock while an index is dropped. An hours-long lock while an index is created is the real show stopper.

> But this is mostly a moot point since 12 has REINDEX CONCURRENTLY.

Fantastic.


This is worth a lot - thanks for the pointer!


SQL/JSON path expressions look very powerful for making queries easier to understand, they’re a part of the SQL spec I hadn’t seen yet.

I’m a little sad, though, to see yet another JSON location mini-language appear. We’ve got JSON Pointer (the IETF standard for the web, used by JSON Schema), JSONPath, the more expressive cross-language JMESPath, the CLI favorite jq, and presumably a billion more. It’s wearing me out.
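For a taste of the Postgres flavour, a small self-contained example filtering array elements with a jsonpath predicate:

```sql
SELECT jsonb_path_query(
    '{"items": [{"price": 5}, {"price": 15}]}',
    '$.items[*] ? (@.price > 10)'
);
-- returns {"price": 15}
```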


One nice thing about the CTE inlining is that materialised CTEs were stopping parallelism from being used. WITH is useful in big OLAP queries, where parallelism should really help.


OTOH, it slowed down one of my queries from a few ms to 3s just by upgrading. So it's not some magic absolute benefit.

I had to go and add "materialized" to a few places.


Yeah, if the planner estimates are bad the inlining can produce a worse query plan.


Are there any good advanced SQL books out there?


I found Dimitri Fontaine's https://theartofpostgresql.com/ advanced. I learned lots of new tricks from it.


Looks nice and it's on discount because of Black Friday. Thanks!


Probably not what you're asking, but this book is fantastic: https://sql-performance-explained.com/

Edit: and topically, free for the weekend! https://twitter.com/MarkusWinand/status/1199982940471607297


Not what I was asking about, but definitely something that I need. Thanks a lot for the reference and wow, lucky timing!


As someone who spent the past week working on database abstractions, I'm pretty sure this is impossible - you can find Basic SQL books, and you can find Advanced Postgres / Advanced MySQL / Advanced Oracle :P

The basic book would pretty much just cover "select * from table". Everything beyond that, for instance "select * from table limit 10", is implementation-specific...

(I'm not even exaggerating: https://en.wikibooks.org/wiki/SQL_Dialects_Reference/Select_... )


That's what I was afraid of, but I still asked in case there's an advanced ANSI SQL book.

Thanks!


Wasn't aware of log_transaction_sample_rate - should be useful for not spamming the logs.
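If I read the docs right, it's a fraction between 0 and 1; a sampled transaction gets all of its statements logged. A sketch (it's a superuser setting):

```sql
-- Log all statements from roughly 1% of transactions:
SET log_transaction_sample_rate = 0.01;
```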


The site blocks the page with an empty overlay named "BorlabsCookie". No thanks.


The cookie acceptance on this site is as good as it gets IMO. Clear, detailed, privacy policies linked, with the ability to accept a reasonable, minimal subset.

