PostGreSQL vs Microsoft SQL Server (comparison) – part 3: an extremely detailed feature comparison


In an article by Pankil Shah of EnterpriseDB, a firm that sells a paid version of PostGreSQL, the author attempts to compare Microsoft SQL Server to PostGreSQL…

https://www.enterprisedb.com/blog/microsoft-sql-server-mssql-vs-postgresql-comparison-details-what-differences

Several pieces of false information are given in this article…
That is why I decided to publish a correction.

Published 2021-04-01

Who am I?

I am currently 60 years old and have been in the database field since the very beginning. In my country (France) I am considered an expert in RDBMSs and in the SQL language. Over my career I have worked with the following RDBMSs: Ingres (CA), DB2 (LUW), InterBase (Borland), Gupta SQL, Informix, Oracle, Sybase SQL Server, Sybase ASA (Watcom), RDB (Digital), MS SQL Server, PostGreSQL and MySQL…

After working for many IT companies and software vendors, I founded my own company, SQL Spot, in 2007 on the Côte d’Azur. I travel across France, Europe and beyond to advise, assess and audit small and large companies that need help deciding on, refining and getting the most out of their RDBMS… At the same time, I have taught at engineering schools and at university.

I am still active and, as a man of experience, I know how to step back far enough to weigh the risks and benefits of a given system.

Some people think I am acting against PostGreSQL… But as the saying goes, "qui aime bien, châtie bien" (you punish best the ones you love). PostGreSQL deserves its place in the sun, but its rightful place. Treating it as the equal of an Oracle, an IBM DB2 or a SQL Server can prove catastrophic in some cases, especially when the preliminary studies, both technical and functional, have not been carried out seriously.

Some PostGreSQL fundamentalists have managed to introduce their favorite DBMS without any preliminary study, proof of concept, benchmark or even basic tests, and this has often ended in catastrophic adventures: a costly rollback (the French pension fund Agic Arco, for example), abandonment of the solution after years of misery (the CAF in France, for example, which recently went back to Oracle, or Altavia, where a new CTO tried to migrate from SQL Server to PostGreSQL without success…), or solutions that are costly to operate, as with the French website "Le Bon Coin", which uses more than 70 PostGreSQL servers and thus close to a hundred databases, where everyone else uses a single server and a single database. Note that "Le Bon Coin" prudently chose SQL Server (with a single server) over any PostGreSQL-based solution for the BI side of its analytics service…
And it is clear that in some companies a few PostGreSQL ayatollahs end up imposing an inadequate solution, even when it costs more than what it is supposed to deliver, and intentionally so, in order to protect their turf and become the king of the data, to the detriment of the users and of the quality of service one is entitled to expect from good corporate IT!

So, to avoid disappointments, and above all to decide with full knowledge of the facts, I invite you to read this corrective article, which details the differences between PostGreSQL and Microsoft SQL Server in order to guide your choice.

Finally, I want to make clear that I have no professional ties to Microsoft (which has never sent me a client, for instance…), that I own no MS stock, and that I have never sold an MS product, so I do not profit from Microsoft in any way. I am simply a professional user of Microsoft products, a database expert specializing in SQL Server… Yes, I was a Microsoft MVP (Most Valuable Professional) for many years, which recently ended, and in that capacity I was invited to Redmond (though I paid for my own plane ticket) to be briefed, ahead of release, on what was new in Microsoft SQL Server, exactly as the PG community does with the "PG days", which I attended in the past.

What are the licensing differences between PostgreSQL and SQL Server? Comparing the licensing models of PostgreSQL and MS SQL Server

The author gives license prices for only two editions of SQL Server (Standard and Enterprise).
But there are at least two other editions, plus one in the cloud.
One of them, SQL Server Express, is free but limited to 10 GB of relational data per database, with a maximum of 32,760 databases per instance. The volume of non-relational data (FILESTREAM, FileTable…) stored by the database is not limited.
Another edition, called Web, can only be used under an "SLA" hosting model and costs about $50 per month. Its limitations concern the host machine's hardware (max 4 sockets / 16 cores, 32 with hyperthreading, and 64 GB of RAM for relational table data, plus 16 GB for "in-memory" tables, which PostGreSQL does not offer, plus 16 GB for vertical (columnstore) indexing, which PostGreSQL does not offer either).

EnterpriseDB sells a paid version of PostGreSQL that fixes some shortcomings of the free version, such as the lack of query hints (tags)…

Another important difference the author failed to mention is that SQL Server ships with a complete stack of operational services (ETL, reporting, BI, data mining…), without your having to pay any extra…

None of these solutions (ETL, reporting, BI, data mining) exists in PostGreSQL. Some do exist as free software, but with limitations that make them unfit for enterprise use (for example, the "free" ETL Talend is crippled feature-wise and requires a paid version the moment you find yourself up against the wall with a large volume of data that you would very much like to process much faster, parallelism in particular…).
Obviously, integrating all these disparate solutions from different vendors often raises problems unknown in the world of the Microsoft suite (which has always been its strength)…

Which of PostgreSQL or SQL Server is easier to use? Comparing the ease of use of PostgreSQL and MS SQL Server

To be complete: SQL Server originates from Sybase SQL Server (born in 1983, out of INGRES!), which split into two branches, one kept by Sybase and renamed ASE (Adaptive Server Enterprise), the other acquired by Microsoft to become Microsoft SQL Server. It is true that the first Microsoft version of SQL Server was released in 1989.

The author seems to say that SQL Server has no object-relational features… That is false. Objects can be stored by coding them in a .NET language (CLR) and integrating them as data types in any database, and these objects can have methods to operate on them. All SQL CLR objects created reside inside the database, not outside it… Finally, any such object can be indexed provided it is "byte ordered".

Moreover, SQL Server has some NoSQL features that PostGreSQL lacks: graph tables, "big tables" (up to 30,000 columns), "in-memory" tables (for key-value tables, but not only…) and document-storage tables via FILESTREAM / FileTable with full-text and semantic search.
None of these features is available in PostGreSQL.
In addition, SQL Server has an XML data type that can be fully indexed. XML indexing does not exist in PostGreSQL!
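As an illustration, here is a minimal sketch (hypothetical table and index names) of how XML indexing is declared in SQL Server: a primary XML index shreds the whole document, and secondary indexes specialize it for path, value or property queries.

```sql
-- Hypothetical names; a minimal sketch of SQL Server XML indexing.
CREATE TABLE T_INVOICE (
   INV_ID  INT PRIMARY KEY,
   INV_DOC XML
);
-- The primary XML index covers the entire document:
CREATE PRIMARY XML INDEX X_INVOICE_DOC ON T_INVOICE (INV_DOC);
-- A secondary XML index built on top of it, tuned for path-based queries:
CREATE XML INDEX X_INVOICE_DOC_PATH ON T_INVOICE (INV_DOC)
   USING XML INDEX X_INVOICE_DOC FOR PATH;
```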

What are the syntax differences between PostgreSQL and SQL Server? Comparing the syntax of PostgreSQL and MS SQL Server

The author shows an old syntax inherited from Sybase, with some bad faith, to suggest that SQL Server's syntaxes are abnormal and non-portable, which is plainly false!

In truth, SQL Server can run the following query

SELECT col1, col2

without brackets, and aliases can be created with the AS keyword (or without it). The old syntaxes shown by the author remain for compatibility reasons, but in my career (40 years of RDBMS use) I have not seen a single customer use them in over 20 years!

In addition, the syntax below, given by the author:

SELECT AVG(col1)=avg1

does not exist in SQL Server and raises an exception:

<em>Msg 102, Level 15, State 1, Line 2</em>
<em>Incorrect syntax near '='.</em>

Which proves that the author does not even check his own writing… The correct old-style, rarely used syntax being:

SELECT avg1=AVG(col1) ...

Finally, when working with dates, you can use the ISO SQL standard function CURRENT_TIMESTAMP (without parentheses, as the standard requires).
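For example, the following query (hypothetical table and column names) runs as-is on SQL Server and is fully portable to PostgreSQL and other standard-compliant RDBMSs:

```sql
-- Standard, portable alias syntax plus the ISO CURRENT_TIMESTAMP function:
SELECT col1              AS c1,
       AVG(col2)         AS avg2,
       CURRENT_TIMESTAMP AS queried_at   -- ISO SQL: no parentheses
FROM   T_DEMO
GROUP  BY col1;
```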

What are the data type differences between PostgreSQL and SQL Server? Comparing the data types of PostgreSQL and MS SQL Server

The author made mistakes, giving corresponding data types that are no longer in use… In particular, the TEXT data type is deprecated and must be replaced by VARCHAR(max) for ASCII-style encoding or NVARCHAR(max) for UNICODE-style encoding. The DOUBLE PRECISION mentioned by the author does not exist under that name in SQL Server; you should use FLOAT(53).
As for UUIDs or GUIDs, PostGreSQL has no corresponding data type and stores them in a CHAR(16), which has major drawbacks: it uses more space than necessary and does not provide all the features such a data type calls for, such as sorting or inequality comparisons…
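By contrast, here is a minimal sketch (hypothetical names) of SQL Server's native 16-byte GUID type, which is fully sortable and comparable:

```sql
-- UNIQUEIDENTIFIER is a true 16-byte binary type, not a character string.
CREATE TABLE T_SESSION (
   SES_ID    UNIQUEIDENTIFIER NOT NULL
             DEFAULT NEWSEQUENTIALID() PRIMARY KEY, -- sequential GUIDs fragment the index less
   SES_START DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);
```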

Worse, as we will see later, PostGreSQL's implicit data type casting leads to anomalies that can be dangerous…

Another significant type difference is that SQL Server's "bit" type (which acts as a boolean) really is a bit, not a byte, and up to eight bits share the same byte. In other words, boolean types in SQL Server take eight times less space than in PostGreSQL… and some operations are radically faster when using a bit data type in SQL Server rather than a boolean in PostGreSQL.
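A minimal sketch (hypothetical table) of this bit packing: all eight BIT columns below share a single byte of row storage in SQL Server, whereas eight PostgreSQL booleans would take eight bytes.

```sql
-- Eight BIT columns packed into one byte of storage per row:
CREATE TABLE T_FLAGS (
   FLG_ID INT PRIMARY KEY,
   F1 BIT, F2 BIT, F3 BIT, F4 BIT,
   F5 BIT, F6 BIT, F7 BIT, F8 BIT
);
```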

What are the case-sensitivity differences between PostgreSQL and SQL Server? Comparing the collations of PostgreSQL and MS SQL Server

Major errors and misunderstandings are delivered in this paragraph. First of all, MS SQL Server is neither case-sensitive nor case-insensitive per se. The default sensitivity for case, accents, kanatype, width and other string-collation features has to be decided when installing a SQL Server instance. In SQL Server, collation settings can be defined at the instance level (called a "cluster" in PG), at the database level at creation time, at the column level when creating or altering a table or view, and finally the COLLATE operator can be applied to any string expression in any query, which PostGreSQL does not support.

SQL Server collations support 68 different languages and can be CS or CI (case-sensitive or not), AS or AI (sensitive or not to accents and diacritics), width-sensitive or not (WS), kanatype-sensitive or not (KS) for Japanese, sensitive or not to ideographic variations (VSS), or more simply binary (BIN or BIN2), plus SC for supplementary characters such as emojis and, finally, UTF-8 encoding. PostgreSQL collations are limited to CI/CS and AI/AS.
PostGreSQL has far fewer features when it comes to collations and the COLLATE operator.
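A minimal sketch of applying a collation on the fly inside a query in T-SQL (reusing the customer table from the trigger example later in this article):

```sql
-- Case- and accent-insensitive matching, plus a different sort collation,
-- applied per expression without touching the column definitions:
SELECT CMR_NAME
FROM   T_CUSTOMER_CMR
WHERE  CMR_NAME COLLATE French_CI_AI LIKE 'helene%'
ORDER  BY CMR_NAME COLLATE French_100_CS_AS;
```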

There are more than 5,500 collations in SQL Server, against PostgreSQL's very small number. PostgreSQL did recently add ICU collations, but these are badly bugged and therefore unusable! For example, the SQL LIKE operator with a PG ICU collation refuses to run the query and throws a surprising "nondeterministic collation" error!

This is a real problem if you have to use PostGreSQL with Latin languages that have accents (Europe, South America, North Africa… a great many people!)

Another problem is that the classic collations based on the Linux OS can corrupt your indexes!

In short, using collations in PostGreSQL is a choice between catching the plague or cholera…

What are the differences in index types between PostgreSQL and SQL Server? Comparing the index types of PostgreSQL and MS SQL Server

First, there is no CLUSTERED index in PostGreSQL like SQL Server's, or like Oracle Database's IOT. Clustered indexes bring big advantages in data storage (fewer bytes, because the clustered index's leaf pages are stored in the table itself and not outside it, thus avoiding redundancy) and also in raw performance: in joins in particular, since every nonclustered index carries the clustered index key value (often the PK) and therefore no longer needs any table access to perform the "seek and join" part of a query…

The author forgets to say that PostGreSQL only recently added the INCLUDE clause to its traditional B-tree indexes, as Microsoft SQL Server has offered for many years (since 2005), so that an index can cover a query…
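A minimal sketch (hypothetical names) of such a covering index; the syntax is essentially the same in SQL Server (since 2005) and PostgreSQL (since version 11):

```sql
-- The key column drives the seek; INCLUDE columns make the index cover
-- the query without any lookup back into the table:
CREATE INDEX X_ORDER_CUSTOMER
   ON T_ORDER (ORD_CUSTOMER_ID)
   INCLUDE (ORD_DATE, ORD_AMOUNT);
```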

Yes, PG has the advantage of offering hash indexing… But PostGreSQL itself long recommended not using it (search performance equal to B-tree, slow to build, not WAL-logged…)! Hash indexes also exist in SQL Server, but they are dedicated to "in-memory" tables.

The GiST, GIN and BRIN index types have never been implemented anywhere other than in PostGreSQL… Ask yourself why!

SQL Server also has vertical indexes (columnstore indexes, since the 2012 version), which the author failed to mention, intended for indexing very large tables. Such indexes are indispensable once you reach hundreds of millions of rows, because the data stored in them is always compressed and they can be scanned in "batch" mode: blocks of data rows processed in parallel, rather than the traditional row-by-row mode that persists in classic indexes as PostGreSQL uses them… Vertical indexes exist in the Citus version of PostGreSQL (Citus was acquired by Microsoft), but they do not run as fast as SQL Server's and are dedicated to OLAP, not OLTP databases. Another option is to switch to a costly paid version of PostGreSQL such as the one sold by Fujitsu… Moreover, SQL Server's columnstore index can support 1,024 columns in the index key (the traditional B-tree index being limited to 32 columns in both SQL Server and PostgreSQL) and can be "clustered", that is, contain the entire table and therefore index all its columns (up to 1,024).
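A minimal sketch (hypothetical table name) of the two flavors of vertical indexing in SQL Server:

```sql
-- A clustered columnstore index: the whole table stored compressed, column by column.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FACT_SALES ON FACT_SALES;

-- Alternatively, a nonclustered columnstore over a subset of columns,
-- usable on an OLTP table alongside classic B-tree indexes:
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FACT_SALES
   ON FACT_SALES (SAL_DATE, SAL_AMOUNT, SAL_QTY);
```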

PostGreSQL offers no vertical indexing at all.

For "in-memory" tables, SQL Server offers hash indexes and has added a more sophisticated "range" index type called the Bw-tree, which PostGreSQL does not offer since PostgreSQL does not support the concept of "in-memory" tables at all.

For XML, SQL Server offers 4 types of indexes (PRIMARY, FOR PATH, FOR VALUE, FOR PROPERTY) that cover the indexing of entire XML documents, not just a single partial element that you have to duplicate in order to obtain an atomic index value, as is the case in PostGreSQL! In practice, PostgreSQL offers no way to index XML at all…

What are the replication differences between PostGreSQL and SQL Server? Comparing replication in PostgreSQL and MS SQL Server

The author is completely wrong on this subject, confusing what belongs to high availability (which he does not even mention in his article) with what belongs to scalability (replication of certain information, filtered by rows and columns, and sent to other databases in other SQL Server instances).

For data replication (which is not used for high availability), SQL Server offers 8 modes of propagating data:

  • Transactional replication (reads and writes table data from one database and sends it to other databases, with no write-back)
  • Peer-to-peer replication, built on transactional replication (reads and writes information from several databases sent to other databases, with no write-back)
  • Merge replication (reads and writes information from several databases sent to other databases, with write-back)
  • Snapshot replication (reads and writes in source databases and reproduces the tables' data in target databases, with no write-back)
  • Replication with Oracle (a special case)
  • Service Broker (communicates information from various databases to other databases that do not necessarily share the same table schema, with write-back)
  • Triggers and linked servers (which I do not recommend…)
  • The MS ActiveSync tool (which I do not recommend either…)

For data replication, PostGreSQL offers few solutions, and their performance is quite poor in comparison.

Data replication is a feature in which you choose the tables, or the rows and columns of tables, that you want to replicate from one database to another on a different instance. You can thus replicate small portions of the information in certain tables, and SQL Server also lets you replicate the execution of stored procedures instead of the row values.

Many large websites use transactional replication to widen the data-serving surface and serve several hundred concurrent users. This technique has to be combined with DPVs (Distributed Partitioned Views, yet another feature unknown to PostGreSQL…). In France, CDiscount, Ventes Privées (now Veepee), FNAC and many other top-10 merchant websites use these principles on Microsoft clusters running in parallel…

For high availability (not usable to replicate data from some databases to others), SQL Server has 3 modes:

  • Mirroring (a database "mirror"): per database, a complete copy, synchronized or not, from a source database to a database on another instance (LAN or WAN), with the possibility of automatic, transparent failover (these database copies are called "replicas" and can be neither read nor written). This feature has been deprecated in favor of the AlwaysOn technology, but it can still be used in basic scenarios.
  • Log Shipping (regularly shipping the transaction log): per database, an asynchronous copy of the database sent to another instance (LAN or WAN), with no automatic failover possible. I must say that PostGreSQL is perfectly capable of this, but in MS SQL Server various wizards help the DBA set it up in under a minute, including a diagnostic and telemetry system!
  • AlwaysOn: per group of databases (called an "Availability Group" or AG), a copy of all the databases in the group, synchronized or not, sent to one or more SQL Server instances, with the possibility of automatic, simultaneous failover of all the group's databases. These database copies are called "replicas" and can be put into a mode that makes them readable, in order to route certain SELECT queries or to take backups.
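As a heavily simplified sketch (hypothetical server, database and endpoint names; real deployments need endpoints, certificates and a failover cluster first), an availability group is declared in T-SQL roughly like this:

```sql
-- One synchronous node with automatic failover, one asynchronous DR node:
CREATE AVAILABILITY GROUP AG_Sales
   FOR DATABASE SalesDB, CrmDB
   REPLICA ON
      'NODE1' WITH (ENDPOINT_URL      = 'TCP://node1.corp.local:5022',
                    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                    FAILOVER_MODE     = AUTOMATIC),
      'NODE2' WITH (ENDPOINT_URL      = 'TCP://node2.corp.local:5022',
                    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                    FAILOVER_MODE     = MANUAL);
```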

I must confess that I have never seen any PostGreSQL high-availability solution offering automatic failover that is fully transparent to applications, as Microsoft SQL Server does. Moreover, some PostGreSQL statements will break the replication mechanism (for example ALTER TABLE, DROP TABLE, TRUNCATE, DROP INDEX, CREATE TABLESPACE…).

Microsoft's AlwaysOn technology, which includes "availability groups", combines a failover cluster with transactional replication based on the binary result of applying the transactions (not on logical replication of the transactions, as PostGreSQL does), in synchronous or asynchronous mode, so as to:

  • provide service continuity automatically and transparently via a "listener", with local nodes configured for synchronous replication and automatic failover;
  • provide service continuity manually, with local nodes configured for synchronous or asynchronous replication without automatic failover;
  • provide disaster recovery with remote nodes configured for asynchronous replication and manual failover;
  • provide wider exposure of the data-serving surface, with some synchronous or asynchronous nodes configured as readable.

This technology can also be used on Linux, though in a different way, and some nodes can reside in the Azure cloud.

Currently, SQL Server with AlwaysOn can have up to 8 nodes, with a maximum of 3 synchronous nodes with automatic failover; the others must be asynchronous with manual failover.
Automatic failover in synchronous mode guarantees that there will be no loss of any kind, not even if you add storage spaces (FILEGROUPs in SQL Server)… PostGreSQL streaming replication, even in synchronous mode, cannot guarantee zero loss, and some SQL commands can never be replicated to the standby nodes…

As an example, we designed the high availability of the production SQL servers of the global logistics company Geodis (CargoWise solution), with more than 130 databases replicated over 5 nodes (SQL Server instances) and a data volume of 14 TB (2014).

[Figure: the SQL Server AlwaysOn solution for Géodis (CargoWise)]

I remember that in the past, in a PDF document, the PostGreSQL DBA of the "Le Bon Coin" website was proud of having fixed, in a few days, a malfunction in the high-availability solution he had set up for PostGreSQL… But he forgot to mention that the site's databases had therefore been left without a safety net for those very same days!

PostGreSQL transactional streaming replication also requires some modifications to the database in order to run, such as the presence of a primary key in every table… whereas in Microsoft SQL Server no modification of any kind is needed to implement the AlwaysOn solution…

Other major differences emerge between PostGreSQL's solution and Microsoft SQL Server's:

  • Automatic seeding, used by SQL Server when setting up AlwaysOn, automatically sends the database to be replicated to the secondary replicas. No such automation exists when setting up PostGreSQL "streaming" replication, and the resulting loss of time grows with the size and number of databases.
  • The AlwaysOn technology uses compression for many operations, such as seeding the databases to be replicated and sending the data to be committed to remote replicas. PostGreSQL streaming replication has no compression facility at all. The result is heavy network traffic while replication is running, especially with many databases.
  • AlwaysOn-specific encryption is enabled by default for replication in SQL Server, but streaming replication in PostGreSQL cannot encrypt the data to be transmitted… Of course a VPN or SSL can be used, but in some contexts, such as healthcare or finance where data is highly sensitive, specific built-in encryption is preferable…
  • PostGreSQL can only replicate all the databases of the "cluster". It is not possible to replicate a single database among others, nor even a subset of the cluster's databases. SQL Server can replicate not only one database out of the many on an instance, but also one or more groups of databases that must fail over synchronously. Finally, SQL Server AlwaysOn makes it possible to spread the failover of the different availability groups across the different replicas.
  • PostGreSQL can only fail over when the PostGreSQL engine itself goes down. No facility is offered to fail over automatically on a global crash of the physical machine or VM, nor on a problem at the database level… By contrast, MS SQL Server, thanks to Windows failover clustering, detects problems at every level: physical machine or VM, instance or database…

Why is PostGreSQL "streaming" replication so complicated and so limited?

The technology PostGreSQL uses for transactional streaming replication is the equivalent of the transactional data replication implemented in SQL Server, which is not used for high availability. This replication mode is based on the logical level (the SQL command), not on the physical side of the transaction. Logical replication has a few pitfalls. Put simply, replicating at the logical level a simple UPDATE that captures the current date/time will not produce the same value on every replica, because the propagation of the SQL command to the standby servers is never absolutely instantaneous (network latency alone sees to that), even in a so-called synchronous solution…
A similar problem appears when generating a UUID (or GUID) for a key.
Binary-based transactional replication copies bytes, which always have the same values, from source to target, and locks strictly nothing, because the process does not replicate the transaction but its result, the consequences of the transaction. This subtle point makes all the difference!
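As a minimal sketch (hypothetical table and column names), here is the kind of non-deterministic statement that statement-level (logical) replication would re-execute on each replica, yielding different values per node:

```sql
-- Logical replication re-evaluates these expressions on every replica:
UPDATE T_ORDER
SET    ORD_SHIPPED_AT = CURRENT_TIMESTAMP,  -- a slightly different time on each node
       ORD_TRACKING   = NEWID()             -- a completely different GUID on each node!
WHERE  ORD_ID = 42;
-- Binary (log-based) replication ships the resulting bytes instead,
-- so every replica ends up with exactly the same values.
```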
This is one of the things that led Uber to leave PostGreSQL for MySQL… combined with the abnormal behavior of MVCC in the replication process.

To get around these problems, Microsoft SQL Server opted for physical (binary) replication, introduced with the 2005 version of SQL Server (for mirroring), which copies the portions of the transaction log that store the binary changes made to table and index pages. This guarantees that the values will always be strictly identical on all replicated databases, whatever the moment of replication, whether synchronous or asynchronous, and whatever the computed values, in particular when non-deterministic computations are run.

What are the clustering differences between PostgreSQL and SQL Server? Comparing clustering in PostgreSQL and MS SQL Server

Here again, the author says false things… SQL Server has no active/active clustering capability.

In fact, it is possible to have several SQL Server instances (the name "instance" maps to the term "cluster" in PG) on one machine, some active and some passive (in which case the SQL Server service of those instances is disabled). With a shared disk accessible to all the SQL Server instances, a machine crash is recovered, manually or automatically, by another instance that takes over the disk share once its SQL Server service restarts… But this is an old way of doing what we call high availability, and the most realistic way to do it is to use the AlwaysOn technology described in the section above.

Apart from merge data replication, there is no way to have a complete database writable by many SQL Server instances running on different machines. But with AlwaysOn, it is possible to have one active (read/write) database and many other readable replicas.

What are the trigger differences between PostgreSQL and SQL Server? Comparing triggers in PostgreSQL and MSSQL

SQL Server and PostGreSQL both support PER STATEMENT triggers. This is new in PG (2018) but old in SQL Server, dating back to roughly 1999… The main difference, though, is that SQL Server triggers can directly modify the target table's data without any code "contortions" to avoid the mutating-table error… PostGreSQL triggers are subject to uncontrolled recursion in that case. SQL Server controls recursion in two ways: at the database level (for recursion) and at the server level (for re-entry, e.g. what I call "ping-pong" triggers).

CREATE TRIGGER E_IU_CUSTOMER
ON T_CUSTOMER_CMR
FOR INSERT, UPDATE
AS
IF NOT UPDATE(CMR_NAME)
   RETURN;
UPDATE T_CUSTOMER_CMR
SET    CMR_NAME = UPPER(CMR_NAME)
WHERE  CMR_ID IN (SELECT CMR_ID FROM inserted);

The example above shows a SQL Server trigger that forces customer names to be stored in uppercase every time a write (INSERT or UPDATE) occurs.

PostGreSQL and SQL Server both have DDL triggers, but the way they are implemented in SQL Server is more precise (about 460 events or event groups at two different scopes: server scope, 201, and database scope, 262), fully usable, and requires simpler code. As an example, forbidding a CREATE TABLE that does not meet certain criteria is easy in SQL Server and takes 6 lines of code:

CREATE TRIGGER E_DDL_CREATE_TABLE
ON DATABASE
FOR CREATE_TABLE
AS
IF EVENTDATA().value('(/EVENT_INSTANCE/ObjectName)[1]',
                     'sysname') NOT LIKE 'T?_%' ESCAPE '?'
   ROLLBACK;

… but in PostGreSQL you must code two distinct objects and churn out 20 lines of code with more complex logic:

CREATE FUNCTION e_ddl_create_table_func()
RETURNS event_trigger
LANGUAGE plpgsql
AS $$
DECLARE obj record; 
BEGIN
   FOR obj IN SELECT * 
              FROM pg_event_trigger_ddl_commands() 
              WHERE command_tag in ('CREATE TABLE')
   LOOP
      IF NOT (parse_ident(obj.object_identity))[2] LIKE 't?_%' ESCAPE '?'
      THEN
         raise EXCEPTION 'The table name must begin with t_';
      END IF;
   END LOOP;
END;
$$;

CREATE EVENT TRIGGER trg_create_table ON ddl_command_end
   WHEN TAG IN ('CREATE TABLE')
   EXECUTE PROCEDURE e_ddl_create_table_func();

In SQL Server, when you have several triggers for the same object and the same event, you can control the order in which the triggers fire (sp_settriggerorder). In PostGreSQL, the firing order of triggers is fixed by name (in alphabetical order)! Not exactly convenient when several contributions must define their own triggers (remember, many tools use and abuse triggers, because it is the only way to capture data changes in PostGreSQL). In contrast, most SQL Server features that need to track data changes use the transaction log, which is much lighter than using a trigger…
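As a sketch, controlling the firing order with sp_settriggerorder looks like this, reusing the trigger from the example above:

```sql
-- make E_IU_CUSTOMER fire first among the INSERT triggers of its table
-- ('First', 'Last' and 'None' are the possible orders)
EXEC sp_settriggerorder
     @triggername = N'E_IU_CUSTOMER',
     @order       = N'First',
     @stmttype    = N'INSERT';
```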

What are the query differences between PostgreSQL and SQL Server? Compare queries in PostgreSQL vs. MSSQL

Here again, the author omits some elements of comparison…

SQL Server can also extend standard SQL with advanced and user-defined types, extensions and custom modules, as well as additional options for triggers and other features, by coding these in a .NET language (C#, Python, Ruby, Scheme, C++ …).

And JSON language support is included natively.

but… PostGreSQL still has no MERGE statement (an integral part of the standard since 2003!).
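For reference, a minimal T-SQL MERGE sketch against the customer table used earlier (the staging table name is hypothetical):

```sql
-- upsert customers from a hypothetical staging table in one statement
MERGE INTO T_CUSTOMER_CMR AS tgt
USING T_CUSTOMER_STAGING AS src
   ON tgt.CMR_ID = src.CMR_ID
WHEN MATCHED THEN
   UPDATE SET tgt.CMR_NAME = src.CMR_NAME
WHEN NOT MATCHED THEN
   INSERT (CMR_ID, CMR_NAME) VALUES (src.CMR_ID, src.CMR_NAME);
```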

 


What are the text indexing differences between PostgreSQL and SQL Server? Compare « full-text » search in PostgreSQL vs. MSSQL

The author says that « SQL Server offers full-text search as an optional component », which is false… Full-text indexing has been an integral part of SQL Server since the 2008 version!

PostGreSQL full-text indexing is completely proprietary, while SQL Server follows the ISO SQL standard with the CONTAINS predicate.

But SQL Server has some more features that PostgreSQL does not have:

  • indexing a great number of different file formats (all the MS Office file types plus .txt, .xml, .rtf, .html, .pdf…)
  • indexing documents stored as files with FILESTREAM/FileTable storage
  • semantic indexing
  • indexing meta tags in electronic documents
  • synonym search without the limitations PG has
  • easy searching of expanded or contracted terms (acronyms)

What are the regular expression differences between PostgreSQL and SQL Server? Compare regular expressions in PostgreSQL vs. MSSQL

It is true that MS SQL Server has no native regex support; this was done on purpose for security reasons (essentially because of DoS attacks…), but:

  • the LIKE operator has some regex functionality ([…], ^…);
  • the sensitivity (or not) of character strings to case, accents (and all diacritic characters such as ligatures), kana type and width (is 2 = ²?) can be set by combining the LIKE operator with a collation, as the ISO SQL standard requires;
  • full regex functionality can be obtained by adding a .NET DLL written in C# by Microsoft as a UDF (User Defined Function).
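A minimal sketch of the second point, reusing the customer table from the trigger example; French_CI_AI is a real case- and accent-insensitive SQL Server collation:

```sql
-- case- and accent-insensitive search, as the ISO SQL standard intends:
-- matches 'Eleve', 'élevé', 'ELEVE'…
SELECT CMR_NAME
FROM   T_CUSTOMER_CMR
WHERE  CMR_NAME COLLATE French_CI_AI LIKE '%eleve%';
```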

But, and this is very worrying, combining a case-insensitive (ICU) collation with the LIKE operator in PostGreSQL consistently causes a particularly astonishing error (ERROR: nondeterministic collations are not supported for LIKE ...), which makes such searches unusable!

I don’t know what the concept of a « nondeterministic » collation is supposed to be, because a collation is a simple surjective mathematical mapping between two sets, one containing the original characters and the other the transformed characters… But the practical result is that PostGreSQL is not fully equipped to retrieve information when you need case- or accent-insensitive searches!

But the main reason for the lack of a built-in function, as the standard requires (SIMILAR), is that it facilitates denial-of-service attacks! Having a standard function known to everyone is a catastrophic entry point that makes attacking PostgreSQL servers by SQL injection easier! In SQL Server, via the CLR .NET DLL that provides the regex features, you can create a user-defined function with a customized name… which is more difficult for a hacker to find. The PG documentation is also clear about that point…
The problem is that you cannot easily remove all the regex machinery of PG (SIMILAR, ~, ~*, !~, !~*, substring, regexp_replace, regexp_split_to_array). Trying to remove just the regex part of the substring function is a nightmare… Of course, you can do it because you have the source code… But how many years would it take?

What are the partitioning differences between PostgreSQL and SQL Server? Compare the partitioning in PostgreSQL vs. MSSQL

One major difference between PostGreSQL and Microsoft SQL Server on the topic of partitioning is that the SQL Server partitioning mechanism is generic and complete… resulting in savings of time and money, and in better security!

When you create partitions in PostGreSQL you have to create one new (inherited) table for each partition of each table… If you have 30 tables and need to divide the data by month over 3 years, you need to create 1080 inherited PostGreSQL tables! What a waste of time…

In the SQL Server partitioning system, there are only two objects to create in order to manage all the partitions you want on all the targeted tables:

  • a partition function (CREATE PARTITION FUNCTION);
  • a partition schema (CREATE PARTITION SCHEME);

Finally, you must run an ALTER TABLE for each table or index involved, to indicate that it must store its rows in the partition scheme if it was not created as partitioned.

Adding a partition in SQL Server for all the tables of your partitioned data comes down to running two single commands (ALTER PARTITION SCHEME…, ALTER PARTITION FUNCTION…).
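A sketch of the two objects and of the two commands that add a partition (object names and dates are illustrative):

```sql
-- one partition function + one partition scheme serve every partitioned table
CREATE PARTITION FUNCTION PF_BY_YEAR (date)
AS RANGE RIGHT FOR VALUES ('2019-01-01', '2020-01-01', '2021-01-01');

CREATE PARTITION SCHEME PS_BY_YEAR
AS PARTITION PF_BY_YEAR ALL TO ([PRIMARY]);

-- adding the next yearly partition for ALL tables built on the scheme:
ALTER PARTITION SCHEME   PS_BY_YEAR NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION PF_BY_YEAR () SPLIT RANGE ('2022-01-01');
```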

For instance, I had to create the Operational Data Store of the E.Leclerc accounting system (an equivalent of Walmart in France), and we decided to partition about 30 tables by store; with about 300 stores, the result is nearly 9000 storage units… It took only 10 minutes to do!

As an example, consider the orders made on a website that we want to partition by date:

  • For recent invoices, with a partition per quarter up to year n-1 compared to the current year.
  • For old invoices (years n-2 and even older), a partitioning per year will be carried out.

This is summarized by the following figure:

Principle of partitioning in SQL Server

The PostGreSQL solution, based on Oracle’s way of partitioning, has several gotchas, because the partition mechanism is based on designing intervals:

  • You can « forget » some partitioning key values.
  • You can create overlapping partitions.

The SQL Server solution systematically offers continuity of partitioning, so there is no possibility of forgetting partitioning key values or of having an overlap between two partitions…

By the way, let’s ask Ted Codd, the creator of the relational model, about partitioning…
Rule 11: Distribution independence: The end-user must not be able to see that the data is distributed over various locations. Users should always get the impression that the data is located at one site only.
What do you think of this rule in relation to the fact that, in PG, you need to design new tables to create partitioning? A table is a logical location of data!

Some more problems remain in the PostGreSQL solution:

  • PostGreSQL does not allow partitioning an existing table. This is a severe limitation, because you cannot always know, when designing the database, whether this or that table must be partitioned. Usually you have to wait for the database to grow!
  • PostGreSQL cannot create global indexes on partitioned tables. Those indexes are widely used, because many queries do not have the partition criterion in the WHERE clause…
  • PostGreSQL does not offer support for « splitting » or « merging » partitions with dedicated commands;
  • PostGreSQL cannot make a partition read-only (for example for archival purposes);
  • PostGreSQL cannot compress a partition (for example for lukewarm data).

What are the table scalability differences between PostgreSQL and SQL Server? Compare the table scalability in PostgreSQL vs. MSSQL

The author forgets to say that SQL Server has a system called Distributed Partitioned Views (DPV) that allows horizontal partitioning across servers and gives a way to run parallel instances of SQL Server to offer a wider attack surface for the data. As an example, fnac.com (our « French » Amazon) uses a farm of servers running simultaneously.

But this topic relates to the paragraph « What are the replication differences between PostgreSQL and SQL Server? Compare replication in PostgreSQL vs. MSSQL ».

What are the compliance differences between PostgreSQL and SQL Server? Compare the compliance in PostgreSQL vs. MSSQL

The author seems to indicate that SQL Server has no compliance for HIPAA, GDPR, and PCI. In fact SQL Server has a higher level of compliance for many of these requirements.

As an example, I would mention four features that PostGreSQL does not have and which are very useful for securing databases:

  • Transparent Data Encryption: encrypting the data in the files (tables, indexes and the transaction log, even in tempdb, which is used for temporary tables) and of course all the backups of the database;
  • End-to-end encryption (Always Encrypted): allows clients to encrypt sensitive data inside client applications and never reveal the encryption keys to the database engine;
  • Extensible Key Management: encryption keys for data and key encryption are created in transient key containers residing outside the machine, in a secure electronic device called an HSM (Hardware Security Module);
  • Dynamic Data Masking: limits sensitive data exposure by masking it to non-privileged users.

There are many more features that make the difference on this topic, but those based on encryption are the most sensitive ones…

What are the NoSQL capability differences between PostgreSQL and SQL Server? Compare the NoSQL capabilities in PostgreSQL vs. MSSQL

By omitting some crucial information, the author is lying about the ability to do NoSQL…

First, SQL Server has JSON features exactly like PostGreSQL.

As I said earlier:

  • Second, Microsoft SQL Server has graph tables, which PostGreSQL does not have
  • Third, Microsoft SQL Server has « in memory » tables, which PostGreSQL does not have
  • Fourth, Microsoft SQL Server has « wide tables » of up to 30,000 columns and vertical indexing (columnstore indexes), which PostGreSQL cannot do… plus SQL Server Big Data Clusters
  • Fifth, Microsoft SQL Server can create « key-value » pair tables (in memory), which PostGreSQL does not have

So PostGreSQL has very few NoSQL features, only JSON… while SQL Server has the major NoSQL features!

What are the security differences between PostgreSQL and SQL server? Compare the security in PostgreSQL vs. MSSQL

There are some important differences between PostgreSQL and Microsoft SQL Server in the matter of security.

First, PostgreSQL was never designed with a dissociation between the security of operations on the « cluster » (such as creating a new database, controlling connections, backing up…) and the security of commands at the database level.

As an example, a backup (which does not exist in PostGreSQL, which does a dump…) can be controlled at the database level in SQL Server, but in PG it is outside the scope of the database, and also outside the security of the PG cluster! Conversely, SQL Server uses a security entity named « login » for instance-level operations, and another security entity named SQL user at the database level, where a login can be mapped to only one SQL user in each database.

The complete list of SQL Server privileges at all levels can be viewed on a poster in PDF format. As you can see, the sharpness of privileges in SQL Server bears no comparison with the coarse privileges in PG.

I must say that PostGreSQL has some efforts to make in the matter of security:

  • CVE-2019-10164, at level 9.0, shows that an attack by privilege elevation is possible
  • CVE-2019-9193, at level 9.0, shows that COPY TO/FROM PROGRAM allows superusers to execute arbitrary code
  • CVE-2018-16850, at level 7.5: vulnerable to SQL injection in pg_upgrade and pg_dump

And the SQL Server database engine is well known as the most secure database system… No CVE with a score above 6.5 has been discovered in the past 12 months…

What are the analytical function differences between PostgreSQL and SQL server? Compare the analytical functions in PostgreSQL vs. MSSQL

The author forgot to mention DENSE_RANK, NTILE, PERCENT_RANK, RANK and ROW_NUMBER, which SQL Server also has!

What are the administration and GUI tools differences between PostgreSQL and SQL server? Compare the administration with GUI tools in PostgreSQL vs. MSSQL

The author forgot to mention that DBeaver, SQuirreL, Toad… are also administration tools available for SQL Server, and that monitoring, health and performance tools include Nagios, SolarWinds, Idera, Sentry, Quest, Redgate, Apex…

What are the performance differences between PostgreSQL and SQL server? Compare the performance of PostgreSQL vs. MSSQL

The author says that « because the SQL Server user agreement prohibits the publication of benchmark testing without Microsoft’s prior written approval, head-to-head comparisons with other database systems are rare ». In fact such contract clauses are « leonine » clauses and are not applicable, because they are against the law, especially in the fields of human rights and freedom of expression.

So you can publish benchmarks (I have published some), but you must be fair in your comparisons, which is the case when you use an approved benchmark like the ones available from the TPC.

What I see is that Microsoft has published a dozen results for those benchmarks, and the current benchmark for OLTP database performance is TPC-E. But I have never seen any results of any kind for PostGreSQL, maybe for fear of being ridiculous…

Every time I run a benchmark between PostGreSQL and SQL Server, the winner is always SQL Server. But I must say that, sometimes, PostGreSQL wins on some queries…!

Think about one thing… The SQL Server database engine has run queries automatically in parallel since version 7.0 (1998), and in practice almost all the operations (over 100 types currently) in a query execution plan can be parallelized… In PostGreSQL only 4 types of operations can use multithreading internally, which seems to be limited to 4 threads…!

With the help of « in memory » optimized tables and columnstore (vertical) indexes, SQL Server and PostGreSQL are light-years apart in terms of performance…

Ask yourself why no PostGreSQL OLTP benchmark results have ever been published at TPC.org…

What are the concurrency differences between PostgreSQL and SQL Server? Compare concurrency in PostgreSQL vs. MSSQL.

The author is confused about MVCC in SQL Server. Let’s talk about real facts.

Concurrency of data access is governed by the standard ISO SQL isolation levels, of which the standard requires 4:

  • READ UNCOMMITTED (reads data, ignoring locks)
  • READ COMMITTED (reads only data that has been committed)
  • REPEATABLE READ (re-reads data consistently)
  • SERIALIZABLE (avoids any type of transaction anomaly)
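In SQL Server, any of the four levels can be requested per session; a minimal sketch:

```sql
-- ask for the strictest standard level before starting the transaction
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
   -- reads and writes here are protected from all standard transaction anomalies
COMMIT;
```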

In the past, for very opportune reasons, some RDBMS vendors claimed that those isolation levels are inefficient, because they need to lock data in pessimistic mode (locks are placed before reading or writing), and this mode is subject to blocking and contention.

A great paper on this topic was written at Microsoft Research in 1995, in which great signatures like Phil Bernstein, Jim Gray, Jim Melton and the O’Neils argued for a new isolation level that they called « snapshot »…

So some vendors decided to use an optimistic mode of locking, which takes a copy of the data before doing anything, in order to stay out of the scope of concurrent accesses while reading and writing… But this technique, used in Oracle and copied by PostGreSQL, is not a panacea…

  • First, it does not guarantee that your process will reach the end… If someone has modified the data you also wanted to modify, you lose your work! (the « lost update » transactional anomaly)
  • Second, copying the data requires more resources than working on the original data.

Finally, the technical solution chosen for PostGreSQL’s MVCC (duplicating rows in place) is the worst of all the RDBMSs…

  • In Oracle, the transaction log (the undo part, precisely) is used to read the former rows. No extra data is generated for this feature, but the retained part of the transaction log can grow if a transaction runs for a long time;
  • In SQL Server, the tempdb database is used to keep the row versions. Some extra data, outside the production database, is generated and quickly garbage-collected to free space. The tempdb system database has a particular behavior to perform this work. Recently (2019) SQL Server added the ADR/PVS mechanism to accelerate recovery, and when this feature is activated the row versions can be stored in files specific to the database;
  • In PostGreSQL, all the different versions of the table rows are kept inside the data pages of the table; they reside in memory while the data is cached, then are written to disk, causing table bloat and consequently memory pressure, until a mechanism called VACUUM sweeps out the obsolete rows.

This VACUUM process is well known to be a plague, generating a lot of locks and deadlocks! Some customers end up stopping the data service in order to perform such maintenance, which in practice is not compatible with large, highly transactional databases operating 24/7. Even with the recent parallel vacuuming, which I suspect causes even more deadlocks.

SQL Server is known to be the only RDBMS to offer all 4 transaction isolation levels simultaneously, each of which can be set in pessimistic or optimistic locking. Pessimistic mode offers the advantage of taking fewer resources and guaranteeing the completion of the transaction, with the disadvantage of waiting due to blocking.
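The pessimistic/optimistic switch is done at the database level; a sketch with a hypothetical database name:

```sql
-- turn the two optimistic (row-versioning) modes on for database MyDb
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT  ON;  -- versioned READ COMMITTED
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;  -- enables the SNAPSHOT level
```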

PostGreSQL does not offer all 4 transaction isolation levels and does only optimistic locking, in the scariest way…

And worst of all, for an UPDATE, PostGreSQL acts as a DELETE followed by an INSERT…

What are the environment and stack differences between PostgreSQL and SQL server? Compare the environment and stack in PostgreSQL vs. MSSQL

The author seems to want to confine SQL Server to the Microsoft world, which in fact is not true.

  • First, SQL Server came to the Linux world back in 2017.
  • Second, SQL Server is widely used with Linux, Apache, Java and PHP clients.
  • Third, SQL Server runs in multiple clouds (Amazon, Azure…), on multiple virtualizers (Hyper-V, VMware…) and on Docker.
  • Fourth, SQL Server has integrated the Python and R languages as a full part of SQL Server (which PostGreSQL has not…).

What are the 37 hidden gems that SQL Server has and PostGreSQL does not? Compare the lacks of PostGreSQL vs. MSSQL

This paragraph is my own! Pankil Shah, of course paid one way or another by EDB, would not write such a malicious topic…

1 – Compliance with the SQL standard

PostGreSQL has claimed in the past to be the RDBMS most compliant with the ISO SQL standard. It was perhaps true… in the old days! But not anymore.
The most conformant RDBMSs are DB2 and SQL Server!
Some examples must be given:

  • SQL identifiers must be up to 128 characters long. This is true in SQL Server. Not in PostGreSQL, which is limited to 63 characters for SQL names.
  • Pessimistic locking is the default required by the standard (and READ UNCOMMITTED must be available). This is true in SQL Server. Not in PostGreSQL, which does only optimistic locking.
  • The MERGE statement has been specified since the 2003 version of the SQL standard and is part of SQL Server, but has never been released in PostGreSQL.
  • Temporal tables were defined in the 2011 version of the ISO SQL standard and have been available in SQL Server since the 2016 version. PostGreSQL uses an abominable, unusable ersatz: none of the SQL temporal operators has been implemented…
  • The SQL standard makes a big distinction between PROCEDUREs and functions, called UDFs (User Defined Functions) in the standard. MS SQL Server respects it, while PG confuses procedures and functions, which causes a great lack of security (a user can create an UPPER function that discards data because of the overloading strategy…).
  • Creating a trigger takes a single statement in the standard, and in SQL Server too, but in PostGreSQL you have to code 2 distinct objects.
  • The DataLink ISO SQL standard way of storing files under the control of the database has never been provided in PostgreSQL, while SQL Server has had it since the 2005 version under the name FILESTREAM.
  • And probably the most confusing problem is that some PostgreSQL functions have the same name as a standard function but do not at all do what the standard specifies.

As development progresses, PostGreSQL moves further and further away from the SQL standard, making portability more and more difficult…

2 – Stored procedures that return multiple datasets

One feature I like very much when designing a website is the ability of Microsoft SQL Server stored procedures to return multiple result sets in a single round trip… If you look at performance, you will see that the major part of the time consumed is due to round trips. Drastically reducing the number of round trips results in better performance and reduces contention and, of course, deadlocks…

There is no way in PostGreSQL to have PL/pgSQL routines return multiple datasets to the client application…
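A sketch of such a procedure (table and column names are hypothetical); the client reads both result sets from one round trip:

```sql
CREATE PROCEDURE P_CUSTOMER_DASHBOARD @CMR_ID int
AS
BEGIN
   -- first result set: the customer itself
   SELECT CMR_ID, CMR_NAME
   FROM   T_CUSTOMER_CMR
   WHERE  CMR_ID = @CMR_ID;
   -- second result set: its orders (hypothetical table)
   SELECT ORD_ID, ORD_DATE, ORD_AMOUNT
   FROM   T_ORDER
   WHERE  CMR_ID = @CMR_ID;
END;
```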

MARS (Multiple Active Result Sets) is another feature that PG does not have; it allows applications to have more than one live request per connection: in particular, you can execute other SQL statements (for example INSERT, UPDATE, DELETE, and stored procedure calls) while a default result set is open and being read.

3 – Automatic missing-index diagnosis

SQL Server is the only RDBMS to provide a full diagnosis of missing indexes. This has been the case since the 2005 version! It is therefore possible to improve performance very quickly. I remember that at one of my clients we divided the average query response times by 20 in less than half a day, by creating a hundred indexes where SQL Server had diagnosed more than 300 missing ones!
There is no index diagnosis tool in PostGreSQL: you must enable a tracking tool to record query performance, then manually analyze thousands and thousands of queries and try to create the most relevant index in each case. Such work is very time-consuming, and the productivity is very low and frustrating for the DBA!
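The diagnosis is exposed through dynamic management views; a typical query (the ordering heuristic is my own):

```sql
-- list the missing indexes suggested by the engine, most impactful first
SELECT   d.statement, d.equality_columns, d.inequality_columns,
         d.included_columns, s.user_seeks, s.avg_user_impact
FROM     sys.dm_db_missing_index_details     AS d
JOIN     sys.dm_db_missing_index_groups      AS g
         ON g.index_handle = d.index_handle
JOIN     sys.dm_db_missing_index_group_stats AS s
         ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC;
```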

4 – Storing files as « system » files under RDBMS control

Another feature I like very much is the FILESTREAM concept, widely used for SharePoint. This is a standard ISO SQL feature called DATALINK (SQL:1999), rarely implemented in DBMSs… FILESTREAM allows you to keep the files that you wanted to store in a database in the OS file system (out of the table), but under the transactional responsibility of the RDBMS! This makes it possible to always have the files (pictures, PDFs and so on) and the data of the tables in a synchronous state; in case of a rollback of the transaction, the insertion of the adjoining file is also invalidated. In the event of a backup, even for partial backups (which are entirely possible), the FILESTREAM files are collected in the backup process and restored during the restore without any loss, which no other RDBMS today guarantees, not even Oracle!

One extension of FILESTREAM is the FileTable concept, a two-faced mirror of the Windows file system… Creating a FileTable at an entry point of the Windows file system creates a dual interface for the stored files: one face manages the files in Windows, the other manages them in SQL Server!

The ISO SQL DataLink feature, or anything similar, does not exist in PostGreSQL.

5 – A transaction log for each database

Microsoft SQL Server uses one transaction log per database. If you have 100 databases, you have 100 different transaction logs, plus some transaction logs for the system databases, especially for the tempdb database used for temporary objects (tables…). And if you use FILESTREAM, a special transaction log is added just for the transacted input/output of the files stored under SQL Server’s control…
PostGreSQL uses only one transaction log for all the concurrent databases, into which all the transactions are written. This results in contention on disk I/O when writing transactions, which is a heavy brake on performance. Many other problems arise from this behavior in backup processes, log shipping and streaming replication when there are many databases in the PG cluster…

6 – Parametric read-write views

Inline table-valued UDFs (User Defined Functions) do not exist in PostGreSQL. These functions are literally « parametric views »: they can be used to read data driven by function arguments, but also to write to (INSERT, UPDATE, DELETE) the underlying tables.
They are defined as:
CREATE FUNCTION MyOnlineTableFunctionName ( <list_of_parameters> )
RETURNS TABLE
AS
RETURN (SELECT ... );

7 – Database snapshots

Sometimes you need a readable copy of a database containing the values of the data at a specific point in time. This is possible with the « database snapshot » concept. Creating such an object is easy and immediate, whatever the database volume.
Database snapshots do not exist in PostGreSQL.
As an example, the purpose can be enterprise reporting at a stable point in time, or developers’ tests that want to compare values before and after a big batch process and can restore the « before » values with a RESTORE … FROM DATABASE_SNAPSHOT.
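A sketch (database, logical file and path names are hypothetical):

```sql
-- create a point-in-time readable snapshot of SalesDb
CREATE DATABASE SalesDb_snap
ON (NAME = SalesDb_data, FILENAME = 'D:\snap\SalesDb_data.ss')
AS SNAPSHOT OF SalesDb;

-- later, revert the whole database to that point in time:
RESTORE DATABASE SalesDb
FROM DATABASE_SNAPSHOT = 'SalesDb_snap';
```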

8 – Temporary tables

In SQL Server, all temporary objects, and of course temporary tables, are created in a dedicated database called tempdb. This system database benefits from being stored separately on a very fast storage device like Intel Optane or NVMe, or, for even more performance, in the form of « memory-optimized tempdb metadata », provided you have enough RAM.
This database is especially designed for very fast transactions and data movements:

  • parallelism is systematically active and storage resides on multiple files;
  • a specific garbage collector avoids erasing unused tables too fast, in case of reuse.

PostGreSQL does not have a special DB for that, but you can specify a distinct path to store temporary objects with the command:

ALTER SYSTEM SET temp_tablespaces = '...my path for temp table...';

Of course, PostGreSQL does not have an « in memory » feature, even for temp tables…

9 – Temporal tables

The concept of the temporal table (not to be confused with temporary tables) has been fully specified by the ISO SQL standard and consists of adding:

  • time intervals for the transaction periods of the rows of the table (2 UTC datetime columns, which can be hidden, i.e. they will not appear when querying with SELECT *)
  • a history table in which you will find the older versions of the rows (after every update and delete)
  • several operators (AS OF …, FROM … TO …, BETWEEN … AND …, CONTAINED IN …, and finally ALL) to retrieve values at different points or periods of time.

In SQL Server, the history table can be created as an « in memory » table. If not, all the data in the history table is stored compressed.
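A minimal system-versioned table sketch in SQL Server (table and column names are hypothetical):

```sql
CREATE TABLE T_PRICE
(
   PRC_ID     int PRIMARY KEY,
   PRC_AMOUNT decimal(10,2),
   -- the two hidden UTC period columns required by the standard:
   ValidFrom  datetime2 GENERATED ALWAYS AS ROW START HIDDEN,
   ValidTo    datetime2 GENERATED ALWAYS AS ROW END   HIDDEN,
   PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.T_PRICE_HISTORY));

-- one of the standard temporal operators:
SELECT * FROM T_PRICE FOR SYSTEM_TIME AS OF '2020-06-01';
```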

The solution that PostGreSQL offers for temporal tables has many drawbacks:

  • the design of PostgreSQL temporal tables has nothing to do with the SQL standard…
  • PostGreSQL temporal tables use triggers to generate all this machinery, resulting in poor performance and many locks;
  • PostGreSQL has no data compression mechanism to save data volume and execute queries faster on the history table;
  • the interval, which is not composed of 2 datetimes, is hard to index and results in poor performance when queried;
  • hiding the interval columns is not possible in PostGreSQL, so you cannot add them to software vendors’ databases;
  • none of the temporal SQL operators has been integrated into the code…

In fact the PostGreSQL temporal solution is a ridiculous ersatz of what you can have with the standard, without performance, and is strictly unusable!

Another way of dealing with temporal information is the one by Dalibo called E-Maj… This contrib, close to the Oracle concept of flashback queries (but enrolling multiple objects), uses heavy triggers and costly logging tables, and does not support any schema change! Of course, it is also far away from the standard SQL way of doing it…

10 – XML

The implementation of XML in SQL Server is a gemstone! In PostGreSQL the XML functionality is poor and offers paltry performance.

First, you can constrain the XML documents stored in an XML column with an XML schema collection. Each element of an XML schema collection is an XML schema document that defines the allowed content of documents to be stored. If no corresponding XML schema is found in the collection, the XML document is rejected, like any other SQL constraint violation! There is no way to have constrained XML documents in PostGreSQL…

Second, dealing with XML in table columns is easy because SQL Server uses XQuery, XML DML and XPath 2 with five methods (query, value, exist, modify, nodes). PostGreSQL only uses XPath 1, which is a severe limitation for manipulating XML data.

Third, modifying data inside an XML document stored in a table is easy with the modify method, which changes only the appropriate information and not the whole content of the XML document, as PostGreSQL does!

Fourth, PostGreSQL has no way to index stored XML documents. Indexing XML documents is one of the most polished features in SQL Server, and one of the simplest!
You first need to create a primary XML index that serves as a « backbone » for the other indexes and already provides significant access gains. You can then choose to implement one, two, or all three of the other specialized indexes: for PATH, for VALUE or for PROPERTY…
The Microsoft Research paper on this topic is available at this URL:
https://www.immagic.com/eLibrary/ARCHIVES/GENERAL/MICROSFT/M040611P.pdf
Tell me why the PG staff cannot apply such techniques in the PostGreSQL relational engine?
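A sketch of these XML indexes (table, column and index names are hypothetical):

```sql
-- the « backbone » index, prerequisite for the specialized ones
CREATE PRIMARY XML INDEX XIX_DOC ON T_DOCUMENT (DOC_XML);

-- then any of the three specialized flavors:
CREATE XML INDEX XIX_DOC_PATH  ON T_DOCUMENT (DOC_XML)
   USING XML INDEX XIX_DOC FOR PATH;
CREATE XML INDEX XIX_DOC_VALUE ON T_DOCUMENT (DOC_XML)
   USING XML INDEX XIX_DOC FOR VALUE;
```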

And finally, an XML document inside a table column stored as XML or binary, or outside the table via FILESTREAM, can be full-text indexed…

11 – Trees stored in path format

SQL Server comes with the hierarchyId datatype, which allows storing a tree in path mode and exposes manipulation methods to do so. No such tree facility exists in PostGreSQL except the ltree contrib, which is in its infancy and covers roughly 15% of what SQL Server does in this area.
To be honest, hierarchyId is not my favorite solution. I prefer to store trees as intervals, which is clearly the fastest way to do so.
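A minimal sketch of hierarchyId at work (the org chart and its names are hypothetical):

```sql
CREATE TABLE dbo.OrgChart
(
    node     hierarchyid PRIMARY KEY,
    node_lvl AS node.GetLevel() PERSISTED,  -- depth computed from the path
    emp_name NVARCHAR(100)
);

INSERT INTO dbo.OrgChart (node, emp_name)
VALUES (hierarchyid::GetRoot(),      N'CEO'),
       (hierarchyid::Parse('/1/'),   N'CTO'),
       (hierarchyid::Parse('/1/1/'), N'Lead dev');

-- All descendants of the CTO, using the built-in tree methods
SELECT emp_name
FROM   dbo.OrgChart
WHERE  node.IsDescendantOf(hierarchyid::Parse('/1/')) = 1;
```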

12 – Custom aggregate function

The reason why you cannot define an aggregate function directly in Transact-SQL is that SQL Server cannot parallelize the execution of T-SQL UDFs. So, to empower the execution of such functions, they intelligently decided to allow the creation of these objects in SQL CLR (.NET), with some pre-programmed methods to facilitate multithreaded code.

Splitting the function execution into multiple threads is driven by the « Accumulate » method; the « Merge » method then merges the partial results of the different threads into a single one.

There is no way to create a custom aggregate function that executes in parallel in PostGreSQL…

13 – Dynamic Data Masking

This SQL Server integrated feature allows you to mask data with various functions, which is of great use for the GDPR. PostGreSQL does not have any free dynamic data masking functionality, but there is a project called PostgreSQL Anonymizer which is at an early stage of development and should be used carefully. Paid products also exist, like DataSunrise Data Masking for PostgreSQL or the DBHawk tools.
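A minimal sketch of dynamic data masking (the table and role names are hypothetical):

```sql
CREATE TABLE dbo.Customers
(
    customer_id INT PRIMARY KEY,
    email   NVARCHAR(256) MASKED WITH (FUNCTION = 'email()'),
    card_no CHAR(16)      MASKED WITH (FUNCTION = 'partial(0, "XXXX-XXXX-XXXX-", 4)')
);

-- Readers without the UNMASK permission see masked values only
GRANT UNMASK TO auditor_role;
```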

14 – LOGON trigger

A logon trigger is very useful when you want to control who accesses the instance (cluster in PG words) and the database, and also to limit the number of sessions using the same login.
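A sketch of such a trigger, limiting concurrent sessions (the login name and threshold are hypothetical; the trigger needs VIEW SERVER STATE to read the session DMV):

```sql
CREATE TRIGGER TR_LimitSessions
ON ALL SERVER
FOR LOGON
AS
BEGIN
    IF ORIGINAL_LOGIN() = N'app_user'
       AND (SELECT COUNT(*)
            FROM sys.dm_exec_sessions
            WHERE is_user_process = 1
              AND original_login_name = N'app_user') > 3
        ROLLBACK;  -- rolling back a logon trigger refuses the connection
END;
```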

Logon triggers are embryonic in PostgreSQL.

More precisely, there is a contribution, with some drawbacks, like the need for a systematic public permission… What a lack of security!

15 – Impersonate

Microsoft SQL Server has two levels of impersonation: at the session level and at the routine level (procedures and triggers). PostGreSQL only has session-level impersonation; it cannot impersonate the execution of a routine, which is a great feature to trace user activity with technical metadata that the user usually has no grant to access, especially under the GDPR. This feature is also a great help to code routines (procedures, triggers…) with some extra privileges that the ordinary SQL user does not have. In this way, the code is closer to what we do in object-oriented languages…
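A sketch of both levels of impersonation (object and user names are hypothetical):

```sql
-- Routine-level impersonation: the procedure runs with the owner's rights,
-- so callers need no direct DELETE permission on the table
CREATE PROCEDURE dbo.PurgeOldLogs
WITH EXECUTE AS OWNER
AS
BEGIN
    DELETE FROM dbo.AppLog
    WHERE log_date < DATEADD(YEAR, -1, GETDATE());
END;
GO
-- Session-level impersonation
EXECUTE AS USER = 'auditor';
SELECT SUSER_SNAME(), USER_NAME();  -- shows the impersonated context
REVERT;
```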

16 – Service broker, a data messaging system

While applications use SOA (Service Oriented Architecture), which means services running and cooperating by sending messages, this architecture can be brought down to the database layer… That's SODA (Service Oriented Database Architecture)…

Such a system is implemented in SQL Server under the name « Service Broker » and is made of queues (which are specific tables), services (to send and receive messages), conversations (to manage a thread of requests and responses)… Of course, to manage it over HTTP, you have to deal with HTTP endpoints, contracts, routes, and to encrypt your data…

There is no equivalent to Service Broker in PostGreSQL to transmit data messages from one cluster to another with routing, serialization and transactions.

17 – Data and index compression

Another very unfortunate lack in PostGreSQL is the absence of data and index compression. SQL Server has several modes for compressing data: ROW, PAGE, and, for columnstore indexes, COLUMNSTORE and COLUMNSTORE_ARCHIVE.

Of course, a compressed index does not need to be decompressed to seek the data.
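A minimal sketch of enabling compression (table and index names are hypothetical):

```sql
-- PAGE compression on a heap or clustered table...
ALTER TABLE dbo.Sales
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- ...and on one of its nonclustered indexes
ALTER INDEX IX_Sales_Date ON dbo.Sales
    REBUILD WITH (DATA_COMPRESSION = PAGE);
```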

SQL Server data compression effects (same data, 6.6 times smaller)


18 – Resumable indexes

When dealing with very huge tables (hundreds of GB…), one big problem is the duration of indexing such tables. Even if you create indexes in ONLINE mode, some of them can take a very long time. If the instance is overloaded at some point, you may need to cancel the CREATE INDEX process.
In SQL Server there is another intelligent way to handle such a problem, called « resumable indexes »…
In SQL Server there is another intelligent way to process such a problem, called « resumable indexes »…

  • First you need to create the index with the ONLINE and RESUMABLE options.
  • Second, at any time you can stop the CREATE INDEX and preserve the work done with the command ALTER INDEX … PAUSE.
  • Third, you can restart the CREATE INDEX from the point it was at, with the command ALTER INDEX … RESUME.

Another possibility is to create the index with the MAX_DURATION option. When the CREATE INDEX command exceeds the specified amount of time, the resumable online index operation is suspended.
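The steps above can be sketched as follows (table, column and index names are hypothetical):

```sql
-- A resumable online index build, capped at 60 minutes of work
CREATE INDEX IX_BigTable_Col
    ON dbo.BigTable (some_col)
    WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 60 MINUTES);

-- From another session: pause it, keeping the work already done
ALTER INDEX IX_BigTable_Col ON dbo.BigTable PAUSE;

-- Later: restart from where it stopped
ALTER INDEX IX_BigTable_Col ON dbo.BigTable RESUME;
```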

There is nothing equivalent to the resumable index operation in PostGreSQL.

19 – Page restoration

In the case of some damaged pages, before restoring the whole database you can try to recover those pages by retrieving them from the different backups. This operation is feasible via the RESTORE command. In the Enterprise edition, this process can be executed online.
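A sketch of a page-level restore sequence (database name, page ids and backup paths are hypothetical):

```sql
-- Restore only the two damaged pages (file_id:page_id) from the full backup
RESTORE DATABASE Sales
    PAGE = '1:57, 1:202'
    FROM DISK = N'S:\backup\Sales_full.bak'
    WITH NORECOVERY;

-- Then apply the log backups, finishing with the tail of the log
RESTORE LOG Sales FROM DISK = N'S:\backup\Sales_log.trn'  WITH NORECOVERY;
RESTORE LOG Sales FROM DISK = N'S:\backup\Sales_tail.trn' WITH RECOVERY;
```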

When using the Mirroring / AlwaysOn high availability systems, damaged pages are automatically repaired…

PostGreSQL does not have any possibility of page repair! When you lose some data in a table page, you are condemned to restore the full backup…

20 – Intelligent optimizer (« planner » as they call it in PG)

Some more extraordinary features were definitively introduced in SQL Server 2019 for database engine performance, under the term Intelligent Query Processing:

  • Scalar UDF Inlining: scalar UDFs are automatically transformed into scalar expressions or scalar subqueries that are substituted into the calling query in place of the UDF operator. The result is a spectacular performance gain for some classical UDFs!
  • Memory grant feedback: when the amount of memory reserved for a query has been over-evaluated, the optimizer reduces the size the next time a similar query is executed. The result is more cache memory available for data and client operations.
  • Batch mode on rowstore: usually, the algorithms in the execution plan of a query fetch rows one by one. In batch mode, the qualified rows are retrieved in a single step that involves thousands of rows. The result is an important performance gain on big tables.
  • Table variable deferred compilation: due to the optimization process, a table variable cannot have a correct cardinality estimation. For instance, PG estimates at 100 the number of rows that will be in such an object. From now on, SQL Server defers compilation until the table variable has been filled, and readjusts the remaining estimated query plan to be accurate and much more performant than the first estimated plan. The result is an increasing performance gain when you use table variables.
  • Adaptive joins: in some cases of heavy data volume, where joins cannot be estimated accurately, an adaptive join is created, made in fact of two join algorithms, and the final choice of the algorithm to join the two datasets is made after a first read of the smallest input (in terms of cardinality). The result is a faster join technique in some heavy queries.
  • Automatic plan correction: when a plan regresses due to plan changes, previously executed plans of the same query are often still valid, and in such cases, reverting to a cheaper previously executed plan resolves the regression.

All those features, combined with compression, vertical indexing (columnstore indexes) and automatic parallelism, make the SQL Server engine the most powerful RDBMS in the world (even Oracle does not use vertical indexes for OLTP workloads).

At the opposite, the PostGreSQL query « planner » (why don't you say query optimizer? Because it does not optimize at all?…) reveals itself unable to do a good job when queries have a great number of joins or subqueries. Why? The optimization rules of the PG planner are heavy consumers of resources (exhaustive search), and a threshold has been defined (default value: 12 joins) to switch to a genetic algorithm called GEQO… but the PG staff have emitted some criticism about their own solution (GEQO): « At a more basic level, it is not clear that solving query optimization with a GA algorithm designed for TSP is appropriate »…
Facing the facts, optimizing complex queries in PG is a nightmare!

21 – Query hints

Query hints: even if it is bad practice (and I really think so!), the possibility of using SQL query hints to impose an algorithmic strategy on the optimizer is clearly a good transitional solution to solve difficulties when the optimizer goes wrong… It is also a solution offered by the paid version of PostGreSQL delivered by EDB!
SQL Server offers two different kinds of hints: table hints and query hints.
The proponents of free PostGreSQL categorically reject this approach, but the presence of hints in EDB makes me think that it is more about protecting a market than about stupidity…

In SQL Server there are many ways to deal with query hints:

  • adding hints directly in the SQL text of the query;
  • adding a « plan guide » that injects the hints into the SQL text of the query on the fly (very interesting when you have an application coming from a troublesome vendor…);
  • some hints can also be enabled at the database level or the instance level (trace flags);
  • analyzing bad and good plan versions of the same query and deciding to choose the best one, even with hints, which in SQL Server is the role of the Query Store.
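The first way can be sketched as follows (table, column and hint choices are hypothetical; FORCESEEK assumes a suitable index exists):

```sql
-- A table hint and two query hints on the same statement
SELECT o.order_id, c.customer_name
FROM   dbo.Orders AS o WITH (FORCESEEK)            -- table hint
JOIN   dbo.Customers AS c ON c.customer_id = o.customer_id
OPTION (HASH JOIN, MAXDOP 4);                      -- query hints
```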

In PostGreSQL there is only the possibility to enable some hints at the session or cluster level, and this approach has severe limitations: no way to enable a hint for one query and not another, no way to add a query hint on the fly, which is the only way to correct bad plans when they are inside a vendor application!

22 – Query plan analysis

Analyzing the execution plan of a query can be done in SQL Server with the Query Plan visualization and the plan comparison tool.

The meager tools that PostGreSQL offers for visualizing and comparing query execution plans date from the Iron Age compared to what SQL Server Management Studio (SSMS) is able to do!
Not only can you see at which step of the execution plan your query is, but SSMS can also give you the differences between two similar execution plans.

Comparison of query plans
Some add-ons like SentryOne Plan Explorer (a free tool) quickly reveal the bottlenecks of a query plan with a color code, dark red marking a heavy step…

Another possibility is to watch query execution plans as they run, with « Live Query Statistics »:

Query execution plans running with « Live Query Statistics » (with the courtesy of Manoj)

This is very useful when you are executing very long queries and want to know which steps are actually running…

23 – Policy based management

If you want to automate some management policy, you can use this feature to run continuous or scheduled checks of rules that apply to some « facets » (instance, database, table…) of SQL Server. As an example, you can be informed daily if a newly created database has not been backed up.

There is no tool in PostGreSQL that facilitates the implementation of management rules for the DBA.

24 – Managing data quality

SQL Server has two distinct modules to manage data quality:

DQS (Data Quality Services)

MDS (Master Data Services)

There is nothing to manage data quality in PostGreSQL.

25 – Change Data Capture

Many corporate OLTP databases require certain data to be transposed into an ODS or a data warehouse. To identify the rows that have been inserted, modified or deleted since the last refresh of the DW or ODS, you need a reliable tool so that the target database is fed only with a data differential rather than being rebuilt completely…
SQL Server has two tools for this:
SQL Server has two tools for this:

  • Change tracking
  • Change Data Capture

PostGreSQL does not have any similar built-in tool, so you doom yourself to rebuilding your DW or your ODS from scratch!
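A minimal sketch of enabling Change Data Capture on one table (schema and table names are hypothetical):

```sql
-- Enable CDC at the database level, then on the source table
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Orders',
     @role_name     = NULL;

-- Net changes between two LSNs can then be read through the generated function,
-- e.g. cdc.fn_cdc_get_net_changes_dbo_Orders(@from_lsn, @to_lsn, N'all')
```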

Of course, you can use third-party software to do it (Kafka, as an example).

26 – Transparent Data Encryption

One major feature to deal with the GDPR is Transparent Data Encryption. This feature encrypts the whole database (data files and transaction logs, and, to avoid any gap in this process, the tempdb database too). So, stealing any part of the database, even a backup, won't do any good… This system has very little performance impact on data processing, because encryption is performed only when physical IO is needed. To do that, the RDBMS must manage internally the IO executed on its files, which PostGreSQL does not.
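A minimal sketch of enabling TDE (database, certificate name and password are hypothetical):

```sql
USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'a strong password here';
CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE certificate';
GO
USE Sales;
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TDECert;

ALTER DATABASE Sales SET ENCRYPTION ON;  -- files, logs and backups are now encrypted
```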

PostGreSQL doesn’t have anything like Transparent Data Encryption.

27 – Database/server Audit

There is one possibility of database auditing in PostgreSQL (pgaudit). But its limitation to the database scope (no full server scope) and the fact that it is not possible to filter on table or view columns, nor on user accounts, produce an astronomical amount of data in which you will spend hours finding what you want.

Also, the database audit can be stopped by a malicious strategy (e.g. filling the storage disk where the audit file resides) while PostGreSQL continues to serve the data. On the contrary, SQL Server can be configured to shut down when the audit process is unusable…

PostGreSQL’s audit solution has severe limitations that make it difficult to enforce GDPR in addition to a few security holes.

28 – Resource governor

The resource governor is a very interesting feature when some users are heavy data consumers while others need to be served quickly. You can define, by SQL profiling, quotas of disk, RAM and CPU for your different categories of users, in order to prioritize processes…
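A sketch of capping CPU for one category of users (pool, group, function and login names are hypothetical; the classifier function must live in master):

```sql
-- A pool with a CPU cap and a workload group bound to it
CREATE RESOURCE POOL ReportPool WITH (MAX_CPU_PERCENT = 30);
CREATE WORKLOAD GROUP ReportGroup USING ReportPool;
GO
-- Classifier routing the login 'report_user' to that group
CREATE FUNCTION dbo.rg_classifier() RETURNS SYSNAME
WITH SCHEMABINDING
AS
BEGIN
    RETURN (CASE WHEN SUSER_SNAME() = N'report_user'
                 THEN N'ReportGroup' ELSE N'default' END);
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```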

There is no possibility in PostGreSQL to put quotas on the resources of certain users.

29 – More cache for data than the limit of the physical RAM!

Buffer pool extension: not enough RAM for caching data? Think of setting 4 to 8 times your actual RAM as a second level of cache with this feature. Of course, it needs very fast storage, like NVMe or Intel Optane…

There is no possibility to create a secondary level of cache in PostGreSQL.

30 – Investigate inside the DBMS engine and further

The extraordinary collection of Dynamic Management Views (DMV) helps you collect, understand, tune, fix, analyze… everything you want inside and outside SQL Server, in real time. 278 views or inline table functions can be queried to see what SQL Server is doing or has done…
A list of these views can be downloaded in PDF format from the Quest website:
https://www.quest.com/docs/ql-server-2017-dmv-dynamic-management-views-poster-infographic-26735.pdf

278 management views in SQL Server… A big difference compared to the 27 views of the PostGreSQL « Statistics Collector »…
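A typical DMV query, as a sketch (the output columns are mine, the views are standard):

```sql
-- Top 5 cached query plans by total CPU consumed
SELECT TOP (5)
       qs.total_worker_time AS total_cpu_us,
       qs.execution_count,
       SUBSTRING(st.text, 1, 200) AS query_text
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```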

31 – Extended Events

Extended Events is a lightweight architecture that gives the possibility to investigate deep into the system to understand what happens while SQL Server is running.

No fine-grained investigation of this kind is possible in PostGreSQL. Some contributions can do a few things, but they are disparate, and synthesizing all the metrics of the different contributions is not an easy task, as the formats of the collected data are heterogeneous… Of course, the sharpness of the investigations given by SQL Server Extended Events has nothing to do with the roughness of the analyses possible in PostGreSQL.

There is absolutely nothing equivalent to Extended Events in PostGreSQL for fine-grained troubleshooting.

32 – Database file security

While the SQL Server engine is running, there is no way to copy or drop the database files, because they are locked by the SQL Server process and protected by Windows (note that this protection is not available on Linux, which behaves differently). In PostGreSQL, which creates many, many files for a single database, destroying a data file or an index file is easy and nothing alerts the DBA, except some events written in the error log! But it is too late…

33 – Easy tuning

SQL Server has always claimed to be tuned automatically. So, there are few parameters to modify, essentially when installing SQL Server on a machine, to deal with RAM and parallelism. 85 options are configurable, but some of them are obsolete (kept for backward compatibility) and others are automated (nothing to do).
When you modify many parameters simultaneously, SQL Server verifies their consistency, unless you execute the command with the WITH OVERRIDE option. In fact only 20 options are really used, and in 99% of instances only 5 options are configured.

PG offers you 315 parameters to deal with, and no verification is made to help you avoid mistakes!

34 – Monitoring tools

As we say in French, « do not shoot the ambulance ». PGAdmin is the worst query tool ever. Even MySQL has a better tool! SQL Server Management Studio is, without a doubt, the best query tool! Some vendors have copied it for their solutions, like Quest with TOAD!

Many heavyweight enterprise monitoring tools exist for SQL Server, like SentryOne SQL, Red Gate, Idera, Apex, Quest Spotlight… or lighter ones like Solar Winds, Paessler, DataDog, DBWatch, SpiceWorks… and sometimes free ones like Kankuru!
Similar monitoring products for PostGreSQL are rare: Solar Winds, Paessler, DataDog… do not operate at the same level as Sentry, Red Gate or Idera…

35 – Hot hardware escalation

SQL Server on Windows accepts, with specific hardware designs (HP, Dell), additional physical resources (CPU, RAM) without switching off the machine…

PostGreSQL has not been designed to acquire more physical resources without being shut down beforehand.

36 – Documentation

Microsoft SQL Server is the best documented RDBMS in the world. PostGreSQL documentation is very poor, especially in real productive code examples…

PostGreSQL vs SQL Server - Documentation

Comparison of documentation size between PostGreSQL and SQL Server

37 – Recovery models

SQL Server can log transactions in three different modes:

  • FULL:  everything is written into the transaction log file;
  • BULK LOGGED: some reproducible parts of transactions are not logged;
  • SIMPLE: same as BULK LOGGED, but the transaction log is automatically purged.

In FULL and BULK LOGGED, the transaction log keeps everything until a BACKUP LOG is done, which purges the transaction log. The SIMPLE recovery model is fine for OLAP databases, while OLTP needs FULL or BULK LOGGED.
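A minimal sketch of switching models and purging the log (database name and paths are hypothetical):

```sql
ALTER DATABASE Sales SET RECOVERY FULL;

BACKUP DATABASE Sales TO DISK = N'S:\backup\Sales_full.bak';
BACKUP LOG      Sales TO DISK = N'S:\backup\Sales_log.trn';  -- truncates the log
```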

Nothing equivalent to those recovery models exists in PostGreSQL…

38 – Automatic indexing

I am going to cheat a little here! The cloud version of SQL Server, called Azure SQL Database, has features ahead of the on-premises version, and offers the possibility to automatically create all the indexes you need.
The on-premises version will have it in a few months, and the internal system tables are already prepared to support it!

Automatic indexing in SQL Server

There is no way to ask PostGreSQL to automatically create the indexes necessary for the proper functioning of the database.

What are the gotchas that PostGreSQL has and SQL Server does not have? Let's compare the worst features of PostGreSQL vs MSSQL

This paragraph is my own too!

1 – The plague of PostGreSQL MVCC

To ensure optimistic locking, every RDBMS must have a system that copies the data of each row that needs to be read while another client is modifying the same data. Different RDBMS use different techniques… Generically this is named MVCC, for Multi-Version Concurrency Control.

  • In Oracle, the undo (rollback) segments are used to do so.
  • In SQL Server, the tempdb database (a specific database with its own transaction log) receives the versioned rows.
  • In PostGreSQL, all the different versions of the same row are clustered in the same page of the database.

The PG MVCC solution is the worst way to do it, because it has three major inconveniences:

  • First, the table naturally bloats, and this bloat increases severely when the number of concurrent transactional operations is high…
  • Second, because pages are the minimal logical storage unit, the data cache (table and index pages containing rows are pinned in memory to reduce disk IO) is filled with plenty of obsolete rows; this leaves the cache memory replete with unnecessary data, pushing out data actually in use.
  • Third, you need to use the dreaded VACUUM tool, which locks a lot and can cause deadlocks, to clean up and remove the rows no longer needed.

Details about this behavior can be found at « Is UPDATE the same as DELETE + INSERT in PostgreSQL? »

2 – Bugs that remains from years…

I very much like this one… This query:

SELECT 'A' IN ('A' 'B');

throws an exception… ERROR: syntax error at or near « ‘B’ »

If you arrange the query text differently, splitting the list over two lines, you have a surprise: the adjacent string literals are silently concatenated into ‘AB’, and the query quietly returns false…

This bug has been in PG since about 2014…

The consequence is simple: never indent your query texts in PostGreSQL! So much for easy debugging…

Among the bugs, here is another curiosity, which can be found in the official PostGreSQL documentation and dates back to 2010:

The rewriting forms of ALTER TABLE are not MVCC-safe. After a table rewrite, the table will appear empty to concurrent transactions, if they are using a snapshot taken before the rewrite occurred

3 – Data casting

Many dynamic type resolutions are incorrect in PostGreSQL. When you combine a datatype with another, the ISO SQL standard specifies that the resulting datatype must be the minimal combination of the different datatypes involved in the operation. This is not true in PostGreSQL… As an example, combining char datatypes systematically leads to the TEXT datatype, as shown in the following picture:

Incorrect datatyping in PostGreSQL
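A minimal sketch reproducing this behavior in PostgreSQL:

```sql
-- Concatenating two CHAR(1) values yields TEXT, not CHAR(2)
SELECT pg_typeof('A'::CHAR(1) || 'B'::CHAR(1));
```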

Many other data types have the same trouble. Another example comes with the money datatype in PostGreSQL, which is a fine mess!

Incorrect datatyping in PostGreSQL: money can't be found
If you use this datatype, you have to explicitly cast the value! This is an incredible limitation that no other RDBMS has!

4 – The horrible performances of COUNT…

Many papers have pointed out the poor performance of the COUNT aggregate function in PostGreSQL (even the PG staff), performance that is often the worst of all RDBMS… This is due to a design flaw in the PG engine architecture related to the way MVCC was made. In fact, PG needs to read all the rows in the table or index pages because MVCC leaves phantom rows… In other databases, the optimizer collects only a single integer in the data page header that contains the exact number of available rows. SQL Server is simply the fastest of all in many cases where COUNT is used…

Count Distinct Compared on Top 4 SQL Databases

Our previous paper pointed out that PostGreSQL was globally 100 (one hundred) times slower than SQL Server!

PostGreSQL vs Microsoft SQL Server – Comparison part 2 : COUNT performances

5 – The hard way to update unique values

According to rule 7 of Codd's twelve rules for RDBMS, the capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update and deletion of data.
This is not true in PostGreSQL!
As a matter of fact, PostGreSQL is unable to update some data when a unique or primary key constraint enforces the values of the data…

Let us show this with a simple example:

CREATE TABLE T (U INT UNIQUE);
INSERT INTO T VALUES (1), (2), (3);
UPDATE T SET U = U + 1;

The last query throws an error, while any other real RDBMS (Oracle, SQL Server, Sybase, DB2…) executes it!
But… PG delivers a contorted way to bypass the problem, using a deferrable constraint!
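The workaround can be sketched as follows (PostgreSQL):

```sql
-- The constraint must be declared deferrable up front
CREATE TABLE T (U INT UNIQUE DEFERRABLE INITIALLY IMMEDIATE);
INSERT INTO T VALUES (1), (2), (3);

BEGIN;
SET CONSTRAINTS ALL DEFERRED;  -- uniqueness checked at COMMIT only
UPDATE T SET U = U + 1;        -- now succeeds
COMMIT;
```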

6 – The absence of clustered indexes (also called IOT)

What makes the strength of « MS SQL Server » is that the SQL engine was built around the notion of the clustered index and optimized for it. PostGreSQL, although offering this possibility, is not optimized in this direction, quite the contrary. RDBMS like SQL Server or MySQL are much more performant for this reason, because they do not have to worry about the physical location of a row, which can change with updates. A change in the storage location of a row in the table always causes performance losses, because all the indexes have to be updated!

With a logical pointer of the clustered index type (named IOT by Oracle), no pointer value has to be updated in any index, because this value does not change, all the less so when the clustered indexes are built on auto-increments, which are reputedly immutable…

Just read the comments posted at the end of this article to be convinced of the harmful effects of this strategy.

7 – Views that cannot be updated

PostGreSQL has an incomprehensible limitation on updating views. When a view is composed of several tables, a simple partial INSERT, UPDATE or DELETE targeting one of the view's tables can be executed in every RDBMS (including MySQL/MariaDB), except in PostGreSQL, which throws an exception: « Views that read multiple tables or views are not automatically writable. »

8 – Pivot or Unpivot?

While many RDBMS have the PIVOT feature, PostGreSQL has neither the PIVOT nor the UNPIVOT operator. Of course you can do a PIVOT operation without the PIVOT keyword, but the main difference is about performance! Using a workaround like the one you can write with arrays in PostGreSQL is clearly the poorest in terms of speed!
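A minimal sketch of the T-SQL operator (table and column names are hypothetical):

```sql
-- Yearly sales amounts pivoted into one column per year
SELECT *
FROM  (SELECT sale_year, amount FROM dbo.Sales) AS src
PIVOT (SUM(amount) FOR sale_year IN ([2019], [2020], [2021])) AS p;
```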

9 – No ONLINE DDL

When creating or altering database objects, PostGreSQL takes an exclusive lock, often on the whole table, sometimes on the rows concerned (e.g. FK constraints). Only CREATE INDEX can run online.
In SQL Server, not only can CREATE INDEX be online, but so can most operations like adding or deleting a column or a constraint.

10 – No debug for triggers

PostGreSQL does not offer any way to debug triggers other than adding some RAISE commands… This is quite problematic when you know that many developers are uncomfortable with coding triggers. By comparison, SQL Server offers a real debugging tool and, in addition, you can put PRINT commands to send a message to the diagnostic console, or send a complete dataset via a simple SELECT statement…

11 – Planner statistics

Managing the statistics used by the « planner » to optimize a query is very minimalist in PostGreSQL:

  • there is no possibility to compute statistics on the whole table or index in PostGreSQL, what we call FULLSCAN in SQL Server;
  • there is no possibility to control the percentage, or the number of rows, of the statistics sample in PostGreSQL for each table, what we call SAMPLE … PERCENT or ROWS in SQL Server;
  • there is no possibility to import or export statistics in PostGreSQL, what we call STATS_STREAM in SQL Server;
  • there is no possibility to automatically recompute statistics when needed in PostGreSQL, what we call AUTO_UPDATE_STATISTICS in SQL Server. PG's AUTOVACUUM does much more work, but it is blocking and it looks for phantom slots, not inaccurate stats. The PG ANALYZE process that recomputes the stats sets a read lock on the table, while SQL Server does not!
  • when restoring a database, all the planner statistics are lost;
  • creating statistics is very, very slow in PostGreSQL compared to SQL Server, which runs multithreaded… in our tests PostGreSQL was about 10 times slower than SQL Server.
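The SQL Server counterparts mentioned above can be sketched as follows (database and table names are hypothetical):

```sql
-- Full-scan or sampled statistics, at the table level
UPDATE STATISTICS dbo.Sales WITH FULLSCAN;
UPDATE STATISTICS dbo.Sales WITH SAMPLE 25 PERCENT;

-- Automatic recomputation, at the database level
ALTER DATABASE Sales SET AUTO_UPDATE_STATISTICS ON;
```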

PostGreSQL vs Microsoft SQL Server – Comparison part 1 : DBA command performances

Well, this last point is really very important to take into consideration if you want to deal with huge tables and heavy transactional activity in PostGreSQL!

The result of all that is the bad quality of the « planner » execution plans and, of course, of the global performances…

12 – Data encryption

One horrible feature with terrible security vulnerabilities is the way PostGreSQL deals with encryption. In SQL Server, all the keys and certificates are hierarchically nested from the SQL Server instance down to the encryption key, and must pass through the « Caudine Forks » of the database master key. So even if the database is stolen (the files or a backup), the burglar will not be able to restore it and read the data in clear, because the database needs to access the key at the upper level…

In PG, you must have the key somewhere in a file within the instance scope, or in the client application, to encrypt or decrypt with pgcrypto. Do you leave your car with the keys in?

Much more, SQL Server can use an HSM (Hardware Security Module) that keeps the keys in an electronic device, which can automatically be destroyed in case of a key extraction attempt… Many health insurers use this principle.
No such system (HSM) is available in PostGreSQL!

13 – The disastrous user’s number scalability

Poor scalability: PostGreSQL has not been designed for a high number of concurrent connections. Limited to 100 by default, many developers forget to think about the scalability of concurrent users…
The PG staff say: « Generally, PostgreSQL on good hardware can support a few hundred connections. If you want to have thousands instead, you should consider using connection pooling software to reduce the connection overhead. »
A few hundred? OK! That's gigabytes!!! Because PG uses a fair amount of memory per connection (about 10 MB).
Conversely, SQL Server integrates a pooling system in many drivers (ADO, .NET, PHP…).

https://miro.medium.com/max/1500/0*aoP9QjslvvSOS-jM.png

This picture shows the increasing connection slowness in PostGreSQL… The practical limit never to exceed is 300 connections, at which point each connection takes about 300 ms, the threshold from which users begin to think that something abnormal is happening…

14 – Poor backup/restore strategies

  • PG_dump does not generate binaries backups, but writes a SQL batch. Conversely in SQL Server, BACKUPs are binary pages of the database plus a portion of the transaction log. This accelerates backups and restore processes.
  • PG_dump places a lock on the database that forbids the use of certain SQL commands like TRUNCATE or ALTER… In SQL Server, no SQL command at the database level has to wait because of a backup, except a new backup of course!
  • PG_dump is an external command outside the security scope of the database, and you cannot give a SQL user the right to dump one database and not the others… In SQL Server, the BACKUP command is a Transact-SQL statement and can be protected in different ways to ensure maximum security, even against DBAs…
  • The PG_dump process uses the REPEATABLE READ isolation level, so when you finally restore the database, the data has the values it had when the backup process started. Conversely, in SQL Server, the internal BACKUP command produces a restore with the values the database had at the end of the BACKUP process; of course, this gives more accurate data and more security…
  • PG_dump is not able to spread the backup file over multiple destinations, and there is no way to parallelize a backup over multiple targets. SQL Server offers backup strategies known as family backups to write to a set of devices, and the MIRROR TO option lets you send the backup files to up to three additional destinations (for example a local directory and, simultaneously, a remote file share).
  • Backing up a hot readable standby in PG requires pausing replication on the standby with SELECT pg_xlog_replay_pause(), which, in case of a crash while pg_dump is running, results in the loss of the « protected » database! This is not the case in SQL Server… even in synchronous mode with automatic failover…
  • PG offers no strategy for a piecemeal restore (starting to work with the data of a database that is still being restored) as SQL Server does.
  • A simple PG_dump command is often 2 to 6 times slower than SQL Server on a database with exactly the same data.
  • Restoring a PostGreSQL backup results in a database whose data is in the state it had when the backup was initialized, while SQL Server restores the database to its state upon completion of the backup, which allows better continuity between successive backups.
  • Optimizer statistics (pardon, « planner stats ») are lost when restoring a database; you have to recreate them from scratch. This may not matter much for data production, but for application vendors who use PostGreSQL and need to recover customer data after an incident, it complicates the process and lengthens the time needed to get the database back for analysis.
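As a sketch of the MIRROR TO option mentioned above (database name, path and file share are hypothetical; mirrored media sets require the Enterprise edition):

```sql
-- Back up a database to a local file and, simultaneously,
-- mirror the backup media to a remote file share.
BACKUP DATABASE MyDatabase
   TO DISK = 'D:\Backups\MyDatabase.bak'
   MIRROR TO DISK = '\\BackupServer\SQLShare\MyDatabase.bak'
   WITH FORMAT;  -- FORMAT is required when creating a new mirrored media set
```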

In this article, the French website « Le bon coin » reports that restoring a 100 GB PostGreSQL database with 50 GB of transaction log took 255 minutes, that is, more than 4 hours. For the same volume, and on magnetic disks, SQL Server takes about 30 minutes…

15 – No internal storage facilities

We spoke earlier about FILESTREAM and FileTable, which do not exist in PostGreSQL… Let's talk about table and index storage… In SQL Server, as in Oracle or DB2, the storage engine of the RDBMS uses internal routines to perform physical IO operations as quickly as possible. A database in SQL Server can use a single file for all objects, and it is possible to add more files to parallelize (for quicker access) or to split data out (for a particular purpose, read-only storage as an example)… By contrast, PostGreSQL delegates all IO operations to the OS. And because a table creates multiple files (including indexes), the OS has to deal with an incredible number of file descriptors, which consumes a lot of internal OS resources. Also, PostgreSQL is unable to keep part of a database in read-only mode…

And PostGreSQL has no strategy for « In Memory » tables, unless you want to pay for the Fujitsu Enterprise version…

16 – bug in INFORMATION_SCHEMA

Some ISO SQL standard metadata views are not correctly implemented in PostGreSQL. This is not a real problem, except when you want to have the same database design in different RDBMS… But for a DBMS that loudly proclaims itself the most respectful of the SQL standard, not being able to use the INFORMATION_SCHEMA views because of such an error seems to me to go against its philosophy.

https://www.postgresql-archive.org/Primary-key-error-in-INFORMATION-SCHEMA-views-td6022177.html

https://www.postgresql-archive.org/BUG-16642-INFORMATION-SCHEMA-DESIGN-td6156029.html

17 – The default « public » schema

While in other databases the default SQL schema is always a specific one, PostGreSQL had the bad idea of using the « public » schema as the default for creating tables, views and routines of any kind. This, combined with the fact that functions in PostGreSQL are not really functions as the SQL standard requires (a UDF must not write data, yet PostGreSQL functions can) and do not require a schema prefix when called, makes an explosive combination of security weaknesses… Anyone can create a malicious function called UPPER, placed by default in the public schema, that will destroy data in your database tables whenever any user executes a query whose code needs to capitalize some character string!

This design flaw has been recognized as one of PostgreSQL's most important vulnerabilities, and fixing it requires such modifications from software publishers that it will probably remain in the code for many more years…

In SQL Server this is impossible, because a distinction is made between procedures, which cannot appear in SQL queries, and UDFs, which can only read data. Also, user-defined functions must use the schema prefix, to distinguish them from the built-in SQL functions delivered by the vendor, which do not have any schema prefix, as required by the SQL Standard.
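On the PostGreSQL side, the usual mitigation can be sketched as follows (the schema and role names are illustrative, not part of any product default):

```sql
-- Illustrative hardening against search_path hijacking via the public schema:
REVOKE CREATE ON SCHEMA public FROM PUBLIC;  -- ordinary users can no longer create objects there
CREATE SCHEMA app;                           -- a dedicated application schema (name is illustrative)
ALTER ROLE app_user SET search_path = app;   -- resolve unqualified names in the application schema only
```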

18 – Materialized view with no auto refresh

In November 1998, Larry Ellison claimed, and bet a million dollars, that Microsoft SQL Server could not come within a factor of 100 of Oracle on a TPC-D benchmark query over a 1 terabyte database. But in March 1999, Microsoft, which had just released version 7 of SQL Server, retorted by showing that it did better than Oracle on this query. Oracle had just come up with the concept of materialized views, while SQL Server did better by delivering indexed views, whose main advantage is that there is no need to refresh them, since these views are always synchronous with the data in the tables. Bill Gates had therefore won Larry Ellison's bet, but the latter never paid up!

Of course, since PostGreSQL has always aped Oracle, materialized views need to be manually refreshed… which, in practice, makes them nearly unusable.
The refresh process consists of recalculating all the data in the view!

In MS SQL Server, all « materialized » views, which are called indexed views, are always synchronous with the table data and never need to be refreshed, just like any other index… Moreover, the optimizer can automatically substitute the results of an indexed view for the execution of a query against the tables, when the query's results are available in the indexed view.

The SQL Server optimizer substitutes the results of the view instead of using the tables

The PostGreSQL « planner » does not have the « intelligence » to substitute the pre-computed results of a materialized view into a query that accesses the equivalent tables…

In real life, PostGreSQL materialized views are only suitable for dead data, like archives, but not for live tables!
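The contrast can be sketched as follows (table and view names are hypothetical; the Amount column is assumed NOT NULL, as indexed views require):

```sql
-- PostGreSQL: the materialized view goes stale and must be recomputed in full.
REFRESH MATERIALIZED VIEW sales_summary;  -- recalculates all the data of the view

-- SQL Server: an indexed view is kept synchronous by the engine itself.
CREATE VIEW dbo.SalesSummary
WITH SCHEMABINDING                        -- required for an indexed view
AS
SELECT CustomerId, COUNT_BIG(*) AS NbOrders, SUM(Amount) AS Total
FROM   dbo.Orders
GROUP  BY CustomerId;
GO
-- The unique clustered index "materializes" the view; no refresh is ever needed.
CREATE UNIQUE CLUSTERED INDEX UX_SalesSummary ON dbo.SalesSummary (CustomerId);
```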

19 – oid or not oid… Why extra columns?

In some PostGreSQL system tables you have a column named oid. In some others, you don't see this column in the table description, but it is returned by a SELECT *. Yet other system tables do not have the oid column at all… Why such an inconsistency? Why is some information hidden?

Also, when you create a table, PostGreSQL adds some extra columns (tableoid, cmax, xmax, cmin, xmin, ctid…) that I did not ask for, and this violates the basic rules of Codd's relational theory! It can also cause big trouble if you want to migrate from another RDBMS: imagine that in SQL Server you have a table with a functional column named ctid… You want the same one in PostgreSQL, but you cannot:

ALTER TABLE MyTable ADD ctid INT;
ERROR: column name "ctid" conflicts with a system column name
SQL state: 42701

… because these columns systematically exist in every PG table.
You will have to rewrite your complete application because of these stupid columns!

There is absolutely no such hidden column in SQL Server.
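These hidden columns can be made visible by naming them explicitly (MyTable is a hypothetical user table; they never appear in a plain SELECT *):

```sql
-- Every PG table silently carries these system columns:
SELECT ctid, xmin, xmax, tableoid, t.*
FROM   MyTable AS t;
```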

20 – Improper use of reserved names

Some functions provided by PostGreSQL extensions have reserved names that they should not have. For instance, the widely used PostGIS names all its functions with the ST_ prefix, because the ISO SQL standard (and the OGC too) decided to name the geo functions that way… But the standard defines a limited list of geo functions, whereas PostGIS names all its functions with this prefix. This has two consequences:

  • first, it is impossible to distinguish what belongs to the standard and what does not;
  • second, the OGC and geo SQL regularly add new functions, and the new function names can conflict with nonstandard PostGreSQL function names.

This can be real trouble when you want to migrate queries between different RDBMS!

21 – Multitenant…. but

PostGreSQL’s bad habit of systematically wanting to copy Oracle often leads to reproducing pitiful functionality. This is the case with dblink, an Oracle speciality, that the PostGreSQL staff stupidly reproduced, believing they were doing the right thing.
The idea is to allow cross-database queries operating transversally across different databases: for example, a join between two tables from two different databases, or a subquery run on one database whose data is used by a query from another database.
What was not abnormal under Oracle, whose system allowed the creation of only a single database until the arrival of the pompous « multitenant », turns out to be stupid under PostGreSQL, which has been multi-database for many versions…

This PostGreSQL dblink, modelled on Oracle, allows a database X to see and interpret, in a query, data from a database Y. But there is a catch… With dblink in PostGreSQL, data operations between these two databases are forced to pass through a temporary table (often for a « remote join »), which in practice is a real black hole in terms of performance.

However, it is common for software editors to create as many databases as there are customers, folders or anything else… And when a common repository is needed for all these databases, or when you have to cross data from all these databases (summary tables, for example), the performance of the queries will go from slow to catastrophic!
SQL Server has not needed dblink for a long time to perform cross-database queries, and moreover such queries are as well optimized as intra-database queries!
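The two syntaxes can be compared as follows (database, table and column names are hypothetical; the PG side requires the dblink extension):

```sql
-- PostGreSQL: dblink materializes the remote rows before joining them locally.
SELECT c.name, o.total
FROM   customers AS c
       JOIN dblink('dbname=salesdb',
                   'SELECT customer_id, total FROM orders')
            AS o(customer_id int, total numeric)  -- column list is mandatory
          ON o.customer_id = c.id;

-- SQL Server: plain three-part naming, optimized like any intra-database join.
SELECT c.name, o.total
FROM   CrmDb.dbo.customers AS c
       JOIN SalesDb.dbo.orders AS o
          ON o.customer_id = c.id;
```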

22 – Nested transaction

When you write multiple stored procedures, which in practice is very good for performance reasons, you frequently need to code a transaction in each procedure, even when one procedure must call another. You then have nested transactions: one transaction inside another. To solve this case elegantly, SQL Server uses an asymmetric model of nested transactions:

  • The first BEGIN TRANSACTION encountered in the session sets the session into transactional mode (autocommit is disabled).
  • Any further BEGIN TRANSACTION does nothing to the transactional status; it only increments a counter.
  • The first ROLLBACK encountered in the session immediately ends the transactional state and cancels all the work that has been done.
  • The last COMMIT, matching the exact count of BEGIN TRANSACTION statements executed, saves all changes made since the first BEGIN TRANSACTION.

There is no way to manage nested transactions in PostGreSQL.
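This asymmetric model can be sketched in T-SQL as follows (dbo.T is a hypothetical one-column table; @@TRANCOUNT exposes the nesting depth):

```sql
BEGIN TRANSACTION;                 -- @@TRANCOUNT = 1, transactional mode starts
   INSERT INTO dbo.T VALUES (1);
   BEGIN TRANSACTION;              -- @@TRANCOUNT = 2, no new transaction is really opened
      INSERT INTO dbo.T VALUES (2);
   COMMIT;                         -- @@TRANCOUNT back to 1, nothing is durable yet
COMMIT;                            -- @@TRANCOUNT = 0, both INSERTs are now saved
-- A single ROLLBACK at any point would have cancelled both INSERTs at once.
```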

23 – Version compatibility

SQL Server, since version 7 (1999), has always had a compatibility model, so today, with the most recent version of SQL Server (2019), you can use a database built for the 2008 version in a 2019 instance, and this database will behave like a 2008 one.

It is not rare to have a single SQL Server instance running dozens of databases with various compatibility levels. This results in very significant savings of money and time, because managing a single instance with many databases is quicker, simpler and smarter with the use of SSMS and maintenance jobs. One of our customers has more than 20,000 databases on a single instance, running under different backward-compatibility models…

There is no database compatibility mode in PostGreSQL
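In SQL Server, the compatibility level is a simple per-database setting (the database name is hypothetical):

```sql
-- Run a database under the SQL Server 2008 behavior inside a 2019 instance:
ALTER DATABASE MyLegacyDb SET COMPATIBILITY_LEVEL = 100;  -- 100 = SQL Server 2008
```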

24 – Migration facilities

SQL Server can migrate from one edition to another (even if the target edition is of a lower level, from Enterprise to Standard for example) or from a given version to a higher one. This can be done without any interruption of the data service by using the mirroring techniques or the AlwaysOn principles. The applications are informed that there is an alternative instance, either in the connection string for the mirroring technology, or by automatic redirection via a listener created by the failover cluster in the AlwaysOn use case. This can also be used when a Service Pack (SP) or a Cumulative Update (CU) needs to restart the SQL Server services, which has become very rare as time goes by…

There is absolutely no possibility in PostGreSQL of migrating from one version to another, minor or major, without breaking the database service…

Another way to migrate databases in SQL Server is to detach the files of the database (data and transaction log) from one instance and reattach them to another. This operation takes only a few milliseconds and the SQL script is quite simple:

EXEC sp_detach_db 'MyDatabase';
:connect MyServer\MyNewSQLInstance
CREATE DATABASE MyDatabase
ON (FILENAME = 'C:\MS SQL\MyDatabase.mdf')
FOR ATTACH;

PostGreSQL does not have such migration facilities… pg_upgrade comes close to this process, but many authors do not recommend it because it is a bit complicated, and the time it takes is far longer than what you would need in SQL Server…

Migrate data using pg_upgrade

How to Perform a Major Version Upgrade Using pg_upgrade in PostgreSQL

Upgrading and updating PostgreSQL

25 – Applying patches

Minor versions of PostGreSQL appear to play the same role as Service Packs (SP) or Cumulative Updates (CU) for SQL Server. But the main difference is that PostgreSQL systematically needs a restart of the database engine, while SQL Server rarely needs it! Of course, you can use the dreadful pg_prewarm to reload the cache afterwards…

Remember that every time you stop the service of a database, every table, index or prepared query execution plan is dropped out of the cache, and the next users will have to wait for the queries to compute a new execution plan and load the cache with the table or index data.

26 – Synonyms?

Although I’m not a fan of synonyms, this kind of object can have its uses, especially for all those who come from the Oracle world…

There is no synonym in the PostGreSQL world…

SQL Server has added synonyms especially for those coming from Oracle!
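A minimal sketch of a SQL Server synonym (all object names are hypothetical):

```sql
-- Give an Oracle-style short name to a fully qualified object:
CREATE SYNONYM dbo.Clients FOR SalesDb.dbo.Customers;
SELECT * FROM dbo.Clients;  -- transparently queries SalesDb.dbo.Customers
```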

27 – Physical data integrity

PostGreSQL provides no way to verify the physical integrity of the stored data as a regular maintenance task, the way it is done in SQL Server via the DBCC CHECK… commands or in Oracle with dbverify (or ALTER TABLE … validate)…

This lack of data security can cause severe damage, because corruption cannot be recovered (even the backups can be corrupted). There have been many discussions on this topic for a long time:

PostgreSQL Database Data File Integrity Check

Database consistency checker in postgresql

DBCC CHECKDB equivalent in postgres

But the time taken by this process is immeasurably long, and therefore totally incompatible with professional use of databases, especially since it is totally blocking (the SQL Server verification process does not block…).
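On the SQL Server side, the check is a one-liner that runs online (the database name is hypothetical):

```sql
-- Verify the logical and physical integrity of all objects, without blocking users:
DBCC CHECKDB ('MyDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```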

28 – Maintenance processes

All the maintenance processes in PostGreSQL are terribly time-consuming, especially recomputing optimizer statistics…

In our tests:

  • creating indexes on a 10-million-row table is about 15 times longer in PostGreSQL;
  • rebuilding tables and/or indexes on a 10-million-row table is about 35 to 42 times longer in PostGreSQL!

The lack of indexing diagnostics, to quickly optimize the performance of a database, makes this maintenance task very time-consuming. You actually have to set up a tool to capture the queries, then analyze them one by one, deduce the indexes to apply, and finally test the gain obtained…
This is a very long approach, taking many days per year for a single PostGreSQL database, compared to the way you can do it in SQL Server, where it rarely takes more than one day to create many hundreds of indexes over all the databases of an instance!

Of course, PostGreSQL VACUUMing, even in the recent parallel mode, is very, very time-consuming and totally unusable in professional cases!

29 – complexity of system catalog

PostGreSQL has 129 system tables/views in the pg_catalog schema.

SELECT tablename, schemaname
FROM pg_tables AS c
WHERE schemaname IN ('pg_catalog', 'information_schema')
UNION ALL
SELECT viewname, schemaname
FROM pg_views AS c
WHERE schemaname IN ('pg_catalog', 'information_schema');

SQL Server has 693 (tables, views, DMVs, including the server level).

SELECT name, type_desc
FROM sys.objects
WHERE is_ms_shipped = 1 AND type IN ('S', 'IF', 'TF', 'V')
UNION
SELECT name, type_desc
FROM sys.all_objects
WHERE is_ms_shipped = 1 AND type IN ('S', 'IF', 'TF', 'V') ;

Of course, there is a big difference in terms of ease of access to system data and metadata.

As an example, I have never found a simple query giving the complete list of all relational indexes, with table schema, table name, index name, key columns, ordinal position and order (ASC/DESC). This is simple to do in SQL Server:

SELECT S.name AS TABLE_SCHEMA, T.name AS TABLE_NAME, X.name AS INDEX_NAME, 
       C.name AS COLUMN_NAME, IC.key_ordinal AS POSITION, 
       CASE IC.is_descending_key 
           WHEN 0 
              THEN 'ASC' 
           WHEN 1 
              THEN 'DESC' 
       END AS COLUMN_ORDER
FROM   sys.indexes AS X
       JOIN sys.tables AS T 
          ON X.object_id = T.object_id
       JOIN sys.schemas AS S 
          ON T.schema_id = S.schema_id
       JOIN sys.index_columns AS IC 
          ON X.object_id = IC.object_id AND X.index_id = IC.index_id
       JOIN sys.columns AS C 
          ON IC.object_id = C.object_id AND IC.column_id = C.column_id
WHERE  IC.is_included_column = 0
AND    X.type BETWEEN 1 AND 2;

… and quite complicated to obtain in PostGreSQL:

SELECT tnsp.nspname as table_schema,
       tbl.relname as table_name,
       ic.relname as index_name,
       CASE
          WHEN idx.indexprs is null 
             THEN col.attname
          ELSE pg_catalog.pg_get_indexdef(ic.oid, col.attnum, false)
       END as column_expression,
       col.attnum as position,
       am.amname as index_type,
       CASE am.amname
          WHEN 'btree' 
             THEN CASE idx.indoption[col.attnum - 1] & 1
                     WHEN 1 
                        THEN 'DESC' 
                     ELSE 'ASC'
                  END
             ELSE null
       END as column_order
FROM   pg_index idx
       JOIN pg_attribute col 
          ON col.attrelid = idx.indexrelid
       JOIN pg_class tbl 
          ON tbl.oid = idx.indrelid
       JOIN pg_class ic 
          ON ic.oid = idx.indexrelid
       JOIN pg_am am 
          ON ic.relam = am.oid
       JOIN pg_namespace tnsp 
          ON tnsp.oid = tbl.relnamespace
WHERE  tnsp.nspname NOT IN ('pg_catalog', 'pg_toast')
ORDER  BY table_schema, table_name, index_name, col.attnum;

You have to know 5 system catalogs (one of them joined twice), write 5 joins, use a function (pg_get_indexdef), deal with an array (indoption[col.attnum - 1]) and use three CASE expressions, one nested inside another… Will that be simple for an ordinary developer?

Accessing metadata and system objects is a nightmare in PostGreSQL. Nothing is simple, and the system tables change with every version… So the system queries written for one version will probably have to be rewritten for the next one. What a waste of time!

30 – Bad advices

A lot of bad advice is distilled by PostGreSQL aficionados. One of the most frightening pieces consists in saying that you should always use the « text » type for any literal column instead of CHAR/VARCHAR. This is an imbecility that can cost a lot… Indeed, the « text » type has no size limit. It is convenient, and it could seem harmless… Just think about this simple example: if you type a telephone number as « text » and associate it with a CHECK constraint using a regular expression, and a user decides to enter a made-up telephone number of 1 GB of data… the process of verifying such an entry can hurt the system, and this makes a typical DoS attack!
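The point can be sketched as follows (table names and the pattern are illustrative): the bounded VARCHAR rejects an oversized value before the regular expression ever runs, while « text » forces the check to scan whatever the client sends:

```sql
-- Risky: no size limit, the regex must scan the whole input, however big.
CREATE TABLE contact_bad (
   phone text CHECK (phone ~ '^\+?[0-9 ]{6,20}$')
);

-- Safer: the declared length caps the input at 20 characters up front.
CREATE TABLE contact_good (
   phone varchar(20) CHECK (phone ~ '^\+?[0-9 ]{6,20}$')
);
```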

31 – Stupid terminology

In the RDBMS field, there is a standard terminology to designate such or such element.

In particular, an engine installed from a DBMS is called an INSTANCE, but not in PostGreSQL, where it is called a cluster! This term is inappropriate, because a cluster is a set of several machines cooperating, converging towards a single solution, which is not at all the case for this unfortunate term chosen by PostGreSQL!

Another inappropriate term is the word « planner » to refer to the OPTIMIZER. To make the French public laugh, since in French the verb « planer » means both to glide and to be high, I amuse myself by saying that the PostGreSQL staff is probably under the influence of LSD! Every other RDBMS uses the term OPTIMIZER…

Perhaps it is out of snobbery that PostGreSQL prefers the term relation to the term TABLE, to designate a table! However, a relation in the mathematical sense of the term, as defined by Ted Codd, the creator of the RDBMS, has particular properties: a mandatory (primary) key, no NULLs, atomic data… This is obviously not the case for all tables, and the fact that PostGreSQL allows such exotic types as arrays or enums violates the very principles of relational databases…

Out of snobbery, too, PostGreSQL wrongly uses, throughout, the word tuple to designate a ROW in a table! The same goes for attribute, which is simply a COLUMN in a table or a view…

As Chris Date, one of the major contributors to the RDBMS field, said: inappropriate terminology introduces confusion in people's minds…

You probably noticed right away, […] that I used the formal terms relation, tuple, and attribute. SQL doesn’t use these terms, however – it uses the more « user friendly » terms table, row, and column instead. And I’m generally sympathetic to the idea of using more user friendly terms, if they can help make the ideas more palatable; in this case, however, they distort them, and in fact do the cause of genuine understanding a grave disservice. The truth is, a relation is not a table, a tuple is not a row, and an attribute is not a column…

SQL and the Relational Theory – How to Write Accurate SQL Code – C. J. Date – O’Reilly 2009

Bad news from SQL Server

Of course, as Osgood Fielding III says in the final scene of Billy Wilder’s 1959 movie Some Like It Hot: « Nobody’s perfect »… well, SQL Server isn’t either!

One terrible thing I hate in SQL Server is that the UNIQUE constraint accepts only one NULL, suggesting that NULL is unique…
There is a workaround that consists of creating a unique filtered index on the non-NULL values… But it’s a pity. Not Microsoft’s fault, but Sybase’s. Which goes to show that inheritances are not always worth taking.
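The workaround can be sketched as follows (table and column names are hypothetical):

```sql
-- Enforce uniqueness only among non-NULL values, allowing any number of NULLs:
CREATE UNIQUE NONCLUSTERED INDEX UX_Person_Email
   ON dbo.Person (Email)
   WHERE Email IS NOT NULL;
```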

Good news from PostGreSQL

One thing I like in PostGreSQL is the « exclusion constraint », which avoids the use of a trigger. I would like this feature to be implemented in SQL Server…

Another thing I do not understand is why we don’t have, in SQL Server, the ability to use the standard ISO SQL WINDOW clause!

Congratulations

Finally, I would like to thank Mr. Billel Abidi from Altavia, who gave me, despite himself, the idea of writing this paper…