{"id":12127,"date":"2026-04-24T09:13:10","date_gmt":"2026-04-24T07:13:10","guid":{"rendered":"https:\/\/spgoo.org\/?page_id=12127"},"modified":"2026-04-24T09:58:55","modified_gmt":"2026-04-24T07:58:55","slug":"doc-bdd-vectorielles","status":"publish","type":"page","link":"https:\/\/spgoo.org\/?page_id=12127","title":{"rendered":"Vector database docs"},"content":{"rendered":"\n<p>A first description of a vector database<\/p>\n\n\n\n<p>When we want to find items that resemble other items, a conventional SQL or NoSQL database does not provide the necessary tools. Exact or approximate search on an attribute is of course available: LIKE &#8216;%&#8230;%&#8217; or regular expressions. These tools compare character strings, however, not meaning.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"475\" src=\"https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-47-16-1024x475.png\" alt=\"\" class=\"wp-image-12139\" srcset=\"https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-47-16-1024x475.png 1024w, https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-47-16-300x139.png 300w, https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-47-16-768x356.png 768w, https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-47-16.png 1170w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"has-vivid-cyan-blue-color has-text-color has-link-color has-medium-font-size wp-elements-a4aa84471a4fa74d90fd4b41948ef1df\">What is a vector (embedding)?<\/p>\n\n\n\n<p>An embedding is a numerical representation of a piece of data (text, image, sound) as a list of real numbers. 
This vector encodes the meaning or the characteristics of the data in a mathematical space.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"618\" src=\"https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-47-59-1024x618.png\" alt=\"\" class=\"wp-image-12138\" style=\"aspect-ratio:1.656966792194454;width:780px;height:auto\" srcset=\"https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-47-59-1024x618.png 1024w, https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-47-59-300x181.png 300w, https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-47-59-768x464.png 768w, https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-47-59.png 1305w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p>Words with similar meanings cluster together: the smaller the distance between two vectors, the greater their semantic similarity. In practice, embeddings have hundreds or thousands of dimensions (for example, OpenAI text-embedding-3-small produces 1536 dimensions by default).<\/p>\n\n\n\n<p class=\"has-vivid-cyan-blue-color has-text-color has-link-color has-medium-font-size wp-elements-4dc72c22a56b2a262197c02ab4519689\">How do we measure similarity?<\/p>\n\n\n\n<p>The most common measure is cosine similarity: it computes the cosine of the angle between two vectors, so two vectors pointing in the same direction score close to 1.<\/p>\n\n\n\n<p>The challenge: searching across millions of vectors<\/p>\n\n\n\n<p>Comparing a query vector with every vector in the database one by one (brute force) becomes impractical at scale. 
Vector databases use specialized indexes to find the nearest neighbors quickly.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"557\" src=\"https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-48-59-1024x557.png\" alt=\"\" class=\"wp-image-12137\" srcset=\"https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-48-59-1024x557.png 1024w, https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-48-59-300x163.png 300w, https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-48-59-768x417.png 768w, https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-48-59.png 1387w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p>The flagship algorithm is HNSW (Hierarchical Navigable Small World).<\/p>\n\n\n\n<p>HNSW achieves sub-linear search complexity (approximately O(log n)) versus O(n) for brute-force search, at the cost of returning approximate rather than exact nearest neighbors. 
Other algorithms exist: IVF (Inverted File Index), PQ (Product Quantization), ScaNN, etc.<\/p>\n\n\n\n<p class=\"has-vivid-cyan-blue-color has-text-color has-link-color has-medium-font-size wp-elements-d100d50ea7ffc591cfe764923bac9f69\">Architecture of a RAG pipeline<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"680\" src=\"https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-49-53-1024x680.png\" alt=\"\" class=\"wp-image-12136\" srcset=\"https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-49-53-1024x680.png 1024w, https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-49-53-300x199.png 300w, https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-49-53-768x510.png 768w, https:\/\/spgoo.org\/wp-content\/uploads\/2026\/04\/Capture-decran-du-2026-04-24-09-49-53.png 1169w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>A first description of a vector database When we want to find items that resemble other items, a conventional SQL or NoSQL database does not provide the necessary tools. 
Exact or approximate search on an attribute is of course available: LIKE &#8216;%&#8230;%&#8217; or [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-12127","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/spgoo.org\/index.php?rest_route=\/wp\/v2\/pages\/12127","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/spgoo.org\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/spgoo.org\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/spgoo.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/spgoo.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=12127"}],"version-history":[{"count":12,"href":"https:\/\/spgoo.org\/index.php?rest_route=\/wp\/v2\/pages\/12127\/revisions"}],"predecessor-version":[{"id":12149,"href":"https:\/\/spgoo.org\/index.php?rest_route=\/wp\/v2\/pages\/12127\/revisions\/12149"}],"wp:attachment":[{"href":"https:\/\/spgoo.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=12127"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}