True Multimodal RAG - Audio/Image/Video/Text

Adam Lucek · Published 1403/04/25 (Solar Hijri; 15 July 2024)

1,723 views · published a month ago

Everyone knows general text-based vector databases and text-based RAG for LLM applications, but as it turns out, that's just the beginning! Taking advantage of CLIP and CLAP models, along with some fancy tricks, we embed 25,000 text entries, 1,999 pictures, 2,000 audio files, and 99 videos into a single vector database, allowing us to run direct text-to-text/audio/image/video retrieval!
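The core idea, one index holding vectors from several modalities and queried with a single text prompt, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code from the linked repository: it assumes embeddings were already produced elsewhere (CLIP for text/image, CLAP for text/audio), so each entry records which embedding space it came from and the text query is embedded once per space. The names `Entry`, `UnifiedIndex`, and `search` are hypothetical, the 3-dimensional vectors are toy stand-ins for real model outputs, and ranking CLIP and CLAP scores together assumes their cosine scores are comparable, which is a simplification.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Entry:
    item_id: str
    modality: str        # "text", "image", "audio", or "video"
    space: str           # hypothetical label for the embedding model, e.g. "clip" or "clap"
    vector: list[float]  # stored at unit length


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero vector cannot be normalized")
    return [x / norm for x in v]


@dataclass
class UnifiedIndex:
    entries: list[Entry] = field(default_factory=list)

    def add(self, item_id, modality, space, vector):
        # Unit-length vectors make the dot product equal to cosine similarity.
        self.entries.append(Entry(item_id, modality, space, _normalize(vector)))

    def search(self, query_vectors, k=3, modalities=None):
        """query_vectors maps each space to the text query embedded by that space's text encoder."""
        queries = {space: _normalize(v) for space, v in query_vectors.items()}
        scored = []
        for e in self.entries:
            if modalities is not None and e.modality not in modalities:
                continue
            q = queries.get(e.space)
            if q is None or len(q) != len(e.vector):
                continue  # no compatible query embedding for this space
            score = sum(a * b for a, b in zip(q, e.vector))
            scored.append((score, e.item_id, e.modality))
        # Assumption: scores from different spaces are merged into one ranking.
        scored.sort(key=lambda t: t[0], reverse=True)
        return scored[:k]


index = UnifiedIndex()
index.add("dog.jpg", "image", "clip", [0.9, 0.1, 0.0])
index.add("cat.jpg", "image", "clip", [0.1, 0.9, 0.0])
index.add("bark.wav", "audio", "clap", [1.0, 0.0, 0.2])
index.add("meow.wav", "audio", "clap", [0.0, 1.0, 0.1])

# Toy embeddings of the query "a dog", one per embedding space.
query = {"clip": [1.0, 0.0, 0.0], "clap": [1.0, 0.0, 0.0]}
hits = index.search(query, k=2)
audio_only = index.search(query, k=1, modalities={"audio"})
print(hits)
print(audio_only)
```

Keeping the `space` tag per entry is the design choice that matters: a CLIP text embedding is only meaningful against CLIP image vectors, and likewise for CLAP and audio, so the query must be embedded once per model before a combined ranking is possible.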

Resources:
Multimodal Image RAG Video:
Code: https://github.com/ALucek/true-multim...
Colab Notebook: https://colab.research.google.com/dri...

Chapters:
00:00 - Intro
01:04 - CLIP Model Review
02:08 - CLAP Model Overview
02:35 - Modality 1: Audio Setup & Dataset
03:45 - Modality 1: Custom Audio Embedding & Loader Functions
05:40 - Modality 1: Audio Embedding & Testing Retrieval
07:38 - Modality 2: Image Setup & Dataset
08:52 - Modality 2: Image Embedding & Testing Retrieval
09:46 - Modality 3: Text Setup & Dataset
10:24 - Modality 3: Text Embedding
12:22 - Modality 3: Testing Text Retrieval
13:06 - Modality 4: Video Setup & Methodology
15:06 - Modality 4: Video Dataset & Embedding
16:22 - Modality 4: Testing Video Retrieval
17:10 - Full Multimodal Retrieval!
18:34 - RAG: Setup
19:26 - RAG: Prompt Setup
20:25 - RAG: Full Multimodal Retrieval Augmented Generation
21:15 - Outro
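The final chapters (RAG: Prompt Setup and Full Multimodal Retrieval Augmented Generation) feed retrieved results to an LLM. A minimal sketch of the prompt-assembly step follows, assuming each retrieved hit has already been reduced to a short text description (source text, a caption, or a file path) that a text LLM can read. The function name `build_prompt` and the prompt wording are hypothetical, not taken from the video, and the actual LLM call is left out.

```python
def build_prompt(question, hits):
    """Format retrieved multimodal items into one grounded prompt.

    hits: list of (score, modality, description) tuples, highest score first.
    """
    lines = []
    for rank, (score, modality, description) in enumerate(hits, start=1):
        # Number each item so the model can cite its sources.
        lines.append(f"[{rank}] ({modality}, score={score:.2f}) {description}")
    context = "\n".join(lines) if lines else "(no results retrieved)"
    return (
        "Answer the question using only the retrieved context below.\n"
        "Cite items by their bracketed number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What sound does a dog make?",
    [
        (0.98, "audio", "bark.wav: recording of a dog barking"),
        (0.95, "image", "dog.jpg: a golden retriever on grass"),
    ],
)
print(prompt)
```

Labeling each context item with its modality and score lets the model tell an audio clip from an image caption when both describe the same concept.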

#ai #coding #generativeai

Video ID: qCAvqsBbN2Y