OpenAI CLIP: Connecting Text and Images (Paper Explained)
Yannic Kilcher
173,215 views, 5 years ago

#ai #openai #technology

Paper Title: Learning Transferable Visual Models From Natural Language Supervision

CLIP trains on 400 million images scraped from the web, along with their text descriptions, to learn a model that can connect the two modalities. The core idea is a contrastive objective combined with a large batch size. The resulting model can be turned into arbitrary zero-shot classifiers for new image & text tasks.

OUTLINE:
0:00 - Introduction
3:15 - Overview
4:40 - Connecting Images & Text
9:00 - Building Zero-Shot Classifiers
14:40 - CLIP Contrastive Training Objective
22:25 - Encoder Choices
25:00 - Zero-Shot CLIP vs Linear ResNet-50
31:50 - Zero-Shot vs Few-Shot
35:35 - Scaling Properties
36:35 - Comparison on different tasks
37:40 - Robustness to Data Shift
44:20 - Broader Impact Section
47:00 - Conclusion & Comments

Paper: https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language_Supervision.pdf
Blog: https://openai.com/blog/clip/
Code: https://github.com/openai/CLIP

Abstract:
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks.
We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on.

Authors: Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/
BiliBili: https://space.bilibili.com/1824646584

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
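
Sketch (not from the video or the official code): the summary and abstract above describe a contrastive objective over (image, text) pairs and the reuse of the trained model as a zero-shot classifier. The NumPy snippet below is a minimal illustration of both ideas under stated assumptions. The embeddings are plain arrays standing in for encoder outputs, the function names are hypothetical, and the temperature is fixed at 0.07 for simplicity, whereas the paper treats it as a learned parameter. It is not the API of the openai/CLIP repository.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    # image_emb, text_emb: (N, d) arrays; row i of each is assumed to be a matching pair.
    # temperature is a fixed illustrative value (an assumption of this sketch).
    logits = l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature
    idx = np.arange(logits.shape[0])
    # Cross-entropy in both directions: pick the right caption for each image,
    # and the right image for each caption.
    loss_image_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_image = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_image_to_text + loss_text_to_image)

def zero_shot_classify(image_emb, class_text_emb):
    # class_text_emb: (C, d) embeddings of one text prompt per class.
    # Each image is assigned the class whose prompt embedding is most similar.
    sims = l2_normalize(image_emb) @ l2_normalize(class_text_emb).T
    return sims.argmax(axis=1)

# Toy data: four orthogonal "image" embeddings with matching or shuffled "captions".
images = np.eye(4)
aligned_loss = contrastive_loss(images, np.eye(4))
shuffled_loss = contrastive_loss(images, np.roll(np.eye(4), 1, axis=0))
predictions = zero_shot_classify(images, np.eye(4))
```

On this toy data the loss is close to zero when each image is paired with its own caption and large when the captions are shuffled, which is the behavior the objective rewards. In a batch of N pairs, every other caption acts as a negative example, which is one reason the large batch size matters.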