OpenAI CLIP: Connecting Text and Images (Paper Explained)
Yannic Kilcher | 173,218 views | 5 years ago

#ai #openai #technology

Paper Title: Learning Transferable Visual Models From Natural Language Supervision

CLIP trains on 400 million images scraped from the web, along with their text descriptions, to learn a model that connects the two modalities. The core idea is a contrastive objective combined with a large batch size. The resulting model can be turned into arbitrary zero-shot classifiers for new image and text tasks.

OUTLINE:
0:00 - Introduction
3:15 - Overview
4:40 - Connecting Images & Text
9:00 - Building Zero-Shot Classifiers
14:40 - CLIP Contrastive Training Objective
22:25 - Encoder Choices
25:00 - Zero-Shot CLIP vs Linear ResNet-50
31:50 - Zero-Shot vs Few-Shot
35:35 - Scaling Properties
36:35 - Comparison on different tasks
37:40 - Robustness to Data Shift
44:20 - Broader Impact Section
47:00 - Conclusion & Comments

Paper: https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language_Supervision.pdf
Blog: https://openai.com/blog/clip/
Code: https://github.com/openai/CLIP

Abstract:
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks.
We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on.

Authors: Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/
BiliBili: https://space.bilibili.com/1824646584

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
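The description names CLIP's two key mechanisms: a contrastive objective over a large batch of (image, text) pairs, and zero-shot classification by comparing an image against text descriptions of each class. Below is a minimal pure-Python sketch of both ideas, not the code from the linked repository. It assumes the image and text encoders have already produced embedding vectors (the toy vectors are hypothetical stand-ins), uses a fixed temperature with an assumed default of 0.07 instead of learning it during training as the paper does, and all function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a (nonzero) vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one row, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(image_embs: Sequence[Vector],
                          text_embs: Sequence[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric contrastive loss for a batch where image_embs[k] pairs with text_embs[k].

    The fixed temperature default is an assumption of this sketch.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)
    # logits[i][j]: scaled cosine similarity between image i and text j
    logits = [[dot(img, txt) / temperature for txt in texts] for img in images]
    # Image -> text: row i should select caption i (the diagonal entry)
    loss_images = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text -> image: column j should select image j
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_texts = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (loss_images + loss_texts) / 2


def zero_shot_classify(image_emb: Vector, class_text_embs: Sequence[Vector]) -> int:
    """Return the index of the class whose text embedding is most similar to the image."""
    image = l2_normalize(image_emb)
    scores = [dot(image, l2_normalize(t)) for t in class_text_embs]
    return max(range(len(scores)), key=lambda k: scores[k])


if __name__ == "__main__":
    # Hypothetical toy embeddings standing in for encoder outputs
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched_texts = [[0.9, 0.1], [0.1, 0.9]]
    swapped_texts = [[0.1, 0.9], [0.9, 0.1]]
    print(clip_contrastive_loss(images, matched_texts))  # low: pairs line up
    print(clip_contrastive_loss(images, swapped_texts))  # high: pairs are swapped
    # Class text embeddings would come from encoding captions such as
    # "a photo of a cat" (an assumed prompt format for this sketch)
    print(zero_shot_classify([0.2, 0.8], [[1.0, 0.0], [0.0, 1.0]]))
```

Averaging the image-to-text and text-to-image cross-entropies makes every other caption in the batch a negative example for each image (and vice versa), which is one plausible reading of why a large batch size matters for this objective. Zero-shot classification then reuses the same similarity score: no classifier weights are trained, only text embeddings for the candidate class names are compared against the image.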