T3: Deep Learning for Language and Vision Intelligence

Speaker: Xiaodong He (Microsoft)

 
Abstract: Deep learning has been the driving force in the recent renaissance of Artificial Intelligence (AI). In this tutorial, I will introduce recent deep learning advances in selected cognitive AI technologies in the interdisciplinary area of natural language processing and computer vision, including comprehension, reasoning, and generation across both modalities. Specifically, I'll introduce the learning of semantic representations across vision and natural language, which is critical for multimodal intelligence. I'll then cover key research progress on three fronts: visual captioning, i.e., understanding visual content and generating natural language descriptions; visual question answering, i.e., reasoning across both natural language and vision to infer answers; and image synthesis, i.e., generating images from natural language descriptions. In particular, I'll emphasize interpretability and controllability in learning algorithms, which are of fundamental importance to general intelligence. At the end of the tutorial, I'll discuss future AI breakthroughs that will benefit from multimodal intelligence, which empowers communication between humans and the real world and enables a broad range of scenarios such as human-like chatbots, smart cities, and intelligent augmented reality. (Slides are in English; the talk will be delivered in Chinese.)
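As a concrete illustration of the first topic above, learning semantic representations shared across vision and natural language, the following is a minimal sketch in the spirit of DSSM-style cross-modal matching; it is not code from the tutorial. It encodes precomputed image features and tokenized captions into a common embedding space and trains with a softmax over cosine similarities so that matching image-caption pairs score highest. All layer sizes, the toy vocabulary, and the random batch are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder(nn.Module):
    # Maps a precomputed image feature vector (e.g., a CNN pooling output) into the shared space.
    def __init__(self, feat_dim=2048, embed_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, embed_dim))

    def forward(self, x):
        return F.normalize(self.mlp(x), dim=-1)          # unit-length embedding

class TextEncoder(nn.Module):
    # Maps a sequence of word ids into the shared space with a small GRU.
    def __init__(self, vocab_size=10000, embed_dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, 300)
        self.gru = nn.GRU(300, embed_dim, batch_first=True)

    def forward(self, tokens):
        _, h = self.gru(self.emb(tokens))                # h: (1, batch, embed_dim)
        return F.normalize(h.squeeze(0), dim=-1)

def matching_loss(img_emb, txt_emb, temperature=0.05):
    # Softmax over cosine similarities: each image should rank its own caption first.
    logits = img_emb @ txt_emb.t() / temperature         # (batch, batch) similarity matrix
    targets = torch.arange(img_emb.size(0))
    return F.cross_entropy(logits, targets)

if __name__ == "__main__":
    img_enc, txt_enc = ImageEncoder(), TextEncoder()
    params = list(img_enc.parameters()) + list(txt_enc.parameters())
    opt = torch.optim.Adam(params, lr=1e-4)

    images = torch.randn(8, 2048)                        # toy batch of image features
    captions = torch.randint(0, 10000, (8, 12))          # toy batch of caption token ids

    loss = matching_loss(img_enc(images), txt_enc(captions))
    loss.backward()
    opt.step()
    print(f"toy batch loss: {loss.item():.4f}")

Once trained on real image-caption pairs, the same shared space supports retrieval in both directions (image-to-caption and caption-to-image) by nearest-neighbor search over the normalized embeddings, which is the sense in which such representations underpin captioning, question answering, and synthesis.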

Biography: Dr. Xiaodong He is a Principal Researcher in the Deep Learning group at Microsoft Research in Redmond, WA, USA, and an Affiliate Professor in the Department of Electrical Engineering at the University of Washington, Seattle. His research interests center on artificial intelligence, including deep learning, natural language processing, computer vision, speech recognition, information retrieval, and knowledge representation. He and his colleagues invented the DSSM (Deep Structured Semantic Model); he received the Outstanding Paper Award at ACL 2015 and won first place in the 2008 NIST Machine Translation Evaluation, the 2015 COCO Image Captioning Challenge, and, most recently, the VQA 2017 Visual Question Answering Challenge. His work has been widely covered by media including Forbes, The Washington Post, CNN, and the BBC, and has had a significant impact on Microsoft's computer vision cloud services and on products such as Office, Seeing AI, and CaptionBot. He also serves as a guest editor and associate editor for several IEEE journals and as an area chair for major academic conferences. He was elected Chair of the IEEE Seattle Section in 2016.

Brief introduction: Xiaodong He is a Principal Researcher in the Deep Learning Technology Center of Microsoft Research AI, Redmond, WA, USA. He is also an Affiliate Professor in the Department of Electrical Engineering at the University of Washington (Seattle), where he serves on doctoral supervisory committees. His research interests are mainly in artificial intelligence, including deep learning, natural language processing, computer vision, speech, information retrieval (IR), and knowledge representation. He has published more than 100 papers in ACL, EMNLP, NAACL, CVPR, SIGIR, WWW, CIKM, NIPS, ICLR, ICASSP, Proc. IEEE, IEEE TASLP, IEEE SPM, and other venues. He has received several awards, including the Outstanding Paper Award at ACL 2015. He and colleagues invented the DSSM, which is broadly applied to language, vision, IR, and knowledge representation tasks. He led the development of the MSR-NRC-SRI entry and the MSR entry that won first place in the 2008 NIST Machine Translation Evaluation and the 2011 IWSLT Evaluation (Chinese-to-English), respectively. He and colleagues also won first prize, tied with Google, at the COCO Captioning Challenge 2015, and won first prize at the Visual Question Answering (VQA) Challenge 2017. His work was reported by Communications of the ACM in January 2016. He is leading the image captioning effort, which is now part of Microsoft Cognitive Services; it provides the world's first image-captioning cloud service, enables next-generation scenarios such as CaptionBot and Seeing AI, and empowers Microsoft Word and PowerPoint to create image descriptions automatically for millions of users. The work was widely covered in media including Business Insider, TechCrunch, Forbes, The Washington Post, CNN, and the BBC. He has held editorial positions at several IEEE journals, served as an area chair for NAACL-HLT 2015, and served on the organizing and program committees of major speech and language processing conferences. He is an elected member of the IEEE SLTC for the 2015-2017 term. He is a senior member of the IEEE and a member of the ACL. He was elected Chair of the IEEE Seattle Section in 2016.