Deep Learning Resource Roundup: Papers, Datasets, Courses, Books, Blogs, Tutorials...

AI科技大本营 2018-08-27 14:55


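Remember the mind map from our earlier article, "Machine Learning, NLP, Python, Math... The most complete AI learning resources are all here!"? It got a lot of positive feedback, and we thank everyone for the support. Collecting learning materials is something you do constantly while studying, but it is also very time-consuming. So the editors at AI科技大本营 routinely keep an eye out for good learning resources to share with readers who need them. We have organized these materials and present them in an at-a-glance format, in the hope that they are easy to search and use.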



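The resources we recommended last week did not include papers or books, because for people just starting out in AI, jumping straight into papers is not a good way to learn the fundamentals. Last time we therefore focused on good public courses and on bloggers' experience and insights; some of the methods and viewpoints in those articles may give you a fresh perspective.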


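This issue focuses on deep learning (DL), and this time it includes many papers. DL is currently the most popular approach in machine learning (ML), and the best way to understand these models in depth is to read the papers. Classic, popular, and recent papers are all worth studying for anyone doing DL research. We cannot list every paper, but the ones we recommend are those you should read in each research direction. We have also linked this issue with the previous one, so the material you collect and study each time connects, which makes learning more efficient.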



Below, we walk through this issue's content, starting with the overview figure.



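The core of this issue is the list of DL papers. The second part covers datasets: we compiled them so that, when you practice, you know which good public datasets you can use directly. This also lays the groundwork for our upcoming content.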


Highlights of this issue:


Papers


Model-related papers: 22



Important & core papers: 17



Application papers: 38



Datasets


Images (image datasets)



General
  • MNIST Handwritten digits: [Link]
Face
  • Face Recognition Technology (FERET) The goal of the FERET program was to develop automatic face recognition capabilities that could be employed to assist security, intelligence, and law enforcement personnel in the performance of their duties: [Link]
  • The CMU Pose, Illumination, and Expression (PIE) Database of Human Faces Between October and December 2000 we collected a database of 41,368 images of 68 people: [Link]
  • YouTube Faces DB The data set contains 3,425 videos of 1,595 different people. All the videos were downloaded from YouTube. An average of 2.15 videos are available for each subject: [Link]
  • Grammatical Facial Expressions Data Set Developed to assist the automated analysis of facial expressions: [Link]
  • FaceScrub A Dataset With Over 100,000 Face Images of 530 People: [Link]
  • IMDB-WIKI 500k+ face images with age and gender labels: [Link]
  • FDDB Face Detection Data Set and Benchmark (FDDB): [Link]
Object Recognition
  • COCO Microsoft COCO: Common Objects in Context: [Link]
  • ImageNet The famous ImageNet dataset: [Link]
  • Open Images Dataset Open Images is a dataset of ~9 million images that have been annotated with image-level labels and object bounding boxes: [Link]
  • Caltech-256 Object Category Dataset A large dataset for object classification: [Link]
  • Pascal VOC dataset A large dataset for classification tasks: [Link]
  • CIFAR 10 / CIFAR 100 The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes. CIFAR-100 is similar to CIFAR-10 but it has 100 classes containing 600 images each: [Link]
Action recognition

  • HMDB a large human motion database: [Link]
  • MHAD Berkeley Multimodal Human Action Database: [Link]
  • UCF101 - Action Recognition Data Set UCF101 is an action recognition data set of realistic action videos, collected from YouTube, having 101 action categories. This data set is an extension of UCF50 data set which has 50 action categories: [Link]
  • THUMOS Dataset A large dataset for action classification: [Link]
  • ActivityNet A Large-Scale Video Benchmark for Human Activity Understanding: [Link]


Text and Natural Language Processing



General
  • 1 Billion Word Language Model Benchmark: The purpose of the project is to make available a standard training and test setup for language modeling experiments: [Link]
  • Common Crawl: The Common Crawl corpus contains petabytes of data collected over the last 7 years. It contains raw web page data, extracted metadata and text extractions: [Link]
  • Yelp Open Dataset: A subset of Yelp's businesses, reviews, and user data for use in personal, educational, and academic purposes: [Link]
Text classification
  • 20 newsgroups The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups: [Link]
  • Broadcast News The 1996 Broadcast News Speech Corpus contains a total of 104 hours of broadcasts from ABC, CNN and CSPAN television networks and NPR and PRI radio networks with corresponding transcripts: [Link]
  • The WikiText long-term dependency language modeling dataset: A collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia: [Link]
Question Answering
  • Question Answering Corpus by DeepMind and Oxford: two corpora of roughly a million news stories with associated queries from the CNN and Daily Mail websites: [Link]
  • Stanford Question Answering Dataset (SQuAD) consisting of questions posed by crowdworkers on a set of Wikipedia articles: [Link]
  • Amazon question/answer data contains Question and Answer data from Amazon, totaling around 1.4 million answered questions: [Link]
Sentiment Analysis
  • Multi-Domain Sentiment Dataset The Multi-Domain Sentiment Dataset contains product reviews taken from Amazon.com from many product types (domains): [Link]
  • Stanford Sentiment Treebank Dataset The Stanford Sentiment Treebank is the first corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language: [Link]
  • Large Movie Review Dataset: This is a dataset for binary sentiment classification: [Link]
Machine Translation
  • Aligned Hansards of the 36th Parliament of Canada dataset contains 1.3 million pairs of aligned text chunks: [Link]
  • Europarl: A Parallel Corpus for Statistical Machine Translation dataset extracted from the proceedings of the European Parliament: [Link]
Summarization
  • Legal Case Reports Data Set A textual corpus of 4,000 legal cases for automatic summarization and citation analysis: [Link]

Speech Technology

  • TIMIT Acoustic-Phonetic Continuous Speech Corpus The TIMIT corpus of read speech is designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems: [Link]
  • LibriSpeech LibriSpeech is a corpus of approximately 1000 hours of 16kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey: [Link]
  • VoxCeleb A large scale audio-visual dataset: [Link]
  • NIST Speaker Recognition: [Link]


Courses + Books + Blogs + Tutorials


A roundup of all four categories




Courses

  • Machine Learning by Stanford on Coursera: [Link]
  • Neural Networks and Deep Learning Specialization by Coursera: [Link]
  • Intro to Deep Learning by Google: [Link]
  • NVIDIA Deep Learning Institute by NVIDIA: [Link]
  • Convolutional Neural Networks for Visual Recognition by Stanford: [Link]
  • Deep Learning for Natural Language Processing by Stanford: [Link]
  • Deep Learning by fast.ai: [Link]
  • Course on Deep Learning for Visual Computing by IITKGP: [Link]

Books

  • Deep Learning by Ian Goodfellow: [Link]
  • Neural Networks and Deep Learning: [Link]
  • Deep Learning with Python: [Link]
  • Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems: [Link]

Blogs

  • Colah's blog: [Link]
  • Andrej Karpathy blog: [Link]
  • The Spectator Shakir's Machine Learning Blog: [Link]
  • WILDML: [Link]
  • Distill blog It is more like a journal than a blog, because it has a peer-review process and only accepted articles are published: [Link]
  • BAIR Berkeley Artificial Intelligence Research: [Link]
  • Sebastian Ruder's blog: [Link]
  • inFERENCe: [Link]
  • i am trask A Machine Learning Craftsmanship Blog: [Link]

Tutorials

  • Deep Learning Tutorials: [Link]
  • Deep Learning for NLP with PyTorch by PyTorch: [Link]
  • Deep Learning for Natural Language Processing: Tutorials with Jupyter Notebooks by Jon Krohn: [Link]


Coming Up Next

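For DL, reading papers and tutorials alone is far from enough; you have to write code yourself. So next time we will introduce some hands-on DL case studies, many of which use the datasets listed above. This is the groundwork we mentioned earlier, so stay tuned.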




For more links and details, see the original repository:

https://github.com/astorfi/Deep-Learning-World#image

Editor | Jane

Produced by | AI科技大本营

