Animated GIFs are widely used on the Internet to express emotions, but automatic analysis of their content is largely unexplored. To support search and recommendation of GIFs, we aim to predict how humans will perceive their emotions based on their content. Since previous solutions to this problem rely only on image-based features and discard all motion information, we propose to use 3D convolutional neural networks (CNNs) to extract spatiotemporal features from GIFs. We evaluate our methodology on GIFGIF, a crowdsourcing platform with more than 6,000 animated GIFs, and achieve higher accuracy than any previous approach in predicting crowdsourced intensity scores for 17 emotions. We also find that our trained model can distinguish and cluster emotions in terms of valence and risk perception.
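
To make the intuition behind spatiotemporal features concrete, here is a minimal sketch (not the paper's architecture) of a single 3D convolution applied to a stack of GIF frames. The clip dimensions and the hand-crafted temporal-difference kernel are illustrative assumptions; a real 3D CNN would learn many such kernels end to end.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D convolution over (time, height, width).

    Unlike a 2D convolution on individual frames, the kernel also
    spans the time axis, so its response depends on inter-frame
    motion rather than on static appearance alone.
    """
    T, H, W = clip.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(clip[i:i + t, j:j + h, k:k + w] * kernel)
    return out

# Hypothetical 8-frame, 16x16 grayscale GIF clip.
clip = np.random.rand(8, 16, 16)

# A temporal-difference kernel: averages a 3x3 patch in one frame and
# subtracts the same patch in the next frame, so it fires on motion.
kernel = np.zeros((2, 3, 3))
kernel[0] = 1.0 / 9.0
kernel[1] = -1.0 / 9.0

features = conv3d_valid(clip, kernel)
print(features.shape)  # (7, 14, 14)
```

On a perfectly static clip (identical frames) this kernel's response is exactly zero, which illustrates why purely image-based features cannot capture the motion cues that 3D convolutions expose.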