ABSTRACT
A 360-degree image captures every direction at once, so it often contains far more content than a viewer needs. To inspect such an image on a 2D display, a user must either click and drag the view with a mouse or project the image onto a 2D panorama, which inevitably introduces severe distortion. As a result, exploring a 360-degree image and finding an object of interest in it can be tedious. To address this problem, this paper proposes a method that finds a region of interest in a given 360-degree image and produces a natural-looking 2D image that best matches a description the user gives as a natural-language sentence. The method also takes photo composition into account so that the resulting image is aesthetically pleasing. It first converts the 360-degree image to a 2D cubemap. Because objects may appear distorted or split into multiple pieces in a typical cubemap, which causes detection to fail, we introduce a modified cubemap. The method then applies an object detection method based on a Long Short-Term Memory (LSTM) network to find the region of interest that matches the given sentence. Finally, it produces an image that contains the detected region and has an aesthetically pleasing composition.
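The cubemap conversion step can be illustrated with a minimal sketch. It assumes the 360-degree image is stored as an equirectangular grid and uses the standard six-face cubemap, not the modified cubemap introduced in this paper, whose layout the abstract does not specify. The function names, the axis convention (+z forward, +x right, +y up), and nearest-neighbor sampling are all illustrative assumptions.

```python
import math

def face_pixel_to_direction(face, u, v):
    """Map face coordinates u, v in [-1, 1] (u right, v up) to a 3D direction.
    The face names and axis convention are assumptions for this sketch."""
    table = {
        "front": (u, v, 1.0),
        "right": (1.0, v, -u),
        "back": (-u, v, -1.0),
        "left": (-1.0, v, u),
        "up": (u, 1.0, -v),
        "down": (u, -1.0, v),
    }
    return table[face]

def direction_to_equirect(x, y, z, width, height):
    """Map a 3D direction to (col, row) in an equirectangular image."""
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)       # longitude in [-pi, pi], 0 means forward
    lat = math.asin(y / norm)    # latitude in [-pi/2, pi/2], 0 means horizon
    col = (lon / (2.0 * math.pi) + 0.5) * width
    row = (0.5 - lat / math.pi) * height
    return col, row

def render_face(image, face, size):
    """Render one size x size cube face by nearest-neighbor sampling.
    `image` is a list of rows (height x width) of pixel values."""
    height, width = len(image), len(image[0])
    out = []
    for i in range(size):
        v = 1.0 - 2.0 * (i + 0.5) / size      # top row has v close to +1
        out_row = []
        for j in range(size):
            u = 2.0 * (j + 0.5) / size - 1.0  # left column has u close to -1
            x, y, z = face_pixel_to_direction(face, u, v)
            col, row = direction_to_equirect(x, y, z, width, height)
            c = int(col) % width              # wrap around horizontally
            r = min(max(int(row), 0), height - 1)
            out_row.append(image[r][c])
        out.append(out_row)
    return out
```

Each face rendered this way is a distortion-free perspective view covering 90 degrees, but an object that straddles a face boundary is split between faces, which is the failure case that motivates a modified layout.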
KEYWORDS
360 image, deep learning, natural language processing, LSTM, photo composition

