A Scalable Media Job Framework for an Open Source Search Engine

This paper explores efficient ways to implement media-updating
features such as news aggregation, video conversion, and bulk email
handling. These jobs are all periodic, and all benefit from being
handled in a distributed fashion. Their data also often comes from a
social or collaborative source. We isolate the class of periodic,
one-round MapReduce jobs as a useful setting in which to describe
and handle media-updating tasks. Because such tasks are simpler than
general MapReduce jobs, programming them on a general-purpose
MapReduce platform can easily become tedious. This paper presents
the MediaUpdater module of the Yioop Open Source Search Engine
Web Portal, which is designed to handle such jobs via an extension
of a PHP class. We describe how to implement various media-updating
tasks in our system and report experiments carried out with these
implementations on an Amazon Web Services cluster.
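
To make the notion of a periodic, one-round MapReduce job concrete, the following is a minimal sketch in Python rather than PHP. It is not Yioop's MediaUpdater API: the names PeriodicJob, FeedWordCount, tasks, is_due, and the period values are all hypothetical, and the map phase runs in a single process here, whereas a real deployment would distribute the tasks across machines.

```python
import time
from collections import defaultdict


class PeriodicJob:
    """Hypothetical base class: one map phase, one reduce phase,
    rerun at most once every `period` seconds."""

    period = 3600  # seconds between runs (assumed value)

    def __init__(self):
        self.last_run = None

    def is_due(self, now):
        # A job is due if it has never run or a full period has elapsed.
        return self.last_run is None or now - self.last_run >= self.period

    def tasks(self):
        """Return the units of work for the map phase."""
        raise NotImplementedError

    def map(self, task):
        """Yield (key, value) pairs for one task."""
        raise NotImplementedError

    def reduce(self, key, values):
        """Combine all values collected for one key."""
        raise NotImplementedError

    def run(self, now=None):
        now = time.time() if now is None else now
        if not self.is_due(now):
            return None
        # Single round: map every task, group by key, reduce each group.
        grouped = defaultdict(list)
        for task in self.tasks():
            for key, value in self.map(task):
                grouped[key].append(value)
        self.last_run = now
        return {key: self.reduce(key, values)
                for key, values in grouped.items()}


class FeedWordCount(PeriodicJob):
    """Toy media-updating job: count words across feed headlines."""

    period = 600  # assumed: refresh every ten minutes

    def __init__(self, feeds):
        super().__init__()
        self.feeds = feeds  # maps a source name to a list of headlines

    def tasks(self):
        return self.feeds.items()

    def map(self, task):
        _source, headlines = task
        for headline in headlines:
            for word in headline.lower().split():
                yield word, 1

    def reduce(self, key, values):
        return sum(values)
```

A new kind of job only overrides tasks, map, and reduce; the scheduling check and the grouping step stay in the base class, which is the kind of simplification the restricted job class is meant to allow.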



