Tiling slideshow
Jun-Cheng Chen, Wei-Ta Chu, Jin-Hau Kuo, Chung-Yi Weng, and Ja-Ling Wu
http://www.cmlab.csie.ntu.edu.tw/~wtchu/TilingSlideshow/index.php

Version 1.01 README file (2006/12/04)

** This is a very preliminary release. It combines several separate modules, and the code has not been optimized, so processing may take longer than you expect. Program efficiency will be improved in the future. If you have any suggestions, please contact Wei-Ta Chu at wtchu@cmlab.csie.ntu.edu.tw. **

Introduction
===========================
Tiling slideshow is a new kind of media that provides an elaborate photo browsing experience. This package automatically performs (1) photo filtering & clustering; (2) music beat analysis; and (3) spatial & temporal composition. For technical details, please refer to [1].

License
===========================
The license for this package is available in the "LICENSE" file.

Quick Start
===========================
Step 1: Extract TilingSlideshow_v1.01.rar. This yields two directories: "TilingSlideshow", which contains the main programs, and "VirtualDub", which is empty.
Step 2: Download VirtualDub from http://www.virtualdub.org/ and extract it into the "VirtualDub" directory.
Step 3: Edit "photo_filelist.txt" in the "TilingSlideshow" directory to indicate where your photos are.
Step 4: On the command line, run "TilingSlideshow.exe photo_filelist.txt your_wav_file parms.txt", where "your_wav_file" is the path to a wav file.

Installation
===========================
We use VirtualDub (http://www.virtualdub.org/) and Xvid (http://www.xvid.org/) to perform video encoding. Please install the Xvid codec first. Then download VirtualDub and place it in the "VirtualDub" directory, alongside the extracted "TilingSlideshow" directory.
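
The side-by-side directory layout described here can be checked before the first run. The Python sketch below is only an illustration and is not part of this package. The function name check_layout is hypothetical, and the sketch relies only on the two directory names given in Quick Start. It cannot tell whether the Xvid codec is installed.

```python
from pathlib import Path


def check_layout(root):
    """Return a list of problems with the expected directory layout.

    `root` is the directory into which the archive was extracted; it is
    expected to contain "TilingSlideshow" and "VirtualDub" side by side.
    An empty list means no problem was found.
    """
    root = Path(root)
    problems = []

    # Both directories should exist next to each other.
    for name in ("TilingSlideshow", "VirtualDub"):
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}")

    # The archive ships "VirtualDub" empty; it must be filled by the user.
    vdub = root / "VirtualDub"
    if vdub.is_dir() and not any(vdub.iterdir()):
        problems.append("VirtualDub directory is empty; extract VirtualDub into it")

    return problems
```

For example, calling check_layout on the extraction directory and printing each returned string gives a quick list of what still needs to be set up.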
In addition to video encoding, we use several packages to perform music beat analysis and face detection:
- Music beat analysis: we use the algorithm proposed in [2]. The package downloaded from http://sound.media.mit.edu/~eds/beat/tapping.tar.gz has been recompiled with Cygwin and g++ for use in the Microsoft Windows environment. The executable "tapping.exe" is located in the "beat_detection" directory. Note that the copyright belongs to the original author.
- Face detection: we use the Intel Open Computer Vision Library (OpenCV) to perform face detection. Some necessary descriptions are located in the "face_detection" directory, and some DLLs copied from OpenCV are located in the root directory. Note that the copyright belongs to the original authors.

System Requirements
===========================
Again, this is a very preliminary release, and the code has not been optimized. We suggest at least 512MB of RAM and a 2.8GHz or faster CPU. In our environment (2.8GHz CPU and 1GB RAM), processing 200 photos and a 4.5-minute piece of music takes about 20 minutes. The bottlenecks are face detection and video encoding.

Usage
===========================
Usage: TilingSlideshow.exe photo_filelist wav_file parm_config
(1) photo_filelist: path of the file that lists the photo paths. The default directory paths are stored in "photo_filelist.txt". Note that multiple directories can be specified. Recursive traversal of sub-directories is also supported after version 1.01.
(2) wav_file: path of the wav file.
(3) parm_config: path of the parameter file. The default parameters are stored in parms.txt.

Input
===========================
(1) Photos:
- EXIF metadata: orientation and time information are necessary for correct orientation correction and time-based clustering.
- Number of photos: this program is suitable for browsing large amounts of photos.
We suggest you prepare at least 200 photos to generate the final result.
(2) Music:
- Format: the current version only supports mono-channel wav files. Please save your music file as .wav in advance.

Parameter Settings
===========================
Parameters are stored in parms.txt. They include:
(1) QualityFiltering = 0 or 1.
- Indicates whether to perform blur and over/underexposure detection and filter out poor-quality photos. The default value is 1.
(2) ClusterSep = 0 or 1 or 2.
- Three profiles can be selected. A larger value indicates that finer clustering is preferred. This parameter influences the average number of photos displayed in the same frame. The default value is 1.
(3) AudioInterval = 0 or 1 or 2.
- Three profiles can be selected. They indicate three different search ranges used when seeking the timing for frame switching. The default value is 0, which indicates r1=4 and r2=6. This parameter influences the rate of frame switching. A larger value indicates a lower frame switching rate.

Output
===========================
The default output is "slideshow.avi", which will be located in the root directory. The Xvid codec is used in the current version. More options may be provided in the future.

References
===========================
[1] J.-C. Chen, W.-T. Chu, J.-H. Kuo, C.-Y. Weng, and J.-L. Wu, "Tiling Slideshow," Proceedings of ACM Multimedia Conference, pp. 25-34, 2006.
[2] E.D. Scheirer, "Tempo and beat analysis of acoustic musical signals," Journal of the Acoustical Society of America, vol. 103, no. 1, pp. 588-601, 1998.
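
Example: Converting a WAV File to Mono
===========================
The Input section states that only mono-channel wav files are supported. If your music is stereo, it has to be downmixed first. The Python sketch below is one possible way to do this using only the standard library. It is not part of this package. The function name to_mono is hypothetical, and the sketch assumes uncompressed 16-bit PCM input and simply averages the channels.

```python
import array
import wave


def to_mono(src_path, dst_path):
    """Write a mono copy of a 16-bit PCM WAV file by averaging its channels.

    Returns the number of channels found in the source file.
    Assumes array type "h" is 2 bytes wide, which holds on common platforms.
    """
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM wav files")

    # Samples are interleaved: ch0, ch1, ..., ch0, ch1, ...
    samples = array.array("h")
    samples.frombytes(raw)

    # Average each group of n_channels samples into one mono sample.
    mono = array.array("h", (
        sum(samples[i:i + n_channels]) // n_channels
        for i in range(0, len(samples), n_channels)
    ))

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(frame_rate)
        dst.writeframes(mono.tobytes())

    return n_channels
```

The resulting file can then be passed as the wav_file argument of TilingSlideshow.exe. A file that is already mono is copied through unchanged in content.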