Optimizing big data processing through lazy computations: a systematic review of techniques and applications

Authors

  • M.V. Talakh Yuriy Fedkovich Chernivtsi National University
  • Yu.O. Ushenko Yuriy Fedkovich Chernivtsi National University
  • E.V. Vatamanitsa Yuriy Fedkovich Chernivtsi National University
  • Yu.O. Halin Yuriy Fedkovich Chernivtsi National University

DOI:

https://doi.org/10.31649/1681-7893-2024-48-2-24-33

Keywords:

lazy operations, Big Data, data processing, optimization, data streams, computing strategies, programming languages.

Abstract

The article examines the concept of lazy operations and its application to the efficient processing of large volumes of data. It analyzes the main principles of lazy computation, their implementation in various programming languages, and strategies for their effective use in Big Data processing. The advantages and limitations of the lazy approach are investigated, particularly with regard to memory savings, performance improvement, and the ability to work with infinite data streams. A concept for selecting a computation strategy based on data size and computational complexity is proposed.
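
As a minimal illustration of the lazy approach summarized above (a sketch that is not taken from the article itself), the following Python snippet builds a processing pipeline over an unbounded stream using generators. The names `naturals`, `squares_of_evens`, and `take`, and the choice of filter and transform steps, are assumptions made purely for illustration. No element is computed until it is requested, so only the requested prefix of the infinite stream is ever held in memory.

```python
from itertools import count, islice


def naturals():
    """Unbounded stream 0, 1, 2, ... (a hypothetical stand-in for a data source)."""
    return count(0)


def squares_of_evens(stream):
    """Describe a filter + map pipeline lazily; no element is computed here."""
    evens = (x for x in stream if x % 2 == 0)
    return (x * x for x in evens)


def take(n, stream):
    """Force evaluation of only the next n elements of a lazy stream."""
    return list(islice(stream, n))


# Building the pipeline is cheap even though the source is infinite.
pipeline = squares_of_evens(naturals())

# Work happens only here, and only for the five requested results.
first_five = take(5, pipeline)
print(first_five)  # [0, 4, 16, 36, 64]
```

An eager version of the same pipeline would have to materialize the whole input first, which is impossible for an infinite stream and wasteful for a very large one.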

Author Biographies

M.V. Talakh, Yuriy Fedkovich Chernivtsi National University

Ph.D., Assistant Professor of the Computer Science Department

Yu.O. Ushenko, Yuriy Fedkovich Chernivtsi National University

D.Sc., Professor of the Computer Science Department

E.V. Vatamanitsa, Yuriy Fedkovich Chernivtsi National University

Assistant Professor of the Computer Science Department

Yu.O. Halin, Yuriy Fedkovich Chernivtsi National University

Postgraduate student of the Computer Science Department

Published

2024-11-19

How to Cite

[1]
M. Talakh, Y. Ushenko, E. Vatamanitsa, and Y. Halin, “Optimizing big data processing through lazy computations: a systematic review of techniques and applications”, Опт-ел. інф-енерг. техн., vol. 48, no. 2, pp. 24–33, Nov. 2024.

Section

OptoElectronic/Digital Methods and Systems for Image/Signal Processing
