07 November lecture: Data Platform

🔷 Note

Write a short paper, or submit your own paper to the call for papers (deadline 31/12/25).

Why big data?

Make data-driven decisions based on analytics.

Analytics effects

Big data

Big data is data whose volume or required IOPS cannot be handled by normal processing capabilities (typical database solutions).

Characteristics

Long tail problem

Data needs to be collected without discrimination: the many rare items in the long tail can collectively matter as much as the few popular ones.
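To make the long tail concrete, here is a minimal sketch that assumes item popularity follows a Zipf-like law (weight proportional to 1/rank). The catalogue size of 100,000 items and the 1% "head" are illustrative assumptions, not figures from the lecture. The function computes how much of the total demand comes from items outside the most popular head.

```python
# Assumption: popularity of the item at rank r is proportional to 1/r (Zipf-like).
def tail_share(n_items: int, head_fraction: float) -> float:
    """Fraction of total demand coming from items outside the most popular head_fraction."""
    weights = [1.0 / rank for rank in range(1, n_items + 1)]
    head_size = int(n_items * head_fraction)
    total = sum(weights)
    head = sum(weights[:head_size])
    return (total - head) / total

share = tail_share(n_items=100_000, head_fraction=0.01)
print(f"Demand from the 99% least popular items: {share:.1%}")
```

Under these assumptions more than a third of the demand sits in the tail, so keeping only the "important" records would discard a large share of the signal.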

Bigger and smarter?

With collection, sending data is easy; moving and migrating it afterwards is not.
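A back-of-the-envelope calculation shows why moving data is the hard part. This sketch assumes 1 PB of data and a dedicated 10 Gbit/s link; both figures and the 70% efficiency factor are illustrative assumptions.

```python
def transfer_days(size_bytes: float, link_bits_per_s: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_bytes over a link running at a fraction of its nominal bandwidth."""
    seconds = size_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 86_400

PETABYTE = 10**15          # decimal petabyte, in bytes
TEN_GBIT = 10 * 10**9      # 10 Gbit/s link, in bits per second

print(f"{transfer_days(PETABYTE, TEN_GBIT):.1f} days at full line rate")
print(f"{transfer_days(PETABYTE, TEN_GBIT, efficiency=0.7):.1f} days at 70% efficiency")
```

Even at full line rate the copy takes more than nine days, before counting retries, validation, or the cost of keeping source and destination in sync.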

Hardware concerns

Scaling up a single machine to manage big data raises many problems.

Scale up vs scale out

Scaling out is much simpler, but it brings problems such as…
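A rough calculation shows why scaling out pays off for scans. This sketch assumes a 100 TB dataset spread evenly over disks that each read sequentially at 200 MB/s, all in parallel. The figures are illustrative, and the model deliberately ignores network, coordination and failure handling, which are exactly the costs scale-out introduces.

```python
def scan_hours(dataset_bytes: float, disk_mb_per_s: float, n_disks: int) -> float:
    """Hours to read a dataset once, assuming it is spread evenly and all disks read in parallel."""
    per_disk_bytes = dataset_bytes / n_disks
    seconds = per_disk_bytes / (disk_mb_per_s * 10**6)
    return seconds / 3600

DATASET = 100 * 10**12   # 100 TB (illustrative)
DISK_SPEED = 200         # MB/s sequential read per disk (illustrative)

for disks in (1, 10, 1000):
    print(f"{disks:>5} disk(s): {scan_hours(DATASET, DISK_SPEED, disks):8.2f} h")
```

One disk needs almost six days for a full scan; a thousand disks working in parallel need minutes, provided the data is already distributed across them.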

Distributed filesystems for scale-out solutions

To scale out resources for big data, a distributed filesystem is required to share storage resources across machines; one such solution is HDFS (Hadoop Distributed File System).
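HDFS splits each file into large fixed-size blocks and stores several replicas of every block on different DataNodes, so losing one machine does not lose data. This is a minimal sketch of that idea: 128 MiB and 3 replicas are the usual HDFS defaults (`dfs.blocksize`, `dfs.replication`), but the round-robin placement is a simplification (real HDFS uses a rack-aware policy) and the node names are hypothetical.

```python
import math

BLOCK_SIZE = 128 * 1024**2   # 128 MiB, HDFS's usual default block size
REPLICATION = 3              # HDFS's usual default replication factor

def place_blocks(file_size: int, datanodes: list[str]) -> dict[int, list[str]]:
    """Split a file into fixed-size blocks and assign each block to REPLICATION distinct nodes.

    Simplified round-robin placement; real HDFS uses a rack-aware policy.
    """
    if len(datanodes) < REPLICATION:
        raise ValueError("need at least as many DataNodes as replicas")
    n_blocks = math.ceil(file_size / BLOCK_SIZE)
    placement = {}
    for block_id in range(n_blocks):
        # Consecutive nodes (wrapping around) hold the replicas of this block.
        placement[block_id] = [
            datanodes[(block_id + r) % len(datanodes)] for r in range(REPLICATION)
        ]
    return placement

nodes = ["dn1", "dn2", "dn3", "dn4"]          # hypothetical DataNode names
layout = place_blocks(file_size=500 * 1024**2, datanodes=nodes)
for block_id, replicas in layout.items():
    print(block_id, replicas)
```

A 500 MiB file becomes four blocks, the last one only partially filled, and every block survives the loss of any two nodes.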

Main features of Hadoop
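Hadoop's original processing model is MapReduce. The word count below imitates its map, shuffle and reduce phases inside a single Python process. It only illustrates the programming model; it is not the Hadoop API (real jobs use Hadoop's Java API or Hadoop Streaming), and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line: str) -> list[tuple[str, int]]:
    """Emit (word, 1) for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Group all values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Sum the counts for one word."""
    return key, sum(values)

lines = ["big data needs big storage", "data platforms store data"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts)
```

In Hadoop the map tasks would run next to the HDFS blocks holding each input split, and the shuffle would move intermediate pairs over the network to the reducers.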

Distributed relational databases

Pros
- relational algebra enables query optimization

Cons
- fixed schema that is difficult to update
- difficult to scale out
- impedance mismatch between the relational model and the application's object model
- strong consistency guarantees mean more latency
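The impedance mismatch is easiest to see with a nested application object. This sketch uses Python's built-in `sqlite3` only as a stand-in for any relational database (SQLite itself is not distributed); the `orders`/`order_items` schema is a hypothetical example.

```python
import sqlite3

# In application code an order is one nested object...
order = {"id": 1, "customer": "alice", "items": [("book", 2), ("pen", 5)]}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE order_items (
        order_id INTEGER REFERENCES orders(id),
        product  TEXT NOT NULL,
        quantity INTEGER NOT NULL
    );
""")

# ...but the fixed relational schema forces it to be split across two tables.
conn.execute("INSERT INTO orders VALUES (?, ?)", (order["id"], order["customer"]))
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order["id"], product, qty) for product, qty in order["items"]],
)

# Rebuilding the object needs a join plus hand-written mapping code.
rows = conn.execute("""
    SELECT o.id, o.customer, i.product, i.quantity
    FROM orders o JOIN order_items i ON i.order_id = o.id
    WHERE o.id = ? ORDER BY i.rowid
""", (1,)).fetchall()
rebuilt = {"id": rows[0][0], "customer": rows[0][1],
           "items": [(product, qty) for _, _, product, qty in rows]}
print(rebuilt == order)
```

Every change to the object's shape means a schema migration plus updated mapping code, which is also why the fixed schema is listed as hard to update.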

Platforms for big data

This is Microsoft's implementation.

Lambda architecture

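Lambda architecture relies on two different paths to process incoming data: a batch layer (cold path) that periodically recomputes views over the whole master dataset, and a speed layer (hot path) that processes new data in near real time to cover the batch layer's latency.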
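The two paths of the Lambda architecture can be illustrated with a toy page-view counter. Every event is appended to an immutable master dataset that the batch layer recomputes from scratch, and it is also applied incrementally by the speed layer; queries merge both views. This is a single-process sketch under simplifying assumptions (no concurrency, everything in memory), and the class and method names are hypothetical.

```python
from collections import Counter

class LambdaCounter:
    """Toy Lambda architecture counting page views per page (hypothetical use case)."""

    def __init__(self) -> None:
        self.master_log: list[str] = []          # immutable, append-only master dataset
        self.batch_view: Counter = Counter()     # recomputed from scratch by the batch layer
        self.realtime_view: Counter = Counter()  # incremental updates since the last batch run

    def ingest(self, page: str) -> None:
        # Every event goes to both paths.
        self.master_log.append(page)
        self.realtime_view[page] += 1

    def run_batch(self) -> None:
        # Batch layer: full recomputation over the whole master dataset.
        self.batch_view = Counter(self.master_log)
        # Events now covered by the batch view are dropped from the speed layer.
        self.realtime_view.clear()

    def query(self, page: str) -> int:
        # Serving layer: merge the batch view with the real-time view.
        return self.batch_view[page] + self.realtime_view[page]

lam = LambdaCounter()
for p in ["home", "docs", "home"]:
    lam.ingest(p)
lam.run_batch()
lam.ingest("home")   # arrives after the batch run, visible only through the speed layer
print(lam.query("home"))
```

The batch view is always correct but stale; the speed layer fills the gap until the next batch run absorbs its events.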
