Big Data + Big Data Technologies: A Database Perspective
Ling Liu
Professor
Distributed Data Intensive Systems Lab
School of Computer Science

Outline
•Big Data: a DB perspective
–What do we mean by “Big data”?
–Why and how Big data is useful
•Present and Future Big Data Technologies: a DB perspective
–Fundamentals
–Opportunities

What do we mean by “Big Data”: a DB perspective
•Big data refers to datasets that exceed the processing capacity of conventional hardware and/or software tools and systems.
•Subjective, evolving/moving definition of big data
–As technology advances over time, the size of datasets that qualify as big data will also increase.
–The definition can vary by sector, depending on what kinds of software tools are commonly available and what sizes of datasets are common in a particular industry or science domain.
•Common Perception: The data is too big, moves too fast, or doesn't fit the strict structures of conventional database architectures. To gain value from this data, you must choose an alternative way to process it.

Big Data Driver from a DB Perspective: Better storage technology
•Storage & disks
–Cheaper
–More volume
–Physically smaller
–More efficient
Large datasets are affordable

Big Data from a DB Perspective: Cloud Computing, Pay per use
•Pay‐as‐you‐go
•Elasticity
•Multi‐tenancy
•Economies of scale
More affordable to perform big data analytics

Big Data from a DB Perspective: Better networking, Ubiquitous Devices
•High speed Internet
•Cellular phones
•Wireless LAN
•GPS, Location sensing
More data consumers
•Laptops, Smart phones, Wireless devices
More data producers

Big Data Architecture
•Seamless fault tolerance
•Distributed File Systems
•NoSQL DB

NoSQL Databases
Popular name for a subset of structured storage software
•Schema‐free/semi‐structured data
•Massive data stores
•Scaling is easy
•High availability and high fault tolerance
•ACID compliance
–(Atomicity, Consistency, Isolation, Durability)
•Tolerant of scale by way of horizontal distribution
•Major Categor