3_Big_data_presentation_l1_v1
09.05.2020

Introduction to Big Data & Basic Data Analysis

Big Data Everywhere!

Lots of data is being collected and warehoused
Web data, e-commerce
Purchases at department/grocery stores
Bank/credit card transactions
Social networks

How much data?

Google processes 20 PB a day (2008)
Wayback Machine has 3 PB + 100 TB/month (3/2009)
Facebook has 2.5 PB of user data + 15 TB/day (4/2009)
eBay has 6.5 PB of user data + 50 TB/day (5/2009)
CERN’s Large Hadron Collider (LHC) generates 15 PB a year
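
Converting these ingest rates to a common unit makes the growth easier to compare. A minimal sketch in Python, assuming decimal units (1 PB = 1000 TB), a 365-day year and a constant daily rate; none of these assumptions come from the slide, and the helper names are hypothetical.

```python
# Figures quoted on the slide; decimal units are an assumption (1 PB = 1000 TB).
TB_PER_PB = 1000

# (source, stored data in PB, new data in TB per day)
daily_sources = [
    ("Facebook (4/2009)", 2.5, 15),
    ("eBay (5/2009)", 6.5, 50),
]


def yearly_ingest_pb(tb_per_day: float, days: int = 365) -> float:
    """Data added in one year at a constant daily rate, in PB."""
    return tb_per_day * days / TB_PER_PB


def days_to_double(stored_pb: float, tb_per_day: float) -> float:
    """Days until newly added data equals what is already stored (constant rate assumed)."""
    return stored_pb * TB_PER_PB / tb_per_day


for name, stored, rate in daily_sources:
    print(f"{name}: about {yearly_ingest_pb(rate):.1f} PB/year, "
          f"stored volume doubles in about {days_to_double(stored, rate):.0f} days")
```

At these rates Facebook would add roughly 5.5 PB a year, more than twice what it already stored, and eBay's 6.5 PB would double in 130 days.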



640K ought to be enough for anybody.

Maximilien Brice, © CERN

The Earthscope

The Earthscope is the world's largest science project. Designed to track North America's geological evolution, this observatory records data over 3.8 million square miles and has amassed 67 terabytes of data. It analyzes seismic slips along the San Andreas fault, and also the plume of magma beneath Yellowstone, among much else. (http://www.msnbc.msn.com/id/44363598/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI)

Types of Data

Relational Data (Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data
Social networks, Semantic Web (RDF), …

Streaming Data
You can only scan the data once
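
Because a stream can be scanned only once, statistics over it must be updated incrementally as each element arrives. As one illustration (not taken from the slides), Welford's online algorithm keeps the mean and variance in constant memory during a single pass. A minimal Python sketch; the function name `running_mean_variance` is hypothetical.

```python
import math
from typing import Iterable, Tuple


def running_mean_variance(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, sample variance) after one pass over `stream`.

    Welford's update keeps only three numbers in memory, so each element
    is read exactly once and never revisited.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the updated mean
    if n == 0:
        return 0, math.nan, math.nan
    variance = m2 / (n - 1) if n > 1 else math.nan
    return n, mean, variance


# A generator can be consumed only once, like a data stream.
readings = (float(x) for x in [2, 4, 4, 4, 5, 5, 7, 9])
n, mean, var = running_mean_variance(readings)
print(n, mean, var)
```

Storing the whole stream and calling a two-pass formula would give the same answer, but that is exactly what the one-scan constraint rules out.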


What to do with these data?

Aggregation and Statistics
Data warehouse and OLAP
Indexing, Searching, and Querying
Keyword based search
Pattern matching (XML/RDF)
Knowledge discovery
Data Mining
Statistical Modeling
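
Aggregation is the simplest of these operations. A minimal Python sketch of a roll-up, the basic OLAP step of summarizing detailed records at a coarser level; the `Purchase` records and the `rollup` helper are hypothetical, and a real data warehouse would run the equivalent of a SQL `GROUP BY ... SUM(amount)` over far larger tables.

```python
from collections import defaultdict
from collections.abc import Callable, Hashable, Iterable
from typing import Dict, NamedTuple


class Purchase(NamedTuple):
    store: str
    day: str
    amount: float


def rollup(records: Iterable[Purchase],
           key: Callable[[Purchase], Hashable]) -> Dict[Hashable, float]:
    """Sum purchase amounts per group, like GROUP BY key ... SUM(amount)."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for rec in records:
        totals[key(rec)] += rec.amount
    return dict(totals)


# Hypothetical transaction records.
purchases = [
    Purchase("store_a", "2009-05-01", 10.0),
    Purchase("store_a", "2009-05-01", 5.0),
    Purchase("store_a", "2009-05-02", 2.5),
    Purchase("store_b", "2009-05-01", 20.0),
]

# Finest grouping: totals per (store, day).
by_store_day = rollup(purchases, key=lambda p: (p.store, p.day))
# Rolled up one level by dropping the day dimension: totals per store.
by_store = rollup(purchases, key=lambda p: p.store)
print(by_store_day)
print(by_store)
```

Passing the grouping key as a function keeps one helper for every level of the hierarchy.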

Statistics 101

Random Sample and Statistics

Population: the set, or universe, of all entities under study.
However, examining the entire population may be infeasible or too expensive.
Instead, we draw a random sample from the population and compute appropriate statistics from it; these statistics estimate the corresponding population parameters of interest.
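
As a concrete case of estimating a parameter from a sample, the sketch below draws a simple random sample without replacement and uses the sample mean to estimate the population mean, together with its standard error. A minimal Python sketch; the population of integers and the `estimate_mean` helper are hypothetical, and the standard error ignores the finite-population correction.

```python
import math
import random
import statistics
from typing import Optional, Sequence, Tuple


def estimate_mean(population: Sequence[float], sample_size: int,
                  seed: Optional[int] = None) -> Tuple[float, float]:
    """Draw a simple random sample (without replacement) and return
    (sample mean, estimated standard error of that mean)."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    mean = statistics.mean(sample)
    # Standard error of the mean; the finite-population correction is omitted.
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return mean, std_err


# Hypothetical population; its true mean is computed here only for comparison.
population = list(range(100_000))
estimate, se = estimate_mean(population, 1_000, seed=42)
print(f"true mean {statistics.mean(population)}, estimate {estimate:.1f} +/- {se:.1f}")
```

Reading 1,000 values instead of 100,000 already gives an estimate within a few standard errors of the true mean.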
