Big Data Everywhere!
Lots of data is being collected and warehoused
  Web data, e-commerce
  Purchases at department/grocery stores
  Bank/credit card transactions
  Social networks
How much data?
Google processes 20 PB a day (2008)
Wayback Machine has 3 PB + 100 TB/month (3/2009)
Facebook has 2.5 PB of user data + 15 TB/day (4/2009)
eBay has 6.5 PB of user data + 50 TB/day (5/2009)
CERN’s Large Hadron Collider (LHC) generates 15 PB a year
The Earthscope
The Earthscope is the world's largest science project. Designed to track North America's geological evolution, this observatory records data over 3.8 million square miles, amassing 67 terabytes of data. It analyzes seismic slips in the San Andreas fault, sure, but also the plume of magma underneath Yellowstone and much, much more. (http://www.msnbc.msn.com/id/44363598/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI)
Type of Data
Relational Data (Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data
  Social Network, Semantic Web (RDF), …
Streaming Data
  You can only scan the data once
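The single-scan constraint on streaming data can be illustrated with a running mean and variance. The following is a minimal sketch, not part of the original slides: it uses Welford's update rule, and the function name `streaming_mean_variance` is a hypothetical choice for illustration. Each value is read once and then discarded, so memory use stays constant regardless of stream length.

```python
from typing import Iterable, Tuple


def streaming_mean_variance(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, sample variance) after a single pass over the stream."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count        # update the mean incrementally
        m2 += delta * (x - mean)     # Welford's update for squared deviations
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


# A generator can be consumed only once, mimicking a data stream.
readings = (float(v) for v in [2, 4, 4, 4, 5, 5, 7, 9])
count, mean, variance = streaming_mean_variance(readings)
```

Passing a generator rather than a list makes the constraint concrete: once the loop finishes, the values are gone and cannot be rescanned.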
What to do with these data?
Aggregation and Statistics
  Data warehouse and OLAP
Indexing, Searching, and Querying
  Keyword-based search
  Pattern matching (XML/RDF)
Knowledge discovery
  Data Mining
  Statistical Modeling
Random Sample and Statistics
Population: the set or universe of all entities under study. Examining the entire population may be infeasible or too expensive. Instead, we draw a random sample from the population and compute statistics from that sample, which serve as estimates of the corresponding population parameters of interest.
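As a rough illustration of estimating a population parameter from a random sample, the sketch below draws a simple random sample without replacement and uses the sample mean as an estimate of the population mean. The population (the integers 1 to 100,000), the sample size, the seed, and the function name `estimate_mean` are all assumptions made for this example and do not come from the slides.

```python
import random
import statistics


def estimate_mean(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample."""
    rng = random.Random(seed)                     # fixed seed for repeatability
    sample = rng.sample(population, sample_size)  # sampling without replacement
    return statistics.mean(sample)


# Hypothetical population: small enough here to compute the true mean directly.
population = list(range(1, 100_001))
true_mean = statistics.mean(population)
estimate = estimate_mean(population, sample_size=1_000)
print(f"true mean = {true_mean}, sample estimate = {estimate}")
```

Here the sample touches only 1% of the population, yet the estimate typically lands close to the true mean. Larger samples generally tighten the estimate, at the cost of reading more data.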