Securing Big Data
KAIZEN APPROACH, INC.

Big Data Defined
- Big data is where the data volume, acquisition velocity, or data representation limits the ability to perform effective analysis using traditional relational approaches or requires the use of significant horizontal scaling for efficient processing. (NIST 2012)

Big Data Value
- In the eye of the beholder
- Value is defined through hypotheses and data modeling of the data sets
- Data that had been collected in the normal course of business can now be mined and correlated to find relationships and meaning
- Data sets vary from medical records, financial transactions, webcam photos, firewall logs, web logs, and web URL searches to physical security logs

Big Data: the 5 Vs
- Volume: processing petabytes of data with low overhead and complexity
- Veracity: using data from a variety of domains
- Value: using commodity hardware
- Variety: leveraging flexible schemas to handle structured and unstructured data
- Velocity: performing real-time analytics and ingesting streaming feeds as well as batch processing

Examples of Big Data Users
Private sector: Wal-Mart, Apple, eBay, Verizon, Bank of America, NYSE, Amazon, Google, Yahoo
Public sector: DoD, CDC, DoE, GSA, IRS, NASA, NOAA

Big Data Security Issues
- A large aggregated data store is an attractive target for hackers and malicious insiders
- Big Data stored in a public or hybrid cloud environment has a larger attack surface, and the virtual environment has its own security issues
- Sensitive data is being ported from mature and secure relational databases into NoSQL data stores lacking compatible security controls

Big Data Security Concerns
Source: Cloud Security Alliance Big Data Working Group

NoSQL and Big Data
- NoSQL databases are ideal for huge quantities of data, especially unstructured or non-relational data. Some NoSQL systems do allow a SQL-like query language
- NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage, offering marked gains in scalability and performance
- Challenges include support issues, lack of trained personnel, lack of standardization, immaturity, and lack of a database management system
- Examples: HBase (Hadoop), Cassandra, MongoDB, Riak, CouchDB
- Hadoop is the most popular

Hadoop is a Suite of Tools
- Distributed file system (HDFS)
- Distributed execution framework (MapReduce)
- Query language (Pig)
- Distributed, column-oriented data store (HBase)
- Machine learning (Mahout)

Hadoop Pros
- Processes large data very efficiently
- Distributed storage and computation
- Very flexible
- Horizontally scalable
- HDFS file system is optimized for high throughput
- Simple API and model
- Parallel processing
- Inexpensive
- NoSQL database model (HBase)

Hadoop Security Cons
- Security is NOT built into Hadoop (or any NoSQL database) at all: it was never built for enterprise security but for publicly available data
- No native encryption services offered
- Data is spread across multiple machines in a cluster, making securing/hardening individual machines challenging and backup/recovery difficult
- Hadoop tools lack basic security controls
- Data veracity is a challenge given the possible multitude of data sources

Securing Big Data: Products
Several types of products are available:
1. NoSQL/Hadoop products with enhanced security built on top, offering integrated authentication (not just Kerberos!) and encryption options
2. API gateways/proxies controlling which applications can access a database cluster and which data queries can be made against it

Hadoop/NoSQL Security Products
- Cell-level access labels (Sqrrl/Accumulo)
- Kerberos authentication (open source, IBM, Cloudera, MapR)
- Access control lists for tables/column families (all Hadoop vendors)
- Data encryption (Sqrrl/Accumulo, Datameer, Gazzang, DataGuise, Vormetric)
- Authentication integration with LDAP and PKI (Sqrrl/Accumulo, MapR, Datameer)

Hadoop/NoSQL Security Products: Accumulo
- Sorted, distributed key/value store using Hadoop as its file system
- Developed by NSA beginning in 2008, Accumulo is now an open source software project hosted by the Apache Foundation and natively integrates with Hadoop.
- Accumulo has three differentiators from Hadoop and other NoSQL databases:
  - Secure: fine-grained security controls allow organizations to control data at the cell level, integrating existing authentication functions in the enterprise (PKI, LDAP, AD)
  - Scale: proven to operate and perform at massive scale with low administrative overhead
  - Adapt: provides real-time analysis

Hadoop/NoSQL Products: Accumulo and Sqrrl
- Sqrrl, a startup of developers and engineers from NSA, offers the commercial version of Accumulo: Sqrrl Enterprise
- Sqrrl Enterprise is different from other Big Data tools because security is built into the platform; as a result, cell-level security controls do not result in any significant performance degradation.
- Data can be labeled or tagged by cell to provide fine-grained access control.
- Sqrrl Enterprise integrates with enterprise Identity and Access Management (IAM) systems, such as Active Directory, LDAP, PKI, and biometrics.
- Sqrrl provides encryption of data-at-rest and data-in-motion

Big Data Security Products: API Gateways
- Appliance exposes published APIs, proxying between data on NoSQL or relational databases and applications
- Only approved/published APIs permitted
- Tied into existing authentication sources
- Authorization and encryption available
- Malware/virus and DLP checking available
- Placed behind the firewall
- Examples: Intel's EAM, CA's Layer7, and MuleSoft
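
The gateway behavior listed above (only published APIs permitted, tied into an existing authentication source, authorization applied before anything reaches the cluster) can be sketched roughly as follows. This is a minimal illustration, not any vendor's product: the API catalogue, the token table standing in for LDAP/AD/PKI, and the names `PUBLISHED_APIS`, `TOKENS`, and `authorize` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical catalogue of published APIs: (method, path) -> roles allowed to call it.
PUBLISHED_APIS = {
    ("GET", "/v1/orders/summary"): {"analyst", "admin"},
    ("POST", "/v1/orders/search"): {"admin"},
}

# Hypothetical token table standing in for an existing authentication source
# (LDAP, Active Directory, PKI): token -> (user, roles).
TOKENS = {
    "tok-alice": ("alice", {"analyst"}),
    "tok-bob": ("bob", {"admin"}),
}


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def authorize(method: str, path: str, token: str) -> Decision:
    """Decide whether a request may be proxied to the database cluster."""
    key = (method.upper(), path)
    # 1. Only approved/published APIs are permitted.
    if key not in PUBLISHED_APIS:
        return Decision(False, "API not published")
    # 2. The caller must be known to the authentication source.
    identity = TOKENS.get(token)
    if identity is None:
        return Decision(False, "authentication failed")
    user, roles = identity
    # 3. The caller needs at least one role authorized for this API.
    if not roles & PUBLISHED_APIS[key]:
        return Decision(False, "role not authorized")
    return Decision(True, f"forward request for {user}")


if __name__ == "__main__":
    print(authorize("GET", "/v1/orders/summary", "tok-alice"))
    print(authorize("GET", "/v1/raw/scan", "tok-bob"))
    print(authorize("POST", "/v1/orders/search", "tok-alice"))
```

Checking the published-API list before authentication means an unpublished endpoint is rejected identically for every caller, so the gateway reveals nothing about which queries the cluster behind it could answer.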
API Gateway Example: Intel EAM

Securing Big Data: General Approaches
- Determine which data should be in a NoSQL database given the immaturity of Big Data products/implementations
- Firewall off the big data clusters from the rest of the network
- Harden and secure the machines (virtual and physical) where the database cluster is distributed
- Limit who can access the databases with authentication
- Understand how attractive and powerful consolidated data is to attackers and malicious insiders
- Realize that compliance/regulatory issues are the same for NoSQL databases as for relational databases: backup, auditing, monitoring, and securing data are still required

How Kaizen Can Help
- Our experienced professionals are steeped in security concepts, risk management, technology and principles of data processing
- We separate facts from fads and hype
- We're vendor neutral, not resellers
- Our staff has extensive private and public sector experience with security: host/server, network and database/applications
- We keep up to date with current technology and events, applying best practices, experience and common sense to examine problems and come up with solutions

How Kaizen Can Help
- The tools to secure Big Data are new or being developed, but the concepts behind securing the data are not.
- Kaizen's professionals can map the security requirements to the tools and show what is lacking
- We can test and research products, and suggest procedures and practices to maintain and enhance the security of Big Data environments.

Summary
- Kaizen can help with big data problem analysis, test technical options and determine a solution, combining the technical and procedural
- This presentation surveys the problem space and possible combinations of security solutions:
  - Secure NoSQL database implementations
  - API gateways
  - Encryption
  - Leveraging existing firewall, authentication and authorization technology

Appendix: Vendors
Big Data/Hadoop: Apache, IBM
Big Data/Security: Sqrrl, Intel/Mashery
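
As an illustration of the cell-level access labels described on the Accumulo and Sqrrl slides, the sketch below evaluates a label attached to each cell against the set of authorizations a user holds, and returns only the cells that user may see. It is a simplified stand-in, not the Accumulo or Sqrrl API: the label grammar (`&`, `|`, parentheses), the rule that `&` binds tighter than `|`, the sample cells, and the names `visible` and `scan` are assumptions made for this example.

```python
import re

# One token is either an authorization name or one of the operators & | ( ).
_TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def _tokenize(label):
    label = label.strip()
    tokens, pos = [], 0
    while pos < len(label):
        match = _TOKEN.match(label, pos)
        if match is None:
            raise ValueError(f"bad label near {label[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def visible(label, auths):
    """Return True if a user holding `auths` may read a cell tagged `label`."""
    tokens = _tokenize(label)
    if not tokens:
        return True  # assumption: an unlabeled cell is readable by everyone
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse the right side, then combine
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_atom()
            result = result and rhs
        return result

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of label")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in label")
    return result


# Hypothetical cells: (row, column, label, value).
CELLS = [
    ("row1", "diagnosis", "medical&(doctor|auditor)", "example value A"),
    ("row1", "invoice", "finance|auditor", "example value B"),
    ("row2", "weblog", "", "example value C"),
]


def scan(cells, auths):
    """Return (row, column, value) for every cell the authorizations can see."""
    return [(r, c, v) for r, c, label, v in cells if visible(label, auths)]


if __name__ == "__main__":
    print(scan(CELLS, {"finance"}))
    print(scan(CELLS, {"medical", "auditor"}))
```

Because the label travels with each cell, two users scanning the same table get different result sets without any per-table or per-application filtering logic.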