
Spark

Programming languages: Python, Java, Scala

Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. In contrast to Hadoop's two-stage, disk-based MapReduce paradigm, Spark's in-memory primitives can run certain applications up to 100 times faster.

Features

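  • Faster than Hadoop, which passes intermediate data through storage, because Spark processes data in memory by default
  • Built on RDDs (Resilient Distributed Datasets), a read-only data structure; computation proceeds by chaining transformations, each producing a new RDD from an existing one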
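The read-only, chained-transformation model can be illustrated with a small toy. The sketch below is a hypothetical single-process `ToyRDD` class, not Spark itself: it assumes no cluster, no partitioning and no scheduler. Its method names mirror PySpark's `SparkContext.parallelize`, `RDD.map`, `RDD.filter`, `RDD.cache`, `RDD.collect` and `RDD.count`, but the behavior here is deliberately simplified.

```python
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Hypothetical toy model of an RDD: immutable, lazily evaluated,
    and rebuilt from its lineage whenever it is not cached."""

    def __init__(self, compute: Callable[[], List], lineage: str) -> None:
        self._compute = compute  # recipe that rebuilds this dataset from its parent
        self.lineage = lineage   # readable record of the transformation chain
        self._cached: Optional[List] = None
        self._persist = False

    @staticmethod
    def parallelize(data: Iterable) -> "ToyRDD":
        items = list(data)  # snapshot, so later changes to `data` do not leak in
        return ToyRDD(lambda: list(items), "parallelize")

    # Transformations: return a NEW ToyRDD and compute nothing yet.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(lambda: [f(x) for x in self._materialize()],
                      self.lineage + " -> map")

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda: [x for x in self._materialize() if pred(x)],
                      self.lineage + " -> filter")

    def cache(self) -> "ToyRDD":
        # Mark this dataset to be kept in memory once it is first computed.
        self._persist = True
        return self

    def _materialize(self) -> List:
        if self._cached is not None:
            return self._cached
        result = self._compute()  # replay the lineage
        if self._persist:
            self._cached = result
        return result

    # Actions: trigger evaluation of the whole chain.
    def collect(self) -> List:
        return list(self._materialize())

    def count(self) -> int:
        return len(self._materialize())


if __name__ == "__main__":
    nums = ToyRDD.parallelize(range(1, 6))
    evens_squared = nums.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
    print(evens_squared.lineage)    # parallelize -> map -> filter
    print(evens_squared.collect())  # [4, 16]
    print(nums.collect())           # [1, 2, 3, 4, 5]: the parent is unchanged
```

Transformations such as `map` and `filter` only record how to build a new dataset; nothing runs until an action such as `collect` or `count` is called. Without `cache()`, every action replays the chain from its lineage, which is also the idea Spark uses to rebuild a lost partition; `cache()` keeps the computed result in memory so later actions reuse it.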

Components

  1. Spark Core and Resilient Distributed Datasets
  2. Spark SQL
  3. Spark Streaming
  4. MLlib Machine Learning Library
  5. GraphX

Related materials


web biohackers.net
0.0.1_20140628_0