Spark
- (rev. 14)
- Hyungyong Kim
Structured data
- Programming Language
- Python
- Java
- Scala
- URL
- http://en.wikipedia.org/wiki/Apache_Spark
Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. In contrast to Hadoop's two-stage, disk-based MapReduce paradigm, Spark's in-memory primitives provide performance up to 100 times faster for certain applications.
Features
- Processes data in memory by default, making it faster than Hadoop, which writes to storage between stages
- Built on RDDs (Resilient Distributed Datasets), read-only data structures; computation proceeds as a chain of transformations from one RDD to the next
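The read-only, chained-transformation model above can be illustrated with a minimal sketch. This is plain Python, not the actual Spark API; `MiniRDD` is a hypothetical toy class showing only the two ideas named here: each transformation returns a new immutable dataset, and the chain is evaluated lazily when a result is requested.

```python
# Toy stand-in for a Spark RDD (NOT the real API): immutable data plus a
# chain of pending transformations, executed only when an "action" runs.
class MiniRDD:
    def __init__(self, data, ops=()):
        self._data = tuple(data)  # read-only source data
        self._ops = ops           # pending (kind, fn) transformation chain

    def map(self, fn):
        # Returns a NEW MiniRDD; the original is untouched (read-only RDDs).
        return MiniRDD(self._data, self._ops + (("map", fn),))

    def filter(self, pred):
        return MiniRDD(self._data, self._ops + (("filter", pred),))

    def collect(self):
        # An "action": only here is the pending chain actually executed.
        items = self._data
        for kind, fn in self._ops:
            if kind == "map":
                items = tuple(fn(x) for x in items)
            else:  # "filter"
                items = tuple(x for x in items if fn(x))
        return list(items)

nums = MiniRDD(range(10))
evens_squared = nums.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(evens_squared.collect())  # [0, 4, 16, 36, 64]
print(nums.collect())           # source unchanged: [0, 1, ..., 9]
```

In real Spark the same chain would be written almost identically against a `SparkContext` (e.g. `sc.parallelize(range(10)).filter(...).map(...).collect()`), with the added benefits of partitioning across a cluster and lineage-based fault recovery.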
Components
- Spark Core and Resilient Distributed Datasets
- Spark SQL
- Spark Streaming
- MLlib Machine Learning Library
- GraphX
Related Materials
- Best Practices Writing Production-Grade PySpark Jobs - How to Structure Your PySpark Job Repository and Code
- Using Apache Spark to Analyze Large Neuroimaging Datasets - with scikit-learn and PySpark
- Spark Programming (엄태욱)
- 10 things you need to know about Spark
- Simplifying Big Data Analysis with Apache Spark
- Why Is Spark Becoming So Popular?
- Configuring IPython Notebook Support for PySpark
- What Is at the Core of Spark? RDD! (하용호)