article thumbnail

20 Best Open Source Big Data Projects to Contribute on GitHub

ProjectPro

Apache Spark is also quite versatile, and it can run on a standalone cluster mode or Hadoop YARN , EC2, Mesos, Kubernetes, etc. You can also access data through non-relational databases such as Apache Cassandra, Apache HBase, Apache Hive, and others like the Hadoop Distributed File System.