Seahorse is a visual framework that lets users create Apache Spark applications in an intuitive and interactive way, all while connected to any Spark cluster (YARN, Mesos, Standalone) or to a bundled local Spark.
For a more detailed overview go to the Overview section.
Seahorse for Mac and Windows is distributed in the form of a Vagrant image.
Download the Vagrantfile from the Get Seahorse page.
Run vagrant up from the command line.
For more details and troubleshooting go to the Seahorse Standalone Deployment mode page.
Seahorse for Linux is distributed in the form of Docker images.
Download docker-compose.yml from the Get Seahorse page.
Go to the directory containing the docker-compose.yml file and run docker-compose up from the command line.
For more details and troubleshooting go to the Seahorse Deployment page.
In the following steps we will read some data. Then we will apply a simple transformation to the data.
Start editing from the top menu. This starts an Apache Spark backend for your workflow session.
Drag a Node from the toolbox on the left onto the canvas, or just right-click on the canvas.
Select Read DataFrame from the operation selector.
Click the Read DataFrame node. You can set its parameters in the right-hand side panel. Select the
In the next step you will apply a simple transformation to your data.
Select the Filter Columns operation from the operation selector.
Set the selected column parameter of Filter Columns to a set of columns of your choice.
Congratulations! You have successfully created your first Seahorse workflow.
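As a rough mental model, the workflow built above reads tabular data and then keeps only the chosen columns. The sketch below imitates those two steps in plain Python using only the standard library. It is not Seahorse's or Spark's API: the function names read_rows and filter_columns and the sample data are hypothetical, chosen purely for illustration. In Spark itself, the second step corresponds roughly to selecting columns on a DataFrame.

```python
import csv
import io


def read_rows(source):
    """Parse CSV text from a file-like object into a list of dicts, one per row.

    Hypothetical stand-in for the Read DataFrame step.
    """
    return list(csv.DictReader(source))


def filter_columns(rows, selected_columns):
    """Keep only the selected columns in each row, in the requested order.

    Hypothetical stand-in for the Filter Columns step.
    """
    return [{name: row[name] for name in selected_columns} for row in rows]


# Hypothetical sample data; a real workflow would read from a file or database.
SAMPLE_CSV = "city,price,beds\nCityA,100000,2\nCityB,250000,3\n"

rows = read_rows(io.StringIO(SAMPLE_CSV))
filtered = filter_columns(rows, ["city", "price"])
print(filtered)
```

Each step takes the output of the previous one as its input, which mirrors how connected nodes pass data along in a workflow. Note that this sketch keeps every value as a string; it does no type inference.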