Getting Started

Quick Introduction

Seahorse is a visual framework that lets users create Apache Spark applications in an intuitive and interactive way, all while connected to any Spark cluster (YARN, Mesos, Standalone) or to a bundled local Spark.

For a more detailed overview go to the Overview section.

Run Seahorse on Your Machine

Mac or Windows

Seahorse for Mac and Windows is distributed in the form of a Vagrant image.

  1. Install Vagrant (required). You can find the Vagrant installation guide at vagrantup.com.
  2. Download the Vagrantfile from the Get Seahorse page.
  3. Go to the directory containing the Vagrantfile and run vagrant up from the command line.
  4. Go to http://localhost:33321 and start using Seahorse!

For more details and troubleshooting go to the Seahorse Standalone Deployment mode page.

Linux

Seahorse for Linux is distributed in the form of Docker images.

  1. Install Docker (required) and docker-compose (required). You can find the Docker installation guide at docs.docker.com/engine and the docker-compose installation guide at docs.docker.com/compose.
  2. Download docker-compose.yml from the Get Seahorse page.
  3. Go to the directory containing docker-compose.yml and run docker-compose up from the command line.
  4. Go to http://localhost:33321 and start using Seahorse!

For more details and troubleshooting go to the Seahorse Deployment page.
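
With either setup, it can take a while after vagrant up or docker-compose up before http://localhost:33321 responds. The following is a minimal sketch, not part of Seahorse itself, of a small Python helper that waits until something accepts TCP connections on that port. The function name wait_for_port and its timeout and interval defaults are assumptions chosen for illustration.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: the name and the default values are illustrative only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that something is listening;
            # it does not prove that the Seahorse UI has finished loading.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port("localhost", 33321) returns True as soon as the port accepts connections, at which point the address from step 4 can be opened in a browser. A False result suggests checking the output of vagrant up or docker-compose up.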

Use Seahorse

In the following steps we will read some data and then apply a simple transformation to it.

Create New Workflow and Read Your Data

The Seahorse home screen is a list of all workflows, initially filled with examples.

Figure: Workflow Editor

Figure: DataFrame Report opened after clicking the report icon

Transform Your Data

In the next step you will apply a simple transformation to your data.

Figure: DataFrame Report with filtered columns

Congratulations! You have successfully created your first Seahorse workflow.

Learn More!