# Spark Ports

## Spark Cluster (Data Engine)
| Component | Description | IP Address | Port(s) | Notes |
|---|---|---|---|---|
| Spark Master | The main control node of the cluster. | | 2424 | Used for internal communication with workers and for drivers to connect to submit jobs. |
| Spark History Server | Detailed view of a specific Spark application: a per-application dashboard showing how it performed and helping you troubleshoot issues. | | 18080 | |
| Spark Master UI | Web interface for monitoring the master. | | 9292 | |
| Spark Workers UI | Web interfaces for Spark worker nodes. | | 9293 | Each worker node has its own UI instance. |
See Spark UI proxying for an example proxy setup to access the Spark UIs.
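One common proxy setup is a plain nginx reverse proxy in front of the master UI. The server name, upstream host, and listen port below are illustrative assumptions, not part of this deployment:

```nginx
# Minimal reverse-proxy sketch for the Spark Master UI (port 9292 above).
# "spark-master" is a hypothetical hostname; adjust to your network.
server {
    listen 80;

    location / {
        proxy_pass http://spark-master:9292;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

A similar `location` block pointing at port 9293 would expose a worker UI; with multiple workers you would need one upstream per worker node.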
## Spark Driver

These are ports that need to be open for the Spark Master/Workers to communicate with the driver.
| Component | Description | IP Address | Port(s) | Notes |
|---|---|---|---|---|
| Spark Driver RPC | Port used for communication between the master/workers and the driver. | Driver IP | Static (e.g., 5000) | Configurable via `spark.driver.port` |
| Spark Block Manager | Port used by executors and workers to transfer shuffle and cached data. | Driver IP | Static (e.g., 5001) | Configurable via `spark.blockManager.port` |
- **Static Ports Recommended:** Using static ports is ideal for Docker environments to simplify port mapping and avoid conflicts.
- **Essential Ports:** The master and workers need access to the Driver RPC port and the Block Manager port for control and data communication.
- **Docker Port Mapping:** Map these ports in your `docker-compose.yml` or `docker run` command to ensure proper networking.
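As a sketch of that mapping, a `docker-compose.yml` fragment for a driver container might look like the following. The service name and image are placeholders; only the two port mappings come from this page:

```yaml
# Illustrative fragment only -- service name and image are assumptions.
services:
  spark-driver:
    image: my-spark-app          # hypothetical driver image
    ports:
      - "5000:5000"              # spark.driver.port
      - "5001:5001"              # spark.blockManager.port
```

The host and container ports must match here, because Spark advertises the configured port numbers to the master and workers.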
Configuration Example:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("SparkConnectionTest")
    .config("spark.driver.port", "5000")
    .config("spark.blockManager.port", "5001")
    # ... other configuration options
    .getOrCreate()
)
```
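To confirm that the static ports are actually reachable from another host (or from inside a container), a quick TCP check is often enough. This helper is a sketch using only the standard library; the function name and the hosts/ports in the usage comment are illustrative:

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (hypothetical driver host): check the static ports from above.
# port_open("driver-host", 5000)  # spark.driver.port
# port_open("driver-host", 5001)  # spark.blockManager.port
```

If the check fails from a worker but succeeds locally on the driver, the usual culprit is a missing Docker port mapping or a firewall rule rather than the Spark configuration itself.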