Newest 'ray' Questions

0 votes

0 answers

20 views

SUMO RL: 'module' object is not callable

Please help me: when I run the following code I get the error "'module' object is not callable". if __name__ == "__main__": # Use: # ray[rllib]==2.44.1 # python==3.12.7 # torch==2....

Do Giang

1

asked Apr 25 at 2:18

0 votes

2 answers

47 views

Ray rllib episode_reward_mean not showing

Can anyone explain to me why the episode_reward_mean is NOT part of the results dictionary? Is it replaced by a different key in the latest API? I see env_runners/episode_return_mean and env_runners/...

aaden

13

asked Apr 8 at 7:03
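
The key env_runners/episode_return_mean quoted in the excerpt above reads like a slash-separated path into a nested results dictionary. A minimal sketch of looking it up, assuming that nesting and falling back to the flat episode_reward_mean key; the helper names and the sample dictionaries are hypothetical and not part of RLlib:

```python
from collections.abc import Mapping
from typing import Any, Optional


def get_metric(result: Mapping, path: str) -> Optional[Any]:
    """Walk a slash-separated path through a nested result dict.

    Returns None if any segment of the path is missing.
    """
    node: Any = result
    for key in path.split("/"):
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


def episode_return(result: Mapping) -> Optional[float]:
    """Try the nested key first, then fall back to the flat legacy key."""
    value = get_metric(result, "env_runners/episode_return_mean")
    if value is None:
        value = result.get("episode_reward_mean")
    return value


# Hypothetical result dicts, made up purely for illustration.
new_style = {"env_runners": {"episode_return_mean": 21.5}}
old_style = {"episode_reward_mean": 18.0}
```

Calling episode_return on the dictionary returned by a training iteration would then work with either layout, provided the assumed nesting holds for the installed version.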

0 votes

0 answers

72 views

Why does Ray attempt to install ray wheels from ray-wheels.s3-us-west-2.amazonaws.com?

When submitting a Ray job using a conda runtime env (runtime_env = {"conda": "environment.yml"}), Ray attempts to install the ray wheel from ray-wheels.s3-us-west-2.amazonaws.com ...

Martin Studer

2,331

asked Apr 7 at 7:03

1 vote

0 answers

56 views

Streaming write using ray's write_parquet for vllm inference

I need to do inference using vllm for large dataset, code structure as below: ds = ray.data.read_parquet(my_input_path) ds = input_data.map_batches( VLLMPredictor, concurrency=ray_concurrency, ...

cnmdestroyer

21

asked Mar 12 at 21:09

2 votes

2 answers

67 views

Raising Error in Function task-parallelized with Ray

Starting to try to use Ray to parallelize a number of task-parallel jobs. I.e. each task takes in an object from a data frame, and then returns a list. Within the function, there is a check for a ...

Ludger

33

asked Mar 6 at 19:46

1 vote

0 answers

25 views

How can I load a ray Rllib algorithm on a machine with fewer cores than the algorithm used for training?

I trained a Ray algorithm on 20 CPU cores with algo.train() and saved it with algo.save(). Now, when I try to load it on a machine with 6 cores, I just get this warning message: "(autoscaler +29s) ...

Ada Hatland

11

asked Feb 26 at 12:36

0 votes

1 answer

232 views

How to set Ray head node in high availability mode using KubeRay?

I am trying to set up high availability (HA) for Ray head node. Currently, if Ray head node is down, the Ray job running in this Ray cluster will fail and disappear. To clarify, I am not using Ray ...

Hongbo Miao

50.3k

asked Feb 26 at 5:08

1 vote

1 answer

199 views

How to cancel a Ray job submitted to a Ray cluster?

I have a long-run Ray job. main.py import time import ray @ray.remote def square(n: int) -> int: time.sleep(50000000) return n * n @ray.remote def sum_list(numbers: list[int]) -> int: ...

Hongbo Miao

50.3k

asked Feb 25 at 2:27

0 votes

1 answer

33 views

ray.init() called, but ray client is already connected error in a cloud run

I have this code running in a Cloud Run container in GCP import os import sys import logging import json import time import ray import google.protobuf.json_format from flask import Flask, request, ...

Juan Lozano

667

asked Feb 23 at 19:58

-1 votes

1 answer

65 views

Serialization error using ray-tuner for hyperparameter tuning [closed]

I am trying to tune some hyperparameters of my neural network for an image segmentation problem. I set up the tuner as simply as possible, but when I run my code I get the following error: 2025-02-...

Adam Bencsik

21

asked Feb 21 at 15:18

0 votes

0 answers

57 views

Why ray.train.get_checkpoint() from Ray Tune is returning None even after saving the checkpoint?

I am trying to tune my model with ray tune for pytorch. I would really like to be able to save the tuning progress, stop the execution and resume the execution from where I left. Unfortunately, I am ...

Gheorghe Balamatiuc

1

asked Feb 20 at 23:32

0 votes

1 answer

54 views

How to check manually created depth buffer for higher point between two fragments

I am really new to shader programming and trying (maybe erroneously) to make a lighting system for a little game I am making. The engine is 2D, so I am trying to pull off a slightly weird trick for ...

Logan Davis

1

asked Feb 15 at 8:02

1 vote

0 answers

34 views

Ray ObjectRef automatically collected and deserialised?

I have Ray actors that interact with each other; one generates a NumPy array. When I collect the object reference and send it to another actor, it seems to deserialize automatically. See the following ...

Vaas

111

asked Jan 19 at 15:27

1 vote

1 answer

122 views

Ray Serve WebSocket Deployment: "ASGI callable returned without sending handshake" error for more than 5 connections

I'm developing a FastAPI WebSocket server deployed using Ray Serve for handling multiple WebSocket connections. The application is designed to allow real-time communication with up to 50k concurrent ...

Musab

11

asked Jan 3 at 10:07

2 votes

0 answers

68 views

Parallelizing highly dynamic and unbalanced loads

I have a computation with the following structure (pseudocode): intermediate_results = [] for source in sources: # (1) source_data = prepare( load( source ) ) # (2) for sample in ...

meditative potato

21

asked Dec 30, 2024 at 3:50

0 votes

0 answers

48 views

RLLib - testing my trained agent gives bad results

I'm using Ray (version 1.8.0) to train an agent. The agent controls a unit in a simulation, and the simulation can end in one of three different ways: "UnitADestroyed", "UnitBDestroyed" ...

Henrik Berg

549

asked Dec 20, 2024 at 9:37

0 votes

0 answers

71 views

AttributeError: 'numpy.ndarray' object has no attribute 'categories'

Modin DataFrame Merge Issue After dropna on Categorical Column: I'm encountering an issue when using Modin to merge DataFrames that contain categorical columns. The issue arose after I performed a ...

Sumukha G C

13

asked Dec 13, 2024 at 10:37

0 votes

1 answer

43 views

The action space of a reinforcement learning model is 1-dimensional, but at test time the model outputs a 2-dimensional action

I trained a PPO model with RLlib using the action space self.action_space = gym.spaces.Box(-1, 1, (1,), data_type). But when I use the trained model to manually call forward_inference, the inference ...

Altman Jeffry

143

asked Dec 11, 2024 at 9:44

1 vote

1 answer

40 views

Correct way of using foreach_worker and foreach_env

I am quite new to Reinforcement Learning and can’t understand it. I am unable to update configurations for the batch data using PPO. I am using my custom-defined GYM environment, and want to train it ...

Abid Meraj

11

asked Dec 10, 2024 at 11:42

0 votes

1 answer

120 views

How to Parallelize a Flask App with Gunicorn and Distribute GPU Usage Among Workers?

I am building a Flask app to handle facial embeddings using DeepFace. My goal is to serve approximately 50 clients, with an estimated 10 requests per minute. Each request involves running deepface....

gpu-try-deepface

1

asked Dec 3, 2024 at 9:39

0 votes

0 answers

28 views

Getting Serialisation Error on Initial Call to Class Function Decorated with Ray.remote

I'm using Ray with ray.remote to define an InferenceActor class, which includes a method run_inference which contains one parameter (A list of strings) for handling model inference tasks. However, ...

Matthew Dickson

21

asked Nov 19, 2024 at 11:02

0 votes

1 answer

123 views

Pytorch + Ray Tune Reporting ImplicitFunc Is Too Large, No Idea Which Reference Is Large

Similar to this question, Ray Tune is reporting to me: ValueError: The actor ImplicitFunc is too large (421 MiB > FUNCTION_SIZE_ERROR_THRESHOLD=95 MiB). Check that its definition is not implicitly ...

Falcondance

102

asked Nov 12, 2024 at 16:21

0 votes

1 answer

115 views

How can I ensure that my Python logic runs exclusively on the Apache Ray Worker Nodes?

I am using Apache Ray to create a customized cluster for running my logic. However, when I submit my tasks with ray.remote, they are executing on the driver node rather than on the worker nodes I ...

question.it

2,988

asked Nov 11, 2024 at 5:14

0 votes

1 answer

147 views

How to configure Ray cluster to utilize the Full Capacity of Databricks Cluster

I have a Databricks cluster configured with a minimum of 1 worker and a maximum of 4 workers, with auto-scaling enabled. What should my Ray configuration (setup_ray_cluster) be to fully utilize the ...

question.it

2,988

asked Nov 8, 2024 at 4:40

0 votes

2 answers

198 views

Ridiculous VMEM usage when using Ray on a cluster

Initial Problem: I am testing out a multiprocessing Python package called Ray to parallelise my code. Original code works fine on my laptop, core-i7-13800H, 32GB RAM. When running on a local cluster ...

Bawb

11

asked Nov 4, 2024 at 11:51

0 votes

0 answers

74 views

Understanding the ray.get() method

I am fairly new to Ray and I am struggling to understand what the ray.get() function actually does. I found a small example online here that can help. @ray.remote class Prime: # Constructor ...

Kit Searle

1

asked Oct 24, 2024 at 16:49

1 vote

0 answers

81 views

inter core interconnect Checking in simple Slices of TPU

ICI (inter-core interconnect) offers very fast connectivity between TPUs (which are connected to different hosts) and thus also increases the total memory available for TPU calculations (I guess!). ...

Krishna Mohan

11

asked Oct 14, 2024 at 11:04

0 votes

0 answers

79 views

helm install raycluster kuberay/ray-cluster --version 1.1.1 stuck at pulling image

I am following the documentation to deploy a RayCluster on Kubernetes. I have already set up a Kubernetes cluster, and now I am deploying a Ray cluster on top of it. https://docs.ray.io/en/latest/cluster/kubernetes/...

Liang

155

asked Oct 2, 2024 at 18:14

0 votes

0 answers

41 views

Android sceneView 3D model HitResult

This is my first time trying 3D, I'm using SceneView <io.github.sceneview.SceneView android:id="@+id/sceneView" android:layout_width="match_parent" ...

WeiChen Chen

1

asked Sep 25, 2024 at 1:51

0 votes

0 answers

25 views

Policies' directory not present in saved checkpoint

I'm using Ray RLlib, and after switching to the new API version, the checkpoint directory no longer includes the policies folder. Why might this be happening? Currently, the checkpoints contain the ...

Khashayar Ghamati

368

asked Sep 16, 2024 at 15:18

0 votes

1 answer

403 views

What is `_serve_asgi_lifespan` in Ray Serve?

I'm trying to use Ray + vLLM and face AttributeError ('VLLMDeployment' object has no attribute '_serve_asgi_lifespan'). I would like to know how to solve this issue. Steps Download the Ray Docker ...

dmjy

1,841

asked Sep 16, 2024 at 9:12

0 votes

2 answers

198 views

How can I specify the port number of health check of Ray?

I have two windows servers (192.168.1.11 and 192.168.1.12) and try to run a Ray Docker container (image tag = 2.35.0-py312-gpu) on each server. Steps I run these two commands to start the Ray process....

dmjy

1,841

asked Sep 13, 2024 at 11:56

0 votes

0 answers

107 views

K8s Readiness probe failed: success for ray-worker, docs maybe unclear

I’m working on setting up a Ray cluster with a head node and worker pods. While the head node deploys successfully and functions as expected, the worker node fails with the error: “Readiness probe ...

zacko

397

asked Sep 12, 2024 at 12:31

1 vote

2 answers

642 views

How to make ray task async

I want to run a function (Ray Task) that may trigger another request afterward. For example, if I have 10 tasks but only 1 CPU, the system will process one task at a time since each task requires 1 ...

zacko

397

asked Sep 9, 2024 at 14:12

0 votes

0 answers

47 views

Using Ray and RayCast to look around in first person

I am currently working on a small project where you control the character in first-person mode, although this question is also relevant for third person. What I want to do is looking around by just ...

Oliver Hostettler

1

asked Sep 5, 2024 at 15:03

1 vote

0 answers

38 views

Discrepancies in Output with Ray Parallelization of ODE Propagation

I am using Ray to parallelize my ODE propagation code, which employs solve_ivp with the lsoda solver. For performance, the ODE code also uses Numba JIT. While the code runs correctly in a single-...

Peng

11

asked Aug 23, 2024 at 18:20

0 votes

0 answers

33 views

Slurm step failure capture via trap

I am trying to set up my Ray cluster with an sbatch script. I am starting the head and worker nodes as steps in my script. The worker nodes are expected to keep running as long as the job is alive. mysbatch....

DOOM

1,244

asked Aug 5, 2024 at 19:35

0 votes

0 answers

193 views

How to do distributed batch inference using tensor parallelism with Ray?

I want to perform offline batch inference with a model that is too large to fit into one GPU. I want to use tensor parallelism for this. Previously I have used vLLM for batch inference. However, now I ...

ganto

222

asked Aug 5, 2024 at 11:18

0 votes

0 answers

56 views

Ray + lightning prepare_data_loader MisconfigurationException

I am trying to start a training session with Ray on GPU but am getting errors, while on CPU everything works smoothly. The issues arise from the data modules: I have the following class which ...

magzmag

1

asked Aug 5, 2024 at 4:16

0 votes

0 answers

92 views

utilize computer resources on RLLIB config.build().train() effectively

I am trying to learn what are the best practices for determining parameters that utilize computer resources effectively in RlLib. On my laptop, I have 16 cpu cores and 1 gpu. I tried running the ...

Jordan

45

asked Aug 1, 2024 at 3:14

0 votes

0 answers

93 views

Extending a Ray Actor or make a subclass be a Ray Actor

Look at the code below: class A: def __init__(self, n): self.n = n class B(A): def __init__(self): super(B, self).__init__(n=10) Adding @ray.remote at the beginning of any ...

Rick Dou

334

asked Jul 30, 2024 at 9:32

0 votes

0 answers

125 views

Pyarrow error with ray & lightning on databricks

I am trying to train a neural net with pytorch lightning on ray on a databricks cluster. As a start, I copied the example from https://docs.ray.io/en/latest/train/getting-started-pytorch-lightning....

DataDiver

1

asked Jul 25, 2024 at 8:08

0 votes

0 answers

47 views

Memory Issue with Ray in Cluster Environment

I'm new to using Ray, and I've set up a workflow to read and process several .csv files using Pandas. Here's a snippet of my setup: with on_ray( num_cpus=6, object_store_memory=10 * 1024 * ...

Vgamero

1

asked Jul 25, 2024 at 3:31

0 votes

0 answers

32 views

When to apply backface culling, depending on the ray and material type?

I am currently implementing a ray tracer which supports reflection and refraction. I have the following types of rays: camera rays, shadow rays, reflection rays, and refraction rays. I have the following ...

Kotaka Danski

598

asked Jul 19, 2024 at 16:19

0 votes

0 answers

234 views

How to Forward gRPC Requests in a Proxy Server for Ray?

I am implementing a reverse proxy for Ray. The reverse proxy works well for HTTP requests, but some communications of Ray use gRPC. For example, the reverse proxy is getting requests like http://...

Emmanuel Murairi

401

asked Jul 10, 2024 at 4:45

0 votes

0 answers

83 views

Error: Missing argument 'CLUSTER_CONFIG_FILE'. Ray GCP

I have created a GCP Ray cluster from the ray cluster dashboard within GCP, I have also created a Ray cluster locally via docker compose. Is there an easy way to generate the ray cluster config? For ...

jm-nab

114

asked Jul 8, 2024 at 15:02

0 votes

0 answers

296 views

How to implement ray server with multiple gpus?

I'm trying to implement a multi-gpu local server with ray and vllm. I have uploaded my full code and commands to this github repository. In short, I want to serve a big model that requires 2 gpus, but ...

Boyuan Chen

43

asked Jul 3, 2024 at 13:19

0 votes

0 answers

234 views

Overriding Ray dashboard url returned by ray.init()

I'm running a JupyterHub installation on Kubernetes (EKS) and an elastic Ray cluster automatically starts for each user when their notebook starts, and automatically stops when the notebook closes. ...

Igor

337

asked Jun 18, 2024 at 13:39

0 votes

0 answers

268 views

Ray logging is not working for logging.info calls in main or worker process

I am trying to set up a logger on Windows that can output messages to both a log file as well as stdout with different log levels. I am using ray to run a couple of remote worker processes so I would ...

altwood

141

asked Jun 12, 2024 at 10:50

0 votes

1 answer

94 views

Custom MLPPolicy issues in Ray RLLIB

I'm trying to create a custom MLP-based policy in Ray Rllib using this code below: python: 3.10 Rayrlib version: 2.23 class CustomMLPModel(TorchModelV2, nn.Module): def __init__(self, obs_space, ...

David

33

asked Jun 12, 2024 at 5:07
