How to use sagemaker-pyspark in batch inference

0

am trying to execute the code below

ENDPOINT_NAME = "my-endpoint"
from sagemaker_pyspark import SageMakerModel
from sagemaker_pyspark import EndpointCreationPolicy
from sagemaker_pyspark.transformation.serializers import ProtobufRequestRowSerializer
from sagemaker_pyspark.transformation.deserializers import ProtobufResponseRowDeserializer
from pyspark.sql.types import StructType, StructField, MapType, StringType, IntegerType, ArrayType, FloatType

attachedModel = SageMakerModel.fromEndpoint(
    endpointName = ENDPOINT_NAME,
    requestRowSerializer=ProtobufRequestRowSerializer(
        featuresColumnName = "col1"
    ),
    responseRowDeserializer=ProtobufResponseRowDeserializer(schema=StructType([
        StructField('prediction', MapType(StringType(), FloatType()))
    ]))
)

data=SageMakerModel.transform(attachedModel, df['col1'])

I keep getting the below error though

py4j.protocol.Py4JError: An error occurred while calling o58.__getstate__. Trace:
py4j.Py4JException: Method __getstate__([]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
        at py4j.Gateway.invoke(Gateway.java:274)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)

any ideas ?

質問済み 2年前380ビュー
1回答
0

Hi there,

it would seem to me that there maybe an discrepancy in when one of your outputs is returned, they should only be returned after an if else statement and defined within the if else statement. Here is a resource that seems to address a problem similar to yours, perhaps applying a similar strategy will produce significant results.

Hopefully this will provide some insight into your problem.

Regards NN

回答済み 2年前
  • Hi Ntiyiso, where do you suggest to add such logic ?

  • Hi there exorcismus, I would suggest applying the logic to your predictions in the model itself if applicable.

  • Hi Ntiyiso, the model and the endpoint works fine and am able to use them for prediction, am not sure how/why implementing that logic affects being invoked from spark, what do you think ?

  • Hi exorcismus, it's possible that pyspark version and pyarrow settings aren't compatible with your os and/vm configurations similarly what sudopip was experiencing.

ログインしていません。 ログイン 回答を投稿する。

優れた回答とは、質問に明確に答え、建設的なフィードバックを提供し、質問者の専門分野におけるスキルの向上を促すものです。

質問に答えるためのガイドライン

関連するコンテンツ