I'm working with a small company that stores all of their app data in an AWS Redshift cluster. I have been tasked with doing some data processing and machine learning on the data in that cluster.

The first task requires some basic transformation of existing data in that cluster into some new tables, based on fairly simple SQL logic. In an MSSQL environment, I would simply put all the logic into a parameterized stored procedure and schedule it via SQL Server Agent Jobs. However, stored procedures don't appear to be a thing in Redshift. How would I go about creating a SQL job and scheduling it to run nightly (for example) in an AWS environment?

The other task involves developing a machine learning model (in Python) and scoring records in that Redshift database. What's the best way to host my Python logic and do the data processing if the plan is to pull data from that Redshift cluster, score it, and then insert the results into a new table on the same cluster? It seems like I could spin up an EC2 instance, host my Python scripts on there, do the processing on there as well, and schedule the scripts to run via cron? I see tons of AWS (and non-AWS) products that look like they might be relevant (AWS Glue / Data Pipeline / EMR), but there are so many that I'm a little overwhelmed. Thanks in advance for the assistance!

Amazon Redshift does not support stored procedures. Also, I should point out that stored procedures are generally a bad thing because you are putting logic into a storage layer, which makes it very hard to migrate to other solutions in the future. (I know of many Oracle customers who have locked themselves into never being able to change technologies!)

You should run your ETL logic external to Redshift, simply using Redshift as a database. This could be as simple as running a script that uses psql to call Redshift, such as: `psql -c 'insert into z select a, b from x'` (Use psql v8, upon which Redshift was based.) Alternatively, you could use more sophisticated ETL tools such as AWS Glue (not currently in every Region) or 3rd-party tools such as Bryte.

Yes, you could run code on an EC2 instance. If the job is small, you could use AWS Lambda (maximum 5 minutes run-time). Many ML users like using Spark on Amazon EMR. It depends upon the technology stack you require. Amazon CloudWatch Events can schedule Lambda functions, which could then launch EC2 instances that do your processing and then self-terminate.

It seems you are a Python developer (as you said you are developing a Python-based ML model), so you can do the transformation by following the steps below:

1. To talk with Redshift from any workstation on your LAN (make sure …), you can write your own functions in Python that mimic stored procedures.
2. Inside these functions, you can put / construct your transformation logic.
3. Alternatively, you can create functions using Python in Redshift as well that will act like stored procedures.
4. Finally, you can use Windows scheduler / a cron job to schedule your Python scripts with parameters, like an SQL Server Agent job does.

It seems to me you are reading some data from Redshift, then creating test and training sets, and finally getting some predicted results (records). If so: host the script on any of your servers (LAN) and connect to Redshift using boto3.
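The pull-score-insert loop discussed above can be sketched in Python. Everything here is hypothetical: the cluster endpoint, table and column names, and the weights, and the "model" is a toy linear scorer standing in for a real trained one. The answer mentions boto3 (which manages AWS resources); for actually running SQL, a Postgres driver such as psycopg2 is one common choice, since Redshift speaks the Postgres wire protocol.

```python
def score_record(features, weights, bias=0.0):
    """Toy linear model: dot product of features and weights, plus a bias."""
    return bias + sum(f * w for f, w in zip(features, weights))


def score_table():
    # psycopg2 (third-party) talks to Redshift over its Postgres-compatible
    # endpoint; imported here so score_record stays usable without the driver.
    import psycopg2

    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # hypothetical
        port=5439,
        dbname="analytics",   # hypothetical database
        user="etl_user",
        password="...",
    )
    weights = [0.4, 0.6]  # would come from your trained model
    with conn, conn.cursor() as cur:
        # Pull the records to score (hypothetical table/columns).
        cur.execute("SELECT id, feature_1, feature_2 FROM source_table")
        scored = [(rid, score_record([f1, f2], weights))
                  for rid, f1, f2 in cur.fetchall()]
        # Write the scores back to a new table on the same cluster.
        cur.executemany(
            "INSERT INTO scored_table (id, score) VALUES (%s, %s)",
            scored,
        )
    conn.close()
```

A script like this could then be scheduled from an EC2 instance or LAN server with cron, e.g. a crontab entry such as `0 2 * * * python3 /opt/etl/score_table.py` (hypothetical path) to run it nightly at 2 AM.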