Site Reliability Engineer
First Performance Global
First Performance develops the next generation of financial card controls, AI/ML-based data enrichment, and digital engagement for leading processors and banks across the world. We empower their cardholders to use, manage, and control their finances digitally.
First Performance is a global company headquartered in Midtown Atlanta’s hub of technology and innovation, with additional offices in Santiago, Chile and Sao Paulo, Brazil. We are backed by some of the largest and most respected financial and corporate investors, including MasterCard, Fiserv/First Data, Synchrony Financial, Regions Bank, RRE Ventures, and Thandorf.
Everyone at First Performance brings purpose and passion to work every day. Our teams are small, dedicated, and collaborative. Individuals are given ownership of and accountability for their work. Our company is not just about technology; it’s about people. We help employees build great careers and live great lives, especially in these complicated times. Our goals are about achieving success for our customers and for our company. If you love to invent, have an entrepreneurial spirit, and strive for operational excellence, we want you on our team!
First Performance is seeking a leader and visionary to join the team as a Site Reliability Engineer. We are looking for someone who is currently an SRE or DevOps engineer, has experience working with both domestic and international teams, and wants to build cloud and enterprise products that scale and process large datasets across multiple toolchains.
This critical role carries significant responsibility for establishing best practices for deploying and monitoring our applications at scale. This includes working closely with our engineering and operations teams to ensure that design, implementation, and run-time performance are tracked, measured, and maintained once code reaches production.
We are looking for an individual who has demonstrated experience working with Kubernetes, one or more cloud platforms, and multiple big data pipelines.
A candidate for this role should have 5+ years of experience designing and maintaining cloud deployment architectures and continuous integration and deployment. We strive to be at the forefront of best practices in software development and delivery and are looking for someone to keep pushing us to improve the way we deliver value to our customers.
Essential Duties and Responsibilities
Infrastructure Architecture: Define, establish, and implement comprehensive yet practical strategies to ensure the quality of First Performance’s infrastructure. Architectural strategies must consider all processes in the Product and Software Development lifecycles and operations pipelines.
Data-driven: Identify key assumptions in architectural strategy and take a data-driven approach to validating them and increasing the likelihood of success, adjusting architecture and processes as needed for continuous improvement.
Communication and Collaboration: Develop and maintain strong working and personal relationships with the First Performance product, development, operations, and support teams to ensure plans allow ample time to bake in quality from product inception through implementation to delivery.
Monitor and Support: Assist in the 24x7 monitoring of production environments. Help teams with responses to client support requests.
Automation: Work side by side with engineering leaders to enforce an ‘automation first’ strategy through the entire SDLC and all operations pipelines, ensuring security is integrated into the CI/CD pipeline and quality is delivered.
Team Mentorship: Inspire and engage others to embrace change, build upon the strengths of First Performance team members, and support their career growth.
Security and Compliance: Ensure that all infrastructure operations maintain the strictest possible security.
Resilience: Ensure that all services are deployed with resilience in mind. Develop strategies for high availability and disaster recovery.
Document: Develop detailed documentation for all infrastructure processes and deployments to ensure long-term success.
Qualifications
Proven experience (5+ years) maintaining cloud-managed (or on-prem) big data systems (Hadoop, Cassandra, Kafka, Spark, etc.)
Proven experience (5+ years) maintaining cloud systems; AWS strongly preferred
Proven experience (2+ years) with server-side scripting languages (Python, Bash)
Experience (2+ years) deploying and maintaining Kubernetes clusters preferred
Experience with SQL and NoSQL database management
Strong understanding of security principles
Familiarity with payments preferred
Understanding of performance tools, metrics, and benchmarks
Excellent verbal and written communication skills in English. Proficiency in Spanish is a major plus.
Identify bottlenecks and pain points and direct efforts to address the challenges in a data-driven manner.
Focus on continuous improvement by setting up processes that track issues, allow us to learn from them, and incorporate those learnings to prevent recurring issues.
Comfortable with client interaction
Experience with PCI compliance preferred
Experience working with SaaS deployments preferred
Experience with the Atlassian suite (Jira, Bitbucket, Confluence) is a strong plus