Enterprise Monitoring and Performance Engineer
You will partner with Operations, Infrastructure, Development, QA, and Product teams to analyze product and hardware performance and scalability against the agreed-upon requirements and projected use cases, and to ensure that proper test cases, monitors, alerts, and dashboards are created before production implementation.
You'll have a key role in the success of our applications and systems by partnering with other key stakeholders to recommend product improvements, such as features that would improve application usability. This role is also responsible for proactively evaluating processes and providing recommendations to increase efficiency, system availability, and the quality of the user experience.
This is a technical, hands-on role for someone who is adept at gathering and analyzing performance data, finding the root cause of performance bottlenecks, and monitoring and troubleshooting performance issues in pre- and post-production systems.
Here’s what you can expect from the job and what you need to be successful:
Job Duties:
- Expand the Application Performance Monitoring (APM) program to improve application and platform performance, availability, and resiliency through real-time performance monitoring and alerting
- Optimize the APM tool footprint by eliminating redundancies, rationalizing the toolset, and upgrading to the latest cost-efficient solutions
- Establish verifiable performance benchmarks for core and system components
- Partner with application/operational teams and vendors to build a solution and custom configuration, including network, systems, and application performance dashboards
- Scope and gather technical requirements for customer monitoring use cases and business KPIs, translate them into tool specifications for APM, Alerts, Synthetics, Monitoring, and Dashboards, and ensure successful implementation
- Develop and enhance existing tools and processes for monitoring and measuring service performance in production and sandbox environments
- Identify, analyze and resolve the most complex public and hybrid cloud-based issues
- Configure and extend monitoring, alerting, and performance tools to satisfy user needs, including application delivery, middleware, infrastructure, database administration, etc.
- Partner with network, infrastructure, and database teams to develop and deploy appropriate and effective monitors and alerts that facilitate proactive response and/or a shorter mean time to restore service
- Collaborate with development and QA to develop relevant scalability, stability, and stress scripts and monitors that help deliver a quality product that meets agreed-upon performance metrics, which are signed off prior to production roll-out
- Create, maintain, and update documentation for tracking issues, errors, application changes, infrastructure related changes, incident resolution, etc.
Essential Skills:
- Experience with system capacity monitoring and forecasting applications
- Solid understanding of large-scale applications, network architectures, performance monitoring, and fault management
- Experience with application & infrastructure monitoring for performance, availability, and scalability
- Strong experience with Windows and Linux OS as well as virtualization platforms such as VMware and Microsoft Hyper-V in a hybrid multi-cloud architecture
- Basic scripting knowledge and experience including regular expressions, bash, PowerShell, and Python
- Basic experience with network & firewall topology
- Proven track record of automating processes and developing effective QA measures that include performance testing, capacity management, and reporting
- Commitment to staying abreast of the latest monitoring technology and trends
- Minimum of 4 years’ experience administering and creating dashboards for multiple monitoring systems such as Dynatrace, New Relic, SolarWinds, Splunk, and ExtraHop, as well as native tools provided by vendors such as HP, VMware, Microsoft, AWS, etc.
- 2 years of experience working with agile, scalable software engineering
- 2 years of experience in CI/CD, automation and DevOps practices
- Knowledge of application architecture, OSI layers, and software design and development methodologies
- Prior experience working with business metrics reporting, customer experience monitoring and optimization for digital products
- Excellent analytical, time management, organizational and problem-solving skills with the ability to multi-task and work in a deadline-driven environment
- Excellent verbal, written, and interpersonal communication skills and the ability to engage with business partners to understand their requirements
Location: San Jose, CA 95134 | Rocklin, CA 95765 | Hillsboro, OR 97124
Later this year, we will be asking our employees to return to their assigned First Tech location in a hybrid work format that will vary based on the role.
First Tech is not currently offering visa sponsorship for this position.
Equal Opportunity Employer/Protected Veterans/Individuals with Disabilities
The contractor will not discharge or in any other manner discriminate against employees or applicants because they have inquired about, discussed, or disclosed their own pay or the pay of another employee or applicant. However, employees who have access to the compensation information of other employees or applicants as a part of their essential job functions cannot disclose the pay of other employees or applicants to individuals who do not otherwise have access to compensation information, unless the disclosure is (a) in response to a formal complaint or charge, (b) in furtherance of an investigation, proceeding, hearing, or action, including an investigation conducted by the employer, or (c) consistent with the contractor’s legal duty to furnish information. 41 CFR 60-1.35(c)