Sr. System Engineer
Company: Support Revolution
Location: San Jose
Posted on: April 30, 2025
Job Description:
Location: San Jose, California, United States
About Supermicro:
Supermicro is a Top Tier provider of advanced server,
storage, and networking solutions for Data Center, Cloud Computing,
Enterprise IT, Hadoop/Big Data, Hyperscale, HPC and IoT/Embedded
customers worldwide. We are the #5 fastest growing company among
the Silicon Valley Top 50 technology firms. Our unprecedented
global expansion has provided us with the opportunity to offer a
large number of new positions to the technology community. We seek
talented, passionate, and committed engineers, technologists, and
business leaders to join us.
Job Summary:
As a global leader in
server technologies, Supermicro has been growing extremely fast in
many key markets, such as Cloud Computing, Big Data, HPC, AI and
Storage. To meet the market demand, Supermicro is developing
end-to-end enterprise IT solutions with compute, storage, and
networking all integrated into full-rack or multi-rack level
systems. The Senior System Engineer plays an important role in
designing, implementing, testing and deploying rack system
solutions for data center and enterprise customers.
Essential Duties and Responsibilities:
Includes the following essential duties and responsibilities
(other duties may also be assigned):
- Deploy Rack/Cluster infrastructure and execute comprehensive
system-level testing on the latest GPUs, CPUs, network
and storage, encompassing functionality, compatibility,
performance, stress, and reliability testing, leveraging
proprietary in-house tools.
- Conduct proof of concept design and testing. Establish
expertise in HPC/AI applications and benchmarks, and provide
optimized benchmarks for HPC/AI applications by fine-tuning system
settings and optimizing OS/network configurations. Demonstrate
strong problem-solving skills and build robust processes and
procedures for HPC/AI solutions.
- Provide operational support for Cluster, Storage, HPC and Cloud
infrastructure. Identify and document hardware and software quality
issues. Collaborate with product management and other Engineering
teams to integrate enhancements into future products.
- Write technical documents for test procedures, test reports and
troubleshooting procedures related to server, network, and cluster
software and hardware to facilitate knowledge sharing.
- Deliver on-site deployment services to ensure customer
acceptance verification and satisfaction.
- Write automation tools for cluster deployment and test
environment.
Qualifications:
- BS/MS in Electrical Engineering, Computer Engineering or a
related field, MS preferred.
- 5-8+ years of work-related experience in server/network/storage
hardware configuration, testing, debugging and
troubleshooting.
- 5-8+ years of work-related experience in DevOps or in cloud
environments, including but not limited to Docker/Containers and
Kubernetes.
- Experience with AI/ML frameworks such as PyTorch, TensorFlow,
etc.
- Familiar with the TCP/IP protocol stack, UDP, IPv4/IPv6, DNS, DHCP
and other application protocols.
- Familiar with HPC, AI or Cloud benchmark tests and networking
architecture.
- Excellent programming skills in Python and shell
scripting.
- Strong communication skills and a strong sense of
teamwork.
- Familiarity with MLPerf Training/Inference benchmarks, LLMs, HPL-AI
or RCCL/NCCL is a plus.
- CCNA, OpenStack, OpenShift, Azure or AWS is a plus.
Salary Range:
$140,000 - $158,000
The salary offered will depend on several
factors, including your location, level, education, training,
specific skills, years of experience, and comparison to other
employees already in this role. In addition to a comprehensive
benefits package, candidates may be eligible for other forms of
compensation, such as participation in bonus and equity award
programs.
EEO Statement:
Supermicro is an Equal Opportunity Employer
and embraces diversity in our employee population. It is the policy
of Supermicro to provide equal opportunity to all qualified
applicants and employees without regard to race, color, religion,
sex, sexual orientation, gender identity, national origin, age,
disability, protected veteran status or special disabled veteran,
marital status, pregnancy, genetic information, or any other
legally protected status.