Paxton Gao | AI Agent and Web3 Risk Engineer

Applied AI Agent Developer candidate

I turn ambiguous AI ideas into working agent systems with tools, retrieval, evidence, and evaluation.

My current focus is ChainRisk Agent, an evaluation-backed Web3 risk triage prototype that separates deterministic risk scoring from LLM-generated summaries.


About

Engineering profile

I am completing an engineering master's degree at the University of Sheffield after a dual-degree background in automatic control systems. My practical work sits at the intersection of LLM applications, RAG, backend APIs, and system safety.

For AI Agent roles, I position myself as a hands-on builder: I can break a fuzzy task into input routing, tool calls, retrieval, deterministic checks, structured output, logs, and evaluation cases.

Experience

RAG and applied AI work

2025.02 - 2025.08

RAG Engineer Intern, Robin AI

Worked on retrieval-augmented question answering for contract and legal-document workflows, focusing on document preprocessing, retrieval quality, citation-aware prompting, and failure analysis for internal evaluation.

Handled document structure concerns such as OCR text, natural sections, tables, and bullet-style content.
Explored query expansion, hybrid retrieval, reranking, and citation-aware answer generation.
Kept claims evidence-bound: no metric is presented on this public resume as production impact unless an internal report backs it.
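As an illustration of the hybrid retrieval mentioned above, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a keyword ranking with a dense-vector ranking. It is a generic example, not a description of the method used at Robin AI; the function name, the document IDs, and the constant k = 60 are assumptions made for the sketch.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by more than one retriever rise to the top.
    k = 60 is a commonly used default, assumed here for illustration.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a keyword retriever and a dense retriever.
keyword_hits = ["clause_a", "clause_b", "clause_c"]
dense_hits = ["clause_b", "clause_d", "clause_a"]
fused = reciprocal_rank_fusion([keyword_hits, dense_hits])
```

A reranker would then typically rescore only the top few fused results, which keeps the more expensive model off the long tail.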

Featured project

ChainRisk Agent

Evidence-grounded Web3 risk triage agent. The system accepts a wallet, token, or project input and returns risk_level, risk_score, evidence, tool_trace, uncertainties, next_checks, and safety notes.
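The output fields listed above could be captured in a structured schema along the following lines. This is a sketch, not the project's actual code: the class name RiskReport, the field name safety_notes, the allowed level names, and the 0 to 1 score range are all assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import List

# Hypothetical level names; the real project may use a different scale.
ALLOWED_LEVELS = ("low", "medium", "high")


@dataclass
class RiskReport:
    """Structured triage output mirroring the fields named in the description."""

    risk_level: str
    risk_score: float  # assumed to be normalised to [0, 1]
    evidence: List[str] = field(default_factory=list)
    tool_trace: List[str] = field(default_factory=list)
    uncertainties: List[str] = field(default_factory=list)
    next_checks: List[str] = field(default_factory=list)
    safety_notes: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject malformed reports early instead of passing them downstream.
        if self.risk_level not in ALLOWED_LEVELS:
            raise ValueError(f"unknown risk_level: {self.risk_level!r}")
        if not 0.0 <= self.risk_score <= 1.0:
            raise ValueError("risk_score must be within [0, 1]")


report = RiskReport(
    risk_level="medium",
    risk_score=0.5,
    evidence=["top holder concentration above threshold (hypothetical)"],
    uncertainties=["no liquidity data returned"],
)
payload = asdict(report)  # plain dict, ready for JSON serialisation
```

Keeping uncertainties and next_checks as separate fields makes missing data visible in the output instead of letting it silently lower the score.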

Workflow: router -> read-only tools -> hybrid RAG -> deterministic rules -> RiskSignal fusion -> DeepSeek report writer -> critic.
Read-only guard blocks transfer, sign, swap, approve, and other high-risk write intents before tool execution.
LLM boundary: DeepSeek writes evidence-grounded summaries and next checks; it does not decide risk_score or risk_level.
Evaluation-backed prototype; metrics are dataset-specific and should not be read as live risk-scoring claims.
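The read-only guard and the LLM boundary described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's implementation: the keyword matching, signal names, weights, thresholds, and every function name are hypothetical, and the writer argument stands in for the DeepSeek call.

```python
import re
from typing import Callable, Dict

# Write intents named in the description; plain keyword matching is an assumption.
WRITE_INTENTS = {"transfer", "sign", "swap", "approve"}

# Hypothetical signal weights for the deterministic score.
SIGNAL_WEIGHTS = {"holder_concentration": 0.5, "low_liquidity": 0.3, "phishing_label": 0.2}


def is_write_intent(text: str) -> bool:
    """Return True if the request contains a write-intent keyword as a whole word."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & WRITE_INTENTS)


def score_signals(signals: Dict[str, float]) -> float:
    """Deterministic fusion: weighted sum of known signals, clipped to [0, 1]."""
    total = sum(w * signals.get(name, 0.0) for name, w in SIGNAL_WEIGHTS.items())
    return max(0.0, min(1.0, total))


def level_for(score: float) -> str:
    """Map a score to a level using hypothetical thresholds."""
    if score >= 0.66:
        return "high"
    if score >= 0.33:
        return "medium"
    return "low"


def triage(user_input: str, signals: Dict[str, float],
           writer: Callable[[str, float, Dict[str, float]], str]) -> dict:
    # The guard runs first, so no tool or model is invoked for write intents.
    if is_write_intent(user_input):
        return {"blocked": True, "reason": "write intent detected; agent is read-only"}
    score = score_signals(signals)
    level = level_for(score)
    # The writer only produces text; score and level are already fixed.
    summary = writer(level, score, signals)
    return {"blocked": False, "risk_score": score, "risk_level": level, "summary": summary}


def stub_writer(level: str, score: float, signals: Dict[str, float]) -> str:
    """Stand-in for the LLM report writer."""
    return f"Risk is {level}; signals considered: {sorted(signals)}"


result = triage("check wallet 0xabc", {"holder_concentration": 1.0, "phishing_label": 1.0}, stub_writer)
blocked = triage("approve unlimited spend for this token", {}, stub_writer)
```

Because the writer receives the score and level as inputs and returns only a string, a hallucinated or adversarial summary cannot alter the numeric result. A production guard would need more than keyword matching, for example intent classification and checks on the tool calls themselves.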

Figure: ChainRisk agent workflow diagram. ChainRisk separates facts, retrieval, scoring, report writing, and critic checks.

Evidence

What can be verified locally

Unit tests

Coverage for safety blocking, missing data, retrieval metadata, public labels, and risk signal fusion.

Workflow cases

Local evaluation cases for risk levels, evidence, unsupported claims, and latency tracing.

RAG cases

Small retrieval smoke test covering liquidity, rug pull, honeypot, holder concentration, phishing, and failed transaction risk.

Real-CATS recall: 70%

Dataset-specific behavior-fusion baseline on held-out addresses; useful signal, not a live fraud-detection claim.

Skills

Working stack

Languages and backend

Python, FastAPI, REST APIs, schema design, unit tests, local smoke servers.

LLM systems

RAG, agent workflows, prompt engineering, hybrid retrieval, reranking, evaluation.

Web3 risk tooling

Read-only tooling, public label benchmark, RiskSignal fusion, safety guardrails.