Our Scoring Methodology
Every robot on RobotTesters is evaluated using a transparent, data-driven framework. Scores range from 1 (worst) to 10 (best) per subcategory and are combined into a final 0–100 overall score through weighted averages — no subjective impressions, no sponsored rankings. Each robot family (humanoid, robot vacuum, lawn mower, window cleaner, pool cleaner, companion and pet) is scored against a dedicated set of categories tailored to what actually matters for that kind of robot — pick a tab below to see its scoring scheme.
Last updated: April 2026
Score rating scale (applies to every robot family)
Mixed review model: hands-on tested vs specs-based
Not every robot in our database has been physically tested. Some scores are derived from publicly available specifications and verified secondary sources while we work to get hands-on time. We make this distinction explicit on every review — RTINGS, Consumer Reports and Wirecutter all do this for the same reason: a number without context is misleading.
Hands-on tested: The robot was reviewed in person by our team. Every subcategory — including those that require physical use (response naturalness, movement quality, setup, app control, durability under real conditions) — has a primary-source score. The full score is shown without an asterisk.
Specs-based: Score derived from manufacturer datasheets, verified demo footage and cross-referenced technical documentation. Subcategories that cannot be honestly judged without physical use are marked Pending hands-on and excluded from the weighted average — the remaining weights are renormalised. The overall score is shown with an asterisk * and a partial-score note.
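The renormalisation step can be sketched as follows — a minimal Python illustration, not our production code; the subcategory names and weights are invented for the example:

```python
def renormalised_score(sub_scores, sub_weights):
    """Weighted 1-10 category score that skips subcategories marked
    Pending hands-on (absent from sub_scores) and rescales the
    remaining weights so they still sum to 1."""
    present = {k: w for k, w in sub_weights.items() if k in sub_scores}
    total = sum(present.values())  # < 1.0 whenever something is pending
    return round(sum(sub_scores[k] * w / total for k, w in present.items()), 2)

# Hypothetical subcategory weights; "app control" is Pending hands-on.
weights = {"speed": 0.4, "payload": 0.4, "app control": 0.2}
scores = {"speed": 8.0, "payload": 6.0}
print(renormalised_score(scores, weights))  # → 7.0
```

Because the pending subcategory is simply dropped and the rest rescaled, a specs-based score reflects only what can honestly be judged from documentation.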
Specs-based reviews exist so the comparator stays useful even before we get every robot in the lab. When we do get hands-on time, the review is upgraded and the date of the change is recorded in the JSON-LD dateModified field. If you want to filter out specs-based reviews in the comparator, use the Show only hands-on tested toggle on the comparison page.
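For reference, the upgrade date is exposed via standard schema.org markup; a minimal sketch of what the relevant JSON-LD fragment might look like (the dates are invented examples):

```json
{
  "@context": "https://schema.org",
  "@type": "Review",
  "datePublished": "2025-11-03",
  "dateModified": "2026-04-12"
}
```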
How the overall score is calculated
Each robot is assessed across 5 main categories, each split into weighted subcategories. A weighted average inside each category produces a category score from 1 to 10. Those category scores are then combined using the category weights, and the weighted result is multiplied by 10 to give the final 0–100 score.
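In code, the combination step looks roughly like this — a sketch in Python using the humanoid category weights from this page; the function and variable names are our own illustration, not a published API:

```python
def overall_score(category_scores, category_weights):
    """Combine weighted 1-10 category scores into a 0-100 overall score."""
    weighted = sum(category_scores[c] * w for c, w in category_weights.items())
    total_w = sum(category_weights.values())  # 1.0 when all categories present
    return round(weighted / total_w * 10, 1)  # x10 maps 1-10 onto 0-100

# Humanoid category weights from this page (20/20/16/35/9 = 100%).
weights = {"mobility": 0.20, "intelligence": 0.20, "build": 0.16,
           "value": 0.35, "developer": 0.09}
scores = {"mobility": 8.0, "intelligence": 7.5, "build": 8.5,
          "value": 6.0, "developer": 9.0}
print(overall_score(scores, weights))  # → 73.7
```

Note how heavily Value for Money moves the result: dropping that single category score by one point shifts the overall score by 3.5 points.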
1. Mobility & Performance
20% of final score. Mobility is the foundation of any useful robot. This category measures how fast and how far a robot can move, how much it can carry, and how well it handles real-world terrain. For humanoids in particular, locomotion capability is the single biggest factor separating research toys from deployable machines. Together with Intelligence, Mobility carries the highest weight among the capability-focused categories, because physical versatility directly determines real-world usefulness.
- 1–3: Below 0.5 m/s — barely useful for dynamic tasks.
- 4–6: 0.5–1.5 m/s — adequate for slow-paced service work.
- 7–9: 1.5–3.0 m/s — competitive for most industrial use cases.
- 10: 3.0 m/s or above — world-class bipedal locomotion.
- 1–3: Under 2 kg — limited to handling very light objects.
- 4–6: 2–8 kg — suitable for household or light industrial tasks.
- 7–9: 8–25 kg — strong enough for most industrial manipulation.
- 10: 25 kg or above — heavy-duty industrial-grade capability.
- 1–3: Under 1 hour — insufficient for most deployments.
- 4–6: 1–2 hours — marginal; acceptable only with swap capability.
- 7–9: 2–4 hours — practical for extended operational shifts.
- 10: 4+ hours or hot-swap enabled — deployment-ready endurance.
- 1–3: Flat indoor surfaces only; stumbles on minor obstacles.
- 4–6: Handles gentle ramps and thresholds; no outdoor capability.
- 7–9: Navigates stairs, uneven ground and moderate outdoor terrain.
- 10: Robust outdoor locomotion including rough terrain and recovery from falls.
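Each band above maps a measured quantity onto a score range, and the editorial team then places the robot within that range. A minimal sketch of the lookup for the top-speed bands, assuming the 7–9 band runs up to the threshold for a 10 (the band edges are our reading of the rubric, not published code):

```python
def speed_band(speed_ms):
    """Score band for a measured top speed in m/s, per the mobility rubric."""
    if speed_ms < 0.5:
        return (1, 3)    # barely useful for dynamic tasks
    if speed_ms < 1.5:
        return (4, 6)    # slow-paced service work
    if speed_ms < 3.0:
        return (7, 9)    # competitive industrial locomotion
    return (10, 10)      # world-class bipedal locomotion

print(speed_band(1.2))  # → (4, 6)
```

The payload, runtime and terrain bands follow the same pattern with their own thresholds.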
2. Intelligence & Autonomy
20% of final score. Raw physical capability is meaningless without the cognitive layer to direct it. This category evaluates how smart, perceptive and self-sufficient a robot is. As LLMs and AI stacks become embedded directly in robotic hardware, intelligence is increasingly the differentiator between robots that require constant operator intervention and those that can execute multi-step tasks independently.
- 1–3: Basic scripted or teleoperated responses only.
- 4–6: Pre-trained perception models; limited generalisation.
- 7–9: Strong vision + voice AI, partial LLM integration, real-time inference.
- 10: Full onboard LLM, multi-modal AI, generalised task reasoning.
- 1–3: Basic RGB camera and IMU only.
- 4–6: Stereo depth or structured-light sensors; basic obstacle detection.
- 7–9: Multiple depth cameras, voice recognition, comprehensive SLAM inputs.
- 10: 360° LiDAR, multi-array microphones, tactile sensing, full sensor fusion.
- 1–3: Teleoperated only; no autonomous movement.
- 4–6: Semi-autonomous; requires pre-mapping or operator supervision.
- 7–9: Full autonomous navigation in known environments; dynamic obstacle avoidance.
- 10: Self-mapping and autonomous navigation in unknown, dynamic environments.
- 1–3: Fixed, hardcoded behaviours only.
- 4–6: Basic imitation via teleoperation recording; limited generalisation.
- 7–9: Reinforcement learning or imitation learning with cloud pipeline.
- 10: On-device continual learning with zero-shot generalisation via LLM.
3. Build Quality & Hardware
16% of final score. Great software running on fragile or poorly engineered hardware will not survive deployment. This category assesses the physical engineering of the robot — the materials it is built from, how many axes of movement it has, and how rigorously safety is designed into the platform. It intentionally does not dominate the overall score because hardware quality, while essential, is currently a hygiene factor rather than a differentiator.
- 1–3: Cheap plastics; fragile joints; no environmental protection.
- 4–6: Mixed materials; adequate for lab conditions but not field work.
- 7–9: Aluminium alloy or reinforced composites; robust for industrial use.
- 10: Aerospace or military-grade materials; full IP-rated environmental sealing.
- 1–3: Under 12 DOF — very limited range of motion.
- 4–6: 12–20 DOF — sufficient for basic bipedal locomotion.
- 7–9: 21–35 DOF — capable of complex manipulation alongside locomotion.
- 10: 36+ DOF — full dexterous hands plus rich body articulation.
- 1–3: No meaningful safety features; unsafe for human-adjacent operation.
- 4–6: Basic E-stop; software collision avoidance only.
- 7–9: Force-torque control, hardware compliance, E-stop, safety certifications.
- 10: Full ISO/CE safety certification; zero-force backdrivable joints; redundant stop systems.
4. Value for Money
35% of final score. A robot that scores perfectly on performance but costs five million dollars serves almost no one. This category contextualises capability against real-world affordability, using a simple benchmark: can this robot justify its price by replacing a US worker who earns the same amount annually? A $16,000 robot is measured against a $16,000-per-year labourer; a $110,000 robot against a $110,000-per-year professional. This category also evaluates post-purchase support structures and how easily buyers can actually acquire the robot. At 35% of the final score, it is the single most influential category in our methodology — reflecting that purchase ROI is the primary gating factor for most buyers.
- 1–3: Cannot match the output of a human earning the robot's purchase price annually.
- 4–6: Partially matches the equivalent human — workable but not compelling ROI.
- 7–9: Reliably replaces or augments a human at the equivalent salary level.
- 10: Clearly outperforms the equivalent-salary human across all relevant tasks.
- 1–3: No formal warranty; no spare part availability.
- 4–6: Standard limited warranty; email support only.
- 7–9: Multi-year warranty, available parts, responsive technical support.
- 10: Comprehensive warranty, on-site service option, dedicated account support.
- 1–3: Pre-order only or limited to select institutional buyers.
- 4–6: Available in select regions; purchase process complex.
- 7–9: Available via authorised distributors in major markets.
- 10: Global direct sale with immediate fulfilment and clear pricing.
5. Developer Experience
9% of final score. Robots are platforms, not appliances. The richness of the software ecosystem around a robot determines how quickly researchers and engineers can build new capabilities on top of it. This category carries the lowest weight because most end-users will not write code — but for the research and developer community it is a crucial signal. Robots with strong developer ecosystems compound in value over time as the community contributes new skills and tools.
- 1–3: Proprietary, closed ecosystem with no public SDK.
- 4–6: Partial SDK; some features locked behind proprietary layers.
- 7–9: Full ROS2 SDK with active maintenance; simulation environment available.
- 10: Fully open SDK, ROS2 + custom APIs, active releases, simulation + hardware in the loop.
- 1–3: Sparse or no documentation; no public community presence.
- 4–6: Basic documentation available; small community with infrequent activity.
- 7–9: Comprehensive docs, active GitHub, recurring community contributions.
- 10: World-class documentation, large open-source community, research paper ecosystem.
How a robot vacuum's overall score is calculated
Robot vacuums are scored across 5 categories designed around the only thing that ultimately matters: a clean floor with the least possible human intervention. Cleaning power and the autonomy of the navigation system together drive 55% of the final score, with ease of use, build quality and value filling out the rest.
1. Cleaning Performance
30% of final score. The actual cleaning result: suction strength, mopping capability, and how long the robot can keep going before the dustbin or water tank becomes the bottleneck.
- 1–3: Under 2,000 Pa — only adequate for very light surface dust.
- 4–6: 2,000–4,000 Pa — solid hard-floor cleaning, weak on rugs.
- 7–9: 4,000–7,000 Pa — strong all-rounder, handles most carpets.
- 10: 7,000+ Pa — class-leading deep-clean performance.
- 1–3: No mopping function or static drag-pad only.
- 4–6: Basic vibrating or rotating mop, manual pad cleaning.
- 7–9: Auto-wash mop pad with downward pressure.
- 10: Hot-water mop wash + auto-refill water tank at the dock.
- 1–3: Small bin (<300 ml), no auto-empty.
- 4–6: 300–500 ml bin, no auto-empty.
- 7–9: Auto-empty dock with 30–45 day cycle.
- 10: Auto-empty + 60-day bag cycle and large 2L+ dock.
2. Navigation & Mapping
25% of final score. How smartly the robot moves around the home. Better mapping tech and obstacle avoidance translate directly into fewer missed spots and fewer cables eaten by mistake.
- 1–3: Random / bump navigation, no persistent map.
- 4–6: Gyroscope or single-camera mapping, single floor only.
- 7–9: LiDAR with multi-floor maps and saved zones.
- 10: LiDAR + reactive 3D / structured-light fusion, real-time updates.
- 1–3: Bumper-only — collides then turns.
- 4–6: IR or single-camera — avoids large objects.
- 7–9: Structured-light or AI vision identifying common categories.
- 10: Multi-class AI vision (cables, pet mess, socks, toys) with learning over time.
- 1–3: Manual placement on dock or unreliable return.
- 4–6: Reliable auto-return for charging only.
- 7–9: Auto-dock with auto-empty bin or auto-mop wash.
- 10: All-in-one dock: auto-empty + mop wash + water refill + dry.
3. Ease of Use
20% of final score. How frictionless the robot is to live with day to day — app polish, smart-home integration, and how rarely you need to actually touch it.
4. Build Quality
15% of final score. How long the robot will keep working. Anti-tangle rollers, replaceable parts, and the manufacturer's after-sales commitment.
5. Value for Money
10% of final score. What you actually get per dollar, plus how easy it is to buy. A genuinely competitive robot at $400 can score better than a flagship at $1,500.
How a robot lawn mower's overall score is calculated
Robot lawn mowers are scored across 5 categories that combine cutting capability with the realities of an outdoor product: weather, slopes, theft and a multi-hour install. Cutting and navigation/safety together account for 55% of the final score.
1. Cutting Performance
30% of final score. Lawn size handled, what slopes the mower can climb, and how clean the cut is week over week.
- 1–3: Up to 300 m² — small urban gardens only.
- 4–6: 300–800 m² — average suburban garden.
- 7–9: 800–1,500 m² — large garden capacity.
- 10: 1,500+ m² — estate / commercial-grade.
- 1–3: Under 20% slope — flat lawns only.
- 4–6: 20–30% — gentle slopes.
- 7–9: 30–45% — moderate hills.
- 10: 45%+ — steep terrain, four-wheel-drive class.
2. Navigation & Safety
25% of final score. How the mower stays inside your lawn, doesn't get stolen, and avoids running in conditions it shouldn't.
- 1–3: Boundary wire only, no GPS.
- 4–6: Boundary wire + basic GPS tracking.
- 7–9: Wireless GPS or RTK system, no wire required.
- 10: Multi-sensor (RTK + vision) wireless boundary with cm-level accuracy.
3. Ease of Use
20% of final score. How painful the install is, how good the app is, and how easy it is to set a weekly schedule and forget about it.
- 1–3: 4+ hours of wire installation required.
- 4–6: 2–4 hours with assistance.
- 7–9: Under 1 hour, wire-free.
- 10: 15–30 min plug-and-play with auto-mapping.
4. Build Quality
15% of final score. Outdoor robots have to survive years of rain, sun, dirt and the occasional kick. This category measures how prepared they are.
5. Value for Money
10% of final score. Cost benchmarked against capability and ease of acquisition.
How a window-cleaning robot's overall score is calculated
Window cleaners are scored across 5 categories with safety treated as a first-class concern: a robot that falls off a third-floor window is a failure, no matter how well it cleans. Cleaning quality plus safety/reliability together account for 60% of the final score.
1. Cleaning Performance
35% of final score. How well the robot actually cleans glass: pattern coverage, water and detergent control, and edge / frame detection that prevents falls.
2. Safety & Reliability
25% of final score. Whether the robot stays on the window when something goes wrong — power loss, slippery glass or a snagged cable.
3. Ease of Use
20% of final score. How quickly you can attach it, start a clean and choose between framed and frameless glass modes.
4. Build Quality
15% of final score. Materials, motor longevity and warranty backing — a piece of glass-mounted hardware needs to keep working for years.
5. Value for Money
5% of final score. Window cleaners are a niche category, so price weighs less than safety and clean quality — but it still matters.
How a robotic pool cleaner's overall score is calculated
Pool cleaners are scored across 5 categories that focus on full coverage of the pool surface, the efficiency of the cycle, and survival in chlorinated water for years. Cleaning coverage plus efficiency account for 60% of the final score.
1. Cleaning Coverage
35% of final score. Whether the robot reaches every surface that matters — floor, walls and the waterline ring — and how well it captures fine debris.
2. Efficiency & Performance
25% of final score. How long a full cycle takes, how good the filtration is, and whether the navigation is methodical or random.
3. Ease of Use
20% of final score. How easy it is to drop in, schedule, and clean afterwards. Pool cleaners live a hard life, so the human side has to be effortless.
4. Build Quality
15% of final score. Years of submersion in chlorinated water destroy bad designs fast. Materials, sealing and warranty are critical.
5. Value for Money
5% of final score. Pool robots are a long-term purchase, so value is weighted lightly: pure capability and durability dominate.
How a companion robot's overall score is calculated
Companion robots are scored across 5 categories where conversation quality and social interaction together carry the most weight. A companion robot that can't hold a meaningful conversation or read a room is missing the whole point — even if the hardware is excellent.
1. Conversation Quality
25% of final score. Whether you can actually talk with the robot, how natural the responses feel, and whether it remembers anything from previous chats.
- 1–3: Pre-canned responses, simple keyword matching.
- 4–6: Basic NLU with limited intent detection.
- 7–9: Cloud LLM integration (GPT-class) with multi-turn dialog.
- 10: State-of-the-art LLM with personalised tone and on-device fallback.
2. Social Interaction
25% of final score. How the robot reads the user's mood and expresses itself through movement, sound and animation.
3. Ease of Use
20% of final score. How quickly the robot is up and running, how good its app is, and how often the manufacturer pushes meaningful updates.
4. Build Quality
15% of final score. Materials, finish, and battery life — companion robots live on a desk for years.
5. Value for Money
15% of final score. Companion robots span a wide price range; capability per dollar matters more here than in industrial categories.
How a robot pet's overall score is calculated
Robot pets live or die on how lifelike they feel. Behaviour realism gets the heaviest weight (30%), with responsiveness and the rest of the categories filling out the framework. Build quality matters because pets are picked up and dropped, and battery life because pets that "sleep" all day get boring fast.
1. Behaviour Realism
30% of final score. Movement quality, expressiveness and personality depth — the three things a person can sense within minutes of unboxing.
- 1–3: Stiff, mechanical movement, very few gaits.
- 4–6: Multiple gaits, occasionally robotic.
- 7–9: Smooth multi-gait locomotion with playful behaviour.
- 10: Indistinguishable from a living pet at a glance, with dynamic recovery from stumbles.
2. Responsiveness & AI
25% of final score. How well the pet reacts to touch, voice and the long-term changes in its environment.
3. Ease of Use
15% of final score. How quickly the pet "wakes up" the first time, and how good the companion app is for daily play.
4. Build Quality
15% of final score. Robot pets get picked up, dropped, hugged and rolled — durability and battery life decide whether they stay loved or end up in a drawer.
5. Value for Money
15% of final score. Robot pets range from $150 toys to $3,000 collectibles, so price relative to capability really matters.
Editorial independence
RobotTesters is an independent publication. No manufacturer pays for scores, rankings or featured placements. All scores are assigned by our editorial team based on publicly available specifications, hands-on testing where possible, and cross-referenced technical documentation.
Affiliate links (where present) help fund the site but have zero influence on scores. If you believe a score is incorrect or outdated, please reach out at info@robottesters.com — we welcome corrections.