A robust robotic manipulation strategy combining Computer Vision and Force Feedback using the Universal Robots UR7e.
In the field of industrial automation, the "Peg-in-Hole" problem is a classic benchmark for manipulation capability. While humans perform this task using intuition and high-fidelity tactile feedback, robots struggle with the tight tolerances involved. Our project focuses specifically on inserting a USB-C connector—a task that requires sub-millimeter precision (±0.5mm) and delicate force control to avoid damaging the small, reversible port.
We developed a hybrid approach combining Eye-in-Hand visual servoing for coarse alignment with a force-feedback spiral search algorithm for fine insertion. The system uses a monocular camera mounted on the robot's end-effector to locate the USB-C port, then transitions to a "blind" tactile search phase that compensates for calibration errors and ensures safe, successful insertion.
By solving this precision manipulation challenge with low-cost sensors (a webcam and the UR7e's built-in force sensor), we demonstrate that sophisticated insertion tasks don't require expensive force/torque sensors or sub-millimeter positioning systems. This approach can be adapted for automated testing, electronics assembly, and any application requiring reliable connector mating in industrial environments.
The primary objective was to develop a "blind-capable" insertion system: a robot that locates the USB-C port visually, then completes the insertion by touch alone, even under positioning and calibration uncertainty.
Standard industrial robots rely on perfect calibration and fixed fixtures. If a part moves by 2mm, a standard robot will crash. Our project solves the Unstructured Environment problem. By combining vision (for "getting close") and force (for "getting it right"), we created a system that works even if the charging port is moved or placed under different lighting conditions.
Our approach to creating a precision USB-C insertion system
To achieve a successful insertion, our system had to meet the following criteria:
Our final design operates in four stages: visual alignment, a Cartesian dive toward the port, a blind hand-eye shift, and a force-guided spiral search.
We originally assumed the transform between the camera and end-effector was fixed, which would let us align the camera over the port and move straight down to insert. However, we eventually realized that even the slightest error in camera mounting causes significant misalignment at the tool tip, so we added camera calibration to reduce error at each step.
To achieve higher accuracy, we revised our scope: rather than picking up a cable, we used a rigid, fixed tool. This removed the complexity of cable detection and pick-up, allowing us to focus on the insertion logic and execution. Ultimately, we pivoted substantially from our initial proposal (after discussion with our assigned TA) to prioritize robust core functionality for the USB-C twist on the traditional peg-insertion problem. We also switched from joint-space planning to Cartesian path planning to ensure the camera remained perpendicular to the workspace.
Our system was designed not just for a lab demo, but to model how robust assembly lines function under uncertainty. We focused on bridging the gap between "computer vision" and "mechanical reality."
Design Choice: We handed off the "last millimeter" of insertion to a tactile force-guided spiral search rather than relying on perfect vision.
Real-World Value: In factory environments, lighting is inconsistent and metallic parts (like USB-C ports) cause specular reflections that blind depth sensors. By decoupling success from perfect visibility, our approach mirrors cost-effective industrial automation where "compliance" (tactile sensing) is prioritized over expensive, infinite-resolution cameras.
Design Choice: We enforced strictly linear Cartesian paths rather than the curved, energy-efficient arcs of joint-space interpolation.
Real-World Value: In collaborative workspaces (Cobots), predictability is safety. Operators can intuitively anticipate a linear path, whereas joint-space arcs can swing unpredictably into human zones. Linearity also ensures sensors remain perfectly aligned with the workspace, eliminating the "perspective drift" common in dynamic environments.
Design Choice: We utilized a custom 18-point hand-eye calibration routine to solve for physical offsets, rather than relying on fixed CAD assumptions.
Real-World Value: This approach drastically lowers the "barrier to entry" for hardware. It demonstrates that high-precision tasks do not require expensive, perfect manufacturing. By using software to mathematically correct for physical imperfections (like loose mounts or 3D printing tolerances), manufacturers can achieve sub-millimeter accuracy using cheaper, off-the-shelf components.
End Effector with USB-C Attachment
Baseplate with Bigger Holes & Tolerance USB-C
To Spec USB-C Baseplate
We opted for a Fixed-Tool, Eye-in-Hand configuration.
The core perception logic relies on a robust OpenCV pipeline encapsulated within the
CalibrationNode. We utilize cv2.findChessboardCorners to identify the
grid pattern in the raw video feed, followed immediately by cv2.cornerSubPix to
refine these detections to sub-pixel accuracy. This refinement step is crucial for high-precision
insertion tasks, as it minimizes quantization error before any geometric calculations occur.
Once the 2D pixel coordinates are locked, cv2.solvePnP matches them against the
known 3D geometry of the chessboard (obj_points).
This algorithm solves the Perspective-n-Point problem, outputting the rotation and translation
vectors that define exactly where the chessboard sits relative to the camera's optical center.
To make this spatial data actionable for the robot, we perform a rigorous coordinate transformation
sequence. The raw output from the vision pipeline is in the camera's optical frame, which is distinct
from the robot's coordinate system. The code converts the rotation vectors into quaternions using
scipy.spatial.transform and leverages the ROS 2 tf2 buffer to transform
the pose from the camera_link frame to the base_link frame.
This ensures that the detection is not just relative to the sensor, but rooted in the robot's fixed
base frame, allowing for consistent motion planning regardless of how the camera is mounted or moved.
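The rotation-vector-to-quaternion step can be sketched with scipy (the frame transform itself is left to tf2 in the actual node; this helper name is ours, for illustration):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def rvec_to_quaternion(rvec):
    """Convert an OpenCV Rodrigues rotation vector to an (x, y, z, w) quaternion."""
    return Rotation.from_rotvec(np.asarray(rvec, dtype=float).reshape(3)).as_quat()

# Example: a 180-degree rotation about the camera's optical (z) axis
quat = rvec_to_quaternion([0.0, 0.0, np.pi])
# quat ≈ [0, 0, 1, 0]
```

The resulting quaternion is what gets packed into a `geometry_msgs/PoseStamped` before the tf2 buffer re-expresses it in `base_link`.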
Finally, the node functions as a managed state machine rather than a continuous stream, ensuring
system stability. It utilizes a ROS service (/calibrate_height) to trigger detection
only when the robot is static, preventing motion blur from corrupting the measurement. The integration
of the PinholeCameraModel validates the intrinsics (focal length and principal point),
allowing us to project rays from 2D pixels (u, v) into 3D space.
Visual debugging tools embedded in the code—drawing the principal point and board center—allow
operators to visually verify that the physical camera matches the mathematical model before
calibration data is published.
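The pixel-to-ray projection that PinholeCameraModel provides follows the standard pinhole model. A minimal sketch, with placeholder intrinsics (fx, fy, cx, cy here are illustrative values, not our calibrated ones):

```python
import numpy as np

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Back-project a pixel (u, v) to a unit ray in the camera's optical frame."""
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    return ray / np.linalg.norm(ray)

# A pixel at the principal point projects straight along the optical axis
ray = pixel_to_ray(320.0, 240.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
# ray == [0, 0, 1]
```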
def process_frame(image):
    # 1. Detection: find the chessboard grid in the 2D image
    found, corners_2d = cv2.findChessboardCorners(image, board_size)
    if found:
        # 2. Refinement: improve accuracy to sub-pixel level
        refined_corners = cv2.cornerSubPix(image, corners_2d)
        # 3. PnP solver: match 2D pixels to known 3D geometry
        #    Returns position/rotation of the board relative to the camera
        pose_in_camera = cv2.solvePnP(known_3d_points, refined_corners)
        # 4. Transformation: convert from camera_link to base_link
        #    Crucial for motion planning
        pose_in_base = tf_buffer.transform(pose_in_camera,
                                           source_frame="camera_link",
                                           target_frame="base_link")
        publish(pose_in_base)
This node acts as the system's "Brain," using a Finite State Machine to command the UR7e via MoveIt. It orchestrates the transition from visual alignment to tactile insertion.
When alignment is complete, the node calls the /spiral_search service, handing off control to force feedback.

We prioritize Cartesian Path Planning over joint interpolation. By enforcing strictly linear motion, we prevent the arc-induced "perspective drift" common in joint moves. This ensures the camera remains perpendicular to the target throughout the descent, validating our coordinate transforms.
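As a rough sketch of what strictly linear Cartesian motion means here, the descent can be discretized into evenly spaced straight-line waypoints with the orientation held fixed. This is a simplified stand-in for MoveIt's Cartesian path computation; the function name and step size are illustrative:

```python
import numpy as np

def linear_waypoints(start, goal, step=0.005):
    """Evenly spaced waypoints on the straight segment start -> goal (meters).

    Orientation is held constant, so a downward-facing camera stays
    perpendicular to the workspace for the whole move.
    """
    start, goal = np.asarray(start, float), np.asarray(goal, float)
    dist = np.linalg.norm(goal - start)
    n = max(int(np.ceil(dist / step)), 1)
    return [start + (goal - start) * i / n for i in range(n + 1)]

wps = linear_waypoints([0.4, 0.0, 0.30], [0.4, 0.0, 0.18], step=0.005)
# 25 waypoints spaced 5 mm apart along -Z
```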
The SHIFT state handles the critical hand-eye coordination. The code calculates the
real-time inverse transform between camera_link and tool0, determining
exactly how to position the gripper where the camera lens just was. This maneuver relies entirely
on the accuracy of our calibration node, as the target becomes occluded by the robot itself.
state = "ALIGN"
while node_is_running:
    if state == "ALIGN":
        # Closed-loop visual servoing
        error_vector = get_usb_c_position()
        if magnitude(error_vector) < 0.001:  # within 1 mm
            state = "DIVE"
        else:
            # Move linearly to correct the error
            move_cartesian(dx=error_vector.x, dy=error_vector.y)
    elif state == "DIVE":
        # Move straight down 12 cm, maintaining the X/Y lock
        move_cartesian(dz=-0.12)
        state = "SHIFT"
    elif state == "SHIFT":
        # The "blind" hand-eye swap: calculate the physical offset
        # between the camera lens and the gripper tip
        offset = get_transform(source="camera_link", target="tool0")
        move_cartesian(offset)
        state = "SPIRAL"
    elif state == "SPIRAL":
        # Hand off control to the force feedback system
        trigger_service("/spiral_search")
        stop_cartesian_control()
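At its core, the SHIFT maneuver is a composition of homogeneous transforms: the tool tip must move to the pose the camera lens occupied during alignment. A minimal numpy sketch (identity rotations and made-up offsets, purely for illustration):

```python
import numpy as np

def make_tf(translation):
    """4x4 homogeneous transform with identity rotation (enough for this sketch)."""
    T = np.eye(4)
    T[:3, 3] = translation
    return T

# Pose of the camera in the base frame (from visual alignment over the port)
T_base_cam = make_tf([0.40, 0.10, 0.20])
# Fixed lens-to-tool-tip offset recovered by hand-eye calibration
T_cam_tool = make_tf([0.03, -0.05, 0.00])

# Where the tool tip must go so it occupies the spot the lens just observed
T_base_tool = T_base_cam @ T_cam_tool
target_xyz = T_base_tool[:3, 3]
# target_xyz == [0.43, 0.05, 0.20]
```

In the real node the rotations are not identity, which is exactly why the calibrated transform (rather than a CAD assumption) matters.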
While the Calibration Node handles static alignment, the CV node is responsible for
real-time target detection. Because the robot may approach the USB-C port from varying heights
(changing the apparent size of the port), standard template matching is insufficient.
To solve this, we implemented a Multi-Scale Template Matching pipeline. The algorithm
iteratively resizes the template across a defined range (0.1% to 1.0% of the image area) to ensure
scale invariance. We utilize Normalized Cross-Correlation (NCC)
(cv2.TM_CCOEFF_NORMED) as our matching metric, which makes the system robust to
global lighting changes—a critical feature for detecting metallic ports that reflect light unpredictably.
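To see why zero-mean NCC shrugs off global lighting changes, here is a minimal numpy sketch of the metric (independent of OpenCV; the `ncc` helper is our illustration, not part of the pipeline):

```python
import numpy as np

def ncc(patch, template):
    """Zero-mean normalized cross-correlation, as in cv2.TM_CCOEFF_NORMED."""
    p = patch - patch.mean()
    t = template - template.mean()
    return float((p * t).sum() / (np.linalg.norm(p) * np.linalg.norm(t)))

rng = np.random.default_rng(0)
template = rng.random((8, 8))
patch = template.copy()

# A global brightness shift (e.g. glare lifting the whole patch uniformly)
# leaves the score untouched because the means are subtracted out.
assert np.isclose(ncc(patch, template), 1.0)
assert np.isclose(ncc(patch + 50.0, template), 1.0)
```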
Figure C.1: Correlation Heatmap. Red areas indicate high probability matches (NCC > 0.8), while blue areas indicate low correlation.
Finally, to prevent false positives from cluttering the data, we apply Non-Maximum Suppression
(NMS).
This filters out overlapping bounding boxes, ensuring that only the single strongest detection candidate
is published to the /usbc_pose topic.
def find_usbc_port(image, template):
    all_matches = []
    # 1. Multi-scale search: iterate the template across the configured scale range
    for scale in linspace(min_scale, max_scale, num=20):
        # Resize the template to the current scale
        resized_template = cv2.resize(template, None, fx=scale, fy=scale)
        # 2. Normalized cross-correlation: robust to brightness differences
        heatmap = cv2.matchTemplate(image, resized_template, cv2.TM_CCOEFF_NORMED)
        # 3. Thresholding: keep only confident candidates
        for loc in where(heatmap > 0.8):
            all_matches.append((loc, heatmap[loc], scale))
    # 4. Non-maximum suppression (NMS): remove overlapping boxes,
    #    keep only the highest-scoring detection
    best_match = non_max_suppression(all_matches, overlap_thresh=0.3)
    return best_match.center_x, best_match.center_y
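The non-maximum suppression step in the pseudocode above can be sketched as greedy IoU suppression. This is a minimal numpy version for illustration, not the exact code from the node:

```python
import numpy as np

def non_max_suppression(boxes, scores, overlap_thresh=0.3):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much.

    boxes are [x1, y1, x2, y2]; returns indices of surviving boxes.
    """
    boxes = np.asarray(boxes, float)
    order = np.argsort(scores)[::-1]  # best score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        # Intersection of box i with each remaining candidate
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                 (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        # Keep only candidates that do not overlap the winner too heavily
        order = order[1:][iou <= overlap_thresh]
    return keep

# Two near-duplicate detections and one distinct one -> two survivors
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
print(non_max_suppression(boxes, scores=[0.9, 0.8, 0.7]))  # [0, 2]
```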
Standard ROS topics (and our attempted use of a load cell) have too much latency (50-100ms) for closed-loop force control during insertion. We solved this by injecting URScript directly into the controller to toggle force mode on the UR7e.
# Snippet of our URScript injection
def _perform_spiral_search(self):
    ur_script = """
    force_mode(p[0,0,0,0,0,0], [0,0,1,0,0,0], [0,0,-5.0,0,0,0], 2, ...)
    while step_count < max_steps:
        # Move in an expanding spiral (0.5mm steps)
        # Check Z-height
        if drop > 0.003:  # 3mm drop detected
            textmsg("HOLE FOUND!")
            break
        end
    end
    """
    self.script_publisher.publish(ur_script)
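For intuition, the expanding spiral commanded by the URScript can be previewed offline as an Archimedean spiral. A small Python sketch (parameters are illustrative, not the exact values from our script):

```python
import math

def spiral_waypoints(step_mm=0.5, revolutions=6, points_per_rev=16):
    """XY offsets (in mm) of an Archimedean spiral growing step_mm per revolution.

    The search starts at the estimated hole center and sweeps outward, so every
    point inside the final radius is passed within roughly step_mm.
    """
    pts = []
    for i in range(revolutions * points_per_rev + 1):
        theta = 2 * math.pi * i / points_per_rev
        r = step_mm * theta / (2 * math.pi)  # radius grows linearly with angle
        pts.append((r * math.cos(theta), r * math.sin(theta)))
    return pts

pts = spiral_waypoints()
# starts at the origin and ends 3 mm out after 6 revolutions
```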
We consolidated all of our services, the MoveIt configuration, and camera initialization into launch files for easier management and reproducibility.
Alignment Precision: The visual servoing consistently brought the tool tip within ±1mm of the port center.
Insertion Success: In final trials, the system achieved an 80% success rate with <5 mm localization error pre-insertion. Failures were typically due to calibration drift causing the spiral search to start too far from the hole.
Detail: The "Wiggle" (Spiral Search) in action.
Debug: Computer Vision detecting the board.
We successfully engineered a robotic system capable of sub-millimeter insertion without expensive hardware. The project demonstrated that Hybrid Control (Vision + Force) is far more reliable than Vision alone: the visual system gets us "in the ballpark," and the tactile system "scores the run." Overall, we are content with our work. Within our adjusted scope (focusing only on USB-C insertion), we accomplished our intended outcomes, achieving an 80% success rate with <5mm localization error, while leaving extensive room to take the project further.
Challenges:
Flawed transform logic (live demo code).
Fixed and 100% accurate transform logic.
Future Work:
Role: Computer Vision & Control
Gabriel is a senior in EECS interested in deep learning and computer vision. He developed the OpenCV pipeline for chessboard calibration, port detection, and general movement.
Role: Hardware & Force Mode
Larry is a senior in MechE interested in optimization, optimal control theory, and multi-agent systems. He designed all of the project's hardware, including the 3D-printed end-effector, and pioneered the force-related components of PAIR.
Role: Search & Transforms
Erik is a senior in EECS interested in robotics, particularly in perception and sensing applications. He primarily handled the spiral search injection script as well as the coordinate transforms.
Our team worked collaboratively through the entirety of the project; although each member took the lead on different components, everyone was involved in every aspect of the project as a whole.
Below are links to our source code, launch files, and CAD files.