Autonomous USB-C Insertion

A robust robotic manipulation strategy combining Computer Vision and Force Feedback using the Universal Robots UR7e.

Watch the Demo

1. Introduction

The Problem

In the field of industrial automation, the "Peg-in-Hole" problem is a classic benchmark for manipulation capability. While humans perform this task using intuition and high-fidelity tactile feedback, robots struggle with the tight tolerances involved. Our project focuses specifically on inserting a USB-C connector—a task that requires sub-millimeter precision (±0.5mm) and delicate force control to avoid damaging the small, reversible port.

Our Solution

We developed a hybrid approach combining Eye-in-Hand visual servoing for coarse alignment with a force-feedback spiral search algorithm for fine insertion. The system uses a monocular camera mounted on the robot's end-effector to locate the USB-C port, then transitions to a "blind" tactile search phase that compensates for calibration errors and ensures safe, successful insertion.

The Impact

By solving this precision manipulation challenge with low-cost sensors (a webcam and the UR7e's built-in force sensor), we demonstrate that sophisticated insertion tasks don't require expensive force/torque sensors or sub-millimeter positioning systems. This approach can be adapted for automated testing, electronics assembly, and any application requiring reliable connector mating in industrial environments.

Project Goals

The primary objective was to develop a "blind-capable" insertion system. We aimed to build a robotic system that could:

  1. Visually identify a USB-C port on a flat surface using a monocular camera.
  2. Align a robot arm (UR7e) with the target using visual servoing.
  3. Execute a tactile "spiral search" maneuver to compensate for visual calibration errors.
  4. Insert the connector without exceeding a safety force threshold of 10N.

Why is this interesting?

Standard industrial robots rely on perfect calibration and fixed fixtures. If a part moves by 2mm, a standard robot will crash. Our project solves the Unstructured Environment problem. By combining vision (for "getting close") and force (for "getting it right"), we created a system that works even if the charging port is moved or placed under different lighting conditions.

Real-World Applications

2. Design

Our approach to creating a precision USB-C insertion system

Design Criteria

To achieve a successful insertion, our system had to meet the following criteria:

Final Design:

Our final design has 4 stages:

  1. Calibration: First, move the robot to 18 unique calibration positions and measure the chessboard pose at each. Using the chessboard poses in the camera's frame and the end-effector positions (from joint states), calculate the end-effector-to-camera transform.
  2. Blind Shift: Move the end-effector down to a pre-align height and align the camera directly with the chessboard. That way, as long as the USB-C port is aligned with the chessboard, we know how to insert.
  3. Visual Shift and Dive: Use the camera to detect the USB-C port and iteratively align the camera (not the end-effector) above it, then move down to the insertion point.
  4. Force-Guided Spiral Search: Transform so that the end-effector (instead of the camera) is aligned with the port. Engage force mode and perform a spiral search pattern until the hole is found, then insert.
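The calibration stage above reduces to chaining homogeneous transforms: once the end-effector-to-camera transform is known, a chessboard (or port) detection in the camera frame can be rooted in the robot's base frame. A minimal NumPy sketch, where the numeric offsets are hypothetical and identity rotations are used for clarity (the real routine solves for the hand-eye transform from the 18 measurements):

```python
import numpy as np

def make_T(R, t):
    """Build a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical example transforms (identity rotations for clarity):
# base <- end-effector, from the robot's joint states
T_base_ee = make_T(np.eye(3), [0.4, 0.0, 0.3])
# end-effector <- camera, the result of hand-eye calibration
T_ee_cam = make_T(np.eye(3), [0.0, 0.05, 0.02])
# camera <- target, e.g. from solving the chessboard pose
T_cam_target = make_T(np.eye(3), [0.01, -0.02, 0.15])

# Chaining the transforms roots the detection in the base frame
T_base_target = T_base_ee @ T_ee_cam @ T_cam_target
# With identity rotations the translations simply add:
# T_base_target[:3, 3] ≈ [0.41, 0.03, 0.47]
```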

Design Trade-offs

We originally assumed the transform between the camera and end effector was fixed; that way, we could align the camera and move straight down to insert into the port. However, we eventually realized that even the slightest error in camera mounting would cause significant misalignment at the tool tip. Thus, we used camera calibration to reduce error at each step.

To achieve higher accuracy, we revised our scope: instead of picking up a cable, we used a rigidly fixed tool. This eliminated the complexity of cable detection and pickup, allowing us to focus on the insertion logic and execution. Ultimately, we pivoted substantially from our initial proposal (following conversations with our assigned TA) to prioritize robust core functionality for the USB-C twist on the traditional peg-in-hole problem. We also switched from joint-space planning to Cartesian path planning to ensure the camera remained perpendicular to the workspace.

Impact of Design Choices: Real-World Implications

Our system was designed not just for a lab demo, but to model how robust assembly lines function under uncertainty. We focused on bridging the gap between "computer vision" and "mechanical reality."

1. Hybrid Strategy: Mimicking Industrial "Blind" Assembly

Design Choice: We handed off the "last millimeter" of insertion to a tactile force-guided spiral search rather than relying on perfect vision.

Real-World Value: In factory environments, lighting is inconsistent and metallic parts (like USB-C ports) cause specular reflections that blind depth sensors. By decoupling success from perfect visibility, our approach mirrors cost-effective industrial automation where "compliance" (tactile sensing) is prioritized over expensive, infinite-resolution cameras.

2. Cartesian Planning: Enhancing Cobot Safety

Design Choice: We enforced strictly linear Cartesian paths rather than the curved, energy-efficient arcs of joint-space interpolation.

Real-World Value: In collaborative workspaces (Cobots), predictability is safety. Operators can intuitively anticipate a linear path, whereas joint-space arcs can swing unpredictably into human zones. Linearity also ensures sensors remain perfectly aligned with the workspace, eliminating the "perspective drift" common in dynamic environments.

3. Software-Compensated Hardware (Calibration)

Design Choice: We utilized a custom 18-point hand-eye calibration routine to solve for physical offsets, rather than relying on fixed CAD assumptions.

Real-World Value: This approach drastically lowers the "barrier to entry" for hardware. It demonstrates that high-precision tasks do not require expensive, perfect manufacturing. By using software to mathematically correct for physical imperfections (like loose mounts or 3D printing tolerances), manufacturers can achieve sub-millimeter accuracy using cheaper, off-the-shelf components.

3. Implementation

Hardware Setup

End Effector Closeup

CAD Models

End Effector with USB-C Attachment

Baseplate with Bigger Holes & Tolerance USB-C

To Spec USB-C Baseplate

Mechanical Architecture

We opted for a Fixed-Tool, Eye-in-Hand configuration.

Flowchart Diagram Files

Software A. Perception: The 'Calibration' Node

The core perception logic relies on a robust OpenCV pipeline encapsulated within the CalibrationNode. We utilize cv2.findChessboardCorners to identify the grid pattern in the raw video feed, followed immediately by cv2.cornerSubPix to refine these detections to sub-pixel accuracy. This refinement step is crucial for high-precision insertion tasks, as it minimizes quantization error before any geometric calculations occur. Once the 2D pixel coordinates are locked, cv2.solvePnP matches them against the known 3D geometry of the chessboard (obj_points). This algorithm solves the Perspective-n-Point problem, outputting the rotation and translation vectors that define exactly where the chessboard sits relative to the camera's optical center.

To make this spatial data actionable for the robot, we perform a rigorous coordinate transformation sequence. The raw output from the vision pipeline is in the camera's optical frame, which is distinct from the robot's coordinate system. The code converts the rotation vectors into quaternions using scipy.spatial.transform and leverages the ROS 2 tf2 buffer to transform the pose from the camera_link frame to the base_link frame. This ensures that the detection is not just relative to the sensor, but rooted in the robot's fixed base frame, allowing for consistent motion planning regardless of how the camera is mounted or moved.
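The rotation-vector-to-quaternion conversion that scipy.spatial.transform performs for us can be written out in plain NumPy. This is a sketch of the math, not the node's actual code:

```python
import numpy as np

def rvec_to_quaternion(rvec):
    """Convert an axis-angle rotation vector (the form cv2.solvePnP returns)
    to a quaternion (x, y, z, w), the convention ROS messages use."""
    rvec = np.asarray(rvec, dtype=float).ravel()
    angle = np.linalg.norm(rvec)       # rotation magnitude in radians
    if angle < 1e-12:
        return np.array([0.0, 0.0, 0.0, 1.0])  # identity rotation
    axis = rvec / angle                # unit rotation axis
    xyz = axis * np.sin(angle / 2.0)
    w = np.cos(angle / 2.0)
    return np.array([*xyz, w])

# 90-degree rotation about the z axis
q = rvec_to_quaternion([0.0, 0.0, np.pi / 2])
# q ≈ [0, 0, 0.7071, 0.7071]
```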

Finally, the node functions as a managed state machine rather than a continuous stream, ensuring system stability. It utilizes a ROS service (/calibrate_height) to trigger detection only when the robot is static, preventing motion blur from corrupting the measurement. The integration of the PinholeCameraModel validates the intrinsics (focal length and principal point), allowing us to project rays from 2D pixels (u, v) into 3D space. Visual debugging tools embedded in the code—drawing the principal point and board center—allow operators to visually verify that the physical camera matches the mathematical model before calibration data is published.

Files

Pseudocode:
def process_frame(image):
    # 1. Detection: Find grid pattern in 2D image
    found, corners_2d = cv2.findChessboardCorners(image)
    
    if found:
        # 2. Refinement: Improve accuracy to sub-pixel level
        refined_corners = cv2.cornerSubPix(corners_2d)
        
        # 3. PnP Solver: Match 2D pixels to known 3D geometry
        # Returns position/rotation of board relative to Camera
        pose_in_camera = cv2.solvePnP(refined_corners, known_3d_points)
        
        # 4. Transformation: Convert to Robot Base Frame
        # Crucial for motion planning
        pose_in_base = tf_buffer.transform(pose_in_camera, 
                                          from="camera_link", 
                                          to="base_link")
        
        publish(pose_in_base)

Software B. Motion Planning: The 'Move' Node

This node acts as the system's "Brain," using a Finite State Machine to command the UR7e via MoveIt. It orchestrates the transition from visual alignment to tactile insertion.

1. The Control Loop (FSM)

2. Cartesian Path Planning

We prioritize Cartesian Path Planning over joint interpolation. By enforcing strictly linear motion, we prevent the arc-induced "perspective drift" common in joint moves. This ensures the camera remains perpendicular to the target throughout the descent, validating our coordinate transforms.
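The straight-line descent amounts to dense linear interpolation between Cartesian waypoints, which is essentially what a Cartesian planner like MoveIt's consumes. A minimal sketch, with illustrative step size and poses (orientation handling omitted):

```python
import numpy as np

def linear_cartesian_waypoints(start, goal, step=0.005):
    """Interpolate a straight line between start and goal positions (xyz),
    spaced at most `step` meters apart. Orientation is held fixed, which is
    what keeps the camera perpendicular to the workspace during descent."""
    start, goal = np.asarray(start, float), np.asarray(goal, float)
    dist = np.linalg.norm(goal - start)
    n = max(int(np.ceil(dist / step)), 1)
    return [start + (goal - start) * i / n for i in range(n + 1)]

# 12 cm straight-line descent, waypoints every <= 5 mm
wps = linear_cartesian_waypoints([0.4, 0.0, 0.30], [0.4, 0.0, 0.18])
```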

3. The "Shift"

The SHIFT state handles the critical hand-eye coordination. The code calculates the real-time inverse transform between camera_link and tool0, determining exactly how to position the gripper where the camera lens just was. This maneuver relies entirely on the accuracy of our calibration node, as the target becomes occluded by the robot itself.

Pseudocode:
state = "ALIGN"

while node_is_running:
    if state == "ALIGN":
        # Closed-loop Visual Servoing
        error_vector = get_usb_c_position()
        if magnitude(error_vector) < 1.0mm:
            state = "DIVE"
        else:
            # Move linearly to correct error
            move_cartesian(dx=error_vector.x, dy=error_vector.y)

    elif state == "DIVE":
        # Move straight down 12cm, maintaining X/Y lock
        move_cartesian(dz=-0.12)
        state = "SHIFT"

    elif state == "SHIFT":
        # The "Blind" Hand-Eye Swap
        # Calculate physical offset between camera lens and gripper tip
        offset = get_transform(from="camera_link", to="tool0")
        move_cartesian(offset) 
        state = "SPIRAL"

    elif state == "SPIRAL":
        # Hand off control to force feedback system
        trigger_service("/spiral_search")
        stop_cartesian_control()

Software C. Object Detection: The 'CV' Node

While the Calibration Node handles static alignment, the CV node is responsible for real-time target detection. Because the robot may approach the USB-C port from varying heights (changing the apparent size of the port), standard template matching is insufficient.

To solve this, we implemented a Multi-Scale Template Matching pipeline. The algorithm iteratively resizes the template across a defined range (0.1% to 1.0% of the image area) to ensure scale invariance. We utilize Normalized Cross-Correlation (NCC) (cv2.TM_CCOEFF_NORMED) as our matching metric, which makes the system robust to global lighting changes—a critical feature for detecting metallic ports that reflect light unpredictably.
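The brightness invariance of NCC is easy to verify in isolation. A small NumPy sketch of the score cv2.TM_CCOEFF_NORMED computes at a single sliding-window position (the 8x8 patch is arbitrary test data):

```python
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation between one image patch and a template:
    both are mean-subtracted, so global brightness and contrast cancel out."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p ** 2).sum() * (t ** 2).sum())
    return float((p * t).sum() / denom)

rng = np.random.default_rng(0)
template = rng.random((8, 8))

# A brighter, higher-contrast copy of the same patch still scores 1.0,
# which is why metallic glare shifts don't break the match
assert np.isclose(ncc(2.0 * template + 50.0, template), 1.0)
```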

Correlation Heatmap

Figure C.1: Correlation Heatmap. Red areas indicate high probability matches (NCC > 0.8), while blue areas indicate low correlation.

Finally, to prevent false positives from cluttering the data, we apply Non-Maximum Suppression (NMS). This filters out overlapping bounding boxes, ensuring that only the single strongest detection candidate is published to the /usbc_pose topic.

Pseudocode:
def find_usbc_port(image, template):
    all_matches = []

    # 1. Multi-Scale Search
    # Iterate scale from 0.1% to 1.0% of image size
    for scale in linspace(min_scale, max_scale, 20):
        
        # Resize template to current scale
        resized_template = cv2.resize(template, scale)
        
        # 2. Normalized Cross-Correlation
        # Robust against brightness differences
        heatmap = cv2.matchTemplate(image, resized_template, TM_CCOEFF_NORMED)
        
        # 3. Thresholding
        locations = where(heatmap > 0.8)
        all_matches.append((locations, heatmap[locations]))

    # 4. Non-Maximum Suppression (NMS)
    # Remove overlapping boxes, keep only the highest score
    best_match = non_max_suppression(all_matches, overlap_thresh=0.3)

    return best_match.center_x, best_match.center_y
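The non_max_suppression call above can be realized with a standard greedy IoU filter. A self-contained NumPy sketch; the box coordinates and scores are made up for illustration:

```python
import numpy as np

def non_max_suppression(boxes, scores, overlap_thresh=0.3):
    """Greedy NMS: keep the highest-scoring box, drop every box whose
    intersection-over-union (IoU) with it exceeds overlap_thresh, repeat."""
    boxes = np.asarray(boxes, float)  # each box: (x1, y1, x2, y2)
    order = np.argsort(scores)[::-1]  # indices, best score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        # IoU of the top box against the remaining candidates
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= overlap_thresh]
    return keep

# Two heavily overlapping detections plus one distinct one
boxes = [(10, 10, 30, 30), (12, 12, 32, 32), (100, 100, 120, 120)]
kept = non_max_suppression(boxes, scores=[0.9, 0.85, 0.7])
# kept == [0, 2]: the weaker overlapping box is suppressed
```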

Software D. Force Control: The 'SpiralSearch' Node

Standard ROS topics (and our attempted use of a load cell) have too much latency (50-100 ms) for closed-loop force control during insertion. We solved this by injecting URScript directly into the controller to toggle force mode on the UR7e.

The Algorithm

Pseudocode:
# Snippet of our URScript Injection
def _perform_spiral_search(self):
    ur_script = """
    force_mode(p[0,0,0,0,0,0], [0,0,1,0,0,0], [0,0,-5.0,0,0,0], 2, ...)
    while step_count < max_steps:
        # Move in expanding spiral (0.5mm steps)
        # Check Z-height
        if drop > 0.003: # 3mm drop detected
            textmsg("HOLE FOUND!")
            break
    """
    self.script_publisher.publish(ur_script)
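The expanding spiral itself is simple to generate offline. A sketch of an Archimedean spiral sampled in roughly 0.5 mm steps; the pitch and the 4 mm search radius are illustrative assumptions, not our tuned values:

```python
import numpy as np

def spiral_waypoints(step=0.0005, pitch=0.0005, max_radius=0.004):
    """XY offsets along an Archimedean spiral (r = pitch * theta / 2*pi),
    spaced roughly `step` meters apart out to max_radius. A force-mode loop
    would walk these offsets while pressing down in Z."""
    points, theta, r = [(0.0, 0.0)], 0.0, 0.0
    while r < max_radius:
        # Advance theta so the arc length between samples stays near `step`
        theta += step / max(r, pitch / (2 * np.pi))
        r = pitch * theta / (2 * np.pi)
        points.append((r * np.cos(theta), r * np.sin(theta)))
    return points

pts = spiral_waypoints()
radii = [np.hypot(x, y) for x, y in pts]
# Radii grow monotonically from 0 out past the 4 mm search boundary
```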

We put all of our services, the MoveIt configuration, and the camera initialization into launch files for easier management and reproducibility.

4. Results

Quantitative Performance

Alignment Precision: The visual servoing consistently brought the tool tip within ±1mm of the port center.

Insertion Success: In final trials, the system achieved an 80% success rate with <5 mm localization error pre-insertion. Failures were typically due to calibration drift causing the spiral search to start too far from the hole.

Video Demonstration

Detail: The "Wiggle" (Spiral Search) in action.

Debug: Computer Vision detecting the board.

5. Conclusion

We successfully engineered a robotic system capable of sub-millimeter insertion without expensive hardware. The project demonstrated that Hybrid Control (Vision + Force) is far more reliable than Vision alone: the visual system gets us "in the ballpark," and the tactile system "scores the run." Overall, we are quite content with our work. Within our adjusted scope (focusing only on USB-C insertion), we accomplished our intended target outcomes, achieving an 80% success rate with <5 mm localization error, while leaving extensive room to take the project further going forward.

Challenges & Improvements

Challenges:

Future Work:

Meet Our Team

Member 1

Gabriel Han

Role: Computer Vision & Control

Gabriel is a senior in EECS interested in deep learning and computer vision. He developed the OpenCV pipeline for chessboard calibration, port detection, and general movement.

Member 2

Larry Hui

Role: Hardware & Force Mode

Larry is a senior in MechE interested in optimization, optimal control theory, and multi-agent systems. He designed all of the project's hardware, including the 3D-printed end-effector, and pioneered the force-related components of PAIR.

Member 3

Erik Ma

Role: Search & Transforms

Erik is a senior in EECS interested in robotics, particularly in perception and sensing applications. He primarily handled the spiral search injection script as well as the coordinate transforms.

Our team worked collaboratively through the entirety of the project, and although each member took the lead of different components of the project, everyone was involved in every aspect of the project as a whole.

6. Additional Materials

Below are links to our source code and CAD files.

Please click the following links to view launch files, CAD files, etc.