Inverted Pendulum Graduation Project - Translation

Published: 2021-10-21 16:57:33

Multi-Agent Quadrotor Testbed Control Design:
Integral Sliding Mode vs. Reinforcement Learning

Steven L. Waslander, Gabriel M. Hoffmann, Ph.D. Candidates, Aeronautics and Astronautics, Stanford University, {stevenw, gabeh}@stanford.edu
Jung Soon Jang, Research Associate, Aeronautics and Astronautics, Stanford University, jsjang@stanford.edu
Claire J. Tomlin, Associate Professor, Aeronautics and Astronautics, Stanford University, tomlin@stanford.edu

Abstract—The Stanford Testbed of Autonomous Rotorcraft for Multi-Agent Control (STARMAC) is a multi-vehicle testbed currently comprised of two quadrotors, also called X4-flyers, with capacity for eight. This paper presents a comparison of control design techniques, specifically for outdoor altitude control, in and above ground effect, that accommodate the unique dynamics of the aircraft. Due to the complex airflow induced by the four interacting rotors, classical linear techniques failed to provide sufficient stability. Integral Sliding Mode and Reinforcement Learning control are presented as two design techniques for accommodating the nonlinear disturbances. Both methods result in greatly improved performance over classical control techniques.
I. INTRODUCTION

As first introduced by the authors in [1], the Stanford Testbed of Autonomous Rotorcraft for Multi-Agent Control (STARMAC) is an aerial platform intended to validate novel multi-vehicle control techniques and present real-world problems for further investigation. The base vehicle for STARMAC is a four-rotor aircraft with fixed-pitch blades, referred to as a quadrotor, or an X4-flyer. The vehicles are capable of 15 minute outdoor flights in a 100 m square area [1].

Fig. 1. One of the STARMAC quadrotors in action.

There have been numerous projects involving quadrotors to date, with the first known hover occurring in October 1922 [2]. Recent interest in the quadrotor concept has been sparked by commercial remote control versions, such as the DraganFlyer IV [3]. Many groups [4]–[7] have seen significant success in developing autonomous quadrotor vehicles. To date, however, STARMAC is the only operational multi-vehicle quadrotor platform capable of autonomous outdoor flight, without tethers or motion guides.
The first major milestone for STARMAC was autonomous hover control, with closed loop control of attitude, altitude and position. Using inertial sensing, the attitude of the aircraft is simple to control by applying small variations in the relative speeds of the blades. In fact, standard integral LQR techniques were applied to provide reliable attitude stability and tracking for the vehicle. Position control was also achieved with an integral LQR, designed carefully in order to ensure spectral separation of the successive loops.

Unfortunately, altitude control proves less straightforward. Many factors affect the altitude loop specifically that do not lend themselves to classical control techniques. Foremost is the highly nonlinear and destabilizing effect of the four interacting rotor downwashes. In our experience, this effect becomes critical when motion is not damped by motion guides or tethers. Empirical observation during manual flight revealed a noticeable loss in thrust upon descent through the highly turbulent flow field. Similar aerodynamic phenomena have been studied extensively for helicopters [8], but not for the quadrotor, due to its relative obscurity and complexity. Other factors that introduce disturbances into the altitude control loop include blade flex, ground effect and battery discharge dynamics. Although these effects are also present in generating attitude-controlling moments, the differential nature of the control input eliminates much of the absolute thrust disturbance that complicates altitude control. Additional complications arise from the limited choice of low cost, high resolution altitude sensors. An ultrasonic ranging device [9] was used, which suffers from non-Gaussian noise: false echoes and dropouts. The resulting raw data stream includes spikes and echoes that are difficult to mitigate, and are most successfully handled by rejection of infeasible measurements prior to Kalman filtering.
In order to accommodate this combination of noise and disturbances, two distinct approaches are adopted. Integral Sliding Mode (ISM) control [10]–[12] takes the approach that the disturbances cannot be modeled, and instead designs a control law that is guaranteed to be robust to disturbances as long as they do not exceed a certain magnitude. Model-based reinforcement learning [13] creates a dynamic model based on recorded inputs and responses, without any knowledge of the underlying dynamics, and then seeks an optimal control law using an optimization technique based on the learned model. This paper presents an exposition of both methods and contrasts the techniques from both a design and an implementation point of view.
II. SYSTEM DESCRIPTION
STARMAC consists of a fleet of quadrotors and a ground station. The system communicates over a Bluetooth Class 1 network. The core of each aircraft is a set of microcontroller circuit boards designed and assembled at Stanford for this project. The microcontrollers run real-time control code, interface with the sensors and the ground station, and supervise the system.
The aircraft are capable of sensing position, attitude, and proximity to the ground. The differential GPS receiver is the Trimble Lassen LP, operating on the L1 band and providing 1 Hz updates. The IMU is the MicroStrain 3DM-G, a low cost, light weight unit that delivers 76 Hz attitude, attitude rate, and acceleration readings. The distance from the ground is found using ultrasonic ranging at 12 Hz.
The ground station consists of a laptop computer, to interface with the aircraft, and a GPS receiver, to provide differential corrections. It also has a battery charger, and joysticks for control-augmented manual flight, when desired.

III. QUADROTOR DYNAMICS
The derivation of the nonlinear dynamics is performed in North-East-Down (NED) inertial and body-fixed coordinates. Let {e_N, e_E, e_D} denote the inertial axes, and {x_B, y_B, z_B} denote the body axes, as defined in Figure 2. Euler angles of the body axes are {φ, θ, ψ} with respect to the e_N, e_E and e_D axes, respectively, and are referred to as roll, pitch and yaw. Let r be defined as the position vector from the inertial origin to the vehicle center of gravity (CG), and let ω_B be defined as the angular velocity in the body frame. The current velocity direction is referred to as e_v in inertial coordinates.

Fig. 2. Free body diagram of a quadrotor aircraft.

The rotors, numbered 1–4, are mounted outboard on the x_B, y_B, −x_B and −y_B axes, respectively, with position vectors r_i with respect to the CG. Each rotor produces an aerodynamic torque, Q_i, and thrust, T_i, both parallel to the rotor's axis of rotation, and both used for vehicle control. Here,

$$T_i \approx \frac{k_t u_i}{1 + 0.1s}$$

where u_i is the voltage applied to the motors, as determined from a load cell test. In flight, T_i can vary greatly from this approximation. The torques, Q_i, are proportional to the rotor thrust, and are given by Q_i = k_r T_i.
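The first-order lag in the thrust model above is straightforward to simulate. A minimal sketch, assuming a hypothetical torque constant k_t and time step (the 0.1 s time constant comes from the transfer function above):

```python
import numpy as np

def thrust_step(T_prev, u, dt, k_t=1.0, tau=0.1):
    """One Euler step of T_i ~ k_t*u_i/(1 + tau*s), i.e. tau*dT/dt + T = k_t*u."""
    return T_prev + dt * (k_t * u - T_prev) / tau

# Example: thrust response to a voltage step from 0 to 10 V
T, dt = 0.0, 0.01
for _ in range(100):  # 1 second of simulated time
    T = thrust_step(T, u=10.0, dt=dt)
print(f"thrust after 1 s: {T:.3f}")  # approaches k_t*u = 10.0
```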

Rotors 1 and 3 rotate in the opposite direction from rotors 2 and 4, so that counteracting aerodynamic torques can be used independently for yaw control. Horizontal velocity results in a moment on the rotors, R_i, about −e_v, and a drag force, D_i, in the direction −e_v. The body drag force is defined as D_B, vehicle mass is m, acceleration due to gravity is g, and the inertia matrix is I ∈ R^{3×3}. A free body diagram is depicted in Figure 2. The total force, F, and moment, M, can be summed as,

$$F = -D_B e_v + m g e_D + \sum_{i=1}^{4}\left(-T_i z_B - D_i e_v\right) \tag{1}$$

$$M = \sum_{i=1}^{4}\left(Q_i z_B - R_i e_v + D_i (r_i \times e_v) + T_i (r_i \times z_B)\right) \tag{2}$$
The full nonlinear dynamics can be described as,

$$m\ddot{r} = F, \qquad I\dot{\omega}_B + \omega_B \times I\omega_B = M \tag{3}$$

where the total angular momentum of the rotors is assumed to be near zero, because they are counter-rotating. Near hover conditions, the contributions of the rolling moments and drag can be neglected in Equations (1) and (2). Define the total thrust as $T = \sum_{i=1}^{4} T_i$. The translational motion is defined by,

$$m\ddot{r} = F = -R_\psi R_\theta R_\phi T z_B + m g e_D \tag{4}$$

where R_φ, R_θ, and R_ψ are the rotation matrices for roll, pitch, and yaw, respectively. Applying the small angle approximation to the rotation matrices,

$$m\begin{bmatrix}\ddot{r}_x\\ \ddot{r}_y\\ \ddot{r}_z\end{bmatrix} = \begin{bmatrix}\theta\\ -\phi\\ -1\end{bmatrix}T + \begin{bmatrix}0\\ 0\\ mg\end{bmatrix} \tag{5}$$

Finally, assume that the total thrust approximately counteracts gravity, $T \approx \bar{T} = mg$, except in the e_D axis:

$$m\begin{bmatrix}\ddot{r}_x\\ \ddot{r}_y\\ \ddot{r}_z\end{bmatrix} = \begin{bmatrix}mg\,\theta\\ -mg\,\phi\\ mg - T\end{bmatrix} \tag{6}$$

For small angular velocities, the Euler angle accelerations are determined from Equation (3) by dropping the second order term, ω_B × Iω_B, and expanding the thrust into its four constituents. The angular equations become,

$$\begin{bmatrix}I_x\ddot{\phi}\\ I_y\ddot{\theta}\\ I_z\ddot{\psi}\end{bmatrix} = \begin{bmatrix}0 & l & 0 & -l\\ -l & 0 & l & 0\\ k_r & -k_r & k_r & -k_r\end{bmatrix} \begin{bmatrix}T_1\\ T_2\\ T_3\\ T_4\end{bmatrix} \tag{7}$$

where the moment arm length l = ||r_i × z_B|| is identical for all rotors due to symmetry. The resulting linear models can now be used for control design.
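To make the thrust-to-moment mixing of Equation (7) concrete, the following sketch maps four rotor thrusts to body angular accelerations. The arm length, torque coefficient, and inertias are hypothetical placeholders, not STARMAC values:

```python
import numpy as np

l, k_r = 0.3, 0.02               # moment arm [m], torque coefficient (hypothetical)
I = np.diag([0.01, 0.01, 0.02])  # Ixx, Iyy, Izz [kg m^2] (hypothetical)

# Rows: roll, pitch, yaw moments from rotor thrusts T1..T4 (Equation (7))
mix = np.array([
    [0.0,  l,   0.0, -l ],   # rotors 2 and 4 (on +-y_B) roll the vehicle
    [-l,   0.0, l,   0.0],   # rotors 1 and 3 (on +-x_B) pitch the vehicle
    [k_r, -k_r, k_r, -k_r],  # counter-rotating pairs yaw the vehicle
])

T = np.array([1.0, 1.2, 1.0, 0.8])       # rotor thrusts [N]
ang_accel = np.linalg.solve(I, mix @ T)  # [phi'', theta'', psi'']
print(ang_accel)  # differential thrust on rotors 2/4 produces pure roll here
```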

IV. ESTIMATION AND CONTROL DESIGN

Applying the concept of spectral separation, inner loop control of attitude and altitude is performed by commanding motor voltages, and outer loop position control is performed by commanding attitude requests for the inner loop. Accurate attitude control of the plant in Equation (7) is achieved with an integral LQR controller design to account for thrust biases.

Position estimation is performed using a navigation filter that combines horizontal position and velocity information from GPS, vertical position and estimated velocity information from the ultrasonic ranger, and acceleration and angular rates from the IMU in a Kalman filter that includes bias estimates. Integral LQR techniques are applied to the horizontal components of the linear position plant described in Equation (6). The resulting hover performance is shown in Figure 6.

As described above, altitude control suffers exceedingly from unmodeled dynamics. In fact, manual command of the throttle for altitude control remains a challenge for the authors to this day. Additional complications arise from the ultrasonic ranging sensor, which has frequent erroneous readings, as seen in Figure 3. To alleviate the effect of this noise, rejection of infeasible measurements is used to remove much of the non-Gaussian noise component. This is followed by altitude and altitude rate estimation by Kalman filtering, which adds lag to the estimate. This section proceeds with a derivation of two control techniques that can be used to overcome the unmodeled dynamics and the remaining noise.
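The gate-then-filter pipeline described above can be sketched as a 1-D Kalman filter with measurement rejection. All tuning values here (gate width, covariances) are illustrative placeholders, not the values flown on STARMAC:

```python
import numpy as np

class AltitudeFilter:
    """Gate infeasible ultrasonic readings, then Kalman-filter altitude/rate."""
    def __init__(self, dt=1/12, gate=1.0, q=0.5, r=0.05):
        self.x = np.zeros(2)                      # [altitude, altitude rate]
        self.P = np.eye(2)
        self.F = np.array([[1, dt], [0, 1]])      # constant-velocity model
        self.Q = q * np.array([[dt**3/3, dt**2/2], [dt**2/2, dt]])
        self.H = np.array([[1.0, 0.0]])
        self.R = np.array([[r]])
        self.gate = gate                          # max plausible jump [m]

    def step(self, z):
        # Predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Reject spikes and false echoes before the measurement update
        if abs(z - self.x[0]) < self.gate:
            y = z - self.H @ self.x
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)
            self.x = self.x + (K @ y).ravel()
            self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x
```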

Fig. 3. Characteristic unprocessed ultrasonic ranging data, displaying spikes, false echoes and dropouts. Powered flight commences at 185 seconds.
A. Integral Sliding Mode Control

A linear approximation to the altitude error dynamics of a quadrotor aircraft in hover is given by,

$$\dot{x}_1 = x_2, \qquad \dot{x}_2 = u + \xi(g, x) \tag{8}$$

where $\{x_1, x_2\} = \{(r_{z,des} - r_z),\ (\dot{r}_{z,des} - \dot{r}_z)\}$ are the altitude error states, $u = \sum_{i=1}^{4} u_i$ is the control input, and ξ(·) is a bounded model of disturbances and dynamic uncertainty. It is assumed that ξ(·) satisfies ||ξ|| ≤ γ, where γ is the upper bound on the norm of ξ(·).

In early attempts to stabilize this system, it was observed that LQR control was not able to address the instability and performance degradation due to ξ(g, x). Sliding Mode Control (SMC) was adapted to provide a systematic approach to the problem of maintaining stability and consistent performance in the face of modeling imprecision and disturbances. However, until the system dynamics reach the sliding manifold, these properties of SMC are not assured. In order to provide robust control throughout the flight envelope, the Integral Sliding Mode (ISM) technique is applied. The ISM control is designed in two parts. First, a standard successive loop closure is applied to the linear plant. Second, integral sliding mode techniques are applied to guarantee disturbance rejection. Let

$$u = u_p + u_d, \qquad u_p = -K_p x_1 - K_d x_2 \tag{9}$$

where K_p and K_d are proportional and derivative loop gains that stabilize the linear dynamics without disturbances. For disturbance rejection, a sliding surface, s, is designed,

$$s = s_0(x_1, x_2) + z, \qquad s_0 = \alpha(x_2 + k x_1) \tag{10}$$

such that state trajectories are forced towards the manifold s = 0. Here, s_0 is a conventional sliding mode design, z is an additional term that enables integral control to be included, and α, k ∈ R are positive constants. Based on the Lyapunov function candidate $V = \tfrac{1}{2}s^2$, the control component u_d can be determined such that $\dot{V} < 0$, guaranteeing convergence to the sliding manifold.

$$\dot{V} = s\dot{s} = s\left[\alpha(\dot{x}_2 + k\dot{x}_1) + \dot{z}\right] = s\left[\alpha\left(u_p + u_d + \xi(g, x) + k x_2\right) + \dot{z}\right] < 0 \tag{11}$$

The above condition holds if $\dot{z} = -\alpha(u_p + k x_2)$ and u_d can be guaranteed to satisfy,

$$s\left[u_d + \xi(g, x)\right] < 0, \qquad \forall s \neq 0 \tag{12}$$

Since the disturbances, ξ(g, x), are bounded by γ, define u_d to be u_d = −λs with λ ∈ R. Equation (11) becomes,

$$\dot{V} = s\,\alpha\left(-\lambda s + \xi(g, x)\right) \leq -\alpha\lambda s^2 + \alpha\gamma|s| < 0 \tag{13}$$

and it can be seen that the condition holds when λ|s| − γ > 0. As a result, for u_p and u_d as above, the sliding mode condition holds when,

$$|s| > \frac{\gamma}{\lambda} \tag{14}$$

With the input derived above, the dynamics are guaranteed to evolve such that s decays to within the boundary layer, γ/λ, of the sliding manifold. Additionally, the system does not suffer from input chatter as conventional sliding mode controllers do, as the control law does not include a switching function along the sliding mode.
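For concreteness, the two-part law of Equations (9)-(12) can be sketched in a few lines. The gains K_p, K_d, α, k, and λ below are illustrative placeholders, not flight-tuned values:

```python
def ism_altitude_control(x1, x2, z, dt, Kp=2.0, Kd=1.5, alpha=1.0, k=1.0, lam=4.0):
    """One step of the integral sliding mode law (Equations (9)-(12)).

    x1, x2: altitude error and error rate; z: integral sliding-mode state.
    Returns the control u and the updated z. Gains are illustrative only.
    """
    u_p = -Kp * x1 - Kd * x2             # linear successive-loop-closure part (9)
    s = alpha * (x2 + k * x1) + z        # sliding surface (10)
    u_d = -lam * s                       # disturbance rejection, u_d = -lambda*s
    z += dt * (-alpha * (u_p + k * x2))  # integral term so that V' < 0 holds (11)
    return u_p + u_d, z
```

Note that u_d is a continuous function of s, which is why the law avoids the chatter of a switching-based sliding mode controller.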
V. REINFORCEMENT LEARNING CONTROL

An alternate approach is to implement a reinforcement learning controller. Much work has been done on continuous state-action space reinforcement learning methods [13], [14]. For this work, a nonlinear, nonparametric model of the system is first constructed using flight data, approximating the system as a stochastic Markov process [15], [16]. Then a model-based reinforcement learning algorithm uses the model in policy iteration to search for an optimal control policy that can be implemented on the embedded microprocessors. In order to model the aircraft dynamics as a stochastic Markov process, a Locally Weighted Linear Regression (LWLR) approach is used to map the current state, S(t) ∈ R^{n_s}, and input, u(t) ∈ R^{n_u}, onto the subsequent state estimate, Ŝ(t+1).

In this application, $S = \begin{bmatrix} r_z & \dot{r}_z & \ddot{r}_z & V \end{bmatrix}^T$, where V is the battery level. In the altitude loop, the input, u ∈ R, is the total motor power. The subsequent state mapping is the summation of the traditional LWLR estimate, using the current state and input, with a random vector, v ∈ R^{n_s}, representing unmodeled noise. The value for v is drawn from the distribution of output error as determined by using a maximum likelihood estimate [16] of the Gaussian noise in the LWLR estimate. Although the true distribution is not perfectly Gaussian, this model is found to be adequate.

The LWLR method [17] is well suited to this problem, as it fits a non-parametric curve to the local structure of the data. The scheme extends least squares by assigning weights to each training data point according to its proximity to the input value for which the output is to be computed. The technique requires a sizable set of training data in order to reflect the full dynamics of the system, which is captured from flights flown under both automatic and manually controlled thrust, with the attitude states under automatic control.
For m training data points, the input training samples are stored in X ∈ R^{m×(n_s+n_u+1)}, and the outputs corresponding to those inputs are stored in Y ∈ R^{m×n_s}. These matrices are defined as

$$X = \begin{bmatrix} 1 & S(t_1)^T & u(t_1)^T \\ \vdots & \vdots & \vdots \\ 1 & S(t_m)^T & u(t_m)^T \end{bmatrix}, \qquad Y = \begin{bmatrix} S(t_1 + 1)^T \\ \vdots \\ S(t_m + 1)^T \end{bmatrix} \tag{15}$$

The column of ones in X enables the inclusion of a constant offset in the solution, as used in linear regression. The diagonal weighting matrix W ∈ R^{m×m}, which acts on X, has one diagonal entry for each training data point. That entry gives more weight to training data points that are close to the S(t) and u(t) for which Ŝ(t+1) is to be computed.
The distance measure used in this work is

$$W_{i,i} = \exp\left(-\frac{\|x_i - x\|^2}{2\tau^2}\right) \tag{16}$$

where x_i is the i-th row of X, x is the vector $\begin{bmatrix} 1 & S(t)^T & u(t)^T \end{bmatrix}^T$, and the fit parameter τ is used to adjust the range of influence of the training points. The value for τ can be tuned by cross validation to prevent over- or under-fitting the data. Note that it may be necessary to scale the columns before taking the Euclidean norm to prevent undue influence of one state on the W matrix.
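Equation (16) translates directly into code. A small sketch of the weight computation, with the column scaling mentioned above included as an optional, hypothetical argument:

```python
import numpy as np

def lwlr_weights(X, x_query, tau=0.5, scale=None):
    """Diagonal LWLR weights (Equation (16)) for a query point x_query.

    X: m x (ns+nu+1) training inputs; x_query: (ns+nu+1,) query row.
    scale: optional per-column scaling to equalize state influence.
    """
    D = X - x_query
    if scale is not None:
        D = D / scale                    # prevent one state dominating the norm
    d2 = np.sum(D**2, axis=1)            # squared Euclidean distances
    return np.exp(-d2 / (2.0 * tau**2))  # one weight per training point
```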
The subsequent state estimate is computed by summing the LWLR estimate with v,

$$\hat{S}(t+1) = \left((X^T W X)^{-1} X^T W Y\right)^T x + v \tag{17}$$

Because W is a continuous function of x and X, as x is varied, the resulting estimate is a continuous non-parametric curve capturing the local structure of the data. The matrix computations, in code, exploit the large diagonal matrix W; as each W_{i,i} is computed, it is multiplied by row x_i and stored in WX.
The matrix being inverted is poorly conditioned, because weakly related data points have little influence, so their contribution cannot be accurately numerically inverted. To compute the numerical inversion more accurately, one can perform a singular value decomposition, $X^T W X = U\Sigma V^T$. Numerical error during inversion can then be avoided by using only the n singular values σ_i satisfying

$$\frac{\sigma_{max}}{\sigma_i} \leq C_{max}$$

where the value of C_max is chosen by cross validation. In this work, C_max ≈ 10 was found to minimize numerical error, and was typically satisfied by n = 1. The inverse can be directly computed using the n upper singular values in the diagonal matrix Σ_n ∈ R^{n×n}, and the corresponding singular vectors, in U_n, V_n ∈ R^{(n_s+n_u+1)×n}. Thus, the stochastic Markov model becomes

$$\hat{S}(t+1) = \left(V_n \Sigma_n^{-1} U_n^T X^T W Y\right)^T x + v \tag{18}$$
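Putting Equations (15)-(18) together, the following sketch computes the LWLR prediction with the SVD-truncated inverse. Shapes and names are illustrative; C_max defaults to 10 per the text:

```python
import numpy as np

def lwlr_predict(X, Y, x_query, w, C_max=10.0):
    """Predict S(t+1) via LWLR with SVD-truncated inversion (Equations (17)-(18)).

    X: m x p training inputs (p = ns+nu+1), Y: m x ns outputs,
    x_query: (p,) query row [1, S(t), u(t)], w: (m,) weights from Equation (16).
    """
    WX = w[:, None] * X                     # exploit diagonal W: scale rows of X
    A = X.T @ WX                            # X^T W X, p x p, often ill-conditioned
    U, sig, Vt = np.linalg.svd(A)
    n = np.sum(sig.max() <= C_max * sig)    # keep only well-conditioned directions
    A_inv = Vt[:n].T @ np.diag(1.0 / sig[:n]) @ U[:, :n].T
    theta = A_inv @ X.T @ (w[:, None] * Y)  # p x ns regression coefficients
    return theta.T @ x_query                # deterministic part; add noise v on top
```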

Next, model-based reinforcement learning is implemented, incorporating the stochastic Markov model, to design a controller. A quadratic reward function is used,

$$R(S, S_{ref}) = -c_1 (r_z - r_{z,ref})^2 - c_2 \dot{r}_z^2 \tag{19}$$

where R: R^{2n_s} → R, c_1 > 0 and c_2 > 0 are constants giving reward for accurate tracking and good damping, respectively, and

$$S_{ref} = \begin{bmatrix} r_{z,ref} & \dot{r}_{z,ref} & \ddot{r}_{z,ref} & V_{z,ref} \end{bmatrix}^T$$

is the reference state desired for the system.
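The reward of Equation (19) is a one-liner; the constants c_1 and c_2 below are arbitrary illustrative values:

```python
def reward(S, S_ref, c1=1.0, c2=0.1):
    """Quadratic reward of Equation (19); c1 and c2 are illustrative values.

    S = [rz, rz_dot, rz_ddot, V]; S_ref holds the corresponding references.
    """
    return -c1 * (S[0] - S_ref[0])**2 - c2 * S[1]**2
```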

The control policy maps the observed state S onto the input command u. In this work, the state space has the constraint r_z ≥ 0, and the input command has the constraint 0 ≤ u ≤ u_max. The control policy is chosen to be

$$\pi(S, w) = w_1 + w_2 (r_z - r_{z,ref}) + w_3 \dot{r}_z + w_4 \ddot{r}_z \tag{20}$$

where w ∈ R^{n_c} is the vector of policy coefficients w_1, ..., w_{n_c}. Linear functions were sufficient to achieve good stability and performance. Additional terms, such as battery level and the integral of altitude error, could be included to make the policy more resilient to differing flight conditions. Policy iteration is performed as explained in Algorithm 1. The algorithm aims to find the value of w that yields the greatest total reward R_total, as determined by simulating the system over a finite horizon from a set of random initial conditions, and summing the values of R(S, S_ref) at each state encountered.

Algorithm 1 Model-Based Reinforcement Learning

1: Generate set S_0 of random initial states
2: Generate set T of random reference trajectories
3: Initialize w to reasonable values
4: R_best ← −∞, w_best ← w
5: repeat
6:   R_total ← 0
7:   for s_0 ∈ S_0, t ∈ T do
8:     S(0) ← s_0
9:     for t = 0 to t_max − 1 do
10:      u(t) ← π(S(t), w)
11:      S(t+1) ← LWLR(S(t), u(t)) + v
12:      R_total ← R_total + R(S(t+1), S_ref)
13:    end for
14:  end for
15:  if R_total > R_best then
16:    R_best ← R_total, w_best ← w
17:  end if
18:  Add Gaussian random vector to w_best, store as w
19: until w_best converges
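Algorithm 1 amounts to a random-search policy iteration over the learned model. A condensed sketch under stated assumptions: simulate_step stands in for the learned LWLR model plus the noise draw v, reward for Equation (19), and u_max is normalized to 1:

```python
import numpy as np
from itertools import product

def policy(S, S_ref, w):
    """Linear control policy of Equation (20): S = [rz, rz_dot, rz_ddot, V]."""
    return w[0] + w[1] * (S[0] - S_ref[0]) + w[2] * S[1] + w[3] * S[2]

def policy_search(simulate_step, reward, S0_set, ref_set, w0,
                  t_max=100, iters=500, step_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w_best, R_best = np.array(w0, float), -np.inf
    w = w_best.copy()
    for _ in range(iters):
        R_total = 0.0
        # Fixed initial-condition and reference sets keep scores comparable
        for s0, ref in product(S0_set, ref_set):
            S = np.array(s0, float)
            for _t in range(t_max):
                u = np.clip(policy(S, ref, w), 0.0, 1.0)  # 0 <= u <= u_max
                S = simulate_step(S, u)   # learned LWLR model + noise v
                R_total += reward(S, ref)
        if R_total > R_best:              # keep the best policy seen so far
            R_best, w_best = R_total, w.copy()
        w = w_best + rng.normal(0.0, step_std, size=w_best.shape)
    return w_best
```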

In policy iteration, a fixed set of random initial conditions and reference trajectories is used to simulate flights at each iteration, with a given policy parameterized by w. It is necessary to use the same random set at each iteration in order for convergence to be possible [15]. After each iteration, the new value of w is stored as w_best if it outperforms the previous best policy, as determined by comparing R_total to R_best, the previous best reward encountered. Then, a Gaussian random vector is added to w_best. The result is stored as w, and the simulation is performed again. This is iterated until the value of w_best remains fixed for an appropriate number of iterations, as determined by the particular application. The simulation results must be examined to predict the likely performance of the resulting control policy.

By using a Gaussian update rule for the policy weights, w, it is possible to escape local maxima of R_total. The highest probability steps are small, and result in refinement of a solution near a local maximum of R_total. However, if the algorithm is not at the global maximum, and is allowed to continue, there exists a finite probability that a sufficiently large Gaussian step will be performed such that the algorithm can keep ascending.
VI. FLIGHT TEST RESULTS

A. Integral Sliding Mode

The results of an outdoor flight test with ISM control can be seen in Figure 4. The response time is on the order of 1-2 seconds, with a 5 second settling time, and little to no steady state offset. An oscillatory character can also be seen in the response, which is most likely triggered by the nonlinear aerodynamic effects and sensor data spikes described earlier.
Fig. 4. Integral sliding mode step response in outdoor flight test.
Compared to the linear control design techniques implemented on the aircraft, the ISM control proves a significant enhancement. By explicitly incorporating bounds on the unknown disturbance forces in the derivation of the control law, it is possible to maintain stable altitude on a system that has evaded standard approaches.

B. Reinforcement Learning Control
One of the most exciting aspects of RL control design is its ease of implementation. The policy iteration algorithm arrived at the implemented control law after only 3 hours on a Pentium IV computer. Figure 5 presents flight test results for the controller. The high fidelity model of the system, used for RL control design, provides a useful tool for comparison of the RL control law with other controllers. In fact, in simulation with linear controllers that proved unstable on the quadrotor, flight paths with growing oscillations were predicted that closely matched real flight data.

The locally weighted linear regression model captured many relations that were not reflected in the linear model, but that reflect the physics of the system well. For instance, with all other states held fixed, an upward velocity results in more acceleration at the subsequent time step for a given throttle level, and a downward velocity yields the opposite effect; this is essentially negative damping. The model also shows a strong ground effect: with all other states held fixed, the closer the vehicle is to the ground, the more acceleration it will have at the subsequent time step for a given throttle level.

Fig. 5. Reinforcement learning controller response to manually applied step input, in outdoor flight test. Spikes in state estimates are from sensor noise passing through the Kalman filter.
The reinforcement learning control law is susceptible to system disturbances for which it is not trained. In particular, varying battery levels and blade degradation may cause a reduction in stability or a steady state offset. Addition of an integral error term to the control policy may prove an effective means of mitigating steady state disturbances, as was seen in the ISM control law. Comparison of the step responses for ISM and RL control reveals both stable performance and similar response times, although the transient dynamics of the ISM control are more pronounced. RL does, however, have the advantage that it incorporates accelerometer measurements into its control, and as such uses a more direct measurement of the disturbances imposed on the aircraft.

C. Autonomous Hover
Applying ISM altitude control and integral LQR position control techniques, flight tests were performed to achieve the goal of autonomous hover. Position was maintained within a 3 m circle for the duration of a two minute flight (see Figure 6), which is well within the expected error bound for the L1 band differential GPS used.
Fig. 6. Autonomous hover flight recorded position, with 3 m error circle.
VII. CONCLUSION

This paper summarizes the development of an autonomous quadrotor capable of extended outdoor trajectory tracking control. This is the first demonstration of such capabilities on a quadrotor known to the authors, and represents a critical step in developing a novel, easy to use, multi-vehicle testbed for validation of multi-agent control strategies for autonomous aerial robots. Specifically, two design approaches were presented for the altitude control loop, which proved a challenging hurdle. Both techniques resulted in stable controllers with similar response times, and were a significant improvement over linear controllers that failed to stabilize the system adequately.

Acknowledgments

The authors would like to thank Dev Gorur Rajnarayan and David Dostal for their many contributions to STARMAC development and testing, as well as Prof. Andrew Ng of Stanford University for his advice and guidance in developing the Reinforcement Learning control.


