8  Mobile Sensing & Activity

In 60 Seconds

Mobile sensing uses smartphone accelerometers and gyroscopes to recognize human activities like walking, running, and climbing stairs with 92%+ accuracy. The pipeline segments continuous sensor data into 2-second overlapping windows, extracts time-domain and frequency-domain features, and feeds them to classifiers like Random Forest. Duty cycling at 50Hz with 50% sleep achieves 95% accuracy while doubling battery life to 16 hours.

8.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Compare Sensing Approaches: Differentiate between mobile/wearable sensing and traditional sensor networks
  • Design Activity Recognition Pipelines: Implement human activity recognition using accelerometer/gyroscope data
  • Apply Transportation Mode Detection: Apply multi-sensor fusion for detecting transportation modes
  • Diagnose Mobile Sensing Challenges: Handle heterogeneity, noise, and resource constraints

Key Concepts

  • Activity recognition: Classifying human or device physical activities (walking, running, driving, idle) from accelerometer, gyroscope, and magnetometer readings using sliding-window feature extraction and classification.
  • Sensing opportunism: Using sensors that happen to be available on a mobile device (microphone, camera, GPS, accelerometer) for additional inferences beyond their primary purpose.
  • HAR (Human Activity Recognition): A specific application of mobile sensing that uses wearable IMU data to classify physical activities and transitions between them, used in health monitoring, eldercare, and sports analytics.
  • Continuous sensing trade-off: The tension between high-frequency sensing (better accuracy) and battery drain (high-frequency sampling kills battery life); resolved by adaptive sampling or always-on low-power sensor fusion.
  • Context inference: Deriving higher-level situational understanding (commuting, working, sleeping, exercising) from the fusion of multiple mobile sensor streams and their temporal patterns.

Mobile sensing uses the sensors built into smartphones – accelerometers, GPS, microphones, cameras – to understand the world around us. Think of how your phone automatically counts your steps or a traffic app shows congestion. Machine learning analyzes these sensor signals to recognize activities like walking, driving, or exercising without you pressing any buttons.

8.2 Prerequisites

  • ML Fundamentals: Understanding training vs inference and feature extraction
  • Basic understanding of smartphone sensors (accelerometer, gyroscope, GPS)

Chapter Series: Modeling and Inferencing

This is part 2 of the IoT Machine Learning series:

  1. ML Fundamentals - Core concepts
  2. Mobile Sensing & Activity Recognition (this chapter)
  3. IoT ML Pipeline - 7-step pipeline
  4. Edge ML & Deployment - TinyML, quantization
  5. Audio Feature Processing - MFCC
  6. Feature Engineering - Feature design
  7. Production ML - Monitoring

8.3 Mobile/Wearable Sensing vs. Sensor Networks

Comparison diagram showing mobile wearable sensing characteristics on left versus dedicated sensor network characteristics on right, covering use cases, sensor types, resources, and cost
Figure 8.1: Mobile Wearable versus Dedicated Sensor Network Comparison
| Aspect | Mobile Sensing | Sensor Networks |
|---|---|---|
| Use Case | Well suited for human activities | Well suited for sensing the environment |
| Sensors | General-purpose sensors, often not well suited for accurate sensing | Specialized sensors, designed to accurately monitor specific phenomena |
| Resources | Multi-tasking OS whose main purpose is to support various applications | All resources dedicated to sensing |
| Cost | Low cost of deployment and maintenance (millions of users charge their devices) | High cost of deployment and maintenance |

8.4 Mobile Sensing Applications

Three-tier diagram showing mobile sensing applications at individual scale such as fitness and sleep tracking, group and community scale such as neighborhood safety, and urban scale such as disease tracking and pollution monitoring
Figure 8.2: Mobile Sensing Applications at Individual, Group, and Urban Scales

8.4.1 Individual Sensing

  • Fitness applications: Step counting, calorie tracking, workout monitoring
  • Behavior intervention applications: Sleep tracking, habit formation, wellness coaching

8.4.2 Group/Community Sensing

  • Groups sensing common activities and helping achieve group goals
  • Examples: Assessment of neighborhood safety, environmental sensing, collective recycling efforts

8.4.3 Urban-Scale Sensing

  • Large-scale sensing, where a large number of people have the same application installed
  • Examples: Tracking speed of disease across a city, congestion and pollution monitoring

8.5 Human Activity Recognition (HAR)

Flow diagram showing human activity recognition pipeline: smartphone with accelerometer and gyroscope sensors feeds motion data through feature extraction to machine learning classifier that outputs activity labels

Figure 8.3: Human activity recognition system using accelerometer and gyroscope

Sensors Used:

  • Accelerometer (3-axis linear acceleration)
  • Gyroscope (3-axis angular velocity)

Example Inferences:

  • Walking, running, biking
  • Climbing up/down stairs
  • Sitting, standing, lying down

Applications:

  • Health and behavior intervention
  • Fitness monitoring
  • Sharing within a community
  • Fall detection for elderly care

8.6 Activity Recognition Implementation

The activity recognition pipeline processes sensor data through several stages:

8.6.1 How It Works: HAR Pipeline in Detail

Human Activity Recognition transforms continuous accelerometer/gyroscope streams into discrete activity labels through a multi-stage pipeline:

Stage 1: Continuous Sampling

  • Accelerometer samples at 50-100 Hz producing [X, Y, Z] vectors representing linear acceleration
  • Gyroscope samples at same rate producing [Gx, Gy, Gz] vectors representing angular velocity
  • Data arrives as endless stream—no natural segmentation

Stage 2: Windowing & Segmentation

  • Sliding window extracts fixed-duration chunks (1-3 seconds = 50-300 samples)
  • Overlap (typically 50%) ensures transitions between activities are captured
  • Example: 2-second window with 1-second step creates new window every second
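The windowing step above is simple enough to sketch directly. The helper below is illustrative (the function name and defaults are ours, chosen to match the 50 Hz / 2-second / 50%-overlap example):

```python
import numpy as np

def sliding_windows(samples, window_size=100, step=50):
    """Split a stream of [x, y, z] samples into fixed-size overlapping windows.

    Defaults assume 50 Hz sampling: a 100-sample (2 s) window that advances
    by 50 samples (1 s) gives 50% overlap, i.e. one new window per second.
    """
    samples = np.asarray(samples)
    return [samples[i:i + window_size]
            for i in range(0, len(samples) - window_size + 1, step)]

# 10 seconds of 50 Hz data (500 samples) yields 9 overlapping windows
stream = np.zeros((500, 3))
windows = sliding_windows(stream)
```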

Stage 3: Feature Extraction (Per Window)

  • Time-domain features: Mean, std, min, max, zero-crossings per axis (5 features x 3 axes = 15)
  • Frequency-domain features: Apply FFT to detect periodic patterns – walking has clear 2 Hz peak, running 3–4 Hz. Extract dominant frequency, spectral energy, frequency entropy per axis (3 features x 3 axes = 9)
  • Cross-axis features: Signal magnitude area (total energy across all axes), correlation between axis pairs (up to 3 additional features)
  • Output: 27–45 dimensional feature vector per window depending on feature set complexity
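As a concrete sketch of the time-domain portion of Stage 3 (the helper name is ours; the feature choices follow the list above):

```python
import numpy as np

def time_domain_features(window):
    """Mean, std, min, max, and zero-crossings per axis, plus signal
    magnitude area (SMA) across axes: 5 x 3 + 1 = 16 features."""
    feats = []
    for axis in range(window.shape[1]):
        sig = window[:, axis]
        centered = sig - sig.mean()                   # remove constant offset
        zero_crossings = np.sum(np.diff(np.sign(centered)) != 0)
        feats += [sig.mean(), sig.std(), sig.min(), sig.max(), zero_crossings]
    feats.append(np.abs(window).mean(axis=0).sum())   # SMA across all axes
    return np.array(feats)

window = np.random.randn(100, 3)      # one 2-second window at 50 Hz
fv = time_domain_features(window)     # 16-dimensional feature vector
```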

Stage 4: Classification

  • Feature vector → ML model (Random Forest, SVM, or Neural Network)
  • Model outputs probabilities for each activity class [sitting: 0.05, walking: 0.85, running: 0.10]
  • Argmax selects highest probability as predicted activity

Stage 5: Temporal Smoothing

  • Apply majority voting over last N predictions (e.g., 3-5 windows) to filter outliers
  • Example: the raw sequence walking → sitting (brief misclassification) → walking is smoothed to walking → walking → walking
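Stage 5 can be implemented as a majority vote over a small buffer of recent predictions (a minimal sketch; `smooth_predictions` is an illustrative name):

```python
from collections import Counter, deque

def smooth_predictions(predictions, n=3):
    """Majority vote over the last n window predictions to filter outliers."""
    buffer = deque(maxlen=n)
    smoothed = []
    for label in predictions:
        buffer.append(label)
        smoothed.append(Counter(buffer).most_common(1)[0][0])
    return smoothed

raw = ["walk", "walk", "sit", "walk", "walk"]   # one spurious "sit" window
smoothed = smooth_predictions(raw)              # the outlier is voted away
```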

End-to-End Latency: 50ms sampling + 10ms feature extraction + 5ms classification + buffering = ~100-200ms total, acceptable for real-time fitness tracking.

Performance: Random Forest achieves 85–92% accuracy on common activities with under 200 ms end-to-end latency on smartphones.

8.7 Case Study: Human Activity Recognition Pipeline

This case study demonstrates a complete end-to-end machine learning pipeline for HAR using wearable accelerometer sensors.

8.7.1 End-to-End HAR Pipeline

Pipeline diagram showing HAR stages from raw accelerometer data through windowing, feature extraction with time and frequency domain features, to Random Forest classification outputting activity labels
Figure 8.4: Human Activity Recognition Pipeline from Accelerometer to Classification

Pipeline Parameters:

  • Sampling Rate: 50 Hz (20ms between samples)
  • Window Size: 2 seconds (100 samples per window)
  • Window Overlap: 50% (1-second slide for smooth transitions)
  • Features per Window: 45 features (15 per axis)
  • Model: Random Forest with 100 decision trees
  • Output: 5 activity classes (Walk/Run/Sit/Stairs/Stand)

8.7.2 Feature Engineering for HAR

8.7.2.1 Time Domain Features (per axis: x, y, z)

| Feature Category | Features Extracted | What It Captures |
|---|---|---|
| Statistical | Mean, Standard Deviation, Min, Max | Overall motion intensity and variability |
| Signal Shape | Zero Crossings | Frequency of direction changes |
| Energy | Signal Magnitude Area (SMA) | Total energy expenditure across axes |

8.7.2.2 Frequency Domain Features (FFT-based)

| Feature | Calculation | Activity Insight |
|---|---|---|
| Dominant Frequency | Peak FFT bin | Walking ~2 Hz, Running ~3-4 Hz |
| Spectral Energy | Sum of FFT magnitudes | High for running, low for sitting |
| Frequency Entropy | Shannon entropy of FFT | Regular patterns (walk) vs irregular (stairs) |

Why These Features Matter

Time-domain features capture the overall intensity and variability of motion:

  • Mean acceleration: Sitting has a mean near 9.8 m/s² (gravity only, no additional motion), while running produces peak magnitudes well above gravity
  • Standard deviation: Walking has regular patterns (low std), stairs have irregular patterns (high std)

Frequency-domain features reveal periodic patterns:

  • Dominant frequency: Walking has a clear 2 Hz peak (2 steps/second), running has 3-4 Hz
  • Spectral energy: High-intensity activities have more energy spread across frequencies
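The dominant-frequency feature can be computed with a plain FFT (a minimal sketch, assuming a 50 Hz sampling rate; the helper name is ours):

```python
import numpy as np

def dominant_frequency(signal, fs=50):
    """Strongest non-DC frequency (Hz) of a 1-D signal, via the FFT."""
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal)))  # drop DC offset
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

# Synthetic "walking" signal: gravity plus a 2 Hz oscillation, 2 s at 50 Hz
t = np.arange(0, 2, 1 / 50)
walk_signal = 9.8 + np.sin(2 * np.pi * 2 * t)
peak_hz = dominant_frequency(walk_signal)   # the 2 Hz walking peak
```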

8.7.3 Duty Cycling Trade-offs: Energy vs Accuracy

| Sampling Strategy | Battery Life | Accuracy | Use Case |
|---|---|---|---|
| 50 Hz (always on) | 8 hours | 98% | Research studies |
| 10 Hz continuous | 24 hours | 94% | Fitness trackers |
| 50 Hz, 50% duty | 16 hours | 95% | Smartwatches (optimal) |
| 10 Hz, 25% duty | 4 days | 88% | Long-term monitoring |

8.7.4 Explore: HAR Duty Cycling Calculator

Adjust sampling parameters and battery capacity to see how duty cycling affects battery life and estimated classification accuracy for a wearable HAR system.

Duty Cycling Power Savings for Accelerometer Sampling:

Quantifying power savings from duty-cycled sensing on a smartphone.

Continuous sampling power:

  • Accelerometer active power: \(P_{\text{active}} = 200 \text{ µA}\) at 3.3 V
  • Power consumption: \(P = 200 \times 10^{-6} \times 3.3 = 0.66 \text{ mW}\)
  • Energy per hour: \(E_{\text{continuous}} = 0.66 \times 3600 = 2{,}376 \text{ mWs} = 0.66 \text{ mWh}\)

Duty-cycled sampling (5s every 60s = 8.3% duty cycle): \[ E_{\text{duty}} = E_{\text{continuous}} \times \text{duty cycle} = 0.66 \times 0.083 = 0.055 \text{ mWh} \]

Battery life extension (3000 mAh battery at 3.7 V = 11.1 Wh):

  • Continuous: \(11{,}100 / 0.66 = 16{,}818\) hours = 701 days (unrealistic; other components dominate)
  • Duty-cycled: \(11{,}100 / 0.055 = 201{,}818\) hours (sensing only)

In practice, duty cycling reduces sensing power by 12× while retaining 80% activity classification accuracy—critical for wearables where battery life must exceed 24 hours.
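The arithmetic above, restated as a few lines of Python for checking:

```python
# Numbers from the worked example above
P_active_mW = 200e-6 * 3.3 * 1000        # 200 µA at 3.3 V -> 0.66 mW
duty_cycle = 5 / 60                      # 5 s of sampling every 60 s (~8.3%)

E_continuous_mWh = P_active_mW * 1       # energy per hour, continuous sensing
E_duty_mWh = E_continuous_mWh * duty_cycle   # ~0.055 mWh per hour

battery_mWh = 3000 * 3.7                 # 3000 mAh at 3.7 V = 11,100 mWh
hours_continuous = battery_mWh / E_continuous_mWh   # ~16,818 h (sensing only)
hours_duty = battery_mWh / E_duty_mWh               # ~201,818 h (sensing only)
```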

Key Insights:

  1. The 50% Duty Cycle Sweet Spot: Sampling at 50Hz for 1 second, then sleeping for 1 second, achieves 95% accuracy with 16 hours of battery life.

  2. Lower Sampling Rate Trade-off: Reducing from 50Hz to 10Hz saves 3× battery but loses only 4% accuracy because most activities have motion patterns below 5Hz.

  3. Context-Aware Duty Cycling: Modern wearables use adaptive duty cycling—sampling at 50Hz during motion and 1Hz during rest. This achieves 3-day battery life with 96% accuracy.

8.7.5 Classification Results

| Activity | Precision | Recall | F1-Score | Notes |
|---|---|---|---|---|
| Walking | 97% | 96% | 96.5% | Regular 2 Hz gait pattern |
| Running | 95% | 97% | 96.0% | High-energy, high-frequency |
| Sitting | 94% | 92% | 93.0% | Low signal magnitude |
| Standing | 91% | 90% | 90.5% | Similar to sitting |
| Stairs | 87% | 83% | 85.0% | Confused with walking |

Overall Model Performance:

  • Macro-Average F1-Score: 92.2%
  • Inference Latency: 8ms per window (on smartphone CPU)
  • Model Size: 2.4 MB (100 trees, 45 features)

Improving Classification Performance

For Stairs Detection (currently 83% recall):

  • Add a barometer sensor to detect altitude changes
  • Extract step irregularity features
  • Result: Stairs recall improves from 83% → 91%

For Sitting vs Standing (currently 8% confusion):

  • Add gyroscope data to measure postural sway
  • Use context features (sitting episodes last longer)
  • Result: Confusion drops from 8% → 3%

8.7.6 Mid-Chapter Check

8.8 Transportation Mode Detection

Architecture diagram showing multi-sensor transportation mode detection using accelerometer, GPS, and Wi-Fi inputs fused into a classifier that distinguishes bus, bike, tram, train, car, and walking modes
Figure 8.5: Multi-Sensor Transportation Mode Detection Architecture

Sensors Used:

  • Accelerometer/Gyroscope for motion patterns
  • GPS for speed and location tracking
  • Wi-Fi localization for indoor/outdoor context

Example Inferences:

  • Bus, bike, tram, train, car, walking

Applications:

  • Intelligent transportation systems
  • Smart commuting recommendations
  • Carbon footprint tracking
  • Urban mobility planning
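A toy rule-based fusion of GPS speed and accelerometer variance illustrates the idea (the `guess_mode` helper and its thresholds are ours, for illustration only; a real system would train a classifier over many more features):

```python
def guess_mode(gps_speed_kmh, accel_variance):
    """Toy fusion rule: speed separates vehicles from pedestrians;
    accelerometer variance separates smooth rides from bouncy motion."""
    if gps_speed_kmh < 7:
        return "walking" if accel_variance > 0.3 else "stationary"
    if gps_speed_kmh < 25 and accel_variance > 0.5:
        return "bike"
    return "vehicle"   # bus/car/tram/train need further features to separate

mode_fast = guess_mode(50, 0.1)   # fast and smooth: riding in a vehicle
mode_slow = guess_mode(4, 1.2)    # slow and bouncy: walking
```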

8.9 Challenges in Mobile Sensing

Diagram showing three main mobile sensing challenges: complex heterogeneity across devices, noisy measurements from environmental interference, and resource constraints including battery and processing limitations, with mitigation strategies for each
Figure 8.6: Mobile Sensing Challenges and Mitigation Strategies

Complex Heterogeneity:

  • Sensors vary in sampling frequency and sensitivity
  • Different device models and manufacturers
  • Inconsistent calibration

Why heterogeneity matters in practice: A HAR model trained on Samsung Galaxy accelerometer data may drop from 92% to 74% accuracy when deployed on an iPhone, because the two phones use different MEMS accelerometer chips with different noise floors, sampling jitter, and axis orientations. Mitigation strategies include training on data from 10+ device models, normalizing sensor readings to unit variance, and using transfer learning to adapt pre-trained models to new hardware.

Noisy Measurements:

  • Environmental interference
  • Sensor drift over time
  • Motion artifacts

Real-world noise example: A step counter tested in a laboratory achieved 99% accuracy but dropped to 82% when users carried phones in handbags (irregular motion), placed them on car dashboards (vibration counted as steps), or walked while texting (phone orientation constantly changing). Robust systems use multiple sensors (accelerometer + barometer for altitude confirmation) and temporal smoothing (a single “step” reading is ignored unless followed by at least 4 more within 5 seconds).
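The temporal-smoothing rule just described (ignore a single "step" unless followed by at least 4 more within 5 seconds) can be sketched as follows (illustrative helper, not from the chapter):

```python
def confirmed_steps(step_times, min_burst=5, window_s=5.0):
    """Keep only candidate steps that belong to a burst of at least
    `min_burst` steps starting within a `window_s`-second span."""
    confirmed = set()
    for t in step_times:
        burst = [u for u in step_times if t <= u < t + window_s]
        if len(burst) >= min_burst:
            confirmed.update(burst)
    return sorted(confirmed)

# One isolated bump at t=0 vs. a genuine 5-step walking burst at t=10
candidates = [0.0, 10.0, 10.5, 11.0, 11.5, 12.0]
steps = confirmed_steps(candidates)   # the isolated bump is rejected
```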

Resource Constraints:

  • Limited processing power
  • Battery life concerns
  • Storage limitations

Balance Required:

  • Amount of resources needed vs. accuracy of detection
  • Continuous sensing vs. duty cycling
  • Local processing vs. cloud offloading

Practical energy budget: On a typical smartphone, continuous 50 Hz accelerometer sampling consumes approximately 35 mW. Adding gyroscope increases this to 75 mW. GPS sampling at 1 Hz adds 120 mW. For a 3,000 mAh phone battery (11.1 Wh), continuous triple-sensor monitoring would drain the battery in under 6 hours. Selective sensor activation – using the accelerometer as a low-power trigger for GPS only during transportation detection – extends this to 18+ hours.

Tradeoff: Population Models vs Personalized Models

Option A: Population model trained on diverse user data (works for everyone)

Option B: Personalized model fine-tuned for individual users

Decision Factors: Population models achieve ~92% accuracy across all users with zero user effort. Personalized models can reach ~96% accuracy but require 15-30 minutes of user calibration.

Choose population models for consumer apps where ease-of-use matters; choose personalization for medical or professional applications where accuracy is critical.

8.10 Knowledge Check

You’re building a fitness tracker that must classify 5 activities (walk, run, stairs, sit, stand) using a 3-axis accelerometer. Battery life target is 24 hours with a 250 mAh battery. Let’s design the complete pipeline with realistic power calculations.

Step 1: Choose sampling strategy

  • Option A: 50 Hz continuous sampling, 1-second windows (50 samples per window)
  • Option B: 50 Hz with 50% duty cycle (sample 1s, sleep 1s), 1-second windows
  • Option C: 10 Hz continuous sampling, 2-second windows (20 samples per window)

Step 2: Power consumption calculations

| Component | Power Draw | Duty Cycle | Average Power |
|---|---|---|---|
| Accelerometer @ 50 Hz | 180 µA | 100% (Option A) | 180 µA |
| Accelerometer @ 50 Hz | 180 µA | 50% (Option B) | 90 µA |
| Accelerometer @ 10 Hz | 50 µA | 100% (Option C) | 50 µA |
| MCU active (processing) | 5 mA | 2% (20 ms every 1 s) | 100 µA |
| MCU deep sleep | 10 µA | 98% | 10 µA |
| Display refresh (1 Hz) | 15 mA | 10% | 1.5 mA |
| Bluetooth advertising | 8 mA | 5% | 400 µA |

Total power budget:

  • Option A: 180 + 100 + 10 + 1500 + 400 = 2,190 µA → Battery life: 250 mAh ÷ 2.19 mA = 114 hours (4.8 days)
  • Option B: 90 + 100 + 10 + 1500 + 400 = 2,100 µA → Battery life: 250 mAh ÷ 2.1 mA = 119 hours (5.0 days)
  • Option C: 50 + 100 + 10 + 1500 + 400 = 2,060 µA → Battery life: 250 mAh ÷ 2.06 mA = 121 hours (5.0 days)
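The Option B budget can be checked mechanically (values copied from the power table above):

```python
# Option B average currents, in µA, from the power table above
components = {
    "accel_50hz_50duty": 90,
    "mcu_active": 100,
    "mcu_sleep": 10,
    "display": 1500,
    "bluetooth": 400,
}
total_uA = sum(components.values())       # 2,100 µA total draw
battery_uAh = 250 * 1000                  # 250 mAh battery
battery_life_h = battery_uAh / total_uA   # ~119 hours (~5 days)
```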

Step 3: Accuracy analysis (from literature and testing)

  • 50 Hz sampling: 98% accuracy (captures high-frequency running motion)
  • 50 Hz with 50% duty: 95% accuracy (occasional missed transitions between windows)
  • 10 Hz sampling: 92% accuracy (marginal for detecting rapid stair climbing)

Step 4: Feature extraction (for 50 Hz, 50% duty)

  • Time-domain: mean, std, min, max, zero-crossings (5 features × 3 axes = 15 features)
  • Frequency-domain: FFT dominant frequency, spectral energy (2 features × 3 axes = 6 features)
  • Total: 21 features per 1-second window

Step 5: Classifier selection

  • Random Forest (100 trees): 2.4 MB model, 12 ms inference on ARM Cortex-M4
  • Decision Tree (depth 8): 48 KB model, 2 ms inference
  • Naive Bayes: 8 KB model, 1 ms inference, but 88% accuracy (too low)

Final design: Option B with Random Forest

  • Sampling: 50 Hz with 50% duty cycle
  • Window: 1 second (50 samples)
  • Features: 21 features (time + frequency domain)
  • Model: Random Forest (100 trees, quantized to 8-bit for 600 KB model size)
  • Accuracy: 95%
  • Battery life: 5 days
  • Inference latency: 12 ms (acceptable for 1-second windows)

This demonstrates the complete design process from power budget through feature selection to model deployment.

| Activity Type | Optimal Sample Rate | Window Size | Feature Set | Justification |
|---|---|---|---|---|
| Walking | 10-20 Hz | 2-3 seconds | Time-domain only | Gait cycle is 0.5-1 Hz; 10× sampling sufficient |
| Running | 20-50 Hz | 1-2 seconds | Time + frequency | Higher-frequency gait (2-4 Hz); need FFT for cadence |
| Cycling | 5-10 Hz | 3-5 seconds | Time + gyroscope | Pedaling frequency ~1 Hz; longer windows for smoothness |
| Stairs | 20-50 Hz | 2-3 seconds | Time + barometer | Vertical acceleration + altitude change confirms stairs |
| Transportation | 1 Hz GPS + 10 Hz accel | 10-30 seconds | GPS speed + accel variance | Speed differentiates bus/car/train; accel patterns for stops |
| Sleep tracking | 1 Hz accel | 30-60 seconds | Movement magnitude, posture | Low-frequency motion; long windows reduce false wakes |
| Fall detection | 50-100 Hz | 0.5 seconds | Peak acceleration, impact duration | Must capture sudden impact (<100 ms); high sampling critical |

General rules:

  • Sample rate: 5–10x the highest frequency of interest (Nyquist requires minimum 2x; 5–10x is the practical recommendation for noisy sensor data)
  • Window size: Long enough to capture 2-3 repetitions of periodic motion (2-3 gait cycles for walking)
  • Duty cycling: Use when accuracy loss <5% and activity frequency <1 Hz
  • Feature complexity: Add frequency-domain features only when time-domain accuracy <90%
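The sample-rate rule can be captured in one helper (our own sketch of the 5-10x guideline above):

```python
def recommended_sample_rate_hz(max_signal_hz, safety_factor=5):
    """Practical sampling rate: 5-10x the highest frequency of interest.
    Nyquist only requires 2x, but noisy sensor data needs headroom."""
    return max(2 * max_signal_hz, safety_factor * max_signal_hz)

running_rate = recommended_sample_rate_hz(4)   # 4 Hz running cadence -> 20 Hz
```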

Common Mistake: Training HAR Models on Controlled Lab Data, Deploying to Real Users

Researchers train human activity recognition models using lab volunteers walking on treadmills, climbing stairs in controlled buildings, and sitting in fixed chairs. When deployed to real users, accuracy drops from 95% to 70% because real-world conditions differ vastly from lab settings.

What goes wrong: A university research team builds a step counter using 20 volunteers (college students aged 18-24) walking on a treadmill at 3-6 km/h. The accelerometer is mounted on a tight waistband, and the model achieves 98% accuracy in cross-validation. When released as a consumer app, accuracy plummets:

  • Elderly users (slower gait, shuffling): 65% accuracy
  • Users with phones in loose pockets: 72% accuracy (irregular motion)
  • Users walking while texting: 58% accuracy (unusual posture)
  • Users on uneven terrain (hiking): 61% accuracy

Why it fails:

  1. Population mismatch: Lab volunteers are not representative of actual users (age, fitness, gait patterns)
  2. Device placement variation: Lab uses fixed waistband mount; real users carry phones in pockets, bags, armbands (different accelerometer orientations)
  3. Environmental diversity: Treadmills provide constant speed and flat surface; real-world walking includes hills, stairs, crowded sidewalks, stopping at traffic lights
  4. Activity context: Lab separates activities cleanly (“now walk for 5 minutes”); real users transition frequently (walk → stand at crosswalk → walk → sit on bus)

The correct approach:

  1. Collect real-world training data from diverse users:

    • 100+ users across age 18-75, different fitness levels
    • All common phone placements (pocket, bag, hand, armband)
    • Multiple surfaces (treadmill, sidewalk, trail, stairs)
    • Include transition periods between activities
  2. Use data augmentation to simulate real-world variation:

    # accel_data: (N, 3) array of raw accelerometer samples
    import numpy as np
    from scipy.signal import resample
    from scipy.spatial.transform import Rotation

    # Simulate phone orientation variation with a random 3-D rotation
    rotated_data = Rotation.random().apply(accel_data)

    # Simulate slower/faster gait by time-warping (resampling to a new length)
    new_len = int(len(accel_data) * np.random.uniform(0.7, 1.3))
    time_warped = resample(accel_data, new_len)

    # Add Gaussian noise to simulate different sensor hardware
    noisy_data = accel_data + np.random.normal(0, 0.1, accel_data.shape)
  3. Test on held-out real-world data:

    • Never evaluate on lab-collected data if deploying to real users
    • Reserve 20% of real-world data for final validation
    • Track per-demographic accuracy (age groups, device placements)
  4. Implement feedback loop for continuous improvement:

    • Add “Was this classification correct?” prompt in app
    • Collect user corrections to retrain model monthly
    • A/B test model versions with 10% of users before full rollout

Real consequence: Fitbit’s early step counters (2010-2012 era) famously over-counted steps because training data was biased toward steady walking. Users discovered that vigorous hand-waving while sitting triggered step counts—prompting the “shake your Fitbit to fake steps” workaround. Later models added gyroscope data and trained on hand-waving scenarios to filter false positives. The lesson: lab-perfect models fail in the messy real world unless training data includes real-world diversity.

8.11 Concept Relationships

Mobile Sensing bridges theoretical ML concepts and practical IoT deployments:

  • Builds On: ML Fundamentals provides the training/inference framework applied here to real sensors
  • Prerequisite For: IoT ML Pipeline formalizes this HAR workflow into a systematic 7-step process
  • Complements: Feature Engineering explains why time/frequency domain features work for motion data
  • Enables: Edge ML & Deployment shows how to fit these models on smartphones and wearables
  • Real-World Context: Healthcare Applications demonstrates real-world applications of mobile sensing and HAR systems

The key pattern is that mobile sensing represents a sweet spot: enough compute on smartphones for local ML inference, rich sensor suites (accel, gyro, GPS, microphone), and ubiquitous deployment (billions of devices).

8.12 See Also

Related Chapters:

External Resources:

8.13 Try It Yourself

Hands-On Challenge: Build a simple step counter using your smartphone accelerometer

Task: Implement a walking detection algorithm using basic feature engineering (no ML model required):

  1. Data Collection:
    • Use a sensor logging app (e.g., Physics Toolbox Sensor Suite) or write simple code
    • Record 30 seconds of accelerometer data while walking normally (50-100 Hz sampling)
    • Record 30 seconds while standing still as a control
  2. Feature Extraction:
    • Calculate magnitude: sqrt(x² + y² + z²) for each sample
    • Apply sliding 1-second window (50-100 samples)
    • Per window, compute: mean magnitude, variance, peak count (local maxima)
  3. Threshold Logic:
    • Walking detection rule: variance > 0.5 AND peak_count > 1 per second
    • Standing has low variance (<0.2) and few peaks
  4. Validation:
    • Test on your walking data—should detect ~90% of windows correctly
    • Test on standing data—should reject as non-walking
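Steps 2-3 of the challenge can be sketched as follows (thresholds follow the task description; they may need tuning for your phone and placement):

```python
import numpy as np

def is_walking(window, var_thresh=0.5, peak_thresh=1):
    """Threshold-based walking detector for one 1-second (N, 3) window."""
    mag = np.linalg.norm(window, axis=1)        # sqrt(x^2 + y^2 + z^2)
    variance = mag.var()
    # Local maxima that rise above the window mean count as candidate steps
    peaks = np.sum((mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:]) &
                   (mag[1:-1] > mag.mean()))
    return bool(variance > var_thresh and peaks > peak_thresh)

# Synthetic check: 2 Hz bouncing at 50 Hz vs. standing perfectly still
t = np.arange(0, 1, 1 / 50)
zeros = np.zeros_like(t)
walking = np.column_stack([zeros, zeros, 9.8 + 2.0 * np.sin(2 * np.pi * 2 * t)])
standing = np.column_stack([zeros, zeros, np.full_like(t, 9.8)])
result = (is_walking(walking), is_walking(standing))
```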

What to Observe:

  • Walking creates periodic peaks (~2 Hz) visible in magnitude signal
  • Variance distinguishes motion from stillness better than mean alone
  • Simple thresholds work surprisingly well before resorting to ML

Bonus: Try walking while carrying the phone in a pocket vs. holding it—notice how pocket introduces more noise but the pattern remains detectable.

Common Pitfalls

A model trained exclusively on data from young, healthy adults will misclassify activities for elderly users, children, or people with mobility limitations. Collect training data from representative populations.

Accelerometer readings from a phone in a front pocket, back pocket, handbag, or hand all look very different for the same physical activity. Either normalize for placement or collect training data across all expected positions.

Running inference at 50 Hz continuously will drain a smartphone battery in hours. Use a lightweight classifier to detect activity onset, then activate a more accurate classifier only during periods of interest.

Raw accelerometer values depend on device orientation. Always transform to a body-frame or world-frame coordinate system using gravity estimation before extracting features for activity recognition.

8.14 Summary

This chapter covered mobile sensing and human activity recognition:

  • Mobile vs Sensor Networks: Mobile sensing is cost-effective for human activities; dedicated networks excel for environmental monitoring
  • HAR Pipeline: Raw data → Windowing → Feature extraction → Classification → Post-processing
  • Feature Engineering: Time-domain (mean, variance) and frequency-domain (FFT peaks) features capture motion patterns
  • Duty Cycling: Balance energy savings (60-80%) with detection accuracy through adaptive sampling
  • Transportation Detection: Multi-sensor fusion combines accelerometer, GPS, and Wi-Fi for mode classification

Key Takeaway

Mobile sensing using smartphone accelerometers and gyroscopes achieves 92%+ accuracy for human activity recognition through a five-stage pipeline: data collection at 50-100 Hz, segmentation into 2-second overlapping windows, time-domain and frequency-domain feature extraction, ML classification, and temporal smoothing. The optimal duty cycling sweet spot – sampling at 50 Hz for 1 second then sleeping for 1 second – achieves 95% accuracy while doubling battery life to 16 hours.

How does your phone know if you are walking, running, or riding a bus? The Sensor Squad is on the case!

Inside every smartphone lives a tiny team of sensors. Sammy the Accelerometer can feel every bump, shake, and tilt. His friend Gerry the Gyroscope can tell which direction the phone is spinning.

One day, their owner goes for a jog. Sammy feels the phone bouncing up and down in a steady rhythm: bump-bump-bump, about 3 times per second.

“That is running!” Sammy announces. “When they walk, it is only 2 bumps per second, and much gentler.”

But there is a problem: Sammy and Gerry are sending 50 measurements EVERY SECOND. That is 3,000 numbers per minute! Max the Microcontroller cannot process all that.

So Max invented a clever trick called windowing: “Every 2 seconds, I look at 100 measurements and ask three questions: 1. How bouncy was it? (variance) 2. How fast were the bumps? (frequency) 3. How much total energy was there? (magnitude)”

“That turns 100 numbers into just 3 clues!” says Lila the LED. And those 3 clues are enough to tell walking from running from sitting.

But Bella the Battery is worried: “If Sammy measures 50 times per second ALL DAY, I will run out of power in 8 hours!”

Max has a solution: “Sammy, measure for 1 second, then nap for 1 second. Repeat!”

With this clever nap schedule (called duty cycling), Bella lasts 16 hours instead of 8, and they still get 95% of activities right!

“And if someone is on a bus?” asks Lila.

“Easy!” says Sammy. “The GPS shows we are going 50 km/h, but the bumps are tiny and smooth. That means we are in a vehicle, not running really fast!”

8.14.1 Try This at Home!

Put your phone in your pocket and walk across the room. Then run across the room. Then sit down. If your phone has a fitness app, check the step counter – it uses exactly this trick to count your steps! The bouncing pattern of walking is totally different from running, and sitting has almost no bouncing at all.

8.15 What’s Next

Direction Chapter Focus
Next IoT ML Pipeline Systematic 7-step approach for building production ML models
Previous ML Fundamentals Core ML concepts: training, inference, feature extraction
Related Multi-Sensor Data Fusion Combining accelerometer, GPS, and Wi-Fi for improved accuracy