I had explored most of what the Echo, my new buddy, could do out of the box. Now it was time to think and explore. So what did my friend Alexa look like? To be honest, she looked slick, but fairly plain.
The top had an LED ring, a volume dial, a “mute” button and an “action” button. Since I had previously spent some time playing with the Echo, particularly the wake word, I had noticed the Echo’s LED patterns. I found that the LED ring showed:
-blue when starting up
-cyan when processing
-orange when connecting to WiFi
-red when muted
-oscillating violet on WiFi issues
-blinking orange when connected to WiFi but without access to Alexa Voice Services
-no light while active and waiting
I started to wonder whether there was a way to trigger the wake word while suppressing the light.
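For my own notes, those LED observations boil down to a simple lookup. The state names below are my own labels from poking at the device, not anything official from Amazon:

```python
# Observed Echo LED ring patterns mapped to what the device is doing.
# My own naming, inferred from experimentation -- not an Amazon reference.
ECHO_LED_STATES = {
    "solid blue": "starting up",
    "cyan": "processing a request",
    "solid orange": "connecting to WiFi",
    "red": "microphone muted",
    "oscillating violet": "WiFi issue",
    "blinking orange": "WiFi connected, no Alexa Voice Services access",
    "off": "active and waiting for the wake word",
}

def describe_led(pattern: str) -> str:
    """Translate an observed LED pattern into the Echo state it signals."""
    return ECHO_LED_STATES.get(pattern, "unknown pattern")

print(describe_led("red"))
```

The interesting entry is the last one: “off” is the normal listening state, which is why a command that lights the ring but could somehow suppress it would be hard for a user to notice.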
I would have to further investigate the hardware of Alexa…what’s on the inside? Well…here is a short list!
-802.11a/b/g/n, Dual-band, dual-antenna WiFi (MIMO), no p2p
-Bluetooth: Advanced Audio Distribution Profile (A2DP) support for audio streaming
-Texas Instruments TLV320DAC3203 Ultra Low Power Stereo Audio Codec
-Texas Instruments TPA3110D2 15W Filter-Free Class D Stereo Amplifier
-Texas Instruments DM3725CUS100 Digital Media Processor
-ARM Cortex A8
-This has 4 UARTs…the suspected pins are TP127, TP120, TP121 and TP126
-Load-store architecture with non-aligned support
-Samsung K4X2G323PD-8GD8 256 MB LPDDR1 RAM
-SanDisk SDIN7DP2-4G 4 GB iNAND Ultra Flash Memory
-Qualcomm Atheros QCA6234X-AM2D Wi-Fi and Bluetooth Module
-Texas Instruments TPS65910A1 Integrated Power Management IC
Peeling back the black footing of the Echo revealed what appeared to be a DEBUGGING PORT!
I poked around a bit with a multimeter, which quickly led me to find one pin varying in voltage and another at 15V. One pin that I presumed was ground turned out, in fact, to carry a step function. I wanted to be careful, since we only had one Echo, but other researchers online had shown that someone got ASCII text out of that debugging port on an oscilloscope view, though not much else.
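If that port does turn out to be a serial console, the usual next step after confirming the signaling on a scope is to wire up a serial adapter and dump whatever the boot process prints. A minimal sketch of the read-and-decode step; the pyserial invocation in the comment, the port name, and the 115200 baud rate are my assumptions, not measured values:

```python
import io

def dump_console(link, n_bytes=4096):
    """Read up to n_bytes from a serial link (any object with .read) and
    decode as ASCII, rendering line noise as replacement characters
    instead of crashing on undecodable bytes."""
    raw = link.read(n_bytes)
    return raw.decode("ascii", errors="replace")

# With real hardware this would look something like (pyserial, values guessed):
#   import serial
#   link = serial.Serial("/dev/ttyUSB0", baudrate=115200, timeout=5)
#   print(dump_console(link))
# For illustration, a canned buffer stands in for the port:
fake_port = io.BytesIO(b"boot console output would appear here\r\n")
print(dump_console(fake_port))
```

Decoding with `errors="replace"` matters in practice: a wrong baud-rate guess shows up as a wall of replacement characters rather than an exception, which is itself a useful diagnostic.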
In looking at the boards inside the Echo, I found a few other areas that would require additional exploration. It appears that one board may contain a JTAG header. Another vector to look into is the possible UART next to a TI board.
It definitely appears to be worthwhile to investigate the hardware more than I was able to do in my short amount of time. The most profitable attack vector seems to be the debugging port. However, if those are actually JTAG and UART interfaces, those may also prove of interest in further research.
After my quick investigation of the hardware, I wanted to learn more about the voice processing and find a way to disable the light. After further research, I learned far more about the inner workings of AWS. This was because I realized that only the wake word is processed on board. All subsequent commands are sent to Alexa Voice Services (AVS) and then into the chain of Amazon’s “cloud.” Apparently, what is sent to the “cloud” includes a “fraction of a second of audio before the wake word,” according to Amazon. I had run into a length limit in my commands previously, and found that the wake word and commands were stored in the Echo app’s history.
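That “fraction of a second of audio before the wake word” implies a small pre-roll buffer: the device keeps the most recent audio at all times, and once the wake word fires, that buffered audio plus everything after it gets shipped to AVS. A toy sketch of the idea; the chunking and buffer length are illustrative, not Amazon’s actual values:

```python
from collections import deque

class PreRollBuffer:
    """Keep only the last `max_chunks` audio chunks, so an upload can
    include a little audio from *before* the wake word was detected."""
    def __init__(self, max_chunks=3):
        # deque with maxlen silently drops the oldest chunk on overflow.
        self.chunks = deque(maxlen=max_chunks)

    def feed(self, chunk):
        self.chunks.append(chunk)

    def snapshot(self):
        """Everything currently buffered, oldest first."""
        return list(self.chunks)

buf = PreRollBuffer(max_chunks=3)
for chunk in [b"c0", b"c1", b"c2", b"c3", b"c4"]:
    buf.feed(chunk)
# Only the three most recent chunks survive; on wake-word detection these
# would be prepended to the post-wake-word audio sent to the cloud.
print(buf.snapshot())  # [b'c2', b'c3', b'c4']
```

This also explains why the light matters: the pre-roll means the microphone is always capturing, and the LED is the only local indication of when capture turns into an upload.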
I made great use of my brightly colored dry-erase markers and set to work to understand the AWS cloud. After some research on the internet, I found that ‘Skills’ are tiered and accessed differently. Core Skills (built-in features such as the to-do list, shopping list, etc.) are accessed directly and issue a directive/card. Custom Skills, created through the Amazon SDK, work the same way, receiving an intent and acting upon it via the cloud. Lighting, locking and thermostat skills, however, are reached indirectly: the Core Skills call them, get a return value, and then the Core Skills block returns the directive.
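The Custom Skill half of that picture is concrete: the cloud hands your code an intent, and you hand back a response that becomes both the spoken output and the card in the app’s history. A stripped-down handler in the Alexa Skills Kit JSON response shape; the intent name and wording are made up for illustration:

```python
def handle_intent(event):
    """Minimal custom-skill handler: pull the intent name out of the
    request, then return a response containing the spoken text plus the
    card that shows up in the Echo app's history."""
    intent = event["request"]["intent"]["name"]
    if intent == "LightStatusIntent":  # hypothetical intent name
        speech = "The LED ring is currently off."
    else:
        speech = "Sorry, I don't know that one."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "card": {"type": "Simple",
                     "title": "Light Status",
                     "content": speech},
            "shouldEndSession": True,
        },
    }

event = {"request": {"intent": {"name": "LightStatusIntent"}}}
print(handle_intent(event)["response"]["outputSpeech"]["text"])
```

The card in the response is the same artifact I kept running into in the app’s history, which is what made the history JSON worth analyzing later.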
Amazon Web Services Key:
Collect, Annotate, Ingest, Train, Evaluate, Deploy
◎◎S3 (Simple Storage Service, online file storage)
◎◎DynamoDB (NoSQL database with a multi-master design requiring version conflict resolution and synchronous replication across data centers. Can be purchased based on throughput rather than storage. Java, Perl, PHP, .NET, Node.js, Python, Ruby, Erlang)
◎◎Glacier (online file storage; low-cost long-term storage for infrequently accessed data)
◎Mechanical Turk (Crowdsourcing internet marketplace. Place requests for Human Intelligence Tasks)
◎◎RDS (Relational Database Service, distributed, with automatically managed admin processes like patching, backing up the database and enabling point-in-time recovery. MySQL, Oracle, PostgreSQL, MariaDB)
◎◎SQS (Simple Queue Service, distributed queue messaging service; Amazon-hosted queues for messaging between web service applications)
◎◎SNS (Simple Notification Service, hosted multi-protocol “push” messaging handler)
◎◎◎EMR (Elastic MapReduce using Hadoop on EC2 and S3 for computation) (Train/Eval/Deploy)
◎◎SWF (Simple Workflow)
◎◎RedShift (massively parallel processing data warehouse that handles analytic workloads, reporting and data analysis)
Once I returned from that rabbit hole more knowledgeable for no reason, I started looking into the remote and how it affected the wake word. It seems as though there is a button on the Echo’s remote (purchased separately) that initiates voice processing the same way the wake word does. Thus, using the remote DOESN’T require the wake word! I synced the remote via Bluetooth and played around a little. I ended up somehow syncing my phone to the remote and doing some strange tangential things. It seems the Echo’s Bluetooth profile only includes a ‘media connection’ to the phone, while the remote is only an ‘input device.’ I attempted to play around with my phone, the Echo and the remote without much luck because of the built-in Bluetooth profiles. What I did find was that when the remote was connected to both the Echo and my phone and I hit the voice button on the remote, Alexa stopped, but the remote was still controlling my phone. I was curious whether that was a spoofable or otherwise profitable avenue.
I had spent some time with the Alexa web app prior to the Bluetooth remote research. I had found the JSON format the app used and pored over it for some time; I had no exposure to JSON prior to this project. So I naturally analyzed the JSON when the remote was introduced. The remote cards are not stored the same way other cards are. If I hit a regular button on the remote (play/pause, fast forward, etc.), it wouldn’t show anywhere in the JSON that I could find, and these commands still worked while the microphone was muted. Even with the mic muted, voice commands from the remote are processed.
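Combing through that history JSON by hand gets old fast, so a few lines of Python make the comparison easier. The card shape below is a guess at what I was seeing in the app (the field names are illustrative, not a documented schema), but the point stands either way: voice cards carry the recognized utterance, while remote button presses simply never appear.

```python
import json

def list_utterances(history_json):
    """Pull the recognized utterance out of each voice card in an Alexa
    app history dump. Field names ('cards', 'cardType', 'utterance') are
    illustrative guesses, not a documented schema."""
    cards = json.loads(history_json).get("cards", [])
    return [c["utterance"] for c in cards if c.get("cardType") == "voice"]

# Toy dump: one voice command plus one non-voice card standing in for a
# remote button press, which produced no voice card in my observations.
sample = json.dumps({"cards": [
    {"cardType": "voice", "utterance": "alexa what time is it"},
    {"cardType": "media", "detail": "playback event"},
]})
print(list_utterances(sample))  # ['alexa what time is it']
```

Filtering like this is what made the asymmetry obvious: every spoken command left an utterance behind, and the remote’s physical buttons left nothing I could find.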