Thursday, March 3, 2011

Device Encryption in Android 3.0

Transparent encryption of block devices in Android 3.0.

The Motorola Xoom and several other new tablets on the market run Android 3.0, Honeycomb, which is built on the 2.6.36 Linux kernel. Most, if not all, of these tablets feature an Nvidia Tegra 2 processor. The 2.6.36 kernel on these devices brings transparent, whole-disk encryption to everyday users. The encryption is provided by the dm-crypt device-mapper target, which creates a virtual layer on top of an existing block device and uses the kernel's crypto APIs to encrypt and decrypt the data on the underlying device.

Whether you are typing commands via a shell over a serial port or using the email application to check your email, the reads and writes to the file system are performed in the same manner with no changes to the upper-level applications.

After pressing the power button on the back of the Xoom tablet, the device boots up, and the user is presented with the desktop environment, from which he or she can choose to play a game, check email, or read an e-book. By tapping on Settings and then Location & Security, one can choose to "Encrypt tablet" from the screen. The encryption process takes approximately one hour, and the user is presented with a few basic screens upon completion.

After the encryption process is complete, the tablet is powered down. Upon rebooting the tablet, the user is prompted to input a PIN code, which is used to unlock the device. After entering the correct PIN code, the tablet powers up as normal, and the user can proceed with standard activities such as checking email, reading e-books, etc.

The Linux 2.6.36 kernel supports the device-mapper framework, which allows virtual layers to be mapped on top of block devices for tasks such as striping and mirroring. Among its targets is dm-crypt, a crypto target that provides transparent encryption of block devices.
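For illustration, the crypt target's mapping table has the form <start> <length> crypt <cipher> <key> <iv_offset> <device> <offset>. Here is a minimal sketch of creating such a mapping by hand on a generic Linux box, assuming a scratch partition /dev/sdb1 and a throwaway key (on Android, the volume daemon builds the equivalent mapping and manages the real key for you):

# SIZE=$(blockdev --getsz /dev/sdb1)
# KEY=$(hexdump -v -e '/1 "%02x"' -n 32 /dev/urandom)
# echo "0 $SIZE crypt aes-cbc-essiv:sha256 $KEY 0 /dev/sdb1 0" | dmsetup create cryptdev
# mkfs.ext4 /dev/mapper/cryptdev
# mount /dev/mapper/cryptdev /mnt

The length is in 512-byte sectors, which is what blockdev --getsz reports.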

Before the encryption operation, the output of the mount command shows the device name and mount point, indicating the partition where the user's data is stored, and this is the partition that will be encrypted.

/dev/block/platform/sdhci-tegra.3/by-name/userdata on /data type ext4 (rw,nosuid,nodev,noatime,barrier=1,data=ordered)
A few mount options to take note of: noatime, barrier=1, and data=ordered.

And after the encryption operation:
/dev/block/dm-0 /data ext4 rw,nosuid,nodev,noatime,barrier=1,data=ordered 0 0

dmsetup gives us more information. As the output below shows, a device-mapper crypt target has been set up in the kernel. The dm-crypt target encrypts and decrypts data on the block device transparently, using the crypto APIs in the Linux kernel.

# dmsetup targets
crypt            v1.7.0
striped          v1.3.0
linear           v1.1.0
error            v1.0.1
# dmsetup status
datadev: 0 61326304 crypt           v1.0.1
Setting aside the details surrounding key storage (see the kernel source), supported ciphers (cat /proc/crypto), and hardware acceleration (see the kernel source), here are some rudimentary performance tests that I ran before and after encrypting /data. For the interested reader, there are kernel-level details related to the Tegra 2 processor that can be discovered by going through the source code for the Linux 2.6.36 Tegra 2 branch.

The initial results of the basic tests look good. There is a dedicated kernel thread for handling the crypto IO. The added read latency appears to be related to this kernel IO thread, since reads on flash-based storage devices can usually be performed in near-constant time.
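On this kernel generation, dm-crypt queues its work to kcryptd kernel threads, so one quick sanity check (a suggested command, not part of the original tests) is to look for them in the process list once the crypt mapping is active:

# ps | grep kcryptd

The kcryptd and kcryptd_io worker threads should show up, though the exact names vary by kernel version.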

Unencrypted (2 GB Write - 1048572 2k blocks)
/data/local/tmp # time dd if=/dev/zero of=ofile bs=2k count=1048572

1048572+0 records in
1048572+0 records out
2147475456 bytes (2.0GB) copied, 255.912521 seconds, 8.0MB/s
real    4m 17.25s
user    0m 0.73s
sys     0m 24.55s
Unencrypted (2 GB Read - 1048572 2k blocks)
/data/local/tmp # time dd of=/dev/null if=ofile bs=2k count=1048572

1048572+0 records in
1048572+0 records out
2147475456 bytes (2.0GB) copied, 101.749864 seconds, 20.1MB/s
real    1m 41.79s
user    0m 1.15s
sys     0m 17.62s
Encrypted (2 GB Write - 1048572 2k blocks)
/data/local/tmp # time dd if=/dev/zero of=ofile bs=2k count=1048572

1048572+0 records in
1048572+0 records out
2147475456 bytes (2.0GB) copied, 260.219584 seconds, 7.9MB/s
real    4m 26.94s
user    0m 0.64s
sys     0m 24.12s
Encrypted (2 GB Read - 1048572 2k blocks)
/data/local/tmp # time dd of=/dev/null if=ofile bs=2k count=1048572

1048572+0 records in
1048572+0 records out
2147475456 bytes (2.0GB) copied, 124.291204 seconds, 16.5MB/s
real    2m 4.31s
user    0m 0.47s
sys     0m 7.74s
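A follow-up worth trying (these commands are not from the original runs): with a 2k block size, the read test issues over a million read() calls, so per-syscall overhead is mixed into the throughput numbers above. Dropping the page cache and re-reading with a 1 MB block size would help separate syscall overhead from the crypto cost:

/data/local/tmp # echo 3 > /proc/sys/vm/drop_caches
/data/local/tmp # time dd if=ofile of=/dev/null bs=1048576 count=2048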
 

Sunday, February 27, 2011

Androidified

I picked up a Motorola Xoom tablet on February 24th, the day of its release. The Xoom is built around the Nvidia Tegra 2, a dual-core ARM Cortex-A9 SoC, and runs Android 3.0, Honeycomb. There are no third-party UI replacements or additions; the Xoom runs pure Android 3.0, and it is very fast.

One of the nice things you can do is customize the keyboard. On a 10-inch tablet, it is convenient to be able to type with your left and right thumbs, and Thumb Keyboard is available on the Android Market just for this.

Thumb Keyboard greatly speeds up typing on the Xoom.

Next, let's build a statically linked Busybox binary so that you can do something useful with the device.
# wget http://busybox.net/downloads/busybox-1.18.3.tar.bz2
# wget http://www.busybox.net/downloads/busybox-1.18.3.tar.bz2.sign
# wget http://busybox.net/~vda/vda_pubkey.gpg
# gpg --import vda_pubkey.gpg
# gpg --verify busybox-1.18.3.tar.bz2.sign
# sha1sum busybox-1.18.3.tar.bz2
# md5sum busybox-1.18.3.tar.bz2
# bzip2 -cd busybox-1.18.3.tar.bz2 | tar xf -
# cd busybox-1.18.3

# make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- defconfig
# make CFLAGS=--static LDFLAGS=--static ARCH=arm CROSS_COMPILE=arm-linux-gnueabi-
# adb remount
# adb push busybox /system/xbin/busybox
# adb shell chmod 0755 /system/xbin/busybox
# adb shell
# cd /system/xbin
# ./busybox --install -s /system/xbin
# cd /mnt/sdcard
# echo "export PATH="export PATH=/system/xbin:/sbin:/vendor/bin:/system/sbin:/system/bin" > profile
# echo "ENV=/mnt/sdcard/profile /system/xbin/ash" > /system/bin/alsh
# chmod 0755 /system/bin/alsh
# alsh
# uname -a
Linux localhost 2.6.36.3-g2a65edc #1 SMP PREEMPT Mon Feb 7 15:24:33 PST 2011 armv7l GNU/Linux
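One sanity check worth adding before the push steps above (my suggestion, not part of the original recipe): run file(1) against the freshly built binary on the build host and confirm it reports a statically linked ARM executable.

# file busybox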

Saturday, July 31, 2010

Agile Data modeling, de-normalization, and other important factors

This subject probably deserves an entire book, but in the interest of keeping this blog post short, I've tried to highlight some of the more important data modeling tasks related to conceptual, logical, and physical data model design. Please note, this is not meant to be exhaustive. Relational models, object-relational models, and object models are not always the right solution for the problem. There are many great solutions available in the NoSQL area, and there is nothing wrong with a hybrid solution.

Software applications that are compiled on specific machine architectures often store and retrieve information from database software systems. Web server platforms often interpret machine-independent script that stores and retrieves information from database software systems. Java bytecode is executed by a Java Virtual Machine, and data is stored and retrieved from database software systems.

Efficient storage and retrieval of information from database software systems are certainly dependent on underlying hardware and operating system software characteristics. However, efficient storage and retrieval of information is also dependent on the structure and efficient execution of query statements that are used to store and retrieve the information. In a purely academic world, data is modeled and normalized in 3NF, 4NF, and 5NF forms. However, higher-order normalized forms do not always promote the creation of optimal query statements. Therefore, certain parts of a data model are often denormalized to improve the read performance of the database software system.

At a high level, joins between multiple large relations are generally expensive, so denormalization reduces this cost. The trade-off is that denormalization introduces redundancy, and with it the risk of inconsistency, so the developers are left with the job of minimizing the amount of redundant data. A formal process typically alleviates this strain.
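As a concrete, hypothetical PostgreSQL illustration (the orders/customers tables and columns below are invented for the example), suppose order rows are read far more often than customers are renamed. Normalized, every read of an order with its customer name pays for a join:

$ psql -c "SELECT o.order_id, c.name FROM orders o JOIN customers c ON c.customer_id = o.customer_id"

Denormalized, a copy of the name is carried on each order row, so reads skip the join, but every customer rename must now also update the copies:

$ psql -c "ALTER TABLE orders ADD COLUMN customer_name text"
$ psql -c "SELECT order_id, customer_name FROM orders"

The cost moves from the read path to the write path, which is exactly the trade you want when reads dominate.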

Defining the I/O characteristics of a database is an important precursor to data modeling, yet often the I/O characteristics are not defined before the data is modeled. A few basic tasks and questions can help define them. What is the purpose of the database, and what types of problems does it solve? What types of operations (read, write) will be performed on it? How many users will it support? Define user stories - short, succinct 1-3 sentence action statements - from both the developer perspective and the user perspective, as many as needed. The user stories will help define the conceptual data model for the domain, from which the logical data model can be created. A clear definition of the purpose of the system will aid in defining the read/write (I/O) characteristics of the database software system, thus forming a picture of the system. For instance, OLTP databases typically support short transactions, so it is important that write latency is minimized. The answers to these questions will also help determine the type of hardware storage configuration needed, along with one more important question: how much downtime can be afforded, and how much failover is needed?

At this point, the types of queries that will be run on the database begin to take shape, with heavy consideration given to the I/O characteristics of the database and to the types of joins and scans the query optimizer should ideally use.

Agile data modeling can be defined as continual, iterative feedback with the ability to add entities and relationships, or slice off sections of the data model in various sizes, at any point in the project. This may seem difficult, but it can be accomplished. Whether the database architecture adheres to the relational model, the object-relational model, or the object model, the proper classification of entities and attributes for a specific domain, in conjunction with a balance of normalized and denormalized groups of data, will allow small and large chunks of relations to be added to or removed from the data model throughout the development process. Formal algorithms can be used to map entity-relationship models to relations.
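As a sketch of that mapping (again with invented names), a one-to-many relationship between authors and books becomes a foreign key on the many side:

$ psql -c "CREATE TABLE author (author_id serial PRIMARY KEY, name text NOT NULL)"
$ psql -c "CREATE TABLE book (book_id serial PRIMARY KEY, title text NOT NULL, author_id integer NOT NULL REFERENCES author)"

A many-to-many relationship would instead map to a third, associative relation holding a pair of foreign keys.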

And how is this done? By continually modeling entities in the real world: abstracting classes, refactoring, keeping the hierarchy semi-flat, avoiding deep polymorphic behavior, and always embracing changes to the data model while never generalizing with a one-size-fits-all approach.

In short, data modeling is a crucial step in designing a database software system that efficiently stores and retrieves information: define the I/O characteristics, understand the purpose of the system, build the conceptual model for the domain, and derive the logical and then physical models from it with the hardware storage configuration in mind. Agile, iterative modeling with a balance of normalized and denormalized data keeps the model adaptable throughout development.


Saturday, July 24, 2010

High Performance Database Development

Fast, high-performance, database-driven application architectures that support true rapid development are little talked about. AOLServer, OpenACS, PostgreSQL, and TCL are such a combination.

I was introduced to AOLServer and OpenACS in 2002 by a close friend I grew up with. We built a very robust application architecture over the course of the following six years.

PostgreSQL was sitting behind the AOLServer/OpenACS instance and AOLServer nicely managed the database connection pools. AOLServer is multi-threaded and internally handles TCL interpretation and execution. TCL makes heavy use of lists (implementation notes aside), so many LISP programmers have enjoyed working with it.

The OpenACS developer toolbar is a tool that provides developers with access to various debugging and profiling information about their OpenACS application. It is a feature that is built into the OpenACS application framework and can be enabled or disabled depending on the needs of the developer.

Once enabled, the developer toolbar is accessible via a web browser and provides a range of information about the current page, including SQL queries, response times, and server resources. This information can be used to optimize and debug an OpenACS application, allowing developers to quickly identify and fix performance issues.

The OpenACS developer toolbar is just one of the many tools available to developers using the AOLServer, OpenACS, PostgreSQL, and TCL stack. Together, these technologies provide a fast, high-performance, database-driven application architecture that supports rapid development.

Monday, July 19, 2010

Dtrace on FreeBSD 8

 Dtrace on FreeBSD 8 - new server with no load

# uname -msri
FreeBSD 8.0-RELEASE-p3 amd64 DEV_A64_KERNEL

From the OpenSolaris DTrace Toolkit

# /usr/local/share/DTraceToolkit/syscallbypid.d
Tracing... Hit Ctrl-C to end.
  77913 perl                     sigprocmask                    48
  1267 svscan                   stat                           51
 45001 httpd                    read                           52
 26063 python2.4                recvfrom                       61
 26922 sshd                     ioctl                          69
 27104 more                     read                           69
 26888 postgres                 getppid                        78
 26888 postgres                 gettimeofday                   78
 26888 postgres                 select                         78
 26889 postgres                 getppid                        78
 26889 postgres                 select                         78
 45001 httpd                    poll                           81
 27102 postgres                 read                           82
 27103 postgres                 read                           82
 27105 postgres                 read                           82
 27101 dtrace                   clock_gettime                  97
 26063 python2.4                select                        102
 45001 httpd                    writev                        118
 44966 python2.4                read                          121
 26890 postgres                 read                          162
 26063 python2.4                read                          179
 26063 python2.4                _umtx_op                      186
 27104 more                     write                         205
 26063 python2.4                write                         208
 26891 postgres                 write                         300
 27101 dtrace                   ioctl                         301
 26922 sshd                     write                         304
 26922 sshd                     read                          306
 27054 psql                     poll                          378
 27054 psql                     recvfrom                      378
 27055 postgres                 sendto                        379
 26922 sshd                     select                        609
 26922 sshd                     sigprocmask                  1218
 27054 psql                     write                        3203

 

# /usr/local/share/DTraceToolkit/rwbypid.d
78168  sshd                        R       39
 97062 squid                       R       58
 45164 httpd                       W       81
 97062 squid                       W       95
 97474 python2.4                   R       98
   555 pflogd                      R      132
 45164 httpd                       R      186

 

# dtrace -n 'syscall::open*:entry { printf("%s %s",execname,copyinstr(arg0)); }'

CPU     ID                    FUNCTION:NAME
  0  25202                       open:entry svscan .
  0  25202                       open:entry imapd ./cur
  0  25202                       open:entry imapd ./new
  0  25202                       open:entry imapd ./courierimapkeywords/:list
  0  25202                       open:entry imapd ./courierimapkeywords
  0  25202                       open:entry svscan .
  2  25202                       open:entry postgres pg_stat_tmp/pgstat.tmp
  7  25202                       open:entry postgres global/pg_database
  7  25202                       open:entry postgres pg_stat_tmp/pgstat.stat
  7  25202                       open:entry postgres pg_stat_tmp/pgstat.stat
  7  25202                       open:entry postgres pg_stat_tmp/pgstat.stat
  4  25202                       open:entry tinydns data.cdb
  5  25202                       open:entry tinydns data.cdb
  0  25202                       open:entry svscan .
  0  25202                       open:entry svscan .
  0  25202                       open:entry svscan .
Syscall counts by process

# dtrace -n 'syscall:::entry { @num[pid,execname] = count(); }'
 28820  qmailmrtg7                                                      158
    28825  qmailmrtg7                                                      158
    27504  postgres                                                        159
    28776  cron                                                            234
    28853  bash                                                            245
    28861  bash                                                            245
    28869  bash                                                            245
    28877  bash                                                            245
    28798  qmailmrtg7                                                      264
    28777  newsyslog                                                       293
    28772  dtrace                                                          322
    28778  sh                                                              327
    28281  httpd                                                           542
    28900  svn                                                             691
    28903  svn                                                             740
    28902  svn                                                             748
    28904  svn                                                             748
    28901  svn                                                             758
    28780  perl                                                           3023

 

Files opened by process

# dtrace -n 'syscall::open*:entry { printf("%s %s",execname,copyinstr(arg0)); }'
CPU     ID                    FUNCTION:NAME
  4  25202                       open:entry svscan .
  0  25202                       open:entry squid /usr/local/plone/<domain-name>-dev.com/var/squidstorage/08/0C
  4  25202                       open:entry svscan .
  4  25202                       open:entry svscan .
  4  25202                       open:entry svscan .
  0  25202                       open:entry imapd ./.Spam/tmp/1279591336.M11620P234573_courierlock.dev.<domain-name>.com
  0  25202                       open:entry imapd ./.Spam/tmp
  0  25202                       open:entry imapd ./.Spam/tmp/1279591336.M11867P28573_imapuid_15.dev.<domain-name>.com
  0  25202                       open:entry imapd ./.Spam/courierimapuiddb
  0  25202                       open:entry imapd ./.Spam/tmp/1279591336.M113467P28573_imapuid_15.dev.<domain-name>.com
  0  25202                       open:entry imapd ./.Spam/cur
  0  25202                       open:entry imapd ./.Spam/new
  0  25202                       open:entry imapd ./.Spam/courierimapkeywords/:list
  0  25202                       open:entry imapd ./.Spam/courierimapkeywords
  0  25202                       open:entry squid /usr/local/plone/abcdexcrfdsf-dev.com/var/squidstorage/09/0C
  4  25202                       open:entry svscan .
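
Counts and filenames only tell part of the story. A natural next step (not part of the original session, and the aggregation name is my own) is to quantize per-call latency, for example for read(2):

# dtrace -n 'syscall::read:entry { self->ts = timestamp } syscall::read:return /self->ts/ { @lat[execname] = quantize(timestamp - self->ts); self->ts = 0; }'

Since timestamp is in nanoseconds, this yields a power-of-two histogram of read latency per process.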

Thursday, May 27, 2010

bsdtalk186 - Soft Updates w/ Journaling & ULE Sched

FreeBSD Soft Updates with Journaling, ULE Scheduler

Very good short interview with Jeff Roberson, FreeBSD committer. Discussion of the addition of journaling to soft updates and the development of the ULE scheduler.

BSD Talk Blog - Jeff Roberson (February 05, 2010)

You can also listen on your Android device. Just download Google Listen from the Android Market. After you set up Google Listen on your Android device, search for bsdtalk or FreeBSD from within Google Listen.