Application-Specific Instruction Sets Processor with Implicit Registers to Improve Register Bandwidth
Application-Specific Instruction-set Processors (ASIPs), which extend a
base processor with Application-Specific Instructions (ASIs), have become
an important design choice for embedded systems because they offer a
runtime flexibility that custom ASIC solutions cannot provide. One major
bottleneck in maximizing ASIP performance is the limited data bandwidth
between the General Purpose Register File (GPRF) and the ASIs. This paper
presents Implicit Registers (IRs) to provide the required data bandwidth.
An ASI Input/Output model is proposed to formulate the overhead of the
additional data transfers between the GPRF and the IRs, and an IR
allocation algorithm uses this model to improve performance by minimizing
the number of extra data-transfer instructions. Experimental results show
a speedup of up to 3.33x compared with execution without IRs.
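The abstract does not spell out the ASI Input/Output model or the allocation algorithm, so the following is only a rough illustration of the kind of overhead they deal with. It assumes a GPRF with 2 read ports and 1 write port, a small pool of IRs with LRU replacement, and SSA-style unique value names; the names `ASI`, `naive_transfers`, and `count_transfers` and the whole cost model are hypothetical and are not the authors' method. The sketch counts the extra GPRF/IR move instructions for a sequence of ASIs, first with no reuse and then when values produced into IRs by one ASI can be consumed directly by a later one.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class ASI:
    """One ASI invocation: the values it reads and the values it produces."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def naive_transfers(seq, read_ports=2, write_ports=1):
    """Baseline: every operand beyond the GPRF ports costs one move, with no reuse."""
    return sum(max(0, len(a.inputs) - read_ports) + max(0, len(a.outputs) - write_ports)
               for a in seq)


def count_transfers(seq, read_ports=2, write_ports=1, num_irs=4):
    """Count extra GPRF<->IR moves under a hypothetical cost model.

    * Up to `read_ports` inputs not already in an IR are read from the GPRF;
      each remaining one is staged with one GPRF->IR move.
    * Inputs already held in an IR are read for free.
    * Up to `write_ports` outputs go to the GPRF; the rest stay in IRs and
      cost one IR->GPRF move when evicted or when still held at the end.
    IRs use LRU replacement; value names are assumed unique (SSA-style).
    """
    irs = OrderedDict()  # value -> True if the value lives only in this IR
    moves = 0

    def place(value, dirty, protected):
        nonlocal moves
        irs.pop(value, None)
        if len(irs) == num_irs:
            # Evict the least recently used value the current ASI does not need.
            victim = [v for v in irs if v not in protected][0]
            if irs.pop(victim):
                moves += 1  # write the IR-only result back to the GPRF
        irs[value] = dirty

    for asi in seq:
        resident = [v for v in asi.inputs if v in irs]
        missing = [v for v in asi.inputs if v not in irs]
        staged = missing[read_ports:]
        if len(resident) + len(staged) > num_irs:
            raise ValueError(f"{asi.name}: not enough IRs for its inputs")
        if len(asi.outputs) - write_ports > num_irs:
            raise ValueError(f"{asi.name}: not enough IRs for its outputs")
        for v in resident:
            irs.move_to_end(v)  # mark as recently used
        for v in staged:
            moves += 1  # GPRF->IR move issued before the ASI
            place(v, dirty=False, protected=set(asi.inputs))
        for v in asi.outputs[write_ports:]:
            place(v, dirty=True, protected=set())
    # Conservatively assume every IR-only result is live after the sequence.
    return moves + sum(irs.values())
```

For two chained ASIs, `A1` reading `a, b, c, d` and producing `x, y`, then `A2` reading `x, y, e`, `count_transfers` gives 3 moves against 4 from `naive_transfers`, because `y` is handed to `A2` in an IR instead of being written back and reloaded. This greedy version simply takes operands in order; an allocator of the kind the paper proposes would instead choose which operands use GPRF ports and which use IRs so as to minimize the moves.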
@article{"International Journal of Information, Control and Computer Sciences:62530",
  author = "Ginhsuan Li and Chiuyun Hung and Desheng Chen and Yiwen Wang",
  title = "Application-Specific Instruction Sets Processor with Implicit Registers to Improve Register Bandwidth",
  journal = "International Journal of Information, Control and Computer Sciences",
  abstract = "Application-Specific Instruction-set Processors (ASIPs), which extend a
base processor with Application-Specific Instructions (ASIs), have become
an important design choice for embedded systems because they offer a
runtime flexibility that custom ASIC solutions cannot provide. One major
bottleneck in maximizing ASIP performance is the limited data bandwidth
between the General Purpose Register File (GPRF) and the ASIs. This paper
presents Implicit Registers (IRs) to provide the required data bandwidth.
An ASI Input/Output model is proposed to formulate the overhead of the
additional data transfers between the GPRF and the IRs, and an IR
allocation algorithm uses this model to improve performance by minimizing
the number of extra data-transfer instructions. Experimental results show
a speedup of up to 3.33x compared with execution without IRs.",
  keywords = "Application-Specific Instruction-set Processors, data bandwidth, configurable processor, implicit register.",
  volume = "5",
  number = "5",
  pages = "510-5",
}